Thursday, April 23, 2015

C++11 bailout trickery

Is someone C++11 guru enough to make a statement whether
the following C++11 code is correct? In particular on
whats happening on line 24, as the lambda should not
harvest the memory structures (scope?).
To me, everything looks OK. If thats the case, it would ease
a lot of cleanup routines on error returns from functions.
Please leave a comment.

Sample C++ code

Wednesday, March 25, 2015

troubleshooter trickery

Demo of SELinux disable on a Fedora 21 default desktop. A full writeup can be found here.

Thursday, January 8, 2015

U2F trickery

In the last post I promised to stop threat analyzing. So here
is some dev again which I already started developing back
in 2014 and where I finally found some time to finish.
Its a small U2F stack with the APDU framing code based
on Googles U2F reference code. After reviewing a lot of other
U2F code (sigh!), I found this reference code comprehensive enough to be usable for myself and for PAM code.
It also builds on Darwin, but I didnt have time to test it

Friday, December 19, 2014

QI for the win

Now that we officially know that 3G can be broken and that
it makes sense to place particular (passive) hardware on the
roof top of embassies (the cellar is already stuffed with
torture equipment and you have better gain at the roof),
my threat analysis here was correct. In particular the
last paragraph should be repeated, as you can start sending
your QI before the victim packet is even close to the
target if you just captured the SYN packet on air.
As a bonus, you dont need to deploy evil hardware in the
target network.
Nevermind, I am not going to torture you with more threat
analysis posts. There are enough of them. :)

Thursday, December 11, 2014

sshttp tproxy trickery

I updated sshttpd to allow muxing of HTTP(S)/SSH
to whole subnets. Until recently, the setup was
per single host. Now you can run it via -T on your
gateway and Layer5-switch your whole internal net.

Thursday, October 30, 2014

lophttpd fucks the POODLE

Not just because they are ugly but also because lophttpd
never was affected by POODLE, since SSLv3 was
disabled for a reason in favor of TLSv1. I think about dropping
TLSv1 too and just allowing TLSv1.1+ in future.

To my knowledge lophttpd is also the first webserver
utilizing Linux seccomp sandbox.

I also added SO_REUSEPORT support today, since Google
told us that when handling c10k, their processes
are un-evenly distributed across the cores (what the hell
are they doing there?)
lophttpd wins again, since even without SO_REUSEPORT,
such problem never existed for you:

which shows running it on 4 cores, handling c10k. Could
there be a smaller footprint?


After carefully reading the commit for the Linux kernel
that introduced SO_REUSEPORT, I came to the conclusion that
for Linux there was an entirely different reason to introduce
it than there was for 4.4BSD at the time. According to the
git commit message, uneven distribution among the cores
only happen when the "number of listener sockets bound to a
port changes (new ones are added, or old ones closed)".
So its clear that it never affected lophttpd and it "leaks"
interesting info about the architecture of the google
webserver, why it had such problems before the patch
and which kernel they are using. OSINT for the win.
(Such architecture where new listeners are created or removed
do not make much sense to me anyway, so there are some
questions left.)

So in theory, I could remove SO_REUSEPORT again but I will
leave it as is for now, in case more webserver instances
are started later. Note that it doesn't make sense to have
more lophttpds running than there are cores on
your machine. I do not support hyperthreading yet, so
-n 0 is what you want, if you have such heavy load at all.

sshttp btw also uses the same multi/single-thread architecture
without SO_REUSEPORT (yet).

Monday, October 13, 2014

trusted bootloader RCE trickery

So you are safe, because you updated your bash, run your policy in enforcing mode, enabled NX and ASLR and boot using a cryptographically signed shim bootloader.

Well, you actually failed by the first step.

Here is why:

Even if thats hard to believe, but there might be
remotely exploitable buffer overflows in your secure
bootloader. The 90's party of today just happen
downstairs. Bootloaders make use of UEFI network stacks
to implement PXE and PXEv6 and sometimes fail to do so:

The issue however is not so severe because a lot of UEFI
firmwares fail to verify what they get via PXEv6 anyway,so
delivering an overflow payload is overkill. :)
Thats actually why you see 'schlimm' debug output in
secure mode in the PoC demo screenshot at all.

There were some twists necesarry to actually achieve *ptr = value
as seen above, mainly due to protocol specifics. If you are interested, we can discuss this privately.

Thursday, August 7, 2014

Some new stuff on github

I updated some of my github stuff recently.

First, I added SNI support to lophttpd for better supporting
virtual hosting with TLS. Its straight forward to use. Please
refer to the updated README.

Next, I pushed my POSIX realtime AIO implementation for
Linux to github. The glibc aio implementation is creating
an own thread for each aio_read/write that you submit,
not utilizing the kernels io_ syscalls. Clearly, for a large number of AIO contexts this performs badly to not at all.
I am using the io_ syscalls and an event-fd to get
notified about operations that became ready. I also
made sure that it works on Android. :p

Monday, July 7, 2014

XKS speedup trickery

Lets have a look on how our traffic is XKey-scored and whether
its done with efficiency.

The XKS source seems to be some kind of mangled-C++, just like
a lot of C/C++-based languages exist for big/parallel
data processing (CUDA or other parallelizing extensions).

Given that, DB is obviously some kind of nested std::map or 
apparently of a derived type, as can be seen by the apply() 
member which is not part of a STL map.
Its probably not a multimap either, as denoted by the clear()
and in that [][] assignments are not possible with multimaps [1].

These types (as well as a multimap) are sorted associative
containers (dictionaries) who's lookup complexity is guaranteed
to be O(log(N)) at worst [2], where N denotes the number
of keys in the map. DB has at least 3 keys as seen from the 
snippet, but chances are that the number is much larger.The 
larger it is, the more need is for optimizing the map access.
I doubt that XKS has their own implementation of dictionaries
that have a better O() and are optimized in a way that
access could be O(1). After all (look at the boost include), it
looks pretty much like STL-C++ code.

Given that, inside a loop the following XKS code is rather

for (values_t::const_iterator iter = VALUES.begin();
          iter != VALUES.end();
          ++iter) {
        DB["tor_onion_survey"]["onion_address"] = iter->address() + ".onion";
        if (iter->has_scheme())
          DB["tor_onion_survey"]["onion_scheme"] = iter->scheme();
        if (iter->has_port())
          DB["tor_onion_survey"]["onion_port"] = iter->port();
        DB["tor_onion_survey"]["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
      return true;

because inside the loop, the STL's find-routine walks the DB map
4 times until it gets to DB["tor_onion_survey"]. Since the first 
key tor_onion_survey is static, it would be much better to keep
cached iterator to save the lookup time in each cycle.
Additionally, the find for the second key again has O(log(N)),
where N seems to be 4 (onion_address, onion_scheme, onion_port 
and onion_count).

The loop should rather be organized like this:

      auto cit = DB_fast.begin();
        pair < string, map < string, string > > sm;
        sm.first = "tor_onion_survey";
        for (...) {
                sm.second["onion_address"] = iter->address() + ".onion";
                if (iter->has_scheme())
                  sm.second["onion_scheme"] = iter->scheme();
                if (iter->has_port())
                  sm.second["onion_port"] = iter->port();
                sm.second["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
                DB_fast.insert(cit, sm);

The full speedup-demo with comparison of both methods can be
found here. The average speedup in my tests are about 30% which
can save a lot of tax payers money if the agency scales
their XKS horizontally. The speedup here is the O(1) access
via the pair<>, compared to the O(log(N)) access in the original
code via a map<>. And thats for a DB map that
just has N=1 (tor_onion_survey). In reality N should be much

Nevertheless C++ is a good choice for XKS for various reasons
and they seem to be learning-by-doing just like any other
coder out there.

Edit: Meanwhile I found another reason to avoid operator[]
for assignments in a row inside one of Scott Meyers excellent
books on C++ effectiveness [3] which I really recommend reading
to any XKS developers (there are also classes for it).

[1] The clear() is important for our later optimization, as
    insert() has the same semantics like operator[] assignment
    only if the key doesn't already exist - otherwise the
    assignment-step after finding the key won't happen with

[2] Generic Programming and the STL, using and extending the C++
    Standard Template Library
    Matthew H. Austern, Addison Wesley, 1999,

[3] Effective STL, 50 Specific Ways to Improve Your Use of the
    Standard Template Library
    Scott Meyers, Addison Wesley, 2001,
    Item 24, p.106ff

Thursday, May 22, 2014

Quantum-DNS trickery


I made quantum-dns available in my github.

Its simple to use (non-recursive) DNS server for
IPv4 and IPv6 and also works without having an
IP address assigned to the interface (i.e. it can
answer any DNS query).

Similar to my writeup on QUANTUMINSERT it also contains
a demo FoxAcid script for HTTP. Theoretically it'd also quite easy to make STARTTLS disappear with quantum-dns if its not
enforced on the sender side. While with QUANTUMINSERT
you need to see the TCP sequence# and port, with DNS you
need the XID and port, so it makes entirely sense to
have good passive capabilities for e.g. 3G/4G.
A monitor port on a large peering point is enough capability though.

Thats a sample run from my lab (please forgive me :)

And yes thats trivially to implement, but so is
QUANTUMINSERT which is so easy that I never considered it
an attacking scenario either. It was fun to code though
to get hands on DNS again. For DNSSEC support, you need
to purchase special license. :)