Thursday, October 30, 2014

lophttpd fucks the POODLE

Not just because they are ugly but also because lophttpd
never was affected by POODLE, since SSLv3 was
disabled for a reason in favor of TLSv1. I think about dropping
TLSv1 too and just allowing TLSv1.1+ in future.

To my knowledge lophttpd is also the first webserver
utilizing Linux seccomp sandbox.

I also added SO_REUSEPORT support today, since Google
told us that when handling c10k, their processes
are un-evenly distributed across the cores (what the hell
are they doing there?)
lophttpd wins again, since even without SO_REUSEPORT,
such problem never existed for you:





which shows running it on 4 cores, handling c10k. Could
there be a smaller footprint?


Update:

After carefully reading the commit for the Linux kernel
that introduced SO_REUSEPORT, I came to the conclusion that
for Linux there was an entirely different reason to introduce
it than there was for 4.4BSD at the time. According to the
git commit message, uneven distribution among the cores
only happen when the "number of listener sockets bound to a
port changes (new ones are added, or old ones closed)".
So its clear that it never affected lophttpd and it "leaks"
interesting info about the architecture of the google
webserver, why it had such problems before the patch
and which kernel they are using. OSINT for the win.
(Such architecture where new listeners are created or removed
do not make much sense to me anyway, so there are some
questions left.)

So in theory, I could remove SO_REUSEPORT again but I will
leave it as is for now, in case more webserver instances
are started later. Note that it doesn't make sense to have
more lophttpds running than there are cores on
your machine. I do not support hyperthreading yet, so
-n 0 is what you want, if you have such heavy load at all.

sshttp btw also uses the same multi/single-thread architecture
without SO_REUSEPORT (yet).




Monday, October 13, 2014

trusted bootloader RCE trickery

So you are safe, because you updated your bash, run your policy in enforcing mode, enabled NX and ASLR and boot using a cryptographically signed shim bootloader.

Well, you actually failed by the first step.

Here is why:





Even if thats hard to believe, but there might be
remotely exploitable buffer overflows in your secure
bootloader. The 90's party of today just happen
downstairs. Bootloaders make use of UEFI network stacks
to implement PXE and PXEv6 and sometimes fail to do so:





The issue however is not so severe because a lot of UEFI
firmwares fail to verify what they get via PXEv6 anyway,so
delivering an overflow payload is overkill. :)
Thats actually why you see 'schlimm' debug output in
secure mode in the PoC demo screenshot at all.

There were some twists necesarry to actually achieve *ptr = value
as seen above, mainly due to protocol specifics. If you are interested, we can discuss this privately.

Thursday, August 7, 2014

Some new stuff on github

I updated some of my github stuff recently.

First, I added SNI support to lophttpd for better supporting
virtual hosting with TLS. Its straight forward to use. Please
refer to the updated README.

Next, I pushed my POSIX realtime AIO implementation for
Linux to github. The glibc aio implementation is creating
an own thread for each aio_read/write that you submit,
not utilizing the kernels io_ syscalls. Clearly, for a large number of AIO contexts this performs badly to not at all.
I am using the io_ syscalls and an event-fd to get
notified about operations that became ready. I also
made sure that it works on Android. :p

Monday, July 7, 2014

XKS speedup trickery

Lets have a look on how our traffic is XKey-scored and whether
its done with efficiency.

The XKS source seems to be some kind of mangled-C++, just like
a lot of C/C++-based languages exist for big/parallel
data processing (CUDA or other parallelizing extensions).

Given that, DB is obviously some kind of nested std::map or 
apparently of a derived type, as can be seen by the apply() 
member which is not part of a STL map.
Its probably not a multimap either, as denoted by the clear()
and in that [][] assignments are not possible with multimaps [1].

These types (as well as a multimap) are sorted associative
containers (dictionaries) who's lookup complexity is guaranteed
to be O(log(N)) at worst [2], where N denotes the number
of keys in the map. DB has at least 3 keys as seen from the 
snippet, but chances are that the number is much larger.The 
larger it is, the more need is for optimizing the map access.
I doubt that XKS has their own implementation of dictionaries
that have a better O() and are optimized in a way that
DB["tor_onion_survey"]["onion_count"]
access could be O(1). After all (look at the boost include), it
looks pretty much like STL-C++ code.

Given that, inside a loop the following XKS code is rather
inefficient:

for (values_t::const_iterator iter = VALUES.begin();
          iter != VALUES.end();
          ++iter) {
        DB["tor_onion_survey"]["onion_address"] = iter->address() + ".onion";
        if (iter->has_scheme())
          DB["tor_onion_survey"]["onion_scheme"] = iter->scheme();
        if (iter->has_port())
          DB["tor_onion_survey"]["onion_port"] = iter->port();
        DB["tor_onion_survey"]["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
        DB.apply();
        DB.clear();
      }
      return true;



because inside the loop, the STL's find-routine walks the DB map
4 times until it gets to DB["tor_onion_survey"]. Since the first 
key tor_onion_survey is static, it would be much better to keep
cached iterator to save the lookup time in each cycle.
Additionally, the find for the second key again has O(log(N)),
where N seems to be 4 (onion_address, onion_scheme, onion_port 
and onion_count).

The loop should rather be organized like this:

      auto cit = DB_fast.begin();
        pair < string, map < string, string > > sm;
        sm.first = "tor_onion_survey";
        for (...) {
                sm.second["onion_address"] = iter->address() + ".onion";
                if (iter->has_scheme())
                  sm.second["onion_scheme"] = iter->scheme();
                if (iter->has_port())
                  sm.second["onion_port"] = iter->port();
                sm.second["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
                DB_fast.insert(cit, sm);
                DB_fast.apply();
                DB_fast.clear();
        }


The full speedup-demo with comparison of both methods can be
found here. The average speedup in my tests are about 30% which
can save a lot of tax payers money if the agency scales
their XKS horizontally. The speedup here is the O(1) access
via the pair<>, compared to the O(log(N)) access in the original
code via a map<>. And thats for a DB map that
just has N=1 (tor_onion_survey). In reality N should be much
larger.

Nevertheless C++ is a good choice for XKS for various reasons
and they seem to be learning-by-doing just like any other
coder out there.

Edit: Meanwhile I found another reason to avoid operator[]
for assignments in a row inside one of Scott Meyers excellent
books on C++ effectiveness [3] which I really recommend reading
to any XKS developers (there are also classes for it).


[1] The clear() is important for our later optimization, as
    insert() has the same semantics like operator[] assignment
    only if the key doesn't already exist - otherwise the
    assignment-step after finding the key won't happen with
    insert().

[2] Generic Programming and the STL, using and extending the C++
    Standard Template Library
    Matthew H. Austern, Addison Wesley, 1999,
    p.159f

[3] Effective STL, 50 Specific Ways to Improve Your Use of the
    Standard Template Library
    Scott Meyers, Addison Wesley, 2001,
    Item 24, p.106ff

Thursday, May 22, 2014

Quantum-DNS trickery

(SIGILL//FVPNS//NOPORN//FORNFCK//MRKLBANG)

I made quantum-dns available in my github.

Its simple to use (non-recursive) DNS server for
IPv4 and IPv6 and also works without having an
IP address assigned to the interface (i.e. it can
answer any DNS query).

Similar to my writeup on QUANTUMINSERT it also contains
a demo FoxAcid script for HTTP. Theoretically it'd also quite easy to make STARTTLS disappear with quantum-dns if its not
enforced on the sender side. While with QUANTUMINSERT
you need to see the TCP sequence# and port, with DNS you
need the XID and port, so it makes entirely sense to
have good passive capabilities for e.g. 3G/4G.
A monitor port on a large peering point is enough capability though.

Thats a sample run from my lab (please forgive me :)




And yes thats trivially to implement, but so is
QUANTUMINSERT which is so easy that I never considered it
an attacking scenario either. It was fun to code though
to get hands on DNS again. For DNSSEC support, you need
to purchase special license. :)


Friday, May 16, 2014

load balancing trickery

After cleaning up the sources a bit and making
sure it compiles on current Linux distros, I uploaded
my old IPv4/IPv6 load balancer to my github.

I started this project in 2004, back in the days
at university. 10 years ago, it was the first load balancer available for IPv6 and in 2006 I finally presented the project at
some balancing conference in Silicon Valley.
(Even though you see some other names of my CS department
there, the whole code is written by me. In academics however
you form research groups and you are not going to rock
the world single-core.)

It works on IP level, so its suitable to balance
SSL/VPN/tor traffic etc too. For IPv4 it has integrated
failover/hotplug support for the backend nodes.

Thursday, March 27, 2014

Weapons of mass-pty considered harmful (trickery!)

Fixed a bug in enabler which is part of pam_schroedinger
that made it exit() when no more pty's could be allocated.
That's wrong of course, we just need to continue dictumerating
(enumerating via dictionary) the account. 500 parallel
su/sudo are of no problem.

enabler allows you to mount dictionary attacks using su,
sudo, passwd or alike. You can stop this by using
pam_schroedinger, or something like introducing an
enforced RLIMIT_PTY and having su, sudo etc. call
isatty(0), otherwise socketpairs etc could be used too.


I also went ahead, signing my github stuff with
this key. Any release tag containing an s at the end
of the version is a signed tag. Also, all commits will
be signed in future.
You can verify this via git log --show-signature or
git tag --verify TAG after having above DSA key
imported into your gpg keyring.


Friday, March 7, 2014

crypto shell trickery!

I recently imported crash into my github. It features
IP6-ready SSH-like remote shell, using strong public key
authentication and TLS-encrypted transport. It does not
rely on SSL/TLS internal X509 cecking but compares
hostkeys bit-wise. It runs on Linux and embedded derivates,
Android, BSD, Solaris and OSX/Darwin. It does not require root
and has back-connect and trigger modes built in. It can
also be invoked as a CGI.

Update: Pushed a fix into git to use SHA512 rather than
SHA1 for signing authentication requests. That makes
it incompatible with earlier versions. Also fixed a bug
where crashc did not properly distribute SIGWINCH to the
remote peer. Now you can use your ncurses porn and resize
your xterm and it gets properly adjusted! Also tested
authentication RSA keys of up to 7500 bit in size. That
should resist upcoming (TS//SI//REL) QUANTUMFUCK computers.
I need to find the time to enforce cipher-lists and add
ephemeral keying though. (done)
Also good news: crash also integrates with sshttp!

Friday, February 28, 2014

lophttpd OSX trickery

I ported lophttpd to OSX/Darwin (10.8 tested).

As OSX/Darwin is almost POSIX-compliant (live_free() || die())
and I already separated the low-level stuff to the -flavor
files, this was not overly complicated.
Now it pays that I chose to do it that way, rather than
having a dozen of #ifdef stacked around.
lophttpd now cleanly builds on BSD (untested for some time),
Linux, Android and OSX/Darwin.

What nerves most is the various integer-size issues you
have with size_t, off_t, suseconds_t etc. and the corresponding
format specifiers with the *printf() family. However you do it,
one OS shouts at you for passing wrong sized parameters to
*printf(). Despite of any standards. Live free or die.

You can easily build it on OSX/Darwin by installing Xcode
and then installing its command-line tools. Thats not
gcc AFAIS, but it should also build if you manage to install
gcc toolchain there.

I had to disable warnings about deprecated use of OpenSSL
in OSX and I have hard times not commenting on that in light
of gotofail.
Live free and die. :)






Thursday, January 16, 2014

Fernmelder to the rescue

I've been experimenting with mass DNS resolving lately.

Imagine you have some large list of DNS names (FQDN's)
which you want to map to its IPv4 or IPv6 address.
That could be a GB sized zone-file or an enumerated list of
names for some double-flux network when you research
how you can take down a botnet. In either way, sometimes
you need to do that in finite time and clearly
gethostbyname() in a loop is not the way to go!

For asynchronous resolving the glibc already has
getaddrinfo_a() but it turns out that this function is
entirely useless because its using threads. So, for
every request you send, a thread is cloned() which does
not scale well. [The glibc aio_ functions also use threads,
its a pitty that glibc async support is so toast!]

So I hacked up something from scratch that works for me.
Its on my github. The output resembles that from dig
and from the zone files you know.

The problem is to find right parameters for the amount
of requests to send in a row and the amount of usecs you
want to usleep before doing that again. Otherwise you will
just hammer the DNS server and gain no response. The default
values are sane enough that it yields some good result.
The better your uplink to your recursive DNS, the
smaller amount of time you need to usleep().
You can also distribute the requests across multiple DNS
servers by using more than one -N switch. The more reliable
DNS servers you have, the better because you do not run
in any rate limiting.