Thursday, August 7, 2014

Some new stuff on github

I updated some of my github stuff recently.

First, I added SNI support to lophttpd to better support
virtual hosting with TLS. It's straightforward to use. Please
refer to the updated README.

Next, I pushed my POSIX realtime AIO implementation for
Linux to github. The glibc aio implementation creates
a separate thread for each aio_read/write that you submit
instead of utilizing the kernel's io_ syscalls. Clearly, for a
large number of AIO contexts this performs badly or not at all.
I am using the io_ syscalls and an eventfd to get
notified about operations that have completed. I also
made sure that it works on Android. :p
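
A minimal sketch of the io_ syscalls plus eventfd combination described
above. It is not the code from the github repository: the wrapper names
(io_setup_ and friends) and aio_read_file() are made up for illustration,
only the structs and flags from <linux/aio_abi.h> are real. It submits a
single read and blocks on the eventfd until the kernel reports completion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Thin wrappers (hypothetical names): glibc has no functions for these syscalls.
static long io_setup_(unsigned nr, aio_context_t* ctx) { return syscall(SYS_io_setup, nr, ctx); }
static long io_submit_(aio_context_t ctx, long n, struct iocb** list) { return syscall(SYS_io_submit, ctx, n, list); }
static long io_getevents_(aio_context_t ctx, long min_nr, long nr, struct io_event* ev, struct timespec* ts)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, ts);
}
static long io_destroy_(aio_context_t ctx) { return syscall(SYS_io_destroy, ctx); }

// Reads up to len bytes from the start of path. Returns bytes read or -1.
long aio_read_file(const char* path, char* buf, std::size_t len)
{
    long result = -1;
    aio_context_t ctx = 0;
    int fd = open(path, O_RDONLY);
    int efd = eventfd(0, 0);

    if (fd >= 0 && efd >= 0 && io_setup_(8, &ctx) == 0) {
        struct iocb cb;
        std::memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_lio_opcode = IOCB_CMD_PREAD;
        cb.aio_buf = reinterpret_cast<std::uintptr_t>(buf);
        cb.aio_nbytes = len;
        cb.aio_flags = IOCB_FLAG_RESFD;   // ask the kernel to signal efd on completion
        cb.aio_resfd = efd;

        struct iocb* list[1] = { &cb };
        std::uint64_t ready = 0;          // number of completions signalled so far
        struct io_event ev;

        if (io_submit_(ctx, 1, list) == 1 &&
            read(efd, &ready, sizeof(ready)) == static_cast<ssize_t>(sizeof(ready)) &&
            io_getevents_(ctx, 1, 1, &ev, nullptr) == 1) {
            assert(ev.obj == reinterpret_cast<std::uintptr_t>(&cb));
            result = static_cast<long>(ev.res);   // bytes read, or negative errno
        }
        io_destroy_(ctx);
    }
    if (efd >= 0) close(efd);
    if (fd >= 0) close(fd);
    return result;
}
```

In a real program the eventfd would sit in a poll()/epoll set next to the
sockets, with many requests in flight, rather than being read right after
a single submit.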

Monday, July 7, 2014

XKS speedup trickery

Let's have a look at how our traffic is XKey-scored and whether
it's done efficiently.

The XKS source seems to be some kind of mangled-C++, just like
a lot of C/C++-based languages exist for big/parallel
data processing (CUDA or other parallelizing extensions).

Given that, DB is obviously some kind of nested std::map, or
apparently a type derived from it, as can be seen from the apply()
member, which is not part of an STL map.
It's probably not a multimap either, as indicated by the clear()
and the fact that [][] assignments are not possible with multimaps [1].

These types (as well as a multimap) are sorted associative
containers (dictionaries) whose lookup complexity is guaranteed
to be O(log(N)) at worst [2], where N denotes the number
of keys in the map. DB has at least 3 keys as seen from the
snippet, but chances are that the number is much larger. The
larger it is, the greater the need for optimizing the map access.
I doubt that XKS has its own implementation of dictionaries
that have a better O() and are optimized in a way that
DB["tor_onion_survey"]["onion_count"]
access could be O(1). After all (look at the boost include), it
looks pretty much like STL-C++ code.

Given that, inside a loop the following XKS code is rather
inefficient:

      for (values_t::const_iterator iter = VALUES.begin();
           iter != VALUES.end();
           ++iter) {
        DB["tor_onion_survey"]["onion_address"] = iter->address() + ".onion";
        if (iter->has_scheme())
          DB["tor_onion_survey"]["onion_scheme"] = iter->scheme();
        if (iter->has_port())
          DB["tor_onion_survey"]["onion_port"] = iter->port();
        DB["tor_onion_survey"]["onion_count"] = boost::lexical_cast<string>(TOTAL_VALUE_COUNT);
        DB.apply();
        DB.clear();
      }
      return true;



because inside the loop, the STL's find routine walks the DB map
up to 4 times per iteration to get to DB["tor_onion_survey"]. Since
the first key tor_onion_survey is static, it would be much better to
keep a cached iterator to save the lookup time in each cycle.
Additionally, the find for the second key again is O(log(N)),
where N seems to be 4 (onion_address, onion_scheme, onion_port
and onion_count).

The loop should rather be organized like this:

      // Insertion hint; DB_fast is empty here, so this equals end()
      // and stays valid across DB_fast.clear().
      auto cit = DB_fast.begin();
      pair<string, map<string, string> > sm;
      sm.first = "tor_onion_survey";
      for (...) {
        // Drop scheme/port left over from the previous value.
        sm.second.clear();
        sm.second["onion_address"] = iter->address() + ".onion";
        if (iter->has_scheme())
          sm.second["onion_scheme"] = iter->scheme();
        if (iter->has_port())
          sm.second["onion_port"] = iter->port();
        sm.second["onion_count"] = boost::lexical_cast<string>(TOTAL_VALUE_COUNT);
        DB_fast.insert(cit, sm);
        DB_fast.apply();
        DB_fast.clear();
      }


The full speedup demo with a comparison of both methods can be
found here. The average speedup in my tests is about 30%, which
can save a lot of taxpayers' money if the agency scales
their XKS horizontally. The speedup here comes from the O(1) access
via the pair<>, compared to the O(log(N)) access in the original
code via a map<>. And that's for a DB map that
just has N=1 (tor_onion_survey). In reality N should be much
larger.
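
A self-contained sketch of the two access patterns, assuming DB behaves
like a plain std::map of std::maps. This is not the linked demo: Record,
fill_brackets() and fill_hinted() are made-up names, apply() is modelled
by copying the map into a vector, and std::to_string stands in for
boost::lexical_cast. It only shows that both loops produce the same
records; timing them is left to the reader.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Inner = std::map<std::string, std::string>;
using Outer = std::map<std::string, Inner>;

// Hypothetical stand-in for one element of VALUES.
struct Record {
    std::string address;
    std::string port;   // empty means "no port"
};

// Original pattern: every assignment searches for "tor_onion_survey" again.
std::vector<Outer> fill_brackets(const std::vector<Record>& values)
{
    std::vector<Outer> applied;   // stands in for DB.apply()
    Outer db;
    for (const Record& r : values) {
        db["tor_onion_survey"]["onion_address"] = r.address + ".onion";
        if (!r.port.empty())
            db["tor_onion_survey"]["onion_port"] = r.port;
        db["tor_onion_survey"]["onion_count"] = std::to_string(values.size());
        applied.push_back(db);
        db.clear();
    }
    return applied;
}

// Optimized pattern: build the record in a pair, then do one hinted insert.
std::vector<Outer> fill_hinted(const std::vector<Record>& values)
{
    std::vector<Outer> applied;
    Outer db;
    std::pair<std::string, Inner> sm;
    sm.first = "tor_onion_survey";
    for (const Record& r : values) {
        sm.second.clear();   // no stale fields from the previous record
        sm.second["onion_address"] = r.address + ".onion";
        if (!r.port.empty())
            sm.second["onion_port"] = r.port;
        sm.second["onion_count"] = std::to_string(values.size());
        assert(db.empty());          // end() is only the ideal hint for an empty map
        db.insert(db.end(), sm);     // no search for the outer key
        applied.push_back(db);
        db.clear();
    }
    return applied;
}
```

The inner map is still accessed with operator[] in both variants, so the
saving is limited to the repeated lookups of the outer key.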

Nevertheless C++ is a good choice for XKS for various reasons
and they seem to be learning-by-doing just like any other
coder out there.

Edit: Meanwhile I found another reason to avoid operator[]
for assignments in a row, inside one of Scott Meyers' excellent
books on C++ effectiveness [3], which I really recommend
to any XKS developers (there are also classes for it).


[1] The clear() is important for our later optimization, as
    insert() has the same semantics as operator[] assignment
    only if the key doesn't already exist - otherwise the
    assignment step after finding the key won't happen with
    insert().

[2] Generic Programming and the STL, using and extending the C++
    Standard Template Library
    Matthew H. Austern, Addison Wesley, 1999,
    p.159f

[3] Effective STL, 50 Specific Ways to Improve Your Use of the
    Standard Template Library
    Scott Meyers, Addison Wesley, 2001,
    Item 24, p.106ff
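
The point of footnote [1] as a small sketch on a plain std::map (the
function names are made up): insert() leaves an existing key untouched,
while operator[] overwrites it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Tries to replace an existing value via insert(); returns what is stored afterwards.
std::string overwrite_with_insert()
{
    std::map<std::string, std::string> m;
    m["k"] = "old";
    auto res = m.insert(std::make_pair(std::string("k"), std::string("new")));
    assert(!res.second);   // second == false: the key existed, nothing was inserted
    return m["k"];
}

// Same attempt via operator[]; the assignment step always happens.
std::string overwrite_with_brackets()
{
    std::map<std::string, std::string> m;
    m["k"] = "old";
    m["k"] = "new";
    return m["k"];
}
```

That is why the optimized loop depends on the clear() at the end of each
iteration: without it the second insert() would silently keep the first
record.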

Thursday, May 22, 2014

Quantum-DNS trickery

(SIGILL//FVPNS//NOPORN//FORNFCK//MRKLBANG)

I made quantum-dns available in my github.

It's a simple-to-use (non-recursive) DNS server for
IPv4 and IPv6 and also works without having an
IP address assigned to the interface (i.e. it can
answer any DNS query).

Similar to my writeup on QUANTUMINSERT it also contains
a demo FoxAcid script for HTTP. Theoretically it'd also be quite
easy to make STARTTLS disappear with quantum-dns if it's not
enforced on the sender side. While with QUANTUMINSERT
you need to see the TCP sequence# and port, with DNS you
need the XID and port, so it makes perfect sense to
have good passive capabilities for e.g. 3G/4G.
A monitor port on a large peering point is enough capability though.

That's a sample run from my lab (please forgive me :)




And yes, that's trivial to implement, but so is
QUANTUMINSERT, which is so easy that I never considered it
an attack scenario either. It was fun to code though,
to get hands-on with DNS again. For DNSSEC support, you need
to purchase a special license. :)


Friday, May 16, 2014

load balancing trickery

After cleaning up the sources a bit and making
sure it compiles on current Linux distros, I uploaded
my old IPv4/IPv6 load balancer to my github.

I started this project in 2004, back in the days
at university. 10 years ago, it was the first load balancer
available for IPv6 and in 2006 I finally presented the project at
some balancing conference in Silicon Valley.
(Even though you see some other names of my CS department
there, the whole code is written by me. In academics however
you form research groups and you are not going to rock
the world single-core.)

It works on IP level, so it's suitable to balance
SSL/VPN/tor traffic etc. too. For IPv4 it has integrated
failover/hotplug support for the backend nodes.

Thursday, March 27, 2014

Weapons of mass-pty considered harmful (trickery!)

Fixed a bug in enabler, which is part of pam_schroedinger,
that made it exit() when no more ptys could be allocated.
That's wrong of course, we just need to continue dictumerating
(enumerating via dictionary) the account. 500 parallel
su/sudo are no problem.

enabler allows you to mount dictionary attacks using su,
sudo, passwd or the like. You can stop this by using
pam_schroedinger, or by something like introducing an
enforced RLIMIT_PTY and having su, sudo etc. call
isatty(0), as otherwise socketpairs etc. could be used too.
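
A minimal sketch of the isatty() half of that idea, on the defending
side. The function names are made up and this is not how su or sudo are
actually written; RLIMIT_PTY does not exist, so it is not modelled. The
second function shows why the check matters: a socketpair end is not a
terminal and would be refused.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// True only if fd is attached to a terminal device (pty or console).
bool input_is_terminal(int fd)
{
    assert(fd >= 0);
    return isatty(fd) == 1;
}

// Creates a socketpair, as a pty-less attacker might, and reports
// whether one of its ends would pass the terminal check.
bool socketpair_passes_check()
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return false;
    bool ok = input_is_terminal(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

A password-reading program would call input_is_terminal(0) before
prompting and bail out otherwise, which forces the attacker back onto
ptys, where a limit can bite.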


I also went ahead and started signing my github stuff with
this key. Any release tag containing an s at the end
of the version is a signed tag. Also, all commits will
be signed in the future.
You can verify this via git log --show-signature or
git tag --verify TAG after importing the above DSA key
into your gpg keyring.


Friday, March 7, 2014

crypto shell trickery!

I recently imported crash into my github. It features an
IPv6-ready SSH-like remote shell, using strong public key
authentication and TLS-encrypted transport. It does not
rely on SSL/TLS internal X509 checking but compares
hostkeys bit-wise. It runs on Linux and embedded derivatives,
Android, BSD, Solaris and OSX/Darwin. It does not require root
and has back-connect and trigger modes built in. It can
also be invoked as a CGI.

Update: Pushed a fix into git to use SHA512 rather than
SHA1 for signing authentication requests. That makes
it incompatible with earlier versions. Also fixed a bug
where crashc did not properly distribute SIGWINCH to the
remote peer. Now you can use your ncurses porn and resize
your xterm and it gets properly adjusted! Also tested
authentication with RSA keys of up to 7500 bits in size. That
should resist upcoming (TS//SI//REL) QUANTUMFUCK computers.
I need to find the time to enforce cipher-lists and add
ephemeral keying though. (done)
Also good news: crash also integrates with sshttp!

Friday, February 28, 2014

lophttpd OSX trickery

I ported lophttpd to OSX/Darwin (10.8 tested).

As OSX/Darwin is almost POSIX-compliant (live_free() || die())
and I had already separated the low-level stuff into the -flavor
files, this was not overly complicated.
Now it pays off that I chose to do it that way, rather than
having a dozen #ifdefs stacked around.
lophttpd now cleanly builds on BSD (untested for some time),
Linux, Android and OSX/Darwin.

What annoys me most is the various integer-size issues you
have with size_t, off_t, suseconds_t etc. and the corresponding
format specifiers of the *printf() family. However you do it,
one OS shouts at you for passing wrongly sized parameters to
*printf(). Despite any standards. Live free or die.
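
One way to keep every compiler quiet, as a sketch rather than what
lophttpd actually does: use %zu for size_t, and cast the types that have
no format specifier of their own to a type that has one. describe() is
a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <sys/types.h>

// Formats the three troublemakers without depending on their real width.
std::string describe(std::size_t len, off_t offset, suseconds_t usec)
{
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf), "len=%zu off=%jd usec=%ld",
                          len,                                // %zu matches size_t
                          static_cast<std::intmax_t>(offset), // off_t may be 32 or 64 bit
                          static_cast<long>(usec));           // POSIX range fits into long
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return buf;
}
```

The casts cost nothing at runtime and the format string stays the same
on every platform, at the price of some noise at each call site.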

You can easily build it on OSX/Darwin by installing Xcode
and then installing its command-line tools. That's not
gcc AFAIS, but it should also build if you manage to install
a gcc toolchain there.

I had to disable warnings about deprecated use of OpenSSL
on OSX and I have a hard time not commenting on that in light
of gotofail.
Live free and die. :)






Thursday, January 16, 2014

Fernmelder to the rescue

I've been experimenting with mass DNS resolving lately.

Imagine you have some large list of DNS names (FQDNs)
which you want to map to their IPv4 or IPv6 addresses.
That could be a GB-sized zone file or an enumerated list of
names for some double-flux network when you research
how you can take down a botnet. Either way, sometimes
you need to do that in finite time and clearly
gethostbyname() in a loop is not the way to go!

For asynchronous resolving the glibc already has
getaddrinfo_a() but it turns out that this function is
entirely useless because it's using threads. So, for
every request you send, a thread is clone()'d, which does
not scale well. [The glibc aio_ functions also use threads,
it's a pity that glibc async support is so toast!]

So I hacked up something from scratch that works for me.
It's on my github. The output resembles that of dig
and the zone files you know.

The problem is to find the right parameters for the number
of requests to send in a row and the number of usecs you
want to usleep() before doing that again. Otherwise you will
just hammer the DNS server and get no response. The default
values are sane enough to yield good results.
The better your uplink to your recursive DNS, the
less time you need to usleep().
You can also distribute the requests across multiple DNS
servers by using more than one -N switch. The more reliable
DNS servers you have, the better, because you do not run
into any rate limiting.
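
The burst-then-sleep pacing and the spreading across several servers
can be sketched like this. It is an illustration of the idea, not
fernmelder's code: paced_resolve(), SendFn and the parameter names are
made up, and the actual sending of the DNS packet is left to a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical callback: fire off one query for `name` to `server`.
using SendFn = std::function<void(const std::string& server, const std::string& name)>;

// Sends all names, `burst` requests in a row, sleeping `pause_us`
// microseconds between bursts. Returns the number of requests sent.
std::size_t paced_resolve(const std::vector<std::string>& names,
                          const std::vector<std::string>& servers,
                          std::size_t burst, useconds_t pause_us,
                          const SendFn& send)
{
    assert(!servers.empty() && burst > 0);
    std::size_t sent = 0;
    for (const std::string& name : names) {
        send(servers[sent % servers.size()], name);   // round-robin over the servers
        ++sent;
        if (sent % burst == 0 && sent < names.size())
            usleep(pause_us);   // give the resolvers time to answer
    }
    return sent;
}
```

With more servers each one sees only a fraction of every burst, which is
what keeps the individual request rate below the rate limits.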






Friday, November 8, 2013

Killing Schrödingers Cat

This post is about the so-called FoxAcid/QI system apparently
used by an agency to exploit browser sessions.





I first read about FoxAcid in an article by Bruce Schneier,
who made the distinction between Man in the Middle (MiM) and
Man on the Side (MoS) attacks. Although the referenced slide
shows a setup where, if properly implemented, no packet race
exists (therefore a MiM), there seem to be use cases for
MoS attacks.

I am only discussing the HTTP/HTTPS case here, as for VPNs etc.
you clearly need MiM, and the aim of FoxAcid seems to
be the exploitation of web browser client sessions.

Deploying MiM on the backbone requires quite large and expensive
(both financially and technically) setups. In most cases you require
the cooperation of the ISP or of someone who made the firmware of
the routers along the path. Nevertheless, if possible, MiM is clearly
the way to go, as it allows you to intercept and 'handle' encrypted
communication channels. MoS on the other hand fails to
'handle' SSL connections, as it's not possible to spoof
an HTTP redirect into the session.

But MiM is easy to detect and hard to deploy
in foreign networks on a large scale,
since you basically try to add a new router
(or even a transparent proxy) to the network infrastructure.

So you have some kind of lightweight MiM, called MoS.
Since most connections will be either HTTP or
initiated by HTTP, even if 'upgraded' to HTTPS later on,
MoS buys you a lot of benefit.

MoS does not require new router hardware or firmware to be
deployed, or routes to be added to the running configuration.
It works by simply plugging the MoS box into a port that
mirrors all packets seen for 'diagnostic purposes'.

You need a second, normal uplink plug in case the mirror
port does not accept packets for sending, but that's doable.
I am not familiar with backbone routers and their mirror port
capabilities, but I guess that's easily done.

The MoS attack can then act upon seen SYN packets
(completing the handshake) or seen GET requests. The latter
requires tracking the connection and therefore symmetric routes
back to the client (to see the SYN|ACK). The former does not,
but in turn does not allow redirecting to the expected
location in some cases, as it's missing the Host:
information from the client request.

I implemented both cases here. At least this is how I
would implement a QI/FoxAcid framework; there might be
different ways. However, acting in a packet race (you
cannot modify replies) does not leave too many options.
It can be easily tested in your home (W)LAN and the
FoxAcid will show you by color which requests it
intercepted:




The captured GET request is sent Base64-encoded (green) to the
FoxAcid server, which uses this info (blue) to properly reconstruct
the path and Host: parameters. The red part
is sent to the client in order to exploit and redirect
the browser to the original destination afterwards.
(No, I am not using this browser and the green part is smudged
in order to prevent accidental info leaks, as I cannot
read Base64 on the fly, but it's the Base64 encoding of the blue
part.)

MoS is also interesting if you have capabilities for
breaking 3G or 4G (or wifi) crypto in realtime, since it allows
you to spoof the replies to the sending station directly,
circumventing the network structure entirely (as opposed
to deploying a MiM somewhere behind the BTS/AP or
replacing them). If you are on foreign ground that might
be easier with good RX/TX equipment and a laptop than
setting up and integrating a whole BTS on the rooftop of an embassy. :)



Monday, October 28, 2013

sshttp IPv6 trickery

During the last hackweek, I added IPv6 support to
sshttp. Courtesy of IPV6_TRANSPARENT in recent Linux
kernels, it works as you know it from IPv4.

Besides that, it's now also possible to add IPv6 backends
to the frontend reverse proxy which is part of lophttpd,
and to run it on an IPv6 address to the outside.