Trinity finds an ancient bug.

I’ve been chasing a bug for a few weeks that Trinity started hitting recently.
Because it had only just started showing up, I assumed it was something introduced in the current development cycle.

After a lot of tests, Eric Dumazet managed to narrow it down to a simple test case.
Turning on keepalives on a raw socket caused use of an uninitialised timer.

Eric posted a patch to fix it up, and I started digging deeper to find out when the bug was introduced. From my reading of historical git trees, it goes back to the 2.1.8 kernel. In November 1996. A sixteen-year-old bug.

Why has it taken this long for Trinity to find it? That I’m not entirely clear on.
I added some extra setsockopt fuzzing smarts back in June, but somehow it’s taken three months for the right combination of randomness to produce this sequence of events:

  • Passes in a file descriptor for a socket of the right kind (AF_INET, SOCK_RAW, IPPROTO_TCP).
  • Calls connect on the socket in a manner that actually succeeds.
  • Calls setsockopt with the right arguments (SO_KEEPALIVE).
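
Written out by hand, the three steps above might look roughly like the sketch below. This is an illustration, not Trinity's code and not Eric's actual test case: the function name raw_keepalive_sequence is made up, and the loopback destination is an assumption chosen only because connect() to it is likely to succeed. Creating a raw socket needs root (or CAP_NET_RAW).

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not taken from Trinity.
 * Returns 0 if all three steps succeed, otherwise the number (1..3) of the
 * step that failed, with errno left as set by the failing call.
 */
int raw_keepalive_sequence(void)
{
    struct sockaddr_in dst;
    int on = 1;
    int saved;
    int fd;

    /* Step 1: a raw socket that claims to speak TCP (needs CAP_NET_RAW). */
    fd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
    if (fd < 0)
        return 1;

    /* Step 2: a connect() with a well-formed address. Loopback is an
     * assumed destination, picked so the call has a chance of succeeding. */
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return 2;
    }

    /* Step 3: turn on keepalives, the call that reached the
     * uninitialised timer on affected kernels. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return 3;
    }

    close(fd);
    return 0;
}
```

Calling raw_keepalive_sequence() from a small main() as root is enough to walk the sequence. On a kernel without the fix this may well upset the machine, so it is best tried somewhere disposable.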

My theory is that the weak point in this chain is the second step, as the sanitising routine for connect() in Trinity is pretty dumb right now.
It might pass the right form to the syscall (userspace address for structs), but the _format_ of those structs is usually garbage.
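
To make the distinction concrete, here is a minimal sketch of what a well-formed address for an AF_INET socket looks like, as opposed to a valid pointer to random bytes. The helper name fill_loopback_addr and the choice of loopback are assumptions for illustration; this is not how Trinity's sanitise routine is written.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not Trinity's sanitise code: build an IPv4 loopback
 * address whose fields connect() can actually parse.
 */
void fill_loopback_addr(struct sockaddr_in *sa, uint16_t port)
{
    /* Zero everything first so padding bytes are not left as garbage. */
    memset(sa, 0, sizeof(*sa));

    /* The family has to match the socket's address family. */
    sa->sin_family = AF_INET;

    /* Port and address are stored in network byte order. */
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}
```

A struct filled this way, passed with sizeof(struct sockaddr_in) as the length, gives connect() something it can act on. A buffer of random bytes at a valid address will usually be rejected before the call gets anywhere interesting.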

Still, this is the oldest bug that Trinity has found so far. I got pretty excited last month when it found a VM bug that dated back to 2.6.32, but this one... it almost predates my own involvement in Linux.