Tag Archives: kernel

daily log May 22nd 2013.

Going to try and continue yesterday’s daily log format for a while.

  • Grumbled at openvpn changing the pathname from ‘plugin’ to ‘plugins’, breaking my vpn script.
  • Bugzilla seemed unhappy. Gave up trying to look at it after it kept timing out.
  • Continued poking at the XFS assertion from yesterday. Downgraded the compiler from F19’s 4.8.0 to 4.7.3. No luck. Couldn’t reproduce on 3.9, so started bisecting. Seemed to be caused by a patch I added recently to work around another XFS bug (slab corruption). I can’t win. Dave Chinner confused by my diagnosis. Bisect take 2 on that tomorrow.
  • Vince Weaver posted a perf_event fuzzer based on trinity. Spent a while reading it over. Neat. Glad to see people taking an idea and running with it in new directions. The more test programs the better.
  • Diagnosed yesterday’s “microcode loader got slow” bug. Turned out that I had somehow inadvertently set CONFIG_FW_LOADER_USER_HELPER, which incurs a 60 second timeout.
  • While waiting for bisections, looked over some bugs in Coverity’s database. Around 1500 untriaged. Would like to find time to work on that at some point.
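
A stray option like the CONFIG_FW_LOADER_USER_HELPER one above is easy to check for once you know what to look for. A minimal Python sketch, assuming the kernel config is available as plain text (many distros install it as /boot/config-<version>, but that location is an assumption, as are the helper name and the sample excerpt):

```python
import re


def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...) or None if unset."""
    # Anchor on the full option name followed by '=' so that CONFIG_FOO
    # does not accidentally match CONFIG_FOO_BAR.
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Hypothetical excerpt of a kernel .config file, for illustration only.
sample = """\
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_EXTRA_FIRMWARE is not set
"""

print(config_option_state(sample, "CONFIG_FW_LOADER_USER_HELPER"))
```

Options that are switched off appear as "# CONFIG_... is not set" comments, so the helper reports them as None.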

Spent so much of the day bisecting/building/rebooting that I didn’t write much new code today. Ho-hum.
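
For a sense of why bisecting eats a day: each step is a full build/reboot/test cycle, and the number of steps grows with the log of the commit range. A rough sketch of the idea as a plain binary search over a linear list of commits (this is not how git bisect is implemented, since real history is not linear; the function name and the fake commit list are made up for illustration):

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is assumed known-good
    and commits[-1] is assumed known-bad. is_bad stands in for the
    build/reboot/test cycle.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid      # the regression is at mid or earlier
        else:
            good = mid     # the regression is after mid
    return commits[bad], steps


# Pretend history: 1000 commits, with the regression introduced at commit 700.
history = list(range(1000))
first_bad, steps = bisect_first_bad(history, lambda commit: commit >= 700)
print(first_bad, steps)
```

Even a thousand-commit range only needs around ten test cycles, but each one of those is a kernel build and a reboot.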

a day in the life..

Got back from vacation today (since last Thursday). Here’s how I spent the day.

  • Caught up on (skimmed) the 1500 postings to Linux-kernel and related mailing lists that had accumulated.
  • Reviewed, applied and cleaned up my patch backlog for trinity.
  • Caught up with direct mail that needed a response.
  • Brought my test machines up to 3.10rc2, and restarted tests.
  • Caught another pair of RCU/nohz bugs pretty quickly. [1][2].
  • Checked on the RMA for my failed SSD. Still awaiting shipment of replacement.
  • Received my ultrabay adaptor for my thinkpad. Surprised to find out that a full height SSD would fit into it.
  • Pushed out a 3.9.3 update for F18.
  • Looked at bugzilla backlog. Swore a lot. 3.9.x rebase bugs started to trickle in.
  • Rewrote a bunch of code surrounding trinity’s rand() usage.
  • Finally got F19 installed via NFS on new test machine.
  • Hit an XFS assertion.
  • Then hit an i915 pineview kms console blanking bug.
  • Noticed that x86 microcode loading had gotten really slow. It seems to be waiting a whole 60 seconds for each core.

CVE-2013-2094. Another day, another fuzzed bug.

Last month Tommi found a kernel bug in perf_swevent_init using trinity, and posted a fix upstream. This apparently turned out to be a local root. Someone released an exploit for it this week. (interesting dissection of the exploit by spender here).

The code to fuzz perf_event_open was added to Trinity in November 2011. Yet for some reason, we only started to hit this recently. The sanitise routine for this syscall is still pretty basic, even after I added a little more to it yesterday. There’s probably more fruit on that branch somewhere.

There’s a date in the exploit code that claims it was written shortly after the affected code was merged upstream in 2010. Assuming that’s true, it’s taken way too long to find this. Trinity should have found this a lot sooner.
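
For anyone wondering what kind of mistake this was: the CVE description for the perf_swevent_init bug blames an incorrect integer data type. A rough Python illustration of that bug class follows. It is not the kernel code; the function names are invented and the value of PERF_COUNT_SW_MAX is an assumption made for the example.

```python
PERF_COUNT_SW_MAX = 9  # assumed table size, for illustration only


def to_int32(value):
    """Emulate storing a 64-bit value into a signed 32-bit C int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def buggy_accepts(config):
    # The user-supplied 64-bit value is squeezed into a signed int and only
    # checked against the upper bound, so negative ids slip through.
    event_id = to_int32(config)
    return event_id < PERF_COUNT_SW_MAX


def fixed_accepts(config):
    # Keeping the value unsigned makes the same upper-bound check sufficient.
    return config < PERF_COUNT_SW_MAX


crafted = 0xFFFFFFFF  # becomes -1 once treated as a signed 32-bit int
print(to_int32(crafted), buggy_accepts(crafted), fixed_accepts(crafted))
```

A value that passes the check while being negative ends up indexing memory before the start of a table, which is exactly the sort of thing random syscall arguments should stumble over.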

3.10rc1 testing status

3.10rc1 came out a few days ago. At 12,000 changesets, LWN calls it the busiest such ever. Statements like that usually make me nervous. But things are generally in pretty good shape. Much better than 3.9rc1 was.

  • There has been nowhere near the same level of fallout from trinity this cycle. The only bug I’m reliably hitting has been around for a while (connect vs sendmsg udpv6 oops).
  • I hit a few crash-in-early-boot bugs that were a pain to debug. (fixes still pending merge)
  • Some slab corruption found in XFS. (again, fixes pending merge). There’s some talk on lkml about an ext3 issue with the same symptoms, but I’ve not managed to reproduce this (yet?).

and that’s been about it.

Generally feeling pretty solid. Fedora 19 is still going to ship with 3.9, but we’ll likely have a 3.10.x update on day of release.

Monthly Fedora kernel bug statistics – April 2013

                            17    18    19   rawhide   (total)
Open                       274   336   130        66     (806)
Opened since 2013-04-01     31   271    64        16     (382)
Closed since 2013-04-01     37   351   139        19     (546)
Changed since 2013-04-01    55   163   119        27     (364)

Huge number of bug closures this month. Unfortunately several hundred of them are the automated ‘faf’ bugs that were pretty useless.
(1. lots of tainted/virtualbox reports. 2. old kernels. 3. no human attached to them if we need to ask questions, which most of the time we do).
Even discounting those bugs, it’s been quite a productive month, with the total open count around 170 bugs lower than it was a month ago.
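
The totals in parentheses and the "around 170" figure can be sanity-checked with a few lines of arithmetic. A small sketch using only the numbers from the table above (the variable names are made up for illustration):

```python
# Per-release counts in the order: F17, F18, F19, rawhide.
stats = {
    "open":   [274, 336, 130, 66],
    "opened": [31, 271, 64, 16],
    "closed": [37, 351, 139, 19],
}

totals = {name: sum(counts) for name, counts in stats.items()}

# Closed minus opened over the month. This should roughly track the drop
# in the open count; reopened or reassigned bugs can make it differ a little.
net_closed = totals["closed"] - totals["opened"]

print(totals, net_closed)
```

The net of 164 is in line with the open count being around 170 lower than a month ago.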

USB debug cables.

A few years ago, I was fortunate enough to be given a USB EHCI debug cable. With traditional serial ports being a thing of the past that I haven’t seen on a new machine in a long time, it’s been a lifesaver. The number of kernel crashes I’ve captured with that cable that would otherwise have been lost is ridiculously high. I’m saying I like this thing, a lot.

So much so that I wanted to buy more of them, so that I wouldn’t have to keep replugging it between test machines.
With multiple test machines constantly running, sharing a single cable isn’t really a practical solution.

The first problem: they aren’t cheap. $95 each. For basically two USB->serial chips, and some circuitry to make them handshake.
The bigger problem is that only one place seems to sell them, and they’ve been “out of stock, and in redesign” for a long time now.

I tried emailing the manufacturer, Ajaystech, who seem to completely ignore their sales@ email address.

Disappointing.

In the absence of a replacement, I’m going to have to hope that netconsole works well enough on older machines, and in the future, dumps to pstore.
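
The receiving end of netconsole doesn’t need anything fancy, since the kernel just sends its log messages as UDP packets. A minimal listener sketch in Python, assuming the kernel’s documented default target port of 6666; the function names are invented for the example, and a real setup would more likely run netcat or a syslog daemon:

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, count, timeout=5.0):
    """Collect `count` datagrams as (sender address, text) pairs."""
    sock.settimeout(timeout)
    lines = []
    for _ in range(count):
        data, (addr, _port) = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        lines.append((addr, text))
    return lines


# Typical use on the logging machine (blocks until packets arrive or the
# timeout expires):
#   listener = open_listener()
#   for addr, text in receive_lines(listener, 100, timeout=60.0):
#       print(addr, text)
```

UDP gives no delivery guarantees, so a crash that takes the network stack down with it can still go unrecorded, which is part of why this is a fallback rather than a replacement for the cable.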