PSA: OCZ Vector SSD firmware.

My bad luck with hardware continues.

At the beginning of this year, I bought an SSD for my laptop.

I previously wrote about the need to update smartmontools, which should now be updated everywhere. One thing I was not aware of at the time, however, was that a firmware update was available. Had I known, I would have applied it, because as soon as the drive hit the “400GB of lifetime writes” counter (coincidence?), it lost the ability to write to any block. It won’t even respond to secure erase commands.

The failure is exacerbated by the fact that the disk contains journalling filesystems in need of recovery. So if anything tries to mount them, it tries to write to the disk, which then falls off the bus, requiring a power cycle to even see the disk again. The recovery tools provided by OCZ apparently try to mount every partition they find during boot up (derp).

So now it’s on its way back to OCZ for reflashing/replacement. Lesson learned.

If you have one of these, and hdparm -I shows you have firmware 1.03, you might want to update it to 2.0. There are flashing tools on OCZ’s site (in the form of bootable Linux images, using an insane desktop that looks like something out of a 1990s hacker movie). There’s no guarantee that the new firmware actually fixes whatever problem I’ve hit, due to the lack of changelogs, but given it was the first thing they asked me to try, I’m going to say there’s a strong possibility it’s a known bug.
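
If you want to script the firmware check across a few machines, here is a minimal sketch. It assumes you have already captured the text printed by hdparm -I, and it only looks for the "Firmware Revision:" line. The sample text, model string and serial number below are made up for illustration, not taken from a real drive.

```python
import re
from typing import Optional


def firmware_revision(hdparm_output: str) -> Optional[str]:
    """Return the value of the 'Firmware Revision:' line, or None if absent."""
    match = re.search(r"^\s*Firmware Revision:\s*(\S+)", hdparm_output, re.MULTILINE)
    return match.group(1) if match else None


# Illustrative text shaped like `hdparm -I` output (hypothetical values).
SAMPLE = (
    "ATA device, with non-removable media\n"
    "\tModel Number:       OCZ-VECTOR\n"
    "\tSerial Number:      EXAMPLE0000\n"
    "\tFirmware Revision:  1.03\n"
)

if __name__ == "__main__":
    rev = firmware_revision(SAMPLE)
    if rev == "1.03":
        print("firmware", rev, "- consider updating")
    else:
        print("firmware", rev)
```

On a real system you would feed it the output of hdparm -I for your device instead of SAMPLE.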

CVE-2013-2094. Another day, another fuzzed bug.

Last month Tommi found a kernel bug in perf_swevent_init using trinity, and posted a fix upstream. This apparently turned out to be a local root. Someone released an exploit for it this week. (interesting dissection of the exploit by spender here).

The code to fuzz perf_event_open was added to Trinity in November 2011. Yet for some reason, we only started to hit this recently. The sanitise routine for this syscall is still pretty basic, even after I added a little more to it yesterday. There’s probably more fruit on that branch somewhere.

There’s a date in the exploit code that claims it was written shortly after the affected code was merged upstream in 2010. Assuming that’s true, it’s taken way too long to find this. Trinity should have found this a lot sooner.

3.10rc1 testing status

3.10rc1 came out a few days ago. At 12,000 changesets, LWN calls it the busiest such ever. Statements like that usually make me nervous, but things are generally in pretty good shape. Much better than 3.9rc1 was.

  • There has been nowhere near the same level of fallout from trinity this cycle. The only bug I’m reliably hitting has been around for a while (connect vs sendmsg udpv6 oops)
  • I hit a few crash-in-early-boot bugs that were a pain to debug. (fixes still pending merge)
  • Some slab corruption found in XFS. (again, fixes pending merge). There’s some talk on lkml about an ext3 issue with the same symptoms, but I’ve not managed to reproduce this (yet?).

And that’s been about it.

Generally feeling pretty solid. Fedora 19 is still going to ship with 3.9, but we’ll likely have a 3.10.x update on day of release.

Monthly Fedora kernel bug statistics – April 2013

                            17    18    19  rawhide  (total)
Open:                      274   336   130       66    (806)
Opened since 2013-04-01     31   271    64       16    (382)
Closed since 2013-04-01     37   351   139       19    (546)
Changed since 2013-04-01    55   163   119       27    (364)

Huge number of bug closures this month. Unfortunately, several hundred of them are the automated ‘faf’ bugs that were pretty useless (1. lots of tainted/virtualbox reports; 2. old kernels; 3. no human attached to them if we need to ask questions, which most of the time we do).
Even discounting those bugs, it’s been quite a productive month, with the total open count around 170 bugs lower than it was a month ago.
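
As a quick sanity check on those numbers, here is a small sketch that recomputes the totals and the opened-minus-closed difference from the table. The variable names are just for illustration. The net figure it produces is roughly in line with the "around 170" above; any remaining gap presumably comes from bugs changing state in other ways, which is an assumption on my part.

```python
releases = ["17", "18", "19", "rawhide"]

# Per-release counts copied from the table, in the same column order.
stats = {
    "open":   [274, 336, 130, 66],
    "opened": [31, 271, 64, 16],
    "closed": [37, 351, 139, 19],
}

# Row totals, matching the bracketed figures in the table.
totals = {name: sum(values) for name, values in stats.items()}

# Net change over the month: bugs opened minus bugs closed.
net_change = totals["opened"] - totals["closed"]

# The same difference broken down per release.
per_release = {
    rel: opened - closed
    for rel, opened, closed in zip(releases, stats["opened"], stats["closed"])
}

if __name__ == "__main__":
    print(totals)       # {'open': 806, 'opened': 382, 'closed': 546}
    print(net_change)   # -164
    print(per_release)  # {'17': -6, '18': -80, '19': -75, 'rawhide': -3}
```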

USB debug cables.

A few years ago, I was fortunate enough to be given a USB EHCI debug cable. With traditional serial ports being a thing of the past that I haven’t seen on a new machine in a long time, it’s been a lifesaver. The number of kernel crashes I’ve been able to capture with that cable that would otherwise have been lost is ridiculously high. I’m saying I like this thing, a lot.

So much so that I wanted to buy more of them, so that I wouldn’t have to keep moving it between test machines. With multiple test machines constantly running, replugging a single cable isn’t really a practical solution.

The first problem: they aren’t cheap, at $95 each, for what is basically two USB->serial chips and some circuitry to make them handshake. The bigger problem is that only one place seems to sell them, and they’ve been “out of stock, and in redesign” for a long time now.

I tried emailing the manufacturer, Ajaystech, who seem to completely ignore their sales@ email address.

Disappointing.

In the absence of a replacement, I’m going to have to hope that netconsole works well enough on older machines, and in the future, dumps to pstore.
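
For reference, netconsole is set up with a single parameter string. A minimal sketch of building one follows, assuming the format from the kernel's netconsole documentation: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The helper name, addresses, interface and MAC below are hypothetical and only there to show the shape of the string.

```python
def netconsole_param(tgt_ip, dev="eth0", src_ip="", src_port=6665,
                     tgt_port=6666, tgt_mac=""):
    """Build a netconsole= parameter string (hypothetical helper).

    Format assumed from the kernel netconsole documentation:
        [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    6665 and 6666 are the documented default source and target ports.
    An empty src_ip or tgt_mac leaves that optional field blank.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


if __name__ == "__main__":
    # Made-up addresses for a test machine logging to a collector box.
    print(netconsole_param("10.0.0.2", dev="eth1", src_ip="10.0.0.1",
                           src_port=4444, tgt_port=9353,
                           tgt_mac="12:34:56:78:9a:bc"))
```

The resulting string can be used either on the kernel command line or as the netconsole module's parameter. Something on the target machine still has to listen on the chosen UDP port to collect the messages.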