The case of the mysterious disappearing I211

Day one of unemployed life saw me finally getting around to the first of several hardware-related maintenance items that I’d been putting off until I had the time.

I got a lot of life out of my desktop machine that I had been using since 2007. Earlier this year, I decided it was long overdue an upgrade, and ended up building a ridiculously over-specced machine in the hopes it too would last me a while. After some research, I ended up with a 6-core Haswell-E i7-5820K, and a frankly ridiculously over-featured motherboard.
Once I had delved through the absurd number of BIOS options to convince it that I *really* didn’t want to overclock my CPU or my RAM, or anything else, it was very stable.

It has exceeded all my expectations. In the time it took my old desktop to build one kernel, I can build kernel .debs for every machine I own, and still have time to spare. It’s an absolute beast.

One of the features that sold me on this board was the two onboard ethernet ports. I had been wanting to do a bunch of networking experiments, and the possibility of using bonding, without having to screw around with add-in cards was appealing.

So I was a little irked one evening, after updating its BIOS, to notice that the bond only had one active interface. After some investigation, I noticed that the PCI ID of one of the onboard NICs had changed.

What was once

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
08:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

Was now

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
08:00.0 Ethernet controller: Intel Corporation Device 1532 (rev 03)

My I211 had changed its PCI device ID, and the igb driver wouldn’t bind to this new device.

At first I thought “Cool, some kind of NIC firmware update”, and assumed that igb hadn’t been updated yet to support this new feature. Googling for “i211 1532” told a much sadder story, however.

If you read the spec update for the i211, you find this interesting table:

Device                                      Vendor ID   Device ID   Revision ID
WGI211AT (not programmed/factory default)   0x8086      0x1532      0x3
WGI211AT (programmed)                       0x8086      0x1539      0x3

Uh, not cool. Somehow the BIOS update procedure had wiped the NVRAM on the NIC.

A long protracted conversation with ASUS support followed, including such gems as “I understand you’re seeing blue screens” and “Have you tried removing the DIMMs, rubbing the contacts with an eraser and replacing them”. Eventually I think they got to the end of their script, and agreed to RMA the board. Somewhat annoying, given there’s probably a tool somewhere that can rewrite the flash, but Intel only seems to make that available to integrators, not end-users, and the ASUS representatives denied all knowledge.

It was gone for about two weeks, and finally returned yesterday. Its PCI ID is 0x1539 again, and it has its old MAC address once more. (I’m now hesitant to ever upgrade the BIOS on this machine again.) So what happened? Anyone’s guess, but this isn’t the first time I’ve seen this happen. We had a bunch of these NICs at Akamai too that occasionally had the same thing happen to them.

The whole thing is reminiscent of a painful old bug where ftrace would corrupt the e1000e ROM. Hopefully Linux isn’t to blame this time.

So, long story short: If you see an i211 with a PCI ID of 1532, you’re looking at an RMA.
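
Incidentally, the device ID is readable straight out of sysfs, so it’s easy to script a check for this failure mode. A minimal sketch (the 0000:08:00.0 address matches my box above; adjust for yours):

/* Check whether an i211 has lost its NVRAM contents (device ID 0x1532
 * instead of the programmed 0x1539). The PCI address is an example. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const char *path = "/sys/bus/pci/devices/0000:08:00.0/device";
	char buf[16];
	FILE *f;

	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return 1;
	}
	fclose(f);

	if (strtoul(buf, NULL, 16) == 0x1532)
		printf("i211 has reverted to factory default: RMA time.\n");
	else
		printf("device id: %s", buf);

	return 0;
}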

Moving on from Akamai.

Today was my last day at Akamai. It’s been brief (just over seven months), but things weren’t really working out for me there, for a number of reasons. As I’ve mentioned to a number of people who have known about my decision for a while, it’s not that it’s a bad place to work; it just never felt like a good fit for me, and I came to realize that I’d spent most of this last year in denial about just how unhappy I was, in the hope “things would get better”.

There are a lot of smart people working there, working on really difficult problems, but a lot of those problems just don’t align with my interests, especially when they don’t always involve contributing code back upstream. [clarification: There is some upstream work going on there, just not as much as I’d like].

Add to this my disdain for some of the proprietary tooling that’s prevalent there, and it was becoming clear it was not a matter of “if”, but “when” I was going to leave. As an example: I joked a few months ago to co-workers, “next time I’m looking for a job, the first question I ask is ‘do you use perforce?’”. Only it wasn’t really a joke; I was dead serious. User-hostile software has no place in my life.
Even little things like “let’s use git” translating to “let’s license Atlassian stash” rather than “run a git-daemon somewhere” started getting me down.

The final project I worked on there was a continuous rebase strategy for the kernel, moving away from perforce to git. It’s a move in the right direction, but ultimately, not the sort of work that gets me excited, and it’s going to be a multi-year project before it starts really bearing fruit. Given how perforce is ingrained in so many of Akamai’s systems, it would also have been extremely unlikely I’d have been able to purge all knowledge of ever having used it.

It also started to bother me during the rebase work that many of the kernel changes we made had no chance of ever even being submitted, let alone accepted upstream. (In part because many of them are very specific to Akamai’s CDN: you won’t find any of the trickery employed there described in a Richard Stevens book, and they’re unlikely to ever become official RFCs due to the competitive edge gained from those changes.)
There are exceptions to all of this, and the kernel team is trying to do a better job of upstreaming most of the newer changes. But many of the older legacy patches are under-documented and well understood by few people, with the original authors no longer around, which makes getting up to speed a frustrating exercise, especially when you’re trying to learn what the upstream code is doing at the same time.

Someone who hasn’t spent most of their career dealing exclusively with open source would probably find many of my reasons for leaving trivial. Those same people would probably find Akamai a great place to work. There are a lot of opportunities there if you have a higher tolerance for such things than I did. It was eye-opening recently, mentoring some of the interns there. Optimism. The unjaded outlook that comes with youth. Not getting bent out of shape at crappy tooling because they don’t know any different. It made me realize I was never going to be like that there.

On a particularly bad day a few weeks back, a recruiter reached out to me, to find out if I was interested in a second chance at an offer I received last time I was looking for a new job. It worked. Enduring an unhappy situation in the hopes things will get better isn’t a great strategy when there are other options.

So, I start at Facebook in September.

I have no delusions that things are going to be perfect there, but at least from the outside right now, the grass looks greener. I feel bad walking away from unfinished problems, but going home miserable or angry or some other negative emotion every day was really starting to take its toll. It’s not a healthy way to live.

When I was interviewing last December, I read Being Geek to death, so it’s fitting that I’ve picked it up again recently. One paragraph in particular jumps out at me.

My single worst gig was one where I got everything I wanted out of the offer letter, but in my exuberance for being highly valued, I totally forgot that my gut read on the gig was "meh". Ninety days later, I couldn't care less that I got a 15% raise and a sign-on bonus. I couldn't stand the mundanity of the daily work, and I happily resigned a few months later, taking both a pay cut and returning my sign-on bonus for the opportunity to work at Netscape.

Anachronisms and minor details aside, that paragraph played through my head this afternoon as I wrote the check to pay back the remainder of my sign-on bonus. I wasn’t quite thinking “meh”, but I knew I was making compromises on what I really valued from day one.

Walking away from unvested RSUs, giving up this month’s paycheck, and writing that check stings a little, but when I did my exit interview this morning, I knew that I, too, was “happily resigning” for a great opportunity.

I’m feeling uncharacteristically optimistic right now. Hopefully it’ll last.

I’ll be in Seattle next week, but due to complications with my registration being transferred to another Akamai employee, I won’t actually be at the Linux plumbers conf. If you’re also going to be there and want to catch up, drop me a mail, or <ahem> hit me up on facebook.

Future development of Trinity.

It’s been an odd few weeks regarding Trinity based things.

First an email from a higher-up at my former employer asking (paraphrased)..

"That thing we asked you to stop working on when you worked here, any chance now you've left you'll implement these features."

I’m still trying to get my head around the thought process that led to that being a reasonable thing to ask. I’ve made the occasional commit over the last six months, but it’s mostly been code motion, clean-up work, and things like syscall table updates. New feature development came to a halt long ago.

It’s no coincidence that the number of bugs found with Trinity has dropped off sharply since the beginning of the year, and I don’t think it’s because the Linux kernel suddenly got lots better. Rather, it’s due to the lack of real ongoing development to “try something else” when some approaches dry up. Sadly, we now live in a world where it’s easier to get paid to run someone else’s fuzzer than it is to develop your own.

Then earlier this week, came the revelation that the only people prepared to fund that kind of new feature development are pretty much the worst people.

Apparently Hacking Team modified Trinity to fuzz ioctl() on Android, which yielded some results. I’ve done no analysis on whether those crashes are exploitable/fixed/only relevant to Android etc. (Frankly, I’m past caring). I’m not convinced their approach is particularly sound even if it was finding results Trinity wasn’t, so it looks unlikely there are even ideas to borrow here. (We all already knew that ioctl was rife with bugs, and had practically zero coverage testing.)

It bothers me that my work was used as a foundation for their hack-job. Then again, maybe if I hadn’t released Trinity, they’d have based it on iknowthis, or some other less useful fuzzer. None of this should really surprise me. I’ve known for some time that there are some “security” people who have their own modifications they have no intention of sending my way. Thanks to the way that people who release 0-days are revered in this circus, there’s no incentive for people to share their modifications if it means someone else might beat them to finding their precious bugs.

It’s unfortunate that this project has attracted so many awful people. When I began it, the motivation had nothing to do with security. Back in 2010 we were inundated with weird oopses that we couldn’t reproduce, many of them triggered by JVMs. I came up with the idea that maybe a fuzzer could create a realistic enough workload to tickle some of those same bugs. Turned out I was right, and so began a series of huge-page and other VM-related bug fixes.

In the five years that I’ve made Trinity available, I’ve received notable contributions from perhaps a half dozen people. In return I’ve made my changes available before I’d even given them runtime myself.

It’s a project everyone wants to take from, but no-one wants to give back to.

And that’s why for the foreseeable future, I’m unlikely to make public any further feature work I do on it.
I’m done enabling assholes.

Thoughts on a feedback loop for Trinity.

With the success that afl has been having on fuzzing userspace, I’ve been revisiting an idea that Andi Kleen gave me years ago for trinity, which was pretty much the same thing but for kernel space. I.e., a genetic algorithm that rates how successful the last fuzz attempt was, and makes a decision on whether to mutate that last run, or do something completely new.

It’s something I’ve struggled to get my head around for a few years. The mutation part would be fairly easy. We would need to store the parameters from the last run, and extrapolate out a set of ->mutate functions from the existing ->sanitize functions that currently generate arguments.
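
To make the shape of that concrete, here’s a hypothetical sketch (all names invented, not trinity’s actual structures; the scoring threshold is arbitrary):

#define SCORE_THRESHOLD 42	/* arbitrary; tuning this is the hard part */

/* Remember the last run's arguments, plus some measure of how
 * "interesting" that run turned out to be. */
struct fuzz_record {
	unsigned long args[6];
	long retval;
	unsigned int score;
};

struct syscall_fuzzer {
	const char *name;
	void (*sanitize)(struct fuzz_record *rec);	/* fresh arguments */
	void (*mutate)(struct fuzz_record *rec);	/* perturb the last ones */
};

/* The genetic-algorithm flavored decision: promising runs get mutated,
 * uninteresting ones start over from scratch. */
static void next_attempt(struct syscall_fuzzer *sc, struct fuzz_record *last)
{
	if (last->score > SCORE_THRESHOLD && sc->mutate)
		sc->mutate(last);
	else
		sc->sanitize(last);
}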

The difficult part is the “how successful” measurement. Typically, we don’t really get anything useful back from a syscall other than “we didn’t crash”, which isn’t particularly useful in this case. What we really want is “did we execute code that we’ve not previously tested”. I’ve done some experiments with code coverage in the past. Explorations of the GCOV feature in the kernel didn’t really get very far, however, for a few reasons (primarily that it slowed things down too much, but also that I was looking into this last summer, when the initial cracks were showing that I was going to be leaving Red Hat, so my time investment for starting large new projects was limited).

After recent discussions at work surrounding code coverage, I got thinking about this stuff again, and trying to come up with workable alternatives. I started wondering if I could use the x86 performance counters for this. Basically counting the number of instructions executed between system call enter/exit. The example code that Vince Weaver wrote for perf_event_open looked like a good starting point. I compiled it and ran it a few times.
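
The interesting bits of that example boil down to something like this (reconstructed from memory, so treat it as a sketch; flip exclude_kernel to 0 to also count kernel instructions, as in the getpid() runs further down):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* glibc provides no wrapper for this syscall, so roll our own. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.disabled = 1;		/* enable explicitly below */
	attr.exclude_kernel = 1;	/* 0 = count kernel instructions too */
	attr.exclude_hv = 1;

	fd = perf_event_open(&attr, 0, -1, -1, 0);	/* this process, any cpu */
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	printf("Measuring instruction count for this printf\n");

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	read(fd, &count, sizeof(count));
	printf("Used %lld instructions\n", count);

	close(fd);
	return 0;
}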

$ ./a.out 
Measuring instruction count for this printf
Used 3212 instructions
$ ./a.out 
Measuring instruction count for this printf
Used 3214 instructions

Ok, so there’s some loss of precision there, but we can mask off the bottom few bits. A collision isn’t the end of the world for what we’re using this for. That’s just measuring userspace, however. What happens if we tell it to measure the kernel too, and measure, say, getpid()?

$ ./a.out 
Used 9283 instructions
$ ./a.out 
Used 9367 instructions

Ok, that’s a lot more precision we’ve lost. What the hell.
Given how much time he’s spent on this stuff, I emailed Vince, and asked if he had insight as to why the counters weren’t deterministic across different runs. He had actually written a paper on the subject. Turns out we’re also getting event counts here for page faults, hardware interrupts, timers, etc.
x86 counters lack the ability to say “only generate events if RIP is within this range” or anything similar, so it doesn’t look like this is going to be particularly useful.

That’s kind of where I’ve stopped with this for now. I don’t have a huge amount of time to work on this, but had hoped that I could hack up something basic using the perf counters, but it looks like even if it’s possible, it’s going to be a fair bit more work than I had anticipated.

It occurred to me after posting this that measuring instructions isn’t going to work regardless of the precision the counters offer. Consider a syscall that operates on VMAs, for example. Over the lifetime of a process, the number of instructions executed by a call to such a syscall will vary even with the same input parameters, as the lengths of the various linked lists that have to be walked change. Number of instructions, number of branches taken/not taken, etc. just isn’t a good match for this idea. Approximating “have we been here before” isn’t really achievable with this approach afaics, so I’m starting to think something like the initial gcov idea is the only way this could be done.

kernel code coverage brain dump.

Someone at work recently asked me about code coverage tooling for the kernel. I played with this a little last year. At the time, I was trying to figure out just how much of certain syscalls trinity was exercising. I ended up a little disappointed at the state of the post-processing tools for dealing with the information presented, and added some things to my TODO list to find time to hack something up, which quickly bubbled their way to the bottom.

As I did a write-up based on past experiences with this stuff, I figured I’d share.

requires kernel built with:

CONFIG_GCOV_KERNEL=y
CONFIG_GCOV_PROFILE_ALL=y

Note: Setting GCOV_PROFILE_ALL incurs some performance penalty, so any resulting kernel built with this option should _never_ be used for any kind of performance tests.
I can’t overstate this: it’s miserably slow. Disk operations that took minutes for me now took hours. As an example:


# time dd if=/dev/zero of=output bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 0.409712 s, 1.3 GB/s
0.00user 0.40system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 2980maxresident)k
136inputs+1024000outputs (1major+340minor)pagefaults 0swaps


# time dd if=/dev/zero of=output bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 6.17212 s, 84.9 MB/s
0.00user 7.17system 0:07.22elapsed 99%CPU (0avgtext+0avgdata 2940maxresident)k
0inputs+1024000outputs (0major+338minor)pagefaults 0swaps

From 0.41 seconds to over seven seconds. Ugh.

If we *didn’t* set GCOV_PROFILE_ALL, we’d have to recompile just the files we cared about with the relevant gcc profiling switches (the kernel’s gcov support lets you enable this per-file or per-directory from the Makefiles). It’s kind of a pain.

For all this to work, gcov expects to see a source tree, with:

  • .o objects
  • source files
  • .gcno files (these are generated during the kernel build)
  • .gcda files containing the runtime counters. These come from debugfs on the running kernel.

After booting the kernel, a subtree appears in debugfs at /sys/kernel/debug/gcov/
These directories mirror the kernel source tree, but instead of source files, they contain files that can be fed to the gcov tool. There will be a .gcda file, and a .gcno symlink back to the source tree (with the complete path). E.g., the mm/ directory under the gcov tree contains (among others..)

-rw------- 1 root root 0 Mar 24 11:46 readahead.gcda
lrwxrwxrwx 1 root root 0 Mar 24 11:46 readahead.gcno -> /home/davej/build/linux-dj/mm/readahead.gcno

The symlink will likely be broken on the test machine, because the path doesn’t exist there, unless you do something like NFS-mounting the source tree from the build machine.

I hacked up the script below, which may or may not be useful for anyone else (honestly, it’s way easier to just use nfs).
Run it from within a kernel source tree, and it will populate the source tree with the relevant gcda files, and generate the .gcov output file.

#!/bin/sh
# Usage: gcov-file.sh <file.c>, run from the top of the kernel source tree.

obj=$(echo "$1" | sed 's/\.c$/\.o/')
if [ ! -f $obj ]; then
  exit 0
fi

pwd=$(pwd)
dirname=$(dirname $1)
gcovfn=$(echo "$(basename $1)" | sed 's/\.c$/\.gcda/')
if [ -f /sys/kernel/debug/gcov$pwd/$dirname/$gcovfn ]; then
  cp /sys/kernel/debug/gcov$pwd/$dirname/$gcovfn $dirname
  gcov -f -r -o $1 $obj
  if [ -f $(basename $1).gcov ]; then
    mv $(basename $1).gcov $dirname
  fi
else
  echo "no gcov data for /sys/kernel/debug/gcov$pwd/$dirname/$gcovfn"
fi

Take that script (saved as, say, gcov-file.sh and made executable; the name is just an example), and run it like so..

$ cd kernel-source-tree
$ find . -type f -name "*.c" -exec ./gcov-file.sh "{}" \;

Running it on, for example, mm/mmap.c will cause gcov to spit out an mmap.c.gcov file (in the current directory) that has coverage information that looks like..

   135684:  269:static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
        -:  270:{
   135684:  271:        struct vm_area_struct *next = vma->vm_next;
        -:  272:
   135684:  273:        might_sleep();
   135686:  274:        if (vma->vm_ops && vma->vm_ops->close)
     5080:  275:                vma->vm_ops->close(vma);
   135686:  276:        if (vma->vm_file)
    90302:  277:                fput(vma->vm_file);
        -:  278:        mpol_put(vma_policy(vma));
   135686:  279:        kmem_cache_free(vm_area_cachep, vma);
   135686:  280:        return next;
        -:  281:}

The numbers on the left are the number of times that line of code was executed.
Lines beginning with ‘-‘ have no coverage information, for whatever reason.
If a line was never executed (the body of a branch that was never taken, say), its count is replaced with ‘#####’, like so..

  4815374:  391:                if (vma->vm_start < pend) {
    #####:  392:                        pr_emerg("vm_start %lx < pend %lx\n",
        -:  393:                                  vma->vm_start, pend);
        -:  394:                        bug = 1;
        -:  395:                }

There are some cases that need a little more digging to explain. eg:

    88105:  237:static void __remove_shared_vm_struct(struct vm_area_struct *vma,
        -:  238:                struct file *file, struct address_space *mapping)
        -:  239:{
    88105:  240:        if (vma->vm_flags & VM_DENYWRITE)
    15108:  241:                atomic_inc(&file_inode(file)->i_writecount);
    88105:  242:        if (vma->vm_flags & VM_SHARED)
        -:  243:                mapping_unmap_writable(mapping);
        -:  244:
        -:  245:        flush_dcache_mmap_lock(mapping);
    88105:  246:        vma_interval_tree_remove(vma, &mapping->i_mmap);
        -:  247:        flush_dcache_mmap_unlock(mapping);
    88104:  248:}

In this example, lines 245 & 247 have no hitcount, even though there’s no way they could have been skipped.
If we look at the definition of flush_dcache_mmap_(un)lock, we see..
#define flush_dcache_mmap_lock(mapping) do { } while (0)
So the compiler never emitted any code for them, and hence they get treated the same way as the blank lines.

There is a /sys/kernel/debug/gcov/reset file that can be written to, to reset the counters before each test if desired.

Additional thoughts

  • Not sure how inlining affects things.
  • There needs to be some element of post-processing, to work out percentages of code coverage etc, which may involve things like stripping out comments/preprocessor defines.
  • debug kernels differ in functionality in various low level features. For example LOCKDEP will fundamentally change the way spinlocks work. For coverage purposes though, we can choose to not care and stop drilling down at certain levels.
  • Whatever does the post-processing of results may need to aggregate results from multiple test machines. Think of the situation where we’re running a client/server test: Both machines will be running different code paths.
  • ggcov has some interesting looking tooling for visually displaying results.

Trinity socket improvements

I’ve been wanting to get back to working on the networking related code in trinity for a long time. I recently carved out some time in the evenings to make a start on some of the lower hanging fruit.

Something that had bugged me for a while is that we create a bunch of sockets on startup, and then when we later call, for example, setsockopt() on one of them, the socket options we pass usually don’t match the protocol the socket was created with. This isn’t always a bad thing; for example, one of the oldest kernel bugs trinity found was triggered by setting TCP options on a non-TCP socket. But doing this the majority of the time is wasteful, as we’ll just get -EINVAL most of the time.
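
Just to illustrate the sort of wasted call this avoids, here’s a throwaway example (mine, not trinity code); the exact errno depends on the option, but the point is the kernel rejects it outright:

/* A TCP-level socket option on a UDP socket: the kernel just rejects
 * it (typically -ENOPROTOOPT or -EINVAL, depending on the option),
 * so time spent fuzzing combinations like this is mostly wasted. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);	/* a UDP socket */

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
		perror("setsockopt(TCP_NODELAY) on UDP socket");

	close(fd);
	return 0;
}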

We actually had the necessary information in trinity to know what kind of socket we were dealing with all along, in a socketinfo struct.

struct socket_triplet {
        unsigned int family;
        unsigned int type;
        unsigned int protocol;
};

struct socketinfo {
        struct socket_triplet triplet;
        int fd;
};

We just had it at the wrong level of abstraction; setsockopt only ever saw a file descriptor. We could have searched through the fd arrays looking for the matching socketinfo, but that seemed like a lame solution. So I changed the various networking syscalls to take an ARG_SOCKETINFO instead of an ARG_FD. As a side effect, we now pass sockets to those syscalls more often than, say, a perf fd, or an epoll fd, or ..

There is still a small chance we pass some crazy fd, just to cover the crazy cases, though those cases don’t tend to trip things up much any more.

After passing down the triplet, it was a simple case of annotating the structures containing the various setsockopt function pointers to indicate which family they belonged to. AF_INET was the only complication, which needed special-casing due to the multiple protocols for which we have setsockopt() functions. Creating a second table, keyed on protocol instead of family, was enough for the matching code. Shape-wise, it ends up looking something like the sketch below.
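
(All names here are invented for illustration; trinity’s actual tables and generator signatures differ.)

#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

struct socket_triplet {
	unsigned int family;
	unsigned int type;
	unsigned int protocol;
};

/* Stand-ins for the real per-family/per-protocol generators. */
static void unix_sockopts(void) { }
static void tcp_sockopts(void) { }
static void udp_sockopts(void) { }

struct sockopt_generator {
	unsigned int family;		/* AF_* this entry matches */
	unsigned int protocol;		/* 0 = any protocol in this family */
	void (*gen)(void);
};

/* AF_INET is the odd one out: several protocols have their own
 * setsockopt() implementations, so it gets per-protocol entries. */
static const struct sockopt_generator generators[] = {
	{ AF_UNIX, 0,           unix_sockopts },
	{ AF_INET, IPPROTO_TCP, tcp_sockopts  },
	{ AF_INET, IPPROTO_UDP, udp_sockopts  },
};

static const struct sockopt_generator *
find_generator(const struct socket_triplet *st)
{
	size_t i;

	for (i = 0; i < sizeof(generators) / sizeof(generators[0]); i++) {
		if (generators[i].family != st->family)
			continue;
		if (generators[i].protocol == 0 ||
		    generators[i].protocol == st->protocol)
			return &generators[i];
	}

	return NULL;	/* no match: fall back to something random */
}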

There are still a ton of improvements I want to make to this code, but it’s going to take a while, so it’s good when some mostly trivial changes like the above come together quickly.

the more things change.. 4.0

$ ping gelk
PING ( 56(84) bytes of data.
WARNING: kernel is not very fresh, upgrade is recommended.
$ uname -r

Remember that one time the kernel versioning changed and nothing in userspace broke? Me neither.

Why people insist on trying to think they can get this stuff right is beyond me.


update: this was already fixed, almost exactly a year ago in the ping git tree. The (now removed) commentary kind of explains why they cared. Sigh.

LSF/MM 2015 recap.

It’s been a long week.
Spent Monday/Tuesday at LSFMM. This year it was in Boston, which was convenient in that I didn’t have to travel anywhere, but less convenient in that I had to get up early and do a rush-hour commute to get to the conference location in time. At least the weather got considerably better this week compared to the frankly stupid amount of snow we’ve had over the last month.
LWN did their usual great write-up which covers everything that was talked about in a lot more detail than my feeble mind can remember.

A lot of things from last year’s event seem to still be getting a lot of discussion. SMR drives & persistent memory being the obvious stand-outs. Lots of discussion surrounding various things related to huge pages (so much so that one session overran and replaced a slot I was supposed to share with Sasha, not that I complained. It was interesting stuff, and I learned a few new reasons to dislike the way we handle hugepages & forking), and I lost track of how many times the GFP_NOFAIL discussion came up.

In a passing comment in one session, one of the people Intel sent (Dave Hansen iirc) mentioned that Intel are now shipping an 18-core/36-thread CPU. A bargain at just $4642. Especially when compared to this madness.

A few days before the event, I had been asked if I wanted to do a “how Akamai uses Linux” type talk at LSFMM, akin to what Chris Mason did re: facebook at last year’s event. I declined, given I’m still trying to figure that out myself. Perhaps another time.

Wednesday/Thursday, I attended Vault at the same location.
My takeaways:

  • There was generally a lot more positive vibes around btrfs this year. Even with Josef playing bad cop to Chris’ good cop talk, things generally seemed to be moving away from a “everything is awful” toward “this actually works…” though with the qualifier “.. for facebook’s workload”. Josef did touch on one area that btrfs does still suck, which apparently is database workloads (iirc, due to the copy-on-write nature of btrfs). The spurious ENOSPC failures of the past should hopefully stay in the past. Things generally on the up and up. (Though, this does include the linecount, which has now passed 100KLOC, more than double that of XFS or ext*. Scary).
  • Equally positive vibes surrounding XFS. We celebrated its 20-year anniversary at one evening event, making us all feel just that little bit more like an old fart club. Interesting talk toward the end by Dave Chinner about the future of XFS, and how the current surge of development in XFS is probably its last, for various scaling reasons, as disks continue to get bigger and bigger. Predicting the future is always hard, but if what Dave said holds true, things will start to get ‘interesting’ in about 5 years’ time, given every other filesystem we support in Linux has the same issues (or worse).
  • People still care a lot about NFS. Especially pNFS. Surprising amount of activity still happening.
  • Even when I worked there, I never really got Red Hat’s “big picture” wrt the several distributed filesystems they supported. Now that I’m not there, I feel even more out of the loop. “ceph is the way forward” “except when it’s glusterfs” or something. Oh, and GFS2 is still a thing apparently, for some reason.
  • As entertaining as Jeremy Allison might be, don’t go to a talk on Samba internals unless you work on it (in which case it’s too late for you). The horrors will likely keep you up at night.
  • Ted’s ext4 talk drew a decent crowd. As fancy as btrfs/xfs etc might be, a *lot* of people still give a crap about extN. Somehow I missed the addition of the ‘lazytime’ option to ext4. Seems neat. Played with it (and also the super-secret ‘dioread_nolock’ mount option). Saw another talk on orphan list scalability in ext4, which was interesting, but didn’t draw as big a crowd.

I got asked “What are you doing at Akamai?” a lot. (Answer right now: trying to bring some coherence to our multiple test infrastructures.)
Second most popular question: “What are you going to do after that?” (Answer: unknown, but likely something more related to digging into networking problems rather than fighting shell scripts, perl and Makefiles.)

All that, plus a lot of hallway conversations, long lunches, and evening activities that went on possibly a little later than they should have, led to me almost losing my voice today.
Really good use of time though. I had fun, and it’s always good to catch up with various people.

Trinity 1.5 release.

As announced this morning, today I decided that things had slowed down enough (to an almost-standstill of late) that it was worth making a tarball release of Trinity, to wrap up everything that’s gone in over the last year.

The email linked above covers most of the major changes, but a lot of the change over the last year has actually been groundwork for those features. Things like..

  • The post-mortem dumper needed the generation of the text and the writing to log files to be decoupled, which wasn’t particularly trivial.
  • Some features involved considerable rewrites. The fd generators are now pretty much isolated from each other, making adding a new one a simple task.
  • Handling of the mapping structs got a lot of cleanup (though there is definitely still a lot of room for improvement there, especially when we do things like splitting a mapping).
  • I should also mention the countless hours spent chasing down quite a few hard-to-reproduce bugs that are fixed in 1.5.

As I mentioned in the announcement, I don’t see myself having a huge amount of time to work on Trinity for at least this year. I’ve had a number of people email me asking about the status of some feature. Hopefully this demarcation point will answer the question.

So, it’s not abandoned, it just won’t be seeing the volume of change it has over the last few years. I expect my personal involvement will be limited to merging patches, and updating the syscall lists when new syscalls get added.

Trinity used to be on roughly a six month release schedule. We’ll see if by the end of the year there’s enough input from other people to justify doing a 1.6 release.

I’m also hopeful that time working on other projects means I’ll come back to this at some point with fresh eyes. There are a number of features I wanted to implement that needed a lot more thought. Perhaps working on some other things for a while will give me the perspective necessary to realize those features.

backup solutions.

For the longest time, my backup solution has been a series of rsync scripts that have evolved over time into a crufty mess. Having become spoiled by Time Machine on my Mac, I decided to look into something better that didn’t involve a huge time investment on my part.

The general consensus seemed to be that for ready-to-use home-NAS type devices, the way to go was either Synology or Drobo. You just stick in some disks, and set up NFS/Samba etc. with a bunch of mouse clicking. Perfect.

I had already decided I was going to roll with a 5-disk RAID6 setup, so I bit the bullet and laid down $1000 for a Synology 8-bay DS1815+. It came *triple* boxed, unlike the handful of 3TB HGST drives.
I chose the HGSTs after reading Backblaze’s report on failure rates across several manufacturers, and figured that after the RAID6 overhead, 8TB would be more than enough for a long time, even at the rate I accumulate flac and wav files. Also, worst case, I still had 3 spare bays I could expand into later if needed.

Installation was a breeze. The plastic drive caddies felt a little flimsy, but the drives were secure once in them, even if they did feel like they were going to snap as I flexed them to pop them into place. After putting in all the drives and connecting the four ethernet ports, I powered it up.
After connecting to its web UI, it wanted to do a firmware update, like just about every internet-connected device wants to do these days. It rebooted, and finally I could set about setting things up.

On first logging into the device over ssh, I think the first command I typed was uname. Seeing a 3.2 kernel surprised me a little. I got nervous thinking about how many VFS, ext4, and MD bugfixes hadn’t made their way back to long-term stable, and got the creeps a little. I decided not to think too much about it, and put faith in the Synology people doing backports (though I never got as far as looking into their kernel package).

The web ui is pretty slick, though felt a little sluggish at times. I set up my RAID6 volume with a bunch of clicks, and then listened as all those disks started clattering away. After creation, it wanted to do an initial parity scan. I set it going, and went to bed. The next morning before going to work, I checked on it, and noticed it wasn’t even at 20% done. I left it going while I went into the office the next day. I spent the night away from home, and so didn’t get back to it until another day later.

When I returned home, the volume was now ready, but the device was noticeably hotter to the touch than I remembered. I figured it had been hammering the disks non-stop for 24 hours, so go figure, and that it would probably cool off a little as it idled. As the device was now ready for exporting, I set up an NFS export, and then spent some time fighting uid mappings, as you do. The device does have the ability to deal with LDAP and some other stuff that I never had time to set up, so I did things the hard way. Once I had the export mounted, I started my first rsync from my existing backups.

While it was running, I remembered I had intended to set up bonding. A little bit of clicky-clicky later, it was done, and transfers started getting even faster. Very nice. I set up two bonds, with a pair of NICs in each. Given my desktop only has a dual NIC, that was good enough. I figured having a second 2xGigE bond was nice in case multiple machines wanted to use the device while I was doing a backup.

The backup was going to take a while, so I left it running.
A few hours later, I got back to it, and again, it was getting really hot. There are two pretty big fans in the back of the unit, and they were cranking out heat. Then things started getting really weird. I noticed that the rsync had hung. I ctrl-c’d it, and tried logging into the device as root. It took _minutes_ to get a command prompt. I typed top and waited. About two minutes later, top started. Then it spontaneously rebooted.

When it came back up, I logged in, and poked around the log files, and didn’t see anything out of the ordinary.
I restarted the rsync, and let it go for a while. About 20 minutes later, I came back to check on it again, and found that the box had hung completely. The rsync was stalled, and I couldn’t ssh in. I rebooted the device, cursed a bit, and then decided to think about it for a while, so never restarted the rsync. I clicked around in the interface to see if there was anything I could turn on/off that might give me some clues as to wtf was going on.
Then it rebooted spontaneously again.

It was about this time I was ready to throw the damn thing out the window. I bought this thing because I wanted a turn-key solution that ‘just worked’, and had quickly come to realize that when something went bad with this device, I was pretty screwed. Sometimes “It runs Linux” just isn’t enough. For some people, the Synology might be a great solution, but it wasn’t for me. Reading some of the Amazon reviews, it seems there were a few people complaining about their units overheating, which might explain the random reboots I saw. For a device I wanted to leave switched on 24/7 and never think about, something that overheats (especially when I’m not at home) really doesn’t give me feel-good vibes. Some of the other reviews on Amazon rave about the DS1815+. It may be that there was a bad batch and I got unlucky, but I felt burnt by the whole experience, and even if I had got a replacement, I don’t know that I could have trusted this thing with my data.

I ended up returning it to Amazon for a refund, and used the money to buy a motherboard, cpu, ram etc to build a dedicated backup computer. It might not have the fancy web ui, and it might mean I’ll still be using my crappy rsync scripts, but when things go wrong, I generally have a much better chance of fixing the problems.

Other surprises: at one point, I opened the unit up to install an extra 4GB of RAM (it comes with just 2GB by default), and noticed that it runs off a single 250W power supply, which seemed surprising to me. I thought disks used considerably more power during spin-up, but apparently they’re pretty low-power these days.

So, two weeks of wasted time, frustration, and failed experiments. Hopefully by next week I’ll have my replacement solution all set up and can move on to more interesting things instead of fighting appliances.