I spent over ten years working for Red Hat on the Fedora kernel for a living.
I frequently got asked how I managed to swing that gig by people hoping to one day get into kernel hacking themselves.
One of the most common questions I get asked is: the kernel is so big, how could anyone possibly understand it all?
Truth is, there are very few people that really understand the whole kernel. The majority of
the 'big name' kernel hackers got where they are today by specialising in one thing,
and branching out. There are exceptions to this of course, with a number of people like
Andrew Morton, Alan Cox, and Linus who are 'all rounders', who have hacked on close to
everything in the tree at some point. While the kernel could always use more people
like these superheroes, there is nothing wrong with becoming a specialist in one area.
One thing that both the all-rounders and the specialists have in common, however, is an understanding
of the common kernel APIs. Things like 'how to allocate/free memory' and 'how to create proc/sysfs files'.
Most 'how do I ..' questions can be answered by taking a look at how other parts of the kernel are
already doing something similar. With enough experience using the common APIs, you start to pick up
higher-level concepts, such as 'how to create userspace interfaces that don't suck'.
There is no fast path to learning kernel hacking. It comes down to a big time
investment on your part to read (and understand) code, learn from mistakes you make (and you will make them!),
and above all, realising that in the end, it's just code. There may be some additional
restrictions in kernel hacking on things that you could get away with in userspace, but once
you've grasped the basics, a lot of it just follows.
- Make sure you understand how to compile and install a kernel before going any further.
- A sound knowledge of C is essential. If you're still struggling with pointer arithmetic and such concepts,
stick with hacking stuff in userspace until you understand more. A crash in userspace caused by your
misunderstanding is a lot easier to debug, understand and learn from than one in the kernel which
just causes your machine to lock up or reboot itself.
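As a quick self-check on the pointer arithmetic point above, here is a minimal userspace sketch. The function names (sum_ints, elems_between, bytes_between) are made up for illustration. If the difference between the element distance and the byte distance isn't obvious to you, that's a sign to spend more time in userspace first.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints by walking a pointer rather than indexing the array. */
int sum_ints(const int *start, size_t n)
{
    const int *end;
    int total = 0;

    assert(start != NULL || n == 0);
    end = start + n;    /* one past the last element; advances n * sizeof(int) bytes */

    for (const int *p = start; p != end; p++)
        total += *p;
    return total;
}

/* Distance between two pointers into the same array, counted in elements. */
ptrdiff_t elems_between(const int *a, const int *b)
{
    return b - a;
}

/* The same distance counted in bytes, by stepping down to char pointers. */
ptrdiff_t bytes_between(const int *a, const int *b)
{
    return (const char *)b - (const char *)a;
}
```

Call these from a small main() of your own with an int array and compare the two distances. The byte distance should be the element distance multiplied by sizeof(int).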
Kernel specific books:
- The C Programming Language is regarded by most as 'the' C book. If you don't have a solid handle on things like pointer arithmetic, nested structures and other C concepts, start here.
(And if you're still at that stage, you really should spend some time getting comfortable with C in userspace before moving to kernel programming. Userspace programming is a lot more forgiving about mistakes).
The Linux kernel uses a number of gcc extensions (and later C standard features) that the book won't cover, but you can pick these up later.
- Understanding how various data structures like lists, trees, hashes and more work is essential. Introduction to Algorithms is your go-to book here.
Alternatively, there's a lot more in-depth stuff in Algorithms in C, Parts 1-5 (3rd Edition), which covers fundamentals, data structures, sorting, searching, and graph algorithms.
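To give a flavour of how the gcc extensions and the data structures meet, here is a small userspace sketch of an 'intrusive' list, where the list node lives inside the structure it links. This is an illustration only, not the kernel's code: my_container_of, struct node, struct item, push_front and sum_items are all hypothetical names, loosely modelled on the idea behind the kernel's container_of() and list_head. It relies on two GNU C extensions (__typeof__ and statement expressions), so it assumes gcc or clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal list node; it gets embedded in the structures it links. */
struct node {
    struct node *next;
};

/*
 * Given a pointer to a member, recover a pointer to the enclosing structure.
 * Uses two GNU C extensions: __typeof__ and statement expressions ({ ... }).
 * The typed temporary makes the compiler complain if ptr has the wrong type.
 */
#define my_container_of(ptr, type, member) ({                       \
    const __typeof__(((type *)0)->member) *mptr_ = (ptr);            \
    (type *)((char *)mptr_ - offsetof(type, member)); })

struct item {
    int value;
    struct node link;   /* embedded node, not a pointer to one */
};

/* Push an item onto the front of the list; returns the new head. */
struct node *push_front(struct node *head, struct item *it)
{
    assert(it != NULL);
    it->link.next = head;
    return &it->link;
}

/* Walk the nodes, stepping back out to each enclosing item to read its value. */
int sum_items(const struct node *head)
{
    int total = 0;

    for (const struct node *n = head; n != NULL; n = n->next)
        total += my_container_of(n, struct item, link)->value;
    return total;
}
```

The design point is that the list code only ever knows about struct node, so the same few helpers can link any structure that embeds one. That pattern shows up all over the kernel tree, so it's worth being comfortable with it before you start reading.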
There have been a number of good books on kernel hacking written. Due to the rapid pace of Linux development however, they are out of date by the time they hit the printing press. They do however remain worth reading, as they explain fundamental concepts well, and knowing some of the historical developments can be useful knowledge.
- kernelnewbies is a great resource for those starting out.
It contains a lot of examples, and pointers.
- Jonathan Corbet also writes a really good, concise weekly summary of kernel development at lwn.net. It's well worth
reading, even if you follow linux-kernel, as the rephrasing and explanations in the summaries are sometimes a lot better than reading through a 200 email thread.
- The Linux kernel comes with a Documentation/ directory which contains a number of really useful documents worth spending some time reading.
- Finally the code itself. Find something that interests you, find out where in the kernel that is handled, and just start reading.
- git grep is invaluable. (A solid grounding in at least the basic git commands should be considered a prerequisite)
- You're going to be building and rebuilding a lot, so consider installing 'ccache'. Most distros have it packaged so that it automatically sets itself up after installing. A useful trick is to put your ~/.ccache directory on the fastest drive you have, ideally an SSD. If you're building on an especially fast system, you may want to benchmark both with and without ccache.
You may also want to look at 'distcc' if you've a lot of potential build-cluster candidate machines local to you.
- Regardless of whether you're an emacs or vi person, 'ctags' is invaluable for navigating your way around
the source tree. 'make tags' in the toplevel of the kernel tree will generate an index. You can learn how to
navigate with it from the documentation of your favorite editor. (In vi, ctrl-] over a symbol jumps to its definition, ctrl-t takes you back. :ts will bring up a list of alternatives if there is more than one match for that name. You can also
run 'vim -t functionname' from the command line.)
- Some other people find cscope really useful for the same purpose. 'make cscope' generates the index, running 'cscope' gets you an interface to jump to where functions are used/defined etc.
"I don't know what to hack on!"
A great way of putting your newly learned skills to good use is to take a look at the open bugs
in the kernel bug tracker, find something, and try to help fix it. While many driver bugs need the hardware to really debug/test a solution, a lot of problems can
still be found purely by code inspection. There is no shortage of new bugs being filed all the time, and bug-fixing is a great way to learn about many different areas
of the kernel and how they interact.
"I really just don't get it"
Not everyone gets to be a spaceman, rockstar, or kernel hacker when they grow up. It's fine. Really. There are still a lot of things you can do to help out Linux.
- Testing. Even if hacking code isn't your thing, building and testing the latest snapshots of Linus' tree, or Andrew's -mm tree if you're feeling really adventurous,
is always useful. If it breaks, great! You get to contribute something: a bug report to the linux-kernel list.
- Related to testing: write test tools. A new syscall got added? Great, write an application to use it in every way imaginable. Complain loudly when it breaks.
Some of the simplest test tools have been the most useful to us. Filesystem stress tools like fsx have been
so useful they have become 'must-use' tools for filesystem developers. My own Trinity project finds bugs
every release cycle. More tools like this would be awesome.
- Hacking userspace isn't 'uncool'. There are a *lot* of things still in need of a lot of love in userspace. Find something that bugs you, and fix it. Can't fix it?
Get involved with the people who wrote it, maybe they'll give you pointers. A lot of userspace projects have summer of code/mentorship programs that may be worth checking out.
- Triage work. Bugzilla is swamped with bugs (both upstream kernel.org, and Fedora). A lot of them are really old ones that may even be fixed by now. No-one has the time to regularly go through them all, looking for patches that never got applied upstream, closing duplicates, pinging reporters etc. Get involved!
- Documentation. If along your journey you find something particularly hard to understand, and you found no documentation on it, here's your chance to be
a documentation-writing-superhero! Kernel hackers hate writing documentation for some strange reason.
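To make the 'write test tools' suggestion above a bit more concrete, here is a deliberately tiny sketch of the kind of checks such a tool is built from. It is nowhere near what fsx or Trinity do, and the function names (roundtrip_check, bad_fd_check) are invented for this example. It assumes a POSIX system with a writable /tmp. One function pokes the happy path (write data at an offset, read it back, compare), the other pokes an error path (a write to an invalid descriptor should fail with EBADF).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len pattern bytes at offset off in a fresh temporary file, read them
 * back, and compare. Returns 0 on success, -1 on any failure or mismatch.
 */
int roundtrip_check(size_t len, off_t off, unsigned int seed)
{
    char path[] = "/tmp/rwtestXXXXXX";  /* assumes /tmp exists and is writable */
    unsigned char *out = NULL, *in = NULL;
    int ret = -1;
    int fd;

    assert(off >= 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);       /* the file goes away once fd is closed */

    out = malloc(len ? len : 1);
    in = malloc(len ? len : 1);
    if (!out || !in)
        goto done;

    /* Deterministic pattern, so a failure can be reproduced from the seed. */
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)(seed + i * 31u);

    if (pwrite(fd, out, len, off) != (ssize_t)len)
        goto done;
    if (pread(fd, in, len, off) != (ssize_t)len)
        goto done;
    if (memcmp(out, in, len) == 0)
        ret = 0;
done:
    free(out);
    free(in);
    close(fd);
    return ret;
}

/* Error-path check: a write to an invalid descriptor should fail with EBADF. */
int bad_fd_check(void)
{
    errno = 0;
    if (write(-1, "x", 1) != -1)
        return -1;
    return errno == EBADF ? 0 : -1;
}
```

A real tool would loop over many lengths, offsets and seeds, mix in other operations, and log enough to replay a failure. The useful habit even at this size is checking the error paths as carefully as the success paths.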
"What about janitor tasks?"
While the janitor project has some useful information, patches that do nothing but clean up code to comply
with style guidelines and other such trivial patches aren't really a great way to learn. No-one ever learned any skills by changing the indentation of a function.
Learn some of the 'rules' proposed there, but instead of focusing on them as 'something to do', use those rules while doing something more useful.