My Linux Foundation mentorship program journey

What are the Linux Foundation mentorships and why should I care?

The Linux Foundation hosts a number of mentorship opportunities for aspiring developers in the open source community. The programs cover web development, cloud computing and distributed systems, as well as the one that caught my interest and that I participated in: the Linux kernel program (Fall 2025). If you’re interested, do check them out.

About The Linux Kernel Mentorship Program (LKMP)

The program in question is tailored towards kernel development, also known as “kernel hacking”, with a focus on fixing bugs in existing code. Writing new code in the core kernel or an additional device driver is also encouraged, although this is definitely more advanced. There used to be 2 separate programs, the first one called “Linux Kernel Bug Fixing Mentorship” (yeah, that’s a mouthful), but at some point they were merged into one, referred to as the Linux Kernel Mentorship Program (LKMP).

I found the application process rather straightforward: I created an account on their platform and applied for the program, providing some details on my past professional experience. The mentors definitely expect the mentees to have some shell skills and to be able to write C code (or at the very least to read and understand it).

I was then presented with some tasks that I had to complete before my application would be considered. These included compiling the kernel and mailing a simple patch for the kernel documentation, as well as spelling fixes. Although trivial, these tasks help the mentors filter out candidates who are unwilling to put in some work without guaranteed success. Another task was writing a simple kernel module. I decided to take my time here and write something more meaningful than a module that just logs output to the ring buffer, so I wrote a small module that creates a procfs entry, /proc/kill, that takes input from the user. The input is in the format “<pid> <signal>”, and the module sends the given signal to the given process. Yes, I know it’s the most useless module ever, as it’s just a glorified kill command, but at least it worked. I also integrated it into the build system, which was nice. I had some prior “experience” writing myself a procfs module to control the Raspberry Pi GPIO, so I kind of knew where to start.
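To make the “<pid> <signal>” format concrete, here is a minimal userspace sketch of the same idea in shell. It is not the module's code, only an analogue: the function name send_from_line is hypothetical, and accepting only numeric signals is an assumption made for the sketch.

```shell
#!/bin/sh
# Userspace analogue of a "<pid> <signal>" interface: parse one line and
# deliver the signal. send_from_line is a hypothetical name for this sketch.
send_from_line() {
    line=$1
    # Split the line on whitespace into positional parameters
    # (globbing is switched off so that "*" is not expanded).
    set -f
    set -- $line
    set +f
    if [ "$#" -ne 2 ]; then
        echo "expected '<pid> <signal>', got: $line" >&2
        return 2
    fi
    pid=$1
    sig=$2
    # The pid must be a decimal number without a leading zero.
    case $pid in
        ''|*[!0-9]*|0*) echo "bad pid: $pid" >&2; return 2 ;;
    esac
    # The signal must be a decimal number (assumption: no signal names).
    case $sig in
        ''|*[!0-9]*) echo "bad signal: $sig" >&2; return 2 ;;
    esac
    kill "-$sig" "$pid"
}

# Demo: start a long-running process and terminate it with SIGTERM (15).
sleep 60 &
target=$!
send_from_line "$target 15"
status=0
wait "$target" 2>/dev/null || status=$?
echo "sleep exited with status $status"
```

With a procfs entry like the one described above, the equivalent step would be writing the same line to /proc/kill, with the parsing and validation happening inside the kernel instead.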

Either way, I completed all the tasks and hoped for the best. I was ecstatic when I received the acceptance email in the first week of September.

The LKMP itself

The mentorship itself doesn’t require you to put in work during a specific time of day, which is very nice, especially if you work full time. It has part-time and full-time options (I took the full-time one). This should not scare you, but if you work full time you will have to put in work on weekends and late in the evening after work, so unless you enjoy that you should not apply. The part-time and full-time options last 6 and 3 months respectively. You are expected to have 5 patches upstream in order to graduate at the end. You will also be asked to produce a final report on your work and write a blog post (like this one).

The mentors are typically well-known kernel developers. During my participation Shuah Khan was the main mentor; I believe she has always hosted this program. She was helped by another kernel developer, David Hunter, and at a later stage by Khalid Aziz.

For about the first 1.5 months, which is half the program, I had no patches to show for my troubles, as I was still finding my way through all the available documentation and code. The mentors did a very good job of easing us into the process: we had a weekly meeting with them to discuss bugs and ask questions. They showed us how to set up our environment and which tools to use. They also focused a lot on patch formatting and on how to interact with the kernel community (this would definitely have taken me forever on my own).

QEMU, syzkaller and some other hard-to-pronounce words

I used my editor of choice, which is neovim (by the way), and I also set up LSP to make it aware of the kernel source. Being able to jump to function and struct definitions so seamlessly, and having code completion available, helped a lot.

In our meetings with the mentors we were introduced to syzkaller, the holy grail of kernel fuzzers. In essence, syzkaller finds bugs by running random programs and monitoring the kernel for anomalies, for example a WARNING or an Oops in the ring buffer, or a straight-up panic or deadlock. To clarify further, the programs it decides to run are not really random: they are coverage-guided, so that they exercise as much of the kernel code as possible. I will not pretend to understand exactly how it works, but it does find a lot of bugs in the kernel before they are encountered during normal operation.

A syzkaller report consists of the commit ID from which the kernel was built, the .config, and a so-called “C repro”, the C representation of the program that was run to trigger the bug. This is all the information we need to reproduce the bug.

QEMU is the other vital tool for kernel hacking, as it makes it seamless to boot the kernel in a VM. It can load the kernel directly, without the need to install a bootloader and the kernel itself on the VM disk, so you only have to format the disk with a filesystem and install your userspace there. More info on setting up your environment can be found in the syzkaller project.

gdb is also widely used to debug the kernel; instructions on setting it up for kernel debugging can be found in the kernel docs. “printf debugging” is useful too in some cases.

“Talk is cheap. Show me the code.” - Linus Torvalds

Armed with all the tools, I was ready to tackle some bugs. The first one I focused on was related to the IPsec stack. Although I don’t remember ever setting up an IPsec end-to-end tunnel, I was able to figure out the essence of the bug: the code allowed an SPI with the value of 0 to be allocated. This had not been possible before a patch by someone else, which introduced the regression. I mailed my patch, but as it turned out the bug was already fixed in the IPsec git tree. It was unfortunate, but I did not get discouraged.

I then decided to get more into filesystems, as they seemed more approachable than network code. I learned about the VFS layer, how filesystems register and present their structures and functionality to it, and what could go wrong with them. I read Steve Pate’s book on filesystems, which starts from the userspace interface and works its way into the kernel. The book is still a work in progress but is worth a read in my opinion. The best reference is definitely the source code, but a book can be a very good way to start. I cannot recite most of the things I read there, but when I later went through the source code I kept getting recollections of the “Wait, I’ve seen this before” type.

Either way, I was able to solve some bugs related to the f2fs and bfs filesystems, with my f2fs patches currently residing in the git tree of the maintainer and the bfs patches still awaiting review.

Regarding bfs, it’s a very simple legacy filesystem that was used to hold the binaries needed to boot the SCO UnixWare kernel. It has a single directory and only 512 inodes (as god intended, apparently). Since it does not have a classical “debugfs”-style tool, my only way to peek into the filesystem was a hexdump. I have seen people patch binaries with vim before, using the typical “:%!xxd” and “:%!xxd -r”, so I decided to give it a shot here, and it’s surprisingly effective for poking into the filesystem itself and doing some tests. If you’re exploring a simpler filesystem, definitely give it a shot; some more modern filesystems may be too complex for this approach.
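The same dump, edit and reverse cycle also works outside vim. The sketch below assumes xxd is installed and uses a 16-byte stand-in file rather than a real bfs image; the file names and the patched bytes are made up for illustration.

```shell
#!/bin/sh
# Hex-patch round trip with xxd: the same dump/edit/reverse cycle that
# ":%!xxd" and ":%!xxd -r" perform inside vim.
set -eu

work=$(mktemp -d)

# Stand-in for a filesystem image: 16 bytes of known content (made-up data).
printf 'ABCDEFGHIJKLMNOP' > "$work/image.bin"

# 1. Dump to the editable text form: offset, hex bytes, ASCII column.
xxd "$work/image.bin" > "$work/image.hex"

# 2. "Edit" the hex column: change the first two bytes from 41 42 to 5a 5a.
#    Only the hex column is edited; the ASCII column is left stale on purpose.
sed 's/^00000000: 4142/00000000: 5a5a/' "$work/image.hex" > "$work/patched.hex"

# 3. Convert the edited dump back to binary.
xxd -r "$work/patched.hex" > "$work/patched.bin"

patched=$(cat "$work/patched.bin")
echo "patched image contains: $patched"
rm -rf "$work"
```

On a real image it is safer to work on a copy, and only the hex column needs to be kept correct when editing.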

The bug that took me the longest time to figure out was this one. Even its title is misleading: it has nothing to do with GRE encapsulation and is in fact a bug in the team device driver implementation. Up to that point I had never even heard of the team device driver. All I had ever used for link aggregation was the bond driver, but apparently team is meant to be a more modern approach that also offloads more code to userspace. I decided to test whether the bug was a regression, so I built an older version of the kernel, and to my surprise the bug disappeared. What was left to do was find out exactly which commit caused the regression. I first tried to do this manually, but that was taking forever, so I decided to finally try “git bisect”, which the mentors had mentioned in the meetings. I wrote a small script that tests whether the bug is present.

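#!/bin/bash
# Test script for "git bisect run": exit 0 = good, 1 = bad, 125 = skip.
# Expects KSRC (kernel tree) and IMAGES (VM image directory) in the environment.

PORT=10021
# Commands run inside the guest: build the reproducer, run it for at most 30s.
REPRO='cd gre && gcc repro.c -o repro && rm -rf syz* && timeout 30s ./repro && exit'

# Build the current revision, accepting the default for any new config option.
# A revision that fails to build cannot be tested, so tell bisect to skip it.
yes '' | make CC='ccache clang' -j"$(nproc)" || exit 125

# Boot the freshly built kernel directly, forwarding guest port 22 to $PORT.
qemu-system-x86_64 -s \
        -kernel "$KSRC/arch/x86/boot/bzImage" \
        -append "console=ttyS0 root=/dev/sda debug earlyprintk=serial page_poison=on slub_debug=FZP net.ifnames=0 nokaslr slub_debug=P" \
        -hda "$IMAGES/unstable.img" \
        -enable-kvm \
        -net user,hostfwd=tcp:0.0.0.0:${PORT}-:22 -net nic \
        -nographic \
        -m 2G \
        -smp 2 >/dev/null 2>&1 &
QEMU_PID=$!

# Wait until the forwarded SSH port accepts connections.
until nc -z localhost "$PORT"; do
        sleep 1
done

sleep 5

# timeout exits with 124 when the reproducer ran for the full 30 seconds,
# meaning the kernel survived (good); any other status counts as the bug (bad).
ssh -i "$IMAGES/unstable.id_rsa" -p "$PORT" root@localhost "$REPRO"
if [ $? -eq 124 ]; then
        RESULT=0
else
        RESULT=1
fi

kill "$QEMU_PID"
exit $RESULT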

In essence it starts a VM, connects to it over ssh and runs the reproducer with a timeout of 30 seconds. ssh returns the exit status of the remote command, so the return code tells us whether the reproducer reached the timeout (the bug is absent) or the VM hung or crashed before that (the bug is present). With the relevant git bisect commands I started the bisection and let it run overnight, as the rebuilds take a lot of time. The next day I was greeted with a result that was at first even more confusing: the cause of the bug was apparently a commit that just removed an unused field from “struct team”. With some more tests I figured out that adding arbitrary padding “masks” the bug, and concluded that it must be some sort of memory corruption. With some more time I was able to figure out what was going on: the team device was being left in an inconsistent state when an invalid device was added to it. I came up with a patch in the following weeks, and soon the patch got accepted. It also got backported to some stable trees.
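For readers who have not used it, the sketch below shows how a test script like the one above plugs into “git bisect run”. It is a toy session in a throwaway repository, not the kernel bisection itself: the ten commits, the state.txt file and the “BUG” marker are all made up for illustration. The exit code convention is the real one: 0 marks a revision good, 1 marks it bad, and 125 asks bisect to skip it.

```shell
#!/bin/sh
# Toy "git bisect run" session in a throwaway repository.
set -eu

repo=$(mktemp -d)
check=$(mktemp)
cd "$repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Ten commits; from commit 7 onwards the file contains the marker "BUG".
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then echo "BUG $i"; else echo "ok $i"; fi > state.txt
    git add state.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Test script: exit 0 marks the checked-out revision good, 1 marks it bad.
cat > "$check" <<'EOF'
if grep -q BUG state.txt; then exit 1; fi
exit 0
EOF

# "bisect start" takes the known-bad revision first, then the known-good one.
git bisect start HEAD HEAD~9 >/dev/null
# Bisect checks out revisions and runs the test script on each one.
git bisect run sh "$check" >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

git bisect reset >/dev/null 2>&1
cd /
rm -rf "$repo" "$check"
```

In a kernel tree the shape is the same: start the bisection with the known-bad and known-good revisions, then hand the VM test script to “git bisect run” and let it work through the candidates.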

Should you apply?

If you want to get into kernel hacking and, like me, are struggling to get started, you will definitely benefit from this program. It would have taken me ages to get here (and I’m still very much a beginner) without the help of the mentors and the discussions with other mentees. You can gain technical knowledge in many places, but the chance to interact with a community that shares your interests is priceless.