W4118

HW2 (W4118 Fall 2024)

DUE: Wednesday 10/2/2024 at 11:59pm ET

Modified: Fri Sep 27, 0005 hrs (part 4, question 2)

Modified: Fri Oct 11, 1505 hrs (clarification on what happens when procs > *nr – added for future semesters)

General instructions

All homework submissions are to be made via Git. You must submit a detailed list of references as part of your homework submission indicating clearly what sources you referenced for each homework problem. You do not need to cite the course textbooks and instructional staff. All other sources must be cited. Please edit and include this file in the top-level directory of your homework submission in the main branch of your team repo. Be aware that commits pushed after the deadline will not be considered. Refer to the homework policy section on the class website for further details.

Group programming problems are to be done in your assigned groups. We will let you know when the Git repository for your group has been set up on GitHub. It can be cloned using the following command. Replace teamN with your team number, e.g. team0. You can find your group number here.

$ git clone git@github.com:W4118/f24-hmwk2-teamN.git

IMPORTANT: You should clone the repository directly in the terminal of your VM, instead of cloning it on your local machine and then copying it into your VM. The filesystem in your VM is case-sensitive, but your local machine might use a case-insensitive filesystem (for instance, this is the default setting on Macs). Cloning the repository to a case-insensitive filesystem might clobber some kernel source code files. See this post for some examples.

This repository will be accessible to all members of your team, and all team members are expected to make local commits and push changes or contributions to GitHub equally. You should become familiar with team-based shared repository Git commands such as git-pull, git-merge, git-fetch. For more information, see this guide.

There should be at least five commits per member in the team’s Git repository. The point is to make incremental changes and use an iterative development cycle. Follow the Linux kernel coding style. You must check your commits with the run_checkpatch.sh script provided as part of your team repository. Errors from the script in your submission will cause a deduction of points. (Note that the script only checks the changes up to your latest commit. Changes in the working tree or staging area will not be checked.)

The kernel programming for this assignment will be run using your Linux VM. As part of this assignment, you will be experimenting with Linux platforms and gaining familiarity with the development environment. Linux platforms can run on many different architectures, but the specific platforms we will be targeting are the X86_64 or Arm64 CPU families. All of your kernel builds will be done in the same Linux VM from homework 1. You will be developing with the Linux 6.8 kernel.

For this assignment, you will write a system call to dump the process tree and a user space program to use the system call.

For students on Arm computers (e.g. Macs with an M1/M2/M3 CPU): if you want your submission to be built/tested for Arm, you must create and submit a file called .armpls in the top-level directory of your repo; feel free to use the following one-liner:

$ cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add .armpls && git commit -m "Arm pls"

You should do this first so that this file is present in any code you submit for grading.

For all programming problems, you should submit your source code as well as a single README file documenting your files and code for each part. Please do NOT submit kernel images. The README should explain any way in which your solution differs from what was assigned, and any assumptions you made. You are welcome to include a test run in your README showing how your system call works. It should also state explicitly how each group member contributed to the submission and how much time each member spent on the homework. The README should be placed in the top level directory of the main branch of your team repo (on the same level as the linux/ and user/ directories).

Part 1: Build your own Linux 6.8.0 kernel and install and run it in your Linux VM

You will need to install your own custom kernel in your VM to do this assignment. The source code for the kernel you will use is located in your team repository on GitHub. Follow the instructions provided here to build and install the kernel in your VM.

Part 2: Write a new system call in Linux

General description

The system call you write will retrieve information from each thread associated with each process in some subset of the process tree. That thread information will be stored in a buffer and copied to user space.

Within the buffer, threads associated with the same process (main thread) should be grouped together. These thread groups should be sorted in breadth-first-search (BFS) order of the associated process within the process tree. Within each thread group, the threads should be sorted in ascending order of PID. You may ignore PID rollover for the purposes of ordering threads. See the hint on PID vs TGID below for more information on what we mean when we say the PID of a thread. See also Additional requirements below for an example ordering.
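To make the ordering concrete, here is a minimal user-space sketch of a BFS traversal over a toy process table. It is not kernel code and not the required implementation: struct toy_proc and toy_bfs are hypothetical names invented for this illustration, each toy entry stands for one process (thread group leader), and children are visited in array order, which is an assumption rather than a stated requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy record: one entry per process (thread group leader). */
struct toy_proc {
	int pid;
	int parent_pid;
};

/*
 * Write the PIDs of the subtree rooted at root_pid into out[] in BFS order
 * and the matching depth into level[]. Returns the number of entries written.
 * out[] doubles as the BFS queue: entries before `head` are already expanded.
 * The root is assumed to exist; this toy version does not verify that.
 */
static size_t toy_bfs(const struct toy_proc *procs, size_t n, int root_pid,
		      int *out, int *level, size_t cap)
{
	size_t head = 0, tail = 0;
	size_t i;

	assert(out && level);
	if (cap == 0)
		return 0;

	out[tail] = root_pid;
	level[tail] = 0;
	tail++;

	while (head < tail) {
		for (i = 0; i < n && tail < cap; i++) {
			/* Skip self-parented entries such as PID 0. */
			if (procs[i].parent_pid != out[head] ||
			    procs[i].pid == out[head])
				continue;
			out[tail] = procs[i].pid;
			level[tail] = level[head] + 1;
			tail++;
		}
		head++;
	}
	return tail;
}
```

For a table containing (5000, parent 1), (5002, parent 5000), (5003, parent 5000), (5004, parent 5002) and (5005, parent 5003), calling toy_bfs with root 5000 yields the order 5000, 5002, 5003, 5004, 5005 with levels 0, 1, 1, 2, 2. In the real system call, each process in this order would be expanded into its threads, sorted by ascending PID.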

The prototype for your system call will be:

int ptree(struct tskinfo *buf, int *nr, int root_pid);

You should define struct tskinfo as:

struct tskinfo {
    pid_t pid;              /* process id */
    pid_t tgid;             /* thread group id */
    pid_t parent_pid;       /* process id of parent */
    int level;              /* level of this process in the subtree */
    char comm[16];          /* name of program executed */
    unsigned long userpc;   /* pc/ip when task returns to user mode */
    unsigned long kernelpc; /* pc/ip when task is run by schedule() */
};

You should put this definition in include/uapi/linux/tskinfo.h as part of your solution. Note that this path is relative to the root directory of your kernel source tree. The uapi/ directory contains the user space API of the kernel, which should include structures used as arguments for system calls. As an example, you may find it helpful to look at how struct tms is defined and used in the kernel. This is another structure that is part of the user space API and is used for getting process times.
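As a rough sketch, the header could be laid out as follows. The include guard name _UAPI_LINUX_TSKINFO_H is an assumption modeled on the naming used by other UAPI headers, and <sys/types.h> is included here only so the sketch compiles on its own in user space; which type header the in-tree version should include is left open.

```c
/* Sketch of include/uapi/linux/tskinfo.h; the guard name is an assumption. */
#ifndef _UAPI_LINUX_TSKINFO_H
#define _UAPI_LINUX_TSKINFO_H

#include <sys/types.h>	/* pid_t, for this stand-alone user-space sketch */

struct tskinfo {
	pid_t pid;		/* process id */
	pid_t tgid;		/* thread group id */
	pid_t parent_pid;	/* process id of parent */
	int level;		/* level of this process in the subtree */
	char comm[16];		/* name of program executed */
	unsigned long userpc;	/* pc/ip when task returns to user mode */
	unsigned long kernelpc;	/* pc/ip when task is run by schedule() */
};

#endif /* _UAPI_LINUX_TSKINFO_H */
```

The include guard keeps the structure from being defined twice when the header is pulled in through more than one path.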

To ensure that user space programs have access to your updated set of header files, you must do the following in the root directory of your kernel source tree:

$ sudo make headers_install INSTALL_HDR_PATH=/usr

This copies the UAPI header files to /usr/. You should now be able to see your new header file as /usr/include/linux/tskinfo.h. You should do this every time you update a header file that is part of the user space API of the kernel.

Description of parameters

Additional requirements

Hints

Part 3: Test your new system call

General description

Write a simple C program which calls ptree. The program should be able to take in a single command line argument and use it as root_pid. If no argument is provided, your program should print the entire process tree. The program should be in the user/part3/ folder of your team repo, and your Makefile should generate an executable named test.

Since you do not know the tree size in advance, you should start with some reasonable buffer size for calling ptree. If that buffer is not large enough to store the tree, repeatedly double the buffer size and call ptree again until you have captured the full process tree requested. Print the contents of the buffer from index 0 to the end. For each entry in the buffer, you must use the following format for program output:

printf("%s,%d,%d,%d,%p,%p,%d\n", buf[i].comm, buf[i].pid, buf[i].tgid,
    buf[i].parent_pid, (void *)buf[i].userpc, (void *)buf[i].kernelpc, buf[i].level);

Example program output (yours will likely be different depending on the processes running in your VM):

$ ./test
swapper/0,0,0,0,(nil),0xffff8000815be794,0
systemd,1,1,0,0xfbe1d5d2bd20,0xffff8000815be794,1
kthreadd,2,2,0,(nil),0xffff8000815be794,1
systemd-journal,362,362,1,0xf19f7d72bd74,0xffff8000815be794,2
systemd-udevd,408,408,1,0xfdf0f485bd20,0xffff8000815be794,2
...
bash,3073,3073,2989,0xedbbabba7a70,0xffff8000815be794,11
test,3239,3239,3073,0xe1d2a6e496a8,0xffff8000815be794,12
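The buffer-doubling strategy described above can be sketched as follows. This is one possible shape, not the required one. It assumes that on return *nr holds the number of entries actually stored, so a completely full buffer may mean the result was truncated; check the parameter description for the real semantics. The names ptree_fn, fetch_tree and fake_ptree are hypothetical, and fake_ptree is only a stand-in that pretends the tree has 40 entries so the loop can be exercised without the real system call.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

struct tskinfo {
	pid_t pid;
	pid_t tgid;
	pid_t parent_pid;
	int level;
	char comm[16];
	unsigned long userpc;
	unsigned long kernelpc;
};

/* Hypothetical callback type standing in for a wrapper around ptree. */
typedef int (*ptree_fn)(struct tskinfo *buf, int *nr, int root_pid);

/* Stand-in for the system call: pretends the tree has 40 entries. */
static int fake_ptree(struct tskinfo *buf, int *nr, int root_pid)
{
	int total = 40;
	int n = *nr < total ? *nr : total;
	int i;

	(void)root_pid;
	for (i = 0; i < n; i++)
		buf[i].pid = i;
	*nr = n;	/* ASSUMPTION: *nr reports entries stored */
	return 0;
}

/*
 * Double the buffer until the call stores fewer entries than it can hold.
 * Returns a malloc'ed buffer (caller frees) and sets *count, or NULL on error.
 */
static struct tskinfo *fetch_tree(ptree_fn call, int root_pid, int *count)
{
	struct tskinfo *buf = NULL;
	int cap = 16;

	assert(call && count);
	for (;;) {
		struct tskinfo *tmp = realloc(buf, (size_t)cap * sizeof(*buf));
		int nr = cap;

		if (!tmp || call(tmp, &nr, root_pid) < 0) {
			free(tmp ? tmp : buf);
			return NULL;
		}
		buf = tmp;
		if (nr < cap) {		/* room to spare: nothing was cut off */
			*count = nr;
			return buf;
		}
		if (cap > INT_MAX / 2) {
			free(buf);
			return NULL;
		}
		cap *= 2;
	}
}
```

With a real wrapper around the system call passed in place of fake_ptree, the returned buffer can be printed entry by entry using the format string given above.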

Hints

Part 4: Investigate kernel source code

Write answers to the following questions in the user/part4.txt text file, following the provided template exactly. Make sure to include any references you use in your references.txt file.

IMPORTANT: The skeleton code pushed to the assignment repo has an error in question two. The original skeleton code asks about the init process. The question should be about the PROCESS WITH PID 0, as is the case below.

  1. There are a few PIDs that are reserved for system processes and kernel threads. These include PIDs 0, 1, and 2. What is the process name associated with each of these three PIDs (some may have multiple acceptable names)?

  2. Give the exact URL on https://elixir.bootlin.com/linux/v6.8/source pointing to the file and line number at which the data structure describing the process with PID 0 is defined. Note: make sure you use v6.8.

  3. Give the exact URL on https://elixir.bootlin.com/linux/v6.8/source pointing to the file and line number at which the function that executes instructions to context-switch from one task to another is defined. Please provide an answer for both arm64 and x86-64. The function you identify, which may be in assembly code, should be the one that contains the actual instruction that switches the CPU’s program counter register to the task so it can run. Note: use v6.8.

  4. Give the exact URL on https://elixir.bootlin.com/linux/v6.8/source pointing to the file and line number at which the process with PID 1 starts running as the currently running process. Please provide an answer for both arm64 and x86-64. Note: use v6.8.

For reference, the URLs you answer with should be in the following format: https://elixir.bootlin.com/linux/v6.8/source/kernel/sched/core.c#L6607

Part 5: Create your own process tree

Write another program foo that creates processes and/or threads corresponding to the following process tree:

         5000,5001
          / \
       5002 5003 
        |     |
       5004 5005

In other words, with foo running in another shell, you should be able to use the program you wrote in part 3 to print the following console output:

$ ./test 5000
foo,5000,5000,1,x,y,0
foo,5001,5000,1,x,y,0
foo,5002,5002,5000,x,y,1
foo,5003,5003,5000,x,y,1
foo,5004,5004,5002,x,y,2
foo,5005,5005,5003,x,y,2

where x represents the userpc, and y represents the kernelpc. Any valid values of x and y are okay. All of the other fields shown in the output above should exactly match the strings and integers shown. In particular, the PID, TGID, parent PID, and level values must match.

This program should be in the user/part5/ directory of your team repo, and your Makefile should generate the foo executable. While we will be testing your code on a freshly booted system, you may find it helpful to change the maximum possible PID value to make it easier to test your program (this allows PID values to roll over more quickly). For example, the following command may be helpful: echo 10000 | sudo tee /proc/sys/kernel/pid_max
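When experimenting with PID rollover, it can be handy to check the current limit from a program. The helper below is a small sketch that reads the same /proc file the command above writes to; read_pid_max is a hypothetical name, and the function simply returns -1 if the file cannot be opened or parsed.

```c
#include <stdio.h>

/* Returns the value in /proc/sys/kernel/pid_max, or -1 if it cannot be read. */
static long read_pid_max(void)
{
	FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
	long value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

A test harness could print this value before starting foo to confirm that the lowered limit is in effect.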

Hints

Submission Checklist

Include the following in your main branch. Only include source code (i.e., *.c, *.h) and text files; do not include compiled objects.