DUE: Wednesday 10/16/2024 at 11:59pm ET
Modified: Fri Oct 25, 1502 hrs (clarification on parts 4 and 5 – added for future semesters)
All homework submissions are to be made via Git. You must submit a detailed list of references as part of your homework submission indicating clearly what sources you referenced for each homework problem. You do not need to cite the course textbooks and instructional staff. All other sources must be cited. Please edit and include this file in the top-level directory of your homework submission in the main
branch of your team repo. Be aware that commits pushed after the deadline will not be considered. Refer to the homework policy section on the class website for further details.
Group programming problems are to be done in your assigned groups. We will let you know when the Git repository for your group has been set up on GitHub. It can be cloned using the following command. Replace teamN
with your team number, e.g. team0
. You can find your group number here.
$ git clone git@github.com:W4118/f24-hmwk3-teamN.git
IMPORTANT: You should clone the repository directly in the terminal of your VM, instead of cloning it on your local machine and then copying it into your VM. The filesystem in your VM is case-sensitive, but your local machine might use a case-insensitive filesystem (for instance, this is the default setting on macs). Cloning the repository to a case-insensitive filesystem might end up clobbering some kernel source code files. See this post for some examples.
This repository will be accessible to all members of your team, and all team members are expected to make local commits and push changes or contributions to GitHub equally. You should become familiar with team-based shared repository Git commands such as git-pull, git-merge, git-fetch. For more information, see this guide.
There should be at least five commits per member in the team’s Git repository. The point is to make incremental changes and use an iterative development cycle. Follow the Linux kernel coding style. You must check your commits with the run_checkpatch.sh
script provided as part of your team repository. Errors or warnings from the script in your submission will cause a deduction of points.
For students on Arm computers (e.g. macs with M1/M2/M3 CPUs): if you want your submission to be built/tested for Arm, you must create and submit a file called .armpls
in the top-level directory of your repo; feel free to use the following one-liner:
$ cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add .armpls && git commit -m "Arm pls"
You should do this first so that this file is present in any code you submit for grading.
For all programming problems, you should submit your source code as well as a single README file documenting your files and code for each part. Please do NOT submit kernel images. The README should explain any way in which your solution differs from what was assigned, and any assumptions you made. You are welcome to include a test run in your README showing how your system call works. It should also state explicitly how each group member contributed to the submission and how much time each member spent on the homework. The README should be placed in the top level directory of the main branch of your team repo (on the same level as the linux/
and user/
directories).
As you have seen in homework assignment 2, the kernel maintains the state for each process and records that state in the __state
field of the task_struct
of the process. The state indicates whether the process is runnable or running (TASK_RUNNING
), sleeping (TASK_INTERRUPTIBLE
, TASK_UNINTERRUPTIBLE
), stopped (__TASK_STOPPED
), dead (TASK_DEAD
), etc. When a process is dead, the exit_state
field of the task_struct of the process indicates whether the process is zombied (EXIT_ZOMBIE
) or really dead (EXIT_DEAD
).
In this homework, you will be tracing the state changes of processes, and recording them in a ring buffer. A ring buffer is a circular buffer that has no real end. It can be represented using a fixed size array that keeps wrapping around. After writing to the last array location, new entries will start writing at the beginning of the array again, overwriting any records that were previously there. You will write a system call to copy the contents of the ring buffer to user space so that a user program can dump the contents of the buffer. In addition, you will need to write a synchronization mechanism to copy the contents of the ring buffer to user space at the occurrence of some future event.
Your changes should include the following (note that all paths are relative to the root of your kernel source tree):
pstrace.c
in the kernel/
directory, i.e. kernel/pstrace.c
.pstrace.h
in the kernel uapi include directory, i.e. include/uapi/linux/pstrace.h
.pstrace.h
in the kernel include directory, i.e. include/linux/pstrace.h
.You should not change any existing kernel data structures. Almost all your kernel changes should be in pstrace.c
. pstrace.h
should only contain the necessary data structure and macro definitions. The only changes to existing files in the kernel source code should be those required for the system call table modifications and calls to record process state changes in various kernel code paths. You will also need to change the kernel Makefiles
.
STEP 1. Set up system calls for enabling and disabling tracing
In this part, we will set up the relevant system calls to enable/disable the tracing of a process’s state change. We only want to record the state changes of processes that have had tracing enabled by these system calls. The interface for these functions are:
/*
* Syscall No. 462
* Enable the tracing for @pid. If -1 is given, trace all processes.
*/
long pstrace_enable(pid_t pid);
/*
* Syscall No. 463
* Disable tracing.
*/
long pstrace_disable();
In other words, your tracing system will either trace a single process or trace all processes. You should use a suitable data structure to keep track of what processes are being traced. Note that this system call simply enables/disables the tracing - the actual writing to the ring buffer will happen in a subsequent function.
Additional requirements
task_struct
(for the entire assignment).pstrace_enable()
will replace the set of processes being traced with what is specified in the newer pstrace_enable()
call.Tasks
pstrace_enable()
and pstrace_disable()
.STEP 2. Set up the ring buffer
Once you can successfully indicate which processes should be traced, you should set up the structure of the ring buffer. Here are some data structures that are relevant for this part of the assignment:
#define PSTRACE_BUF_SIZE 500 /* The maximum size of the ring buffer */
/* The data structure representing the global ring buffer */
struct pstrace_kernel {
struct pstrace entry[PSTRACE_BUF_SIZE]; /* State changes */
long counter; /* Number of records */
/* Include other fields that you may need */
};
/* The data structure used to record a state change */
struct pstrace {
char comm[16]; /* Name of the process */
long state; /* NEW state of the process */
pid_t pid; /* PID of the process; i.e. returned by getpid() */
pid_t tid; /* TID of the thread; i.e. returned by gettid() */
};
Note that we want our entry to record the NEW state of the process. Note also that these data structures may need to be spread across multiple header files, depending on whether they’re part of the user space API.
You are free to modify the pstrace_kernel
structure, and you will indeed need to add additional fields to the structure. Note that in addition to the array for the ring buffer itself, pstrace_kernel
has a counter, which should be initialized to zero and be incremented for each record that is added to the ring buffer. It is a count of how many trace records have ever been added to the ring buffer and should never be reset.
Tasks
STEP 3. Recording one type of state change
For now, we want to focus on tracing only one type of state change: when a process that is running becomes blocked but interruptible. This will result in the process state changing from TASK_RUNNING
to TASK_INTERRUPTIBLE
. When a process blocks in this manner, you should call pstrace_add()
to create a new entry that records the process’s updated state TASK_INTERRUPTIBLE
in the state
field of a struct ptrace
, and add this new entry to the global ring buffer. The interface of the function is:
/* Add a record of a state change into the ring buffer. */
void pstrace_add(struct task_struct *p, long stateoption);
pstrace_add()
should only record the state change if the given process has tracing enabled. The stateoption
argument is an additional argument passed to pstrace_add()
that you can use as you see fit. For example, you could use the argument to pass in the state that you want recorded or any other information that would be helpful, but you do not have to use the argument for this part of the assignment. We will use this argument in later parts.
Since the ring buffer is shared by all CPUs, multiple state changes may occur simultaneously. You will need to define a lock to protect the ring buffer from race conditions, and use this lock in pstrace_add()
.
Your job is to record actual changes in process state, not necessarily whenever the __state
field is changed. For example, if the Linux code changes __state
from TASK_RUNNING
to TASK_INTERRUPTIBLE
to TASK_RUNNING
all without actually running another process, the process’s state did not really change from TASK_RUNNING
. A key part of this assignment is figuring out where a process’s state actually changes and recording those events. You should carefully consider the discussion in class regarding the lifecycle of a process in Linux, including when a process actually starts and stops running.
Hints
You may want to look at the following file when thinking about where to add (a) call(s) to pstrace_add()
:
kernel/sched/core.c
Tasks
pstrace_add()
to track when a process’s state changes from TASK_RUNNING
to TASK_INTERRUPTIBLE
.STEP 4. Return a copy of ring buffer to user space
At this point, we’ve allowed users to enable/disable the tracing of processes, and have recorded state changes from TASK_RUNNING
to TASK_INTERRUPTIBLE
into a buffer located in kernel space. Now, we want to be able to access these records in user space. Write a system call that can copy the records in the ring buffer to user space. The interface of the system call is:
/*
* Syscall No. 464
*
* Copy the pstrace ring buffer into @buf.
* If @counter points to 0, copy the pstrace ring buffer.
* If @counter points to a value < 0, return an error.
*
* Returns the number of records copied.
*/
long pstrace_get(struct pstrace *buf, long *counter);
If *counter == 0
, copy the current state of the pstrace global ring buffer into @buf
and set *counter
to the value of the buffer counter corresponding to the last record copied. If *counter < 0
, it is invalid and so return an error. Later in the assignment, you will make use of the counter
argument more extensively, but for now, just have your system call handle these two cases.
In addition, you should have another system call that clears the ring buffer.
/*
* Syscall No. 465
*
* Clear the pstrace buffer. Cleared records should
* never be returned by `pstrace_get()`. This function
* should NOT reset the value of the buffer counter.
*/
long pstrace_clear();
Cleared records should no longer be able to be returned by future pstrace_get()
calls. Do not reset the ring buffer counter value.
Tasks
pstrace_get()
and pstrace_clear()
.pstrace
implementation. Make sure that you can record this one type of state change and return the buffer results to user space immediately when pstrace_get()
is called, before moving on to the next sections.Extend the usage of pstrace_add()
to now record changes in processes’ state throughout the kernel. Specifically, changes to the following seven states should be traced and recorded in the kernel ring buffer:
TASK_RUNNING
TASK_RUNNABLE
TASK_INTERRUPTIBLE
TASK_UNINTERRUPTIBLE
__TASK_STOPPED
EXIT_ZOMBIE
EXIT_DEAD
Note that you should represent a task that is on the run queue as TASK_RUNNABLE
while a task that is actually running on the CPU should be TASK_RUNNING
. However, Linux does not have an actual state TASK_RUNNABLE
. It denotes both the state of being on the run queue and actually running as TASK_RUNNING
. To get around this, you should introduce a TASK_RUNNABLE
state for tracing purposes only (i.e. TASK_RUNNABLE
is not stored in the actual __state
field of the task_struct
). You should define TASK_RUNNABLE
to have a value of 3. (See here for more details.) Note that since TASK_RUNNABLE
is not an actual value stored in the __state
field of the task_struct
, you will find it useful to use the stateoption
argument of pstrace_add()
to record when a process’s state changes to TASK_RUNNABLE
.
Again, your job is to record actual changes in process state, not necessarily whenever the __state
field is changed. You should identify where a process’s state actually changes and record those events as opposed to trying to insert pstrace_add()
calls whenever you find the __state
field is being modified. You should minimize the number of pstrace_add()
calls required to trace the seven states.
You will also need to carefully consider the code paths in which you insert pstrace_add()
since the function does locking. Otherwise, you may deadlock your system. In particular, you should ask yourself whether the code path in which you insert pstrace_add()
could be executed as a result of an interrupt and what implications that may have on the specific locking primitives you use.
Hints
You may want to look at the following files when thinking about where to add (a) call(s) to pstrace_add()
:
kernel/exit.c
kernel/sched/core.c
Tasks
pstrace_add()
to now track all specified process state changes.You will now update the implementation of your pstrace_get()
system call to support waiting for the ring buffer counter to reach some value before copying the contents of the ring buffer to user space. The interface to pstrace_get()
remains the same, but the system call should now provide new functionality for the case when the counter argument is positive:
/*
* Syscall No. 464
*
* Copy the pstrace ring buffer into @buf.
* If @counter points to 0, copy the immediate pstrace ring buffer.
* If @counter points to a value < 0, return an error.
* If @counter points to a value > 0, the caller process will wait until
* a full buffer can be returned starting from record *counter
* (i.e. copy records numbering between *counter and
* *counter + PSTRACE_BUF_SIZE - 1 from the pstrace ring buffer to @buf).
*
* Returns the number of records copied.
*/
long pstrace_get(struct pstrace *buf, long *counter);
Specifically, a positive value of *counter
indicates a request for a full buffer starting at ring buffer counter *counter
. That is, your system call should copy records into @buf
in chronological order such that the first record is the record corresponding to ring buffer counter *counter
, and the last record is the record corresponding to ring buffer counter *counter + PSTRACE_BUF_SIZE - 1
.
If the ring buffer counter has not yet reached record *counter + PSTRACE_BUF_SIZE - 1
(the last record requested), your system call should sleep until this happens and the full buffer can be returned. When you return, update *counter
to the value of the ring buffer counter corresponding to the last record that was requested.
For example, if
pstrace_get()
is called with*counter = 1000
, it should not return until the ring buffer counter has reached 1499. When it returns, it should return the relevant corresponding buffer records from buffer counter 1000 to 1499, with*counter
updated to 1499.
If the ring buffer counter has already exceeded record *counter + PSTRACE_BUF_SIZE - 1
(the last record requested), you will only be able to partially fill the buffer. In particular, only copy relevant records within the requested range. You should NOT copy in any other records to fill the buffer.
For example, if
pstrace_get()
is called with*counter = 800
, then the last record that is copied over should be the record corresponding to the counter 1299. However, if the ring buffer counter is already at 1399, then it means that it only contains records corresponding to counters 900 to 1399. Therefore, you should only return 400 buffer records (corresponding to counters 900 to 1299) instead of a full buffer, since those are the only relevant records within the requested range that are available.
If all the records within the requested range have already been evicted from the ring buffer, that is if *counter + PSTRACE_BUF_SIZE - 1
(the last requested record) is less than the earliest record left in the ring buffer, set *counter
to be the current value of the ring buffer counter and return immediately. Do not copy any records from the ring buffer in this case.
For example, if
pstrace_get()
is called with*counter = 100
when the ring buffer counter is already at 1400, no records can be copied and*counter
is updated to 1400.
Hints
TASK_INTERRUPTIBLE
. You may NOT let the system call spin when the counter condition has not been met.pstrace_get()
to provide the expected functionality. In particular, there may be an arbitrary amount of time that elapses between when a process is woken up and when the process is actually run to complete the system call and return the records to the calling process. Be sure to return the correct records requested when possible.pstrace_get()
call, to ensure that you do not deadlock your system. How does that process wake up? What trace records are generated as a result of waking up that process? What locks are held when you are recording trace records?Additional requirements
pstrace_get()
should be woken up and return an appropriate error.pstrace_clear()
should also wake up all processes waiting on pstrace_get()
. The processes that are woken up should copy any relevant records currently in the buffer and return as opposed to waiting for their respective buffer full conditions to be met. *counter
should still remain as the value of the last record that was originally requested.Tasks
pstrace_get()
to support positive values of *counter
.pstrace_clear()
to wake up any waiting processes.Write a program named test
that calls the the pstrace_get()
function repeatedly to return the records in the buffer over time. Show how you can use the counter value so that successive calls to pstrace_get()
return a chronological ordering of all records. The program should be in the user/part4/
directory of your team repo.
For testing purposes, you should also write another program named seven_states
that changes its states between running and sleeping a certain number of times and then exits. Use your test
program to trace the process of this second program. This should be done by modifying test
so that it forks a child process that execs the second program. In other words, we should be able to see the results of your second program by simply doing ./test
, without needing to start your seven_states
program manually in another terminal.
Your Makefile
must have a distinct rule to build test
, and a distinct rule to build seven_states
. In other words, we should be able to call make test
and make seven_states
separately to compile each of the executables. Calling make
without any specified target should compile both executables. In other words, we should be able to see all your results by simply doing make && ./test
.
You should be able to observe how the second program turns from running to sleeping and finally, to zombie, and exits. Your testing should generate at least one record for each of the seven distinct process states we have asked you to record, and you should include the resulting output in your submission in a file user/part4/pstrace_output.txt
.
As was the case in HW2, you will have to run
sudo make headers_install INSTALL_HDR_PATH=/usr
in the root of your kernel source tree to make your new kernel header files available to user space programs.
Write answers to the following questions in the user/part5.txt
text file, following the provided template exactly. Make sure to include any references you use in your references.txt file
.
For question 1, make sure that you are referencing the correct kernel version in bootlin (v6.8). For reference, the URLs you answer with should be in the following format: https://elixir.bootlin.com/linux/v6.8/source/kernel/sched/core.c#L1234
IMPORTANT: For each of the URLs you give, make sure the line numbers correspond to the line numbers on bootlin. In other words, they should be the line numbers BEFORE you made any changes to the kernel source code.
For questions, 2, 3, and 4, give your answer in the format [STATE] -> [STATE]
, replacing STATE
with the relevant state. For example: [TASK_RUNNING] -> [TASK_INTERRUPTIBLE]
. Do NOT remove the brackets around the state names.
Give bootlin URLs showing the file and line number at which you inserted a pstrace_add()
call to trace the state change to:
TASK_RUNNING
TASK_RUNNABLE
TASK_INTERRUPTIBLE
TASK_UNINTERRUPTIBLE
__TASK_STOPPED
EXIT_ZOMBIE
EXIT_DEAD
Which state change(s) can be directly caused by interrupts (i.e. an interrupt handler will change the __state
field)?
Which state change(s) can only be caused by the running process itself (i.e. current
will change its own __state
field)?
Which state change(s) always involves or results in the current running process calling schedule()
?
There are 3 tasks with respective pid values 1150, 1152, and 1154. They are run in the following order on a CPU: pid 1150, 1152, 1154, 1152, 1150, 1154. In __schedule()
, there are two variables, prev
and next
. Each of these variables is a pointer to a struct task_struct
.
prev
and next
? For any prev
or next
that cannot be known from the information provided, indicate your answer as UNKNOWN
.prev
and next
? For any prev
or next
that cannot be known from the information provided, indicate your answer as UNKNOWN
.Include the following in your main
branch. Only include source code (ie *.c
, *.h
) and text files. Do not include compiled objects.
.armpls
file for teams with Arm CPUsREADME
filereferences.txt
filelinux/kernel/pstrace.c
linux/include/uapi/linux/pstrace.h
and linux/include/linux/pstrace.h
test.c
in user/part4/
test.c
in user/part4/pstrace_output.txt
user/part5.txt