DUE: Wednesday 11/6/2024 at 11:59pm ET
All homework submissions are to be made via Git. You must submit a detailed list of references as part of your homework submission indicating clearly what sources you referenced for each homework problem. You do not need to cite the course textbooks and instructional staff. All other sources must be cited. Please edit and include this file in the top-level directory of your homework submission in the main branch of your team repo. Be aware that commits pushed after the deadline will not be considered. Refer to the homework policy section on the class website for further details.
Group programming problems are to be done in your assigned groups. We will let you know when the Git repository for your group has been set up on GitHub. It can be cloned using the following command, replacing teamN with your team number, e.g. team0. You can find your group number here.
$ git clone git@github.com:W4118/f24-hmwk4-teamN.git
IMPORTANT: You should clone the repository directly in the terminal of your VM, instead of cloning it on your local machine and then copying it into your VM. The filesystem in your VM is case-sensitive, but your local machine might use a case-insensitive filesystem (for instance, this is the default setting on Macs). Cloning the repository to a case-insensitive filesystem might end up clobbering some kernel source code files. See this post for some examples.
This repository will be accessible to all members of your team, and all team members are expected to make local commits and push changes or contributions to GitHub equally. You should become familiar with team-based shared repository Git commands such as git-pull, git-merge, git-fetch. For more information, see this guide.
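For example, a typical flow for picking up teammates' changes before pushing your own might look like the following (run from inside the repo; branch names are illustrative):

$ git fetch origin          # download new commits from GitHub without changing your files
$ git merge origin/main     # merge the fetched main branch into your current branch
$ git pull                  # or do both steps at once
$ git push                  # publish your local commits to the shared repository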
There should be at least five commits per member in the team’s Git repository. The point is to make incremental changes and use an iterative development cycle. Follow the Linux kernel coding style. You must check your commits with the run_checkpatch.sh script provided as part of your team repository. Errors from the script in your submission will cause a deduction of points. (Note that the script only checks the changes up to your latest commit. Changes in the working tree or staging area will not be checked.)
For students on Arm computers (e.g. Macs with M1/M2/M3 CPUs): if you want your submission to be built/tested for Arm, you must create and submit a file called .armpls in the top-level directory of your repo; feel free to use the following one-liner:
$ cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add .armpls && git commit -m "Arm pls"
You should do this first so that this file is present in any code you submit for grading.
For all programming problems, you should submit your source code as well as a single README file documenting your files and code for each part. Please do NOT submit kernel images. The README should explain any way in which your solution differs from what was assigned, and any assumptions you made. You are welcome to include a test run in your README showing how your system call works. It should also state explicitly how each group member contributed to the submission and how much time each member spent on the homework. The README should be placed in the top level directory of the main branch of your team repo (on the same level as the linux/ and user/ directories).
Having completed the first half of your operating systems training, you and your team are now well-versed OS developers, and have been hired by W4118 Inc. to work on their next-generation operating systems. In this next-gen OS, W4118 Inc. has tasked your team to improve the Linux kernel’s scheduling capabilities and optimize for workloads typical of W4118 Inc.’s customers.
Data privacy is important in modern computing, so instead of real customer programs, W4118 Inc. has provided you with a way to simulate the typical workloads submitted by end users using a Fibonacci calculator. The following links provide two task set workloads, each consisting of a list of jobs, with each line representing the N-th Fibonacci number to compute:
Note: The Fibonacci calculator has been provided as user/fibonacci.c. It uses an inefficient algorithm by design to produce differing job lengths. You can modify the file, but do not modify the fib function.
Given these two task sets, W4118 Inc. is interested in a scheduler (let’s call it “Oven”) that can optimize for the completion time of the tasks. Specifically, the scheduler should provide the minimum average completion time for each task set across all tasks in the respective task set.
To see how well a scheduler performs, it is necessary to first have a way to measure performance, specifically completion time of tasks. W4118 Inc. is looking for a way to trace scheduling events and determine how well a scheduler functions by measuring tasks’ run queue latency and total duration from start to finish.
Extended Berkeley Packet Filter (eBPF) is a powerful feature of the Linux kernel that allows programs to inject code into the kernel at runtime to gather metrics and observe kernel state. You will use eBPF to trace scheduler events. Tracepoints for these scheduler events are already available in the scheduler (for example, you can search in core.c for trace_sched_), so your job is to write code that will be injected into the kernel to use them. You should not need to modify any kernel source code files for this first part of the assignment.
bpftrace is a Linux tool that allows using eBPF with a high level, script-like language. Install it on your VM with:
sudo apt install bpftrace
Instead of searching through source code, you can run bpftrace -l to see which tracepoints are available in the kernel.
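For example, to list just the scheduler tracepoints (the exact set depends on your kernel version):

$ sudo bpftrace -l 'tracepoint:sched:*'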
Write a bpftrace script named trace.bt to trace how much time a task spends on the run queue, and how much time elapses until the task completes. Your profiler should run until stopped with Ctrl-C. While tracing, your script should, at tasks’ exits, print out a table of the task name, task PID, milliseconds spent on the run queue (not including time spent running), and milliseconds until task completion. Note that task PID here is the actual pid field in a task_struct, not what is returned by getpid. The output should be comma-delimited and follow this format:
COMM,PID,RUNQ_MS,TOTAL_MS
myprogram1,5000,2,452
myprogram2,5001,0,15
...
Ensure that the output of your eBPF script is synchronous. That is, it should print each line immediately after a task completes, and not when Ctrl-C is sent to the eBPF script. You should also only print traces from processes that have started after your script begins running. You should trace all such processes and perform no additional filtering.
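To get a feel for the mechanics, here is a minimal sketch (not a complete solution): it only measures wall-clock milliseconds from exec to exit for processes started after the script, leaving the RUNQ_MS column and the exact output described above to your trace.bt. Note that it keys its map on tid, which corresponds to the pid field in task_struct, rather than the pid builtin (which is the thread group ID):

$ sudo bpftrace -e '
tracepoint:sched:sched_process_exec { @start[tid] = nsecs; }   // remember when each new task started
tracepoint:sched:sched_process_exit /@start[tid]/ {            // only tasks first seen after the script began
    printf("%s,%d,%d\n", comm, tid, (nsecs - @start[tid]) / 1000000);
    delete(@start[tid]);
}'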
Test your script by running sudo bpftrace trace.bt in one terminal. In a separate terminal, run a few commands and observe the trace metrics in your first terminal. Experiment with different task sizes. Submit your eBPF script as user/trace.bt.
Now that you have an eBPF measurement tool, use it to measure the completion times for the two task set workloads when running on the default CFS scheduler in Linux. What is the resulting average completion time for each workload? Write the average completion times of both workloads in your README. You may find this shell script helpful for running your tasks. You should perform your measurements for two different VM configurations, a single CPU VM and a four CPU VM. We are using the term CPU here to mean a core, so if your hardware has a single CPU but four cores, that should be sufficient for running a four CPU VM. If your hardware does not support four cores, you may instead run with a two CPU VM. Specify in your README the number of CPUs used with your VM for your measurements.
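If you redirect your trace output to a file, the average is a one-liner over the TOTAL_MS column; for instance, assuming the comma-delimited trace (including its header row) was saved to taskset1.txt (file name illustrative):

$ awk -F, 'NR > 1 { sum += $4; n++ } END { if (n) printf "average TOTAL_MS: %.1f\n", sum / n }' taskset1.txt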
Hint: You may find it useful to reference the bpftrace GitHub project, which contains a manual and a list of sample scripts. A useful example script to start with is runqlat.bt.
Note: Since your profiler will run on the same machine as the test workloads, you will need to ensure that the profiler gets sufficient CPU time to record data. What could you do to the profiler’s scheduling class and priority to ensure it can run over the test workloads? You may find chrt helpful.
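For example, one way to keep the profiler from being starved by the workload is to start it under a real-time policy (the SCHED_FIFO priority shown here is arbitrary):

$ sudo chrt -f 99 bpftrace trace.bt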
W4118 Inc. has tasked you with creating a new scheduling policy that provides better average completion time than the default Linux scheduling policy. You should begin by implementing a new scheduler policy; call this policy OVEN. For this stage, each task scheduled by OVEN has a weight, which is assigned through the sched_priority field of struct sched_param.

Some notes that may be helpful in your implementation: validate sched_priority such that any weight assignment less than the minimum valid weight is instead forced to be MAX_WEIGHT. The reason for this is that there are system programs that use sched_setscheduler with lower values for sched_priority. If they are scheduled using your scheduling class, you want to detect their invalid weight assignments and schedule them using your round-robin algorithm.

Recall your average completion time measurements from part 1, which used the CFS scheduler. Can you do better?
You should consider how you might modify the fibonacci program to set its OVEN weight. How will jobs of different weights be scheduled in relation to each other? What weights and scheduling algorithm will optimize for average completion time? Make any changes to fibonacci and your scheduler, then submit eBPF traces of the two task sets running on your scheduler. You should use the same two VM CPU configurations you used earlier to measure performance using CFS. Submit your eBPF traces for the two task sets and CPU configurations as user/taskset1_average.txt, user/taskset1_average_smp.txt, user/taskset2_average.txt, and user/taskset2_average_smp.txt. Write the average completion times for each workload in your README and compare against CFS.
Hint: Configuring the scheduling class and priority within the Fibonacci program will be too late (why is this the case?). You may want to write a small program to set the scheduler and weight before calling exec on Fibonacci.
Note: When launching jobs from the task list, start tasks in the order they are listed, but do not wait for tasks to finish before launching the next one. Also, for all trace submissions in this section, filter out any process that is not fibonacci.
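As an illustration of that launch pattern, a minimal launcher might look like the sketch below. It assumes the fibonacci binary takes N as its only command-line argument and that the task list file is passed as the script's argument; both are assumptions, so adjust them to your setup (and to whatever wrapper you use to set the OVEN weight).

#!/bin/sh
# run_taskset.sh (hypothetical name): launch every job in the given task set,
# in listed order, without waiting for earlier jobs to finish.
while read -r n; do
    ./fibonacci "$n" &    # start the next job in the background immediately
done < "$1"
wait                      # only wait at the end, so the script exits after the last job

$ ./run_taskset.sh taskset1.txt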
So far, your scheduler will run a task only on the CPU that it was originally assigned. Let’s change that now! For this part you will be implementing idle balancing, which is when a CPU will attempt to steal a task from another CPU when it doesn’t have any tasks to run (i.e. when its runqueue is empty).
Load balancing is a key feature of the default Linux scheduler (Completely Fair Scheduler). While CFS follows a rather complicated policy to balance tasks and works to move more than one task in a single go, your scheduler will have a simple policy in this regard.
Idle balancing works as follows:
Once you add idle load balancing, repeat the performance tests you conducted in the previous part and make notes on any observations. Again, be sure to include actual data. Submit the traces as user/taskset1_smp_balance.txt and user/taskset2_smp_balance.txt, and note any differences from your previous scheduler in your README. You only need to submit results for the same multi-CPU VM configuration you used previously; single-CPU VM results are not required for this part. Write the updated average completion times for each workload in your README and compare against CFS.
Thus far you have focused on optimizing the average completion time across all jobs. Another important metric to consider is tail completion time, which is the maximum completion time across 99% of all jobs (i.e. the 99th-percentile completion time). Tail completion time focuses on ensuring that the completion time of most jobs is no worse than some amount. Using tail completion time as the performance metric of interest, compare your optimized scheduling class against the default Linux scheduler for the W4118 Inc. workloads. Which one does better? If the default Linux scheduler has better tail completion time, can you change how you use your OVEN scheduler (without modifying the OVEN scheduler code) to provide tail completion time comparable to the default Linux scheduler?
Make any changes to fibonacci, then submit eBPF traces of the two task sets running on your scheduler as user/taskset1_tail.txt and user/taskset2_tail.txt. You only need to submit results for the same multi-CPU VM configuration you used previously; single-CPU VM results are not required for this part. Write the tail completion times for each workload in your README and compare against CFS.
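One way to extract the tail value from a trace file is to sort the TOTAL_MS column and read off the entry 99% of the way through, e.g. (file name illustrative, header row skipped):

$ awk -F, 'NR > 1 { print $4 }' taskset1_tail.txt | sort -n | awk '{ v[NR] = $1 } END { print v[int(NR * 0.99)] }'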
Write answers to the following questions in the user/part6.txt text file, following the provided template exactly. Make sure to include any references you use in your references.txt file.
The following are some debugging tips you may find useful as you create your new scheduling class.
You may find it helpful to use ps, top, or htop to view information such as the CPU usage and scheduling policies of your tasks. These commands rely on execution time statistics reported by certain scheduler functions in the kernel. As a result, bugs in your scheduling class could cause these commands to report inaccurate information. This cuts two ways. First, it is possible that your scheduler is working properly in terms of selecting the right tasks to execute for the right amount of time, but your calculation of execution time statistics in the kernel is wrong, so these commands appear to report that your tasks are not running at all when in fact they are. Second, it is possible that your scheduler is not working properly such that these tools report that tasks are using your scheduling class when in fact they are not.
As a result, you should not exclusively rely on these tools to determine if your scheduling class is working properly. For example, when you make your scheduling class the default scheduling class, the fact that user-level tools claim that all tasks are using your scheduling class may not necessarily mean that this is the case. Instead, to ensure that your scheduling class functions are actually being used, you might add a printk to a function like your class’s enqueue_task() and verify that it appears in dmesg output. Make sure that you do not submit your code with such debugging printk statements as they can cause issues if invoked too frequently.
Include the following in your main branch. Only include source code (i.e. *.c, *.h) and text files; do not include compiled objects.