7. Scheduling
Chapter 6 is cooperative: a process runs until it yields or exits. A CPU-bound loop without ecall holds the CPU indefinitely. This chapter fixes that with a timer quantum that forcibly preempts after 10 ms.
It also adds SYS_SLEEP and brings secondary harts online so multiple processes run in parallel.
spawned pid 0spawned pid 1hart 1 onlineA1B1A2pid 0: exitedB2pid 1: exitedall processes exitedThe demo programs now use sleep instead of yield. A sleeps 100 ms between prints, B sleeps 200 ms. “hart 1 online” may appear at a different position depending on timing.
What Changes
Section titled “What Changes”| File | Status |
|---|---|
.cargo/config.toml | change -smp 1 to -smp 2 |
src/sched/timer.rs | new — rdtime, TICKS_PER_MS, handle_interrupt |
src/sched/mod.rs | add quantum timer, sleep handling, preemption |
src/sched/process.rs | add Sleeping state, sleep_until, wake_sleepers |
src/sched/syscall.rs | add SYS_SLEEP |
src/utils/sbi.rs | add HSM extension, hart_start |
src/boot.rs | store hart IDs, start secondary harts |
src/trap/mod.rs | timer dispatch to sched::timer |
src/main.rs | call start_secondary_harts |
src/demo.rs | update programs to use sleep |
boot.S | add _secondary_entry |
Enable a Second Hart
Section titled “Enable a Second Hart”In .cargo/config.toml, change -smp 1 to -smp 2:
[build]target = "riscv64gc-unknown-none-elf"
[target.riscv64gc-unknown-none-elf]runner = "qemu-system-riscv64 -machine virt -m 256M -smp 2 -nographic -serial mon:stdio -bios default -kernel"rustflags = [ "-C", "link-arg=-Tlink.ld",]The Timer
Section titled “The Timer”Create src/sched/timer.rs:
pub const TICKS_PER_MS: u64 = 10_000; // 10 MHz QEMU virt timer
pub fn rdtime() -> u64 { let val: u64; unsafe { core::arch::asm!("rdtime {}", out(reg) val) }; val}
/// Clear the pending timer interrupt.pub fn handle_interrupt() { crate::utils::sbi::set_timer(u64::MAX);}QEMU’s virt machine has a 10 MHz timer. rdtime reads the current tick count. handle_interrupt disarms the timer by setting it to u64::MAX.
Add the Sleeping State
Section titled “Add the Sleeping State”Update src/sched/process.rs. The state machine gains a Sleeping variant:
use crate::memory::{PAGE_SIZE, alloc_frame, alloc_zeroed_frame};use crate::paging::*;use crate::trap::TrapFrame;use alloc::vec::Vec;use spin::Mutex;
const CODE_VA: usize = 0x10000;const USER_STACK_BASE: usize = 0x0100_0000;const USER_STACK_PAGES: usize = 4;
enum State { Ready, Running(usize), Sleeping(u64), Exited,}
struct Process { state: State, tf: *mut TrapFrame, satp: usize, tf_va: usize,}
unsafe impl Send for Process {}
#[derive(Clone, Copy)]pub(crate) struct Runnable { pub pid: usize, pub tf: *mut TrapFrame, pub satp: usize, pub tf_va: usize,}
pub(crate) enum Next { Runnable(Runnable), SleepUntil(u64), Done, Empty,}
static TABLE: Mutex<ProcessTable> = Mutex::new(ProcessTable::new());
pub(crate) fn spawn_with_args(code: &[u8], _args: [usize; 4]) -> usize { assert!(code.len() <= PAGE_SIZE, "user program too large");
// Code page is zeroed: any bytes past `code.len()` decode to a // deterministic instruction (illegal-instruction trap on RV64) instead // of stale frame contents from a previous owner. let code_pa = alloc_zeroed_frame().expect("oom"); let tf_pa = alloc_frame().expect("oom");
unsafe { core::ptr::copy_nonoverlapping( code.as_ptr(), code_pa as *mut u8, code.len(), ); }
let tf_va: usize = TRAMPOLINE - PAGE_SIZE;
let pt = PageTable::alloc(); pt.map_at(CODE_VA, code_pa, PTE_U | PTE_R | PTE_X, 0); let stack_top = crate::memory::map_stack( pt, USER_STACK_BASE, 0, USER_STACK_PAGES, PTE_U, ); pt.map_trampoline(); pt.map_at(tf_va, tf_pa, PTE_R | PTE_W, 0);
let tf_ptr = tf_pa as *mut TrapFrame; let tf = TrapFrame::new(CODE_VA, stack_top); unsafe { tf_ptr.write(tf) };
let pid = TABLE.lock().add(tf_ptr, pt.satp(), tf_va); println!("spawned pid {}", pid); pid}
pub(crate) fn pick_next(now: u64, hartid: usize) -> Next { let mut table = TABLE.lock(); table.wake_sleepers(now); table .pick_next(hartid) .map(Next::Runnable) .or_else(|| table.next_sleep_deadline().map(Next::SleepUntil)) .or_else(|| table.all_exited().then_some(Next::Done)) .unwrap_or(Next::Empty)}
pub(crate) fn ready(pid: usize) { TABLE.lock().ready(pid);}
pub(crate) fn sleep_until(pid: usize, deadline: u64) { TABLE.lock().sleep_until(pid, deadline);}
pub(crate) fn exit(pid: usize) { TABLE.lock().exit(pid);}
impl Process { fn new(tf: *mut TrapFrame, satp: usize, tf_va: usize) -> Self { Self { state: State::Ready, tf, satp, tf_va, } }
fn to_runnable(&self, pid: usize) -> Runnable { Runnable { pid, tf: self.tf, satp: self.satp, tf_va: self.tf_va, } }
fn sleeping_deadline(&self) -> Option<u64> { match self.state { State::Sleeping(deadline) => Some(deadline), _ => None, } }
fn wake_if_due(&mut self, now: u64) { if self .sleeping_deadline() .is_some_and(|deadline| now >= deadline) { self.state = State::Ready; } }}
struct ProcessTable { processes: Vec<Process>, cursor: usize,}
impl ProcessTable { const fn new() -> Self { Self { processes: Vec::new(), cursor: 0, } }
fn add(&mut self, tf: *mut TrapFrame, satp: usize, tf_va: usize) -> usize { let pid = self.processes.len(); self.processes.push(Process::new(tf, satp, tf_va)); pid }
fn wake_sleepers(&mut self, now: u64) { for proc in &mut self.processes { proc.wake_if_due(now); } }
fn pick_next(&mut self, hartid: usize) -> Option<Runnable> { let len = self.processes.len(); (0..len).find_map(|_| { let idx = self.cursor % len; self.cursor = (idx + 1) % len; let proc = &mut self.processes[idx]; if matches!(proc.state, State::Ready) { proc.state = State::Running(hartid); Some(proc.to_runnable(idx)) } else { None } }) }
fn next_sleep_deadline(&self) -> Option<u64> { self.processes .iter() .filter_map(Process::sleeping_deadline) .min() }
fn all_exited(&self) -> bool { !self.processes.is_empty() && self .processes .iter() .all(|proc| matches!(proc.state, State::Exited)) }
fn ready(&mut self, pid: usize) { self.processes[pid].state = State::Ready; }
fn sleep_until(&mut self, pid: usize, deadline: u64) { self.processes[pid].state = State::Sleeping(deadline); }
fn exit(&mut self, pid: usize) { self.processes[pid].state = State::Exited; }}Changes from Chapter 6:
Sleeping(u64)holds a deadline in timer ticks.wake_if_duetransitions toReadywhennow >= deadline.pick_nextnow takesnowand callswake_sleepersfirst.SleepUntil(u64)inNexttells the scheduler to sleep until the nearest deadline when no process is ready.next_sleep_deadlinescans for the earliest sleeping deadline.
Add SYS_SLEEP
Section titled “Add SYS_SLEEP”Update src/sched/syscall.rs:
use crate::trap::TrapFrame;use super::{process, timer};
const SYS_EXIT: usize = 1;const SYS_SLEEP: usize = 2;const SYS_YIELD: usize = 3;const SYS_PUTCHAR: usize = 100;
pub(super) fn handle(pid: usize, tf: &mut TrapFrame) { match tf.a7() { SYS_EXIT => { println!("pid {}: exited", pid); process::exit(pid); } SYS_SLEEP => { let ms = tf.a0() as u64; process::sleep_until( pid, timer::rdtime() + ms * timer::TICKS_PER_MS, ); } SYS_YIELD => { process::ready(pid); } SYS_PUTCHAR => { print!("{}", tf.a0() as u8 as char); process::ready(pid); } nr => { println!("pid {}: unknown syscall {nr}", pid); process::exit(pid); } }}SYS_SLEEP converts milliseconds to ticks and sets a deadline. The process is not runnable until the timer reaches that deadline. Each syscall sets the next state for pid (ready, sleeping, or exited); the scheduler decides which process to run next.
The Preemptive Scheduler
Section titled “The Preemptive Scheduler”Update src/sched/mod.rs:
pub mod process;mod syscall;pub mod timer;
use crate::trap::{self, TrapCause};use crate::utils::sbi;use process::{Next, Runnable};
const QUANTUM: u64 = 10 * timer::TICKS_PER_MS; // 10 ms
pub fn run(hartid: usize) -> ! { loop { match process::pick_next(timer::rdtime(), hartid) { Next::Runnable(proc) => { sbi::set_timer(timer::rdtime() + QUANTUM); let cause = enter_process(proc); handle_trap(proc, cause); } Next::SleepUntil(deadline) => sleep_until_timer(deadline), Next::Done => halt(hartid == crate::boot::boot_hartid()), Next::Empty => { sbi::set_timer(timer::rdtime() + QUANTUM); unsafe { core::arch::asm!("wfi") }; } } }}
fn enter_process(proc: Runnable) -> TrapCause { trap::enter_usermode(unsafe { &mut *proc.tf }, proc.satp, proc.tf_va)}
fn handle_trap(proc: Runnable, cause: TrapCause) { match cause { TrapCause::Syscall => syscall::handle(proc.pid, unsafe { &mut *proc.tf }), TrapCause::TimerInterrupt => { timer::handle_interrupt(); process::ready(proc.pid); } TrapCause::Exception { scause, stval, sepc, } => { println!( "pid {}: exception scause={scause:#x} stval={stval:#x} sepc={sepc:#x}", proc.pid ); process::exit(proc.pid); } }}
fn sleep_until_timer(deadline: u64) { disable_interrupts(); sbi::set_timer(deadline); unsafe { core::arch::asm!("wfi") }; enable_interrupts();}
fn halt(boot: bool) -> ! { sbi::set_timer(u64::MAX); if boot { println!("all processes exited"); } loop { disable_interrupts(); unsafe { core::arch::asm!("wfi") }; }}
fn disable_interrupts() { unsafe { core::arch::asm!("csrci sstatus, 0x2") };}
fn enable_interrupts() { unsafe { core::arch::asm!("csrsi sstatus, 0x2") };}Key additions over Chapter 6:
- Quantum timer:
sbi::set_timer(rdtime() + QUANTUM)before entering each process. If the process does not trap within 10 ms, the timer fires and the trampoline returnsTrapCause::TimerInterrupt. The process goes back toReadyand the scheduler picks the next one. SleepUntil: when no process is ready but some are sleeping, the hart programs the timer for the nearest deadline, executeswfi, and retries. Interrupts are disabled beforewfiso the timer handler does not consume the wakeup beforewfiactually sleeps.halt: only the boot hart prints “all processes exited.” All harts park with interrupts disabled.timer::handle_interrupt()replaces the inlineset_timer(u64::MAX).
Each iteration of the scheduler loop enters one process exactly once. When the process traps back, handle_trap decides its next state — ready, sleeping, or exited — and pick_next chooses what to run next. Putchar simply marks the process Ready again; if it is the only runnable one, pick_next will pick it on the next iteration, but multi-character prints from one process can interleave with another runnable process whenever a quantum expires.
Update the Kernel Trap Handler
Section titled “Update the Kernel Trap Handler”In src/trap/mod.rs, update handle_kernel_trap to dispatch through the timer module:
#[unsafe(no_mangle)]extern "C" fn handle_kernel_trap(sepc: usize, scause: usize, stval: usize) { if scause & INTERRUPT_BIT != 0 { match scause & !INTERRUPT_BIT { SUPERVISOR_TIMER_INTERRUPT => { crate::sched::timer::handle_interrupt() } cause => panic!( "unhandled kernel interrupt: cause={cause}, sepc={sepc:#x}" ), } } else { panic!( "kernel exception: scause={scause:#x}, stval={stval:#x}, sepc={sepc:#x}" ); }}Add SBI Hart Start
Section titled “Add SBI Hart Start”Update src/utils/sbi.rs — add the HSM (Hart State Management) extension:
#[repr(usize)]enum SbiExt { ConsolePutchar = 0x01, Hsm = 0x0048_534D, Timer = 0x5449_4D45,}
fn sbi_call(ext_id: SbiExt, fun_id: usize, args: [usize; 3]) -> (isize, usize) { let mut error: isize; let mut value: usize; unsafe { core::arch::asm!( "ecall", in("a7") ext_id as usize, in("a6") fun_id, inlateout("a0") args[0] => error, inlateout("a1") args[1] => value, in("a2") args[2], ) } (error, value)}
pub fn console_write(buf: &[u8]) { for &byte in buf { sbi_call(SbiExt::ConsolePutchar, 0, [byte as usize, 0, 0]); }}
pub fn set_timer(val: u64) { sbi_call(SbiExt::Timer, 0, [val as usize, 0, 0]);}
pub fn hart_start( hartid: usize, start_addr: usize, opaque: usize,) -> (isize, usize) { sbi_call(SbiExt::Hsm, 0, [hartid, start_addr, opaque])}hart_start asks OpenSBI to boot a secondary hart at start_addr with opaque passed in a1. The secondary hart starts in S-mode with paging disabled.
Start Secondary Harts
Section titled “Start Secondary Harts”Update src/boot.rs. The boot hart discovers all harts from the DTB, then starts each secondary hart with its own kernel stack and the kernel page table.
use crate::memory::alloc_frame;use crate::utils::symbols::*;use core::sync::atomic::{AtomicUsize, Ordering};use fdt::Fdt;
const MAX_HARTS: usize = 16;const INVALID_HARTID: usize = usize::MAX;const KSTACK_PAGES: usize = 4;const KSTACK_VA_BASE: usize = 0xFFFF_FFD0_0000_0000;
static BOOT_HARTID: AtomicUsize = AtomicUsize::new(0);static HART_COUNT: AtomicUsize = AtomicUsize::new(1);static HART_IDS: [AtomicUsize; MAX_HARTS] = [const { AtomicUsize::new(INVALID_HARTID) }; MAX_HARTS];
pub fn boot_hartid() -> usize { BOOT_HARTID.load(Ordering::Relaxed)}
pub fn init(hartid: usize, dtb_ptr: usize) -> usize { BOOT_HARTID.store(hartid, Ordering::Relaxed); println!("jikei: booted under OpenSBI"); print_kernel_info();
let fdt = unsafe { Fdt::from_ptr(dtb_ptr as *const u8) } .unwrap_or_else(|e| panic!("failed to parse DTB: {e}"));
let mut cpu_count = 0; for cpu in fdt.cpus() { if cpu_count == MAX_HARTS { panic!("too many harts in DTB"); } HART_IDS[cpu_count].store(cpu.ids().first(), Ordering::Release); cpu_count += 1; } HART_COUNT.store(cpu_count, Ordering::Release); println!("CPUs: {}", cpu_count);
crate::memory::init(kernel_end());
{ let mut mem = crate::memory::lock();
fdt.memory() .regions() .filter_map(|r| { r.size .filter(|&s| s > 0) .map(|s| (r.starting_address as usize, s)) }) .for_each(|(start, size)| { println!( "RAM: {:#x} - {:#x} ({} MB)", start, start + size, size >> 20 ); mem.add_region(start, start + size); });
let alloc_end = mem.end_ptr(); let dtb_end = dtb_ptr + fdt.total_size(); mem.init(&[(text_start(), alloc_end), (dtb_ptr, dtb_end)]);
println!( "Memory: {} free frames ({} MB), {} total", mem.free_count, (mem.free_count * 4096) >> 20, mem.num_frames, ); }
crate::memory::freeze_regions(); crate::paging::init(); crate::memory::heap::init(); alloc_kernel_stack(hart_index(hartid))}
// ── Secondary hart startup ──────────────────────────────────────
/// Layout must match boot.S: stack_top at offset 0, satp at offset 8.#[repr(C)]struct HartBootInfo { stack_top: usize, satp: usize,}
unsafe extern "C" { fn _secondary_entry();}
pub fn start_secondary_harts() { let count = HART_COUNT.load(Ordering::Acquire); let boot = boot_hartid(); let satp = crate::paging::kernel_satp();
for hi in 0..count { let hartid = HART_IDS[hi].load(Ordering::Acquire); if hartid == boot { continue; }
let stack_top = alloc_kernel_stack(hi);
let info_pa = alloc_frame().expect("oom for hart boot info"); let info = unsafe { &mut *(info_pa as *mut HartBootInfo) }; info.stack_top = stack_top; info.satp = satp;
let (err, _) = crate::utils::sbi::hart_start( hartid, _secondary_entry as *const () as usize, info_pa, ); if err != 0 { println!("failed to start hart {}: error {}", hartid, err); } }}
#[unsafe(no_mangle)]pub extern "C" fn secondary_hart_main(hartid: usize) -> ! { crate::trap::init_hart(); println!("hart {} online", hartid); crate::sched::run(hartid)}
// ── Helpers ─────────────────────────────────────────────────────
fn alloc_kernel_stack(hart_index: usize) -> usize { crate::paging::with_kernel_pt_mut(|pt| { crate::memory::map_stack(pt, KSTACK_VA_BASE, hart_index, KSTACK_PAGES, 0) })}
fn hart_index(hartid: usize) -> usize { let count = HART_COUNT.load(Ordering::Acquire); (0..count) .find(|&idx| HART_IDS[idx].load(Ordering::Acquire) == hartid) .expect("boot hart not found in DTB")}
fn print_kernel_info() { let kb = |a: usize, b: usize| (b - a) / 1024;
println!( "Kernel: {:#x} - {:#x} ({} KB)", text_start(), bss_end(), kb(text_start(), bss_end()), ); println!( " .text: {} KB, .rodata: {} KB, .data: {} KB, .bss: {} KB", kb(text_start(), text_end()), kb(rodata_start(), rodata_end()), kb(rodata_end(), data_end()), kb(data_end(), bss_end()), );}The boot hart discovers all hart IDs from the DTB and stores them in HART_IDS. start_secondary_harts iterates them, skipping the boot hart, and for each:
- Allocates a kernel stack at a unique slot in
KSTACK_VA_BASE. - Allocates a
HartBootInfostruct in a physical frame with the stack top and kernelsatp. - Calls
sbi::hart_startwith the entry address_secondary_entryand the info pointer.
The secondary hart boots into _secondary_entry, enables paging, and calls secondary_hart_main which sets up traps and enters the scheduler loop.
Secondary Hart Entry
Section titled “Secondary Hart Entry”Add to boot.S, after the _switch_to_stack section:
# secondary hart entry point (called by SBI HSM hart_start)# a0 = hartid, a1 = ptr to HartBootInfo.globl _secondary_entry_secondary_entry: mv tp, a0 # tp = hartid ld sp, 0(a1) # HartBootInfo.stack_top ld t0, 8(a1) # HartBootInfo.satp csrw satp, t0 # enable paging sfence.vma # flush TLB call secondary_hart_main # a0 still has hartid
1: wfi j 1bThe secondary hart arrives with paging disabled. It loads the kernel satp from HartBootInfo, enables paging, and calls into Rust. After that, it runs the same scheduler loop as the boot hart.
Update the Demo
Section titled “Update the Demo”Update src/demo.rs — replace yield with sleep:
use crate::sched::process;
pub fn spawn() { process::spawn_with_args(prog_bytes(_prog_a_start, _prog_a_end), [0; 4]); process::spawn_with_args(prog_bytes(_prog_b_start, _prog_b_end), [0; 4]);}
fn prog_bytes( start: unsafe extern "C" fn() -> u8, end: unsafe extern "C" fn() -> u8,) -> &'static [u8] { let s = start as usize; let len = end as usize - s; unsafe { core::slice::from_raw_parts(s as *const u8, len) }}
unsafe extern "C" { fn _prog_a_start() -> u8; fn _prog_a_end() -> u8; fn _prog_b_start() -> u8; fn _prog_b_end() -> u8;}
core::arch::global_asm!( r#" .pushsection .rodata.user_prog, "a"
.globl _prog_a_start _prog_a_start: li a7, 100 li a0, 'A' ecall li a7, 100 li a0, '1' ecall li a7, 100 li a0, '\n' ecall li a7, 2 li a0, 100 ecall li a7, 100 li a0, 'A' ecall li a7, 100 li a0, '2' ecall li a7, 100 li a0, '\n' ecall li a7, 1 ecall .globl _prog_a_end _prog_a_end:
.globl _prog_b_start _prog_b_start: li a7, 100 li a0, 'B' ecall li a7, 100 li a0, '1' ecall li a7, 100 li a0, '\n' ecall li a7, 2 li a0, 200 ecall li a7, 100 li a0, 'B' ecall li a7, 100 li a0, '2' ecall li a7, 100 li a0, '\n' ecall li a7, 1 ecall .globl _prog_b_end _prog_b_end:
.popsection "#);Program A: print “A1\n”, sleep 100 ms, print “A2\n”, exit. Program B: print “B1\n”, sleep 200 ms, print “B2\n”, exit.
Update main.rs
Section titled “Update main.rs”#![no_std]#![no_main]
extern crate alloc;
#[macro_use]mod utils;mod boot;mod demo;mod memory;mod paging;mod sched;mod trap;
core::arch::global_asm!(include_str!("../boot.S"));
#[unsafe(no_mangle)]pub extern "C" fn kernel_main(hartid: usize, dtb_ptr: usize) -> ! { let stack_top = boot::init(hartid, dtb_ptr); unsafe { _switch_to_stack(stack_top, kernel_main_on_stack, hartid) }}
unsafe extern "C" { fn _switch_to_stack( stack_top: usize, entry: extern "C" fn(usize) -> !, arg0: usize, ) -> !;}
extern "C" fn kernel_main_on_stack(hartid: usize) -> ! { trap::init_hart();
demo::spawn(); boot::start_secondary_harts(); sched::run(hartid)}The only change from Chapter 6: boot::start_secondary_harts() is called after spawning the demo processes. Secondary harts join the scheduler immediately.
Run It
Section titled “Run It”cargo run --releasejikei: booted under OpenSBIKernel: 0x80200000 - ... ...CPUs: 2RAM: 0x80000000 - 0x90000000 (256 MB)Memory: ... free frames (... MB), ... totalPaging enabledHeap: 0xffffffc000000000 (1024 KiB)spawned pid 0spawned pid 1hart 1 onlineA1B1A2pid 0: exitedB2pid 1: exitedall processes exitedWith two harts, both processes can run truly in parallel. The sequence:
- Both processes are spawned. Hart 1 comes online.
- One hart runs A, another runs B. Both print their first line.
- A sleeps 100 ms, B sleeps 200 ms. Both harts idle.
- A wakes first, prints “A2”, exits.
- B wakes, prints “B2”, exits.
- Boot hart prints “all processes exited.”
If you change -smp 2 back to -smp 1, the same output appears — the scheduler handles both processes on a single hart through preemption and sleep.
Checkpoint
Section titled “Checkpoint”OpenSBI -> boot::init -> discover 2 harts from DTB -> memory, paging, heap -> alloc boot hart kernel stack -> _switch_to_stack -> kernel_main_on_stack -> trap::init_hart -> demo::spawn (pid 0, pid 1) -> start_secondary_harts -> alloc stack, sbi::hart_start -> secondary_hart_main -> init_hart -> sched::run -> sched::run (boot hart) -> pick_next -> set quantum -> enter_usermode -> timer preemption or syscall -> handle -> repeat -> all exited -> haltThe kernel now has preemptive scheduling, sleep, and SMP. A misbehaving process that loops without syscalls gets preempted after 10 ms. Multiple harts run the same scheduler loop with a shared process table protected by a spin mutex.
What Comes Next
Section titled “What Comes Next”Processes can print characters but cannot talk to each other. Chapter 8 adds IPC: rendezvous endpoints where one process sends a message and another receives it. This is the foundation for building services on top of the microkernel.