4. Sv39 Paging
Chapter 3 gave the kernel physical frames. This chapter builds Sv39 page tables from those frames: an identity map for the kernel with W^X permissions, a trampoline page for future mode transitions, and the helpers that later chapters use to create process address spaces.
After paging is enabled, every memory access goes through the page table. An identity map (virtual address equals physical address) lets existing kernel code continue running at the same addresses.
    jikei: booted under OpenSBI
    Kernel: 0x80200000 - ...
      .text: ...
      .rodata: ...
      .data: ...
      .bss: ...
    CPUs: 1
    RAM: 0x80000000 - 0x90000000 (256 MB)
    Memory: ... free frames (... MB), ... total
    Paging enabled

“Paging enabled” is the new line. Everything before it is unchanged from Chapter 3.
What Changes
| File | Status |
|---|---|
| link.ld | add .trampoline section, add _data_start |
| src/paging/mod.rs | new: page tables, identity map, activation |
| src/utils/symbols.rs | add data_start, trampoline_start |
| src/memory/mod.rs | add freeze_regions, regions |
| src/boot.rs | add freeze_regions + paging::init calls |
| src/main.rs | add mod paging |
Sv39 Basics
RISC-V Sv39 uses 39-bit virtual addresses and three levels of page tables. Each level indexes 9 bits of the virtual address:

    63          39 38       30 29       21 20       12 11         0
    [  sign ext  ] [ VPN[2]  ] [ VPN[1]  ] [ VPN[0]  ] [  offset  ]
       25 bits       9 bits      9 bits      9 bits      12 bits

Each page table is one 4 KiB page containing 512 entries (8 bytes each). The MMU walks the tree: VPN[2] indexes into the root table, VPN[1] into the next, VPN[0] into the leaf. A leaf page table entry (PTE) can appear at any level:
- Level 0: 4 KiB page
- Level 1: 2 MiB megapage
- Level 2: 1 GiB gigapage
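As a rough illustration of the address split, the sketch below decomposes two addresses used in this chapter into their three VPN indices and page offset. `split_va` is a hypothetical helper written for this example and meant to run on the host; it is not part of the kernel source.

```rust
/// Hypothetical helper: split an Sv39 virtual address into
/// (VPN[2], VPN[1], VPN[0], page offset).
fn split_va(va: u64) -> (u64, u64, u64, u64) {
    let offset = va & 0xFFF; // bits 11..0
    let vpn0 = (va >> 12) & 0x1FF; // bits 20..12
    let vpn1 = (va >> 21) & 0x1FF; // bits 29..21
    let vpn2 = (va >> 30) & 0x1FF; // bits 38..30
    (vpn2, vpn1, vpn0, offset)
}

fn main() {
    // Kernel load address: root index 2 (third 1 GiB slot),
    // then index 1 (second 2 MiB slot inside it).
    assert_eq!(split_va(0x8020_0000), (2, 1, 0, 0));
    // Last page below 2^38: root index 255, index 511 at both lower levels.
    assert_eq!(split_va(0x3F_FFFF_F000), (255, 511, 511, 0));
    println!("{:?}", split_va(0x8020_0123));
}
```

A leaf at level 1 would cover every address that shares the same VPN[2] and VPN[1], which is why a megapage spans 2 MiB.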
A PTE encodes a physical page number and permission flags:
    63        54 53        10 9     8   7 6 5 4 3 2 1 0
    [ reserved ] [   PPN    ] [ RSW ] [ D A G U X W R V ]
      10 bits      44 bits    2 bits        8 bits

The flags the kernel uses:
- V: valid. The MMU treats an entry without this bit as unmapped, and an access through it raises a page fault.
- R, W, X: read, write, execute permissions
- U: user-accessible. Without it, only supervisor mode can touch the page.
W^X (write xor execute): the kernel never maps a page as both writable and executable. If an attacker can write to a page, they cannot execute it.
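The rule can be expressed as a one-line check on a flag set. The sketch below is only an illustration: `obeys_wx` is a hypothetical name, and the kernel in this chapter enforces W^X by how it chooses flags rather than by calling a checker like this.

```rust
// Flag bit positions taken from the PTE layout above.
const PTE_R: usize = 1 << 1;
const PTE_W: usize = 1 << 2;
const PTE_X: usize = 1 << 3;

/// Hypothetical check: true when W and X are not both set.
fn obeys_wx(flags: usize) -> bool {
    flags & (PTE_W | PTE_X) != (PTE_W | PTE_X)
}

fn main() {
    assert!(obeys_wx(PTE_R | PTE_X)); // code: read + execute
    assert!(obeys_wx(PTE_R)); // constants: read-only
    assert!(obeys_wx(PTE_R | PTE_W)); // data: read + write
    assert!(!obeys_wx(PTE_R | PTE_W | PTE_X)); // writable and executable: rejected
}
```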
The Paging Module
Create src/paging/mod.rs. The sections below build the file from top to bottom.
Constants and the PTE Type
    use crate::memory::{PAGE_SIZE, alloc_frame};
    use core::ptr;
    use core::sync::atomic::{AtomicUsize, Ordering};
    use spin::Mutex;

    const PAGE_TABLE_ENTRIES: usize = 512;

    /// Top of Sv39 user canonical range, page-aligned.
    pub const TRAMPOLINE: usize = 0x3F_FFFF_F000;

    const PTE_V: usize = 1 << 0;
    pub const PTE_R: usize = 1 << 1;
    pub const PTE_W: usize = 1 << 2;
    pub const PTE_X: usize = 1 << 3;
    pub const PTE_U: usize = 1 << 4;

    #[derive(Clone, Copy)]
    #[repr(transparent)]
    struct PageTableEntry(usize);

    impl PageTableEntry {
        fn from_addr(pa: usize, flags: usize) -> Self {
            PageTableEntry(((pa >> 12) << 10) | flags)
        }

        fn is_valid(&self) -> bool {
            self.0 & PTE_V != 0
        }

        fn is_leaf(&self) -> bool {
            self.0 & (PTE_R | PTE_W | PTE_X) != 0
        }

        fn as_page_table(&self) -> *mut PageTable {
            ((self.0 >> 10) << 12) as *mut PageTable
        }
    }

    fn pt_index(va: usize, level: usize) -> usize {
        (va >> (12 + level * 9)) & 0x1FF
    }

    fn page_size(level: usize) -> usize {
        PAGE_SIZE << (level * 9) // 4K, 2M, 1G
    }

TRAMPOLINE is the last page of the Sv39 user canonical range. It will hold the code that runs during page table switches; Chapter 5 fills it.
from_addr converts a physical address into a PTE: shift right 12 to get the physical page number, shift left 10 to position it in the PTE, then OR in the flags. as_page_table reverses this to extract the physical address of a next-level table.
pt_index extracts the 9-bit index for a given level from a virtual address. page_size returns the mapping granularity at each level.
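To make the shifts concrete, here is a small host-side sketch that repeats the same arithmetic on bare integers. The names `encode` and `decode_addr` are stand-ins invented for this example; they mirror from_addr and as_page_table but are not part of the module.

```rust
const PAGE_SIZE: usize = 4096;
const PTE_V: usize = 1 << 0;
const PTE_R: usize = 1 << 1;
const PTE_X: usize = 1 << 3;

/// Same arithmetic as `from_addr`, on a bare usize.
fn encode(pa: usize, flags: usize) -> usize {
    ((pa >> 12) << 10) | flags
}

/// Same arithmetic as `as_page_table`, returning the address as a usize.
fn decode_addr(pte: usize) -> usize {
    (pte >> 10) << 12
}

fn page_size(level: usize) -> usize {
    PAGE_SIZE << (level * 9)
}

fn main() {
    let pte = encode(0x8020_0000, PTE_V | PTE_R | PTE_X);
    assert_eq!(pte, 0x2008_000B); // PPN 0x80200 shifted left 10, flags 0b1011
    assert_eq!(decode_addr(pte), 0x8020_0000);
    assert_eq!(page_size(0), 4 << 10); // 4 KiB
    assert_eq!(page_size(1), 2 << 20); // 2 MiB
    assert_eq!(page_size(2), 1 << 30); // 1 GiB
}
```

Decoding drops the low 10 flag bits, so the round trip recovers the address only when it was page-aligned to begin with.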
The PageTable Struct
    #[repr(C, align(4096))]
    pub struct PageTable {
        entries: [PageTableEntry; PAGE_TABLE_ENTRIES],
    }

    impl PageTable {
        pub fn alloc() -> &'static mut PageTable {
            let frame = alloc_frame().expect("out of memory for page table");
            unsafe {
                ptr::write_bytes(frame as *mut u8, 0, PAGE_SIZE);
                &mut *(frame as *mut PageTable)
            }
        }

        pub fn satp(&self) -> usize {
            (8 << 60) | ((self as *const _ as usize) >> 12)
        }

alloc grabs a frame from the buddy allocator and zeros it. All 512 entries start as zero (invalid), so the table is empty.
satp builds the value for the satp CSR: mode 8 (Sv39) in bits 63–60, and the root table’s physical page number in bits 43–0.
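A worked value may help here. The sketch below composes a satp value for a made-up root table address and then pulls the fields back out; `make_satp` and the address 0x8040_0000 are assumptions for illustration only.

```rust
const SATP_MODE_SV39: u64 = 8;

/// Compose a satp value from a root table address, leaving the ASID at 0.
fn make_satp(root_pa: u64) -> u64 {
    (SATP_MODE_SV39 << 60) | (root_pa >> 12)
}

fn main() {
    // Hypothetical root table address, chosen only for illustration.
    let satp = make_satp(0x8040_0000);
    assert_eq!(satp, 0x8000_0000_0008_0400);
    assert_eq!(satp >> 60, 8); // MODE field, bits 63..60
    assert_eq!((satp >> 44) & 0xFFFF, 0); // ASID field, bits 59..44
    let ppn = satp & ((1u64 << 44) - 1); // PPN field, bits 43..0
    assert_eq!(ppn << 12, 0x8040_0000);
}
```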
Walking and Mapping
        fn ensure_table(&mut self, idx: usize) -> &mut PageTable {
            let entry = &mut self.entries[idx];
            if entry.is_valid() {
                assert!(!entry.is_leaf(), "cannot split existing superpage mapping");
            } else {
                let frame = alloc_frame().expect("out of memory");
                unsafe { ptr::write_bytes(frame as *mut u8, 0, PAGE_SIZE) };
                *entry = PageTableEntry::from_addr(frame, PTE_V);
            }
            unsafe { &mut *self.entries[idx].as_page_table() }
        }

        /// Place a leaf PTE at the given level (0 = 4K, 1 = 2M mega, 2 = 1G giga)
        pub fn map_at(&mut self, va: usize, pa: usize, flags: usize, level: usize) {
            debug_assert!(level <= 2);
            debug_assert!(
                va.is_multiple_of(page_size(level)) && pa.is_multiple_of(page_size(level))
            );

            let mut table = &mut *self;
            for l in (level + 1..=2).rev() {
                table = table.ensure_table(pt_index(va, l));
            }
            table.entries[pt_index(va, level)] = PageTableEntry::from_addr(pa, flags | PTE_V);
        }

        pub fn map_range(&mut self, va: usize, pa: usize, size: usize, flags: usize) {
            debug_assert!(
                va.is_multiple_of(PAGE_SIZE)
                    && pa.is_multiple_of(PAGE_SIZE)
                    && size.is_multiple_of(PAGE_SIZE)
            );
            let mut va = va;
            let mut pa = pa;
            let end = va + size;

            while va < end {
                let remaining = end - va;
                let level = (0..=2)
                    .rev()
                    .find(|&l| {
                        let ps = page_size(l);
                        remaining >= ps && va.is_multiple_of(ps) && pa.is_multiple_of(ps)
                    })
                    .unwrap();

                self.map_at(va, pa, flags, level);
                let step = page_size(level);
                va += step;
                pa += step;
            }
        }

ensure_table returns a mutable reference to the next-level table at index idx, allocating one if it does not exist.
map_at walks from the root to the target level, creating intermediate tables as needed, then installs the leaf PTE.
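map_range maps a contiguous region, choosing at each step the largest page size that fits in the remaining length and matches the alignment of both addresses. The 256 MB RAM region uses 2 MiB megapages where alignment permits, falling back to 4 KiB pages around the kernel section boundaries. One megapage leaf covers the same span as 512 separate 4 KiB entries, which can reduce TLB pressure and saves the level-0 tables those entries would need.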
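The level selection can be tried out without touching real page tables. The sketch below is a dry run under stated assumptions: `plan_range` is a hypothetical function that repeats the loop from map_range but only counts leaves per level, and 0x8020_5000 is a made-up section boundary.

```rust
const PAGE_SIZE: usize = 4096;

fn page_size(level: usize) -> usize {
    PAGE_SIZE << (level * 9)
}

/// Hypothetical dry run of the level selection in `map_range`:
/// returns how many leaves would be placed at levels [0, 1, 2].
fn plan_range(mut va: usize, mut pa: usize, size: usize) -> [usize; 3] {
    let end = va + size;
    let mut counts = [0usize; 3];
    while va < end {
        let remaining = end - va;
        // Largest level whose page size fits and whose alignment matches.
        let level = (0..=2usize)
            .rev()
            .find(|&l| {
                let ps = page_size(l);
                remaining >= ps && va % ps == 0 && pa % ps == 0
            })
            .expect("inputs must be 4 KiB aligned");
        counts[level] += 1;
        va += page_size(level);
        pa += page_size(level);
    }
    counts
}

fn main() {
    // One aligned 2 MiB span: a single megapage.
    assert_eq!(plan_range(0x8000_0000, 0x8000_0000, 0x20_0000), [0, 1, 0]);
    // From the made-up boundary to the end of RAM: 4 KiB pages up to the
    // next 2 MiB boundary (0x8040_0000), then megapages for the rest.
    let size = 0x9000_0000 - 0x8020_5000;
    assert_eq!(plan_range(0x8020_5000, 0x8020_5000, size), [507, 126, 0]);
}
```

No gigapage appears in either case because the spans are shorter than 1 GiB, even though 0x8000_0000 itself is 1 GiB aligned.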
Activation and Trampoline
        pub fn activate(&self) {
            unsafe {
                core::arch::asm!("csrw satp, {}", "sfence.vma", in(reg) self.satp());
            }
        }

        /// Map the trampoline page at the canonical VA.
        /// Must be identical in every page table (kernel and per-process).
        pub fn map_trampoline(&mut self) {
            self.map_at(
                TRAMPOLINE,
                crate::utils::symbols::trampoline_start(),
                PTE_R | PTE_X,
                0,
            );
        }
    }

activate writes satp and flushes the TLB with sfence.vma. After this instruction, every memory access is translated through this page table.
map_trampoline maps the physical trampoline page at TRAMPOLINE (0x3F_FFFF_F000). The trampoline section is empty for now — Chapter 5 fills it with the code that transitions between kernel and user page tables. Every page table (kernel and per-process) maps this same physical page at the same virtual address, so the CPU can keep fetching instructions across a satp switch.
Kernel Page Table Singleton
    static KERNEL_PT_PTR: AtomicUsize = AtomicUsize::new(0);
    static KERNEL_PT_LOCK: Mutex<()> = Mutex::new(());

    pub fn init() {
        let pt = PageTable::alloc();
        KERNEL_PT_PTR.store(pt as *mut PageTable as usize, Ordering::Release);
        map_kernel_identity(pt);
        pt.activate();
        println!("Paging enabled");
    }

    /// Returns the kernel page table's satp value (lock-free).
    pub fn kernel_satp() -> usize {
        let ptr = KERNEL_PT_PTR.load(Ordering::Acquire);
        assert!(ptr != 0, "kernel page table not initialized");
        let pt = unsafe { &*(ptr as *const PageTable) };
        pt.satp()
    }

    /// Locked mutable access to the kernel page table.
    pub fn with_kernel_pt_mut<R>(f: impl FnOnce(&mut PageTable) -> R) -> R {
        let _lock = KERNEL_PT_LOCK.lock();
        let ptr = KERNEL_PT_PTR.load(Ordering::Acquire) as *mut PageTable;
        assert!(!ptr.is_null(), "kernel page table not initialized");
        f(unsafe { &mut *ptr })
    }

init allocates a root page table, applies the kernel identity map, and activates it. The root table pointer is stored in an AtomicUsize so kernel_satp() can read it without locking; later chapters need the satp value on every trap entry.
with_kernel_pt_mut provides locked mutable access for adding mappings after init (kernel stacks, heap pages).
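The same publish-once, lock-to-mutate pattern can be sketched on the host. The version below is an analogue under assumptions, not the kernel code: it uses std::sync::Mutex instead of spin::Mutex, a leaked Box instead of a frame, and a hypothetical `Table` type with a counter in place of a real page table.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Stand-in for a page table; the counter represents installed mappings.
struct Table {
    mappings: usize,
}

static TABLE_PTR: AtomicUsize = AtomicUsize::new(0);
static TABLE_LOCK: Mutex<()> = Mutex::new(());

fn init() {
    // Leak a heap allocation to stand in for a frame that is never freed.
    let table: &'static mut Table = Box::leak(Box::new(Table { mappings: 0 }));
    TABLE_PTR.store(table as *mut Table as usize, Ordering::Release);
}

/// Lock-free read of the published address (the analogue of `kernel_satp`).
fn table_addr() -> usize {
    let ptr = TABLE_PTR.load(Ordering::Acquire);
    assert!(ptr != 0, "table not initialized");
    ptr
}

/// Serialized mutation (the analogue of `with_kernel_pt_mut`).
fn with_table_mut<R>(f: impl FnOnce(&mut Table) -> R) -> R {
    let _guard = TABLE_LOCK.lock().unwrap();
    let ptr = table_addr() as *mut Table;
    // Acceptable in this sketch because every mutable access takes the lock.
    f(unsafe { &mut *ptr })
}

fn main() {
    init();
    let before = table_addr();
    let count = with_table_mut(|t| {
        t.mappings += 1;
        t.mappings
    });
    assert_eq!(count, 1);
    assert_eq!(table_addr(), before); // the address is stable after init
}
```

The address read needs no lock because it never changes after init; only the table contents are mutated, and those go through the closure.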
The Identity Map
    const UART_BASE: usize = 0x1000_0000;

    /// Identity-map `[start, end)` as `start == va == pa`, skipping empty ranges.
    fn id_map(pt: &mut PageTable, start: usize, end: usize, flags: usize) {
        if end > start {
            pt.map_range(start, start, end - start, flags);
        }
    }

    /// Apply kernel identity mappings: RAM with W^X, UART, trampoline.
    fn map_kernel_identity(pt: &mut PageTable) {
        use crate::utils::symbols::*;

        for r in crate::memory::regions() {
            let (start, size) = (r.start, r.size);
            let end = start + size;
            id_map(pt, start, text_start().min(end), PTE_R | PTE_W);
            id_map(
                pt,
                text_start().max(start),
                rodata_start().min(end),
                PTE_R | PTE_X,
            );
            id_map(pt, rodata_start().max(start), data_start().min(end), PTE_R);
            id_map(pt, data_start().max(start), end, PTE_R | PTE_W);
        }

        pt.map_range(UART_BASE, UART_BASE, PAGE_SIZE, PTE_R | PTE_W);
        pt.map_trampoline();
    }

map_kernel_identity walks each RAM region and splits it at kernel section boundaries to apply W^X:
| Range | Flags | Contents |
|---|---|---|
| RAM start → .text | R+W | OpenSBI firmware (reserved by the frame allocator from FDT memreserve / /reserved-memory) |
| .text → .rodata | R+X | kernel code, trampoline section |
| .rodata → .data | R | constants, string literals |
| .data → RAM end | R+W | variables, BSS, frame allocator metadata, free frames |
The frame allocator places its bookkeeping (BuddyAllocator header, region array, per-frame array) immediately after _kernel_end, inside the trailing .data → RAM end region — not below .text. The pre-.text window belongs to OpenSBI. Identity-mapping it lets supervisor code reach SBI entry points, but the allocator must keep its hands off; that’s what the FDT-derived reservation list is for.
No page is both writable and executable.
The UART at 0x1000_0000 gets its own mapping. After paging is enabled, all MMIO must go through the page table.
The min/max clamps handle the case where a section boundary falls outside a region — id_map skips empty ranges.
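The clamping is easier to follow with numbers. The sketch below is a dry run with made-up section addresses; `split_region` is a hypothetical function that applies the same min/max expressions and the same empty-range filter, returning the ranges instead of mapping them.

```rust
/// Hypothetical dry run of the clamping in `map_kernel_identity`: returns the
/// non-empty (start, end, permissions) ranges for one RAM region.
fn split_region(
    start: usize,
    end: usize,
    text: usize,
    rodata: usize,
    data: usize,
) -> Vec<(usize, usize, &'static str)> {
    let candidates = vec![
        (start, text.min(end), "RW"),
        (text.max(start), rodata.min(end), "RX"),
        (rodata.max(start), data.min(end), "R"),
        (data.max(start), end, "RW"),
    ];
    // Mirror `id_map`: drop any range whose end does not exceed its start.
    candidates.into_iter().filter(|&(s, e, _)| e > s).collect()
}

fn main() {
    // Section addresses are made up for illustration.
    let (text, rodata, data) = (0x8020_0000, 0x8020_5000, 0x8020_8000);

    // A region containing the kernel splits into four ranges.
    let ranges = split_region(0x8000_0000, 0x9000_0000, text, rodata, data);
    assert_eq!(ranges.len(), 4);
    assert_eq!(ranges[1], (0x8020_0000, 0x8020_5000, "RX"));

    // A region entirely above the kernel collapses to one read-write range.
    let high = split_region(0xA000_0000, 0xB000_0000, text, rodata, data);
    assert_eq!(high, vec![(0xA000_0000, 0xB000_0000, "RW")]);
}
```

In the second case the first three candidates all end below their start, so only the trailing read-write range survives.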
Update the Linker Script
Add the .trampoline section between .text and .rodata, and the _data_start symbol. Update link.ld:

    ENTRY(_start)

    SECTIONS {
        . = 0x80200000; /* OpenSBI loads kernel here */

        .text : {
            _text_start = .;
            KEEP(*(.text.init))
            *(.text .text.*)
            _text_end = .;
        }

        . = ALIGN(4096);
        .trampoline : {
            _trampoline_start = .;
            *(.trampoline)
            . = _trampoline_start + 4096; /* pad to full page */
        }

        . = ALIGN(4096);
        .rodata : {
            _rodata_start = .;
            *(.srodata .srodata.*)
            *(.rodata .rodata.*)
            _rodata_end = .;
        }

        . = ALIGN(4096);
        .data : {
            _data_start = .;
            *(.sdata .sdata.*)
            *(.data .data.*)
            _data_end = .;
        }

        .bss : {
            . = ALIGN(8);
            _bss_start = .;
            *(.sbss .sbss.*)
            *(.bss .bss.*)
            . = ALIGN(8);
            _bss_end = .;
        }

        . = ALIGN(4096);
        _kernel_end = .;
    }

The .trampoline section is page-aligned and padded to exactly 4 KiB. Even though no assembly is placed here yet, the section reserves one full page. The identity map covers it as R+X (between .text and .rodata). Chapter 5 fills it with trap transition code.
_data_start is needed for the W^X split — the identity map uses it to separate read-only .rodata from read-write .data.
Update Linker Symbols
Add data_start and trampoline_start to src/utils/symbols.rs:

    unsafe extern "C" {
        static _text_start: u8;
        static _text_end: u8;
        static _rodata_start: u8;
        static _rodata_end: u8;
        static _data_start: u8;
        static _data_end: u8;
        static _bss_end: u8;
        static _kernel_end: u8;
        static _trampoline_start: u8;
    }

    macro_rules! sym {
        ($name:ident, $sym:ident) => {
            pub fn $name() -> usize {
                (&raw const $sym) as usize
            }
        };
    }

    sym!(text_start, _text_start);
    sym!(text_end, _text_end);
    sym!(rodata_start, _rodata_start);
    sym!(rodata_end, _rodata_end);
    sym!(data_start, _data_start);
    sym!(data_end, _data_end);
    sym!(bss_end, _bss_end);
    sym!(kernel_end, _kernel_end);
    sym!(trampoline_start, _trampoline_start);

Two new entries: data_start for the identity map’s W^X split, and trampoline_start for the trampoline mapping.
Update the Memory Module
The paging code needs the region list without holding the allocator lock (since it allocates page table frames during the mapping walk). Add freeze_regions and regions to src/memory/mod.rs:

    pub mod frame_allocator;

    use frame_allocator::BuddyAllocator;
    use spin::{Mutex, Once};

    pub use frame_allocator::{PAGE_SIZE, Region};

    static MEMORY: Once<Mutex<&'static mut BuddyAllocator>> = Once::new();
    static REGIONS: Once<&'static [Region]> = Once::new();

    pub fn init(addr: usize) {
        let ptr = addr as *mut BuddyAllocator;
        unsafe {
            ptr.write(BuddyAllocator::new());
            MEMORY.call_once(|| Mutex::new(&mut *ptr));
        }
    }

    pub fn lock() -> spin::MutexGuard<'static, &'static mut BuddyAllocator> {
        MEMORY.get().expect("memory not initialized").lock()
    }

    pub fn alloc_frame() -> Option<usize> {
        lock().alloc(0)
    }

    /// Snapshot the region slice so `regions()` can be called without locking.
    /// Must be called after all regions are added and `init` is complete.
    pub fn freeze_regions() {
        let mem = lock();
        let ptr = mem.regions_ptr() as *const Region;
        let len = mem.regions().len();
        REGIONS.call_once(|| unsafe { core::slice::from_raw_parts(ptr, len) });
    }

    /// Returns the memory regions. Safe to call after `freeze_regions`.
    pub fn regions() -> &'static [Region] {
        REGIONS.get().expect("regions not frozen")
    }

freeze_regions takes a snapshot of the region slice while holding the lock, then stores it in a Once for lock-free reads. regions() returns that snapshot.
This matters because map_kernel_identity iterates regions while also calling alloc_frame (through ensure_table and map_range). If it held the memory lock for the region read, the allocation would deadlock.
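The hazard can be reproduced on the host with standard-library types. This is a sketch under assumptions: the `Allocator` struct and its bump-style alloc_frame are invented for the example, std's OnceLock and Mutex stand in for spin's Once and Mutex, and the snapshot is a copied Vec rather than a slice pointing into allocator storage.

```rust
use std::sync::{Mutex, OnceLock};

/// Invented stand-in for the frame allocator.
struct Allocator {
    regions: Vec<(usize, usize)>, // (start, size)
    next_free: usize,
}

static ALLOCATOR: OnceLock<Mutex<Allocator>> = OnceLock::new();
static REGIONS: OnceLock<Vec<(usize, usize)>> = OnceLock::new();

fn allocator() -> &'static Mutex<Allocator> {
    ALLOCATOR.get_or_init(|| {
        Mutex::new(Allocator {
            regions: vec![(0x8000_0000, 0x1000_0000)],
            next_free: 0x8030_0000,
        })
    })
}

fn alloc_frame() -> usize {
    let mut a = allocator().lock().unwrap();
    let frame = a.next_free;
    a.next_free += 4096;
    frame
}

/// Copy the region list once while holding the lock, then release it.
fn freeze_regions() {
    let guard = allocator().lock().unwrap();
    REGIONS.get_or_init(|| guard.regions.clone());
}

fn main() {
    freeze_regions();
    // No allocator lock is held here, so allocating inside the loop is fine.
    for &(start, _size) in REGIONS.get().expect("regions not frozen") {
        let frame = alloc_frame();
        assert!(frame >= start);
    }

    // Holding the guard across an allocation is the case to avoid:
    let guard = allocator().lock().unwrap();
    assert!(allocator().try_lock().is_err()); // a second acquisition cannot succeed
    drop(guard);
}
```

The final lines use try_lock so the example terminates; a blocking lock at that point would never return, which is the deadlock the snapshot avoids.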
The Region type is re-exported from frame_allocator so the paging module can access r.start and r.size.
Update Boot
Add two lines to boot::init in src/boot.rs, after the memory stats block:

    crate::memory::freeze_regions();
    crate::paging::init();

The full init sequence is now:

    boot::init
      -> print kernel info
      -> parse DTB (CPUs, RAM)
      -> memory::init (place allocator)
      -> add regions, reserve kernel/DTB/firmware ranges, init frames, print stats
      -> memory::freeze_regions   <-- new
      -> paging::init             <-- new
      -> park

Update the Entry Point
Add mod paging to src/main.rs:

    #![no_std]
    #![no_main]

    #[macro_use]
    mod utils;
    mod boot;
    mod memory;
    mod paging;

    core::arch::global_asm!(include_str!("../boot.S"));

    #[unsafe(no_mangle)]
    pub extern "C" fn kernel_main(hartid: usize, dtb_ptr: usize) -> ! {
        boot::init(hartid, dtb_ptr);

        loop {
            unsafe { core::arch::asm!("wfi") }
        }
    }

Run It
    cargo run --release

After the memory stats from Chapter 3, you should see:

    Paging enabled

The kernel continues to run: println! works and the loop parks normally. This shows that the identity map covers every address the kernel was using before paging was enabled, including the code, stack, and UART it touches to print that line.
If the kernel hangs or faults after “Memory: …”, the identity map is wrong. Common causes:
- Missing _data_start in the linker script (the W^X split uses it)
- .trampoline section not page-aligned
- freeze_regions not called before paging::init
Checkpoint
At this checkpoint:

    OpenSBI -> _start -> boot::init
      -> parse device tree
      -> discover RAM
      -> init buddy allocator
      -> reserve kernel + DTB + firmware/reserved-memory ranges
      -> freeze regions
      -> build kernel page table (identity + W^X + UART + trampoline)
      -> write satp, sfence.vma
      -> park (paging active)

The kernel has a working Sv39 page table. Every RAM byte is identity-mapped with appropriate permissions. The UART is accessible. A trampoline page is reserved at a fixed virtual address for future trap transitions.
Page tables are now available as a building block. PageTable::alloc() creates new tables, map_at and map_range fill them, and activate installs them.
What Comes Next
The page table infrastructure is in place, but there is no way to enter user mode or handle interrupts. Chapter 5 builds the trap system: the stvec handler, the trampoline code that runs on the reserved page, and the syscall interface that lets user processes talk to the kernel.