
Reading xv6OS Thoroughly to Fully Understand the Kernel - Local APIC Edition -

This page has been machine-translated from the original page.

Inspired by An Introduction to OS Code Reading: Learning Kernel Internals with UNIX V6, I’m reading xv6 OS.

Because UNIX V6 itself does not run on x86 CPUs, I decided to read the source of kash1064/xv6-public: xv6 OS, a fork of the xv6 OS repository that makes UNIX V6 run on the x86 architecture.

In the previous article, I looked at how the mpinit function retrieves CPU information in a multiprocessor configuration.

This time, I will trace the behavior of the lapicinit function.

The lapicinit function

This time I will start from lapicinit, the function that main calls right after mpinit.

This function initializes the interrupt controller.

lapicinit();     // interrupt controller

The lapicinit function is defined in lapic.c as follows.

void lapicinit(void)
{
  if(!lapic) return;

  // Enable local APIC; set spurious interrupt vector.
  lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));

  // The timer repeatedly counts down at bus frequency
  // from lapic[TICR] and then issues an interrupt.
  // If xv6 cared more about precise timekeeping,
  // TICR would be calibrated using an external time source.
  lapicw(TDCR, X1);
  lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
  lapicw(TICR, 10000000);

  // Disable logical interrupt lines.
  lapicw(LINT0, MASKED);
  lapicw(LINT1, MASKED);

  // Disable performance counter overflow interrupts
  // on machines that provide that interrupt entry.
  if(((lapic[VER]>>16) & 0xFF) >= 4) lapicw(PCINT, MASKED);

  // Map error interrupt to IRQ_ERROR.
  lapicw(ERROR, T_IRQ0 + IRQ_ERROR);

  // Clear error status register (requires back-to-back writes).
  lapicw(ESR, 0);
  lapicw(ESR, 0);

  // Ack any outstanding interrupts.
  lapicw(EOI, 0);

  // Send an Init Level De-Assert to synchronise arbitration ID's.
  lapicw(ICRHI, 0);
  lapicw(ICRLO, BCAST | INIT | LEVEL);
  while(lapic[ICRLO] & DELIVS)
    ;

  // Enable interrupts on the APIC (but not on the processor).
  lapicw(TPR, 0);
}

The line if(!lapic) return; returns immediately when the global variable lapic is NULL, that is, when no Local APIC address was found.

Local APIC registers

The variable lapic was set during the MP table retrieval covered in the previous article: mpinit stores the lapicaddr field of the MP Configuration Table header into it.

Let’s confirm the actual value stored in this global variable using a debugger.

$ b *0x801027a0
$ continue

Inspecting lapic shows that it holds the address 0xfee00000.

$ info variables lapic
File lapic.c:
44:volatile uint *lapic;

$ p lapic
$1 = (volatile uint *) 0xfee00000

This lapic points to the memory-mapped Local APIC registers.

The Local APIC registers are 32-bit values that are memory-mapped starting at the address given in the MP Configuration Table.

Each register is a 32-bit element placed at an offset aligned to a 16-byte boundary.

The following page is helpful for further details.

Reference: APIC - OSDev Wiki

From here, we will configure the Local APIC registers.

The following values defined in lapic.c are used for this purpose.

// Local APIC registers, divided by 4 for use as uint[] indices.
#define ID      (0x0020/4)   // ID
#define VER     (0x0030/4)   // Version
#define TPR     (0x0080/4)   // Task Priority
#define EOI     (0x00B0/4)   // EOI
#define SVR     (0x00F0/4)   // Spurious Interrupt Vector
  #define ENABLE     0x00000100   // Unit Enable
#define ESR     (0x0280/4)   // Error Status
#define ICRLO   (0x0300/4)   // Interrupt Command
  #define INIT       0x00000500   // INIT/RESET
  #define STARTUP    0x00000600   // Startup IPI
  #define DELIVS     0x00001000   // Delivery status
  #define ASSERT     0x00004000   // Assert interrupt (vs deassert)
  #define DEASSERT   0x00000000
  #define LEVEL      0x00008000   // Level triggered
  #define BCAST      0x00080000   // Send to all APICs, including self.
  #define BUSY       0x00001000
  #define FIXED      0x00000000
#define ICRHI   (0x0310/4)   // Interrupt Command [63:32]
#define TIMER   (0x0320/4)   // Local Vector Table 0 (TIMER)
  #define X1         0x0000000B   // divide counts by 1
  #define PERIODIC   0x00020000   // Periodic
#define PCINT   (0x0340/4)   // Performance Counter LVT
#define LINT0   (0x0350/4)   // Local Vector Table 1 (LINT0)
#define LINT1   (0x0360/4)   // Local Vector Table 2 (LINT1)
#define ERROR   (0x0370/4)   // Local Vector Table 3 (ERROR)
  #define MASKED     0x00010000   // Interrupt masked
#define TICR    (0x0380/4)   // Timer Initial Count
#define TCCR    (0x0390/4)   // Timer Current Count
#define TDCR    (0x03E0/4)   // Timer Divide Configuration
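The "divided by 4" in these definitions converts a byte offset into an index into an array of 32-bit values. A minimal sketch of that arithmetic follows; the helper names reg_index and reg_address are hypothetical, and LAPIC_BASE assumes the address seen in the debugger above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base address: the value observed in the debugger session. */
#define LAPIC_BASE 0xfee00000u

/* Convert a register's byte offset into an index into a 32-bit array. */
static uint32_t reg_index(uint32_t byte_offset)
{
    /* Local APIC registers sit on 16-byte boundaries. */
    assert(byte_offset % 16 == 0);
    return byte_offset / 4;
}

/* Address that lapic[index] refers to when lapic == LAPIC_BASE. */
static uint32_t reg_address(uint32_t index)
{
    return LAPIC_BASE + index * (uint32_t)sizeof(uint32_t);
}
```

For example, SVR at byte offset 0x00F0 becomes index 60, and lapic[60] refers to address 0xfee000f0.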

Setting the Spurious Interrupt Vector and enabling the Local APIC

Moving on.

This line uses the lapicw function to set the Spurious Interrupt Vector and enable the Local APIC.

// Enable local APIC; set spurious interrupt vector.
lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));

The lapicw function, used frequently from here on, is defined as follows.

static void
lapicw(int index, int value)
{
  lapic[index] = value;
  lapic[ID];  // wait for write to finish, by reading
}

It takes index and value as arguments and overwrites the corresponding lapic value.

ID in lapic[ID] is defined as (0x0020/4) in lapic.c.

lapic[ID]; does not change any settings; its purpose is to wait for the preceding lapic write to complete by reading this value.
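The write-then-read pattern can be tried in user space with a plain array standing in for the registers. This is only a sketch: fake_regs is a hypothetical substitute for the real memory-mapped region, so it shows the indexing and the volatile read, not the hardware timing effect.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS (0x400 / 4)   /* 1 KiB of register space as 32-bit words */
#define ID    (0x0020 / 4)
#define SVR   (0x00F0 / 4)

/* Hypothetical stand-in for the memory-mapped registers (plain RAM here). */
static volatile uint32_t fake_regs[NREGS];
static volatile uint32_t *lapic = fake_regs;

static void lapicw(uint32_t index, uint32_t value)
{
    assert(index < NREGS);
    lapic[index] = value;
    /* Because lapic is volatile, the compiler must emit this load
       even though the result is discarded. */
    (void)lapic[ID];
}
```

Calling lapicw(SVR, 0x13f) stores 0x13f at index 0x00F0/4 of the array and then reads the ID slot once.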

Let’s look at the line lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));.

The index is SVR.

This refers to the offset (0x00F0/4) of the Spurious Interrupt Vector Register.

As described in the following article, setting bit 8 (0x100) of the Spurious Interrupt Vector Register enables the APIC.

Reference: APIC - OSDev Wiki

Also, in order for the Local APIC to be able to receive interrupts, the Spurious Interrupt Vector must be set.

The lower 8 bits of the Spurious Interrupt Vector Register hold the vector number that is delivered for spurious interrupts.

Therefore, 0x13f, the OR of 0x100 and 0x3f, is set in the Spurious Interrupt Vector Register.

This lower value 0x3f comes from the expression T_IRQ0 + IRQ_SPURIOUS.

The values such as T_IRQ0 are defined in traps.h.

// These are arbitrarily chosen, but with care not to overlap
// processor defined exceptions or interrupt vectors.
#define T_SYSCALL       64      // system call
#define T_DEFAULT      500      // catchall

#define T_IRQ0          32      // IRQ 0 corresponds to int T_IRQ

#define IRQ_TIMER        0
#define IRQ_KBD          1
#define IRQ_COM1         4
#define IRQ_IDE         14
#define IRQ_ERROR       19
#define IRQ_SPURIOUS    31
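The arithmetic behind the value written to SVR can be checked with a few lines of C. The constants are copied from lapic.c and traps.h; the helper names svr_value, svr_vector, and svr_enabled are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ENABLE        0x00000100u  /* bit 8: Unit Enable */
#define T_IRQ0        32
#define IRQ_SPURIOUS  31

/* Build an SVR value from a vector number (0..255). */
static uint32_t svr_value(uint32_t vector)
{
    assert(vector <= 0xFF);
    return ENABLE | vector;
}

/* Extract the spurious vector (lower 8 bits). */
static uint32_t svr_vector(uint32_t svr) { return svr & 0xFF; }

/* Non-zero when the enable bit (bit 8) is set. */
static int svr_enabled(uint32_t svr) { return (svr & ENABLE) != 0; }
```

With T_IRQ0 + IRQ_SPURIOUS = 32 + 31 = 63 = 0x3f, svr_value returns 0x13f.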

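According to the OSDev Wiki, the simplest value to set for the Spurious Interrupt Vector is 0xff, but xv6OS uses 0x3f (T_IRQ0 + IRQ_SPURIOUS = 32 + 31).

(The source does not state the reason. My guess is that xv6 maps IRQ n to vector T_IRQ0 + n, so the IRQ vectors occupy 32 to 63, right after the 32 vectors reserved for processor exceptions, and the spurious interrupt simply gets the last of them. 0x3f also keeps the lower 4 bits set to 1, which matches older processors where those bits are hardwired to 1.)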

Timer configuration

Next are the following lines.

// The timer repeatedly counts down at bus frequency
// from lapic[TICR] and then issues an interrupt.
// If xv6 cared more about precise timekeeping,
// TICR would be calibrated using an external time source.
lapicw(TDCR, X1);
lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
lapicw(TICR, 10000000);

TDCR is (0x03E0/4), which is the offset of the Divide Configuration Register (for Timer).

This setting is used when configuring the Local APIC timer period.

The Local APIC timer is a timer built into each processor’s Local APIC, and it generates interrupts only for that processor.

The Local APIC timer starts from a value set by the kernel as the Timer Initial Count, decrements at a fixed interval, and fires an interrupt when it reaches 0.

The counter decrements at the CPU's bus frequency divided by the Timer Divide Configuration value.
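To get a feel for the numbers, here is a rough sketch of the interval between timer interrupts. The function name timer_period_ns is hypothetical, the bus frequency is an assumption (xv6 never measures it), and off-by-one details of the hardware counter are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate interval between timer interrupts, in nanoseconds.
   bus_hz is an assumed bus frequency; xv6 itself does not calibrate it. */
static uint64_t timer_period_ns(uint64_t bus_hz, uint32_t divider,
                                uint32_t initial_count)
{
    assert(divider > 0);
    uint64_t tick_hz = bus_hz / divider;   /* rate at which the count drops */
    assert(tick_hz > 0);
    /* initial_count ticks, each lasting 1/tick_hz seconds. */
    return (uint64_t)initial_count * 1000000000ull / tick_hz;
}
```

Under the assumption of a 1 GHz bus clock and a divider of 1, an initial count of 10000000 gives 10,000,000 ns, or 10 ms per interrupt; with a 100 MHz bus the same count gives 100 ms.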

The Local APIC timer supports two modes: Periodic mode and One-shot mode.

In Periodic mode, when the count reaches 0, it automatically resets to the initial value and begins counting down again.

In One-shot mode, when the count reaches 0 and fires an interrupt, the count stays at 0 until the program explicitly sets a new initial value.
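The difference between the two modes can be illustrated with a small software model. This is a simplified sketch, not how the hardware is implemented; struct sim_timer and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sim_timer {
    uint32_t initial;   /* Timer Initial Count */
    uint32_t current;   /* Timer Current Count */
    bool periodic;      /* true: Periodic mode, false: One-shot mode */
};

static void sim_timer_start(struct sim_timer *t, uint32_t initial, bool periodic)
{
    assert(initial > 0);
    t->initial = initial;
    t->current = initial;
    t->periodic = periodic;
}

/* Advance by one (divided) clock tick; returns true if an interrupt fires. */
static bool sim_timer_tick(struct sim_timer *t)
{
    if (t->current == 0)
        return false;              /* one-shot timer that already expired */
    if (--t->current > 0)
        return false;              /* still counting down */
    if (t->periodic)
        t->current = t->initial;   /* reload and keep counting */
    return true;
}

/* Count how many interrupts fire over a number of ticks. */
static unsigned sim_timer_run(struct sim_timer *t, unsigned ticks)
{
    unsigned fired = 0;
    for (unsigned i = 0; i < ticks; i++)
        if (sim_timer_tick(t))
            fired++;
    return fired;
}
```

Starting from an initial count of 3 and running 9 ticks, the periodic timer fires three times, while the one-shot timer fires once and then stays at 0.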

Reference: APIC timer - OSDev Wiki

Reference: Timer interrupt selected as Local APIC timer interrupt - ZDNet Japan

In xv6OS, the Timer Divide Configuration value is set to 0x0000000B (X1), which selects a divisor of 1, so the counter runs at the bus frequency.

[Figure omitted. Reference image: Intel SDM vol3]
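The divide value is encoded in bits 0, 1, and 3 of the register rather than stored as a plain number. The sketch below decodes it following the encoding table in the Intel SDM as I understand it; tdcr_divider is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the Timer Divide Configuration value into the actual divisor.
   Bits 3, 1, 0 form a 3-bit code: 000 -> 2, 001 -> 4, ..., 110 -> 128,
   and 111 -> 1. Bit 2 is not used. */
static uint32_t tdcr_divider(uint32_t tdcr)
{
    assert((tdcr & ~0xBu) == 0);   /* only bits 0, 1 and 3 may be set */
    uint32_t code = ((tdcr >> 1) & 0x4) | (tdcr & 0x3);
    return code == 7 ? 1 : (2u << code);
}
```

For X1 = 0xB (binary 1011) the code is 111, so the divisor is 1.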

Also, the Timer Initial Count is set to 10000000 by lapicw(TICR, 10000000);.

Incidentally, the following line configures Local Vector Table 0 (TIMER).

lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));

The Local Vector Table (LVT) itself is a table in the Local APIC that maps events to interrupt vectors.

Through the LVT, software can specify how local interrupts are sent to the CPU.

[Figure omitted. Reference image: Intel SDM vol3]

The LVT consists of the following 32-bit registers. Details will be examined when they are actually used.

  • LVT Timer Register (FEE0 0320H)
  • LVT Thermal Monitor Register (FEE0 0330H)
  • LVT Performance Counter Register (FEE0 0340H)
  • LVT LINT0 Register (FEE0 0350H)
  • LVT LINT1 Register (FEE0 0360H)
  • LVT Error Register (FEE0 0370H)

Here, TIMER specifies the LVT Timer Register at (0x0320/4).

The LVT Timer Register specifies the interrupt to raise when the APIC timer fires.

Here, PERIODIC (0x00020000) sets bit 17, the Timer Mode field of the LVT Timer Register, to 1, switching the timer to Periodic mode.

Also, T_IRQ0 + IRQ_TIMER sets the interrupt vector to 32 + 0 = 0x20. Vectors 0 to 31 are reserved for processor exceptions, so xv6 places IRQ 0, the timer, at the first vector after them; trap() in trap.c handles it in the case for T_IRQ0 + IRQ_TIMER.
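The value written to the LVT Timer Register can be composed and decoded as follows. The constants come from lapic.c and traps.h; struct lvt_fields and the two functions are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MASKED    0x00010000u   /* bit 16: interrupt masked */
#define PERIODIC  0x00020000u   /* bit 17: periodic timer mode */
#define T_IRQ0    32
#define IRQ_TIMER 0

struct lvt_fields {
    uint32_t vector;   /* bits 0-7 */
    int masked;        /* bit 16 */
    int periodic;      /* bit 17 */
};

/* Build the LVT Timer value for a periodic timer on the given vector. */
static uint32_t lvt_timer_value(uint32_t vector)
{
    assert(vector <= 0xFF);
    return PERIODIC | vector;
}

/* Split an LVT entry into the fields discussed in the text. */
static struct lvt_fields lvt_decode(uint32_t entry)
{
    struct lvt_fields f;
    f.vector = entry & 0xFF;
    f.masked = (entry & MASKED) != 0;
    f.periodic = (entry & PERIODIC) != 0;
    return f;
}
```

lvt_timer_value(T_IRQ0 + IRQ_TIMER) yields 0x20020: vector 0x20, not masked, periodic.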

Disable logical interrupt lines

Moving on.

0x00010000 is set at (0x0350/4) and (0x0360/4) respectively.

// Disable logical interrupt lines.
lapicw(LINT0, MASKED);
lapicw(LINT1, MASKED);

First, (0x0350/4) and (0x0360/4) are the LVT LINT0 Register and LVT LINT1 Register.

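Both are LVT entries that define how interrupts arriving on the LINT0 and LINT1 pins are delivered.

Here, MASKED (0x00010000) sets bit 16, the mask bit, so interrupts from these pins are not delivered.

(The source does not say why these two are disabled. My best guess is that these pins usually carry legacy interrupt sources such as the 8259 PIC output and NMI, and xv6 receives device interrupts through the I/O APIC instead, so it masks them.)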

Disable performance counter overflow interrupts

Moving on.

// Disable performance counter overflow interrupts
// on machines that provide that interrupt entry.
if(((lapic[VER]>>16) & 0xFF) >= 4) lapicw(PCINT, MASKED);

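This masks PCINT, the LVT Performance Counter Register, which generates an interrupt when a performance counter overflows, but only under a specific condition.

That condition is that the value obtained by right-shifting the Local APIC Version Register by 16 bits and taking the lower 8 bits is 4 or greater.

Bits 16 to 23 of the Version Register hold the Max LVT Entry field, which is the number of LVT entries minus 1. A value of 4 or greater means the LVT includes the Performance Counter entry, so the check avoids writing to a register that older Local APICs do not have.

In any case, when I stepped through this in the debugger, the lapicw call on this line was not executed, so I will proceed on the assumption that xv6OS does not mask the LVT Performance Counter Register in my environment.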
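The condition in this if statement can be unpacked into small helpers. This is a sketch with hypothetical function names, and the register values in the example are made up for illustration rather than read from real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 8 bits of the Version Register: the APIC version number. */
static uint32_t lapic_version(uint32_t ver) { return ver & 0xFF; }

/* Bits 16-23: the value the xv6 code extracts with (ver >> 16) & 0xFF. */
static uint32_t lapic_max_lvt(uint32_t ver) { return (ver >> 16) & 0xFF; }

/* Same test as in lapicinit: mask PCINT only when this is true. */
static int has_pcint_entry(uint32_t ver) { return lapic_max_lvt(ver) >= 4; }

/* Usage example with made-up register values. */
static void version_examples(void)
{
    assert(has_pcint_entry(0x00050014u));    /* field value 5 */
    assert(!has_pcint_entry(0x00030010u));   /* field value 3 */
    assert(lapic_version(0x00050014u) == 0x14);
}
```

In the first example value the extracted field is 5, so the PCINT write would run; in the second it is 3, so it would be skipped.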

Setting the Error Register

The interrupt vector for the LVT Error Register is set here. The value is T_IRQ0 + IRQ_ERROR = 32 + 19 = 51 (0x33).

// Map error interrupt to IRQ_ERROR.
lapicw(ERROR, T_IRQ0 + IRQ_ERROR);

Clearing the ESR

The ESR is cleared here.

// Clear error status register (requires back-to-back writes).
lapicw(ESR, 0);
lapicw(ESR, 0);

ESR stands for Error Status Register; bits are set when an error occurs.

The corresponding bits are shown in the following diagram.

[Figure omitted: ESR bit layout. Reference image: Intel SDM vol3]

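Why it is written twice is not explained in the source. According to the Intel SDM, a write to the ESR clears the previously logged errors and loads the register with any errors detected since the last write, so my understanding is that the second write is needed to clear whatever the first write latched.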

Checking the EOI

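EOI stands for End of Interrupt, a signal sent to the interrupt controller to indicate that handling of a specific interrupt has completed. Here the controller is the Local APIC, and the signal is sent by writing to its EOI register.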

// Ack any outstanding interrupts.
lapicw(EOI, 0);

Reference: End of interrupt - Wikipedia

For compatibility reasons, the value written to the EOI register should be 0.

Setting the Interrupt Command Register

Both ICRHI and ICRLO are Interrupt Command Registers.

These two registers are used to send inter-processor interrupts (IPIs) to CPUs.

Note that writing to ICRLO (0x0300/4) sends the interrupt, but writing to ICRHI (0x0310/4) does not, so values are set in the order ICRHI then ICRLO.

// Send an Init Level De-Assert to synchronise arbitration ID's.
lapicw(ICRHI, 0);
lapicw(ICRLO, BCAST | INIT | LEVEL);
while(lapic[ICRLO] & DELIVS);

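Here, an INIT Level De-assert message is sent to all Local APICs in the system to synchronize their arbitration IDs. The while loop then waits until the delivery status bit (DELIVS) in ICRLO clears, which means the message has been sent.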
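The value BCAST | INIT | LEVEL packs several fields into one 32-bit word. The following sketch decodes it using the bit positions implied by the constants in lapic.c; struct icr_fields and icr_decode are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define INIT    0x00000500u   /* delivery mode 5 in bits 8-10 */
#define DELIVS  0x00001000u   /* bit 12: delivery status (send pending) */
#define ASSERT  0x00004000u   /* bit 14: level assert (0 = de-assert) */
#define LEVEL   0x00008000u   /* bit 15: level triggered */
#define BCAST   0x00080000u   /* bits 18-19 = 2: all including self */

struct icr_fields {
    uint32_t delivery_mode;   /* bits 8-10 */
    int pending;              /* bit 12 */
    int level_assert;         /* bit 14 */
    int level_triggered;      /* bit 15 */
    uint32_t shorthand;       /* bits 18-19 */
};

/* Split the low half of the Interrupt Command Register into fields. */
static struct icr_fields icr_decode(uint32_t icrlo)
{
    struct icr_fields f;
    f.delivery_mode = (icrlo >> 8) & 0x7;
    f.pending = (icrlo & DELIVS) != 0;
    f.level_assert = (icrlo & ASSERT) != 0;
    f.level_triggered = (icrlo & LEVEL) != 0;
    f.shorthand = (icrlo >> 18) & 0x3;
    return f;
}
```

Decoding 0x88500 gives delivery mode 5 (INIT), level de-asserted, level triggered, and destination shorthand 2. Leaving ASSERT out of the OR is what makes this a de-assert message.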

[Figure omitted. Reference image: Intel SDM vol3]

Finally, setting the TPR (Task Priority Register) to 0 allows the CPU to handle all interrupts, enabling the interrupt functionality.

// Enable interrupts on the APIC (but not on the processor).
lapicw(TPR, 0);

Incidentally, the priority class is held in bits 4 to 7 of the TPR, and setting that class to 15 disables all interrupts.
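A simplified model of this filtering is sketched below. It assumes that an interrupt's priority class is the upper 4 bits of its vector and that it is compared against the upper 4 bits of the TPR; interrupts already in service, which the real hardware also takes into account, are ignored. The name tpr_allows is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns non-zero if an interrupt with this vector would pass the
   task-priority check under the simplified model described above. */
static int tpr_allows(uint32_t tpr, uint32_t vector)
{
    assert(tpr <= 0xFF && vector <= 0xFF);
    uint32_t tpr_class = (tpr >> 4) & 0xF;
    uint32_t vec_class = (vector >> 4) & 0xF;
    return vec_class > tpr_class;
}
```

With the TPR at 0, the timer vector 0x20 (class 2) passes. With the class set to 15 (for example TPR = 0xF0), even vector 0xFF is blocked.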

Summary

With this, all processing inside the lapicinit function called from main is complete.

There were many parts I didn’t fully understand this time, so I plan to add notes as I figure things out.

Next time: segment descriptors at last.

It’s becoming more and more interesting as the inner workings of the kernel start to come into focus.
