Reading xv6OS Thoroughly to Fully Understand the Kernel - Multiprocessor Edition -

This page has been machine-translated from the original page.

Inspired by An Introduction to OS Code Reading: Learning Kernel Internals with UNIX V6, I’m reading xv6 OS.

Because UNIX V6 itself does not run on x86 CPUs, I decided to read the source of kash1064/xv6-public, a fork of the xv6 OS repository. xv6 is a reimplementation of UNIX V6 for the x86 architecture.

In the previous article, I looked at the behavior of the kvmalloc function and its page table allocation executed from main.

This time, I will trace the behavior of the mpinit function.

The mpinit function
Summary
Reference Books

The `mpinit` function

The mpinit function is the following function defined in mp.c.

“mp” presumably stands for multiprocessor: mpinit is the function whose role is to detect the other processors.

void mpinit(void)
{
  uchar *p, *e;
  int ismp;
  struct mp *mp;
  struct mpconf *conf;
  struct mpproc *proc;
  struct mpioapic *ioapic;

  if((conf = mpconfig(&mp)) == 0) panic("Expect to run on an SMP");
  ismp = 1;
  lapic = (uint*)conf->lapicaddr;
  for(p=(uchar*)(conf+1), e=(uchar*)conf+conf->length; p<e; ){
    switch(*p){
    case MPPROC:
      proc = (struct mpproc*)p;
      if(ncpu < NCPU) {
        cpus[ncpu].apicid = proc->apicid;  // apicid may differ from ncpu
        ncpu++;
      }
      p += sizeof(struct mpproc);
      continue;
    case MPIOAPIC:
      ioapic = (struct mpioapic*)p;
      ioapicid = ioapic->apicno;
      p += sizeof(struct mpioapic);
      continue;
    case MPBUS:
    case MPIOINTR:
    case MPLINTR:
      p += 8;
      continue;
    default:
      ismp = 0;
      break;
    }
  }
  if(!ismp)
    panic("Didn't find a suitable machine");

  if(mp->imcrp){
    // Bochs doesn't support IMCR, so this doesn't run on Bochs.
    // But it would on real hardware.
    outb(0x22, 0x70);   // Select IMCR
    outb(0x23, inb(0x23) | 1);  // Mask external interrupts.
  }
}

Let’s read through the source code step by step.

Declaring structure variables

At the start of the function, several local variables are declared, most of them pointers to structures.

uchar *p, *e;
int ismp;

struct mp *mp;
struct mpconf *conf;
struct mpproc *proc;
struct mpioapic *ioapic;

The structure types used here are all defined in mp.h.

The structure definitions are as follows. I will skip the details for now and cover them when the source code actually uses them.

// See MultiProcessor Specification Version 1.[14]

struct mp {             // floating pointer
  uchar signature[4];           // "_MP_"
  void *physaddr;               // phys addr of MP config table
  uchar length;                 // 1
  uchar specrev;                // [14]
  uchar checksum;               // all bytes must add up to 0
  uchar type;                   // MP system config type
  uchar imcrp;
  uchar reserved[3];
};

struct mpconf {         // configuration table header
  uchar signature[4];           // "PCMP"
  ushort length;                // total table length
  uchar version;                // [14]
  uchar checksum;               // all bytes must add up to 0
  uchar product[20];            // product id
  uint *oemtable;               // OEM table pointer
  ushort oemlength;             // OEM table length
  ushort entry;                 // entry count
  uint *lapicaddr;              // address of local APIC
  ushort xlength;               // extended table length
  uchar xchecksum;              // extended table checksum
  uchar reserved;
};

struct mpproc {         // processor table entry
  uchar type;                   // entry type (0)
  uchar apicid;                 // local APIC id
  uchar version;                // local APIC verison
  uchar flags;                  // CPU flags
    #define MPBOOT 0x02           // This proc is the bootstrap processor.
  uchar signature[4];           // CPU signature
  uint feature;                 // feature flags from CPUID instruction
  uchar reserved[8];
};

struct mpioapic {       // I/O APIC table entry
  uchar type;                   // entry type (2)
  uchar apicno;                 // I/O APIC id
  uchar version;                // I/O APIC version
  uchar flags;                  // I/O APIC flags
  uint *addr;                  // I/O APIC address
};

About the MP specification

Before reading the source code, let me summarize the MP specification.

The MP table is the mechanism, defined by Intel’s MultiProcessor Specification, through which an OS on an x86 machine obtains multiprocessor information.

It describes the multiprocessor-related hardware of the system in a form the OS can read at boot time.

The following diagram, from Intel’s documentation, shows the data structure of the MP specification.

It depicts the FLOATING POINTER STRUCTURE pointing to the FIXED-LENGTH HEADER.

Reference image: Intel MultiProcessor Specification | ManualsLib

This FLOATING POINTER STRUCTURE is the MP Floating Pointer Structure, defined in xv6OS as the mp structure.

If a system has an MP Floating Pointer Structure, it means the system conforms to the MP specification.

The MP Floating Pointer Structure contains the following information:

A pointer to the MP Configuration Table
MP feature information bytes (for example, the default configuration type and the IMCR-present flag)

The MP Configuration Table is defined as the mpconf structure in xv6OS.

The mpconfig function described later first retrieves the MP Floating Pointer Structure and then uses it to obtain the MP Configuration Table.

Now, how does the OS find the MP Floating Pointer Structure? According to the Intel specification, the MP Floating Pointer Structure is defined to exist in one of the following locations, so the OS searches these locations to check whether one is present:

Within the first 1 KiB of the Extended BIOS Data Area (EBDA)
Within the last 1 KiB of the system base memory
In the BIOS ROM address space between 0x0F0000 and 0x0FFFFF

In xv6OS, the mpsearch and mpsearch1 functions perform the search in the above regions.

These functions are described below.

Next, the MP Configuration Table. According to the specification, it is optional when the system uses one of the default configurations.

In that case the table does not need to be provided; however, it becomes required when the number of CPUs may vary. (In practice it is effectively required for any general-purpose OS.)

The MP Configuration Table contains configuration information about the APIC, processors, buses, and interrupts.

A new term has appeared: APIC, the Advanced Programmable Interrupt Controller. This is the interrupt control mechanism used in Intel’s multiprocessor systems.

I plan to look at APIC in more detail when configuring the interrupt controller in xv6OS.

Reference: APIC - OSDev Wiki

Since I could not find much useful information about the MP specification from web pages or books, reading Intel’s specification document directly is probably the fastest way to learn more.

Reference: Chapter 4 MP Configuration Table; MP Configuration Data Structures - Intel MultiProcessor Specification [Page 37] | ManualsLib

Obtaining the MP Floating Pointer Structure

With that background, let’s look at the following code:

if((conf = mpconfig(&mp)) == 0) panic("Expect to run on an SMP");

The address of the pointer variable mp (of type struct mp *) is passed to mpconfig, and the return value is stored in conf, a pointer to struct mpconf.

This line initializes both mp and conf, and panics if no usable MP configuration is found, that is, if the machine does not look like an SMP system.

SMP stands for Symmetric Multiprocessing (sometimes read as Shared-Memory Multiprocessing): essentially a multiprocessor system in which multiple CPUs share memory resources.

Reference: Symmetric multiprocessing - Wikipedia

Reference: Symmetric Multiprocessing - OSDev Wiki

Let’s look at the source code of mpconfig.

mpconfig takes the address of the mp pointer declared in mpinit as its argument, and returns a pointer to an mpconf structure.

This function searches for the MP table, stores the location of the MP Floating Pointer Structure through its argument, and returns the location of the MP Configuration Table.

// Search for an MP configuration table.  For now,
// don't accept the default configurations (physaddr == 0).
// Check for correct signature, calculate the checksum and,
// if correct, check the version.
// To do: check extended table checksum.
static struct mpconf* mpconfig(struct mp **pmp)
{
  struct mpconf *conf;
  struct mp *mp;

  if((mp = mpsearch()) == 0 || mp->physaddr == 0) return 0;
  conf = (struct mpconf*) P2V((uint) mp->physaddr);
  if(memcmp(conf, "PCMP", 4) != 0) return 0;
  if(conf->version != 1 && conf->version != 4) return 0;
  if(sum((uchar*)conf, conf->length) != 0) return 0;
  *pmp = mp;
  return conf;
}
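The way mpconfig hands back two results, one as its return value and one through a pointer-to-pointer argument, can be sketched in isolation. This is a minimal illustration, not xv6 code: struct fp, struct table, find_config, and the present flag are hypothetical stand-ins for struct mp, struct mpconf, mpconfig, and the outcome of the real search.

```c
#include <assert.h>
#include <stddef.h>

struct fp    { int imcrp;   };   /* hypothetical stand-in for struct mp     */
struct table { int version; };   /* hypothetical stand-in for struct mpconf */

static struct fp    g_fp    = { 1 };
static struct table g_table = { 4 };

/* Returns the table and stores the floating pointer through *pfp.
 * On failure it returns NULL and leaves *pfp untouched, which is
 * also what mpconfig does when one of its checks fails. */
static struct table *find_config(struct fp **pfp, int present)
{
    assert(pfp != NULL);
    if (!present)
        return NULL;
    *pfp = &g_fp;
    return &g_table;
}
```

A caller declares `struct fp *fp;`, calls `find_config(&fp, 1)`, and may use fp only when the return value is not NULL, which mirrors how mpinit uses mp only after the panic check.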

The mp structure refers to the MP Floating Pointer Structure described above.

At the point mpconfig is called, the system’s MP Floating Pointer Structure has not yet been obtained, so the search must be performed first.

This is done by calling mpsearch and mpsearch1.

// Look for an MP structure in the len bytes at addr.
static struct mp* mpsearch1(uint a, int len)
{
  uchar *e, *p, *addr;
  addr = P2V(a);
  e = addr+len;
  for(p = addr; p < e; p += sizeof(struct mp))
  {
    if(memcmp(p, "_MP_", 4) == 0 && sum(p, sizeof(struct mp)) == 0) return (struct mp*)p;
  }
  return 0;
}

// Search for the MP Floating Pointer Structure, which according to the
// spec is in one of the following three locations:
// 1) in the first KB of the EBDA;
// 2) in the last KB of system base memory;
// 3) in the BIOS ROM between 0xE0000 and 0xFFFFF.
static struct mp* mpsearch(void)
{
  uchar *bda;
  uint p;
  struct mp *mp;

  bda = (uchar *) P2V(0x400);
  if((p = ((bda[0x0F]<<8)| bda[0x0E]) << 4)){
    if((mp = mpsearch1(p, 1024))) return mp;
  } else {
    p = ((bda[0x14]<<8)|bda[0x13])*1024;
    if((mp = mpsearch1(p-1024, 1024))) return mp;
  }
  return mpsearch1(0xF0000, 0x10000);
}

As noted above, if the MP Floating Pointer Structure is present, it is located in one of the following:

Within the first 1 KiB of the Extended BIOS Data Area (EBDA)
Within the last 1 KiB of the system base memory
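In the BIOS ROM address space between 0x0F0000 and 0x0FFFFF

These areas are searched, and if the MP Floating Pointer Structure is found, its address is returned and eventually stored in mp. Note that the last 1 KiB of base memory is searched only when the BIOS Data Area holds no EBDA address, and that for the BIOS ROM the code searches the 64 KiB starting at 0xF0000, which matches the specification even though the source comment mentions 0xE0000.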
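The address arithmetic in mpsearch is compact, so here is a small sketch of it on its own. It assumes a copy of the 256-byte BIOS Data Area (physical address 0x400) is available as a byte array; the function names ebda_phys and base_mem_search_start are hypothetical and do not appear in xv6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 16-bit word at BDA offset 0x0E is the EBDA segment.
 * A real-mode segment becomes a physical address when shifted left by 4. */
static uint32_t ebda_phys(const uint8_t *bda)
{
    assert(bda != NULL);
    uint32_t seg = ((uint32_t)bda[0x0F] << 8) | bda[0x0E];
    return seg << 4;
}

/* The 16-bit word at BDA offset 0x13 is the base memory size in KiB.
 * The search covers the last KiB, so it starts 1024 bytes below the top. */
static uint32_t base_mem_search_start(const uint8_t *bda)
{
    assert(bda != NULL);
    uint32_t kb = ((uint32_t)bda[0x14] << 8) | bda[0x13];
    assert(kb >= 1);             /* guard added by this sketch */
    return kb * 1024 - 1024;
}
```

For example, an EBDA segment of 0x9FC0 gives the physical address 0x9FC00, and a base memory size of 639 KiB gives a search start of 0x9F800.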

If the MP Floating Pointer Structure is not found, or if the address of the MP Configuration Table it holds is 0 (which indicates one of the default configurations, and xv6 does not accept those), mpconfig returns 0 and mpinit then stops the kernel with panic.

if((mp = mpsearch()) == 0 || mp->physaddr == 0) return 0;

The MP Floating Pointer Structure has the following layout:

struct mp {             // floating pointer
  uchar signature[4];           // "_MP_"
  void *physaddr;               // phys addr of MP config table
  uchar length;                 // 1
  uchar specrev;                // [14]
  uchar checksum;               // all bytes must add up to 0
  uchar type;                   // MP system config type
  uchar imcrp;
  uchar reserved[3];
};

The definition of the mp structure is as above, but the diagram from the Intel specification is easier to visualize, so I am including it here.

Reference image: Intel MultiProcessor Specification | ManualsLib

The first 4 bytes of SIGNATURE are expected to contain _MP_.

When mpsearch and mpsearch1 perform their search, they look for this SIGNATURE.
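The search itself can be tried out on an ordinary buffer. The following is a sketch in the spirit of mpsearch1, not a copy of it: find_fp, byte_sum, and FP_SIZE are hypothetical names, the buffer stands in for a region of physical memory, and the function returns an offset instead of a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FP_SIZE 16   /* size of the floating pointer structure on 32-bit x86 */

/* Sum of n bytes, truncated to 8 bits. */
static uint8_t byte_sum(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint8_t)(s + p[i]);
    return s;
}

/* Step through buf in 16-byte strides; accept a candidate only when it
 * starts with "_MP_" and all 16 bytes add up to 0.
 * Returns the offset of the first match, or -1 if there is none. */
static long find_fp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + FP_SIZE <= len; off += FP_SIZE) {
        if (memcmp(buf + off, "_MP_", 4) == 0 &&
            byte_sum(buf + off, FP_SIZE) == 0)
            return (long)off;
    }
    return -1;
}
```

The signature alone is not enough: a candidate whose bytes do not sum to zero is skipped, so a stray "_MP_" string in memory is unlikely to be mistaken for the real structure.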

physaddr holds the address of the MP Configuration Table; from here we use that information to obtain the MP Configuration Table.

Obtaining the MP Configuration Table

After obtaining the MP Floating Pointer Structure, the physical address of the MP Configuration Table held in physaddr is converted to a virtual address with P2V and stored in conf, a pointer to struct mpconf.

struct mpconf *conf;
conf = (struct mpconf*) P2V((uint) mp->physaddr);

if(memcmp(conf, "PCMP", 4) != 0) return 0;
if(conf->version != 1 && conf->version != 4) return 0;
if(sum((uchar*)conf, conf->length) != 0) return 0;
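P2V is what turns the physical address from the table into something the kernel can dereference. xv6 defines KERNBASE as 0x80000000 in memlayout.h, and P2V adds it to a physical address. The sketch below redoes that arithmetic on plain 32-bit integers so it can run anywhere; p2v and v2p are hypothetical lowercase helpers, not the xv6 macros.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000u   /* first kernel virtual address in xv6 */

/* Physical to kernel virtual address: add KERNBASE. */
static uint32_t p2v(uint32_t pa)
{
    assert(pa < KERNBASE);     /* guard added by this sketch */
    return pa + KERNBASE;
}

/* Kernel virtual to physical address: subtract KERNBASE. */
static uint32_t v2p(uint32_t va)
{
    assert(va >= KERNBASE);
    return va - KERNBASE;
}
```

With this mapping, the BIOS ROM area at physical 0xF0000 is visible to the kernel at 0x800F0000, which is why mpsearch1 converts its argument before reading memory.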

The MP Configuration Table has the following structure:

struct mpconf {         // configuration table header
  uchar signature[4];           // "PCMP"
  ushort length;                // total table length
  uchar version;                // [14]
  uchar checksum;               // all bytes must add up to 0
  uchar product[20];            // product id
  uint *oemtable;               // OEM table pointer
  ushort oemlength;             // OEM table length
  ushort entry;                 // entry count
  uint *lapicaddr;              // address of local APIC
  ushort xlength;               // extended table length
  uchar xchecksum;              // extended table checksum
  uchar reserved;
};

The following is a structural diagram quoted from the Intel specification.

Reference image: Intel MultiProcessor Specification | ManualsLib

The first 4 bytes hold the SIGNATURE, which is expected to be PCMP.

In xv6OS, the memcmp function checks whether the first 4 bytes of the retrieved MP Configuration Table match PCMP.

if(memcmp(conf, "PCMP", 4) != 0) return 0;
if(conf->version != 1 && conf->version != 4) return 0;
if(sum((uchar*)conf, conf->length) != 0) return 0;

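The other two checks confirm that the version is one the code supports (1 or 4, corresponding to specification revisions 1.1 and 1.4) and that the checksum is valid: all conf->length bytes of the table must add up to 0 modulo 256.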
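The three header checks can be exercised on a byte buffer. This is a sketch under stated assumptions rather than the xv6 function: config_ok and the offset constants are hypothetical names, the offsets follow the mpconf layout shown above (length at byte 4, version at byte 6, 44-byte header), and the bounds checks against avail are an addition that mpconfig itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets inside the configuration table header. */
enum { OFF_LENGTH = 4, OFF_VERSION = 6, HDR_SIZE = 44 };

/* Returns 1 when the table passes the signature, version, and
 * checksum tests, and 0 otherwise. */
static int config_ok(const uint8_t *t, size_t avail)
{
    assert(t != NULL);
    if (avail < HDR_SIZE || memcmp(t, "PCMP", 4) != 0)
        return 0;
    /* The length field is a little-endian 16-bit value. */
    size_t length = (size_t)t[OFF_LENGTH] | ((size_t)t[OFF_LENGTH + 1] << 8);
    if (length < HDR_SIZE || length > avail)
        return 0;
    if (t[OFF_VERSION] != 1 && t[OFF_VERSION] != 4)
        return 0;
    uint8_t s = 0;
    for (size_t i = 0; i < length; i++)
        s = (uint8_t)(s + t[i]);
    return s == 0;
}
```

The checksum byte at offset 7 is chosen by whoever builds the table so that the total comes out to zero; changing any single byte afterwards makes the check fail.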

With this, the following processing in mpinit is complete, and both the MP Floating Pointer Structure and the MP Configuration Table have been obtained.

if((conf = mpconfig(&mp)) == 0) panic("Expect to run on an SMP");

Obtaining the IOAPIC from the MP Configuration Table

Next, the address obtained from lapicaddr in the MP Configuration Table is stored into lapic, and ismp is initialized.

int ismp;
struct mpioapic *ioapic;

ismp = 1;
lapic = (uint*)conf->lapicaddr;

The mpioapic structure is as follows:

struct mpioapic {       // I/O APIC table entry
  uchar type;                   // entry type (2)
  uchar apicno;                 // I/O APIC id
  uchar version;                // I/O APIC version
  uchar flags;                  // I/O APIC flags
  uint *addr;                  // I/O APIC address
};

The Intel specification diagram is as follows.

Reference image: Intel MultiProcessor Specification | ManualsLib

The IOAPIC is a mechanism that distributes external interrupts among multiple CPUs.

I mentioned earlier that APIC is an external interrupt mechanism; there appear to be two types of APIC: the Local APIC and the IOAPIC.

In xv6OS, the Local APIC is implemented in lapic.c, and the IOAPIC is implemented in ioapic.c.

The Local APIC handles interrupts built into the CPU, while the IOAPIC receives interrupts from I/O devices and notifies the CPU based on information in the Redirection Table.

The IOAPIC holds the Redirection Table, and the CPU programs the entries of this table through memory-mapped I/O.

In simple terms, memory-mapped I/O is one way to perform input and output between the CPU and I/O devices: a region of the physical address space is reserved for I/O device input and output, and data is exchanged using the CPU’s memory read/write capabilities.
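In C, memory-mapped I/O usually looks like ordinary array access through a volatile pointer. The sketch below shows that idea with hypothetical helpers reg_write and reg_read; on real hardware base would be a device address such as the one taken from lapicaddr, while in a test an ordinary array can stand in for the register window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* volatile tells the compiler that every access has a side effect,
 * so reads and writes are neither removed nor merged. */
static void reg_write(volatile uint32_t *base, size_t index, uint32_t value)
{
    assert(base != NULL);
    base[index] = value;
}

static uint32_t reg_read(volatile uint32_t *base, size_t index)
{
    assert(base != NULL);
    return base[index];
}
```

This is similar in spirit to how the lapic pointer is indexed later in lapic.c, although a real device register, unlike the array used for testing, may return something different from what was last written.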

Reference: About I/O APIC - 睡分不足

Reference: Memory-mapped I/O - Wikipedia

In a typical system with PCI I/O devices, the IOAPIC detects changes in PCI interrupt signals and issues interrupt messages to the CPU based on the Redirection Table.

The Local APIC inside the CPU receives this information, calls the interrupt handler to process the interrupt, and then sends an EOI (End of Interrupt) command back to the IOAPIC to notify it that the interrupt has been handled.

I will cover this in more detail when I get to actually implementing interrupt handling. Probably.

Reference: APIC - Wikipedia

Reference: APIC - OSDev Wiki

Reference: IOAPIC - OSDev Wiki

Reference: 82093AA I/O ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (IOAPIC)

The APIC mechanism was born from the need for an interrupt controller that can handle interrupts in multiprocessor x86 systems.

It seems the earlier, simpler interrupt controller called the PIC could not distribute interrupts among multiple processors.

Because xv6OS assumes a multiprocessor configuration, it ignores interrupts from the PIC and implements interrupt processing using the Local APIC and IOAPIC.

Reference: P45

In the xv6OS code, the address obtained from lapicaddr in the MP Configuration Table is stored into the global variable lapic.

The value stored in lapicaddr is the address of the memory-mapped Local APIC.

The variable lapic is a global variable: it is declared extern in defs.h and defined in lapic.c.

It is not used further within this function; it will be used in lapic.c.

extern volatile uint*    lapic;

Obtaining processor information

Let’s look at the loop that comes after obtaining lapicaddr.

conf holds the MP Configuration Table retrieved earlier.

According to Intel’s specification, a variable number of MP Configuration Table entries follow immediately after the MP Configuration Table header, which sits at the starting address of the table.

In addition to the Processor Entries shown earlier, MP Configuration Table entries include Bus Entry, I/O APIC Entry, I/O Interrupt Entry, and Local Interrupt Entry. (There are also extended entries.)

Each of these has a unique Entry Type defined in its first byte.

Processor Entry

Bus Entry

I/O APIC Entry

I/O Interrupt Entry

Local Interrupt Entry

Reference image: Intel MultiProcessor Specification | ManualsLib

The following xv6OS code starts just after the MP Configuration Table header (conf+1), iterates through the entries until the end of the table (conf plus conf->length bytes), and branches according to the Entry Type in the first byte of each entry.

for(p=(uchar*)(conf+1), e=(uchar*)conf+conf->length; p<e; ){
  switch(*p){

  case MPPROC:
    proc = (struct mpproc*)p;
    if(ncpu < NCPU) {
      cpus[ncpu].apicid = proc->apicid;  // apicid may differ from ncpu
      ncpu++;
    }
    p += sizeof(struct mpproc);
    continue;

  case MPIOAPIC:
    ioapic = (struct mpioapic*)p;
    ioapicid = ioapic->apicno;
    p += sizeof(struct mpioapic);
    continue;

  case MPBUS:
  case MPIOINTR:
  case MPLINTR:
    p += 8;
    continue;

  default:
    ismp = 0;
    break;
  }
}

The entries being checked are defined as follows:

// Table entry types
#define MPPROC    0x00  // One per processor
#define MPBUS     0x01  // One per bus
#define MPIOAPIC  0x02  // One per I/O APIC
#define MPIOINTR  0x03  // One per bus interrupt source
#define MPLINTR   0x04  // One per system interrupt source
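To see the walk in action without a real machine, the loop can be run over a hand-built byte array. This is a sketch, not the xv6 loop: walk_entries, struct walk_result, and the T_* constants are hypothetical names; entry sizes are written as the constants 20 and 8; and, unlike the listing, where break only leaves the switch, the sketch stops walking when it meets an unknown type or a truncated entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { T_PROC = 0, T_BUS = 1, T_IOAPIC = 2, T_IOINTR = 3, T_LINTR = 4 };
enum { PROC_SIZE = 20, SMALL_SIZE = 8, MAX_CPU = 8 };

struct walk_result {
    int     ncpu;              /* processors recorded, capped at MAX_CPU */
    uint8_t apicid[MAX_CPU];   /* Local APIC id of each recorded CPU     */
    int     ioapicid;          /* id of the last I/O APIC seen, or -1    */
    int     ok;                /* 0 if the table could not be walked     */
};

static struct walk_result walk_entries(const uint8_t *p, size_t len)
{
    assert(p != NULL);
    struct walk_result r = { 0, {0}, -1, 1 };
    const uint8_t *e = p + len;
    while (p < e) {
        /* Processor entries are 20 bytes; all other base entries are 8. */
        size_t need = (*p == T_PROC) ? PROC_SIZE : SMALL_SIZE;
        if (*p > T_LINTR || (size_t)(e - p) < need) {
            r.ok = 0;
            break;
        }
        if (*p == T_PROC && r.ncpu < MAX_CPU)
            r.apicid[r.ncpu++] = p[1];   /* byte 1 is the Local APIC id */
        else if (*p == T_IOAPIC)
            r.ioapicid = p[1];           /* byte 1 is the I/O APIC id   */
        p += need;
    }
    return r;
}
```

Feeding it two processor entries followed by a bus entry, an I/O APIC entry, and an I/O interrupt entry yields ncpu equal to 2 and the I/O APIC id from the fourth entry.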

Reading Processor Entry information

First is the case for Processor Entry.

Here, the Processor Entry is read as an mpproc structure, and its Local APIC ID is stored in the next free element of the cpus array. Once NCPU processors have been recorded, further Processor Entries are skipped.

case MPPROC:
  proc = (struct mpproc*)p;
  if(ncpu < NCPU) {
    cpus[ncpu].apicid = proc->apicid;  // apicid may differ from ncpu
    ncpu++;
  }
  p += sizeof(struct mpproc);
  continue;

Incidentally, NCPU used here is defined as a constant in param.h as follows.

It appears xv6OS supports up to 8 CPUs.

#define NCPU      8  // maximum number of CPUs

Reading IOAPIC information

Next, IOAPIC information is obtained.

case MPIOAPIC:
  ioapic = (struct mpioapic*)p;
  ioapicid = ioapic->apicno;
  p += sizeof(struct mpioapic);
  continue;

I will skip the details of each structure here since they were covered earlier.
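One detail worth checking is that advancing p by sizeof only works if the C structures have exactly the sizes the specification gives the entries: 20 bytes for a Processor Entry and 8 bytes for the others. The sketch below restates the two structures with fixed-width types so the sizes can be checked on any machine; mpproc32, mpioapic32, and entry_size are hypothetical names, and the address field is a uint32_t here because a pointer would be 8 bytes on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mpproc32 {            /* processor table entry, 20 bytes */
    uint8_t  type, apicid, version, flags;
    uint8_t  signature[4];
    uint32_t feature;
    uint8_t  reserved[8];
};

struct mpioapic32 {          /* I/O APIC table entry, 8 bytes */
    uint8_t  type, apicno, version, flags;
    uint32_t addr;           /* 32-bit physical address, not a pointer */
};

/* Number of bytes to skip for a base table entry of the given type. */
static size_t entry_size(uint8_t type)
{
    assert(type <= 4);
    if (type == 0)
        return sizeof(struct mpproc32);
    if (type == 2)
        return sizeof(struct mpioapic32);
    return 8;                /* bus, I/O interrupt, local interrupt */
}
```

On the 32-bit x86 target that xv6 is built for, pointers are 4 bytes, so the original definitions with uint * come out to the same sizes.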

Modifying the IMCR

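With this, the processing in mp.c is nearly complete.

Finally, if the imcrp field of the MP Floating Pointer Structure is nonzero, the IMCR is modified.

IMCR stands for Interrupt Mode Configuration Register. On systems that implement it, the register is selected by writing 0x70 to port 0x22 and is then accessed through port 0x23; setting its lowest bit appears to be required in order to switch away from PIC mode, so that interrupts are delivered through the APIC.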
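The select-then-write sequence can be followed with a tiny simulation. Everything here is hypothetical scaffolding: sim_outb and sim_inb imitate the two ports with static variables in place of the real outb and inb instructions, and leave_pic_mode repeats the steps from the end of mpinit against them.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t imcr_selected;   /* last value written to port 0x22 */
static uint8_t imcr_value;      /* simulated contents of the IMCR  */

static void sim_outb(uint16_t port, uint8_t data)
{
    if (port == 0x22)
        imcr_selected = data;
    else if (port == 0x23 && imcr_selected == 0x70)
        imcr_value = data;
}

static uint8_t sim_inb(uint16_t port)
{
    if (port == 0x23 && imcr_selected == 0x70)
        return imcr_value;
    return 0xFF;                /* nothing is selected at this port */
}

/* Same steps as the end of mpinit: act only if the IMCR is present,
 * select it, then set bit 0 while keeping the other bits. */
static void leave_pic_mode(uint8_t imcrp)
{
    if (!imcrp)
        return;
    sim_outb(0x22, 0x70);
    sim_outb(0x23, (uint8_t)(sim_inb(0x23) | 1));
    assert(sim_inb(0x23) & 1);  /* postcondition: bit 0 is now set */
}
```

The read-modify-write on port 0x23 is what preserves any other bits of the register, which is why the original code calls inb before outb.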

Reference: x86 - Where is the IMCR defined in the docs? - Reverse Engineering Stack Exchange

Reference: OSDev.org • View topic - Set IMCR to 0x1 to mask external interrupts?

Reference: Default Configurations; Symmetric I/O Mode - Intel MultiProcessor Specification [Page 31] | ManualsLib

Summary

Next time I will start with the lapicinit function.

It seems the information obtained this time will be used to implement the interrupt controller.

My head is starting to struggle to keep up, so I will try to buckle down.

Reference Books

Published Jan 31, 2022

Aspiring Reverse Engineer and CTF Player (Team: 0nePadding). Passionate about WinDbg and Anti-Virus internals. OSCP / CISSP. Working at Microsoft Japan, but all views expressed are my own.

かしわば (@kash1064) on Twitter
