{"componentChunkName":"component---src-templates-post-template-js","path":"/unix-xv6-001-bootstrap-en","result":{"data":{"markdownRemark":{"id":"57b3d29f-2cc5-517f-aa17-24efab75f771","html":"<blockquote>\n<p>This page has been machine-translated from the <a href=\"/unix-xv6-001-bootstrap\">original page</a>.</p>\n</blockquote>\n<p>Inspired by <a href=\"https://amzn.to/3q8TU3K\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Reading OS Code for the First Time ~Learning Kernel Mechanisms with UNIX V6~</a>, I have been reading <a href=\"https://github.com/mit-pdos/xv6-public\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">xv6 OS</a>.</p>\n<p>I want to strengthen my reverse engineering skills and deepen my understanding of kernels and operating systems.</p>\n<p><a href=\"https://amzn.to/3I6fkVt\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Understanding the Linux Kernel</a> was quite heavy, so I was looking for something lighter to start with. I came across UNIX V6, whose codebase totals around 10,000 lines, small enough for one person to read in full, and became interested.</p>\n<p>However, since UNIX V6 itself does not run on x86 CPUs, I decided to read the source code of <a href=\"https://github.com/kash1064/xv6-public\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">kash1064/xv6-public: xv6 OS</a>, which is my fork of <a href=\"https://github.com/mit-pdos/xv6-public\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">xv6 OS</a>, a reimplementation of UNIX V6 in ANSI C for the x86 architecture.</p>\n<p>xv6 was originally developed as an educational OS for MIT’s OS course.</p>\n<p>Reference: <a href=\"https://pdos.csail.mit.edu/6.828/2012/xv6.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Xv6, a simple Unix-like teaching operating system</a></p>\n<p>The textbook used in this course is also distributed online.</p>\n<p>Reference: <a href=\"https://pdos.csail.mit.edu/6.828/2012/xv6/book-rev7.pdf\" 
target=\"_blank\" rel=\"nofollow noopener noreferrer\">xv6 a simple, Unix-like teaching operating system</a></p>\n<p>The upstream repository is no longer maintained; active maintenance has moved to a RISC-V version of xv6.</p>\n<!-- omit in toc -->\n<h2 id=\"table-of-contents\" style=\"position:relative;\"><a href=\"#table-of-contents\" aria-label=\"table of contents permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Table of Contents</h2>\n<ul>\n<li>\n<p><a href=\"#image-file-structure\">Image File Structure</a></p>\n<ul>\n<li><a href=\"#xv6img\">xv6.img</a></li>\n<li><a href=\"#fsimg\">fs.img</a></li>\n</ul>\n</li>\n<li>\n<p><a href=\"#reading-bootblock\">Reading bootblock</a></p>\n<ul>\n<li><a href=\"#compiler\">Compiler</a></li>\n<li><a href=\"#note-position-independent-code-pic\">Note: Position-Independent Code (PIC)</a></li>\n<li><a href=\"#overview-of-bootmainc\">Overview of bootmain.c</a></li>\n<li><a href=\"#overview-of-bootblocko\">Overview of bootblock.o</a></li>\n<li><a href=\"#linking-the-boot-program\">Linking the Boot Program</a></li>\n<li><a href=\"#real-mode-and-protected-mode\">Real Mode and Protected Mode</a></li>\n<li><a href=\"#booting-in-real-mode\">Booting in Real Mode</a></li>\n<li><a href=\"#disabling-cpu-interrupts-with-cli-and-sti\">Disabling CPU Interrupts with cli and sti</a></li>\n<li><a href=\"#initializing-segment-registers\">Initializing Segment Registers</a></li>\n<li><a href=\"#enabling-the-a20-line\">Enabling the A20 Line</a></li>\n<li><a 
href=\"#switching-to-protected-mode\">Switching to Protected Mode</a></li>\n<li><a href=\"#memory-address-references-in-protected-mode\">Memory Address References in Protected Mode</a></li>\n<li><a href=\"#lgdt-gdtdesc\">lgdt gdtdesc</a></li>\n<li><a href=\"#starting-32-bit-mode\">Starting 32-bit Mode</a></li>\n<li><a href=\"#why-use-the-ljmp-instruction\">Why Use the ljmp Instruction?</a></li>\n<li><a href=\"#post-protected-mode-setup\">Post-Protected-Mode Setup</a></li>\n<li><a href=\"#initializing-segment-registers-1\">Initializing Segment Registers</a></li>\n<li><a href=\"#calling-bootmainc\">Calling bootmain.c</a></li>\n<li><a href=\"#loading-the-kernel\">Loading the Kernel</a></li>\n<li><a href=\"#loading-the-elf-kernel-image-from-disk\">Loading the ELF Kernel Image from Disk</a></li>\n<li><a href=\"#reading-sectors-from-disk\">Reading Sectors from Disk</a></li>\n<li><a href=\"#verifying-the-loaded-kernel\">Verifying the Loaded Kernel</a></li>\n<li><a href=\"#loading-program-headers\">Loading Program Headers</a></li>\n</ul>\n</li>\n<li><a href=\"#summary\">Summary</a></li>\n<li><a href=\"#reference-books\">Reference Books</a></li>\n</ul>\n<h2 id=\"image-file-structure\" style=\"position:relative;\"><a href=\"#image-file-structure\" aria-label=\"image file structure permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Image File Structure</h2>\n<p>xv6 boots using two image files: <code class=\"language-text\">xv6.img</code> and <code class=\"language-text\">fs.img</code>.</p>\n<h3 
id=\"xv6img\" style=\"position:relative;\"><a href=\"#xv6img\" aria-label=\"xv6img permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>xv6.img</h3>\n<p><code class=\"language-text\">xv6.img</code> has the following structure:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\"><span class=\"token comment\"># Makefile</span>\nxv6.img: bootblock kernel\n\t<span class=\"token function\">dd</span> <span class=\"token assign-left variable\">if</span><span class=\"token operator\">=</span>/dev/zero <span class=\"token assign-left variable\">of</span><span class=\"token operator\">=</span>xv6.img <span class=\"token assign-left variable\">count</span><span class=\"token operator\">=</span><span class=\"token number\">10000</span>\n\t<span class=\"token function\">dd</span> <span class=\"token assign-left variable\">if</span><span class=\"token operator\">=</span>bootblock <span class=\"token assign-left variable\">of</span><span class=\"token operator\">=</span>xv6.img <span class=\"token assign-left variable\">conv</span><span class=\"token operator\">=</span>notrunc\n\t<span class=\"token function\">dd</span> <span class=\"token assign-left variable\">if</span><span class=\"token operator\">=</span>kernel <span class=\"token assign-left variable\">of</span><span class=\"token operator\">=</span>xv6.img <span class=\"token assign-left variable\">seek</span><span class=\"token operator\">=</span><span class=\"token number\">1</span> 
<span class=\"token assign-left variable\">conv</span><span class=\"token operator\">=</span>notrunc</code></pre></div>\n<p><code class=\"language-text\">dd if=/dev/zero of=xv6.img count=10000</code> reads 10,000 blocks of 512 bytes, that is <code class=\"language-text\">512×10^4</code> bytes (about 5.12 MB), from <code class=\"language-text\">/dev/zero</code> and writes them to <code class=\"language-text\">xv6.img</code>.</p>\n<p>The size follows from the default block size (<code class=\"language-text\">bs</code>) of the <code class=\"language-text\">dd</code> command, which is 512 bytes.</p>\n<p><code class=\"language-text\">conv=notrunc</code> tells <code class=\"language-text\">dd</code> not to truncate the output file, so the data is overwritten in place and the rest of the file is left untouched.</p>\n<p>This ensures that when the 512-byte bootblock is written to the beginning of <code class=\"language-text\">xv6.img</code>, the original size of <code class=\"language-text\">xv6.img</code> is maintained.</p>\n<p><code class=\"language-text\">seek=1</code> skips one block at the start of the output file before writing begins.</p>\n<p>Since the default block size is 512, <code class=\"language-text\">dd if=kernel of=xv6.img seek=1 conv=notrunc</code> means the kernel is placed starting at byte 512.</p>\n<p>As these commands show, first a zero-filled image file of <code class=\"language-text\">512×10^4</code> bytes (about 5.12 MB) is created, then the bootblock is placed in the first 512 bytes, followed by the kernel.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 500px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/0ee723f0df4390ef428db10febeb6cfd/0b533/image-8.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 75%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAARlAAAEZQAGA43XUAAABiElEQVQ4y8WU6XaCMBBGff8Hah+gdtGiApGqaCsgFgmgHDZ3viZxpaL1X3POPWFmwiXDVsF+bLdbRFFUII5jJEki4Me/65vNRpyb5/lBg0oQ+IjY4vVqhW9KQcwRNMsWEMNCo9dHUx+ADM1Tnq2xnAm2ZcLlcon1es12uIE9cVHTh3gfGJAGJl47AzxUa3h8qeNZ0yF9mixviDXDsYOcdXUhxNlwPAeNLxmyQRiqmMlIEygmQWufaw5lGI5ZLuTBIeGHFF0qo+8T6B5HFce7WN3nCHqeAtszWcsn4YGC0Ju5+JjI6LkEXVdlEKijBhRLOsacjsuEtCgstHwhpDshn+t6FW/dp13MZL0rwuMOC8LQFYt1ry1knL6vCQ4xr3WpgrFnYZ4tQKmLsW0jCKbgD7ggpNMJa1GCNm6hfaS5ZxfzGrElWK7BdpjfbjlOYqTzBOkiZSTlsHrG6mE4E6/bzZazbI57B/96zoWlO8yyrHC1a4hu+Nf1L8J7x13CNE3FveHzX1z72/wARuV7fAc3xmkAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/0ee723f0df4390ef428db10febeb6cfd/8ac56/image-8.webp 240w,\n/static/0ee723f0df4390ef428db10febeb6cfd/d3be9/image-8.webp 480w,\n/static/0ee723f0df4390ef428db10febeb6cfd/b0a15/image-8.webp 500w\"\n              sizes=\"(max-width: 500px) 100vw, 500px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/0ee723f0df4390ef428db10febeb6cfd/8ff5a/image-8.png 240w,\n/static/0ee723f0df4390ef428db10febeb6cfd/e85cb/image-8.png 480w,\n/static/0ee723f0df4390ef428db10febeb6cfd/0b533/image-8.png 500w\"\n            sizes=\"(max-width: 500px) 100vw, 500px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/0ee723f0df4390ef428db10febeb6cfd/0b533/image-8.png\"\n            alt=\"img\"\n            title=\"img\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>The remaining empty space is used by the system.</p>\n<h3 id=\"fsimg\" style=\"position:relative;\"><a href=\"#fsimg\" aria-label=\"fsimg permalink\" class=\"anchor before\"><svg 
aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>fs.img</h3>\n<p><code class=\"language-text\">fs.img</code> has the following structure:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\"><span class=\"token comment\"># Makefile</span>\n<span class=\"token assign-left variable\">UPROGS</span><span class=\"token operator\">=</span><span class=\"token punctuation\">\\</span>\n\t_cat<span class=\"token punctuation\">\\</span>\n\t_echo<span class=\"token punctuation\">\\</span>\n\t_forktest<span class=\"token punctuation\">\\</span>\n\t_grep<span class=\"token punctuation\">\\</span>\n\t_init<span class=\"token punctuation\">\\</span>\n\t_kill<span class=\"token punctuation\">\\</span>\n\t_ln<span class=\"token punctuation\">\\</span>\n\t_ls<span class=\"token punctuation\">\\</span>\n\t_mkdir<span class=\"token punctuation\">\\</span>\n\t_rm<span class=\"token punctuation\">\\</span>\n\t_sh<span class=\"token punctuation\">\\</span>\n\t_stressfs<span class=\"token punctuation\">\\</span>\n\t_usertests<span class=\"token punctuation\">\\</span>\n\t_wc<span class=\"token punctuation\">\\</span>\n\t_zombie<span class=\"token punctuation\">\\</span>\n\nmkfs: mkfs.c fs.h\n\tgcc -Werror -Wall -o <span class=\"token function\">mkfs</span> mkfs.c\n\nfs.img: <span class=\"token function\">mkfs</span> README <span class=\"token variable\"><span class=\"token variable\">$(</span>UPROGS<span class=\"token variable\">)</span></span>\n\t./mkfs fs.img README <span class=\"token 
variable\"><span class=\"token variable\">$(</span>UPROGS<span class=\"token variable\">)</span></span></code></pre></div>\n<p>It contains the user command program binaries and the README.</p>\n<p>This is the disk that users interact with.</p>\n<h2 id=\"reading-bootblock\" style=\"position:relative;\"><a href=\"#reading-bootblock\" aria-label=\"reading bootblock permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Reading bootblock</h2>\n<p>Let us start with bootblock.</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">bootblock: bootasm.S bootmain.c\n\t<span class=\"token variable\"><span class=\"token variable\">$(</span>CC<span class=\"token variable\">)</span></span> <span class=\"token variable\"><span class=\"token variable\">$(</span>CFLAGS<span class=\"token variable\">)</span></span> -fno-pic -O -nostdinc -I. -c bootmain.c\n\t<span class=\"token variable\"><span class=\"token variable\">$(</span>CC<span class=\"token variable\">)</span></span> <span class=\"token variable\"><span class=\"token variable\">$(</span>CFLAGS<span class=\"token variable\">)</span></span> -fno-pic -nostdinc -I. 
-c bootasm.S\n\t<span class=\"token variable\"><span class=\"token variable\">$(</span>LD<span class=\"token variable\">)</span></span> <span class=\"token variable\"><span class=\"token variable\">$(</span>LDFLAGS<span class=\"token variable\">)</span></span> -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o\n\t<span class=\"token variable\"><span class=\"token variable\">$(</span>OBJDUMP<span class=\"token variable\">)</span></span> -S bootblock.o <span class=\"token operator\">></span> bootblock.asm\n\t<span class=\"token variable\"><span class=\"token variable\">$(</span>OBJCOPY<span class=\"token variable\">)</span></span> -S -O binary -j .text bootblock.o bootblock\n\t./sign.pl bootblock</code></pre></div>\n<p>Let us start by examining the build options.</p>\n<h3 id=\"compiler\" style=\"position:relative;\"><a href=\"#compiler\" aria-label=\"compiler permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Compiler</h3>\n<p>The following is the section of the Makefile related to <code class=\"language-text\">$(CC)</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\"><span class=\"token comment\"># Cross-compiling (e.g., on Mac OS X)</span>\n<span class=\"token comment\"># TOOLPREFIX = i386-jos-elf</span>\n\n<span class=\"token comment\"># Using native tools (e.g., on X86 Linux)</span>\n<span class=\"token comment\">#TOOLPREFIX = </span>\n\n<span class=\"token comment\"># Try to infer the correct TOOLPREFIX if 
not set</span>\nifndef TOOLPREFIX\nTOOLPREFIX :<span class=\"token operator\">=</span> <span class=\"token variable\"><span class=\"token variable\">$(</span>shell <span class=\"token keyword\">if</span> i386-jos-elf-objdump -i <span class=\"token operator\"><span class=\"token file-descriptor important\">2</span>></span><span class=\"token file-descriptor important\">&amp;1</span> <span class=\"token operator\">|</span> <span class=\"token function\">grep</span> <span class=\"token string\">'^elf32-i386$$'</span> <span class=\"token operator\">></span>/dev/null <span class=\"token operator\"><span class=\"token file-descriptor important\">2</span>></span><span class=\"token file-descriptor important\">&amp;1</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token keyword\">then</span> <span class=\"token builtin class-name\">echo</span> <span class=\"token string\">'i386-jos-elf-'</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token keyword\">elif</span> objdump -i <span class=\"token operator\"><span class=\"token file-descriptor important\">2</span>></span><span class=\"token file-descriptor important\">&amp;1</span> <span class=\"token operator\">|</span> <span class=\"token function\">grep</span> <span class=\"token string\">'elf32-i386'</span> <span class=\"token operator\">></span>/dev/null <span class=\"token operator\"><span class=\"token file-descriptor important\">2</span>></span><span class=\"token file-descriptor important\">&amp;1</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token keyword\">then</span> <span class=\"token builtin class-name\">echo</span> <span class=\"token string\">''</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token keyword\">else</span> <span class=\"token builtin 
class-name\">echo</span> <span class=\"token string\">\"***\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"*** Error: Couldn't find an i386-*-elf version of GCC/binutils.\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"*** Is the directory with i386-jos-elf-gcc in your PATH?\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"*** If your i386-*-elf toolchain is installed with a command\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"*** prefix other than 'i386-jos-elf-', set your TOOLPREFIX\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"*** 
environment variable to that prefix and run 'make' again.\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"*** To turn off this error, run 'gmake TOOLPREFIX= ...'.\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token punctuation\">\\</span>\n\t<span class=\"token builtin class-name\">echo</span> <span class=\"token string\">\"***\"</span> <span class=\"token operator\"><span class=\"token file-descriptor important\">1</span>></span><span class=\"token file-descriptor important\">&amp;2</span><span class=\"token punctuation\">;</span> <span class=\"token builtin class-name\">exit</span> <span class=\"token number\">1</span><span class=\"token punctuation\">;</span> <span class=\"token keyword\">fi</span><span class=\"token variable\">)</span></span>\nendif\n\nCC <span class=\"token operator\">=</span> <span class=\"token variable\"><span class=\"token variable\">$(</span>TOOLPREFIX<span class=\"token variable\">)</span></span>gcc</code></pre></div>\n<p>By default, it compiles using <code class=\"language-text\">gcc</code>.</p>\n<p>If you are building on macOS or another non-Linux environment, you will need to change the <code class=\"language-text\">TOOLPREFIX</code> setting to cross-compile.</p>\n<p>Reference: <a href=\"https://qiita.com/maru-n@github/items/9cf83944403b3fbb8422\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Building and running xv6 on OS X Yosemite - Qiita</a></p>\n<p>Next, here is the <code class=\"language-text\">CFLAGS</code> section:</p>\n<div class=\"gatsby-highlight\" 
data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">CFLAGS <span class=\"token operator\">=</span> -fno-pic -static -fno-builtin -fno-strict-aliasing -O2 -Wall -MD -ggdb -m32 -Werror -fno-omit-frame-pointer\nCFLAGS <span class=\"token operator\">+=</span> <span class=\"token variable\"><span class=\"token variable\">$(</span>shell <span class=\"token punctuation\">$(</span>CC<span class=\"token punctuation\">)</span> -fno-stack-protector -E -x c /dev/null <span class=\"token operator\">></span>/dev/null <span class=\"token operator\"><span class=\"token file-descriptor important\">2</span>></span><span class=\"token file-descriptor important\">&amp;1</span> <span class=\"token operator\">&amp;&amp;</span> <span class=\"token builtin class-name\">echo</span> -fno-stack-protector<span class=\"token variable\">)</span></span></code></pre></div>\n<p>I will not explain every option in detail; the table below briefly summarizes the flags that are set by default.</p>\n<p>Reference: <a href=\"https://linuxjm.osdn.jp/html/GNU_gcc/man1/gcc.1.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Man page of GCC</a></p>\n<table>\n<thead>\n<tr>\n<th align=\"center\">Option</th>\n<th align=\"center\">Purpose</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td align=\"center\">-fno-pic</td>\n<td align=\"center\">Do not generate position-independent code (PIC)</td>\n</tr>\n<tr>\n<td align=\"center\">-static</td>\n<td align=\"center\">Link statically, without shared libraries</td>\n</tr>\n<tr>\n<td align=\"center\">-fno-builtin</td>\n<td align=\"center\">Do not recognize built-in functions unless they carry the __builtin_ prefix</td>\n</tr>\n<tr>\n<td align=\"center\">-fno-strict-aliasing</td>\n<td align=\"center\">Disable optimizations that rely on the strict aliasing rules</td>\n</tr>\n<tr>\n<td align=\"center\">-O2</td>\n<td align=\"center\">Enable nearly all optimizations that do not involve a space-speed tradeoff</td>\n</tr>\n<tr>\n<td align=\"center\">-Wall</td>\n<td align=\"center\">Enable a broad set of commonly useful warnings (not literally all of them)</td>\n</tr>\n<tr>\n<td align=\"center\">-MD</td>\n<td 
align=\"center\">Write a .d dependency file (including system headers) as a side effect of compilation</td>\n</tr>\n<tr>\n<td align=\"center\">-ggdb</td>\n<td align=\"center\">Generate debug information targeting gdb</td>\n</tr>\n<tr>\n<td align=\"center\">-m32</td>\n<td align=\"center\">Generate code for a 32-bit environment</td>\n</tr>\n<tr>\n<td align=\"center\">-Werror</td>\n<td align=\"center\">Treat all warnings as errors</td>\n</tr>\n<tr>\n<td align=\"center\">-fno-omit-frame-pointer</td>\n<td align=\"center\">Retain the frame pointer even in functions that do not need it</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"note-position-independent-code-pic\" style=\"position:relative;\"><a href=\"#note-position-independent-code-pic\" aria-label=\"note position independent code pic permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Note: Position-Independent Code (PIC)</h3>\n<p>Position-independent code (PIC) is machine code that executes correctly regardless of where it is placed in memory; a position-independent executable (PIE) is an executable built entirely from such code.</p>\n<p>PIC is primarily used for shared libraries. The boot block, by contrast, is linked to run at the fixed address 0x7C00 (<code class=\"language-text\">-Ttext 0x7C00</code>), so xv6 builds it with <code class=\"language-text\">-fno-pic</code>.</p>\n<p>Reference: <a href=\"https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Position Independent Code (PIC) in shared libraries - Eli Bendersky’s website</a></p>\n<p>Reference: <a href=\"http://0xcc.net/blog/archives/000107.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Why compile with PIC when making a shared library on 
Linux - bkブログ</a></p>\n<h3 id=\"overview-of-bootmainc\" style=\"position:relative;\"><a href=\"#overview-of-bootmainc\" aria-label=\"overview of bootmainc permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Overview of bootmain.c</h3>\n<p>When building xv6, the first step generates <code class=\"language-text\">bootmain.o</code> from the following <code class=\"language-text\">bootmain.c</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Boot loader.</span>\n<span class=\"token comment\">//</span>\n<span class=\"token comment\">// Part of the boot block, along with bootasm.S, which calls bootmain().</span>\n<span class=\"token comment\">// bootasm.S has put the processor into protected 32-bit mode.</span>\n<span class=\"token comment\">// bootmain() loads an ELF kernel image from the disk starting at</span>\n<span class=\"token comment\">// sector 1 and then jumps to the kernel entry routine.</span>\n\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">include</span> <span class=\"token string\">\"types.h\"</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">include</span> <span class=\"token string\">\"elf.h\"</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive 
keyword\">include</span> <span class=\"token string\">\"x86.h\"</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">include</span> <span class=\"token string\">\"memlayout.h\"</span></span>\n\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">SECTSIZE</span>  <span class=\"token expression\"><span class=\"token number\">512</span></span></span>\n\n<span class=\"token keyword\">void</span> <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">,</span> uint<span class=\"token punctuation\">,</span> uint<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n<span class=\"token keyword\">void</span>\n<span class=\"token function\">bootmain</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  <span class=\"token keyword\">struct</span> <span class=\"token class-name\">elfhdr</span> <span class=\"token operator\">*</span>elf<span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">struct</span> <span class=\"token class-name\">proghdr</span> <span class=\"token operator\">*</span>ph<span class=\"token punctuation\">,</span> <span class=\"token operator\">*</span>eph<span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">void</span> <span class=\"token punctuation\">(</span><span class=\"token operator\">*</span>entry<span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  uchar<span class=\"token operator\">*</span> 
pa<span class=\"token punctuation\">;</span>\n\n  elf <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">struct</span> <span class=\"token class-name\">elfhdr</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token number\">0x10000</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// scratch space</span>\n\n  <span class=\"token comment\">// Read 1st page off disk</span>\n  <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>elf<span class=\"token punctuation\">,</span> <span class=\"token number\">4096</span><span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// Is this an ELF executable?</span>\n  <span class=\"token keyword\">if</span><span class=\"token punctuation\">(</span>elf<span class=\"token operator\">-></span>magic <span class=\"token operator\">!=</span> ELF_MAGIC<span class=\"token punctuation\">)</span>\n    <span class=\"token keyword\">return</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// let bootasm.S handle error</span>\n\n  <span class=\"token comment\">// Load each program segment (ignores ph flags).</span>\n  ph <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">struct</span> <span class=\"token class-name\">proghdr</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>elf <span class=\"token 
operator\">+</span> elf<span class=\"token operator\">-></span>phoff<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  eph <span class=\"token operator\">=</span> ph <span class=\"token operator\">+</span> elf<span class=\"token operator\">-></span>phnum<span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">for</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">;</span> ph <span class=\"token operator\">&lt;</span> eph<span class=\"token punctuation\">;</span> ph<span class=\"token operator\">++</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">{</span>\n    pa <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>ph<span class=\"token operator\">-></span>paddr<span class=\"token punctuation\">;</span>\n    <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span>pa<span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>off<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">if</span><span class=\"token punctuation\">(</span>ph<span class=\"token operator\">-></span>memsz <span class=\"token operator\">></span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">)</span>\n      <span class=\"token function\">stosb</span><span class=\"token punctuation\">(</span>pa <span class=\"token operator\">+</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>memsz <span class=\"token operator\">-</span> ph<span class=\"token 
operator\">-></span>filesz<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token punctuation\">}</span>\n\n  <span class=\"token comment\">// Call the entry point from the ELF header.</span>\n  <span class=\"token comment\">// Does not return!</span>\n  entry <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">(</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span>elf<span class=\"token operator\">-></span>entry<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">entry</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span>\n\n<span class=\"token keyword\">void</span>\n<span class=\"token function\">waitdisk</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  <span class=\"token comment\">// Wait for disk ready.</span>\n  <span class=\"token keyword\">while</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span><span class=\"token function\">inb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F7</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">&amp;</span> <span class=\"token number\">0xC0</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">!=</span> <span class=\"token number\">0x40</span><span class=\"token punctuation\">)</span>\n    
<span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span>\n\n<span class=\"token comment\">// Read a single sector at offset into dst.</span>\n<span class=\"token keyword\">void</span>\n<span class=\"token function\">readsect</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span> <span class=\"token operator\">*</span>dst<span class=\"token punctuation\">,</span> uint offset<span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  <span class=\"token comment\">// Issue command.</span>\n  <span class=\"token function\">waitdisk</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F2</span><span class=\"token punctuation\">,</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>   <span class=\"token comment\">// count = 1</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F3</span><span class=\"token punctuation\">,</span> offset<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F4</span><span class=\"token punctuation\">,</span> offset <span class=\"token operator\">>></span> <span class=\"token number\">8</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F5</span><span class=\"token punctuation\">,</span> offset <span class=\"token operator\">>></span> <span class=\"token number\">16</span><span class=\"token 
punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F6</span><span class=\"token punctuation\">,</span> <span class=\"token punctuation\">(</span>offset <span class=\"token operator\">>></span> <span class=\"token number\">24</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">|</span> <span class=\"token number\">0xE0</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F7</span><span class=\"token punctuation\">,</span> <span class=\"token number\">0x20</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// cmd 0x20 - read sectors</span>\n\n  <span class=\"token comment\">// Read data.</span>\n  <span class=\"token function\">waitdisk</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">insl</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F0</span><span class=\"token punctuation\">,</span> dst<span class=\"token punctuation\">,</span> SECTSIZE<span class=\"token operator\">/</span><span class=\"token number\">4</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span>\n\n<span class=\"token comment\">// Read 'count' bytes at 'offset' from kernel into physical address 'pa'.</span>\n<span class=\"token comment\">// Might copy more than asked.</span>\n<span class=\"token keyword\">void</span>\n<span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span> pa<span class=\"token punctuation\">,</span> 
uint count<span class=\"token punctuation\">,</span> uint offset<span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  uchar<span class=\"token operator\">*</span> epa<span class=\"token punctuation\">;</span>\n\n  epa <span class=\"token operator\">=</span> pa <span class=\"token operator\">+</span> count<span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// Round down to sector boundary.</span>\n  pa <span class=\"token operator\">-=</span> offset <span class=\"token operator\">%</span> SECTSIZE<span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// Translate from bytes to sectors; kernel starts at sector 1.</span>\n  offset <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span>offset <span class=\"token operator\">/</span> SECTSIZE<span class=\"token punctuation\">)</span> <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// If this is too slow, we could read lots of sectors at a time.</span>\n  <span class=\"token comment\">// We'd write more to memory than asked, but it doesn't matter --</span>\n  <span class=\"token comment\">// we load in increasing order.</span>\n  <span class=\"token keyword\">for</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">;</span> pa <span class=\"token operator\">&lt;</span> epa<span class=\"token punctuation\">;</span> pa <span class=\"token operator\">+=</span> SECTSIZE<span class=\"token punctuation\">,</span> offset<span class=\"token operator\">++</span><span class=\"token punctuation\">)</span>\n    <span class=\"token function\">readsect</span><span class=\"token punctuation\">(</span>pa<span class=\"token punctuation\">,</span> offset<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token 
punctuation\">}</span></code></pre></div>\n<p>The four functions defined are:</p>\n<ul>\n<li>void bootmain(void)</li>\n<li>void waitdisk(void)</li>\n<li>void readsect(void *dst, uint offset)</li>\n<li>void readseg(uchar* pa, uint count, uint offset)</li>\n</ul>\n<p>The behavior of each function will be examined later.</p>\n<h3 id=\"overview-of-bootblocko\" style=\"position:relative;\"><a href=\"#overview-of-bootblocko\" aria-label=\"overview of bootblocko permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Overview of bootblock.o</h3>\n<p>Next, <code class=\"language-text\">bootasm.o</code> is generated from <code class=\"language-text\">bootasm.S</code>, and linked together with the previously generated <code class=\"language-text\">bootmain.o</code> to produce <code class=\"language-text\">bootblock.o</code>.</p>\n<p>Here is <code class=\"language-text\">bootasm.S</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"asm\"><pre class=\"language-asm\"><code class=\"language-asm\">#include &quot;asm.h&quot;\n#include &quot;memlayout.h&quot;\n#include &quot;mmu.h&quot;\n\n# Start the first CPU: switch to 32-bit protected mode, jump into C.\n# The BIOS loads this code from the first sector of the hard disk into\n# memory at physical address 0x7c00 and starts executing in real mode\n# with %cs=0 %ip=7c00.\n\n.code16                       # Assemble for 16-bit mode\n.globl start\nstart:\n  cli                         # BIOS enabled interrupts; disable\n\n  # Zero data segment registers DS, ES, and 
SS.\n  xorw    %ax,%ax             # Set %ax to zero\n  movw    %ax,%ds             # -&gt; Data Segment\n  movw    %ax,%es             # -&gt; Extra Segment\n  movw    %ax,%ss             # -&gt; Stack Segment\n\n  # Physical address line A20 is tied to zero so that the first PCs \n  # with 2 MB would run software that assumed 1 MB.  Undo that.\nseta20.1:\n  inb     $0x64,%al               # Wait for not busy\n  testb   $0x2,%al\n  jnz     seta20.1\n\n  movb    $0xd1,%al               # 0xd1 -&gt; port 0x64\n  outb    %al,$0x64\n\nseta20.2:\n  inb     $0x64,%al               # Wait for not busy\n  testb   $0x2,%al\n  jnz     seta20.2\n\n  movb    $0xdf,%al               # 0xdf -&gt; port 0x60\n  outb    %al,$0x60\n\n  # Switch from real to protected mode.  Use a bootstrap GDT that makes\n  # virtual addresses map directly to physical addresses so that the\n  # effective memory map doesn&#39;t change during the transition.\n  lgdt    gdtdesc\n  movl    %cr0, %eax\n  orl     $CR0_PE, %eax\n  movl    %eax, %cr0\n\n//PAGEBREAK!\n  # Complete the transition to 32-bit protected mode by using a long jmp\n  # to reload %cs and %eip.  
The segment descriptors are set up with no\n  # translation, so that the mapping is still the identity mapping.\n  ljmp    $(SEG_KCODE&lt;&lt;3), $start32\n\n.code32  # Tell assembler to generate 32-bit code now.\nstart32:\n  # Set up the protected-mode data segment registers\n  movw    $(SEG_KDATA&lt;&lt;3), %ax    # Our data segment selector\n  movw    %ax, %ds                # -&gt; DS: Data Segment\n  movw    %ax, %es                # -&gt; ES: Extra Segment\n  movw    %ax, %ss                # -&gt; SS: Stack Segment\n  movw    $0, %ax                 # Zero segments not ready for use\n  movw    %ax, %fs                # -&gt; FS\n  movw    %ax, %gs                # -&gt; GS\n\n  # Set up the stack pointer and call into C.\n  movl    $start, %esp\n  call    bootmain\n\n  # If bootmain returns (it shouldn&#39;t), trigger a Bochs\n  # breakpoint if running under Bochs, then loop.\n  movw    $0x8a00, %ax            # 0x8a00 -&gt; port 0x8a00\n  movw    %ax, %dx\n  outw    %ax, %dx\n  movw    $0x8ae0, %ax            # 0x8ae0 -&gt; port 0x8a00\n  outw    %ax, %dx\nspin:\n  jmp     spin\n\n# Bootstrap GDT\n.p2align 2                                # force 4 byte alignment\ngdt:\n  SEG_NULLASM                             # null seg\n  SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg\n  SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg\n\ngdtdesc:\n  .word   (gdtdesc - gdt - 1)             # sizeof(gdt) - 1\n  .long   gdt                             # address gdt</code></pre></div>\n<p>Before diving into the content, let us look at how the boot program is linked.</p>\n<h3 id=\"linking-the-boot-program\" style=\"position:relative;\"><a href=\"#linking-the-boot-program\" aria-label=\"linking the boot program permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 
3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Linking the Boot Program</h3>\n<p>Linking is performed with the following command:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">ld -m elf_i386 -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o</code></pre></div>\n<p>The <code class=\"language-text\">ld</code> command links multiple object files into a single output file.</p>\n<p>Reference: <a href=\"https://kazmax.zpp.jp/cmd/l/ld.1.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">ld command description - Linux command reference</a></p>\n<p>In the command above, <code class=\"language-text\">-m elf_i386</code> selects the <code class=\"language-text\">elf_i386</code> emulation, so the output is linked as a 32-bit x86 ELF binary.</p>\n<p>Reference: <a href=\"https://unix.stackexchange.com/questions/471056/gnu-linker-differences-between-the-different-32bit-emulation-modes?rq=1\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">x86 - GNU Linker differences between the different 32bit emulation modes? 
- Unix &#x26; Linux Stack Exchange</a></p>\n<p>The <code class=\"language-text\">-N</code> option makes both the text and data sections readable and writable, and <code class=\"language-text\">-e start</code> sets the <code class=\"language-text\">start</code> symbol as the entry point.</p>\n<p><code class=\"language-text\">-Ttext 0x7C00</code> places the text section, and therefore the entry point, at address <code class=\"language-text\">0x7C00</code>.</p>\n<p>On x86 PCs, after the BIOS POST (Power-On Self-Test) finishes at startup, the BIOS reads the boot program from the MBR (the first sector of the disk), loads it at <code class=\"language-text\">0x7C00</code>, and starts executing it as the boot sector.</p>\n<p>The reason this specific address is used is explained in great detail in the following excellent article:</p>\n<p>Reference: <a href=\"https://www.glamenv-septzen.net/view/614\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Assembler/Why is the MBR loaded at “0x7C00” in x86? (Complete edition) - Glamenv-Septzen.net</a></p>\n<p>To summarize briefly: the boot code had to work within a minimum memory configuration of 32 KB, and the range from 0x0 to 0x3FF is reserved for interrupt vectors. 
As a result, it was decided to place the boot sector at the end of the 32 KB region.</p>\n<p>Consequently, reserving 512 bytes for the boot sector area and 512 bytes for the MBR bootstrap data/stack region, the starting address of the boot sector became <code class=\"language-text\">0x7C00 (32KB - 1024B)</code>.</p>\n<p>This background was not covered in much detail in the DIY OS books I had read before (or so I recall), so it was very informative.</p>\n<p>Now that the boot program is assembled, let us examine its contents.</p>\n<h3 id=\"real-mode-and-protected-mode\" style=\"position:relative;\"><a href=\"#real-mode-and-protected-mode\" aria-label=\"real mode and protected mode permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Real Mode and Protected Mode</h3>\n<p>x86 CPUs start in “real mode,” which is software-compatible with the <code class=\"language-text\">Intel 8086</code>.</p>\n<p>The <code class=\"language-text\">Intel 8086</code> is a 16-bit processor.</p>\n<p>Therefore, real mode operates as a 16-bit mode.</p>\n<p>Reference: <a href=\"https://en.wikipedia.org/wiki/Intel_8086\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Intel 8086 - Wikipedia</a></p>\n<p>From here, we will trace the steps for transitioning to 32-bit mode.</p>\n<p>The code that runs in real mode is the following:</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">#include &quot;asm.h&quot;\n#include 
&quot;memlayout.h&quot;\n#include &quot;mmu.h&quot;\n\n# Start the first CPU: switch to 32-bit protected mode, jump into C.\n# The BIOS loads this code from the first sector of the hard disk into\n# memory at physical address 0x7c00 and starts executing in real mode\n# with %cs=0 %ip=7c00.\n\n.code16                       # Assemble for 16-bit mode\n.globl start\nstart:\n  cli                         # BIOS enabled interrupts; disable\n\n  # Zero data segment registers DS, ES, and SS.\n  xorw    %ax,%ax             # Set %ax to zero\n  movw    %ax,%ds             # -&gt; Data Segment\n  movw    %ax,%es             # -&gt; Extra Segment\n  movw    %ax,%ss             # -&gt; Stack Segment\n\n  # Physical address line A20 is tied to zero so that the first PCs \n  # with 2 MB would run software that assumed 1 MB.  Undo that.\nseta20.1:\n  inb     $0x64,%al               # Wait for not busy\n  testb   $0x2,%al\n  jnz     seta20.1\n\n  movb    $0xd1,%al               # 0xd1 -&gt; port 0x64\n  outb    %al,$0x64\n\nseta20.2:\n  inb     $0x64,%al               # Wait for not busy\n  testb   $0x2,%al\n  jnz     seta20.2\n\n  movb    $0xdf,%al               # 0xdf -&gt; port 0x60\n  outb    %al,$0x60\n\n  # Switch from real to protected mode.  Use a bootstrap GDT that makes\n  # virtual addresses map directly to physical addresses so that the\n  # effective memory map doesn&#39;t change during the transition.\n  lgdt    gdtdesc\n  movl    %cr0, %eax\n  orl     $CR0_PE, %eax\n  movl    %eax, %cr0\n\n//PAGEBREAK!\n  # Complete the transition to 32-bit protected mode by using a long jmp\n  # to reload %cs and %eip.  
The segment descriptors are set up with no\n  # translation, so that the mapping is still the identity mapping.\n  ljmp    $(SEG_KCODE&lt;&lt;3), $start32\n\n.code32  # Tell assembler to generate 32-bit code now.\nstart32:\n{{ omitted }}</code></pre></div>\n<h3 id=\"booting-in-real-mode\" style=\"position:relative;\"><a href=\"#booting-in-real-mode\" aria-label=\"booting in real mode permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Booting in Real Mode</h3>\n<p>The <code class=\"language-text\">.code16</code> directive at the top tells the assembler to generate code that will execute in 16-bit mode.</p>\n<p>The <code class=\"language-text\">start</code> label is the symbol that was specified as the entry point at link time.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">.code16                       # Assemble for 16-bit mode\n.globl start\nstart:\n  cli                         # BIOS enabled interrupts; disable\n\n  # Zero data segment registers DS, ES, and SS.\n  xorw    %ax,%ax             # Set %ax to zero\n  movw    %ax,%ds             # -&gt; Data Segment\n  movw    %ax,%es             # -&gt; Extra Segment\n  movw    %ax,%ss             # -&gt; Stack Segment</code></pre></div>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/32395542/objdump-of-code16-and-code32-x86-assembly\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Objdump of .code16 and .code32 x86 assembly - Stack 
Overflow</a></p>\n<h3 id=\"disabling-cpu-interrupts-with-cli-and-sti\" style=\"position:relative;\"><a href=\"#disabling-cpu-interrupts-with-cli-and-sti\" aria-label=\"disabling cpu interrupts with cli and sti permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Disabling CPU Interrupts with cli and sti</h3>\n<p><code class=\"language-text\">cli</code> is an instruction that clears the interrupt flag (IF), which disables maskable CPU interrupts.</p>\n<p>From the point <code class=\"language-text\">cli</code> is executed until the <code class=\"language-text\">sti</code> instruction is executed, maskable interrupts remain disabled. 
(More precisely, interrupt requests to the CPU are still generated by hardware, but the CPU ignores them while the interrupt flag is cleared.)</p>\n<p>Reference: <a href=\"https://c9x.me/x86/html/file_module_x86_id_31.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">CLI : Clear Interrupt Flag (x86 Instruction Set Reference)</a></p>\n<p>Interrupts are disabled here because the BIOS may have installed its own interrupt handlers, and once the boot program takes over it is no longer safe to let those handlers run.</p>\n<p>Therefore, interrupts must stay disabled while the boot program sets up the segment registers, the GDT, and the stack pointer.</p>\n<h3 id=\"initializing-segment-registers\" style=\"position:relative;\"><a href=\"#initializing-segment-registers\" aria-label=\"initializing segment registers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Initializing Segment Registers</h3>\n<p>The four lines that follow <code class=\"language-text\">cli</code> initialize the AX, DS, ES, and SS registers to <code class=\"language-text\">0x0000</code>.</p>\n<p>The AX register is the accumulator; the other three are segment registers.</p>\n<p>Reference: <a href=\"https://en.wikipedia.org/wiki/Intel_8086\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Intel 8086 - Wikipedia</a></p>\n<p>Reference: <a href=\"https://qiita.com/timwata/items/e7b7a18cc80b31fd940a\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Intel 8086 CPU Basics - Qiita</a></p>\n<p>Here, whatever values the BIOS left in the segment registers are cleared to zero.</p>\n<p>For reference, here is a brief summary of each segment register’s 
purpose:</p>\n<table>\n<thead>\n<tr>\n<th align=\"center\">Register</th>\n<th align=\"center\">Purpose</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td align=\"center\">DS register</td>\n<td align=\"center\">Default segment register for data</td>\n</tr>\n<tr>\n<td align=\"center\">ES register</td>\n<td align=\"center\">Segment register for data; normally DS register is used</td>\n</tr>\n<tr>\n<td align=\"center\">SS register</td>\n<td align=\"center\">Segment register for the stack; used with SP/BP memory references</td>\n</tr>\n<tr>\n<td align=\"center\">CS register</td>\n<td align=\"center\">Segment register for code; the instruction pointer (IP) uses the CS register</td>\n</tr>\n</tbody>\n</table>\n<p>Reference: <a href=\"http://www.tamasoft.co.jp/lasm/help/lasm1to2.htm\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">8086 Registers</a></p>\n<h3 id=\"enabling-the-a20-line\" style=\"position:relative;\"><a href=\"#enabling-the-a20-line\" aria-label=\"enabling the a20 line permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Enabling the A20 Line</h3>\n<p>On x86 PCs, the A20 Line (the 21st address line, bit 20 of the physical address) is disabled by default for backward compatibility with the <code class=\"language-text\">Intel 8086</code>, which had only 20 address lines.</p>\n<p>Therefore, to access memory at and above 1 MB correctly, the A20 Line must be enabled.</p>\n<p>The A20 Line was originally controlled through the KBC (Keyboard Controller).</p>\n<p>The KBC is a mechanism for transmitting keyboard input to the CPU.</p>\n<p>The KBC receives information from the keyboard via serial 
communication, buffers it, and then checks whether it is a KBC control command or input data to be forwarded to the CPU.</p>\n<p>Data to be forwarded to the CPU goes through port <code class=\"language-text\">0x60</code>; control commands go through port <code class=\"language-text\">0x64</code>.</p>\n<p>In the following code, each step first waits until the KBC input buffer is empty (bit 1 of the status byte read from port <code class=\"language-text\">0x64</code> is clear), then writes to port <code class=\"language-text\">0x64</code> and port <code class=\"language-text\">0x60</code> in turn to enable A20.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">  # Physical address line A20 is tied to zero so that the first PCs \n  # with 2 MB would run software that assumed 1 MB.  Undo that.\nseta20.1:\n  inb     $0x64,%al               # Wait for not busy\n  testb   $0x2,%al\n  jnz     seta20.1\n\n  movb    $0xd1,%al               # 0xd1 -&gt; port 0x64\n  outb    %al,$0x64\n\nseta20.2:\n  inb     $0x64,%al               # Wait for not busy\n  testb   $0x2,%al\n  jnz     seta20.2\n\n  movb    $0xdf,%al               # 0xdf -&gt; port 0x60\n  outb    %al,$0x60</code></pre></div>\n<p>In the example above, command <code class=\"language-text\">0xd1</code> (write to the KBC output port) is sent to port <code class=\"language-text\">0x64</code>, and then the value <code class=\"language-text\">0xdf</code>, which has the A20 bit set, is sent to port <code class=\"language-text\">0x60</code>, which enables A20.</p>\n<p>Reference: <a href=\"https://wiki.osdev.org/A20_Line\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">A20 Line - OSDev Wiki</a></p>\n<p>Reference: <a href=\"https://cstmize.hatenablog.jp/entry/2019/06/11/A20_gate%E3%81%A8keyboard_controller%E3%81%A8%E3%81%AE%E3%82%84%E3%82%8A%E3%81%A8%E3%82%8A%28xv6%E3%82%92%E4%BE%8B%E3%81%AB%E3%81%97%E3%81%A6%29\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">A20 gate and keyboard controller (using xv6 as example) - 私のひらめき日記</a></p>\n<p>Reference: <a 
href=\"https://stackoverflow.com/questions/15768683/the-a20-line-with-jos\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">assembly - The A20 Line with JOS - Stack Overflow</a></p>\n<h3 id=\"switching-to-protected-mode\" style=\"position:relative;\"><a href=\"#switching-to-protected-mode\" aria-label=\"switching to protected mode permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Switching to Protected Mode</h3>\n<p>From here, the program switches to protected mode.</p>\n<p>Unlike real mode, protected mode provides memory protection — programs can only access memory regions they are permitted to access.</p>\n<p>Therefore, when transitioning from real mode to protected mode, the memory regions accessible to the kernel being loaded must be defined in advance.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">  # Switch from real to protected mode.  Use a bootstrap GDT that makes\n  # virtual addresses map directly to physical addresses so that the\n  # effective memory map doesn&#39;t change during the transition.\n  lgdt    gdtdesc\n  movl    %cr0, %eax\n  orl     $CR0_PE, %eax\n  movl    %eax, %cr0\n\n//PAGEBREAK!\n  # Complete the transition to 32-bit protected mode by using a long jmp\n  # to reload %cs and %eip.  
The segment descriptors are set up with no\n  # translation, so that the mapping is still the identity mapping.\n  ljmp    $(SEG_KCODE&lt;&lt;3), $start32\n\n.code32  # Tell assembler to generate 32-bit code now.\nstart32:</code></pre></div>\n<p>The actual switch to protected mode happens in these lines:</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">movl    %cr0, %eax\norl     $CR0_PE, %eax\nmovl    %eax, %cr0</code></pre></div>\n<p>On x86 CPUs, enabling protected mode requires setting the PE (Protection Enable) flag, bit 0 of control register CR0, to 1.</p>\n<p>The assembly code above reads CR0 into EAX, uses an <code class=\"language-text\">or</code> operation to set the PE flag, and writes the result back to CR0.</p>\n<p>This turns on protected mode; the <code class=\"language-text\">ljmp</code> that follows completes the transition by reloading <code class=\"language-text\">%cs</code> and <code class=\"language-text\">%eip</code>.</p>\n<p>The GDT must be loaded with <code class=\"language-text\">lgdt</code> before this step.</p>\n<p>Reference: <a href=\"https://en.wikipedia.org/wiki/Control_register\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Control register - Wikipedia</a></p>\n<h3 id=\"memory-address-references-in-protected-mode\" style=\"position:relative;\"><a href=\"#memory-address-references-in-protected-mode\" aria-label=\"memory address references in protected mode permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Memory Address References in Protected Mode</h3>\n<p>In protected mode, the GDT (Global Descriptor Table) is used for memory address references.</p>\n<p>The GDT mechanism is a larger topic, so I have summarized it as a separate 
article.</p>\n<p>Reference: <a href=\"/linux-got-plt\">Notes on x86 CPU Memory Protection (GDT and LDT)</a></p>\n<h3 id=\"lgdt-gdtdesc\" style=\"position:relative;\"><a href=\"#lgdt-gdtdesc\" aria-label=\"lgdt gdtdesc permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>lgdt gdtdesc</h3>\n<p>The <code class=\"language-text\">lgdt</code> instruction registers a GDT data structure into the GDTR.</p>\n<p>The <code class=\"language-text\">gdtdesc</code> stored here is the following label defined in <code class=\"language-text\">bootasm.S</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\"># Bootstrap GDT\n.p2align 2                                # force 4 byte alignment\ngdt:\n  SEG_NULLASM                             # null seg\n  SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg\n  SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg\n\ngdtdesc:\n  .word   (gdtdesc - gdt - 1)             # sizeof(gdt) - 1\n  .long   gdt                             # address gdt</code></pre></div>\n<p><code class=\"language-text\">.p2align 2</code> forces the immediately following instruction or data to be placed on a 4-byte boundary.</p>\n<p>This means data is placed starting at an address that is a multiple of 4.</p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/2846914/what-is-meant-by-memory-is-8-bytes-aligned\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">c - What is meant by “memory is 8 bytes 
aligned”? - Stack Overflow</a></p>\n<p>The next line attaches the <code class=\"language-text\">gdt</code> label to the line where the <code class=\"language-text\">SEG_NULLASM</code> macro and others are placed.</p>\n<p>These macros are defined in <code class=\"language-text\">asm.h</code> as follows:</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">//\n// assembler macros to create x86 segments\n//\n\n#define SEG_NULLASM                                             \\\n        .word 0, 0;                                             \\\n        .byte 0, 0, 0, 0\n\n// The 0xC0 means the limit is in 4096-byte units\n// and (for executable segments) 32-bit mode.\n#define SEG_ASM(type,base,lim)                                  \\\n        .word (((lim) &gt;&gt; 12) &amp; 0xffff), ((base) &amp; 0xffff);      \\\n        .byte (((base) &gt;&gt; 16) &amp; 0xff), (0x90 | (type)),         \\\n                (0xC0 | (((lim) &gt;&gt; 28) &amp; 0xf)), (((base) &gt;&gt; 24) &amp; 0xff)\n\n#define STA_X     0x8       // Executable segment\n#define STA_W     0x2       // Writeable (non-executable segments)\n#define STA_R     0x2       // Readable (executable segments)</code></pre></div>\n<p>The GDT in an x86 CPU is basically a structure consisting of multiple 8-byte descriptors placed consecutively.</p>\n<p>Reference: <a href=\"https://en.wikipedia.org/wiki/Global_Descriptor_Table\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Global Descriptor Table - Wikipedia</a></p>\n<p>Reference: <a href=\"https://amzn.to/3qXYsZX\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Writing an OS from Scratch (ゼロからのOS自作入門)</a></p>\n<p>The following is the descriptor structure introduced in <a href=\"https://amzn.to/3qZSCY7\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">30-Day OS Development (30日でできる! 
OS自作入門)</a>.</p>\n<p>It is easier to understand than the xv6 code, so I am including it here:</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token keyword\">struct</span> <span class=\"token class-name\">SEGMENT_DESCRIPTOR</span><span class=\"token punctuation\">{</span>\n    <span class=\"token keyword\">short</span> limit_low<span class=\"token punctuation\">,</span> base_low<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">char</span> base_mid<span class=\"token punctuation\">,</span> access_right<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">char</span> limit_high<span class=\"token punctuation\">,</span> base_high<span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span><span class=\"token punctuation\">;</span></code></pre></div>\n<p>Reference: <a href=\"https://amzn.to/3qZSCY7\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">30-Day OS Development (30日でできる! OS自作入門)</a></p>\n<p>At the beginning of the GDT (the first descriptor), a null descriptor with all values set to 0 is placed.</p>\n<p>This is never referenced by the system.</p>\n<p>The null descriptor is used to invalidate segment registers.</p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/37861691/why-x86-processor-need-a-null-descriptor-in-gdt\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Why x86 processor need a NULL descriptor in GDT? 
- Stack Overflow</a></p>\n<p>The second descriptor is the code segment descriptor, and the third is the data segment descriptor.</p>\n<p>The code segment is granted read and execute permissions; the data segment is granted write permission (data segments are always readable).</p>\n<p>Finally, using the address of the GDT created by these macros, the <code class=\"language-text\">lgdt gdtdesc</code> instruction initializes the GDTR.</p>\n<p>The GDTR holds a 48-bit value.</p>\n<p>The upper 32 bits hold the starting address of the GDT (the <code class=\"language-text\">gdt</code> label).</p>\n<p>The lower 16 bits hold the limit value: the size of the GDT in bytes minus 1 (<code class=\"language-text\">gdtdesc - gdt - 1</code>). With three 8-byte descriptors, this is 3 × 8 - 1 = 23.</p>\n<p>Reference: <a href=\"https://yz2cm.hatenadiary.org/entry/20140502/1399006500\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">GDTR (Global Descriptor Table Register) - ゆずさん研究所</a></p>\n<p>When a descriptor in the GDT is used (for example a segment descriptor or an LDT descriptor), it is located at an offset of index × 8 bytes from the starting address set in the GDTR.</p>\n<p>This completes the GDTR initialization.</p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/67901342/why-in-xv6-theres-sizeofgdt-1-in-gdtdesc\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">assembly - Why in xv6 there’s sizeof(gdt)-1 in gdtdesc - Stack Overflow</a></p>\n<p>Reference: <a href=\"https://jupiteroak.hatenablog.com/entry/2021/12/10/073000\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">OS boot series ⑮-2 entryother.S (Reading Unix xv6 ~ OS Code Reading ~) - 野良プログラマーのCS日記</a></p>\n<h3 id=\"starting-32-bit-mode\" style=\"position:relative;\"><a href=\"#starting-32-bit-mode\" aria-label=\"starting 32 bit mode permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 
2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Starting 32-bit Mode</h3>\n<p>With GDTR initialization and the transition to protected mode complete, the system now operates in 32-bit mode.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">//PAGEBREAK!\n  # Complete the transition to 32-bit protected mode by using a long jmp\n  # to reload %cs and %eip.  The segment descriptors are set up with no\n  # translation, so that the mapping is still the identity mapping.\n  ljmp    $(SEG_KCODE&lt;&lt;3), $start32\n\n.code32  # Tell assembler to generate 32-bit code now.\nstart32:</code></pre></div>\n<p>The <code class=\"language-text\">ljmp</code> instruction takes a segment selector as its first operand and an offset address (the <code class=\"language-text\">start32</code> label) as its second operand, then jumps to the address corresponding to the segment base + offset for the given selector.</p>\n<p><code class=\"language-text\">SEG_KCODE</code> is defined in <code class=\"language-text\">mmu.h</code> as follows:</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// various segment selectors.</span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">SEG_KCODE</span> <span class=\"token expression\"><span class=\"token number\">1</span>  </span><span class=\"token comment\">// kernel code</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">SEG_KDATA</span> <span class=\"token expression\"><span class=\"token number\">2</span>  </span><span 
class=\"token comment\">// kernel data+stack</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">SEG_UCODE</span> <span class=\"token expression\"><span class=\"token number\">3</span>  </span><span class=\"token comment\">// user code</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">SEG_UDATA</span> <span class=\"token expression\"><span class=\"token number\">4</span>  </span><span class=\"token comment\">// user data+stack</span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">SEG_TSS</span>   <span class=\"token expression\"><span class=\"token number\">5</span>  </span><span class=\"token comment\">// this process's task state</span></span></code></pre></div>\n<p>From the above, the segment selector defined by <code class=\"language-text\">$(SEG_KCODE&lt;&lt;3)</code> is <code class=\"language-text\">0b1000</code>.</p>\n<p>This points to the second segment in the GDT, which is the code segment.</p>\n<p>A segment selector is 16 bits, as described on the following page: the upper 13 bits hold the descriptor index (index from the beginning of the GDT), and the lower 3 bits hold the Table Indicator (TI) and Requestor Privilege Level (RPL).</p>\n<p>Reference: <a href=\"https://yz2cm.hatenadiary.org/entry/20140502/1399012324\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Segment Selector - ゆずさん研究所</a></p>\n<p>When TI is 0, the GDT is referenced. 
(When TI is 1, the LDT is referenced.)</p>\n<p>An RPL of 0 requests the most privileged level (ring 0).</p>\n<p>Here, the segment selector <code class=\"language-text\">0b1000</code> defined by <code class=\"language-text\">$(SEG_KCODE&lt;&lt;3)</code> represents a segment selector with index 1, TI 0, and RPL 0.</p>\n<h3 id=\"why-use-the-ljmp-instruction\" style=\"position:relative;\"><a href=\"#why-use-the-ljmp-instruction\" aria-label=\"why use the ljmp instruction permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Why Use the ljmp Instruction?</h3>\n<p>This code raised a question for me.</p>\n<p>Once the <code class=\"language-text\">%cr0</code> setting is complete and the transition to protected mode has been performed, it seemed that execution could simply continue into the <code class=\"language-text\">start32</code> processing without explicitly using the <code class=\"language-text\">ljmp</code> instruction.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">  lgdt    gdtdesc\n  movl    %cr0, %eax\n  orl     $CR0_PE, %eax\n  movl    %eax, %cr0\n  ljmp    $(SEG_KCODE&lt;&lt;3), $start32\n\n.code32  # Tell assembler to generate 32-bit code now.\nstart32:\n{{ omitted }}</code></pre></div>\n<p>The reason <code class=\"language-text\">ljmp</code> is explicitly used here is to discard the instructions that the CPU pre-fetched from memory while still operating in real mode.</p>\n<p>CPUs have a mechanism called a pipeline to execute instructions at 
high speed, which pre-fetches the next instruction.</p>\n<p>However, when transitioning to protected mode, the interpretation of machine code changes from real mode, so this pipeline must be reset.</p>\n<p>By calling the <code class=\"language-text\">ljmp</code> instruction, the values of the <code class=\"language-text\">cs</code> register and the <code class=\"language-text\">eip</code> register are reloaded.</p>\n<h3 id=\"post-protected-mode-setup\" style=\"position:relative;\"><a href=\"#post-protected-mode-setup\" aria-label=\"post protected mode setup permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Post-Protected-Mode Setup</h3>\n<p>We are almost done tracing all of <code class=\"language-text\">bootasm.S</code>.</p>\n<p>The last behavior to examine is the following:</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">.code32  # Tell assembler to generate 32-bit code now.\nstart32:\n  # Set up the protected-mode data segment registers\n  movw    $(SEG_KDATA&lt;&lt;3), %ax    # Our data segment selector\n  movw    %ax, %ds                # -&gt; DS: Data Segment\n  movw    %ax, %es                # -&gt; ES: Extra Segment\n  movw    %ax, %ss                # -&gt; SS: Stack Segment\n  movw    $0, %ax                 # Zero segments not ready for use\n  movw    %ax, %fs                # -&gt; FS\n  movw    %ax, %gs                # -&gt; GS\n\n  # Set up the stack pointer and call into C.\n  movl    $start, 
%esp\n  call    bootmain\n\n  # If bootmain returns (it shouldn&#39;t), trigger a Bochs\n  # breakpoint if running under Bochs, then loop.\n  movw    $0x8a00, %ax            # 0x8a00 -&gt; port 0x8a00\n  movw    %ax, %dx\n  outw    %ax, %dx\n  movw    $0x8ae0, %ax            # 0x8ae0 -&gt; port 0x8a00\n  outw    %ax, %dx\nspin:\n  jmp     spin</code></pre></div>\n<h3 id=\"initializing-segment-registers-1\" style=\"position:relative;\"><a href=\"#initializing-segment-registers-1\" aria-label=\"initializing segment registers 1 permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Initializing Segment Registers</h3>\n<p>Here, segment registers other than CS are initialized.</p>\n<p>Now that the CPU has moved from real mode to protected mode, the way the values in the segment registers are interpreted has changed.</p>\n<p>Specifically, in real mode a memory address is formed by multiplying the segment register value by 16 and adding the offset; in protected mode, the segment register instead holds a selector that refers to a segment descriptor.</p>\n<p>Reference: <a href=\"https://atmarkit.itmedia.co.jp/icd/root/02/5785802.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Insider’s Computer Dictionary: What is 8086? 
- @IT</a></p>\n<p>Therefore, in protected mode, segment selectors must be stored in segment registers.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">start32:\n  # Set up the protected-mode data segment registers\n  movw    $(SEG_KDATA&lt;&lt;3), %ax    # Our data segment selector\n  movw    %ax, %ds                # -&gt; DS: Data Segment\n  movw    %ax, %es                # -&gt; ES: Extra Segment\n  movw    %ax, %ss                # -&gt; SS: Stack Segment\n  \n  movw    $0, %ax                 # Zero segments not ready for use\n  movw    %ax, %fs                # -&gt; FS\n  movw    %ax, %gs                # -&gt; GS</code></pre></div>\n<p>The segment selector is set by <code class=\"language-text\">$(SEG_KDATA&lt;&lt;3)</code>.</p>\n<p><code class=\"language-text\">SEG_KDATA</code> was defined as 2 in <code class=\"language-text\">mmu.h</code>.</p>\n<p>Therefore, <code class=\"language-text\">$(SEG_KDATA&lt;&lt;3)</code> becomes <code class=\"language-text\">0b10000</code>.</p>\n<p>This is the selector that specifies the third segment descriptor defined in the GDT, <code class=\"language-text\">SEG_ASM(STA_W, 0x0, 0xffffffff)</code>.</p>\n<p>Using these values, the DS, ES, and SS segment registers are initialized.</p>\n<p>FS and GS are set to 0.</p>\n<p>When a segment selector is 0, the segment register is invalidated.</p>\n<h3 id=\"calling-bootmainc\" style=\"position:relative;\"><a href=\"#calling-bootmainc\" aria-label=\"calling bootmainc permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 
1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Calling bootmain.c</h3>\n<p>From here, processing moves to <code class=\"language-text\">bootmain.c</code>.</p>\n<p>As seen earlier, <code class=\"language-text\">$start</code> is placed at <code class=\"language-text\">0x7C00</code>.</p>\n<p>That is, the stack pointer initially points to <code class=\"language-text\">0x7C00</code>, and the stack grows downward from there toward lower addresses, away from the boot loader code.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\"># Set up the stack pointer and call into C.\nmovl    $start, %esp\ncall    bootmain</code></pre></div>\n<p>Based on the understanding so far, this diagram shows the layout (please correct me if I am wrong):</p>\n<p><img src=\"https://yukituna.com/wp-content/uploads/2022/01/image-15.png\" alt=\"https://yukituna.com/wp-content/uploads/2022/01/image-15.png\"></p>\n<p>Note: the following code only handles the case where <code class=\"language-text\">bootmain.c</code> returns (which it should not), so I will not go through it here.</p>\n<div class=\"gatsby-highlight\" data-language=\"assembly\"><pre class=\"language-assembly\"><code class=\"language-assembly\">  # If bootmain returns (it shouldn&#39;t), trigger a Bochs\n  # breakpoint if running under Bochs, then loop.\n  movw    $0x8a00, %ax            # 0x8a00 -&gt; port 0x8a00\n  movw    %ax, %dx\n  outw    %ax, %dx\n  movw    $0x8ae0, %ax            # 0x8ae0 -&gt; port 0x8a00\n  outw    %ax, %dx\nspin:\n  jmp     spin</code></pre></div>\n<h3 id=\"loading-the-kernel\" style=\"position:relative;\"><a href=\"#loading-the-kernel\" aria-label=\"loading the kernel permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 
0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Loading the Kernel</h3>\n<p>The transition to protected mode is now complete, and processing has switched to <code class=\"language-text\">bootmain.c</code>.</p>\n<p><code class=\"language-text\">bootmain.c</code> defines the following four functions:</p>\n<ul>\n<li>void bootmain(void)</li>\n<li>void waitdisk(void)</li>\n<li>void readsect(void *dst, uint offset)</li>\n<li>void readseg(uchar* pa, uint count, uint offset)</li>\n</ul>\n<p>From here, we will trace the behavior of <code class=\"language-text\">bootmain.c</code>.</p>\n<h3 id=\"loading-the-elf-kernel-image-from-disk\" style=\"position:relative;\"><a href=\"#loading-the-elf-kernel-image-from-disk\" aria-label=\"loading the elf kernel image from disk permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Loading the ELF Kernel Image from Disk</h3>\n<p><code class=\"language-text\">bootmain()</code> is the function responsible for loading the ELF kernel image from disk.</p>\n<p>Let us walk through it from the beginning.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token keyword\">void</span> <span class=\"token function\">bootmain</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token 
punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  <span class=\"token keyword\">struct</span> <span class=\"token class-name\">elfhdr</span> <span class=\"token operator\">*</span>elf<span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">struct</span> <span class=\"token class-name\">proghdr</span> <span class=\"token operator\">*</span>ph<span class=\"token punctuation\">,</span> <span class=\"token operator\">*</span>eph<span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">void</span> <span class=\"token punctuation\">(</span><span class=\"token operator\">*</span>entry<span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  uchar<span class=\"token operator\">*</span> pa<span class=\"token punctuation\">;</span>\n\n  elf <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">struct</span> <span class=\"token class-name\">elfhdr</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token number\">0x10000</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// scratch space</span>\n\n  <span class=\"token comment\">// Read 1st page off disk</span>\n  <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>elf<span class=\"token punctuation\">,</span> <span class=\"token number\">4096</span><span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// Is this an ELF executable?</span>\n  <span class=\"token 
keyword\">if</span><span class=\"token punctuation\">(</span>elf<span class=\"token operator\">-></span>magic <span class=\"token operator\">!=</span> ELF_MAGIC<span class=\"token punctuation\">)</span>\n    <span class=\"token keyword\">return</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// let bootasm.S handle error</span>\n\n  <span class=\"token comment\">// Load each program segment (ignores ph flags).</span>\n  ph <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">struct</span> <span class=\"token class-name\">proghdr</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>elf <span class=\"token operator\">+</span> elf<span class=\"token operator\">-></span>phoff<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  eph <span class=\"token operator\">=</span> ph <span class=\"token operator\">+</span> elf<span class=\"token operator\">-></span>phnum<span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">for</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">;</span> ph <span class=\"token operator\">&lt;</span> eph<span class=\"token punctuation\">;</span> ph<span class=\"token operator\">++</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">{</span>\n    pa <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>ph<span class=\"token operator\">-></span>paddr<span class=\"token punctuation\">;</span>\n    <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span>pa<span class=\"token punctuation\">,</span> ph<span 
class=\"token operator\">-></span>filesz<span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>off<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">if</span><span class=\"token punctuation\">(</span>ph<span class=\"token operator\">-></span>memsz <span class=\"token operator\">></span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">)</span>\n      <span class=\"token function\">stosb</span><span class=\"token punctuation\">(</span>pa <span class=\"token operator\">+</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>memsz <span class=\"token operator\">-</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token punctuation\">}</span>\n\n  <span class=\"token comment\">// Call the entry point from the ELF header.</span>\n  <span class=\"token comment\">// Does not return!</span>\n  entry <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">(</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span>elf<span class=\"token operator\">-></span>entry<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">entry</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token 
punctuation\">}</span></code></pre></div>\n<p>The structs <code class=\"language-text\">elfhdr</code> and <code class=\"language-text\">proghdr</code> declared at the top are both defined in <code class=\"language-text\">elf.h</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Format of an ELF executable file</span>\n\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">ELF_MAGIC</span> <span class=\"token expression\"><span class=\"token number\">0x464C457FU</span>  </span><span class=\"token comment\">// \"\\x7FELF\" in little endian</span></span>\n\n<span class=\"token comment\">// File header</span>\n<span class=\"token keyword\">struct</span> <span class=\"token class-name\">elfhdr</span> <span class=\"token punctuation\">{</span>\n  uint magic<span class=\"token punctuation\">;</span>  <span class=\"token comment\">// must equal ELF_MAGIC</span>\n  uchar elf<span class=\"token punctuation\">[</span><span class=\"token number\">12</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">;</span>\n  ushort type<span class=\"token punctuation\">;</span>\n  ushort machine<span class=\"token punctuation\">;</span>\n  uint version<span class=\"token punctuation\">;</span>\n  uint entry<span class=\"token punctuation\">;</span>\n  uint phoff<span class=\"token punctuation\">;</span>\n  uint shoff<span class=\"token punctuation\">;</span>\n  uint flags<span class=\"token punctuation\">;</span>\n  ushort ehsize<span class=\"token punctuation\">;</span>\n  ushort phentsize<span class=\"token punctuation\">;</span>\n  ushort phnum<span class=\"token punctuation\">;</span>\n  ushort shentsize<span class=\"token punctuation\">;</span>\n  ushort shnum<span class=\"token punctuation\">;</span>\n  ushort shstrndx<span class=\"token 
punctuation\">;</span>\n<span class=\"token punctuation\">}</span><span class=\"token punctuation\">;</span>\n\n<span class=\"token comment\">// Program section header</span>\n<span class=\"token keyword\">struct</span> <span class=\"token class-name\">proghdr</span> <span class=\"token punctuation\">{</span>\n  uint type<span class=\"token punctuation\">;</span>\n  uint off<span class=\"token punctuation\">;</span>\n  uint vaddr<span class=\"token punctuation\">;</span>\n  uint paddr<span class=\"token punctuation\">;</span>\n  uint filesz<span class=\"token punctuation\">;</span>\n  uint memsz<span class=\"token punctuation\">;</span>\n  uint flags<span class=\"token punctuation\">;</span>\n  uint align<span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span><span class=\"token punctuation\">;</span>\n\n<span class=\"token comment\">// Values for Proghdr type</span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">ELF_PROG_LOAD</span>           <span class=\"token expression\"><span class=\"token number\">1</span></span></span>\n\n<span class=\"token comment\">// Flag bits for Proghdr flags</span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">ELF_PROG_FLAG_EXEC</span>      <span class=\"token expression\"><span class=\"token number\">1</span></span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">ELF_PROG_FLAG_WRITE</span>     <span class=\"token expression\"><span class=\"token number\">2</span></span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token 
macro-name\">ELF_PROG_FLAG_READ</span>      <span class=\"token expression\"><span class=\"token number\">4</span></span></span></code></pre></div>\n<p>The details of this struct are covered by the following page, so I will skip them here.</p>\n<p>Reference: <a href=\"https://en.wikipedia.org/wiki/Executable_and_Linkable_Format\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Executable and Linkable Format - Wikipedia</a></p>\n<p>The line <code class=\"language-text\">void (*entry)(void);</code> declares a function pointer.</p>\n<p>Ultimately, it retrieves the entry point from the loaded ELF header and calls it.</p>\n<h3 id=\"reading-sectors-from-disk\" style=\"position:relative;\"><a href=\"#reading-sectors-from-disk\" aria-label=\"reading sectors from disk permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Reading Sectors from Disk</h3>\n<p>Next, let us look at the following section:</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\">elf <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">struct</span> <span class=\"token class-name\">elfhdr</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token number\">0x10000</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// scratch space</span>\n\n<span class=\"token comment\">// Read 1st page off disk</span>\n<span class=\"token 
function\">readseg</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>elf<span class=\"token punctuation\">,</span> <span class=\"token number\">4096</span><span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span></code></pre></div>\n<p>I was initially unclear about what <code class=\"language-text\">elf = (struct elfhdr*)0x10000;</code> was doing. It casts the region starting at address <code class=\"language-text\">0x10000</code> as an <code class=\"language-text\">elfhdr</code> struct, allowing access to that region through the pointer variable <code class=\"language-text\">elf</code>.</p>\n<p>This pointer variable <code class=\"language-text\">elf</code> is then cast to an <code class=\"language-text\">unsigned char</code> pointer on the next line and passed as the first argument to the <code class=\"language-text\">readseg</code> function.</p>\n<p>Reference: <a href=\"https://vmm.dev/ja/lowlevel/xv6/xv6-1.md\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Reading the xv6 Boot Loader</a></p>\n<p>The line <code class=\"language-text\">readseg((uchar*)elf, 4096, 0);</code> loads 4096 bytes of data into the address <code class=\"language-text\">(uchar*)elf</code>.</p>\n<p>The reason 4096 bytes are read even though an ELF header is only 52 bytes is explained in the following reference:</p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/64795450/xv6-bootmain-loading-kernel-elf-header\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">x86 - XV6: bootmain - loading kernel ELF header - Stack Overflow</a></p>\n<p>The background is that since the combined size of the ELF header and program headers is unknown at call time, one full page of data is read with the expectation that the ELF header and program headers 
will fit within 4 KB.</p>\n<p>The standard page size on x86 CPUs is 4096 bytes (4 KB).</p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/11543748/why-is-the-page-size-of-linux-x86-4-kb-how-is-that-calculated\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Why is the page size of Linux (x86) 4 KB, how is that calculated? - Stack Overflow</a></p>\n<p>The <code class=\"language-text\">readseg</code> function is shown below.</p>\n<p>Internally, it calls <code class=\"language-text\">readsect</code> once per 512-byte sector. For the call above, that means reading 4096 bytes (eight sectors) starting at the second sector of the disk (sector 1) and writing them to the region pointed to by <code class=\"language-text\">uchar* pa</code>.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Read 'count' bytes at 'offset' from kernel into physical address 'pa'.</span>\n<span class=\"token comment\">// Might copy more than asked.</span>\n<span class=\"token keyword\">void</span> <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span> pa<span class=\"token punctuation\">,</span> uint count<span class=\"token punctuation\">,</span> uint offset<span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  uchar<span class=\"token operator\">*</span> epa<span class=\"token punctuation\">;</span>\n  epa <span class=\"token operator\">=</span> pa <span class=\"token operator\">+</span> count<span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// Round down to sector boundary.</span>\n  pa <span class=\"token operator\">-=</span> offset <span class=\"token operator\">%</span> SECTSIZE<span class=\"token punctuation\">;</span>\n  \n  <span class=\"token comment\">// Translate from bytes to sectors; kernel starts at sector 1.</span>\n  offset <span class=\"token 
operator\">=</span> <span class=\"token punctuation\">(</span>offset <span class=\"token operator\">/</span> SECTSIZE<span class=\"token punctuation\">)</span> <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">;</span>\n\n  <span class=\"token comment\">// If this is too slow, we could read lots of sectors at a time.</span>\n  <span class=\"token comment\">// We'd write more to memory than asked, but it doesn't matter --</span>\n  <span class=\"token comment\">// we load in increasing order.</span>\n  <span class=\"token keyword\">for</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">;</span> pa <span class=\"token operator\">&lt;</span> epa<span class=\"token punctuation\">;</span> pa <span class=\"token operator\">+=</span> SECTSIZE<span class=\"token punctuation\">,</span> offset<span class=\"token operator\">++</span><span class=\"token punctuation\">)</span>\n    <span class=\"token function\">readsect</span><span class=\"token punctuation\">(</span>pa<span class=\"token punctuation\">,</span> offset<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span></code></pre></div>\n<p>Here is the <code class=\"language-text\">readsect</code> function:</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Read a single sector at offset into dst.</span>\n<span class=\"token keyword\">void</span> <span class=\"token function\">readsect</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span> <span class=\"token operator\">*</span>dst<span class=\"token punctuation\">,</span> uint offset<span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  <span class=\"token comment\">// Issue command.</span>\n  <span class=\"token function\">waitdisk</span><span 
class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F2</span><span class=\"token punctuation\">,</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>   <span class=\"token comment\">// count = 1</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F3</span><span class=\"token punctuation\">,</span> offset<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F4</span><span class=\"token punctuation\">,</span> offset <span class=\"token operator\">>></span> <span class=\"token number\">8</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F5</span><span class=\"token punctuation\">,</span> offset <span class=\"token operator\">>></span> <span class=\"token number\">16</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F6</span><span class=\"token punctuation\">,</span> <span class=\"token punctuation\">(</span>offset <span class=\"token operator\">>></span> <span class=\"token number\">24</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">|</span> <span class=\"token number\">0xE0</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">outb</span><span class=\"token 
punctuation\">(</span><span class=\"token number\">0x1F7</span><span class=\"token punctuation\">,</span> <span class=\"token number\">0x20</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// cmd 0x20 - read sectors</span>\n\n  <span class=\"token comment\">// Read data.</span>\n  <span class=\"token function\">waitdisk</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token function\">insl</span><span class=\"token punctuation\">(</span><span class=\"token number\">0x1F0</span><span class=\"token punctuation\">,</span> dst<span class=\"token punctuation\">,</span> SECTSIZE<span class=\"token operator\">/</span><span class=\"token number\">4</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span></code></pre></div>\n<p>This code reads the disk through the legacy ATA (IDE) I/O ports <code class=\"language-text\">0x1F0</code> to <code class=\"language-text\">0x1F7</code> using programmed I/O. It is often described as <code class=\"language-text\">Cylinder-head-sector (CHS)</code> access, but the value <code class=\"language-text\">0xE0</code> written to port <code class=\"language-text\">0x1F6</code> has the LBA bit set, so <code class=\"language-text\">offset</code> is treated as a 28-bit logical block address (LBA) rather than a CHS triple.</p>\n<p>Modern operating systems (probably) do not implement disk reads this way, so I will skip the details of CHS and of the port protocol.</p>\n<p>Reference: <a href=\"https://en.wikipedia.org/wiki/Cylinder-head-sector\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Cylinder-head-sector - Wikipedia</a></p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/61028931/xv6-boot-loader-reading-sectors-off-disk-using-chs\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">c - xv6 boot loader: Reading sectors off disk using CHS - Stack Overflow</a></p>\n<p>The line <code class=\"language-text\">offset = (offset / SECTSIZE) + 1;</code> starts reading from the second sector (sector 1) because, as confirmed in the Makefile section, the kernel is placed 512 bytes into the image.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: 
block; margin-left: auto; margin-right: auto; max-width: 500px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/0ee723f0df4390ef428db10febeb6cfd/0b533/image-8-16455921223972.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 75%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAARlAAAEZQAGA43XUAAABiElEQVQ4y8WU6XaCMBBGff8Hah+gdtGiApGqaCsgFgmgHDZ3viZxpaL1X3POPWFmwiXDVsF+bLdbRFFUII5jJEki4Me/65vNRpyb5/lBg0oQ+IjY4vVqhW9KQcwRNMsWEMNCo9dHUx+ADM1Tnq2xnAm2ZcLlcon1es12uIE9cVHTh3gfGJAGJl47AzxUa3h8qeNZ0yF9mixviDXDsYOcdXUhxNlwPAeNLxmyQRiqmMlIEygmQWufaw5lGI5ZLuTBIeGHFF0qo+8T6B5HFce7WN3nCHqeAtszWcsn4YGC0Ju5+JjI6LkEXVdlEKijBhRLOsacjsuEtCgstHwhpDshn+t6FW/dp13MZL0rwuMOC8LQFYt1ry1knL6vCQ4xr3WpgrFnYZ4tQKmLsW0jCKbgD7ggpNMJa1GCNm6hfaS5ZxfzGrElWK7BdpjfbjlOYqTzBOkiZSTlsHrG6mE4E6/bzZazbI57B/96zoWlO8yyrHC1a4hu+Nf1L8J7x13CNE3FveHzX1z72/wARuV7fAc3xmkAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/0ee723f0df4390ef428db10febeb6cfd/8ac56/image-8-16455921223972.webp 240w,\n/static/0ee723f0df4390ef428db10febeb6cfd/d3be9/image-8-16455921223972.webp 480w,\n/static/0ee723f0df4390ef428db10febeb6cfd/b0a15/image-8-16455921223972.webp 500w\"\n              sizes=\"(max-width: 500px) 100vw, 500px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/0ee723f0df4390ef428db10febeb6cfd/8ff5a/image-8-16455921223972.png 240w,\n/static/0ee723f0df4390ef428db10febeb6cfd/e85cb/image-8-16455921223972.png 480w,\n/static/0ee723f0df4390ef428db10febeb6cfd/0b533/image-8-16455921223972.png 500w\"\n            sizes=\"(max-width: 500px) 100vw, 500px\"\n            type=\"image/png\"\n          />\n          <img\n            
class=\"gatsby-resp-image-image\"\n            src=\"/static/0ee723f0df4390ef428db10febeb6cfd/0b533/image-8-16455921223972.png\"\n            alt=\"img\"\n            title=\"img\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>In xv6, as shown by <code class=\"language-text\">#define SECTSIZE  512</code>, one sector is 512 bytes.</p>\n<p>Therefore, the first byte of the kernel is expected to be at the beginning of sector 1, the second sector on the disk.</p>\n<h3 id=\"verifying-the-loaded-kernel\" style=\"position:relative;\"><a href=\"#verifying-the-loaded-kernel\" aria-label=\"verifying the loaded kernel permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Verifying the Loaded Kernel</h3>\n<p>The next step checks that the data just read is actually an ELF image by comparing its first four bytes with the ELF magic number.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Is this an ELF executable?</span>\n<span class=\"token keyword\">if</span><span class=\"token punctuation\">(</span>elf<span class=\"token operator\">-></span>magic <span class=\"token operator\">!=</span> ELF_MAGIC<span class=\"token punctuation\">)</span>\n\t<span class=\"token keyword\">return</span><span class=\"token punctuation\">;</span>  <span class=\"token comment\">// let bootasm.S handle error</span></code></pre></div>\n<h3 
id=\"loading-program-headers\" style=\"position:relative;\"><a href=\"#loading-program-headers\" aria-label=\"loading program headers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Loading Program Headers</h3>\n<p>Next, the segments described by the program headers are loaded.</p>\n<p>Each segment is read with <code class=\"language-text\">readseg</code>, the same function that was used to read the first page.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Load each program segment (ignores ph flags).</span>\nph <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">struct</span> <span class=\"token class-name\">proghdr</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>elf <span class=\"token operator\">+</span> elf<span class=\"token operator\">-></span>phoff<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\neph <span class=\"token operator\">=</span> ph <span class=\"token operator\">+</span> elf<span class=\"token operator\">-></span>phnum<span class=\"token punctuation\">;</span>\n<span class=\"token keyword\">for</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">;</span> ph <span class=\"token operator\">&lt;</span> eph<span class=\"token 
punctuation\">;</span> ph<span class=\"token operator\">++</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">{</span>\n  pa <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span>uchar<span class=\"token operator\">*</span><span class=\"token punctuation\">)</span>ph<span class=\"token operator\">-></span>paddr<span class=\"token punctuation\">;</span>\n  <span class=\"token function\">readseg</span><span class=\"token punctuation\">(</span>pa<span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>off<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n  <span class=\"token keyword\">if</span><span class=\"token punctuation\">(</span>ph<span class=\"token operator\">-></span>memsz <span class=\"token operator\">></span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">)</span>\n    <span class=\"token function\">stosb</span><span class=\"token punctuation\">(</span>pa <span class=\"token operator\">+</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">,</span> ph<span class=\"token operator\">-></span>memsz <span class=\"token operator\">-</span> ph<span class=\"token operator\">-></span>filesz<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span></code></pre></div>\n<p>First, the address <code class=\"language-text\">(uchar*)elf + elf->phoff</code> is cast to a pointer to a <code class=\"language-text\">proghdr</code> struct.</p>\n<p><code class=\"language-text\">(uchar*)elf + elf->phoff</code> is the address in memory of the first program header.</p>\n<p>Since 4096 bytes of data were read earlier, both the ELF header and all program headers are expected 
to have been loaded, so the program header information is already present at the position <code class=\"language-text\">(uchar*)elf + elf->phoff</code>.</p>\n<p>From here, the loop reads each program segment from disk to the physical address given by <code class=\"language-text\">ph->paddr</code>.</p>\n<p>This is the step that loads the kernel's actual code and data.</p>\n<p>Note that <code class=\"language-text\">stosb</code> is defined in <code class=\"language-text\">x86.h</code> and zero-fills the rest of the segment when <code class=\"language-text\">ph->memsz</code> is greater than <code class=\"language-text\">ph->filesz</code>, that is, when the segment occupies more memory than it has bytes in the file.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token keyword\">static</span> <span class=\"token keyword\">inline</span> <span class=\"token keyword\">void</span>\n<span class=\"token function\">stosb</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span> <span class=\"token operator\">*</span>addr<span class=\"token punctuation\">,</span> <span class=\"token keyword\">int</span> data<span class=\"token punctuation\">,</span> <span class=\"token keyword\">int</span> cnt<span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">{</span>\n  <span class=\"token keyword\">asm</span> <span class=\"token keyword\">volatile</span><span class=\"token punctuation\">(</span><span class=\"token string\">\"cld; rep stosb\"</span> <span class=\"token operator\">:</span>\n               <span class=\"token string\">\"=D\"</span> <span class=\"token punctuation\">(</span>addr<span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> <span class=\"token string\">\"=c\"</span> <span class=\"token punctuation\">(</span>cnt<span class=\"token punctuation\">)</span> <span class=\"token operator\">:</span>\n               <span class=\"token string\">\"0\"</span> <span class=\"token punctuation\">(</span>addr<span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> 
<span class=\"token string\">\"1\"</span> <span class=\"token punctuation\">(</span>cnt<span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> <span class=\"token string\">\"a\"</span> <span class=\"token punctuation\">(</span>data<span class=\"token punctuation\">)</span> <span class=\"token operator\">:</span>\n               <span class=\"token string\">\"memory\"</span><span class=\"token punctuation\">,</span> <span class=\"token string\">\"cc\"</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span></code></pre></div>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/27958743/difference-between-p-filesz-and-p-memsz-of-elf32-phdr/31011428#31011428\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">elf - Difference between p_filesz and p_memsz of Elf32_Phdr - Stack Overflow</a></p>\n<p>Now that the kernel program has been loaded from disk, all subsequent processing is handed off to the kernel.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token comment\">// Call the entry point from the ELF header.</span>\n<span class=\"token comment\">// Does not return!</span>\nentry <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">(</span><span class=\"token operator\">*</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span><span class=\"token keyword\">void</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">(</span>elf<span class=\"token operator\">-></span>entry<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n<span class=\"token function\">entry</span><span class=\"token punctuation\">(</span><span class=\"token 
punctuation\">)</span><span class=\"token punctuation\">;</span></code></pre></div>\n<h2 id=\"summary\" style=\"position:relative;\"><a href=\"#summary\" aria-label=\"summary permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Summary</h2>\n<p>It took quite a bit of time, but I have now thoroughly read through and analyzed the xv6 UNIX bootstrap process.</p>\n<p>Starting from the next article, I can finally get into the main topic — reading the kernel source code.</p>\n<h2 id=\"reference-books\" style=\"position:relative;\"><a href=\"#reference-books\" aria-label=\"reference books permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Reference Books</h2>\n<ul>\n<li><a href=\"https://amzn.to/3qZSCY7\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">30-Day OS Development (30日でできる! 
OS自作入門)</a></li>\n<li><a href=\"https://amzn.to/3qXYsZX\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Writing an OS from Scratch (ゼロからのOS自作入門)</a></li>\n<li><a href=\"https://amzn.to/3q8TU3K\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Reading OS Code for the First Time ~Learning Kernel Mechanisms with UNIX V6~</a></li>\n<li><a href=\"https://amzn.to/3I6fkVt\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Understanding the Linux Kernel (詳解 Linuxカーネル)</a></li>\n<li><a href=\"https://amzn.to/3JRUdI2\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Building and Understanding an OS: Theory and Implementation for x86 Computers (作って理解するOS x86系コンピュータを動かす理論と実装)</a></li>\n</ul>","fields":{"slug":"/unix-xv6-001-bootstrap-en","tagSlugs":["/tag/unix-en/","/tag/xv-6-en/","/tag/kernel-en/","/tag/os-en/","/tag/english/"]},"frontmatter":{"date":"2022-01-10","description":"Reading the xv6 OS source code to learn about the kernel. This article walks through the bootstrap process of xv6 OS.","tags":["Unix (en)","xv6 (en)","Kernel (en)","OS (en)","English"],"title":"Reading xv6 OS Seriously to Fully Understand the Kernel - Bootstrap","socialImage":{"publicURL":"/static/a1219355b6fca5aca448f0393c8a403a/unix-xv6-001-bootstrap.png"}}}},"pageContext":{"slug":"/unix-xv6-001-bootstrap-en"}},"staticQueryHashes":["251939775","401334301","825871152"]}