• After being decompressed, the kernel image starts with another 'startup_32' function, defined in '$(linux-2.6.15.3_dir)/arch/i386/kernel/head.S'. This 'head.S' is the second of the two 'head.S' files in the Linux source package, and it is also called the 'kernel head'. It is exactly what we want to describe in this article.

    The kernel head continues with higher-level initialization for the first Linux process (process 0). It sets up an execution environment for the kernel main routine, much as an operating system does before an application starts. This 'head.S' has two entry paths, one for the boot CPU and one for the secondary CPUs; we only follow the execution path of the boot CPU.

    /*
     * ! $(linux-2.6.15.3_dir)/arch/i386/kernel/head.S
     */
    ENTRY(startup_32)

     /*
      * ! We still use linear addresses, since
      * ! %ds = %es = %fs = %gs = __BOOT_DS
      * ! selects the boot data segment (GDT index 3),
      * ! whose base address is 0x00000000
      */
     cld
     lgdt boot_gdt_descr - __PAGE_OFFSET
     movl $(__BOOT_DS),%eax
     movl %eax,%ds
     movl %eax,%es
     movl %eax,%fs
     movl %eax,%gs

     /*
      * ! Clear the kernel bss
      */
     xorl %eax,%eax
     movl $__bss_start - __PAGE_OFFSET,%edi
     movl $__bss_stop - __PAGE_OFFSET,%ecx
     subl %edi,%ecx
     shrl $2,%ecx
     rep ; stosl
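    To make the arithmetic of the bss-clearing loop concrete, here is a small C model of it. It is only a sketch under stated assumptions: 'mem' is a hypothetical stand-in for physical memory, and 'bss_start'/'bss_stop' play the role of the physical addresses '__bss_start - __PAGE_OFFSET' and '__bss_stop - __PAGE_OFFSET'. Note that 'shrl $2' drops any remainder, so the loop relies on the bss size being a multiple of four bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words "rep ; stosl" stores:
 * (__bss_stop - __bss_start) >> 2, as computed by subl/shrl. */
size_t bss_dword_count(uint32_t bss_start, uint32_t bss_stop)
{
    return (size_t)((bss_stop - bss_start) >> 2);
}

/* Hypothetical model of the bss-clearing loop: mem stands in for
 * physical RAM viewed as 32-bit words; bss_start/bss_stop are
 * physical byte addresses (symbol - __PAGE_OFFSET in head.S). */
void clear_bss(uint32_t *mem, uint32_t bss_start, uint32_t bss_stop)
{
    uint32_t *dst = mem + bss_start / 4;                  /* %edi, 4-byte aligned start assumed */
    size_t count = bss_dword_count(bss_start, bss_stop);  /* %ecx */

    while (count--)                                       /* rep ; stosl with %eax == 0 */
        *dst++ = 0;
}
```

    For example, with bss_start = 8 and bss_stop = 24 the model clears words 2 to 5 of 'mem' and leaves the rest untouched.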

    After clearing the bss, it prepares to enable paging. As the 'Intel Manual Vol3' requires, the paging data structures (a page directory and its page tables) must be set up before paging is turned on.

     /*
      * ! Initialize the provisional kernel page tables,
      * ! which are stored starting from pg0, right after
      * ! the end of the kernel's uninitialized data segment (bss).
      * ! The provisional page global directory is
      * ! contained in the swapper_pg_dir variable.
      * !
      * ! page_pde_offset = 0x0c00
      */
     page_pde_offset = (__PAGE_OFFSET >> 20);

     /*
      * ! this line indicates the table starts from 'pg0'
      */
     movl $(pg0 - __PAGE_OFFSET), %edi

     /*
      * ! this line tells us that 'swapper_pg_dir' is
      * ! where the page directory starts
      */
     movl $(swapper_pg_dir - __PAGE_OFFSET), %edx

     /*
      * ! There are 1024 entries in 'swapper_pg_dir',
      * ! as the following code shows:
      * ! ENTRY(swapper_pg_dir)
      * !     .fill 1024,4,0
      * !
      * ! Each page directory entry maps 4 MB.
      * ! The first mapping:
      * !     both entry 0 and entry 0x300 (page_pde_offset/4) --> pg0
      * !     that is (0x00000000~0x003fffff) and (0xC0000000~0xC03fffff) ---> pg0
      * ! The second mapping:
      * !     both entry 1 and entry 0x301 (page_pde_offset/4+1) --> pg1 (the page following pg0)
      * !     that is (0x00400000~0x007fffff) and (0xC0400000~0xC07fffff) ---> pg1
      * !
      * ! The objective of this first phase of paging is to
      * ! allow these 8 MB of RAM to be easily addressed
      * ! both through their physical (identity-mapped) addresses
      * ! and through kernel virtual addresses starting at 0xC0000000.
      */
     movl $0x007, %eax   /* 0x007 = PRESENT+RW+USER */
    10:
     leal 0x007(%edi),%ecx   /* Create PDE entry */
     movl %ecx,(%edx)   /* Store identity PDE entry */
     movl %ecx,page_pde_offset(%edx)  /* Store kernel PDE entry */
     addl $4,%edx
     movl $1024, %ecx
    11:
     stosl
     addl $0x1000,%eax
     loop 11b
     /* End condition: we must map up to and including INIT_MAP_BEYOND_END */
     /* bytes beyond the end of our own page tables; the +0x007 is the attribute bits */
     leal (INIT_MAP_BEYOND_END+0x007)(%edi),%ebp
     cmpl %ebp,%eax
     jb 10b
     movl %edi,(init_pg_tables_end - __PAGE_OFFSET)

     /*
      * ! only the boot CPU takes this path
      */
    #ifdef CONFIG_SMP
     xorl %ebx,%ebx    /* This is the boot CPU (BSP) */
     jmp 3f
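    The page-table loop above (labels 10: and 11:) is easier to follow when rewritten in C. The following is a hypothetical model, not kernel code: 'pgdir' stands for swapper_pg_dir, 'tables' for the memory starting at pg0, and the values used in the example (pg0 at physical 5 MB, INIT_MAP_BEYOND_END = 128 KB) are illustrative assumptions, since the real ones depend on the kernel size and configuration.

```c
#include <stdint.h>

#define PAGE_OFFSET     0xC0000000u          /* __PAGE_OFFSET (3G/1G split) */
#define PAGE_PDE_OFFSET (PAGE_OFFSET >> 20)  /* 0xC00: byte offset of PDE 0x300 */
#define PTE_ATTR        0x007u               /* PRESENT + RW + USER */

/* Hypothetical model of the loop at labels 10: and 11: in head.S.
 *   pgdir    - stands in for swapper_pg_dir (1024 zeroed entries)
 *   tables   - stands in for memory starting at pg0 (must have room)
 *   pg0_phys - physical address of pg0
 *   beyond   - stands in for INIT_MAP_BEYOND_END
 * Returns how many page tables (4 MB each) were filled. */
unsigned build_boot_page_tables(uint32_t pgdir[1024], uint32_t *tables,
                                uint32_t pg0_phys, uint32_t beyond)
{
    uint32_t edi = pg0_phys;   /* next page-table slot, physical */
    uint32_t eax = PTE_ATTR;   /* next PTE: page frame 0 + attributes */
    unsigned pde = 0;          /* %edx, as an index into pgdir */

    do {
        uint32_t entry = edi + PTE_ATTR;             /* leal 0x007(%edi),%ecx */
        pgdir[pde] = entry;                          /* identity PDE */
        pgdir[pde + PAGE_PDE_OFFSET / 4] = entry;    /* kernel PDE (0x300 + pde) */
        pde++;

        for (int i = 0; i < 1024; i++) {             /* 11: stosl ; loop 11b */
            tables[(edi - pg0_phys) / 4] = eax;
            edi += 4;
            eax += 0x1000;                           /* next 4 KB frame */
        }
    } while (eax < edi + beyond + PTE_ATTR);         /* cmpl %ebp,%eax ; jb 10b */

    return pde;
}
```

    With pg0 at 0x00500000 and 'beyond' = 0x20000, the loop fills two page tables: entries 0/0x300 point to 0x00500007 and entries 1/0x301 to 0x00501007, so the first 8 MB are mapped both at 0x00000000 and at 0xC0000000.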

    The provisional page tables are now built, so we can enable paging!

     /*
      * Enable paging
      */
     movl $swapper_pg_dir-__PAGE_OFFSET,%eax
     
     /*
      * ! load the physical address of the page directory into %cr3
      */
     movl %eax,%cr3  /* set the page table pointer.. */
     movl %cr0,%eax
     orl $0x80000000,%eax
     
     /*
      * ! Enable the paging
      */
     movl %eax,%cr0  /* ..and set paging (PG) bit */
     
     /*
      * ! A far jump right after paging is enabled
      */
     ljmp $__BOOT_CS,$1f /* Clear prefetch and normalize %eip */
    1:
     /* Set up the stack pointer */
     lss stack_start,%esp

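    Right after paging is enabled there is a far jump instruction, 'ljmp $__BOOT_CS,$1f'. You may wonder what '$1f' means. '1' is a local symbol. To define a local symbol, write a label of the form 'N:' (where N represents any digit). To refer to the most recent previous definition of that symbol, write 'Nb', using the same digit as when you defined the label. To refer to the next definition of a local label, write 'Nf'. The 'b' stands for "backwards" and the 'f' stands for "forwards". So '$1f' is the address of the '1:' label on the next line. Because the kernel is linked at virtual addresses above 0xC0000000, this jump also moves %eip from the identity-mapped low addresses into the kernel's virtual range, which is possible only now that paging maps both.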

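    Now we are in 32-bit protected mode with paging enabled, but some work done earlier in 16-bit code still has to be redone for the kernel proper. The first item is the interrupt descriptor table: 'setup.S' only loaded an empty IDT (base 0, limit 0).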

     /*
      * ! Setup the interrupt descriptor table
      * ! All the 256 entries are pointing to
      * ! the default interrupt "handler" -- 'ignore_int'
      */
     call setup_idt

     ....
     ....

    setup_idt:
     lea ignore_int,%edx
     movl $(__KERNEL_CS << 16),%eax
     movw %dx,%ax  /* selector = 0x0010 = cs */
     movw $0x8E00,%dx /* interrupt gate - dpl=0, present */

     /*
      * ! the idt_table variable is defined
      * ! in $(linux-2.6.15.3_dir)/arch/i386/kernel/traps.c
      */
     lea idt_table,%edi
     mov $256,%ecx
    rp_sidt:
     movl %eax,(%edi)
     movl %edx,4(%edi)
     addl $8,%edi
     dec %ecx
     jne rp_sidt
     ret
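    Each 8-byte gate written by 'setup_idt' is built from two 32-bit halves. The C sketch below reproduces that bit packing under the assumption that 'handler' is the linear address of ignore_int; the function names are hypothetical, and the selector 0x60 used in the example is only an illustrative value for __KERNEL_CS.

```c
#include <stdint.h>

/* Hypothetical model of how setup_idt builds one interrupt gate.
 * lo/hi are the dwords stored by movl %eax,(%edi) and movl %edx,4(%edi). */
void make_int_gate(uint32_t handler, uint16_t selector,
                   uint32_t *lo, uint32_t *hi)
{
    /* %eax = selector << 16, then movw %dx,%ax puts offset bits 0..15 low */
    *lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
    /* %edx keeps offset bits 16..31; movw $0x8E00,%dx sets P=1, DPL=0, type 0xE */
    *hi = (handler & 0xFFFF0000u) | 0x8E00u;
}

/* Fill all 256 gates with the same handler, as the rp_sidt loop does. */
void setup_idt_model(uint32_t idt[512], uint32_t handler, uint16_t selector)
{
    uint32_t lo, hi;

    make_int_gate(handler, selector, &lo, &hi);
    for (int i = 0; i < 256; i++) {   /* mov $256,%ecx ; dec ; jne rp_sidt */
        idt[2 * i] = lo;
        idt[2 * i + 1] = hi;
    }
}
```

    The low dword holds the selector and offset bits 0..15; the high dword holds offset bits 16..31 plus 0x8E00 (present, DPL 0, 32-bit interrupt gate). All 256 entries receive the same pair.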

    After checking the type of CPU, the kernel head prepares to call the kernel main function 'start_kernel'.

     /*
      * ! use new descriptor table in safe place
      * ! then reload segment registers after lgdt
      */
     lgdt cpu_gdt_descr
     lidt idt_descr
     ljmp $(__KERNEL_CS),$1f
    1: movl $(__KERNEL_DS),%eax # reload all the segment registers
     movl %eax,%ss   # after changing gdt.

     movl $(__USER_DS),%eax  # DS/ES contains default USER segment
     movl %eax,%ds
     movl %eax,%es

     xorl %eax,%eax   # Clear FS/GS and LDT
     movl %eax,%fs
     movl %eax,%gs
     lldt %ax
     cld   # gcc2 wants the direction flag cleared at all times

     ...
     ...

     /*
      * ! The boot CPU will jump to execute
      * ! $(linux-2.6.15.3_dir)/init/main.c:start_kernel()
      * ! And the start_kernel() should never return :)
      */
     call start_kernel
    L6:
     jmp L6   # main should never return here, but
        # just in case, we know what happens.

  • Why do we do this? Don't ask me.. Incomprehensible are the ways of bootloaders.
                                 -- comments in arch/i386/boot/compressed/misc.c

    There are two 'head.S' files in the Linux source package. One is in $(linux-2.6.15.3_dir)/arch/i386/boot/compressed and the other is in $(linux-2.6.15.3_dir)/arch/i386/kernel. The first one is analyzed in this article. Before we go ahead, here is a piece of Linux news: 'Army leans toward Linux for FCS (Future Combat System)'.

    The first 'head.S' is also called the 'compressed head', because it is used to decompress the kernel image. Unlike the code we have seen before, it runs in 32-bit protected mode with paging disabled. The 'compressed head' starts at 'startup_32'.

    .text /* ! here just '.text', without '.code16' assembly directive */
    .globl startup_32
     
    startup_32:
     /*
      * ! clear direction flag
      * ! and clear interrupt flag
      */
     cld
     cli

     /*
      * ! all other segment registers are
      * ! reloaded after protected mode is enabled
      * ! __BOOT_DS = 0x18
      */
     movl $(__BOOT_DS),%eax
     movl %eax, %ds
     movl %eax, %es
     movl %eax, %fs
     movl %eax, %gs

     /*
      * ! lss - load full pointer from memory
      * !       to register
      * ! and here 'ss:esp = stack_start'
      */
     lss stack_start,%esp

     /*
      * ! EAX = 0;
      * ! do {
      * !     DS:[0] = ++EAX;
      * ! } while (DS:[0x100000] == EAX);
      */
     xorl %eax, %eax
    1: incl %eax  # check that A20 really IS enabled
     movl %eax, 0x000000 # loop forever if it isn't
     cmpl %eax, 0x100000
     je 1b
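    Why does this loop detect a disabled A20 line? With A20 off, bit 20 of every physical address is forced to 0, so address 0x100000 wraps around to 0x000000. The C sketch below models only the two memory cells the check touches; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* With the A20 line disabled, physical address bit 20 is forced to 0. */
static uint32_t a20_mask(uint32_t addr, bool a20_enabled)
{
    return a20_enabled ? addr : (addr & ~0x100000u);
}

/* Only the two addresses the check touches are modelled:
 * ram[0] is physical 0x000000, ram[1] is physical 0x100000. */
static uint32_t *cell(uint32_t ram[2], uint32_t addr, bool a20_enabled)
{
    return a20_mask(addr, a20_enabled) == 0 ? &ram[0] : &ram[1];
}

/* One pass of the loop; returns true when it would repeat ("je 1b"). */
bool a20_loop_repeats(uint32_t ram[2], uint32_t eax, bool a20_enabled)
{
    *cell(ram, 0x000000, a20_enabled) = eax;          /* movl %eax, 0x000000 */
    return *cell(ram, 0x100000, a20_enabled) == eax;  /* cmpl %eax, 0x100000 */
}
```

    If A20 is disabled, the value just written to 0x000000 is always read back at 0x100000 and the loop spins forever. If A20 is enabled, the loop exits as soon as the two cells differ; incrementing EAX on each pass ensures this happens within one extra pass even if 0x100000 happened to hold the same value.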

    After the A20 check, the 'compressed head' clears the 'eflags' register and fills its bss (the area of uninitialized data identified by the _edata and _end symbols) with zeros. Then the decompression process begins.

     /*
      * ! %esi has been loaded in 'setup.S' with 'INITSEG << 4'
      * ! 'subl $16,%esp' reserves room for the first arg, that is
      * ! struct moveparams {
      * !     uch *low_buffer_start;
      * !     int lcount;
      * !     uch *high_buffer_start;
      * !     int hcount;
      * ! } mv;
      * ! the second arg is the %esi which indicates the position
      * ! of the real-mode data
      */
     subl $16,%esp # place for structure on the stack
     movl %esp,%eax
     pushl %esi # real mode pointer as second arg
     pushl %eax # address of structure as first arg

     /*
      * ! if (!decompress_kernel(&mv, esi)) {         // return value in EAX
      * !    restore esi from stack;
      * !    ebx = 0;
      * !    goto __BOOT_CS: $__PHYSICAL_START;
      * !    // see linux/arch/i386/kernel/head.S:startup_32
      * ! }
      * ! 'decompress_kernel' is coded in
      * ! $(linux-2.6.15.3_dir)/arch/i386/boot/compressed/misc.c
      */
     call decompress_kernel
     orl  %eax,%eax
     jnz  3f
     popl %esi # discard address
     popl %esi # real mode pointer
     xorl %ebx,%ebx
     ljmp $(__BOOT_CS), $__PHYSICAL_START

    3:
     /*
      * ! copy the code between move_routine_start and
      * ! move_routine_end to 0x1000; both labels are
      * ! defined at the end of this file
      */
     movl $move_routine_start,%esi
     movl $0x1000,%edi
     movl $move_routine_end,%ecx
     subl %esi,%ecx
     addl $3,%ecx
     shrl $2,%ecx
     cld
     rep
     movsl

     /*
      * ! Do preparation for 'move_routine_start':
      * ! set the parameters
      * ! ebx = real mode pointer
      * ! esi = mv.low_buffer_start
      * ! ecx = mv.lcount
      * ! edx = mv.high_buffer_start
      * ! eax = mv.hcount
      * ! edi = $__PHYSICAL_START
      */
     popl %esi # discard the address
     popl %ebx # real mode pointer
     popl %esi # low_buffer_start
     popl %ecx # lcount
     popl %edx # high_buffer_start
     popl %eax # hcount
     movl $__PHYSICAL_START,%edi
     cli  # make sure we don't get interrupted

     /*
      * ! jump to physical address: __BOOT_CS:0x1000
      * ! where the move_routine_start function stays
      */
     ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine

     /*
      * ! control has now been transferred to 'move_routine_start'
      */
    move_routine_start:
     movl %ecx,%ebp
     shrl $2,%ecx
     rep
     movsl
     movl %ebp,%ecx
     andl $3,%ecx
     rep
     movsb
     movl %edx,%esi
     movl %eax,%ecx # NOTE: rep movsb won't move if %ecx == 0
     addl $3,%ecx
     shrl $2,%ecx
     rep
     movsl
     movl %ebx,%esi # Restore setup pointer
     xorl %ebx,%ebx
     ljmp $(__BOOT_CS), $__PHYSICAL_START
    move_routine_end:

    In 'move_routine_start', we perform the following operations:
    (1) copy (mv.lcount >> 2) doublewords from mv.low_buffer_start to $__PHYSICAL_START;
    (2) copy/append the remaining (mv.lcount & 3) bytes;
    (3) copy/append ((mv.hcount + 3) >> 2) doublewords from mv.high_buffer_start.

    After the decompressed kernel image has been moved to its final place, control is transferred to '$(__BOOT_CS):$__PHYSICAL_START'. Since the base of __BOOT_CS is 0 and paging is still disabled, this is physical address __PHYSICAL_START, where the second 'head.S' resides.
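    The three copy steps of 'move_routine_start' can be modelled in C as below. This is a sketch, not the kernel's code: 'dst' stands for memory at $__PHYSICAL_START, and the buffers and counts correspond to the fields of 'mv'. Because step (3) rounds the high part up to whole doublewords, the model (like the assembly) may copy up to three bytes beyond mv.hcount, so the high buffer must be readable that far.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of move_routine_start. Returns bytes written. */
size_t move_routine_model(uint8_t *dst, const uint8_t *low, uint32_t lcount,
                          const uint8_t *high, uint32_t hcount)
{
    uint8_t *edi = dst;

    /* (1) rep movsl: lcount >> 2 doublewords from the low buffer */
    memcpy(edi, low, (size_t)(lcount >> 2) * 4);
    edi += (size_t)(lcount >> 2) * 4;

    /* (2) rep movsb: the remaining lcount & 3 bytes */
    memcpy(edi, low + (lcount & ~3u), lcount & 3u);
    edi += lcount & 3u;

    /* (3) rep movsl: (hcount + 3) >> 2 doublewords, appended at %edi */
    memcpy(edi, high, (size_t)((hcount + 3) >> 2) * 4);
    edi += (size_t)((hcount + 3) >> 2) * 4;

    return (size_t)(edi - dst);
}
```

    With 7 low bytes and 5 high bytes, the model writes 7 + 8 = 15 bytes: the low part exactly, then two whole doublewords from the high buffer.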

  • The phases we talked about before run in 'Real-address Mode', which executes 16-bit program modules. At the end of "Begin 'setup.S'", we moved to 'Protected Mode', which usually runs 32-bit program modules. This raises two questions: how to transfer control between 16-bit and 32-bit code, and how to transfer control from real mode to protected mode. They are what we want to discuss in this article.

    The code that performs the transfer is mainly in 'setup.S' and 'head.S'. We have already covered 'setup.S', with a little detail about how it moves to protected mode; here we add a supplement.

    First, let us look at the characteristics of 16-bit and 32-bit program modules, as listed in the 'Intel Manual Vol3'.

    Characteristic                                 16-Bit Program Modules   32-Bit Program Modules
    ---------------------------------------------------------------------------------------------
    Segment Size                                   0 to 64 KBytes           0 to 4 GBytes
    Operand Sizes                                  8 bits and 16 bits       8 bits and 32 bits
    Pointer Offset Size (Address Size)             16 bits                  32 bits
    Stack Pointer Size                             16 bits                  32 bits
    Control Transfers Allowed to Code Segments     16 bits                  32 bits
    of This Size

    The 'Intel Manual Vol3' also lists the mechanisms the processor uses to distinguish between and support 16-bit and 32-bit segments and operations:
    (1) The D (default operand and address size) flag in code-segment descriptors.
    (2) The B (default stack size) flag in stack-segment descriptors.
    (3) 16-bit and 32-bit call gates, interrupt gates, and trap gates.
    (4) Operand-size and address-size instruction prefixes.
    (5) 16-bit and 32-bit general-purpose registers.

    Because 'setup.S' relies on it, this article covers item (4); you can dig into the other four items by reading the Intel manual mentioned above. Before discussing instruction prefixes, let us review 'setup.S'. As we know, before switching to protected mode, a minimum set of system data structures and code modules must be loaded into memory. The GDT (Global Descriptor Table) is one of them. The GDT consists of 8-byte segment descriptors.

    These segment descriptors describe the characteristics of each segment. Some of their important fields are listed below:
    (1) 'base' - contains the linear address of the first byte of the segment.
    (2) 'G' -  granularity flag, if it is cleared (equal to 0), the segment size is expressed in bytes; otherwise, it is expressed in multiples of 4096 bytes.
    (3) 'limit' - holds the offset of the last memory cell in the segment, thus binding the segment length. When G is set to 0, the size of a segment may vary between 1 byte and 1 MB; otherwise, it may vary between 4 KB and 4 GB.

    Here is how 'setup.S' defines its GDT; note that this is only a provisional GDT.

    /*
     * ! $(linux-2.6.15.3_dir)/arch/i386/boot/setup.S
     */
    # Descriptor tables
    #
    # NOTE: The intel manual says gdt should be sixteen bytes aligned for
    # efficiency reasons.  However, there are machines which are known not
    # to boot with misaligned GDTs, so alter this at your peril!  If you alter
    # GDT_ENTRY_BOOT_CS (in asm/segment.h) remember to leave at least two
    # empty GDT entries (one for NULL and one reserved).
    #
    # NOTE: On some CPUs, the GDT must be 8 byte aligned.  This is
    # true for the Voyager Quad CPU card which will not boot without
    # This directive.  16 byte aligment is recommended by intel.
    #
     .align 16
    gdt:
     /*
      * ! #define GDT_ENTRY_BOOT_CS 2
      * ! The first segment descriptor is set to zero (required by Intel).
      * ! The second segment descriptor is reserved and also set to zero.
      * ! The third segment descriptor (code, access byte 0x9A):
      * !  base = 0; G = 1 (4 KB units), limit = 0xFFFFF,
      * !  size = (0xFFFFF + 1) * 0x1000 = 4 GB
      * ! The fourth segment descriptor (data, access byte 0x92):
      * !  base = 0; G = 1 (4 KB units), limit = 0xFFFFF,
      * !  size = (0xFFFFF + 1) * 0x1000 = 4 GB
      */
     .fill GDT_ENTRY_BOOT_CS,8,0

     .word 0xFFFF    # 4Gb - (0x100000*0x1000 = 4Gb)
     .word 0    # base address = 0
     .word 0x9A00    # code read/exec
     .word 0x00CF    # granularity = 4096, 386
          #  (+5th nibble of limit)

     .word 0xFFFF    # 4Gb - (0x100000*0x1000 = 4Gb)
     .word 0    # base address = 0
     .word 0x9200    # data read/write
     .word 0x00CF    # granularity = 4096, 386
          #  (+5th nibble of limit)
    gdt_end:
     .align 4
     
     .word 0    # alignment byte
    idt_48:
     .word 0    # idt limit = 0
     .word 0, 0    # idt base = 0L

     .word 0    # alignment byte
    gdt_48:
     /*
      * ! Segment descriptors are always 8 bytes long, so
      * ! the GDT limit should always be one less than an integral
      * ! multiple of eight (that is, 8N - 1); here 4 * 8 - 1 = 31.
      * ! Note that the gdt base is filled in later.
      */
     .word gdt_end - gdt - 1  # gdt limit
     .word 0, 0    # gdt base (filled in later)
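    To check the comments on the two boot descriptors, the C sketch below decodes a descriptor from the four '.word' values used in 'setup.S'. The struct and function names are hypothetical; the bit positions follow the standard IA-32 segment descriptor layout described in the 'Intel Manual Vol3'.

```c
#include <stdint.h>

struct seg_desc_info {
    uint32_t base;     /* 32-bit segment base */
    uint32_t limit;    /* raw 20-bit limit field */
    int      g;        /* granularity flag: 1 = limit in 4 KB units */
    int      d;        /* D/B flag: 1 = 32-bit default size */
    uint8_t  access;   /* access byte, e.g. 0x9A (code) or 0x92 (data) */
    uint64_t size;     /* resulting segment size in bytes */
};

/* Decode one descriptor given as four little-endian 16-bit words. */
struct seg_desc_info decode_desc(uint16_t w0, uint16_t w1,
                                 uint16_t w2, uint16_t w3)
{
    struct seg_desc_info s;

    s.limit  = w0 | ((uint32_t)(w3 & 0x000F) << 16);   /* limit 0..15, 16..19 */
    s.base   = w1                                      /* base 0..15 */
             | ((uint32_t)(w2 & 0x00FF) << 16)         /* base 16..23 */
             | ((uint32_t)(w3 & 0xFF00) << 16);        /* base 24..31 */
    s.access = (uint8_t)(w2 >> 8);
    s.g      = (w3 >> 7) & 1;
    s.d      = (w3 >> 6) & 1;
    s.size   = s.g ? ((uint64_t)s.limit + 1) << 12 : (uint64_t)s.limit + 1;
    return s;
}
```

    Both descriptors decode to base 0, raw limit 0xFFFFF, G = 1 and D = 1, which gives a 4 GB 32-bit segment; only the access byte (0x9A for code, 0x92 for data) differs.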

    The following code loads a linear address into GDTR (Global Descriptor Table Register). Be careful to distinguish GDTR from the GDT (Global Descriptor Table) itself: the value stored in GDTR tells the CPU where the GDT is. GDTR is a key register for protected mode, so it must be filled before control is transferred to protected mode. GDTR is a 48-bit register consisting of a 16-bit 'limit' field and a 32-bit 'base' field, and it is filled with the 'lgdt m16&32' instruction, which loads a linear base address and a limit value from a six-byte data operand in memory. If the operand size is 16 bits, the register is loaded with a 16-bit limit and a 24-bit base, and the high-order eight bits of the six-byte operand are not used. If the operand size is 32 bits, a 16-bit limit and a 32-bit base are loaded; the high-order eight bits of the six-byte operand are used as the high-order base address bits. The following code shows how 'setup.S' loads GDTR.
     
     # set up gdt and idt
     lidt idt_48    # load idt with 0,0
     xorl %eax, %eax   # Compute gdt_base
     movw %ds, %ax   # (Convert %ds:gdt to a linear ptr)
     shll $4, %eax
     addl $gdt, %eax

     /*
      * ! reset the GDT base to %ds:gdt, which is mentioned above
      * ! now %ds = SETUPSEG = 0x9020
      * ! after 'lgdt', the 'base' field value in GDTR is ((%ds << 4) + $gdt)
      */
     movl %eax, (gdt_48+2)
     lgdt gdt_48    # load gdt with whatever is
                    # appropriate
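    The base-address computation and the effect of the operand size on 'lgdt' can be sketched in C as follows. The function names are hypothetical, and the gdt offset 0x0400 used in the example is a made-up value; the real one depends on where the assembler places 'gdt' inside 'setup.S'.

```c
#include <stdint.h>

/* Linear address of gdt: (%ds << 4) + $gdt, as computed before lgdt. */
uint32_t gdt_linear_base(uint16_t ds, uint16_t gdt_offset)
{
    return ((uint32_t)ds << 4) + gdt_offset;
}

/* Base actually loaded into GDTR from the six-byte operand:
 * with a 16-bit operand size only the low 24 base bits are used. */
uint32_t gdtr_loaded_base(uint32_t base_field, int operand_size_32)
{
    return operand_size_32 ? base_field : (base_field & 0x00FFFFFFu);
}
```

    With %ds = 0x9020, the GDT lies just above 0x90200, far below 16 MB, so loading GDTR with a 16-bit operand (24-bit base) in 'setup.S' is harmless.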

    With that, the preparation for protected mode is complete, and the next step is to move into it. As mentioned before, a far JMP instruction should be executed immediately after protected mode is enabled. Here 'setup.S' uses a hand-coded far jump that transfers control straight into 32-bit protected-mode code.

    /*
     * ! $(linux-2.6.15.3_dir)/include/asm-i386/segment.h
     * Simple and small GDT entries for booting only
     */

    #define GDT_ENTRY_BOOT_CS  2
    #define __BOOT_CS (GDT_ENTRY_BOOT_CS * 8)

    #define GDT_ENTRY_BOOT_DS  (GDT_ENTRY_BOOT_CS + 1)
    #define __BOOT_DS (GDT_ENTRY_BOOT_DS * 8)

    /*
     * $(linux-2.6.15.3_dir)/arch/i386/boot/setup.S
     */
    #
    # jump to startup_32 in arch/i386/boot/compressed/head.S

    # NOTE: For high loaded big kernels we need a
    # jmpi    0x100000,__BOOT_CS
    #
     .byte 0x66, 0xea   # prefix + jmpi-opcode
    code32: .long 0x1000    # will be set to 0x100000
          # for big kernels
     .word __BOOT_CS

    The jump is hard-coded as raw bytes. '0xea' is the opcode of the far 'jmpi' instruction, which takes a four-byte operand (when the operand size is 16 bits) or a six-byte operand (when it is 32 bits) and uses it as a far pointer to the destination. We are still executing 16-bit code, so the default operand size is 16 bits, which limits the target offset to 16 bits; yet we want to jump into a 32-bit program module, possibly at offset 0x100000. We cannot jump there directly, so 'setup.S' puts the '0x66' operand-size prefix before 'jmpi'. This prefix reverses the default operand size selected by the D flag of the current code segment and makes the CPU take our 48-bit far pointer (in protected mode it is also called a 'logical address', consisting of a 16-bit segment selector and a 32-bit offset). The 'jmpi' then loads '__BOOT_CS' into %cs and uses 0x100000 (for a big kernel) as the offset.
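    The hand-assembled bytes can be reproduced with a small C encoder. It is a hypothetical helper that only illustrates the layout of the instruction: prefix, opcode, 32-bit offset and 16-bit selector, all little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for "0x66 0xEA offset32 selector16". */
size_t encode_far_jump32(uint8_t out[8], uint32_t offset, uint16_t selector)
{
    out[0] = 0x66;                              /* operand-size prefix */
    out[1] = 0xEA;                              /* far JMP with immediate pointer */
    for (int i = 0; i < 4; i++)                 /* 32-bit offset, little-endian */
        out[2 + i] = (uint8_t)(offset >> (8 * i));
    out[6] = (uint8_t)selector;                 /* 16-bit selector, little-endian */
    out[7] = (uint8_t)(selector >> 8);
    return 8;
}
```

    For offset 0x100000 and __BOOT_CS = 0x10 it yields 66 EA 00 00 10 00 10 00. Without the 0x66 prefix, the 16-bit CPU would read only four operand bytes (ptr16:16): offset 0x0000 and selector 0x0010, leaving the real selector bytes to be executed as stray code.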

    Where do we arrive after this intersegment jump, and which instruction does the CPU execute next? Now we are in protected mode with paging disabled, and the memory addressing model has changed: it is the 'segmented memory model' of protected mode. In this model, to address a byte in a segment, a program issues a logical address, which consists of a segment selector and an offset. Internally, the processor translates each logical address into a linear address to access memory. The segment selector decides which segment descriptor in the GDT is used, and the linear address is calculated as 'segment_descriptor.base + offset'.

    The logical address used in 'setup.S' is '__BOOT_CS (0000000000010000B) : 0x00100000'. The high-order 13 bits of the selector give the (zero-based) index of the segment descriptor in the GDT; here the index is 2, while bit 2 (table indicator) and bits 0-1 (requested privilege level) are all zero. Looking back at the code above, this is the third descriptor defined at label 'gdt', and its base is 0. So the linear address of the first instruction is '0 + 0x00100000', that is 0x00100000. With paging disabled, this is also the physical address, and it is exactly where 'head.S' (the first part of the system) resides.
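    The selector arithmetic above can be written out in C. This sketch is not kernel code: 'gdt_bases' is a hypothetical array holding just the base of each descriptor, which is all the translation needs while paging is disabled.

```c
#include <stdint.h>

struct selector_fields {
    uint16_t index;   /* bits 3..15: descriptor index */
    uint8_t  ti;      /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;     /* bits 0..1: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}

/* Segmentation only (paging off): linear = descriptor base + offset. */
uint32_t logical_to_linear(const uint32_t *gdt_bases, uint16_t sel,
                           uint32_t offset)
{
    return gdt_bases[decode_selector(sel).index] + offset;
}
```

    For __BOOT_CS = 0x10 the index is 2, and with a zero base the offset 0x100000 maps to linear address 0x00100000; __BOOT_DS = 0x18 selects index 3.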

    /*
     * ! $(linux-2.6.15.3_dir)/arch/i386/boot/compressed/head.S
     */
     .globl startup_32
     
    startup_32:
     cld
     cli
     movl $(__BOOT_DS),%eax
     movl %eax,%ds
     movl %eax,%es
     movl %eax,%fs
     movl %eax,%gs

    One question remains: why do we use 'jmpi 0x100000, __BOOT_CS' instead of 'jmpi startup_32, __BOOT_CS'? Linux eventually enables paging and builds its own virtual memory management system. The kernel then has a 4 GB virtual address space and runs only in the high 1 GB (from 0xC0000000 to 0xFFFFFFFF), but the physical address space always starts at 0x00000000. The offset between the kernel's virtual address space and the physical address space is exactly 0xC0000000. So when the kernel image is built, this offset is added to the addresses of all labels used in protected mode and later phases; the address of label startup_32, for example, is 0xC0100000. Such an address cannot be used until paging is enabled, and the code in this 'head.S' is part of the preparation for paging.
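    The fixed offset between link-time (virtual) addresses and physical addresses is the same arithmetic that the kernel head performs with expressions such as 'boot_gdt_descr - __PAGE_OFFSET'. Here is a tiny C sketch, assuming __PAGE_OFFSET is 0xC0000000; the function names are hypothetical (the kernel itself provides the __pa() and __va() macros for this purpose).

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u   /* assumed value of __PAGE_OFFSET */

/* Link-time (virtual) address -> physical address before/after paging. */
uint32_t virt_to_phys_boot(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* Physical address -> kernel virtual address in the linear mapping. */
uint32_t phys_to_virt_boot(uint32_t paddr)
{
    return paddr + PAGE_OFFSET;
}
```

    So the label startup_32 at 0xC0100000 corresponds to physical address 0x00100000, which is why the jump in 'setup.S' has to use the physical value.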

  • So far we have arrived at the gate leading to the real kernel, so this is a good moment for a short break to gather energy before going ahead. Let's examine what has happened to memory so far.

    What we want to do is draw some pictures of the memory layout in the various phases. Since the layout depends on the bootloader, we base our work on the following assumption:
    The machine has two systems installed (Windows XP and Linux) and uses LILO as the bootloader. Let us look at the LILO configuration:

    /* LILO Configuration - /etc/lilo.conf */
    boot=/dev/hda
    map=/boot/map
    install=/boot/boot.b
    prompt
    timeout=100
    compact
    default=Linux
    image=/boot/vmlinuz-2.6.15.3
             label=Linux
             root=/dev/hda2
             read-only
    other=/dev/hda1
             label=WindowsXP

    Here 'boot=/dev/hda' means LILO is installed in the MBR of the first hard disk, 'root=/dev/hda2' means the Linux root file system is on the second partition of the first disk, and 'other=/dev/hda1' means Windows is installed on the first partition of the first disk. Since lilo.conf is not read at boot time, the MBR needs to be "refreshed" whenever the file is changed; if you do not do this before rebooting, none of your changes to lilo.conf will be reflected at startup. As when installing LILO into the MBR in the first place, run: '$ /sbin/lilo -v -v'. The '-v -v' flags give very verbose output.

    Now we can switch on our machine! ('<->' means 'beginning with ... and ending before ...')
    1. Power on <-> BIOS routine
    Chaos, that is the character of memory at this time.

    2. BIOS routine <-> Bootloader 1st stage(MBR)
    The BIOS routine finishes and prepares to execute the code loaded from the MBR, which contains the 1st stage bootloader of LILO.

           |                        |
    0A0000 +------------------------+
           |                        |
    010000 +------------------------+
           |          MBR           | <- MBR (07C00 ~ 07E00)
    001000 +------------------------+
           |                        |
    000600 +------------------------+ 
           |      BIOS use only     |
    000000 +------------------------+

    3. Bootloader 1st stage(MBR) <-> Bootloader 2nd stage
    The bootloader 1st stage moves itself to 0x09A000, sets up the Real Mode stack (ranging from 0x09b000 down to 0x09a200) and loads the 2nd stage of LILO at 0x09b000.

           |                        |
    0A0000 +------------------------+
           |      2nd bootloader    |
    09b000 +------------------------+
           |     Real mode stack    |
    09A200 +------------------------+
           |     1st bootloader     |
    09A000 +------------------------+
           |                        |
    010000 +------------------------+
           |      MBR(useless)      | <- MBR (07C00 ~ 07E00)
    001000 +------------------------+
           |  Reserved for MBR/BIOS |
    000800 +------------------------+
           |  Typically used by MBR |
    000600 +------------------------+ 
           |      BIOS use only     |
    000000 +------------------------+

    4. Bootloader 2nd stage <-> setup.S
    The 2nd bootloader copies the integrated boot loader of the kernel image to address 0x090000, the setup() code to address 0x090200, and the rest of the kernel image to address 0x00010000(called 'low address' for small Kernel Images compiled with 'make zImage') or 0x00100000('high address' for big Kernel Images compiled with 'make bzImage').

    zImage:

           |                        |
    0A0000 +------------------------+
           |      2nd bootloader    |
    09b000 +------------------------+
           |     Real mode stack    |
    09A200 +------------------------+
           |     1st bootloader     |
    09A000 +------------------------+
           |  Stack/heap/cmdline    | For use by the kernel real-mode code.
    098000 +------------------------+ 
           |         Kernel setup   | The kernel real-mode code.
    090200 +------------------------+
           |    Kernel boot sector  | The kernel legacy boot sector.
    090000 +------------------------+
           |          zImage        | The bulk of the kernel image.
    010000 +------------------------+
           |       MBR(useless)     | <- MBR (07C00 ~ 07E00)
    001000 +------------------------+
           |  Reserved for MBR/BIOS |
    000800 +------------------------+
           |  Typically used by MBR |
    000600 +------------------------+
           |      BIOS use only     |
    000000 +------------------------+

    bzImage:

           +------------------------+
           |          bzImage       |
    0100000+------------------------+
           |                        |
    0A0000 +------------------------+
           |      2nd bootloader    |
    09b000 +------------------------+
           |     Real mode stack    |
    09A200 +------------------------+
           |     1st bootloader     |
    09A000 +------------------------+
           |    Stack/heap/cmdline  | For use by the kernel real-mode code.
    098000 +------------------------+ 
           |        Kernel setup    | The kernel real-mode code.
    090200 +------------------------+
           |  Kernel boot sector    | The kernel legacy boot sector.
    090000 +------------------------+
           |                        |
    010000 +------------------------+
           |       MBR(useless)     | <- MBR (07C00 ~ 07E00)
    001000 +------------------------+
           |  Reserved for MBR/BIOS |
    000800 +------------------------+
           |  Typically used by MBR |
    000600 +------------------------+
           |      BIOS use only     |
    000000 +------------------------+

    5. setup.S <-> head.S
    The setup() code checks where the kernel image was loaded in RAM. If it was loaded "low" (a zImage, at physical address 0x00010000), it is moved down to physical address 0x00001000, so the system occupies its rightful place (0x00000 ~ [<0x090000]). If the kernel image is a bzImage, which is already loaded "high" in RAM, it is NOT moved anywhere. Some system parameters collected by setup() are stored from 0x090000 to 0x090200, the area that used to hold the legacy boot sector.

           +------------------------+
           |         bzImage        |
    0100000+------------------------+
           |                        |
    098000 +------------------------+ 
           |      Kernel setup      |
    090200 +------------------------+
           |     System parameters  | collected by setup()
    090000 +------------------------+
           |                        |
           |                        |
           |          System        |
           |                        |
           |                        |
    000000 +------------------------+

    OK, the picture is much clearer now, and we can walk through the door to the real kernel!

  • It is time for 'setup.S' to show its power. 'setup.S' is loaded by the bootloader, yet it belongs to neither the 'bootstrap' routine nor the kernel program proper, although it is part of the kernel image. Its source is fairly big, and what it does can be summarized in one sentence: 'setup.S' is responsible for establishing the environment in which the kernel program executes.

    Once 'setup.S' starts running, the bootloader that loaded it into memory has done its job, and the space it occupied is now available. 'setup.S' consists of a setup header and a setup body. The setup header is part of the 'Real-mode kernel header', which must follow the layout described in '$(linux-2.6.15.3_dir)/Documentation/i386/boot.txt'. The 'Real-mode kernel header' looks like this:

    Offset/Size  Proto     Name             Meaning

    01F1/1       ALL(1)    setup_sects      The size of the setup in sectors
    01F2/2       ALL       root_flags       If set, the root is mounted readonly
    01F4/4       2.04+(2)  syssize          The size of the 32-bit code in 16-byte paras
    01F8/2       ALL       ram_size         DO NOT USE - for bootsect.S use only
    01FA/2       ALL       vid_mode         Video mode control
    01FC/2       ALL       root_dev         Default root device number
    01FE/2       ALL       boot_flag        0xAA55 magic number
    0200/2       2.00+     jump             Jump instruction
    0202/4       2.00+     header           Magic signature "HdrS"
    0206/2       2.00+     version          Boot protocol version supported
    0208/4       2.00+     realmode_swtch   Boot loader hook (see boot.txt)
    020C/2       2.00+     start_sys        The load-low segment (0x1000) (obsolete)
    020E/2       2.00+     kernel_version   Pointer to kernel version string
    0210/1       2.00+     type_of_loader   Boot loader identifier
    0211/1       2.00+     loadflags        Boot protocol option flags
    0212/2       2.00+     setup_move_size  Move to high memory size (used with hooks)
    0214/4       2.00+     code32_start     Boot loader hook (see boot.txt)
    0218/4       2.00+     ramdisk_image    initrd load address (set by boot loader)
    021C/4       2.00+     ramdisk_size     initrd size (set by boot loader)
    0220/4       2.00+     bootsect_kludge  DO NOT USE - for bootsect.S use only
    0224/2       2.01+     heap_end_ptr     Free memory after setup end
    0226/2       N/A       pad1             Unused
    0228/4       2.02+     cmd_line_ptr     32-bit pointer to the kernel command line
    022C/4       2.03+     initrd_addr_max  Highest legal initrd address

    (1) For backwards compatibility, if the setup_sects field contains 0, the real value is 4.
    (2) For boot protocol prior to 2.04, the upper two bytes of the syssize field are unusable, which means the size of a bzImage kernel cannot be determined.
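    To make the header layout concrete, here is a small C sketch that reads a few fields of the table above from an in-memory copy of the kernel image, roughly the kind of check a bootloader performs before trusting the header. The helper names (rd16, rd32, hdr_protocol, setup_sectors, hdr_code32_start) are hypothetical, and the sketch assumes the image is already in a byte buffer; it is an illustration, not code taken from any real loader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the 'Real-mode kernel header' table (relative to the image start). */
enum {
    HDR_SETUP_SECTS  = 0x1F1,
    HDR_BOOT_FLAG    = 0x1FE,
    HDR_MAGIC        = 0x202,
    HDR_VERSION      = 0x206,
    HDR_CODE32_START = 0x214,
    HDR_END          = 0x230  /* first byte after initrd_addr_max */
};

/* Header fields are little-endian, like everything else on i386. */
static uint16_t rd16(const uint8_t *p, size_t off)
{
    return (uint16_t)(p[off] | (p[off + 1] << 8));
}

static uint32_t rd32(const uint8_t *p, size_t off)
{
    return (uint32_t)rd16(p, off) | ((uint32_t)rd16(p, off + 2) << 16);
}

/* Returns the boot protocol version (e.g. 0x0203), 0 for an old image without
 * the "HdrS" signature, or -1 if the buffer is too short or boot_flag is wrong. */
static long hdr_protocol(const uint8_t *img, size_t len)
{
    if (len < HDR_END || rd16(img, HDR_BOOT_FLAG) != 0xAA55)
        return -1;
    if (memcmp(img + HDR_MAGIC, "HdrS", 4) != 0)
        return 0;                 /* pre-2.00 image: only the 01F1-01FF fields exist */
    return rd16(img, HDR_VERSION);
}

/* For backwards compatibility, a setup_sects value of 0 means 4. */
static unsigned setup_sectors(const uint8_t *img)
{
    return img[HDR_SETUP_SECTS] ? img[HDR_SETUP_SECTS] : 4u;
}

/* Where the 32-bit code is expected: 0x1000 (zImage) or 0x100000 (bzImage). */
static uint32_t hdr_code32_start(const uint8_t *img)
{
    return rd32(img, HDR_CODE32_START);
}
```

    The byte-by-byte reads avoid depending on struct packing, which keeps the sketch portable across compilers.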

    The 'Real-mode kernel header' is checked by both the bootloader and the setup routine, and setup cannot proceed unless all of the header data are valid. The label 'start' is the main entry point of 'setup.S', where the setup process begins. A jump instruction is executed there first, and control ends up at the label 'start_of_setup', which lies right after the setup header. Our analysis also starts from this label. The code in 'setup.S' performs the following operations:

    1. Check code integrity
    Since 'setup.S' may not have been loaded contiguously, we first have to check its integrity.

    /*
     * ! Get the disk type - Int 13H & AH = 0x15
     * ! Why here? The original comment below says Bootlin depends on it.
     */
    # Bootlin depends on this being done early
     movw $0x01500, %ax
     movb $0x81, %dl
     int $0x13

    /* ! Reset the disk system -  Int 13H & AH = 0x00 */
    #ifdef SAFE_RESET_DISK_CONTROLLER
    # Reset the disk controller.
     movw $0x0000, %ax
     movb $0x80, %dl
     int $0x13
    #endif

    # Set %ds = %cs, we know that SETUPSEG = %cs at this point
     movw %cs, %ax  # aka SETUPSEG
     movw %ax, %ds

     /*
      * ! if ((setup_sig1 != SIG1) || (setup_sig2 != SIG2)) {
      * !   goto bad_sig;
      * ! }
      * ! goto good_sig1;
      *
      * ! If the image is loaded by 'bootsect-loader',
      * ! 'bad_sig' routine won't happen, since 'bootsect-loader'
      * ! loaded the image contiguously.   
      */
    # Check signature at end of setup
     cmpw $SIG1, setup_sig1
     jne bad_sig

     cmpw $SIG2, setup_sig2
     jne bad_sig

     jmp good_sig1

    Now let us look at how 'bad_sig' finds the rest of the setup code and data.

    bad_sig:
     movw %cs, %ax   # SETUPSEG
     subw $DELTA_INITSEG, %ax  # INITSEG
     movw %ax, %ds
     xorb %bh, %bh

     /*
      * ! ds:[497] <=> 0x9000:[497] -> %bl
      * ! rest code in words <=> (%bx - 4) << 8 -> %cx
      * ! (%bx >> 3) + SYSSEG -> start_sys_seg
      */
     movb (497), %bl   # get setup sect from bootsect
     subw $4, %bx    # LILO loads 4 sectors of setup
     shlw $8, %bx    # convert to words (1sect=2^8 words)
     movw %bx, %cx
     shrw $3, %bx    # convert to segment
     addw $SYSSEG, %bx
     movw %bx, %cs:start_sys_seg

    # Move rest of setup code/data to here
     /*
      * ! move %ds:%si to %es:%di (%cx words) <=>
      * ! move SYSSEG:0 to cs:0800 (%cx*2 bytes)
      * ! with the instruction 'rep'
      */
     movw $2048, %di   # four sectors loaded by LILO
     subw %si, %si
     pushw %cs
     popw %es
     movw $SYSSEG, %ax
     movw %ax, %ds
     rep
     movsw
     movw %cs, %ax   # aka SETUPSEG
     movw %ax, %ds
     cmpw $SIG1, setup_sig1
     jne no_sig

     cmpw $SIG2, setup_sig2
     jne no_sig

     jmp good_sig

    Now the variable start_sys_seg points to where the real system code starts. If 'bad_sig' does not happen, start_sys_seg keeps its initial value, SYSSEG.
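    The 'bad_sig' arithmetic can be replayed in C to check the numbers. This is a sketch that mirrors the register operations above step by step, assuming SYSSEG is 0x1000 (DEF_SYSSEG); the function name bad_sig_fixup is hypothetical.

```c
#include <stdint.h>

#define SYSSEG 0x1000u  /* DEF_SYSSEG from boot.h */

/* Replays the bad_sig arithmetic. Returns the word count placed in %cx for
 * 'rep movsw' and stores the new start_sys_seg through *start_sys_seg. */
static uint16_t bad_sig_fixup(uint8_t setup_sects, uint16_t *start_sys_seg)
{
    uint16_t bx = setup_sects;          /* movb (497), %bl with %bh cleared */
    bx = (uint16_t)(bx - 4);            /* LILO loads 4 sectors of setup */
    bx = (uint16_t)(bx << 8);           /* sectors -> words (256 words per sector) */
    uint16_t cx = bx;                   /* words still sitting at SYSSEG:0 */
    bx = (uint16_t)(bx >> 3);           /* words -> 16-byte paragraphs (8 words each) */
    *start_sys_seg = (uint16_t)(bx + SYSSEG);
    return cx;
}
```

    For example, with setup_sects = 10, six sectors (1536 words) must be moved, and start_sys_seg becomes 0x10C0, because 6 * 512 / 16 = 192 (0xC0) paragraphs lie past SYSSEG.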

    2. Check bootloader type
    The code at label 'good_sig' checks whether the loader is compatible with the image.

    /*
     * ! if ((loadflags & LOADED_HIGH) && !type_of_loader) {
     * !   print loader_panic_mess;
     * !   goto no_sig_loop;
     * ! }
     */
    good_sig:
     movw %cs, %ax   # aka SETUPSEG
     subw $DELTA_INITSEG, %ax   # aka INITSEG
     movw %ax, %ds
    # Check if an old loader tries to load a big-kernel
     testb $LOADED_HIGH, %cs:loadflags # Do we have a big kernel?
     jz loader_ok   # No, no danger for old loaders.

     cmpb $0, %cs:type_of_loader   # Do we have a loader that
          # can deal with us?
     jnz loader_ok   # Yes, continue.

     pushw %cs    # No, we have an old loader,
     popw %ds    # die. ! %ds = %cs now
     lea loader_panic_mess, %si
     call prtstr

     jmp no_sig_loop
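    The decision made at 'good_sig' boils down to a small predicate. Here is a C rendering of it, assuming LOADED_HIGH is bit 0 of loadflags, as boot.txt describes; the function name is made up for illustration.

```c
#include <stdint.h>

#define LOADED_HIGH 0x01u  /* loadflags bit 0: protected-mode code loaded at 0x100000 */

/* 1 = setup.S continues at loader_ok; 0 = it prints loader_panic_mess
 * and spins in no_sig_loop. */
static int loader_is_compatible(uint8_t loadflags, uint8_t type_of_loader)
{
    if (!(loadflags & LOADED_HIGH))
        return 1;                  /* not a big kernel: no danger for old loaders */
    return type_of_loader != 0;    /* big kernel needs a loader that set type_of_loader */
}
```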

    3. Get memory size
    According to the comments in the code, three different memory detection schemes are tried to obtain the extended memory size (above 1 MB) in KB: first e820h, which lets us assemble a memory map; then e801h, which returns a 32-bit memory size; and finally 88h, which returns 0-64 MB.
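    The 16-bit register limits explain the "0-64M" remark. The sketch below assumes the usual register conventions for these BIOS calls (E801h: AX/CX = KB between 1 MB and 16 MB, BX/DX = 64 KB blocks above 16 MB; 88h: AX = KB above 1 MB). The E820h memory map itself is not modeled, and the function names are hypothetical.

```c
#include <stdint.h>

/* E801h: combine the two 16-bit halves into a 32-bit KB count above 1 MB. */
static uint32_t e801_ext_kb(uint16_t kb_1m_to_16m, uint16_t blocks_64k_above_16m)
{
    return (uint32_t)kb_1m_to_16m + ((uint32_t)blocks_64k_above_16m << 6);  /* 64 KB per block */
}

/* 88h: AX alone holds KB above 1 MB, so the result can never exceed
 * 0xFFFF KB, which is why this method only covers 0-64 MB. */
static uint32_t ah88_ext_kb(uint16_t ax)
{
    return ax;
}
```

    For a hypothetical 256 MB machine, e801_ext_kb(15360, 3840) yields 261120 KB above 1 MB, whereas 88h could never report more than 65535 KB.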

    4. Hardware support
    Several hardware devices are checked and some of them are reset here. Although the BIOS has already initialized most hardware devices, Linux does not rely on it but reinitializes the devices in its own manner to enhance portability and robustness.
    (1) Keyboard
    Call int $0x16 to set the keyboard repeat rate to the max.

    (2) Video adapter
    The video() code in '$(Linux-2.6.15.3_dir)/arch/i386/boot/video.S' does this job.

    (3) Hard disk
    The code here copies hd0 data to INITSEG:0080 (16 bytes) and hd1 data to INITSEG:0090 (16 bytes). It then checks whether hd1 exists with 'Int 13H/AH=0x15', which was already called once before.

    (4) Micro Channel (MCA) bus
    (5) ROM configuration table
    (6) PS/2 pointing device

    5. Advanced Power Management(APM) BIOS support
    Nothing to say.

    6. Enhanced Disk Drive(EDD)
    This is done in another file, '$(Linux-2.6.15.3_dir)/arch/i386/boot/edd.S'. It builds a table in RAM describing the hard disks available in the system, using the appropriate BIOS services. If you are interested, you can dig into this code.

    7. Prepare for protected mode
    (1) Disable interrupts and NMI

    # This is the default real mode switch routine.
    # to be called just before protected mode transition
    default_switch:
     cli     # no interrupts allowed !
     movb $0x80, %al   # disable NMI for bootup
          # sequence
     outb %al, $0x70
     lret

    (2) Relocate the code
    /*
     * ! Do (long)code32 = code32_start, since the loader
     * ! may have changed code32_start.
     */
    # we get the code32 start address and modify the below 'jmpi'
    # (loader may have changed it)
     movl %cs:code32_start, %eax
     movl %eax, %cs:code32

    code32_start is initialized to 0x1000 for a zImage or 0x100000 for a bzImage. This value is used when passing control to '$(Linux-2.6.15.3_dir)/arch/i386/boot/compressed/head.S'.

    The next piece of code moves the system to its rightful place if the loaded kernel is a zImage: vmlinux is relocated to 0100:0. If we boot a bzImage, bvmlinux stays at start_sys_seg:0. Then, if necessary (for backward compatibility with boot protocol versions <= 2.01), the code relocates bbootsect and bsetup from CS-DELTA_INITSEG:0 to INITSEG:0.

    8. Enable A20
    Everybody hates A20 and nobody really wants it, but it continues to haunt us. We say nothing more about it here.

    9. Switch to protected mode
    Following 'IA-32 Intel Architecture Software Developer's Manual', several operations should be done during the switching:
    (1) Prepare GDT with a null descriptor in the first GDT entry, one code and one data segment descriptor;
    (2) Disable interrupts, including maskable hardware interrupts and NMI (this has been done);
    (3) Load the base address and limit of the GDT to GDTR register, using LGDT instruction;
    (4) Set PE flag in CR0 register, using MOV CR0 (Intel386 and up) or LMSW instruction (for compatibility with Intel 286);
    (5) Immediately execute a far JMP or a far CALL instruction.
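    Step (1) needs descriptors in the format the processor expects. The following C sketch packs a legacy 8-byte segment descriptor and builds the minimal three-entry table the steps describe (null, flat code, flat data). The field values (access bytes 0x9A/0x92, flags 0xC for 4 KB granularity and 32-bit default size) are common choices and are assumptions here; this is not necessarily the exact layout of Linux's own boot GDT, and the function names are hypothetical.

```c
#include <stdint.h>

/* Pack a legacy 8-byte segment descriptor.
 *   base   - 32-bit segment base
 *   limit  - 20-bit limit (in 4 KB units when the G flag is set)
 *   access - P, DPL, S and type bits, e.g. 0x9A = present ring-0 code, execute/read
 *   flags  - G, D/B and AVL nibble, e.g. 0xC = 4 KB granularity, 32-bit default size */
static uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* bits  0-15: limit 15..0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* bits 16-39: base 23..0   */
    d |= (uint64_t)access << 40;                  /* bits 40-47: access byte  */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* bits 48-51: limit 19..16 */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* bits 52-55: flags        */
    d |= (uint64_t)(base >> 24) << 56;            /* bits 56-63: base 31..24  */
    return d;
}

/* Minimal table for step (1): null descriptor, flat 4 GB code, flat 4 GB data. */
static void build_flat_gdt(uint64_t gdt[3])
{
    gdt[0] = 0;
    gdt[1] = make_descriptor(0, 0xFFFFF, 0x9A, 0xC);
    gdt[2] = make_descriptor(0, 0xFFFFF, 0x92, 0xC);
}
```

    The flat code descriptor comes out as 0x00CF9A000000FFFF: base 0, limit 0xFFFFF pages (4 GB), ring 0, execute/read.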

    # jump to startup_32 in arch/i386/boot/compressed/head.S

    # NOTE: For high loaded big kernels we need a
    # jmpi    0x100000,__BOOT_CS
    #
    # but we yet haven't reloaded the CS register, so the default size
    # of the target offset still is 16 bit.
    # However, using an operand prefix (0x66), the CPU will properly
    # take our 48 bit far pointer. (INTeL 80386 Programmer's Reference
    # Manual, Mixing 16-bit and 32-bit code, page 16-6)

     /*
      * ! 0xea - far jmp opcode, 0x66 - operand-size prefix
      */
     .byte 0x66, 0xea   # prefix + jmpi-opcode
    code32: .long 0x1000   # will be set to 0x100000
          # for big kernels
     .word __BOOT_CS

    The far jmp instruction (0xea) updates the CS register. The remaining segment registers (DS, SS, ES, FS and GS) must be reloaded later. Control now passes to '$(Linux-2.6.15.3_dir)/arch/i386/boot/compressed/head.S:startup_32', which is at address 0x1000 for a zImage and 0x100000 for a bzImage.
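    The 0x66-prefixed far jump is just eight bytes. This hypothetical C helper emits the same encoding, assuming __BOOT_CS is 0x10 (its value in the 2.6 boot code, as far as I know) and a bzImage target of 0x100000.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit "66 EA <offset32> <selector16>": the operand-size prefix makes the
 * far jump in 16-bit code take a 32-bit offset. Returns the byte count (8). */
static size_t encode_far_jmp32(uint8_t out[8], uint32_t offset, uint16_t selector)
{
    out[0] = 0x66;                                  /* operand-size prefix */
    out[1] = 0xEA;                                  /* jmp ptr16:32 */
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(offset >> (8 * i));  /* 'code32': little-endian offset */
    out[6] = (uint8_t)(selector & 0xFF);            /* selector, little-endian */
    out[7] = (uint8_t)(selector >> 8);
    return 8;
}
```

    encode_far_jmp32(buf, 0x100000, 0x10) fills buf with 66 EA 00 00 10 00 10 00: the prefix, the opcode, the offset stored at 'code32', and the selector.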

    Supporting functions and variables exist in the tail of 'setup.S'.

  • The term 'Bootstrap' originally refers to someone trying to stand up by pulling on his own boots. In computer science it refers to a subroutine used to set up the full routine (the rest of itself, I think) or another routine. Modern computers play a vital role in our daily life, and many of you may wonder what happens inside the computer when you power it on. 'Bootstrap', also called 'boot' for short, is the first thing the computer does. The 'Bootstrap' process, which starts when the computer is powered on and usually ends when the operating system kernel begins to run, is exactly what we are going to describe.

    For convenience, we will examine the 'Bootstrap' process of the Linux operating system on 'i386'-compatible platforms, since Linux is open source and its source is freely available. Linux can be booted from any bootable device, such as a hard disk, a floppy or a CD-ROM. We choose to boot from a hard disk, which is more complex than the other two. Now suppose a computer with Linux installed is in front of us. Press the power button and the 'Bootstrap' process begins.

    The normal 'Bootstrap' flow can be described as follows:
    [Hardware initialization] -> [BIOS routine] -> [Bootloader run] -> End (Kernel startup) :-)

    1. Hardware initialization
    Immediately after power-up or an assertion of the RESET# pin, the processor performs a hardware initialization and an optional built-in self-test (BIST). The hardware initialization sets the processor's registers to a known state and places the processor in real-address operating mode, which we described in "Inside the 'i386'". The processor state after power-up or reset is vital, since it determines the address from which the processor starts executing code. Here are the initial states of some registers:

    (1) R[EAX] = R[EBX] = R[ECX] = R[ESI] = R[EDI] = R[ESP] = R[EBP] = 0x00000000 (Note: If EAX is not 0 after the BIST, a processor fault was detected.)
    (2) The EDX register contains component identification and revision information; different values identify the various members of the Intel Architecture families.
    (3) R[CS] = 0xF000 (Note: In its hidden part, 'Base' = 0xFFFF0000, Limit = 0xFFFF, AR = Present, R/W, Accessed.)
    (4) R[DS] = R[ES] = R[FS] = R[GS] = R[SS] = 0x0000 (Note: In their hidden parts, 'Base' = 0x00000000, Limit = 0xFFFF, AR = Present, R/W, Accessed.)
    (5) R[EFLAGS] = 0x00000002 (Note: The 10 most-significant bits of this register are undefined following a reset. Software should not depend on the states of any of these bits.)
    (6) R[EIP] = 0x0000FFF0
    (6) R[EIP] = 0x0000FFF0

    After the hardware initialization, the first instruction fetched and executed is located at physical address 0xFFFFFFF0. The BIOS EPROM containing the software initialization code must be mapped at this address, otherwise the processor cannot fetch its first instruction. Since the processor is in real-address mode, it uses the real-address mode memory model, yet 0xFFFFFFF0 lies far beyond the 1-MByte range addressable in that mode. So how does the processor start at this address?

    As mentioned in "Inside the 'i386'", the CS register has two parts: the visible segment selector and the hidden base address. In real-address mode, the base address is normally formed by shifting the 16-bit segment selector 4 bits to the left, producing a 20-bit base address. During hardware initialization, however, this rule is not followed: the segment selector in CS is loaded with 0xF000 while the base address is loaded with 0xFFFF0000. The starting address is therefore the base address plus the value in EIP (0xFFFF0000 + 0xFFF0 = 0xFFFFFFF0). The first time CS is loaded with a new value after hardware initialization, the processor returns to the normal rule (CS base address = CS segment selector * 16). To ensure that the base address in CS remains unchanged until the EPROM-based initialization code completes, that code must not contain a far jump or far call, nor allow an interrupt to occur (any of which would change the CS selector value).
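    The arithmetic behind the reset vector is easy to check. In this C sketch (function names hypothetical), the normal real-mode rule applied to F000:FFF0 gives 0xFFFF0, while the hidden base loaded at reset gives 0xFFFFFFF0.

```c
#include <stdint.h>

/* The normal real-address rule: base = selector * 16, address = base + offset. */
static uint32_t real_mode_address(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 4) + offset;
}

/* Right after reset the hidden CS base is 0xFFFF0000 instead of 0xF000 * 16,
 * so the first fetch uses that base plus EIP. */
static uint32_t first_fetch_address(uint32_t cs_hidden_base, uint32_t eip)
{
    return cs_hidden_base + eip;
}
```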

    2. BIOS routine (software initialization)
    At this point the hardware initialization of the processor is over and the first instruction, which is also the first instruction of the BIOS routine, is executed. From now on, the BIOS routine, the very first program run by the processor, takes control. 'BIOS' (Basic Input/Output System) is software embedded in a chip (usually an EPROM) on the computer's main board; it is also called 'firmware'.

    The BIOS routine also runs in real-address mode and performs the following operations:
    [Power-on self-test] -> [Hardware devices initialization] -> [Load boot sector] -> End (the loaded sector takes control) :-)

    (1) Power-on self-test (POST)
    The BIOS routine executes a series of tests to establish which devices are present and whether they are working properly. It also initializes the standard devices, such as the memory controller, video controller, IDE controller and floppy controller. Using stored parameters, it initializes the motherboard chipset and sets timing parameters. It also creates an interrupt vector table and provides a set of services, accessible through interrupts, that allow access to the standard I/O devices. During this phase some messages, such as the BIOS version banner, may be displayed on the screen.

    (2) Hardware devices initialization
    In this phase, the BIOS routine guarantees that all hardware devices operate without conflicts on the IRQ lines and I/O ports and a table of installed PCI devices will be displayed on the screen.

    (3) Load boot sector
    After the 'POST' and the initialization of hardware devices, the BIOS routine calls the 'Int 19H' service routine to search for a valid boot sector, one whose last two bytes hold the signature '0x55AA'. As soon as a valid sector is found, the BIOS routine calls the 'Int 13H' service routine to load it to address '0x00007C00', then jumps to this address and executes the code just loaded.

    3. Bootloader run
    The valid sector loaded from the hard disk by the BIOS routine is usually called the 'Master Boot Sector'. It consists of the 'Master Boot Record (MBR)', the 'Disk Partition Table (DPT)' and the 'Boot Record ID (0x55AA)'. The MBR usually holds a small program that loads the first sector of the partition containing the operating system to be started. Today a two-stage bootloader such as LILO or GRUB is used to boot a Linux kernel from disk. These bootloaders may be installed either in the MBR (replacing the small program that loads the boot sector of the active partition) or in the boot sector of a disk partition; the final result is the same either way. They are split into two parts because they are too large to fit into a single 512-byte sector. The MBR or the partition boot sector contains the first part, which the BIOS routine loads into memory at address 0x00007C00. This first part moves itself to another address (0x0009A000 for LILO), loads the second part of the bootloader into memory and jumps to it.

    The second part lets the user choose from a list of bootable operating systems on the disk. Once the user has made a choice, the bootloader either copies the boot sector of the corresponding partition into memory and executes it (if the chosen system lives on another partition), or copies the kernel image into memory directly (if the user chose a Linux kernel from the current partition). To load a Linux kernel, the bootloader calls BIOS routines to load the first 512 bytes of the kernel image at address 0x00090000, the code of 'setup.S' at 0x00090200, and the rest of the kernel image either at the low address 0x00010000 (small kernel images built with 'make zImage') or at the high address 0x00100000 (big kernel images built with 'make bzImage'). Finally, the bootloader jumps to the 'setup.S' code.
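    The master boot sector has a fixed layout: four 16-byte partition entries starting at offset 0x1BE and the bytes 0x55, 0xAA at offsets 510 and 511. The C sketch below checks the signature and finds the entry marked active (boot indicator 0x80), roughly what a classic MBR program does before loading a partition's boot sector. The function names are hypothetical, and real MBR code is of course written in 16-bit assembly.

```c
#include <stdint.h>

#define MBR_PART_TABLE 0x1BE  /* four 16-byte partition entries */
#define MBR_SIG_OFF    0x1FE  /* bytes 0x55, 0xAA (the word 0xAA55) */

static int mbr_has_signature(const uint8_t mbr[512])
{
    return mbr[MBR_SIG_OFF] == 0x55 && mbr[MBR_SIG_OFF + 1] == 0xAA;
}

/* Index (0-3) of the first entry whose boot indicator is 0x80,
 * or -1 if there is none or the sector is not a valid boot sector. */
static int mbr_active_partition(const uint8_t mbr[512])
{
    if (!mbr_has_signature(mbr))
        return -1;
    for (int i = 0; i < 4; i++)
        if (mbr[MBR_PART_TABLE + 16 * i] == 0x80)
            return i;
    return -1;
}
```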

    Here, the 'Bootstrap' process has come to a conclusion.

  • The term 'i386' in the title does not refer to the actual Intel 80386 processor but to the Intel 32-bit architecture (IA32) as a whole. I prefer 'i386' to 'IA32', just as the Linux kernel does: you can find an 'i386' folder in the $(linux-2.6.x_dir)/arch directory. This article describes some basic knowledge of 'i386' that may be useful to those who want to research or develop operating systems.

    'i386' processors are the most widely used and supported processors today, and Linux itself was born on them. As researchers or developers, we want to know what an 'i386' processor offers us. In brief, 'i386' offers an execution environment consisting of a set of registers and several mechanisms for accessing memory. Today's mainstream Intel processors, such as the Pentium series, are essentially based on the 80386, which first introduced 32-bit registers and paging to the architecture. Let us have a look at the resources supplied by the 'i386' processor. Part of the content below is quoted from the book "Intel Architecture Software Developer's Manual, Volume 1, Basic Architecture".

    1. Memory accessing
    Any operating system or executive designed to work with an 'i386' processor uses the processor's memory management facilities to access memory. 'i386' processors support three memory models. When these facilities are used, programs do not address physical memory directly. Instead, they access memory through one of the three memory models: flat, segmented, or real-address mode. With the flat memory model, memory appears to a program as a single, continuous, byte-addressable address space called the 'linear address space', which spans addresses 0 through 2^32 - 1 (4 GB - 1). In this model, code (a program's instructions), data and the procedure stacks are all contained in this one address space. With the segmented memory model, memory appears to a program as a group of independent address spaces called segments. In this model, code, data and stacks are typically contained in separate segments. To address a byte in a segment, a program issues a logical address, which consists of a segment selector and an offset. Internally, the processor translates each logical address into a linear address to access a memory location, and this translation is transparent to the application program. With either the flat or the segmented model, the 'i386' processor provides facilities for dividing the linear address space into pages and mapping the pages into virtual memory. If an operating system/executive uses the 'i386' processor's paging mechanism, the existence of pages is transparent to application programs. We can summarize this as follows:
    Logical Address (segmented model) --> [Segmentation Unit] --> Linear Address (flat model) --> [Paging Unit] --> Physical Address
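    For the paging step, the classic i386 scheme with 4 KB pages splits a linear address into a 10-bit page directory index, a 10-bit page table index and a 12-bit offset. A C sketch of that split follows; the names are hypothetical, and 4 MB pages (PSE) and PAE are ignored.

```c
#include <stdint.h>

struct lin_split { uint32_t pde, pte, offset; };

/* bits 31-22: page directory index, bits 21-12: page table index, bits 11-0: offset */
static struct lin_split split_linear(uint32_t lin)
{
    struct lin_split s;
    s.pde = lin >> 22;
    s.pte = (lin >> 12) & 0x3FF;
    s.offset = lin & 0xFFF;
    return s;
}

/* The page table entry supplies the 4 KB frame; the offset is copied unchanged. */
static uint32_t phys_from_frame(uint32_t pte_value, uint32_t lin)
{
    return (pte_value & 0xFFFFF000u) | (lin & 0xFFF);
}
```

    For instance, split_linear(0xC0100000) gives directory index 0x300, table index 0x100 and offset 0. Directory entry 0x300 lives at byte offset 0x300 * 4 = 0xC00 in the page directory, the same value 'head.S' computes as page_pde_offset from __PAGE_OFFSET.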

    The remaining model, the real-address mode model, is the memory model of the Intel 8086 processor, the first processor of the x86 family. The real-address mode uses a specific implementation of segmented memory in which the linear address space for the program and the operating system/executive consists of an array of segments, each up to 64K bytes in size. The maximum size of the linear address space in real-address mode is 1M bytes.
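    A small C sketch of the real-address rule, including what happens just past 1 MB: FFFF:FFFF sums to 0x10FFEF, and with the A20 line disabled bit 20 is dropped, so the address wraps around to the bottom of memory as on the 8086. The function name is hypothetical.

```c
#include <stdint.h>

/* physical = segment * 16 + offset; with A20 disabled, address bit 20 is forced to 0. */
static uint32_t rm_phys(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t addr = ((uint32_t)seg << 4) + off;
    return a20_enabled ? addr : (addr & ~(UINT32_C(1) << 20));
}
```

    rm_phys(0x07C0, 0, 0) and rm_phys(0, 0x7C00, 0) both give 0x7C00, which is why two different segment:offset pairs can name the same byte, and rm_phys(0x9020, 0, 0) gives 0x90200, where 'setup.S' is loaded.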

    Here we have to say something about the 'operating mode'. The 'i386' processor supports three operating modes, which determine which instructions and architectural features are accessible.

    (1) Protected mode
    It is the native state of the processor. In this mode all instructions and architectural features are available, providing the highest performance and capability. This is the recommended mode for all new applications and operating systems. When in this mode, the processor can use any of the memory models described above. (The real-addressing mode memory model is ordinarily used only when the processor is in the virtual-8086 mode.)

    (2) Real-address mode
    This mode provides the programming environment of the Intel 8086 processor with a few extensions (such as the ability to switch to protected or system management mode). The processor is placed in real-address mode following power-up or a reset. When in this mode, the processor only supports the real-address mode memory model. As we know, booting from disk takes place in this mode.

    (3) System management mode
    It is unfamiliar to most of us, so we have nothing to say about it here.

    2. Registers
    The registers in 'i386' processors can be grouped into three types: 'general-purpose data registers', 'segment registers' and 'status and control registers'. Details follow:

    (1) General-Purpose data registers
    There are eight 32-bit registers available for general purposes, such as storing operands and pointers. In theory you can use any of them for whatever you want, but many instructions assign specific registers to hold operands. The following is a summary of these special uses:
    EAX - Accumulator for operands and results data.
    EBX - Pointer to data in the DS segment.
    ECX - Counter for string and loop operations.
    EDX - I/O pointer.
    ESI - Pointer to data in the segment pointed to by the DS register; source pointer for string operations.
    EDI - Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations.
    ESP - Stack pointer (in the SS segment).
    EBP - Pointer to data on the stack (in the SS segment).

    (2) Segment registers
    There are six segment registers, all 16 bits wide, which hold segment selectors. A segment selector is a special pointer that identifies a segment in memory. To access a particular segment in memory, the segment selector for that segment must be present in the appropriate segment register. So, although a system can define thousands of segments, only 6 are available for immediate use. Other segments can be made available by loading their segment selectors into these registers during program execution. Every segment register has a 'visible' part (the 16-bit selector) and a 'hidden' part. (The hidden part is sometimes referred to as a 'descriptor cache' or a 'shadow register'.) When a segment selector is loaded into the visible part of a segment register, the processor also loads the hidden part with the base address, segment limit and access control information from the segment descriptor pointed to by the selector. Some instructions, such as 'mov' and 'pop', load the segment registers explicitly, while others, such as 'call', 'jmp' and 'ret', change the contents of the CS register (and sometimes other segment registers) as an incidental part of their operation. How these segment registers are used depends on the memory model that the operating system or executive is using.

    We just mentioned 'segment descriptors'. A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of a segment, as well as access control and status information. Segment descriptors are typically created by compilers, linkers, loaders, or the operating system or executive, but not by application programs.
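    In protected mode, the value loaded into the visible part of a segment register is a selector, which points at a descriptor rather than holding a base. A C sketch of how its bits are laid out (descriptor index, table indicator, requested privilege level); the names are hypothetical.

```c
#include <stdint.h>

/* bits 15-3: descriptor index, bit 2: TI (0 = GDT, 1 = LDT), bits 1-0: RPL */
struct selector { uint16_t index; uint8_t ti, rpl; };

static struct selector decode_selector(uint16_t sel)
{
    struct selector s = { (uint16_t)(sel >> 3), (uint8_t)((sel >> 2) & 1), (uint8_t)(sel & 3) };
    return s;
}

/* Byte offset of the descriptor inside its table (each descriptor is 8 bytes). */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & ~7u);
}
```

    decode_selector(0x18) yields GDT index 3 with RPL 0, so the processor reads the descriptor at byte 0x18 of the GDT; 0x18 is, as far as I know, the value of __BOOT_DS in the 2.6 boot code.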

    (3) Status and control registers
    These registers report and allow modification of the state of the processor and of the program being executed. E.g. the 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. Details as follows:
    CF - Carry Flag
    PF - Parity Flag
    AF - Auxiliary Carry Flag
    ZF - Zero Flag
    SF - Sign Flag
    TF - Trap Flag
    IF - Interrupt Enable Flag
    DF - Direction Flag
    OF - Overflow Flag
    IOPL - I/O Privilege Level
    NT - Nested Task
    RF - Resume Flag
    VM - Virtual-8086 Mode
    AC - Alignment Check
    VIF - Virtual Interrupt Flag
    VIP - Virtual Interrupt Pending
    ID - ID Flag
    Some of the flags in the EFLAGS register can be modified directly using special-purpose instructions, e.g. 'cli'/'sti' for IF, 'cld'/'std' for DF and 'clc'/'stc'/'cmc' for CF.
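    For reference, the bit positions of the flags above can be written down as C constants. This is a sketch for decoding a saved EFLAGS value, with hypothetical names. Bit 1 is reserved and always reads as 1, which is why the reset value is 0x00000002.

```c
#include <stdint.h>

enum {
    EFL_CF  = 1 << 0,  EFL_PF  = 1 << 2,  EFL_AF  = 1 << 4,  EFL_ZF  = 1 << 6,
    EFL_SF  = 1 << 7,  EFL_TF  = 1 << 8,  EFL_IF  = 1 << 9,  EFL_DF  = 1 << 10,
    EFL_OF  = 1 << 11, EFL_NT  = 1 << 14, EFL_RF  = 1 << 16, EFL_VM  = 1 << 17,
    EFL_AC  = 1 << 18, EFL_VIF = 1 << 19, EFL_VIP = 1 << 20, EFL_ID  = 1 << 21
};
#define EFL_RESERVED1 (1u << 1)  /* reserved, always 1 */

static int eflags_has(uint32_t eflags, uint32_t flag)
{
    return (eflags & flag) != 0;
}

/* IOPL is a 2-bit field in bits 12-13, not a single flag. */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags >> 12) & 3u;
}
```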

    The 'i386' processor is so complex that we cannot list all of its features here. If you are interested, the (rather thick) Intel manuals explain everything in detail.

  • We know that the latest linux kernel version is 2.6.x, which is different from the 'old kernels' in booting. The 'bootsect.S', which used to make the kernel image in the floppy disk bootable in the early days, becomes useless in linux kernel 2.6.x today, although it is still a part of the kernel image.

    'bootsect.S' is usually placed in the first 512 bytes of the kernel image and installed in the first sector of the medium holding the kernel image, usually a hard disk (or the active partition of a hard disk) or a floppy disk. As a minimal 'bootloader' included in kernel images of Linux versions up to 2.4, 'bootsect.S' was responsible for copying the rest of the kernel image from the floppy disk into main memory when Linux was booted from a floppy, and then executing the loaded code to complete its mission. When Linux is booted from a hard disk, 'bootsect.S' does nothing actively; it is only checked by other boot routines stored in the BIOS (Basic Input/Output System) or the MBR (Master Boot Record). Today, to boot Linux 2.6.x from a floppy disk you have to choose a suitable bootloader yourself, just as you do when booting from a hard disk, since 'bootsect.S' has retired.

    Here is the source code of 'bootsect.S' with some comments of mine. Let us see what the retired 'bootsect.S' really does! (My comments usually follow the symbol '!'.)

    /*
     * bootsect.S  Copyright (C) 1991, 1992 Linus Torvalds
     *
     * modified by Drew Eckhardt
     * modified by Bruce Evans (bde)
     * modified by Chris Noe (May 1999) (as86 -> gas)
     * gutted by H. Peter Anvin (Jan 2003)
     *
     * BIG FAT NOTE: We're in real mode using 64k segments.  Therefore segment
     * addresses must be multiplied by 16 to obtain their respective linear
     * addresses. To avoid confusion, linear addresses are written using leading
     * hex while segment addresses are written as segment:offset.
     *
     * ! $(linux-2.6.15.3_dir)/arch/i386/boot/bootsect.S
     */

    /* ! I found this header file in $(linux-2.6.15.3_dir)/include/asm-i386 */
    #include <asm/boot.h>

    /*
     * ! DEF_INITSEG   0x9000
     * ! DEF_SYSSEG    0x1000
     * ! DEF_SETUPSEG  0x9020
     * ! DEF_SYSSIZE   0x7F00
     * ! These macros above are defined in 'boot.h';
     * ! the first three of them are segment values
     * ! that end up in segment registers such as 'cs'
     */
    SETUPSECTS = 4   /* default nr of setup-sectors */
    BOOTSEG  = 0x07C0  /* original address of boot-sector */
    INITSEG  = DEF_INITSEG  /* we move boot here - out of the way */
    SETUPSEG = DEF_SETUPSEG  /* setup starts here */
    SYSSEG  = DEF_SYSSEG  /* system loaded at 0x10000 (65536) */
    SYSSIZE  = DEF_SYSSIZE  /* system size: # of 16-byte clicks */
                                                /* to be loaded */
    /*
     * ! The value of 'ROOT_DEV' here does not matter:
     * ! when the kernel image is built, 'ROOT_DEV' is reset,
     * ! and so is 'SWAP_DEV'.
     * ! 'ROOT_DEV' identifies the device on which the
     * ! root file system is stored.
     * ! 'ROOT_DEV = 0' means the same type of floppy as the boot one.
     */
    ROOT_DEV = 0    /* ROOT_DEV is now written by "build" */
    SWAP_DEV = 0   /* SWAP_DEV is now written by "build" */

    #ifndef SVGA_MODE
    #define SVGA_MODE ASK_VGA
    #endif

    #ifndef RAMDISK
    #define RAMDISK 0
    #endif

    #ifndef ROOT_RDONLY
    #define ROOT_RDONLY 1
    #endif

    /*
     * ! Now we are running in 16-bit real mode, neither in
     * ! 32-bit real mode nor in 32-bit protected mode
     */
    .code16
    .text

    .global _start
    _start:

     /*
      * ! jmpl is a far 'jump' instruction, which
      * ! jumps between segments.
      * ! the instruction below first stores the
      * ! immediate number '$BOOTSEG' into 'CS'
      * ! register and stores the address of label
      * ! 'start2' into 'EIP' register, and then jumps
      * ! to label 'start2' to execute.
      * ! Now, R[%cs] = $BOOTSEG = 0x07C0
      */
     # Normalize the start address
     jmpl $BOOTSEG, $start2

    start2:
     /*
      * ! initialize the segment registers and the stack pointer
      * ! R[%ds] = R[%es] = R[%ss] = 0x07C0
      * ! R[%sp] = 0x7c00
      */
     movw %cs, %ax
     movw %ax, %ds
     movw %ax, %es
     movw %ax, %ss
     movw $0x7c00, %sp

     /*
      * ! sti - set the interrupt flag
      * ! cld - clear 'df' (direction flag). After it executes,
      * !       string operations will increment the index
      * !       registers (si and/or di) that they use
      */
     sti
     cld

     /*
      * ! store the address of 'bugger_off_msg'
      * ! into register 'si'(source-index register)
      */
     movw $bugger_off_msg, %si

     /*
      * ! this loop prints the 'bugger_off_msg' on screen
      * ! and jumps to 'die' label.
      */
    msg_loop:
     /*
      * ! lodsb loads 'al' register with single memory
      * ! byte at the position pointed to by 'si' register
      * ! after it executes, 'si' is automatically
      * ! increased or decreased according to the 'df'.
      */
     lodsb
     andb %al, %al
     jz die
     movb $0xe, %ah
     movw $7, %bx
     int $0x10
     jmp msg_loop

     /*
      * ! the computer dies and you have to reboot.
      */
    die:
     # Allow the user to press a key, then reboot
     xorw %ax, %ax

     /*
      * ! int 16h (AH = 0) - BIOS keyboard service that
      * ! waits for the user to press a key
      */
     int $0x16
     int $0x19

     # int 0x19 should never return.  In case it does anyway,
     # invoke the BIOS reset code...
     ljmp $0xf000,$0xfff0


    bugger_off_msg:
     .ascii "Direct booting from floppy is no longer supported.\r\n"
     .ascii "Please use a boot loader program instead.\r\n"
     .ascii "\n"
     .ascii "Remove disk and press any key to reboot . . .\r\n"
     .byte 0


     # Kernel attributes; used by setup

     /*
      * ! variables below are important since
      * ! they are referred to by 'setup.S'
      * ! the total size of these variables is
      * ! 15 bytes, 497 + 15 = 512 :)
      * ! the last word is '0xAA55', which indicates
      * ! this is a boot sector
      */
     .org 497
    setup_sects: .byte SETUPSECTS
    root_flags: .word ROOT_RDONLY
    syssize: .word SYSSIZE
    swap_dev: .word SWAP_DEV
    ram_size: .word RAMDISK
    vid_mode: .word SVGA_MODE
    root_dev: .word ROOT_DEV
    boot_flag: .word 0xAA55

    /* ! end of bootsect.S */

    Thus, we know that the retired 'bootsect.S' only tells us it has retired.