Interrupt handling
This is the fourth part about interrupt and exception handling in the Linux kernel. In the previous part we saw the first early #DB and #BP exception handlers. We stopped right after the early_trap_init function, which is called in the setup_arch function. In this part we continue to dive into interrupt and exception handling in the Linux kernel for x86_64, starting from the place where we left off in the last part. The first thing related to interrupt and exception handling is the setup of the #PF, or page fault, handler with the early_trap_pf_init function. Let's start from it.
The early_trap_pf_init function uses the set_intr_gate macro, which fills the Interrupt Descriptor Table with the given entry.
We already saw macros like this in the previous part: set_system_intr_gate and set_intr_gate_ist. The set_intr_gate macro checks that the given vector number is not greater than 255 (the maximum vector number) and calls the _set_gate function, just as set_system_intr_gate and set_intr_gate_ist did.
The set_intr_gate macro takes two parameters:
vector number of an interrupt;
address of an interrupt handler.
In our case they are:
X86_TRAP_PF - 14;
page_fault - the interrupt handler entry point.
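The way a gate entry is filled can be illustrated with a small userspace sketch. It is not the kernel's _set_gate implementation: the struct and function names are hypothetical, the selector value is a placeholder, and only the 16-byte layout of an x86_64 interrupt gate and the vector check described above are taken as given.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a 16-byte x86_64 interrupt gate descriptor. */
struct gate64_sketch {
    uint16_t offset_low;     /* handler address bits 0..15   */
    uint16_t segment;        /* code segment selector        */
    uint8_t  ist;            /* bits 0..2: IST index, 0 = none */
    uint8_t  type_attr;      /* present bit, DPL, gate type  */
    uint16_t offset_middle;  /* handler address bits 16..31  */
    uint32_t offset_high;    /* handler address bits 32..63  */
    uint32_t reserved;
};

/* A mock table standing in for the real IDT. */
static struct gate64_sketch idt_sketch[256];

/* Fill one entry: check the vector, then split the handler address. */
static void set_intr_gate_sketch(unsigned int vector, uint64_t handler,
                                 uint16_t selector)
{
    struct gate64_sketch gate;

    assert(vector <= 0xff);            /* 255 is the maximum vector number */
    memset(&gate, 0, sizeof(gate));
    gate.offset_low    = (uint16_t)(handler & 0xffff);
    gate.segment       = selector;
    gate.type_attr     = 0x8e;         /* present, DPL 0, interrupt gate */
    gate.offset_middle = (uint16_t)((handler >> 16) & 0xffff);
    gate.offset_high   = (uint32_t)(handler >> 32);
    idt_sketch[vector] = gate;
}

/* Reassemble the handler address stored in an entry. */
static uint64_t gate_handler_sketch(unsigned int vector)
{
    const struct gate64_sketch *g = &idt_sketch[vector];

    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_middle << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

Calling set_intr_gate_sketch(14, address, selector) stores the entry for the page fault vector, and gate_handler_sketch(14) returns the same address back.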
regs - the pt_regs structure that holds the state of an interrupted process;
error_code - the error code of the page fault exception.
The state records which context the processor was in. At the end we restore the previous context. Between the exception_enter and exception_exit calls we invoke the actual page fault handler, __do_page_fault.
Here fault_in_kernel_space compares the faulting address with TASK_SIZE_MAX. The TASK_SIZE_MAX macro expands to 0x00007ffffffff000. Pay attention to the unlikely macro. There are two such macros in the Linux kernel: likely and unlikely.
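As an illustration, branch-hint macros of this kind can be sketched in userspace on top of the GCC/Clang builtin __builtin_expect. The checked_div_sketch function is a hypothetical example and is not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints built on the GCC/Clang builtin; !! normalizes to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: division where a zero divisor is the rare case. */
static int checked_div_sketch(int a, int b, int *out)
{
    assert(out != NULL);

    if (unlikely(b == 0))   /* hint: this branch is rarely taken */
        return -1;

    *out = a / b;           /* the common path follows the branch directly */
    return 0;
}
```

The hint does not change the result of the condition, only how the compiler may lay out the two branches.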
After this we start to fill the Interrupt Descriptor Table with the different interrupt gates. First of all we set the gates for #DE, or Divide Error, and #NMI, or Non-maskable Interrupt.
In the next step we set the interrupt gate for the #DF, or Double Fault, exception. This exception occurs when the processor detects a second exception while calling the exception handler for a prior exception. Usually, when the processor detects another exception while trying to call an exception handler, the two exceptions can be handled serially. If the processor cannot handle them serially, it signals the double-fault, or #DF, exception.
The next set of interrupt gates sets up the following exception handlers:
#NP or Segment Not Present exception indicates that the present flag of a segment or gate descriptor is clear during an attempt to load one of the cs, ds, es, fs, or gs registers.
#SS or Stack Fault exception indicates that one of the stack-related conditions was detected, for example a not-present stack segment is detected when attempting to load the ss register.
#GP or General Protection exception indicates that the processor detected one of a class of protection violations called general-protection violations. Many different conditions can cause a general-protection exception, for example loading the ss, ds, es, fs, or gs register with a segment selector for a system segment, writing to a code segment or a read-only data segment, referencing an entry in the Interrupt Descriptor Table (following an interrupt or exception) that is not an interrupt, trap, or task gate, and many more.
Spurious Interrupt - a hardware interrupt that is unwanted.
#AC or Alignment Check exception indicates that the processor detected an unaligned memory operand when alignment checking was enabled.
After setting up these exception gates, we can see the setup of the Machine-Check exception gate.
The SIMD floating-point exception gate covers the case where the processor has detected an SSE, SSE2 or SSE3 SIMD floating-point exception. There are six classes of numeric exception conditions that can occur while executing a SIMD floating-point instruction:
Invalid operation
Divide-by-zero
Denormal operand
Numeric overflow
Numeric underflow
Inexact result (Precision)
where FIRST_EXTERNAL_VECTOR is the first vector number that is available for external interrupts.
After this we set up the interrupt gate for ia32_syscall and add 0x80 to the used_vectors bitmap.
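The bookkeeping with the used_vectors bitmap can be sketched in plain C. This is a userspace illustration, not the kernel's bitmap API: every name ending in _sketch is hypothetical, and the value 0x20 for the first external vector is an assumption that matches the "first 32 interrupts" mentioned in the text.

```c
#include <assert.h>
#include <limits.h>

#define NR_VECTORS_SKETCH            256
#define FIRST_EXTERNAL_VECTOR_SKETCH 0x20   /* assumption: 32 reserved vectors */
#define BITS_PER_LONG_SKETCH         (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_SKETCH(n) \
    (((n) + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)

/* One bit per vector: 1 means the vector is already in use. */
static unsigned long used_vectors_sketch[BITS_TO_LONGS_SKETCH(NR_VECTORS_SKETCH)];

static void set_bit_sketch(unsigned int nr, unsigned long *map)
{
    assert(nr < NR_VECTORS_SKETCH);
    map[nr / BITS_PER_LONG_SKETCH] |= 1UL << (nr % BITS_PER_LONG_SKETCH);
}

static int test_bit_sketch(unsigned int nr, const unsigned long *map)
{
    assert(nr < NR_VECTORS_SKETCH);
    return (map[nr / BITS_PER_LONG_SKETCH] >> (nr % BITS_PER_LONG_SKETCH)) & 1UL;
}

/* Mark the exception vectors and the 0x80 system call vector as used. */
static void mark_used_vectors_sketch(void)
{
    unsigned int i;

    for (i = 0; i < FIRST_EXTERNAL_VECTOR_SKETCH; i++)
        set_bit_sketch(i, used_vectors_sketch);

    set_bit_sketch(0x80, used_vectors_sketch);
}
```

After mark_used_vectors_sketch() runs, vectors 0 to 31 and 0x80 are reported as used, while vector 32 is still free.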
There is a CONFIG_IA32_EMULATION kernel configuration option on x86_64 Linux kernels. This option provides the ability to execute 32-bit processes in compatibility mode. In the next parts we will see how it works; in the meantime we only need to know that there is yet another interrupt gate in the IDT, with the vector number 0x80. In the next step we map the IDT to the fixmap area.
Next we get the Task State Segment for the current cpu and the orig_ist structure, which holds the original Interrupt Stack Table values.
Once we have the Task State Segment and the Interrupt Stack Table values for the current processor, we clear several feature bits in the cr4 control register.
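Clearing bits in a control register is an ordinary mask operation. The sketch below applies it to a plain variable instead of the real cr4, which cannot be touched from userspace. The bit positions are the architectural ones for the VME, PVI, TSD and DE flags; the variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural cr4 flag positions (bits 0..3). */
#define CR4_VME_SKETCH (1UL << 0)  /* virtual-8086 mode extensions     */
#define CR4_PVI_SKETCH (1UL << 1)  /* protected-mode virtual interrupts */
#define CR4_TSD_SKETCH (1UL << 2)  /* time stamp disable               */
#define CR4_DE_SKETCH  (1UL << 3)  /* debugging extensions             */

/* A mock register value; the real cr4 is only accessible in ring 0. */
static unsigned long cr4_value_sketch;

static void cr4_clear_bits_sketch(unsigned long mask)
{
    assert(mask != 0);
    cr4_value_sketch &= ~mask;     /* every bit set in mask is cleared */
}
```

For example, starting from the value 0x2f and clearing all four flags leaves 0x20, so bits outside the mask are untouched.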
Once we have filled the Task State Segments with the Interrupt Stack Tables, we can set the TSS descriptor for the current processor and load it. The load_TR_desc macro expands to the ltr, or Load Task Register, instruction.
At the end of the trap_init function we can see one more piece of setup code.
That's all. Soon we will consider all the handlers of these interrupts and exceptions.
X86_TRAP_PF is an element of the enum that lists the x86 trap vector numbers.
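For orientation, the vector numbers of the exceptions discussed in this part can be collected in an enum like the one below. Only X86_TRAP_PF with the value 14 is confirmed by the text; the other names follow the same naming pattern as an assumption, and their values are the vector numbers fixed by the x86 architecture. The lookup function is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Sketch of a trap number enum; values are architectural vector numbers. */
enum x86_trap_sketch {
    X86_TRAP_DE = 0,    /* Divide Error                  */
    X86_TRAP_DB,        /* Debug                         */
    X86_TRAP_NMI,       /* Non-maskable Interrupt        */
    X86_TRAP_BP,        /* Breakpoint                    */
    X86_TRAP_OF,        /* Overflow                      */
    X86_TRAP_BR,        /* BOUND Range exceeded          */
    X86_TRAP_UD,        /* Invalid Opcode                */
    X86_TRAP_NM,        /* Device Not Available          */
    X86_TRAP_DF,        /* Double Fault                  */
    X86_TRAP_OLD_MF,    /* Coprocessor Segment Overrun   */
    X86_TRAP_TS,        /* Invalid TSS                   */
    X86_TRAP_NP,        /* Segment Not Present           */
    X86_TRAP_SS,        /* Stack Fault                   */
    X86_TRAP_GP,        /* General Protection            */
    X86_TRAP_PF,        /* Page Fault                    */
    X86_TRAP_SPURIOUS,  /* Spurious Interrupt            */
    X86_TRAP_MF,        /* x87 FPU Floating-Point Error  */
    X86_TRAP_AC,        /* Alignment Check               */
    X86_TRAP_MC,        /* Machine Check                 */
    X86_TRAP_XF         /* SIMD Floating-Point Exception */
};

/* Hypothetical helper: map a few vectors to their mnemonics. */
static const char *trap_mnemonic_sketch(enum x86_trap_sketch trap)
{
    assert(trap <= X86_TRAP_XF);

    switch (trap) {
    case X86_TRAP_DE: return "#DE";
    case X86_TRAP_DB: return "#DB";
    case X86_TRAP_BP: return "#BP";
    case X86_TRAP_DF: return "#DF";
    case X86_TRAP_GP: return "#GP";
    case X86_TRAP_PF: return "#PF";
    default:          return "other";
    }
}
```

Because the enumerators are listed in vector order, the compiler assigns 14 to X86_TRAP_PF without an explicit value.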
When early_trap_pf_init is called, set_intr_gate expands to a call of _set_gate, which fills the IDT with the handler for the page fault. Now let's look at the implementation of the page_fault handler. Like all exception handlers, page_fault is defined in an assembly source code file.
We saw in the previous part how the #DB and #BP handlers are defined. They were defined with the idtentry macro, but here we can see trace_idtentry. This macro is defined in the same source code file and depends on the CONFIG_TRACING kernel configuration option.
We will not dive into exception tracing now. If CONFIG_TRACING is not set, the trace_idtentry macro just expands to the normal idtentry. We already saw the implementation of the idtentry macro in the previous part, so let's start with the page_fault exception handler.
As we can see in the idtentry definition, the handler of page_fault is the do_page_fault function. Like all exception handlers, it takes two arguments: regs and error_code.
Let's look inside this function. First of all we read the content of a control register. This register contains the linear address that caused the page fault. In the next step we call the exception_enter function. exception_enter and exception_exit are functions from the context tracking subsystem of the Linux kernel, which is used to remove the dependency on the timer tick while a processor runs in userspace. We will see similar code in almost every exception handler.
The exception_enter function checks whether context tracking is enabled with context_tracking_is_enabled and, if it is, gets the previous context with this_cpu_read. After this it calls the context_tracking_user_exit function, which informs context tracking that the processor is exiting userspace mode and entering the kernel.
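The enter/exit pattern described here can be modelled with a small userspace sketch. It only mirrors the control flow from the text (save the previous context, switch to kernel context, run the handler, restore). All names are hypothetical, and a single global variable stands in for the per-cpu state.

```c
#include <assert.h>

/* Hypothetical context states. */
enum ctx_state_sketch { IN_KERNEL_SKETCH = 0, IN_USER_SKETCH = 1 };

/* Stand-ins for per-cpu context tracking data. */
static int tracking_enabled_sketch = 1;
static enum ctx_state_sketch current_state_sketch = IN_USER_SKETCH;
static enum ctx_state_sketch state_seen_by_handler_sketch;

/* Remember where we came from and switch to kernel context. */
static enum ctx_state_sketch exception_enter_sketch(void)
{
    enum ctx_state_sketch prev;

    if (!tracking_enabled_sketch)
        return IN_KERNEL_SKETCH;

    prev = current_state_sketch;
    current_state_sketch = IN_KERNEL_SKETCH;
    return prev;
}

/* Go back to user context only if that is where we were interrupted. */
static void exception_exit_sketch(enum ctx_state_sketch prev)
{
    if (tracking_enabled_sketch && prev == IN_USER_SKETCH)
        current_state_sketch = IN_USER_SKETCH;
}

/* The shape shared by exception handlers in this model. */
static void handle_exception_sketch(void)
{
    enum ctx_state_sketch prev = exception_enter_sketch();

    /* The actual handler work would run here, in kernel context. */
    state_seen_by_handler_sketch = current_state_sketch;
    assert(current_state_sketch == IN_KERNEL_SKETCH || !tracking_enabled_sketch);

    exception_exit_sketch(prev);
}
```

If the exception interrupts user context, the handler body observes the kernel state and the user state is restored on exit.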
__do_page_fault is defined in the same source code file as do_page_fault. At the beginning of __do_page_fault we check the state of the kmemcheck checker. kmemcheck detects and warns about some uses of uninitialized memory. We need to check it because a page fault can be caused by kmemcheck. After this we can see the call of prefetchw, which executes the instruction with the same name to fetch data into the cache with exclusive ownership. The main purpose of prefetching is to hide the latency of a memory access. In the next step we check whether the faulting address lies in kernel space, using a condition built on fault_in_kernel_space.
You can find these macros throughout the code of the Linux kernel. Their main purpose is optimization. Sometimes we need to check a condition and we know that it will rarely be true or false. With these macros we can tell the compiler about this. For example, the proc_root_readdir function, which is called when Linux needs to read the contents of the root directory, marks one of its conditions with unlikely. If a condition is marked with unlikely, the compiler can put the code for the false case right after the branch. Now let's get back to our address check. Comparing the given address with 0x00007ffffffff000 tells us whether the page fault happened at a kernel space address or a user space address. After this, the __do_page_fault routine tries to understand the problem that provoked the page fault exception and then passes the address to the appropriate routine. It can be a kmemcheck fault, a spurious fault and so on. We will not dive into the implementation details of the page fault exception handler in this part, because we first need to know many different concepts provided by the Linux kernel, but we will see it in a later chapter.
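The address check can be reproduced as plain arithmetic. The sketch below assumes a 47-bit user address space and a 4096-byte page, which together give the 0x00007ffffffff000 value quoted in the text. The names with the _sketch suffix are hypothetical userspace stand-ins for the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH     4096ULL                        /* assumed page size */
#define TASK_SIZE_MAX_SKETCH ((1ULL << 47) - PAGE_SIZE_SKETCH)

/* Non-zero when the address lies at or above the end of user space. */
static int fault_in_kernel_space_sketch(uint64_t address)
{
    return address >= TASK_SIZE_MAX_SKETCH;
}

/* Self-check of the arithmetic: 2^47 minus one page. */
static void check_task_size_sketch(void)
{
    assert(TASK_SIZE_MAX_SKETCH == 0x00007ffffffff000ULL);
}
```

An address such as 0x400000 is classified as user space, while an address in the upper canonical half such as 0xffffffff81000000 is classified as kernel space.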
There are many different function calls from different kernel subsystems after early_trap_pf_init in the setup_arch function, but none of them is related to interrupt and exception handling. So we have to go back to where we came from: the start_kernel function. The first thing after setup_arch is the trap_init function. This function initializes the remaining exception handlers (remember that we have already set up three handlers: #DB, the debug exception, #BP, the breakpoint exception, and #PF, the page fault exception). The trap_init function starts with a check for EISA.
Note that it depends on the CONFIG_EISA kernel configuration parameter, which represents EISA support. Here we use the early_ioremap function to map I/O memory into the page tables. We use the readl function to read the first 4 bytes from the mapped region, and if they are equal to the EISA string we set EISA_bus to one. In the end we just unmap the previously mapped region. You can read more about early_ioremap in the part dedicated to it.
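The signature comparison can be shown without any I/O mapping. In this sketch a byte buffer stands in for the mapped region and a hypothetical readl_sketch helper assembles a 32-bit little-endian value from it, so the four characters of the EISA string can be compared in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four signature bytes packed into one little-endian 32-bit value. */
#define EISA_SIGNATURE_SKETCH \
    ((uint32_t)'E' | ((uint32_t)'I' << 8) | \
     ((uint32_t)'S' << 16) | ((uint32_t)'A' << 24))

/* Hypothetical stand-in for a 32-bit little-endian read from a region. */
static uint32_t readl_sketch(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 when the first 4 bytes of the region spell "EISA". */
static int has_eisa_signature_sketch(const uint8_t *region)
{
    return readl_sketch(region) == EISA_SIGNATURE_SKETCH;
}
```

A buffer that begins with the bytes 'E', 'I', 'S', 'A' matches, and any other content does not.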
We use the set_intr_gate macro to set the interrupt gate for the #DE exception and set_intr_gate_ist for #NMI. You may remember that we already used these macros when we set the interrupt gates for the page fault handler, the debug handler and so on; you can find an explanation of them in the previous part. After this we set up exception gates for the following exceptions:
#OF or Overflow exception. This exception indicates that an overflow trap occurred when a special instruction was executed;
#BR or BOUND Range exceeded exception. This exception indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was executed;
#UD or Invalid Opcode exception. Occurs when the processor attempts to execute an invalid or reserved opcode, an instruction with invalid operand(s) and so on;
#NM or Device Not Available exception. Occurs when the processor tries to execute an x87 FPU floating-point instruction while the EM flag in cr0 is set;
#CSO or Coprocessor Segment Overrun - this exception indicates that the math coprocessor of an old processor detected a page or segment violation. Modern processors do not generate this exception;
#TS or Invalid TSS exception - indicates that there was an error related to the Task State Segment;
#MF or x87 FPU Floating-Point Error exception - caused when the x87 FPU has detected a floating-point error.
Note that the Machine-Check gate depends on the CONFIG_X86_MCE kernel configuration option. The exception indicates that the processor detected an internal error or a bus error, or that an external agent detected a bus error. The next exception gate is for the SIMD Floating-Point exception.
In the next step we fill the used_vectors array, which represents a bitmap, for the first 32 interrupts (bitmaps in the Linux kernel are described in a separate part).
We write the fixmap address of the IDT to idt_descr.address (fix-mapped addresses are covered in the second part of the corresponding chapter). After this we can see the call of the cpu_init function. This function initializes all the per-cpu state. At the beginning of cpu_init we do the following: first we wait until the current cpu is initialized, then we call the cr4_init_shadow function, which stores a shadow copy of the cr4 control register for the current cpu, and load CPU microcode if needed.
Clearing these cr4 bits disables the vm86 extension, virtual interrupts, the time stamp disable flag (when it is set, the time-stamp counter can only be read at the highest privilege level) and the debug extension. After this we reload the Global Descriptor Table and the Interrupt Descriptor Table.
After this we set up the array of Thread-Local Storage descriptors and load CPU microcode. Now it is time to set up and load the per-cpu Task State Segments. We loop over all the exception stacks, of which there are N_EXCEPTION_STACKS, or 4, and fill the Interrupt Stack Table with them.
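The loop over the exception stacks can be sketched as follows. This is a simplified userspace model, not the kernel code: the TSS is reduced to its seven IST slots, every stack is assumed to have the same 4096-byte size, and all names with the _sketch suffix are hypothetical. Each IST slot receives the address of the top of its stack, because stacks grow downwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_EXCEPTION_STACKS_SKETCH 4      /* value given in the text          */
#define EXCEPTION_STKSZ_SKETCH    4096   /* assumption: equal-sized stacks   */
#define IST_SLOTS_SKETCH          7      /* an x86_64 TSS has 7 IST entries  */

/* A TSS reduced to the part relevant here. */
struct tss_sketch {
    uint64_t ist[IST_SLOTS_SKETCH];
};

/* Backing memory for the exception stacks, laid out back to back. */
static uint8_t exception_stacks_sketch[N_EXCEPTION_STACKS_SKETCH *
                                       EXCEPTION_STKSZ_SKETCH];

/* Store the top-of-stack address of each exception stack in the IST. */
static void fill_ist_sketch(struct tss_sketch *tss)
{
    uintptr_t top = (uintptr_t)exception_stacks_sketch;
    int v;

    assert(tss != NULL);

    for (v = 0; v < N_EXCEPTION_STACKS_SKETCH; v++) {
        top += EXCEPTION_STKSZ_SKETCH;   /* advance to the end of stack v */
        tss->ist[v] = (uint64_t)top;
    }
}
```

After fill_ist_sketch() the first four IST slots hold consecutive stack tops one stack size apart, and the remaining slots are left untouched.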
Here the set_tss_desc macro writes the given descriptor to the Global Descriptor Table of the given processor.
Here we copy idt_table to nmi_idt_table and set up the exception handlers for #DB, or the Debug exception, and #BP, or the Breakpoint exception. You may remember that we already set these interrupt gates in the previous part, so why do we need to set them up again? We set them up again because when we initialized them in the early_trap_init function the Task State Segment was not ready yet, but now it is ready after the call of the cpu_init function.
This is the end of the fourth part about interrupts and interrupt handling in the Linux kernel. In this part we saw the initialization of different interrupt and exception gates, such as Divide Error, the Page Fault exception and so on. Note that we saw only the initialization, and have not yet looked into the details of the handlers for these exceptions. In the next part we will start to do that.
If you have any questions or suggestions, write me a comment.
Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR.