Early Interrupt Handling


Last updated 1 year ago

Exception Handling

This is the third part of the chapter about interrupts and exception handling in the Linux kernel. In the previous part we stopped at the setup_arch function from the arch/x86/kernel/setup.c source code file.

We already know that this function performs architecture-specific initialization; in our case, x86_64-related initialization. The setup_arch function is big, and in the previous part we stopped at the setup of the exception handlers for the following two exceptions:

  • #DB - debug exception, transfers control from the interrupted process to the debug handler;

  • #BP - breakpoint exception, caused by the int 3 instruction.

These exceptions allow the x86_64 architecture to have early exception processing for the purpose of debugging via kgdb.

As you may remember, we set these exception handlers in the early_trap_init function from arch/x86/kernel/traps.c:

void __init early_trap_init(void)
{
        set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
        set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);
        load_idt(&idt_descr);
}

We already saw the implementation of the set_intr_gate_ist and set_system_intr_gate_ist functions in the previous part. Now we will look at the implementation of these two exception handlers.

Debug and Breakpoint exceptions

Ok, we set up exception handlers in the early_trap_init function for the #DB and #BP exceptions, and now it is time to consider their implementation. But before we do this, let's first look at the details of these exceptions.

The first exception, #DB (the debug exception), occurs when a debug event occurs, for example an attempt to change the contents of a debug register. Debug registers are special registers that were introduced in x86 processors starting with the Intel 80386 processor. As their name suggests, their main purpose is debugging.

These registers allow setting breakpoints on code and on reads or writes of data in order to trace it. Debug registers may be accessed only in privileged mode. An attempt to read or write them when executing at any other privilege level causes a general protection fault exception. That's why we used set_intr_gate_ist for the #DB exception, and not set_system_intr_gate_ist.

The vector number of the #DB exception is 1 (we pass it as X86_TRAP_DB), and as we may read in the specification, this exception has no error code:

+-----------------------------------------------------+
|Vector|Mnemonic|Description         |Type |Error Code|
+-----------------------------------------------------+
|1     | #DB    |Reserved            |F/T  |NO        |
+-----------------------------------------------------+

The second exception, #BP (the breakpoint exception), occurs when the processor executes the int 3 instruction. Unlike the #DB exception, the #BP exception may occur in userspace. We can add it anywhere in our code. For example, let's look at this simple program:

// breakpoint.c
#include <stdio.h>

int main() {
    int i = 0;
    while (i < 6) {
        printf("i equal to: %d\n", i);
        __asm__("int3");
        ++i;
    }
    return 0;
}

If we compile and run this program, we will see the following output:

$ gcc breakpoint.c -o breakpoint
$ ./breakpoint
i equal to: 0
Trace/breakpoint trap

But if we run it with gdb, we will see our breakpoint and can continue the execution of our program:

$ gdb breakpoint
...
...
...
(gdb) run
Starting program: /home/alex/breakpoints
i equal to: 0

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000400585 in main ()
=> 0x0000000000400585 <main+31>:	83 45 fc 01	add    DWORD PTR [rbp-0x4],0x1
(gdb) c
Continuing.
i equal to: 1

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000400585 in main ()
=> 0x0000000000400585 <main+31>:	83 45 fc 01	add    DWORD PTR [rbp-0x4],0x1
(gdb) c
Continuing.
i equal to: 2

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000400585 in main ()
=> 0x0000000000400585 <main+31>:	83 45 fc 01	add    DWORD PTR [rbp-0x4],0x1
...
...
...

Now we know a little about these two exceptions, and we can move on to their handlers.

Preparation before an exception handler

As you may have noticed, the set_intr_gate_ist and set_system_intr_gate_ist functions take the address of an exception handler as their second parameter. In our case, the two exception handlers are:

  • debug;

  • int3.

You will not find these functions in the C code. The kernel's *.c/*.h files contain only their declarations, which are located in the arch/x86/include/asm/traps.h kernel header file:

asmlinkage void debug(void);

and

asmlinkage void int3(void);

You may notice the asmlinkage directive in the declarations of these functions. asmlinkage is a kernel macro that expands to gcc-specific attributes. Functions that cross the boundary between C and assembly need an explicit declaration of their calling convention. On 32-bit x86, gcc compiles a function marked with asmlinkage to retrieve all of its parameters from the stack. On x86_64, the regular register-based calling convention of the ABI is used either way.

So, both handlers are defined in the arch/x86/entry/entry_64.S assembly source code file with the idtentry macro:

idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK

and

idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK

Each exception handler may consist of two parts. The first part is generic and is the same for all exception handlers. It saves the general purpose registers on the stack, switches to the kernel stack if the exception came from userspace, and transfers control to the second part of the handler. The second part does work that depends on the particular exception. For example, the page fault exception handler should find the virtual page for the given address, the invalid opcode exception handler should send the SIGILL signal, and so on.

As we just saw, an exception handler starts from an invocation of the idtentry macro from the arch/x86/entry/entry_64.S assembly source code file, so let's look at the implementation of this macro. The idtentry macro takes five arguments:

  • sym - defines a global symbol with the .globl name, which will be the entry of the exception handler;

  • do_sym - the symbol name of the secondary entry of the exception handler;

  • has_error_code - whether the exception provides an error code.

The last two parameters are optional:

  • paranoid - shows how we need to check the current mode (explained in detail later);

  • shift_ist - shows whether the exception runs on a stack from the Interrupt Stack Table.

The definition of the idtentry macro looks like this:

.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
ENTRY(\sym)
...
...
...
END(\sym)
.endm

Before we consider the internals of the idtentry macro, we should know the state of the stack when an exception occurs. As we may read in the Intel® 64 and IA-32 Architectures Software Developer’s Manual 3A, the stack looks like this when an exception occurs:

    +------------+
+40 | %SS        |
+32 | %RSP       |
+24 | %RFLAGS    |
+16 | %CS        |
 +8 | %RIP       |
  0 | ERROR CODE | <-- %RSP
    +------------+

Now we may start to consider the implementation of the idtentry macro. The #DB and #BP exception handlers are defined as:

idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK

From these definitions we can see that the assembler will generate two routines named debug and int3. After some preparation, these exception handlers will call the do_debug and do_int3 secondary handlers. The third parameter defines whether an error code exists, and neither of our exceptions has one. As the diagram above shows, the processor pushes an error code on the stack only if the exception provides one. This causes a difficulty: the stack looks different for exceptions that provide an error code and for exceptions that do not. That's why the implementation of the idtentry macro starts by pushing a fake error code onto the stack if the exception does not provide one:

.ifeq \has_error_code
    pushq	$-1
.endif

But -1 is not only a fake error code. It also represents an invalid system call number, so the system call restart logic will not be triggered.

The last two parameters of the idtentry macro, shift_ist and paranoid, tell us whether an exception handler runs on a stack from the Interrupt Stack Table or not. You may already know that each kernel thread in the system has its own stack. In addition to these stacks, there are some specialized stacks associated with each processor in the system. One of these is the exception stack. The x86_64 architecture provides a special feature called the Interrupt Stack Table (IST). This feature allows switching to a new stack for designated events, such as atomic exceptions like the double fault. So the shift_ist parameter tells us whether we need to switch to an IST stack for the exception handler or not.

The paranoid parameter defines the method that tells us whether we entered the exception handler from userspace or not. The easiest way to determine this is via the CPL (Current Privilege Level) in the saved CS segment register. If it is equal to 3, we came from userspace; if it is zero, we came from kernel space:

testl $3,CS(%rsp)
jnz userspace
...
...
...
// we are from the kernel space

Unfortunately, this method does not give a 100% guarantee. As described in the kernel documentation:

if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context, which might have triggered right after a normal entry wrote CS to the stack but before we executed SWAPGS, then the only safe way to check for GS is the slower method: the RDMSR.

In other words, an NMI, for example, could happen inside the critical section around the swapgs instruction. In this case we should check the value of the MSR_GS_BASE model specific register, which stores a pointer to the start of the per-cpu area. If it is negative, we came from kernel space; otherwise, we came from userspace:

movl $MSR_GS_BASE,%ecx
rdmsr
testl %edx,%edx
js 1f

The first two lines of code read the value of the MSR_GS_BASE model specific register into the edx:eax pair. User space cannot set the gs base to a negative value. On the other hand, we know that the direct mapping of physical memory starts at the 0xffff880000000000 virtual address. So the kernel's MSR_GS_BASE will contain an address from 0xffff880000000000 to 0xffffc7ffffffffff. After the rdmsr instruction is executed, the %edx register will hold a value between 0xffff8800 and 0xffffc7ff. Even the smallest of these, 0xffff8800, is -30720 when interpreted as a signed 4-byte integer. That's why the kernel space gs, which points to the start of the per-cpu area, always has its sign bit set, and the js instruction jumps in that case.

After we push the fake error code on the stack, we should allocate space for the general purpose registers with the

ALLOC_PT_GPREGS_ON_STACK

macro, which is defined in the arch/x86/entry/calling.h header file. This macro just allocates 15*8 bytes of space on the stack to preserve the general purpose registers:

.macro ALLOC_PT_GPREGS_ON_STACK addskip=0
    addq	$-(15*8+\addskip), %rsp
.endm

So the stack will look like this after the execution of ALLOC_PT_GPREGS_ON_STACK:

     +------------+
+160 | %SS        |
+152 | %RSP       |
+144 | %RFLAGS    |
+136 | %CS        |
+128 | %RIP       |
+120 | ERROR CODE |
     |------------|
+112 |            |
+104 |            |
 +96 |            |
 +88 |            |
 +80 |            |
 +72 |            |
 +64 |            |
 +56 |            |
 +48 |            |
 +40 |            |
 +32 |            |
 +24 |            |
 +16 |            |
  +8 |            |
  +0 |            | <- %RSP
     +------------+

After we allocate space for the general purpose registers, we check whether the exception came from userspace. The result decides whether we move back to the interrupted process stack or stay on the exception stack:

.if \paranoid
    .if \paranoid == 1
        testb	$3, CS(%rsp)
        jnz	1f
    .endif
    call	paranoid_entry
.else
    call	error_entry
.endif

Let's consider each of these three cases in turn.

An exception occurred in userspace

First, let's consider the case when an exception has paranoid=1, like our debug and int3 exceptions. In this case we check the selector from the CS segment register. If we came from userspace, we jump to the 1f label; otherwise, paranoid_entry is called.

Let's start with the case where we entered the exception handler from userspace. As described above, we jump to the 1 label. The code at the 1 label starts with the call of the

call	error_entry

routine, which saves all general purpose registers in the previously allocated area on the stack:

SAVE_C_REGS 8
SAVE_EXTRA_REGS 8

Both of these macros are defined in the arch/x86/entry/calling.h header file. They just move the values of the general purpose registers to certain places on the stack. The offset of 8 skips the return address pushed by the call of error_entry. For example:

.macro SAVE_EXTRA_REGS offset=0
	movq %r15, 0*8+\offset(%rsp)
	movq %r14, 1*8+\offset(%rsp)
	movq %r13, 2*8+\offset(%rsp)
	movq %r12, 3*8+\offset(%rsp)
	movq %rbp, 4*8+\offset(%rsp)
	movq %rbx, 5*8+\offset(%rsp)
.endm

After the execution of SAVE_C_REGS and SAVE_EXTRA_REGS, the stack will look like this:

     +------------+
+160 | %SS        |
+152 | %RSP       |
+144 | %RFLAGS    |
+136 | %CS        |
+128 | %RIP       |
+120 | ERROR CODE |
     |------------|
+112 | %RDI       |
+104 | %RSI       |
 +96 | %RDX       |
 +88 | %RCX       |
 +80 | %RAX       |
 +72 | %R8        |
 +64 | %R9        |
 +56 | %R10       |
 +48 | %R11       |
 +40 | %RBX       |
 +32 | %RBP       |
 +24 | %R12       |
 +16 | %R13       |
  +8 | %R14       |
  +0 | %R15       | <- %RSP
     +------------+

After the kernel saves the general purpose registers on the stack, we should check again that we came from userspace with:

testb	$3, CS+8(%rsp)
jz	.Lerror_kernelspace

We check again because, as described in the kernel documentation, we may potentially fault here with a truncated %RIP reported. Anyway, in both cases the SWAPGS instruction will be executed and the values of MSR_KERNEL_GS_BASE and MSR_GS_BASE will be swapped. From this moment the %gs register will point to the base address of kernel structures. Executing SWAPGS was the main point of the error_entry routine.

Now we can go back to the idtentry macro. We may see the following assembler code after the call of error_entry:

movq	%rsp, %rdi
call	sync_regs

Here we put the base address of the stack pointer into the %rdi register, which will be the first argument (according to the x86_64 ABI) of the sync_regs function. Then we call this function, which is defined in the arch/x86/kernel/traps.c source code file:

asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
{
	struct pt_regs *regs = task_pt_regs(current);
	*regs = *eregs;
	return regs;
}

This function takes the result of the task_pt_regs macro, which is defined in the arch/x86/include/asm/processor.h header file, copies the saved registers there and returns that pointer. The task_pt_regs macro expands to the address of the pt_regs structure located just below thread.sp0, the top of the normal kernel stack of the task:

#define task_pt_regs(tsk)       ((struct pt_regs *)(tsk)->thread.sp0 - 1)

As we came from userspace, the exception handler will run in real process context. After we get the stack pointer from sync_regs, we switch the stack:

movq	%rax, %rsp

The last two steps before the exception handler calls the secondary handler are:

  1. Passing a pointer to the pt_regs structure, which contains the preserved general purpose registers, in the %rdi register:

movq	%rsp, %rdi

as it will be passed as the first parameter of the secondary exception handler.

  2. Passing the error code in the %rsi register, as it will be the second argument of the exception handler. The error code on the stack is then set to -1 for the same purpose as before: to prevent the restart of a system call:

.if \has_error_code
	movq	ORIG_RAX(%rsp), %rsi
	movq	$-1, ORIG_RAX(%rsp)
.else
	xorl	%esi, %esi
.endif

Additionally, you may see that we zero the %esi register above if the exception does not provide an error code.

In the end we just call the secondary exception handler:

call	\do_sym

which is:

dotraplinkage void do_debug(struct pt_regs *regs, long error_code);

for the debug exception, and:

dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code);

for the int 3 exception. In this part we will not see the implementations of the secondary handlers, because they are very specific, but we will see some of them in one of the next parts.

We just considered the first case, when an exception occurred in userspace. Let's consider the last two.

An exception with paranoid > 0 occurred in kernelspace

In this case the exception occurred in kernelspace, and the idtentry macro is defined with paranoid=1 for this exception. This value of paranoid means that we should use the slower method from the beginning of this part to check whether we really came from kernelspace. The paranoid_entry routine allows us to know this:

ENTRY(paranoid_entry)
	cld
	SAVE_C_REGS 8
	SAVE_EXTRA_REGS 8
	movl	$1, %ebx
	movl	$MSR_GS_BASE, %ecx
	rdmsr
	testl	%edx, %edx
	js	1f
	SWAPGS
	xorl	%ebx, %ebx
1:	ret
END(paranoid_entry)

As you may see, this function does what we covered before: it uses the second (slow) method to find out the previous state of the interrupted task. It sets %ebx to 1. If the GS base is not negative, that is, if we came from userspace, it executes SWAPGS and clears %ebx. So %ebx records whether SWAPGS is needed on exit.

After this check we do the same as before. We put a pointer to the structure that holds the general purpose registers into %rdi, the first parameter of the secondary handler. If the exception provides an error code, we put it into %rsi, the second parameter:

movq	%rsp, %rdi

.if \has_error_code
	movq	ORIG_RAX(%rsp), %rsi
	movq	$-1, ORIG_RAX(%rsp)
.else
	xorl	%esi, %esi
.endif

The last step before the secondary handler of the exception is called is setting up a new IST stack frame:

.if \shift_ist != -1
	subq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
.endif

You may remember that we passed shift_ist as an argument of the idtentry macro. Here we check its value. If it is not equal to -1, we move the Interrupt Stack Table entry with the shift_ist index down by EXCEPTION_STKSZ bytes. A nested exception that uses the same IST entry then gets a fresh stack region instead of overwriting the current frame. The entry is moved back after the secondary handler returns.

At the end of this second path we just call the secondary exception handler, as we did before:

call	\do_sym

The last case is similar to the previous two. The difference is that the exception is defined with paranoid=0, so we may use the fast method to determine where we came from.

Exit from an exception handler

After the secondary handler finishes its work, we return to the idtentry macro, and the next step is a jump to the error_exit

jmp	error_exit

routine. The error_exit routine is defined in the same arch/x86/entry/entry_64.S assembly source code file. Its main goal is to determine where we came from (userspace or kernelspace) and to execute SWAPGS depending on this. It then restores the registers to their previous state and executes the iret instruction to transfer control to the interrupted task. For a paranoid exception that came from kernel space, the idtentry macro jumps to a similar paranoid_exit routine instead.

That's all.

Conclusion

This is the end of the third part about interrupts and interrupt handling in the Linux kernel. In the previous part we saw the initialization of the Interrupt descriptor table with the #DB and #BP gates. In this part we started to dive into the preparation that happens before control is transferred to an exception handler, and into the implementation of some interrupt handlers. In the next part we will continue with this topic: we will go further through the setup_arch function and try to understand the parts related to interrupt handling.

If you have any questions or suggestions, write me a comment or ping me on twitter.

Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to linux-insides.

Links

linux-insides
Debug registers
Intel 80386
INT 3
gcc
TSS
GNU assembly .error directive
dwarf2
CFI directives
IRQ
system call
swapgs
SIGTRAP
Per-CPU variables
kgdb
ACPI
Previous part