Non-early initialization of the IRQs

This is the eighth part of the Interrupts and Interrupt Handling in the Linux kernel chapter and in the previous part we started to dive into the external hardware interrupts. We looked at the implementation of the early_irq_init function from the kernel/irq/irqdesc.c source code file and saw the initialization of the irq_desc structure in this function. Recall that the irq_desc structure (defined in include/linux/irqdesc.h) is the foundation of the interrupt management code in the Linux kernel and represents an interrupt descriptor. In this part we will continue to dive into the initialization stuff which is related to the external hardware interrupts.

Right after the call of the early_irq_init function in init/main.c we can see the call of the init_IRQ function. This function is architecture-specific and is defined in arch/x86/kernel/irqinit.c. The init_IRQ function initializes the vector_irq percpu variable, which is defined in the same source code file:

...
DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
         [0 ... NR_VECTORS - 1] = -1,
};
...

and represents a percpu array of interrupt vector numbers. The vector_irq_t is defined in arch/x86/include/asm/hw_irq.h and expands to:

typedef int vector_irq_t[NR_VECTORS];

where NR_VECTORS is the count of vector numbers and, as you can remember from the first part of this chapter, it is 256 for x86_64:

#define NR_VECTORS                       256

So, at the start of the init_IRQ function we fill the vector_irq percpu array with the numbers of the legacy interrupts:

void __init init_IRQ(void)
{
	int i;

	for (i = 0; i < nr_legacy_irqs(); i++)
		per_cpu(vector_irq, 0)[IRQ0_VECTOR + i] = i;
...
...
...
}

This vector_irq will be used during the first steps of external hardware interrupt handling in the do_IRQ function from arch/x86/kernel/irq.c:

__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
	...
	...
	...
	irq = __this_cpu_read(vector_irq[vector]);

	if (!handle_irq(irq, regs)) {
		...
		...
		...
	}

	exiting_irq();
	...
	...
	return 1;
}

Why is legacy here? Actually, all interrupts are handled by the modern IO-APIC controller. But these interrupts (from 0x30 to 0x3f) are handled by legacy interrupt controllers like the Programmable Interrupt Controller. If these interrupts are handled by the I/O APIC, this vector space will be freed and re-used. Let's look at this code more closely. First of all, nr_legacy_irqs is defined in arch/x86/include/asm/i8259.h and just returns the nr_legacy_irqs field from the legacy_pic structure:

static inline int nr_legacy_irqs(void)
{
        return legacy_pic->nr_legacy_irqs;
}

This structure is defined in the same header file and represents a non-modern programmable interrupt controller:

struct legacy_pic {
        int nr_legacy_irqs;
        struct irq_chip *chip;
        void (*mask)(unsigned int irq);
        void (*unmask)(unsigned int irq);
        void (*mask_all)(void);
        void (*restore_mask)(void);
        void (*init)(int auto_eoi);
        int (*irq_pending)(unsigned int irq);
        void (*make_irq)(unsigned int irq);
};

The actual default maximum number of legacy interrupts is represented by the NR_IRQS_LEGACY macro from arch/x86/include/asm/irq_vectors.h:

#define NR_IRQS_LEGACY                    16

In the loop we access the vector_irq per-cpu array with the per_cpu macro at the IRQ0_VECTOR + i index and write the legacy interrupt number there. The IRQ0_VECTOR macro is defined in the arch/x86/include/asm/irq_vectors.h header file and expands to 0x30:

#define FIRST_EXTERNAL_VECTOR           0x20

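#define IRQ0_VECTOR                     ((FIRST_EXTERNAL_VECTOR + 16) & ~15)

Why is 0x30 here? You can remember from the first part of this chapter that the first 32 vector numbers, from 0 to 31, are reserved by the processor and used for the processing of architecture-defined exceptions and interrupts. Vector numbers from 0x30 to 0x3f are reserved for the ISA. So, it means that we fill vector_irq starting from IRQ0_VECTOR, which is equal to 0x30 (48), up to but not including IRQ0_VECTOR + 16 (0x40).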

In the end of the init_IRQ function we can see the call of the following function:

x86_init.irqs.intr_init();

from the arch/x86/kernel/x86_init.c source code file. If you have read the chapter about the Linux kernel initialization process, you can remember the x86_init structure. This structure contains a couple of fields which point to functions related to the platform setup (x86_64 in our case), for example resources - related to the memory resources, mpparse - related to the parsing of the MultiProcessor Configuration Table, etc. As we can see, x86_init also contains the irqs field, which contains the three following fields:

struct x86_init_ops x86_init __initdata = {
	...
	...
	...
    .irqs = {
                .pre_vector_init        = init_ISA_irqs,
                .intr_init              = native_init_IRQ,
                .trap_init              = x86_init_noop,
	},
	...
	...
	...
}
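
Now, we are interested in native_init_IRQ. As we can note, the name of the native_init_IRQ function contains the native_ prefix, which means that this function is architecture-specific. It is defined in arch/x86/kernel/irqinit.c and executes general initialization of the Local APIC and initialization of the ISA irqs. Let's look at the implementation of the native_init_IRQ function and try to understand what occurs there. The native_init_IRQ function starts with the execution of the following function:

x86_init.irqs.pre_vector_init();

As we can see above, pre_vector_init points to the init_ISA_irqs function, which is defined in the same source code file and, as we can understand from the function's name, initializes the ISA related interrupts. The init_ISA_irqs function starts with the definition of the chip variable, which is a pointer to the irq_chip type:
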
void __init init_ISA_irqs(void)
{
	struct irq_chip *chip = legacy_pic->chip;
	...
	...
	...

The irq_chip structure is defined in the include/linux/irq.h header file and represents a hardware interrupt chip descriptor. It contains:

  • name - name of a device. Used in the /proc/interrupts:

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         16          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          2          0          0          0          0          0          0          0   IO-APIC   1-edge      i8042
  8:          1          0          0          0          0          0          0          0   IO-APIC   8-edge      rtc0

look at the last column;

  • (*irq_mask)(struct irq_data *data) - mask an interrupt source;

  • (*irq_ack)(struct irq_data *data) - start of a new interrupt;

  • (*irq_startup)(struct irq_data *data) - start up the interrupt;

  • (*irq_shutdown)(struct irq_data *data) - shutdown the interrupt;

  • etc.

Note that the irq_data structure represents a set of per-irq chip data passed down to chip functions. It contains mask - a precomputed bitmask for accessing the chip registers, irq - the interrupt number, hwirq - the hardware interrupt number, local to the interrupt domain, chip - low level interrupt hardware access, etc.

After this, depending on the CONFIG_X86_64 and CONFIG_X86_LOCAL_APIC kernel configuration options, we call the init_bsp_APIC function from arch/x86/kernel/apic/apic.c:

#if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC)
	init_bsp_APIC();
#endif

This function initializes the APIC of the bootstrap processor (the processor which starts first). It starts with a check of whether an SMP config was found (read more about it in the sixth part of the Linux kernel initialization process chapter) and whether the processor has an APIC:

if (smp_found_config || !cpu_has_apic)
	return;

If an SMP config was found or the processor has no APIC, we return from this function. Otherwise, in the next step we call the clear_local_APIC function from the same source code file, which shuts down the local APIC (more on it in the Advanced Programmable Interrupt Controller chapter), and enable the APIC of the first processor by setting the APIC_SPIV_APIC_ENABLED bit in the unsigned int value:

value = apic_read(APIC_SPIV);
value &= ~APIC_VECTOR_MASK;
value |= APIC_SPIV_APIC_ENABLED;

and writing it with the help of the apic_write function:

apic_write(APIC_SPIV, value);

After we have enabled APIC for the bootstrap processor, we return to the init_ISA_irqs function and in the next step we initialize legacy Programmable Interrupt Controller and set the legacy chip and handler for each legacy irq:

legacy_pic->init(0);

for (i = 0; i < nr_legacy_irqs(); i++)
    irq_set_chip_and_handler(i, chip, handle_level_irq);

Where can we find the init function? The legacy_pic is defined in arch/x86/kernel/i8259.c and it is:

struct legacy_pic *legacy_pic = &default_legacy_pic;

Where the default_legacy_pic is:

struct legacy_pic default_legacy_pic = {
	...
	...
	...
	.init = init_8259A,
	...
	...
	...
}

The init_8259A function is defined in the same source code file and initializes the Intel 8259 Programmable Interrupt Controller (more about it will be in a separate chapter about Programmable Interrupt Controllers and APIC).

Now we can return to the native_init_IRQ function, after the init_ISA_irqs function has finished its work. The next step is the call of the apic_intr_init function, which allocates special interrupt gates which are used by the SMP architecture for the Inter-processor interrupt. The alloc_intr_gate macro from arch/x86/include/asm/desc.h is used for the interrupt descriptor allocation:

#define alloc_intr_gate(n, addr)                        \
do {                                                    \
        alloc_system_vector(n);                         \
        set_intr_gate(n, addr);                         \
} while (0)

As we can see, first of all it expands to the call of the alloc_system_vector function, which checks the given vector number in the used_vectors bitmap (read the previous part about it) and, if it is not set in the used_vectors bitmap, sets it. After this we test whether first_system_vector is greater than the given interrupt vector number and, if it is greater, we assign it:

if (!test_bit(vector, used_vectors)) {
	set_bit(vector, used_vectors);
    if (first_system_vector > vector)
		first_system_vector = vector;
} else {
	BUG();
}
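
We already saw the set_bit macro, now let's look at test_bit and first_system_vector. First, the test_bit macro is defined in arch/x86/include/asm/bitops.h and looks like this:

#define test_bit(nr, addr)                      \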
        (__builtin_constant_p((nr))             \
         ? constant_test_bit((nr), (addr))      \
         : variable_test_bit((nr), (addr)))

We can see the ternary operator here, which makes a test with the gcc built-in function __builtin_constant_p that checks whether the given bit number (nr) is known at compile time. If you are not sure how __builtin_constant_p works, we can make a simple test:

#include <stdio.h>

#define PREDEFINED_VAL 1

int main() {
	int i = 5;
	printf("__builtin_constant_p(i) is %d\n", __builtin_constant_p(i));
	printf("__builtin_constant_p(PREDEFINED_VAL) is %d\n", __builtin_constant_p(PREDEFINED_VAL));
	printf("__builtin_constant_p(100) is %d\n", __builtin_constant_p(100));

	return 0;
}

and look at the result:

$ gcc test.c -o test
$ ./test
__builtin_constant_p(i) is 0
__builtin_constant_p(PREDEFINED_VAL) is 1
__builtin_constant_p(100) is 1

Now I think it must be clear. Let's get back to the test_bit macro. If __builtin_constant_p returns non-zero, we call the constant_test_bit function:

static inline int constant_test_bit(int nr, const void *addr)
{
	const u32 *p = (const u32 *)addr;

	return ((1UL << (nr & 31)) & (p[nr >> 5])) != 0;
}

and variable_test_bit otherwise:

static inline int variable_test_bit(int nr, const void *addr)
{
        u8 v;
        const u32 *p = (const u32 *)addr;

        asm("btl %2,%1; setc %0" : "=qm" (v) : "m" (*p), "Ir" (nr));
        return v;
}

What's the difference between these two functions and why do we need two different functions for the same purpose? As you can already guess, the main purpose is optimization. If we write a simple example with these functions:

#define CONST 25

int main() {
	int nr = 24;
	variable_test_bit(nr, (int*)0x10000000);
	constant_test_bit(CONST, (int*)0x10000000);
	return 0;
}

and look at the assembly output of our example, we will see the following assembly code:

pushq	%rbp
movq	%rsp, %rbp

movl	$268435456, %esi
movl	$25, %edi
call	constant_test_bit

for the constant_test_bit, and:

pushq	%rbp
movq	%rsp, %rbp

subq	$16, %rsp
movl	$24, -4(%rbp)
movl	-4(%rbp), %eax
movl	$268435456, %esi
movl	%eax, %edi
call	variable_test_bit

for variable_test_bit. These two code listings start with the same part: first of all we save the base of the current stack frame in the %rbp register. But after this the code for both examples is different. In the first example we put $268435456 (here $268435456 is our second parameter - 0x10000000) into esi and $25 (our first parameter) into the edi register and call constant_test_bit. We put function parameters into the esi and edi registers because, as we are learning the Linux kernel for the x86_64 architecture, we use the System V AMD64 ABI calling convention. All is pretty simple. When we are using a predefined constant, the compiler can just substitute its value. Now let's look at the second part. As you can see here, the compiler can not substitute the value of the nr variable. In this case the compiler must calculate its offset on the program's stack frame. We subtract 16 from the rsp register to allocate stack space for the local variables and put $24 (the value of the nr variable) at offset -4 from rbp. Our stack frame will be like this:

         <- stack grows

	          %[rbp]
                 |
+----------+ +---------+ +---------+ +--------+
|          | |         | | return  | |        |
|    nr    |-|         |-|         |-|  argc  |
|          | |         | | address | |        |
+----------+ +---------+ +---------+ +--------+
                 |
              %[rsp]

After this we put this value into eax, so the eax register now contains the value of nr. In the end we do the same as in the first example: we put $268435456 (the second parameter of the variable_test_bit function) into the esi register and the value of eax (the value of nr, the first parameter of the variable_test_bit function) into the edi register.

The next step, after the apic_intr_init function has finished its work, is setting the interrupt gates from FIRST_EXTERNAL_VECTOR (0x20) up to 0x100:

i = FIRST_EXTERNAL_VECTOR;

#ifndef CONFIG_X86_LOCAL_APIC
#define first_system_vector NR_VECTORS
#endif

for_each_clear_bit_from(i, used_vectors, first_system_vector) {
	set_intr_gate(i, irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR));
}

But as we are using the for_each_clear_bit_from helper, we set only non-initialized interrupt gates. After this we use the same for_each_clear_bit_from helper to fill the non-filled interrupt gates in the interrupt table with the spurious_interrupt:

#ifdef CONFIG_X86_LOCAL_APIC
for_each_clear_bit_from(i, used_vectors, NR_VECTORS)
    set_intr_gate(i, spurious_interrupt);
#endif

Where the spurious_interrupt function represents the interrupt handler for spurious interrupts. Here used_vectors is the unsigned long bitmap that contains already initialized interrupt gates. We already filled the first 32 interrupt vectors in the trap_init function from the arch/x86/kernel/traps.c source code file:

for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++)
    set_bit(i, used_vectors);

You can remember how we did it in the sixth part of this chapter.

In the end of the native_init_IRQ function we can see the following check:

if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs())
	setup_irq(2, &irq2);

First of all let's deal with the condition. The acpi_ioapic variable represents the existence of an I/O APIC. It is defined in arch/x86/kernel/acpi/boot.c. This variable is set in the acpi_set_irq_model_ioapic function, which is called during the processing of the Multiple APIC Description Table. This occurs during initialization of the architecture-specific stuff in arch/x86/kernel/setup.c (we will learn more about it in the other chapter about APIC). Note that the value of the acpi_ioapic variable depends on the CONFIG_ACPI and CONFIG_X86_LOCAL_APIC Linux kernel configuration options. If these options are not set, this variable will be just zero:

#define acpi_ioapic 0

The second condition - !of_ioapic && nr_legacy_irqs() checks that we do not use an Open Firmware I/O APIC and that a legacy interrupt controller is present. We already know about nr_legacy_irqs. The second is the of_ioapic variable, which is defined in arch/x86/kernel/devicetree.c and initialized in the dtb_ioapic_setup function that builds information about APICs in the devicetree. Note that the of_ioapic variable depends on the CONFIG_OF Linux kernel configuration option. If this option is not set, the value of of_ioapic will be zero too:

#ifdef CONFIG_OF
extern int of_ioapic;
...
...
...
#else
#define of_ioapic 0
...
...
...
#endif

If the condition returns non-zero value we call the:

setup_irq(2, &irq2);

function. First of all, about irq2. irq2 is the irqaction structure defined in the arch/x86/kernel/irqinit.c source code file; it represents the IRQ 2 line, which is used to query devices connected in cascade:

static struct irqaction irq2 = {
	.handler = no_action,
    .name = "cascade",
    .flags = IRQF_NO_THREAD,
};

Some time ago the interrupt controller consisted of two chips, one connected to the other. The second chip was connected to the first chip via this IRQ 2 line. This chip serviced lines from 8 to 15, and after this the remaining lines of the first chip were serviced. So, for example, the Intel 8259A has the following lines:

  • IRQ 0 - system time;

  • IRQ 1 - keyboard;

  • IRQ 2 - used for devices which are cascade connected;

  • IRQ 8 - RTC;

  • IRQ 9 - reserved;

  • IRQ 10 - reserved;

  • IRQ 11 - reserved;

  • IRQ 12 - ps/2 mouse;

  • IRQ 13 - coprocessor;

  • IRQ 14 - hard drive controller;

  • IRQ 15 - reserved;

  • IRQ 3 - COM2 and COM4;

  • IRQ 4 - COM1 and COM3;

  • IRQ 5 - LPT2;

  • IRQ 6 - drive controller;

  • IRQ 7 - LPT1.

The setup_irq function is defined in kernel/irq/manage.c and takes two parameters:

  • vector number of an interrupt;

  • irqaction structure related with an interrupt.

This function looks up the interrupt descriptor for the given vector number at the beginning:

struct irq_desc *desc = irq_to_desc(irq);

And calls the __setup_irq function, which sets up the given interrupt:

chip_bus_lock(desc);
retval = __setup_irq(irq, desc, act);
chip_bus_sync_unlock(desc);
return retval;

Note that the interrupt descriptor is locked while the __setup_irq function works. The __setup_irq function does many different things: it creates a handler thread when a thread function is supplied and the interrupt does not nest into another interrupt thread, sets the flags of the chip, fills the irqaction structure and many more.

In addition to all of the above, it creates the /proc/irq/vector_number directory and fills it, but if you are using a modern computer all values will be zero there:

$ cat /proc/irq/2/node
0

$ cat /proc/irq/2/affinity_hint
00

$ cat /proc/irq/2/spurious
count 0
unhandled 0
last_unhandled 0 ms

because the APIC probably handles interrupts on this machine.

That's all.

Conclusion

It is the end of the eighth part of the Interrupts and Interrupt Handling chapter and we continued to dive into external hardware interrupts in this part. In the previous part we started to do it and saw the early initialization of the IRQs. In this part we saw the non-early interrupt initialization in the init_IRQ function. We saw the initialization of the vector_irq per-cpu array, which stores vector numbers of the interrupts and will be used during interrupt handling, and the initialization of other stuff related to the external hardware interrupts.

In the next part we will continue to learn interrupt handling related stuff and will see the initialization of the softirqs.

If you have any questions or suggestions write me a comment or ping me at twitter.

Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to linux-insides.

Links

IRQ
percpu
x86_64
Intel 8259
Programmable Interrupt Controller
ISA
MultiProcessor Configuration Table
Local APIC
I/O APIC
SMP
Inter-processor interrupt
ternary operator
gcc
calling convention
PDF. System V Application Binary Interface AMD64
Call stack
Open Firmware
devicetree
RTC
Previous part