Timers and time management

Introduction

This is yet another post that opens a new chapter in the linux-insides book. The previous part described system call concepts, and now it's time to start a new chapter. As you might guess from the title, this chapter is devoted to timers and time management in the Linux kernel. The choice of topic is not accidental: timers (and time management in general) are very important and widely used in the Linux kernel. The kernel uses timers for various tasks, for example different timeouts in the TCP implementation, keeping track of the current time, scheduling asynchronous functions, scheduling the next event interrupt, and many more.

So, in this part we will start to learn how the different time management related pieces are implemented. We will see different types of timers and how different Linux kernel subsystems use them. As always, we will start from the earliest part of the Linux kernel and go through its initialization process. We already did this in the special chapter which describes the initialization process of the Linux kernel, but as you may remember we missed some things there. One of them is the initialization of timers.

Let's start.

Initialization of non-standard PC hardware clock

After the Linux kernel has been decompressed (you can read more about this in the Kernel decompression part), the architecture non-specific code starts to work in the init/main.c source code file. After initialization of the lock validator, initialization of cgroups and setting the canary value, we can see the call of the setup_arch function.

As you may remember, this function (defined in arch/x86/kernel/setup.c) prepares and initializes architecture-specific stuff (for example, it reserves a place for the bss section, reserves a place for the initrd, parses the kernel command line, and many other things). Besides this, we can find some time management related functions there.

The first is:

x86_init.timers.wallclock_init();

We already saw the x86_init structure in the chapter that describes initialization of the Linux kernel. This structure contains pointers to the default setup functions for different platforms like Intel MID, Intel CE4100, etc. The x86_init structure is defined in arch/x86/kernel/x86_init.c, and as you can see it describes standard PC hardware by default.

As we can see, the x86_init structure has the x86_init_ops type, which provides a set of functions for platform specific setup like reserving standard resources, platform specific memory setup, initialization of interrupt handlers, etc. This structure looks like:

struct x86_init_ops {
	struct x86_init_resources       resources;
	struct x86_init_mpparse         mpparse;
	struct x86_init_irqs            irqs;
	struct x86_init_oem             oem;
	struct x86_init_paging          paging;
	struct x86_init_timers          timers;
	struct x86_init_iommu           iommu;
	struct x86_init_pci             pci;
};

Note the timers field that has the x86_init_timers type. We can understand by its name that this field is related to time management and timers. x86_init_timers contains four fields, all of which are pointers to functions that return void:

  • setup_percpu_clockev - set up the per cpu clock event device for the boot cpu;

  • tsc_pre_init - platform function called before TSC init;

  • timer_init - initialize the platform timer;

  • wallclock_init - initialize the wallclock device.

So, as we already know, in our case wallclock_init performs the initialization of the wallclock device. If we look at the x86_init structure, we see that wallclock_init points to x86_init_noop:

struct x86_init_ops x86_init __initdata = {
	...
	...
	...
	.timers = {
		.wallclock_init		    = x86_init_noop,
	},
	...
	...
	...
}

Where the x86_init_noop is just a function that does nothing:

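void __cpuinit x86_init_noop(void) { }

for the standard PC hardware. Actually, the wallclock_init function is used on the Intel MID platform. Initialization of x86_init.timers.wallclock_init is located in the arch/x86/platform/intel-mid/intel-mid.c source code file, in the x86_intel_mid_early_setup function:

void __init x86_intel_mid_early_setup(void)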
{
	...
	...
	...
	x86_init.timers.wallclock_init = intel_mid_rtc_init;
	...
	...
	...
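}

The implementation of the intel_mid_rtc_init function is in the arch/x86/platform/intel-mid/intel_mid_vrtc.c source code file and looks pretty simple. First of all, this function parses the Simple Firmware Interface M-Real-Time-Clock table to fill the sfi_mrtc_array array with such devices, and then initializes the set_time and get_time functions:

void __init intel_mid_rtc_init(void)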
{
	unsigned long vrtc_paddr;

	sfi_table_parse(SFI_SIG_MRTC, NULL, NULL, sfi_parse_mrtc);

	vrtc_paddr = sfi_mrtc_array[0].phys_addr;
	if (!sfi_mrtc_num || !vrtc_paddr)
		return;

	vrtc_virt_base = (void __iomem *)set_fixmap_offset_nocache(FIX_LNW_VRTC,
								vrtc_paddr);

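	x86_platform.get_wallclock = vrtc_get_time;
	x86_platform.set_wallclock = vrtc_set_mmss;
}

That's all. After this, a device based on Intel MID will be able to get the time from the hardware clock. As I already wrote, on the standard PC x86_64 architecture wallclock_init points to x86_init_noop, so nothing happens during the call of this function. We just saw initialization of the real time clock for the Intel MID architecture; now it's time to return to the general x86_64 architecture and look at the time management related stuff there.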
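The pattern we just walked through (a table of function pointers with a do-nothing default that a platform may override) can be sketched in user space. This is only an illustration under our own assumptions: the names init_table, init_noop, midlike_rtc_init and midlike_early_setup are hypothetical stand-ins and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for x86_init_timers / x86_init_ops. */
struct timers_ops {
	void (*wallclock_init)(void);
};

struct init_ops {
	struct timers_ops timers;
};

static int wallclock_ready;	/* set by the platform-specific hook */

/* Default hook: standard PC hardware needs no wallclock setup. */
void init_noop(void) { }

/* Platform hook: pretend to set up a platform RTC. */
void midlike_rtc_init(void) { wallclock_ready = 1; }

/* The table starts out with the do-nothing default. */
struct init_ops init_table = {
	.timers = { .wallclock_init = init_noop },
};

/* Early platform setup replaces the default hook. */
void midlike_early_setup(void)
{
	init_table.timers.wallclock_init = midlike_rtc_init;
}

/* Generic code calls the hook without knowing the platform. */
int run_wallclock_init(void)
{
	assert(init_table.timers.wallclock_init != NULL);
	wallclock_ready = 0;
	init_table.timers.wallclock_init();
	return wallclock_ready;
}
```

Calling run_wallclock_init() before midlike_early_setup() goes through the no-op and leaves the flag at zero; calling it afterwards runs the platform hook. The generic caller is the same in both cases, which is the point of the indirection.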

Acquainted with jiffies

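If we return to the setup_arch function (which is located, as you remember, in the arch/x86/kernel/setup.c source code file), we see the next call of a time management related function:

register_refined_jiffies(CLOCK_TICK_RATE);

Before we look at the implementation of this function, we must know about the jiffy. As we can read on Wikipedia:

Jiffy is an informal term for any unspecified short period of time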

This definition is very similar to the jiffy in the Linux kernel. There is a global variable named jiffies which holds the number of ticks that have occurred since the system booted. The Linux kernel sets this variable to zero:

extern unsigned long volatile __jiffy_data jiffies;

during the initialization process. This global variable is incremented on each timer interrupt. Besides this, near the jiffies variable we can see the definition of a similar variable:

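extern u64 jiffies_64;

Actually, only one of these variables is in use in the Linux kernel, and which one depends on the processor type. On x86_64 the u64 variable is used, and on x86 it is the unsigned long one. We can see this by looking at the arch/x86/kernel/vmlinux.lds.S linker script:

#ifdef CONFIG_X86_32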
...
jiffies = jiffies_64;
...
#else
...
jiffies_64 = jiffies;
...
#endif

In the case of x86_32 the jiffies will be the lower 32 bits of the jiffies_64 variable. Schematically, we can imagine it as follows

                    jiffies_64
+-----------------------------------------------------+
|                       |                             |
|                       |                             |
|                       |       jiffies on `x86_32`   |
|                       |                             |
|                       |                             |
+-----------------------------------------------------+
63                     31                             0
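The diagram above can be checked with a few lines of user-space C. This is a minimal sketch, assuming we model the 32-bit jiffies as a plain truncation of the 64-bit counter; the function names are ours and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the 64-bit counter: what a 32-bit jiffies would show. */
uint32_t jiffies_low32(uint64_t jiffies_64)
{
	return (uint32_t)jiffies_64;
}

/* High 32 bits: how many times the 32-bit view has wrapped around. */
uint32_t jiffies_wraps(uint64_t jiffies_64)
{
	return (uint32_t)(jiffies_64 >> 32);
}

/* Put the two halves back together. */
uint64_t jiffies_rebuild(uint32_t wraps, uint32_t low)
{
	uint64_t v = ((uint64_t)wraps << 32) | low;

	assert(jiffies_low32(v) == low);	/* the low half is preserved */
	return v;
}
```

For example, a 64-bit value of 0x100000005 is seen as 5 through the 32-bit view, with one wraparound recorded in the upper half.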
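Now we know a little theory about jiffies and can return to our function. There is no architecture-specific implementation of our function, register_refined_jiffies. It is located in the generic kernel code, in the kernel/time/jiffies.c source code file. The main point of register_refined_jiffies is the registration of the jiffy clocksource. Before we look at the implementation of the register_refined_jiffies function, we must know what a clocksource is. As we can read in the comments:

The `clocksource` is hardware abstraction for a free-running counter.

I'm not sure about you, but that description didn't give me a good understanding of the clocksource concept. Let's try to understand what it is, but we will not go deeper because this topic will be described in a separate part in much more detail. The main point of the clocksource is timekeeping abstraction, or in very simple words, it provides a time value to the kernel. We already know about the jiffies interface that represents the number of ticks that have occurred since the system booted. It is represented by a global variable in the Linux kernel and is incremented on each timer interrupt. The Linux kernel can use jiffies for time measurement. So why do we need a separate concept like the clocksource? Actually, different hardware devices provide different clock sources that vary in their capabilities. The availability of more precise techniques for measuring time intervals is hardware-dependent.

For example, x86 has an on-chip 64-bit counter called the Time Stamp Counter, whose frequency can be equal to the processor frequency. Or, for example, the High Precision Event Timer, which consists of a 64-bit counter with a frequency of at least 10 MHz. These are two different timers, and both are for x86. If we add timers from other architectures, the problem only becomes more complex. The Linux kernel provides the clocksource concept to solve it.

The clocksource concept is represented by the clocksource structure in the Linux kernel. This structure is defined in the include/linux/clocksource.h header file and contains a couple of fields that describe a time counter. For example, it contains the name field, which is the name of a counter, the flags field that describes different properties of a counter, pointers to the suspend and resume functions, and many more.

Let's look at the clocksource structure for jiffies that is defined in the kernel/time/jiffies.c source code file: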

static struct clocksource clocksource_jiffies = {
	.name		= "jiffies",
	.rating		= 1,
	.read		= jiffies_read,
	.mask		= 0xffffffff,
	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT,
	.shift		= JIFFIES_SHIFT,
	.max_cycles	= 10,
};

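We can see the definition of the default name here: jiffies. The next is the rating field, which allows the clock source management code to choose the best registered clock source available for the specified hardware. The rating may have one of the following values:

  • 1-99 - Only available for bootup and testing purposes;

  • 100-199 - Functional for real use, but not desired;

  • 200-299 - A correct and usable clocksource;

  • 300-399 - A reasonably fast and accurate clocksource;

  • 400-499 - The ideal clocksource. A must-use where available.

For example, the rating of the time stamp counter is 300, but the rating of the high precision event timer is 250. The next field is read, a pointer to the function that reads the clocksource's cycle value; in our case it just returns the jiffies variable cast to the cycle_t type: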

static cycle_t jiffies_read(struct clocksource *cs)
{
        return (cycle_t) jiffies;
}

that is just 64-bit unsigned type:

typedef u64 cycle_t;

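The next field is the mask value, which ensures that subtraction between counter values from non-64-bit counters does not need special overflow logic. In our case the mask is 0xffffffff, i.e. 32 bits wide. This means that the masked jiffies counter wraps around to zero after 2^32 ticks, which at a rate of 1000 ticks per second is roughly 49 days:

>>> 0xffffffff
4294967295
# ticks before a 32-bit counter wraps
>>> 0xffffffff + 1
4294967296
# whole days until the wrap at 1000 ticks per second
>>> (0xffffffff + 1) // 1000 // 86400
49

The next two fields, mult and shift, are used to convert the clocksource's cycles to nanoseconds. When the kernel calls the clocksource.read function, this function returns a value in machine time units, represented with the cycle_t data type that we saw just now. To convert this return value to nanoseconds we need these two fields: mult and shift. The clocksource provides the clocksource_cyc2ns function that does it for us with the following expression:

((u64) cycles * mult) >> shift;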
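To get a feeling for this expression, here is a small user-space sketch of the same arithmetic. The function name cyc2ns and its parameter types are our own choice for illustration; only the expression itself is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Scale the cycle count by mult, then drop the lowest `shift` bits. */
uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	return (cycles * mult) >> shift;
}
```

With mult equal to 4000000 << 8 and shift equal to 8 (one tick lasting 4000000 nanoseconds), cyc2ns(1, 4000000u << 8, 8) gives 4000000, and 250 such ticks give exactly one second in nanoseconds. The shift lets mult carry a fractional part of the nanoseconds-per-cycle ratio using integer math only.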

As we can see the mult field is equal:

NSEC_PER_JIFFY << JIFFIES_SHIFT

#define NSEC_PER_JIFFY  ((NSEC_PER_SEC+HZ/2)/HZ)
#define NSEC_PER_SEC    1000000000L

by default, and the shift is

#if HZ < 34
  #define JIFFIES_SHIFT   6
#elif HZ < 67
  #define JIFFIES_SHIFT   7
#else
  #define JIFFIES_SHIFT   8
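#endif

The jiffies clock source uses the NSEC_PER_JIFFY multiplier conversion to specify the nanoseconds-per-cycle ratio. Note that the values of JIFFIES_SHIFT and NSEC_PER_JIFFY depend on the HZ value. HZ represents the frequency of the system timer. This macro is defined in include/asm-generic/param.h and depends on the CONFIG_HZ kernel configuration option. The value of HZ differs for each supported architecture, but for x86 it's defined like:

#define HZ		CONFIG_HZ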

Where CONFIG_HZ can be one of the following values: 100, 250, 300 or 1000.

In our case CONFIG_HZ is 250, which means that the timer interrupt frequency is 250 Hz: the timer interrupt occurs 250 times per second, or once every 4 ms.
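The macros above can be replayed for different HZ values in user space. This is a sketch under our own assumptions: HZ is passed as a function argument instead of being a compile-time constant, and the function names are ours.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Rounded number of nanoseconds in one tick, as in NSEC_PER_JIFFY. */
long nsec_per_jiffy(long hz)
{
	assert(hz > 0);
	return (NSEC_PER_SEC + hz / 2) / hz;
}

/* Mirrors the JIFFIES_SHIFT selection shown above. */
unsigned int jiffies_shift(long hz)
{
	if (hz < 34)
		return 6;
	if (hz < 67)
		return 7;
	return 8;
}

/* The mult value of the jiffies clocksource for a given HZ. */
uint32_t jiffies_mult(long hz)
{
	return (uint32_t)nsec_per_jiffy(hz) << jiffies_shift(hz);
}
```

For HZ equal to 250 this gives 4000000 nanoseconds per tick (the 4 ms mentioned above), a shift of 8 and a mult of 1024000000. The smaller shift for low HZ values keeps the shifted multiplier within 32 bits.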

The last field that we can see in the definition of the clocksource_jiffies structure is max_cycles, which holds the maximum cycle value that can safely be multiplied without potentially causing an overflow.

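OK, we have just seen the definition of the clocksource_jiffies structure, and we know a little about jiffies and clocksource. Now it is time to get back to the implementation of our function. At the beginning of this part we stopped at the call of the:

register_refined_jiffies(CLOCK_TICK_RATE);

function from the arch/x86/kernel/setup.c source code file.

As I already wrote, the main purpose of the register_refined_jiffies function is to register the refined_jiffies clocksource. We already saw that the clocksource_jiffies structure represents the standard jiffies clock source. Now, if you look in the kernel/time/jiffies.c source code file, you will find yet another clock source definition:

struct clocksource refined_jiffies;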

There is one difference between refined_jiffies and clocksource_jiffies. The standard jiffies based clock source is the lowest common denominator clock source, which should function on all systems. As we already know, the jiffies global variable is incremented on each timer interrupt. This means that the standard jiffies based clock source has the same resolution as the timer interrupt frequency. From this we can understand that the standard jiffies based clock source may suffer from inaccuracies. The refined_jiffies uses CLOCK_TICK_RATE as the base of the jiffies shift.

Let's look at the implementation of this function. First of all, we can see that the refined_jiffies clock source is based on the clocksource_jiffies structure:

int register_refined_jiffies(long cycles_per_second)
{
	u64 nsec_per_tick, shift_hz;
	long cycles_per_tick;

	refined_jiffies = clocksource_jiffies;
	refined_jiffies.name = "refined-jiffies";
	refined_jiffies.rating++;
	...
	...
	...

Here we can see that we update the name of refined_jiffies to refined-jiffies and increase the rating of this structure. As you remember, clocksource_jiffies has a rating of 1, so our refined_jiffies clocksource will have a rating of 2. This means that refined_jiffies will be preferred over clocksource_jiffies by the clock source management code.

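In the next step we need to calculate the number of cycles per tick:

cycles_per_tick = (cycles_per_second + HZ/2)/HZ;

Note that we used the NSEC_PER_SEC macro as the base of the standard jiffies multiplier. Here we use cycles_per_second, which is the first parameter of the register_refined_jiffies function. We passed the CLOCK_TICK_RATE macro to register_refined_jiffies. This macro is defined in the arch/x86/include/asm/timex.h header file and expands to:

#define CLOCK_TICK_RATE         PIT_TICK_RATE

where the PIT_TICK_RATE macro expands to the frequency of the Intel 8253 programmable interval timer: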

#define PIT_TICK_RATE 1193182ul

After this we calculate shift_hz, which stores hz << 8, in other words the frequency of the system timer scaled by 256. We shift cycles_per_second (the frequency of the programmable interval timer) left by 8 bits in order to get extra accuracy:

shift_hz = (u64)cycles_per_second << 8;
shift_hz += cycles_per_tick/2;
do_div(shift_hz, cycles_per_tick);

In the next step we calculate the number of nanoseconds per tick, shifting NSEC_PER_SEC left by 8 bits as we did for shift_hz, and doing the same rounding division as before:

nsec_per_tick = (u64)NSEC_PER_SEC << 8;
nsec_per_tick += (u32)shift_hz/2;
do_div(nsec_per_tick, (u32)shift_hz);
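refined_jiffies.mult = ((u32)nsec_per_tick) << JIFFIES_SHIFT;

At the end of the register_refined_jiffies function we register the new clock source with the __clocksource_register function, which is defined in the include/linux/clocksource.h header file, and return:

__clocksource_register(&refined_jiffies);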
return 0;

The clock source management code provides the API for clock source registration and selection. As we can see, clock sources are registered by calling the __clocksource_register function during kernel initialization or from a kernel module. During registration, the clock source management code will choose the best clock source available in the system using the clocksource.rating field which we already saw when we initialized clocksource structure for jiffies.
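The arithmetic of register_refined_jiffies can be replayed in user space to see what numbers come out. This is only a sketch under stated assumptions: HZ and JIFFIES_SHIFT are passed as arguments, the kernel's do_div is replaced by a plain 64-bit division, and struct refined and refined_calc are names we made up to return the intermediate values.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical container for the intermediate values. */
struct refined {
	long cycles_per_tick;
	uint64_t shift_hz;
	uint64_t nsec_per_tick;
	uint32_t mult;
};

struct refined refined_calc(long cycles_per_second, long hz,
			    unsigned int jiffies_shift)
{
	struct refined r;

	assert(cycles_per_second > 0 && hz > 0);

	/* Rounded number of hardware cycles in one tick. */
	r.cycles_per_tick = (cycles_per_second + hz / 2) / hz;

	/* Real tick frequency scaled by 256, with rounding. */
	r.shift_hz = (uint64_t)cycles_per_second << 8;
	r.shift_hz += r.cycles_per_tick / 2;
	r.shift_hz /= r.cycles_per_tick;	/* do_div() in the kernel */

	/* Nanoseconds per tick; the two 256 scale factors cancel out. */
	r.nsec_per_tick = NSEC_PER_SEC << 8;
	r.nsec_per_tick += (uint32_t)r.shift_hz / 2;
	r.nsec_per_tick /= (uint32_t)r.shift_hz;

	r.mult = (uint32_t)r.nsec_per_tick << jiffies_shift;
	return r;
}
```

For the PIT frequency of 1193182 and HZ equal to 250, this gives 4773 cycles per tick, a shift_hz of 63996 (slightly below 250 << 8 = 64000) and 4000250 nanoseconds per tick. So the refined clock source accounts for each tick being about 250 nanoseconds longer than the nominal 4 ms.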

Using the jiffies

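We just saw the initialization of two jiffies based clock sources in the previous paragraph:

  • standard jiffies based clock source;

  • refined jiffies based clock source.

Don't worry if you don't understand the calculations here. They look frightening at first. Soon, step by step, we will learn these things. So, we just saw the initialization of jiffies based clock sources, and we also know that the Linux kernel has the global variable jiffies that holds the number of ticks that have occurred since the kernel started to work. Now, let's look at how to use it. To use jiffies we can just use the jiffies global variable by its name, or call the get_jiffies_64 function. This function is defined in the kernel/time/jiffies.c source code file and just returns the full 64-bit value of jiffies:

u64 get_jiffies_64(void)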
{
	unsigned long seq;
	u64 ret;

	do {
		seq = read_seqbegin(&jiffies_lock);
		ret = jiffies_64;
	} while (read_seqretry(&jiffies_lock, seq));
	return ret;
}
EXPORT_SYMBOL(get_jiffies_64);

Note that the get_jiffies_64 function is not implemented as simply as, for example, jiffies_read:

static cycle_t jiffies_read(struct clocksource *cs)
{
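	return (cycle_t) jiffies;
}

We can see that the implementation of get_jiffies_64 is more complex. The reading of the jiffies_64 variable is implemented using seqlocks. Actually, this is done for machines that cannot atomically read full 64-bit values.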
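The retry loop in get_jiffies_64 can be imitated in user space with C11 atomics. This is a rough sketch of the idea and not the kernel's seqlock implementation: it assumes a single writer, stores the 64-bit value as two 32-bit halves to model a machine without atomic 64-bit loads, and uses sequentially consistent atomics everywhere for simplicity. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint seq;		/* even: stable, odd: write in progress */
static _Atomic uint32_t lo, hi;	/* the 64-bit value as two halves */

/* Single writer: bump the sequence before and after the update. */
void counter_write(uint64_t v)
{
	assert((atomic_load(&seq) & 1) == 0);	/* no write in progress */
	atomic_fetch_add(&seq, 1);		/* becomes odd */
	atomic_store(&lo, (uint32_t)v);
	atomic_store(&hi, (uint32_t)(v >> 32));
	atomic_fetch_add(&seq, 1);		/* becomes even again */
}

/* Reader: retry if a write was running or has happened meanwhile. */
uint64_t counter_read(void)
{
	unsigned int start;
	uint64_t v;

	do {
		start = atomic_load(&seq);
		v = ((uint64_t)atomic_load(&hi) << 32) | atomic_load(&lo);
	} while ((start & 1) || atomic_load(&seq) != start);

	return v;
}
```

The reader never blocks the writer. It only checks that the sequence number was even and unchanged around its two loads, and otherwise tries again, so it cannot return a value stitched together from two different updates.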

If we can access the jiffies or the jiffies_64 variable, we can convert it to human time units. To get the number of seconds that have passed, we can use the following expression:

jiffies / HZ

So, if we know this, we can get any time units. For example:

/* Thirty seconds from now */
jiffies + 30*HZ

/* Two minutes from now */
jiffies + 120*HZ

/* One millisecond from now (integer division: adds 0 ticks when HZ < 1000) */
jiffies + HZ / 1000
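These conversions are easy to get subtly wrong because of integer division and counter wraparound. Here is a user-space sketch with helper names of our own; the kernel has its own helpers for this, such as msecs_to_jiffies and time_after, and the code below only imitates the idea on 32-bit tick counts.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks in `ms` milliseconds, rounded up so a short delay is never 0. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t hz)
{
	assert(hz > 0);
	return (uint32_t)(((uint64_t)ms * hz + 999) / 1000);
}

/* Whole seconds represented by `ticks`. */
uint32_t ticks_to_secs(uint32_t ticks, uint32_t hz)
{
	assert(hz > 0);
	return ticks / hz;
}

/* True if tick count a is later than b, even across a 32-bit wrap. */
int tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

At 250 ticks per second, one millisecond is rounded up to one tick instead of vanishing, and 7500 ticks are 30 seconds. tick_after(5, 0xfffffff0) is true because 5 is reached after the counter wraps, which a plain `a > b` comparison would get wrong.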

That's all.

Conclusion

This concludes the first part covering time and time management related concepts in the Linux kernel. We first met two concepts and their initialization: jiffies and clocksource. In the next part we will continue to dive into this interesting theme, and as I already wrote in this part, we will try to understand the insides of these and other time management concepts in the Linux kernel.

If you have questions or suggestions, feel free to ping me on Twitter (0xAX), drop me an email, or just create an issue.

Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to linux-insides.

Links

  • linux-insides
  • system call
  • TCP
  • lock validator
  • cgroups
  • bss
  • initrd
  • Intel MID
  • TSC
  • void
  • Simple Firmware Interface
  • x86_64
  • real time clock
  • Jiffy
  • high precision event timer
  • nanoseconds
  • Intel 8253
  • seqlocks
  • clocksource documentation
  • Previous chapter
  • HZ