Time-related system calls in the Linux kernel
This is the seventh and last part of the chapter, which describes timers and time management related stuff in the Linux kernel. In the previous part, we discussed specific clock sources. Internal time management is an interesting part of the Linux kernel, but of course the kernel is not the only one that needs the concept of time. Our programs also need to know the time. In this part, we will consider the implementation of some time management related system calls. These system calls are:
clock_gettime;
gettimeofday;
nanosleep.
We will start with a simple userspace program and follow the whole way from the call of the standard library function to the implementation of the corresponding system call. As each architecture provides its own implementation of certain system calls, we will consider only the x86_64 specific implementations, as this book is related to this architecture.
Additionally, we will not consider the concept of system calls in this part, but only the implementations of these three system calls in the Linux kernel. If you are interested in what a system call is, there is a special chapter about this.
So, let's start with the gettimeofday system call.
gettimeofday system call

As we can understand from its name, the gettimeofday function returns the current time. First of all, let's look at the following simple example:
As you can see, here we call the gettimeofday function, which takes two parameters. The first parameter is a pointer to the timeval structure, which represents an elapsed time as seconds (tv_sec) and microseconds (tv_usec).
The second parameter of the gettimeofday function is a pointer to the timezone structure, which represents a timezone. In our example, we pass the address of the timeval time to the gettimeofday function; the Linux kernel fills the given timeval structure and returns it to us. Additionally, we format the time with the strftime function to get something more human-readable than elapsed microseconds. Let's see the result:
The glibc implementation of gettimeofday tries to resolve the given symbol, in our case __vdso_gettimeofday, by calling the _dl_vdso_vsym internal function. If the symbol cannot be resolved, _dl_vdso_vsym returns NULL and we fall back to the usual system call:
The __vdso_gettimeofday function is defined in the same source code file and calls the do_realtime function if the given timeval is not null:
If do_realtime fails, we fall back to the real system call by executing the syscall instruction with the __NR_gettimeofday system call number and the given timeval and timezone:
First of all, we try to access gtod (global time of day), the vsyscall_gtod_data structure, by calling gtod_read_begin, and we keep retrying until the read is successful:
That's all about the gettimeofday system call. The next system call in our list is clock_gettime.
The clock_gettime function gets the time of the clock specified by the first parameter and stores it in the structure pointed to by the second parameter. Generally, the clock_gettime function takes two parameters:
clk_id - clock identifier;
timespec - address of the timespec structure which represents elapsed time.
Let's look at the following simple example:
This program prints uptime information:
The elapsed_from_boot.tv_sec field represents elapsed time in seconds, so:
The clk_id may be one of the following:
CLOCK_REALTIME - system-wide clock which measures real or wall-clock time;
CLOCK_REALTIME_COARSE - faster but less precise version of CLOCK_REALTIME;
CLOCK_MONOTONIC - represents monotonic time since some unspecified starting point;
CLOCK_MONOTONIC_COARSE - faster but less precise version of CLOCK_MONOTONIC;
CLOCK_BOOTTIME - the same as CLOCK_MONOTONIC, but it also includes the time that the system was suspended;
CLOCK_PROCESS_CPUTIME_ID - per-process CPU time consumed by all threads in the process;
CLOCK_THREAD_CPUTIME_ID - thread-specific CPU-time clock.
The implementation of clock_gettime depends on the clock id. If we pass the CLOCK_REALTIME clock id, the do_realtime function will be called:
In other cases, the do_{name_of_clock_id} function is called. The implementations of some of them are similar. For example, if we pass the CLOCK_MONOTONIC clock id, the do_monotonic function will be called, which is very similar to the implementation of do_realtime:
That's all.
nanosleep system call

The last system call in our list is nanosleep. As you can understand from its name, this function provides the ability to sleep. Let's look at the following simple example:
If we compile and run it, we will see the first line, and then the second line after five seconds.
rdi - first parameter;
rsi - second parameter;
rdx - third parameter;
r10 - fourth parameter;
r8 - fifth parameter;
r9 - sixth parameter.
The nanosleep system call has two parameters, both pointers to timespec structures. The system call suspends the calling thread until the given timeout has elapsed or until a signal interrupts its execution. The first parameter is the timespec which represents the timeout for the sleep. The second parameter is a pointer to a timespec structure too, and it receives the remaining time if the call of nanosleep was interrupted.
As nanosleep has two parameters:
The LOAD_ARGS_##nr macro calls the LOAD_ARGS_N macro, where N is the number of arguments of the system call. In our case, it will be the LOAD_ARGS_2 macro. Ultimately, all of these macros will be expanded to the following:
The do_nanosleep loop freezes the current task during the sleep. After we set the TASK_INTERRUPTIBLE flag for the current task, the hrtimer_start_expires function starts the given high-resolution timer on the current processor. When the given high-resolution timer expires, the task becomes runnable again.
That's all.
As you may already know, a userspace application does not invoke a system call in kernel space directly. Before the actual system call entry is reached, we call a function from the standard library. In my case it is glibc, so I will consider this case. The implementation of the gettimeofday function is located in the glibc source code. As you may also already know, gettimeofday is not a usual system call. It is located in the special area which is called vDSO (you can read more about it in the part which describes this concept).
The gettimeofday entry is located in the vDSO source code. As we can see, gettimeofday is a weak alias of __vdso_gettimeofday:
The do_realtime function gets the time data from the vsyscall_gtod_data structure, which is defined in a kernel header file and contains a mapping of the timespec structure and a couple of fields related to the current clock source in the system. This function fills the given timeval structure with values from vsyscall_gtod_data, which contains time related data that is updated via the timer interrupt.
Once we have access to gtod, we fill ts->tv_sec with gtod->wall_time_sec, which stores the current time in seconds obtained during initialization of the timekeeping subsystem in the Linux kernel, and we do the same for the value in nanoseconds. At the end of this code we just fill the given timespec structure with the resulting values.
We can easily check the result with the help of the util:
CLOCK_MONOTONIC_RAW - the same as CLOCK_MONOTONIC, but provides non-adjusted time.
The clock_gettime call is not a usual system call either: like gettimeofday, it is placed in the vDSO area. The entry of this system call is located in the same source code file as for gettimeofday.
We already saw a little of the implementation of this function in the previous paragraph about gettimeofday. There is only one difference here: the sec and nsec of our timespec value will be based on gtod->monotonic_time_sec instead of gtod->wall_time_sec, and the nanoseconds part maps the value of tk->tkr_mono.xtime_nsec, or the number of elapsed nanoseconds.
The nanosleep system call is not located in the vDSO area like the gettimeofday and clock_gettime functions. So, let's look at how a real system call, which is located in kernel space, is called by the standard library. The implementation of the nanosleep system call is invoked with the help of the syscall instruction. Before the syscall instruction is executed, the parameters of the system call must be put into processor registers in the order defined by the x86_64 system call calling convention, or in other words:
To make the system call, we need to put the req parameter into the rdi register and the rem parameter into the rsi register. glibc does this job in the INTERNAL_SYSCALL macro, which is located in a glibc header file.
This macro takes the name of the system call, storage for a possible error during execution of the system call, the number of the system call (all x86_64 system calls can be found in the system call table) and the arguments of the system call. The INTERNAL_SYSCALL macro just expands to a call of the INTERNAL_SYSCALL_NCS macro, which prepares the arguments of the system call (puts them into the processor registers in the correct order), executes the syscall instruction and returns the result:
After the syscall instruction is executed, the processor switches to kernel mode and the kernel transfers execution to the system call handler. The system call handler for the nanosleep system call is defined in the kernel source code with the SYSCALL_DEFINE2 macro helper:
You can read more about the SYSCALL_DEFINE2 macro in the chapter about system calls. If we look at the implementation of the nanosleep system call, we will see that it starts with a call of the copy_from_user function. This function copies the given data from userspace to kernelspace. In our case, we copy the timeout value into a kernelspace timespec structure and check that the given timespec is valid by calling the timespec_valid function:
It just checks that the given timespec does not represent a date before 1970 and that the nanoseconds do not overflow 1 second. The nanosleep function ends with a call of the hrtimer_nanosleep function from the same source code file. The hrtimer_nanosleep function creates a timer and calls the do_nanosleep function. The do_nanosleep function does the main job for us. This function provides the loop:
This is the end of the seventh part of the chapter that describes timers and timer management related stuff in the Linux kernel. In the previous part we saw specific clock sources. As I wrote in the beginning, this part is the last part of this chapter. In this chapter we saw important time management related concepts like the clocksource and clockevents frameworks, the jiffies counter and so on. Of course, this does not cover all of the time management in the Linux kernel. Many parts of it are mostly related to scheduling, which we will see in another chapter.
If you have questions or suggestions, feel free to ping me on Twitter, drop me an email or just create an issue.
Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR.