In Linux, whenever a user application wants to touch any of these:
- hardware or devices
- running processes
- networking
- writing to stdout

it can't do so on its own. It must ask the Linux kernel to do it instead.
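A syscall (system call) is a way to ask the kernel to do something for you. Data is transferred between the kernel and the application through CPU registers, and the kernel is entered via a trap: historically a software interrupt (`int 0x80` on 32-bit x86), and on x86-64 the dedicated `syscall` instruction.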
Because many key operations are impossible without the kernel, many programming language features rely on syscalls under the hood.
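For example, in Go:
- `fmt.Println` calls `fmt.Fprintln` with `os.Stdout` as an argument.
- `os.Stdout` is an `*os.File` under the hood.
- the `File`'s `Write` method finally invokes the `write` syscall (via `syscall.Write`).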
As you can see, syscalls are often deeply abstracted, and the higher-level abstractions should be preferred over calling syscalls directly wherever they exist.
When called, the `syscall` function roughly:
- saves the state of the CPU registers
- puts the data about the syscall being made (the syscall number and its arguments) into registers
- triggers a trap, letting the kernel know that there is something to be done
- the kernel then reads the info from the registers, performs the operation, and writes the result back into a register
- `syscall` then resumes by reading the result from that register
- and finally restores the saved registers, so the calling code continues with its previous state intact
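The steps above can be mimicked with a toy simulation. This is a sketch, not real kernel behavior: the register names follow the x86-64 Linux convention (number in `rax`, arguments in `rdi`, `rsi`, `rdx`, result back in `rax`), while `regs`, `cpu`, `fakeKernel` and `fakeSyscall` are hypothetical names. On real hardware the kernel itself preserves most registers; the `syscall` instruction clobbers `rcx` and `r11`.

```go
package main

import "fmt"

// regs is a toy model of the x86-64 registers a Linux syscall uses.
type regs struct{ rax, rdi, rsi, rdx int64 }

// cpu stands in for the real CPU registers in this simulation.
var cpu regs

// fakeKernel is a hypothetical stand-in for the kernel's trap handler.
func fakeKernel(r *regs) {
	switch r.rax {
	case 1: // "write": pretend all rdx bytes were written
		r.rax = r.rdx
	default:
		r.rax = -38 // -ENOSYS: unknown syscall number
	}
}

// fakeSyscall walks through the steps listed above.
func fakeSyscall(num, a1, a2, a3 int64) int64 {
	saved := cpu                                    // save the caller's registers
	cpu = regs{rax: num, rdi: a1, rsi: a2, rdx: a3} // load the syscall number and args
	fakeKernel(&cpu)                                // "trap": kernel reads regs, works, writes rax
	result := cpu.rax                               // read the result back
	cpu = saved                                     // restore the caller's registers
	return result
}

func main() {
	cpu = regs{rax: 111, rdi: 222}         // pretend the caller had state here
	fmt.Println(fakeSyscall(1, 1, 0, 5))   // 5
	fmt.Println(fakeSyscall(999, 0, 0, 0)) // -38
	fmt.Println(cpu)                       // {111 222 0 0}
}
```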
We can see the syscalls that a process makes by using a tool called `strace`. For example, `strace -c echo "hello world"` will give us a count of all the syscalls made during the execution of `echo "hello world"`.
`strace` itself accomplishes this with a syscall: by relying on the `ptrace` syscall, it can observe the traced process and manipulate it. This is also how breakpoint debuggers such as `gdb` work.
For security reasons, it's possible to limit which syscalls a process can make. For Docker containers, we can do this with