System Calls in Linux
System calls are the interface between user space and the kernel. When a program needs to read a file, create a process, or send a signal, it invokes a syscall; the CPU switches to kernel mode, the kernel runs the corresponding handler, and control returns to user space with a return value. This page covers how they work, how to trace them (e.g. strace), and how that helps when troubleshooting application startup.
What are system calls?
User space cannot access hardware or kernel data structures directly. Instead, the kernel exposes system calls: a fixed set of entry points (e.g. read, write, open, fork, kill). The C library (glibc) wraps them, so when your program calls read(), the libc wrapper invokes the read syscall on its behalf. The wrapper and the syscall do not always share a name: glibc's open() uses the openat syscall, and fork() uses clone. Syscalls are documented in section 2 of the manual (e.g. man 2 read).
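As a minimal sketch of that wrapper relationship, the Python script below uses the `os` module, whose functions are thin layers over the libc wrappers. The variable names are illustrative only, and it assumes a POSIX system.

```python
import os

# os.pipe(), os.write(), os.read() and os.close() are thin wrappers:
# each call ends up in the corresponding kernel syscall.
read_fd, write_fd = os.pipe()

written = os.write(write_fd, b"hello")  # write: returns bytes written
os.close(write_fd)                      # close: the reader will now see EOF

data = os.read(read_fd, 1024)           # read: returns up to 1024 bytes
eof = os.read(read_fd, 1024)            # read again: empty result means EOF
os.close(read_fd)

print(written, data, eof)
```

Running such a script under strace with a filter on read and write should show these calls among the interpreter's own startup activity.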
Categories
| Category | Examples |
|---|---|
| Process control | fork, execve, exit, waitpid, kill |
| File operations | open, close, read, write, stat, getdents |
| Memory | brk, mmap, munmap |
| Signals | kill, sigaction, rt_sigreturn |
| IPC | pipe, socket, sendmsg, recvmsg |
| Networking | socket, bind, connect, accept, send, recv |
Network syscall errors
Networking syscalls return -1 and set errno on failure. Common cases when debugging connectivity:
- connect(): ECONNREFUSED (nothing listening, or a firewall answered with RST), ETIMEDOUT (no reply to SYN), EHOSTUNREACH/ENETUNREACH (routing problem or ICMP unreachable), EINPROGRESS on non-blocking sockets (the connection is still being established; not a failure).
- send()/sendto(): EPIPE (peer closed), ECONNRESET (peer reset the connection); when the send buffer is full, a blocking socket blocks and a non-blocking socket returns EAGAIN/EWOULDBLOCK.
- recv()/recvfrom(): a return value of 0 means orderly shutdown by the peer; errors such as ECONNRESET indicate a broken connection.
strace shows these errors immediately. When the syscalls succeed (or errno gives no useful hint) but traffic still fails, the problem probably lies past the socket (firewall, NAT, path MTU). In that case, use packet capture or VPC flow logs to see what actually traversed the network.
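The two "peer closed" cases can be reproduced without any network access. This is a sketch that assumes Linux and uses a local AF_UNIX socket pair in place of a real TCP connection; the variable names are illustrative.

```python
import errno
import socket

# A connected pair of AF_UNIX stream sockets; no network access needed.
left, right = socket.socketpair()
right.close()  # the peer performs an orderly shutdown

# recv() returning zero bytes is the "peer closed" signal, not an error.
received = left.recv(16)

# send() after the peer has gone fails; Python maps errno to an exception.
try:
    left.send(b"ping")
    send_errno = 0
except OSError as exc:
    send_errno = exc.errno
finally:
    left.close()

print(received, errno.errorcode.get(send_errno))
```

Python ignores SIGPIPE by default, which is why the failed send surfaces as an exception carrying errno rather than killing the process.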
How system calls work
Rough flow when a user program invokes a syscall (e.g. read()):
| Stage | What happens |
|---|---|
| User call | Program calls libc wrapper (e.g. read(fd, buf, count)). |
| Libc wrapper | Puts syscall number and arguments in registers (per architecture ABI), then triggers the switch to kernel (e.g. syscall instruction on x86-64). |
| Mode switch | CPU switches to kernel mode; kernel syscall entry saves user state and dispatches by syscall number. |
| Kernel syscall | Kernel runs the handler (e.g. for read: VFS → filesystem → block layer). |
| Return to user | Kernel puts the return value in a register, restores user state, and returns to user mode; on failure the kernel returns a negative error code, which libc converts to -1 and stores in errno. |
So: user call → libc → mode switch → kernel → return.
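The last stage of the flow (return value or errno) can be observed by calling the libc wrappers directly. This is a sketch using Python's `ctypes`; it assumes a Linux system whose libc exports `read` and `getpid`, and the variable names are illustrative.

```python
import ctypes
import errno
import os

# Load the C library already mapped into this process
# (assumption: Linux with glibc or a compatible libc).
libc = ctypes.CDLL(None, use_errno=True)
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t

# Success path: the wrapper hands back the kernel's return value.
pid_from_libc = libc.getpid()

# Failure path: fd -1 is never valid, so the wrapper returns -1
# and stores the error code in errno.
buf = ctypes.create_string_buffer(16)
ret = libc.read(-1, buf, len(buf))
err = ctypes.get_errno()

print(pid_from_libc == os.getpid(), ret, errno.errorcode[err])
```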
Relevant tools
| Tool | Purpose |
|---|---|
| strace | Trace syscalls and signals of a process. |
| ltrace | Trace library function calls (not raw syscalls). |
| man 2 <name> | Syscall documentation. |
| perf | Sampling and tracepoints (e.g. syscalls). |
| gdb | Step through code; can see syscall entry/return. |
| dmesg | Kernel log. |
| /proc/<pid>/syscall | Current syscall (if any) and arguments (kernel-dependent). |
How strace works
strace uses the ptrace interface. The kernel allows a tracer process (strace) to attach to a tracee; on each syscall entry and exit, the tracee stops and the tracer is notified. On entry, strace reads the syscall number and arguments from registers (and from the tracee's memory for pointer arguments), then resumes the tracee; on exit, it reads the return value. The result is a log of every syscall: name, arguments, return value, and optionally (with -T) the time spent. Example:
`strace -e openat,read,write cat /etc/hostname`
You see openat for the file, read for the content, and write to stdout.
Tracing library and function calls
Syscalls are the boundary between user space and the kernel; library and function calls stay inside user space. Tools:
- ltrace: intercepts library calls (e.g. `malloc`, `printf`) and prints them. Useful to see which libc or other library functions are called.
- gdb: step through code; `bt` for a backtrace. Can break on any function.
- perf: `perf record` / `perf report` for sampling; `perf trace` for a syscall-oriented trace. Can also trace user symbols with the right options.
- ftrace: kernel-side; traces kernel functions and some entry points. Less about user-space function calls, more about kernel internals.
Using the tools
- Trace a command: `strace ./myprog`, or `strace -f ./myprog` to follow child processes created by fork. Filter: `strace -e trace=openat,read,write ./myprog` (with current glibc, open() shows up as openat).
- Attach to a running process: `strace -p <pid>`. Use `-e trace=file` to limit output to file-related syscalls.
- Man pages: `man 2 read`, `man 2 open`.
- Current syscall: `cat /proc/<pid>/syscall` (format is kernel-specific).
- Time per syscall: `strace -T -p <pid>` shows the time spent in each syscall (helps spot I/O-bound behavior).
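For scripted checks, the `/proc/<pid>/syscall` file can be read programmatically. The helper below, `current_syscall`, is a hypothetical name introduced for illustration. The sketch assumes a Linux `/proc` and that the first field is either the word `running` or a syscall number; since the format is kernel-specific, treat the parsing as a best effort.

```python
from pathlib import Path

def current_syscall(pid="self"):
    """Return (syscall_number, remaining_fields) for a process, or None.

    Assumes a Linux /proc; the file may be missing or unreadable
    depending on kernel configuration and permissions.
    """
    try:
        text = Path(f"/proc/{pid}/syscall").read_text().strip()
    except OSError:
        return None
    fields = text.split()
    if not fields or fields[0] == "running":
        return None  # the process is executing in user space
    return int(fields[0]), fields[1:]

result = current_syscall()
print(result)
```

The number is architecture-specific, so it has to be mapped to a name using the syscall table for the machine in question.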
Example: read() flow
In C, read(fd, buf, count) typically results in:
- Libc `read()` is called.
- Libc invokes the `read` syscall with `fd`, `buf`, `count`.
- Kernel resolves `fd` to the open file and checks that it was opened for reading, then uses the filesystem and block layer to bring the data into a kernel buffer (normally the page cache) and copies it to `buf`.
- Kernel returns the number of bytes read, or a negative error code that libc turns into -1 and errno.
- Libc returns that value to the caller.
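The same flow can be exercised from Python, where `os.open`, `os.read` and `os.close` map onto the corresponding syscalls. This is a sketch: `read_all` and the small chunk size are illustrative choices, not part of any API.

```python
import os
import tempfile

def read_all(path, chunk_size=4):
    """Read a file with raw os.read() calls until end of file."""
    fd = os.open(path, os.O_RDONLY)          # open/openat syscall
    chunks = []
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # read syscall
            if not chunk:                    # 0 bytes: end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)                         # close syscall
    return chunks

# Create a 10-byte scratch file, read it back in 4-byte chunks, clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
path = tmp.name
chunks = read_all(path)
os.unlink(path)
print(chunks)
```

The loop ends on an empty result because a return value of 0 from read marks end of file, not an error.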
Example: ls -l from shell to kernel
When you run ls -l in a shell:
- Shell: parses the command, then typically calls fork() and, in the child, execve("/usr/bin/ls", ["ls", "-l"], env).
- execve (syscall): the kernel loads the `ls` executable, sets up its stack with argv and env, and transfers control to the program's entry point, which eventually calls `main`.
- ls: opens the current directory with open(".", O_RDONLY | O_DIRECTORY) (or similar), then calls getdents (or getdents64) to read directory entries.
- For each entry, ls calls stat (or lstat) to get the file metadata (permissions, size, mtime) needed for the long listing.
- ls calls write on stdout (fd 1) to print each line. It inherits stdout from the shell; the kernel handles the actual write to the terminal.
So: shell → fork/execve → ls → open, getdents, stat, write — all implemented via syscalls.
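Both halves of that sequence can be sketched in Python. `run_child` mimics what the shell does (fork, execve, waitpid) and `long_listing` mimics the core of what ls does (read directory entries, lstat each one). Both function names are hypothetical, the child command is an arbitrary stand-in for ls, and the output format only loosely resembles a real long listing.

```python
import os
import stat
import sys
import tempfile

def run_child(argv):
    """fork(), execve() in the child, waitpid() in the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execve(argv[0], argv, os.environ)
        finally:
            os._exit(127)  # only reached if execve failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

def long_listing(directory):
    """Roughly what ls -l does: read entries, then lstat each one."""
    lines = []
    for name in sorted(os.listdir(directory)):          # getdents64
        info = os.lstat(os.path.join(directory, name))  # lstat-style call
        mode = stat.filemode(info.st_mode)
        lines.append(f"{mode} {info.st_size:>8} {name}")
    return lines

# Stand-in child: a Python one-liner that exits with status 7.
exit_code = run_child([sys.executable, "-c", "import sys; sys.exit(7)"])

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"hello")
    listing = long_listing(tmp)

os.write(1, ("\n".join(listing) + "\n").encode())  # write syscall on fd 1
print(exit_code)
```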
Troubleshooting application startup
When an application won't start, work through these checks systematically:
- Logs: check the app's log file, `journalctl -u <unit>` for systemd, or `dmesg` for kernel messages (e.g. OOM, segfault).
- Is it running? `ps aux | grep <name>`; if it exited, check the exit status (`echo $?` right after running it in a shell).
- Ports: if it should listen, use `ss -tlnp` or `netstat` to see whether the port is already in use or the app has bound to it.
- Permissions: wrong user, missing execute bit, or unreadable config/file. Run as the intended user; check `ls -l` and ownership.
- strace: run `strace -f ./myprog 2>&1 | tee trace.log` and look for the last failing syscalls before exit (e.g. a failing `open`, `connect`, or `execve`).
- Missing libraries: `ldd ./myprog` shows dynamic libraries; "not found" means the library is not installed or not on the loader's search path (install the package, or fix `LD_LIBRARY_PATH` or the `ldconfig` cache).
- Environment: wrong `PATH`, `HOME`, or required env vars. Run with `env -i` to strip the environment and test, or compare with a working environment.
- Limits: `ulimit -a`; limits that are too low (e.g. open files, stack) can cause failure. Adjust them, or run under systemd with `LimitNOFILE=...` etc.
- SELinux: if enabled, denials can block execution or file access. Check `getenforce` (Enforcing vs Permissive); inspect `ausearch -m avc` or the audit log for denials. Temporarily set Permissive mode to confirm, then fix labels or policy.
- AppArmor: can similarly block execution. Check `aa-status`; look at the app's profile and audit logs. Easiest test: `aa-complain /path/to/profile`, or disable that profile, to see if startup succeeds.
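A few of these checks (file presence, execute permission, effective user, PATH, open-file limit) can be gathered in one pass. The `preflight` helper below is a hypothetical sketch, not a complete diagnostic; it assumes a Unix system and inspects only the process that runs it, so run it as the same user and in the same environment as the failing app.

```python
import os
import resource
import sys

def preflight(path):
    """Collect basic facts that often explain a failed start."""
    soft_nofile, hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "exists": os.path.exists(path),
        "executable": os.access(path, os.X_OK),
        "uid": os.geteuid(),
        "path_set": bool(os.environ.get("PATH")),
        "nofile_soft": soft_nofile,
        "nofile_hard": hard_nofile,
    }

# Example target: the Python interpreter itself, as a stand-in for ./myprog.
report = preflight(sys.executable)
for key, value in report.items():
    print(f"{key}: {value}")
```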
Summary table
| Topic | Trigger | Return | Trace | Man |
|---|---|---|---|---|
| Syscall | User code → libc → kernel | Value (or errno) | strace, perf | man 2 |
| Process | fork, execve, exit | PID, 0, or status | strace -f | man 2 |
| File | open, read, write, stat | fd, bytes, or 0 | strace -e trace=file | man 2 |
| Traceability | strace (ptrace) | — | strace -p <pid> | — |