
System Calls in Linux

By Atif Alam

System calls are the interface between user space and the kernel. When a program needs to read a file, create a process, or send a signal, it invokes a syscall; the CPU switches to kernel mode, the kernel runs the corresponding handler, and control returns to user space with a return value. This page covers how they work, how to trace them (e.g. strace), and how that helps when troubleshooting application startup.

User space cannot access hardware or kernel data structures directly. The kernel exposes system calls — a fixed set of entry points (e.g. read, write, open, fork, kill). The C library (glibc) wraps them; your program calls read(), which may call the read syscall. Syscalls are documented in man 2 (e.g. man 2 read).

| Category | Examples |
| --- | --- |
| Process control | fork, execve, exit, waitpid, kill |
| File operations | open, close, read, write, stat, getdents |
| Memory | brk, mmap, munmap |
| Signals | kill, sigaction, rt_sigreturn |
| IPC | pipe, socket, sendmsg, recvmsg |
| Networking | socket, bind, connect, accept, send, recv |

Networking syscalls return -1 and set errno on failure. Common cases when debugging connectivity:

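  • connect(): ECONNREFUSED (nothing listening, or a firewall sent a RST), ETIMEDOUT (no reply to the SYN), EHOSTUNREACH / ENETUNREACH (routing or ICMP), EINPROGRESS on a non-blocking socket (the attempt is still in progress, not failed).
  • send() / sendto(): EPIPE (peer closed the connection), ECONNRESET (peer reset it); when the send buffer is full, the call blocks, or returns EAGAIN / EWOULDBLOCK on a non-blocking socket.
  • recv() / recvfrom(): a return value of 0 means an orderly shutdown by the peer; errors mirror those of a broken connection (e.g. ECONNRESET).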

strace shows these immediately. When errno looks right but traffic still fails, the problem may be past the socket (firewall, NAT, path MTU). Then use packet capture or VPC flow logs to see what actually traversed the network.
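
As a minimal sketch of how these errno values surface in a program, the Python snippet below attempts a TCP connect and reports the errno name. The helper name try_connect and the 2-second timeout are illustrative assumptions, not part of any tool discussed here.

```python
import errno
import socket


def try_connect(host, port, timeout=2.0):
    """Attempt a TCP connect; return "OK" or the errno name of the failure.

    try_connect is a hypothetical helper used for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))  # wraps the connect syscall
        except OSError as exc:
            if exc.errno is None:
                # Python's own timeout fired before the kernel reported an errno.
                return "PYTHON_TIMEOUT"
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "OK"
```

Against a local port with no listener, this would be expected to return ECONNREFUSED; a silently dropped SYN would more likely show up as the Python-level timeout, because the chosen timeout is shorter than the kernel's own connect timeout.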

Rough flow when a user program invokes a syscall (e.g. read()):

| Stage | What happens |
| --- | --- |
| User call | Program calls the libc wrapper (e.g. read(fd, buf, count)). |
| Libc wrapper | Puts the syscall number and arguments in registers (per architecture ABI), then triggers the switch to the kernel (e.g. the syscall instruction on x86-64). |
| Mode switch | CPU switches to kernel mode; the kernel syscall entry code saves user state and dispatches by syscall number. |
| Kernel syscall | Kernel runs the handler (e.g. for read: VFS → filesystem → block layer). |
| Return to user | Kernel puts the return value in a register, restores user state, and returns to user mode; libc returns that value, or, if the kernel returned a negative error code, stores it in errno and returns -1. |

So: user call → libc → mode switch → kernel → return.
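
To make the "syscall number in a register" step concrete, here is a hedged sketch that invokes getpid by number through libc's generic syscall() wrapper, using Python's ctypes. The function name raw_getpid is hypothetical, and the table only covers 64-bit x86-64 and aarch64 Linux (getpid is 39 and 172 there); other platforms are deliberately not handled.

```python
import ctypes
import os
import platform
import sys

# getpid syscall numbers for two common 64-bit Linux ABIs. Assumption: any
# other architecture is simply unsupported by this sketch.
GETPID_NR = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Call getpid by number via libc's syscall(); None if unsupported here."""
    nr = GETPID_NR.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if nr is None or not is_64bit or not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # libc loads nr (and any arguments) into registers and executes the
    # architecture's syscall instruction; the kernel dispatches on nr.
    return libc.syscall(ctypes.c_long(nr))


if __name__ == "__main__":
    print(raw_getpid(), os.getpid())
```

On a supported platform the two printed values should match, since os.getpid() reaches the same kernel handler through the ordinary libc wrapper.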

| Tool | Purpose |
| --- | --- |
| strace | Trace syscalls and signals of a process. |
| ltrace | Trace library function calls (not raw syscalls). |
| man 2 <name> | Syscall documentation. |
| perf | Sampling and tracepoints (e.g. syscalls). |
| gdb | Step through code; can see syscall entry/return. |
| dmesg | Kernel log. |
| /proc/<pid>/syscall | Current syscall (if any) and arguments (kernel-dependent). |

strace uses the ptrace interface. The kernel allows a tracer process (strace) to attach to a tracee; on each syscall entry and syscall exit, the tracee stops and the tracer is notified. strace reads the syscall number and arguments from registers (and memory for pointer args), then resumes the tracee. On exit it reads the return value. So you see a log of every syscall: name, arguments, return value, and optionally time spent. Example:

strace -e openat,read,write cat /etc/hostname

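In the output you see openat for the file, read for the content, and write to stdout. The trace usually also shows openat and read calls made by the dynamic loader for shared libraries before cat opens /etc/hostname.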

Syscalls are the boundary between user and kernel; library and function calls are inside user space. Tools:

  • ltrace: Interposes on library calls (e.g. malloc, printf) and prints them. Useful to see which libc or other library functions are called.
  • gdb: Step through code; bt for a backtrace. Can break on any function.
  • perf: perf record / perf report for sampling; perf trace for a syscall-oriented trace. Can also trace user symbols with the right options.
  • ftrace: Kernel-side; traces kernel functions and some entry points. Less about user-space function calls, more about kernel internals.

  • Trace a command: strace ./myprog or strace -f ./myprog (follow forks). Filter: strace -e trace=openat,read,write ./myprog (current glibc implements open() with the openat syscall).
  • Attach to a running process: strace -p <pid>. Use -e trace=file to limit the trace to file-related syscalls.
  • Man pages: man 2 read, man 2 open.
  • Current syscall: cat /proc/<pid>/syscall (format is kernel-specific).
  • Time per syscall: strace -T -p <pid> shows the time spent in each syscall (helps spot I/O-bound behavior).
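
The /proc/<pid>/syscall file mentioned above can also be read programmatically. The sketch below assumes the layout described in proc(5): either the word "running", or -1 followed by the stack pointer and program counter, or a syscall number followed by six argument registers, the stack pointer and the program counter. The function name current_syscall is hypothetical, and the file may be absent or unreadable depending on kernel configuration and permissions.

```python
def current_syscall(pid="self"):
    """Parse /proc/<pid>/syscall; return None if the file cannot be read."""
    try:
        with open(f"/proc/{pid}/syscall") as handle:
            fields = handle.read().split()
    except OSError:
        return None  # not Linux, no permission, or kernel lacks the file
    if not fields or fields[0] == "running":
        return {"state": "running"}
    number = int(fields[0])
    if number == -1:
        return {"state": "blocked outside a syscall"}
    return {
        "state": "in syscall",
        "number": number,  # architecture-specific syscall number
        "args": [int(value, 16) for value in fields[1:7]],
    }


if __name__ == "__main__":
    # Reading our own entry normally reports the read syscall itself.
    print(current_syscall())
```

The syscall number is architecture-specific, so it has to be mapped to a name with the matching syscall table before comparing it with strace output.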

In C, read(fd, buf, count) typically results in:

  1. Libc read() is called.
  2. Libc invokes the read syscall with fd, buf, count.
  3. Kernel resolves fd to the file (and checks permissions), then uses the filesystem and block layer to read data into a kernel buffer and copy it to buf.
  4. Kernel returns the number of bytes read, or a negative error code on failure.
  5. Libc returns the byte count to the caller; on failure it stores the error code in errno and returns -1.
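
The same sequence can be observed from Python, whose os.open, os.read and os.close are thin wrappers over the corresponding libc calls. This is a sketch for illustration: read_all and read_errno are hypothetical helper names, and Python reports a failed call as an OSError carrying errno rather than as a -1 return.

```python
import errno
import os
import tempfile


def read_all(path, chunk=4096):
    """Read a whole file using the low-level open/read/close wrappers."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while True:
            data = os.read(fd, chunk)  # normally one read syscall per call
            if not data:               # an empty result means end of file
                break
            parts.append(data)
        return b"".join(parts)
    finally:
        os.close(fd)


def read_errno(fd):
    """Return the errno name raised by reading from fd, or None on success."""
    try:
        os.read(fd, 1)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello\n")
    print(read_all(tmp.name))
    os.unlink(tmp.name)
```

Running a script like this under strace -e trace=openat,read,close should show one read line per loop iteration, including the final read that returns 0.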

When you run ls -l in a shell:

  1. Shell: Parses the command, then typically calls fork() and, in the child, execve("/usr/bin/ls", ["ls", "-l"], env).
  2. execve (syscall): Kernel loads the ls executable, sets up its stack and argv/env, and starts it; for a dynamically linked binary the dynamic loader runs first, then control reaches main.
  3. ls: Opens the current directory with open(".", O_RDONLY | O_DIRECTORY) (or similar; strace on current systems shows openat), then calls getdents (or getdents64) to read directory entries.
  4. For each entry, stat (or lstat; newer versions may show statx or newfstatat) to get file metadata (permissions, size, mtime) for the long listing.
  5. write to stdout (fd 1) to print each line. The shell has set up stdout (and the terminal); the kernel handles the actual write to the terminal.

So: shell → fork/execve → ls → open, getdents, stat, write — all implemented via syscalls.
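
The flow above can be sketched in Python with the os module, which exposes the same calls. The names run and long_listing are hypothetical, the output format is a simplification of ls -l, and exit code 127 for a failed exec follows the shell convention rather than anything the kernel mandates.

```python
import os
import stat
import sys


def run(argv):
    """fork, exec argv in the child, wait in the parent; return exit status."""
    pid = os.fork()
    if pid == 0:                       # child process
        try:
            os.execvp(argv[0], argv)   # replaces the child image on success
        except OSError:
            os._exit(127)              # exec failed; leave without cleanup
    _, status = os.waitpid(pid, 0)     # parent blocks until the child ends
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # negative value: killed by a signal


def long_listing(path="."):
    """Return simplified 'ls -l' lines: mode string, size, name."""
    dir_fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        lines = []
        for name in sorted(os.listdir(dir_fd)):      # reads directory entries
            info = os.lstat(name, dir_fd=dir_fd)      # metadata per entry
            mode = stat.filemode(info.st_mode)
            lines.append(f"{mode} {info.st_size:>8} {name}")
        return lines
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    for line in long_listing("."):
        os.write(1, (line + "\n").encode())           # fd 1 is stdout
    sys.exit(run(["ls", "-l"]))
```

Tracing this script with strace -f lets you compare its directory-reading and stat calls with the ones the real ls makes in the forked child.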

When an application won’t start, work through these systematically:

  1. Logs: Check the app's log file, journalctl -u <unit> for systemd, or dmesg for kernel messages (e.g. OOM, segfault).
  2. Is it running? Use ps aux | grep <name>; if it exited, check the exit status (echo $? after running it in a shell).
  3. Ports: If it should listen, use ss -tlnp or netstat to see whether the port is already in use or whether the app bound to it.
  4. Permissions: Wrong user, missing execute bit, or unreadable config/file. Run as the intended user; check ls -l and ownership.
  5. strace: Run with strace -f ./myprog 2>&1 | tee trace.log and look for the last failing syscalls before exit (e.g. open, connect, or execve returning -1).
  6. Missing libraries: ldd ./myprog lists the shared libraries; "not found" means the library is not installed or not on the dynamic loader's search path, so install the package or fix LD_LIBRARY_PATH or the ld.so configuration.
  7. Environment: Wrong PATH, HOME, or required env vars. Run with env -i to strip the environment and test, or compare with a working environment.
  8. Limits: ulimit -a; too-low limits (e.g. open files, stack) can cause failure. Adjust them, or run under systemd with LimitNOFILE=... etc.
  9. SELinux: If enabled, denials can block execution or file access. Check getenforce (Enforcing vs Permissive); inspect ausearch -m avc or the audit log for denials. Temporarily set it to Permissive to confirm, then fix labels or policy.
  10. AppArmor: Can similarly block execution. Check aa-status; look at the app's profile and the audit logs. Easiest test: aa-complain /path/to/profile, or disable that profile, to see if startup succeeds.
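
A few of these checks (execute permission, the open-files limit, and whether a port is free) can be scripted. The following is a rough sketch under stated assumptions: preflight is a hypothetical helper, the 1024 threshold is arbitrary, and the port test only tries to bind on 127.0.0.1, which may differ from the address the real application uses.

```python
import errno
import os
import resource
import socket


def preflight(binary, port=None, min_nofile=1024):
    """Return a list of likely startup problems (empty if none were found)."""
    problems = []
    if not os.path.exists(binary):
        problems.append(f"{binary}: not found")
    elif not os.access(binary, os.X_OK):
        problems.append(f"{binary}: not executable by this user")

    # Soft limit on open file descriptors (what 'ulimit -n' reports).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < min_nofile:
        problems.append(f"open-files limit {soft} is below {min_nofile}")

    if port is not None:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind(("127.0.0.1", port))  # fails if a listener holds it
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                problems.append(f"cannot bind port {port}: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight("./myprog", port=8080):
        print(problem)
```

An empty result does not prove the application will start; it only rules out the specific causes checked here, so the strace and log steps above still apply.
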

| Purpose | Trigger | Return | Trace | Man |
| --- | --- | --- | --- | --- |
| Syscall | User code → libc → kernel | Value (or errno) | strace, perf | man 2 |
| Process | fork, execve, exit | PID, 0, or status | strace -f | man 2 |
| File | open, read, write, stat | fd, bytes, or 0 | strace -e trace=file | man 2 |
| Traceability | strace (ptrace) | | strace -p <pid> | |