systemd Service Hardening

This document outlines systemd service configurations that significantly impact a service's exposure. The following configurations can be utilized to enhance the security of a systemd service:

  1. Networking
  1. File system
  1. User separation
  1. Devices
  1. Kernel
  1. Misc
  1. Capabilities
  1. System calls

1. Networking

1.1. PrivateNetwork

PrivateNetwork is useful for preventing the service from accessing the network.

Type: Boolean.
Default: false
Options:

  • true : Creates a new network namespace for the service. Only the loopback device "lo" is available in this namespace, other network devices are not accessible.
  • false : The service will use the host's network namespace, it can access all the network devices available on the host. It can communicate over the network like any other process running on a host.

1.2. IPAccounting

IPAccounting helps in detecting unusual or unexpected network activity by a service.

Type: Boolean.
Default: false
Options:

  • true: Enables accounting for all IPv4 and IPv6 sockets created by the service: keeps track of the data sent and received by each socket in the service.
  • false: Disables tracking of the sockets created by the service.

1.3. IPAddressAllow, IPAddressDeny

IPAddressAllow=ADDRESS[/PREFIXLENGTH]…, IPAddressDeny=ADDRESS[/PREFIXLENGTH]…

Enables packet filtering on all IPv4 and IPv6 sockets created by the service. Useful for restricting/preventing a service from communicating only with certain IP addresses or networks.

Type: Space separated list of ip addresses and/or a symbolic name.
Default: All IP addresses are allowed and no IP addresses are explicitly denied.
Options:

  • List of addresses: Specify list of addresses allowed/denied. For example, ['192.168.1.8' '192.168.1.0/24']. Any IP not explicitly allowed will be denied.
  • Symbolic Names: Following symbolic names can also be used.
    any : Any host (i.e., '0.0.0.0/0 ::/0').
    localhost: All addresses on the local loopback (i.e., '127.0.0.0/8 ::1/128').
    link-local: All link-local IP addresses(i.e., '169.254.0.0/16 fe80::/64').
    multicast: All IP multicasting addresses (i.e., 224.0.0.0/4 ff00::/8).

1.4. RestrictNetworkInterfaces

RestrictNetworkInterfaces is used to control which network interfaces a service has access to. This helps isolate services from the network or restrict them to specific network interfaces, enhancing security and reducing potential risk.

Type: Space-separated list of network interface names.
Default: The service can access to all available network interfaces unless other network restrictions are in place.
Options:

  • Specify individual network interface names to restrict the service to using only those interfaces.
  • Prefix an interface name with '~' to invert the restriction, i.e. denying access to that specific interface while allowing all others.

1.5. RestrictAddressFamilies

RestrictAddressFamilies is used to control which address families a service can use. This setting restricts the service's ability to open sockets using specific address families, such as 'AF_INET' for IPv4, 'AF_INET6' for IPv6, or others. It is a security feature that helps limit the service's network capabilities and reduces its exposure to network-related vulnerabilities.

Type: List of address family names.
Default: If not configured, the service is allowed to use all available address families.
Options:

  • none: Apply no restriction.
  • Specific Address Families: Specify one or more address families that the service is allowed to use, for example, 'AF_INET', 'AF_INET6', 'AF_UNIX'.
  • Inverted Restriction: Prepend character '~' to an address family name to deny access to it while allowing all others, for example, '~AF_INET' would block IPv4 access.

Back to Top ⏫

2. File System

2.1 ProtectHome

ProtectHome is used to restrict a service's access to home directories. This security feature can be used either completely to block access to /home, /root, and /run/user or make them appear empty to the service, thereby protecting user data from unauthorized access by system services.

Type: Boolean or String.
Default: false i.e. the service has full access to home directories unless restricted by some other mean.
Options:

  • true: The service is completely denied access to home directories.
  • false: The service has unrestricted access to home directories.
  • read-only: The service can view the contents of home directories but cannot modify them.
  • tmpfs: Mounts a temporary filesystem in place of home directories, ensuring the service cannot access or modify the actual user data. Adding the tmpfs option provides a flexible approach by creating a volatile in-memory filesystem where the service believes it has access to home but any changes it makes do not affect the actual data and are lost when the service stops. This is particularly useful for services that require a temporary space in a home.

2.2. ProtectSystem

ProtectSystem controls access to the system's root directory (/) and other essential system directories. This setting enhances security by restricting a service's ability to modify or access critical system files and directories.

Type: Boolean or String.
Default: full (Equivalent to true). The service is restricted from modifying or accessing critical system directories.
Options:

  • true: Mounts the directories /usr/, /boot, and /efi read-only for processes.
  • full: Additionally mounts the /etc/ directory read-only.
  • strict: Mounts the entire file system hierarchy read-only, except for essential API file system subtrees like /dev/, /proc/, and /sys/.
  • false: Allows the service unrestricted access to system directories.

Using true or full is recommended for services that do not require access to system directories to enhance security and stability.

2.3. ProtectProc

ProtectProc controls access to the /proc filesystem for a service. This setting enhances security by restricting a service's ability to view or manipulate processes and kernel information in the /proc directory.

Type: Boolean or String.
Default: default. No restriction is imposed from viewing or manipulating processes and kernel information in /proc.
Options:

  • noaccess: Restricts access to most process metadata of other users in /proc.
  • invisible: Hides processes owned by other users from view in /proc.
  • ptraceable: Hides processes that cannot be traced (ptrace()) by other processes.
  • default: Imposes no restrictions on access or visibility to /proc.

2.4. ReadWritePaths, ReadOnlyPaths, InaccessiblePaths, ExecPaths, NoExecPaths

ReadWritePaths creates a new file system namespace for executed processes, enabling fine-grained control over file system access.

  • ReadWritePaths=: Paths listed here are accessible with the same access modes from within the namespace as from outside it.
  • ReadOnlyPaths=: Allows reading from listed paths only; write attempts are refused even if file access controls would otherwise permit it.
  • InaccessiblePaths=: Makes listed paths and everything below them in the file system hierarchy inaccessible to processes within the namespace.
  • NoExecPaths=: Prevents execution of files from listed paths, overriding usual file access controls. Nest ExecPaths= within NoExecPaths= to selectively allow execution within directories otherwise marked non-executable.

Type: Space-separated list of paths.
Default: No restriction to file system access until unless restricted by some other mechanism.
Options:
Space separated list of paths : Space-separated list of paths relative to the host's root directory. Symlinks are resolved relative to the root directory specified by RootDirectory= or RootImage=.

2.5. PrivateTmp

PrivateTmp uses a private, isolated /tmp directory for the service, enhancing security by preventing access to other processes' temporary files and ensuring data isolation.

Type: Boolean.
Default: false. If not specified, the service shares the system /tmp directory with other processes.
Options:

  • true: Enables private /tmp for the service, isolating its temporary files from other processes.
  • false: The service shares the system /tmp directory with other processes.

Additionally, when enabled, all temporary files created by a service in these directories will be automatically removed after the service is stopped.

2.6. PrivateMounts

PrivateMounts controls whether the service should have its mount namespace, isolating its mounts from the rest of the system. This setup ensures that any file system mount points created or removed by the unit's processes remain private to them and are not visible to the host.

Type: Boolean.
Default: false. If not specified, the service shares the same mount namespace as other processes.
Options:

  • true: Enables private mount namespace for the service, isolating its mounts from the rest of the system.
  • false: The service shares the same mount namespace as other processes.

2.7. ProcSubset

ProcSubset restricts the set of /proc entries visible to the service, enhancing security by limiting access to specific process information in the /proc filesystem.

Type: String.
Default: all. If not specified, the service has access to all /proc entries.
Options:

  • all: Allows the service access to all /proc entries.
  • pid: Restricts the service to only its own process information (/proc/self, /proc/thread-self/).

Back to Top ⏫

3. User Separation

IMPORTANT: Not applicable for the service runs as root.

3.1. PrivateUsers

PrivateUsers= controls whether the service should run with a private set of UIDs and GIDs, isolating the user and group databases used by the unit from the rest of the system, and creating a secure sandbox environment. The isolation reduces the privilege escalation potential of services.

Type: Boolean.
Default: false. If not specified, the service runs with the same user and group IDs as other processes.
Options:

  • true: Enables private user and group IDs for the service by creating a new user namespace, isolating them from the rest of the system.
  • false: The service runs with the same user and group IDs as other processes.

3.2. DynamicUser

DynamicUser enables systemd to dynamically allocate a unique user and group ID (UID/GID) for the service at runtime, enhancing security and resource isolation. These user and group entries are managed transiently during runtime and are not added to /etc/passwd or /etc/group.

Type: Boolean.
Default: false. If not specified, the service uses a static user and group ID defined in the service unit file or defaults to root.
Options:

  • true: A UNIX user and group pair are dynamically allocated when the unit is started and released as soon as it is stopped.
  • false: The service uses a static UID/GID defined in the service unit file or defaults to root.

Back to Top ⏫

4. Devices

4.1. PrivateDevices

PrivateDevices controls whether the service should have access to device nodes in /dev.

Type: Boolean.
Default: false. If not specified, the service has access to device nodes in /dev.
Options:

  • true: Restricts the service's access to device nodes in /dev by creating a new /dev/ mount for the executed processes and includes only pseudo devices such as /dev/null, /dev/zero, or /dev/random. Physical devices are not added to this mount. This setup is useful for disabling physical device access by the service.
  • false: The service has access to device nodes in /dev.

4.2. DeviceAllow

DeviceAllow specifies individual device access rules for the service, allowing fine-grained control over device permissions.

Type: Space-separated list of device access rules.
Default: None. If not specified, the service does not have specific device access rules defined.
Options:

  • Specify device access rules in the format: <device path> <permission> where <permission> can be r (read), w (write), or m (mknod, allowing creation of devices).

Back to Top ⏫

5. Kernel

5.1. ProtectKernelTunables

ProtectKernelTunables controls whether the service is allowed to modify tunable kernel variables in /proc/sys, enhancing security by restricting access to critical kernel parameters.

Type: Boolean.
Default: true. If not specified, the service is restricted from modifying kernel variables.
Options:

  • true: Restricts the service from modifying the kernel variables accessible through paths like /proc/sys/, /sys/, /proc/sysrq-trigger, /proc/latency_stats, /proc/acpi, /proc/timer_stats, /proc/fs, and /proc/irq. These paths are made read-only to all processes of the unit.
  • false: Allows the service to modify tunable kernel variables.

5.2. ProtectKernelModules

ProtectKernelModules controls whether the service is allowed to load or unload kernel modules, enhancing security by restricting module management capabilities.

Type: Boolean.
Default: true. If not specified, the service is restricted from loading or unloading kernel modules.
Options:

  • true: Restricts the service from loading or unloading kernel modules. It removes CAP_SYS_MODULE from the capability bounding set for the unit and installs a system call filter to block module system calls. /usr/lib/modules is also made inaccessible.
  • false: Allows the service to load or unload kernel modules in a modular kernel.

5.3. ProtectKernelLogs

ProtectKernelLogs controls whether the service is allowed to access kernel log messages, enhancing security by restricting access to kernel logs.

Type: Boolean.
Default: false. If not specified, the service is allowed to access kernel logs.
Options:

  • trues: Restricts the service from accessing kernel logs from /proc/kmsg and /dev/kmsg. Enabling this option removes CAP_SYSLOG from the capability bounding set for the unit and installs a system call filter to block the syslog(2) system call.
  • no: Allows the service to access kernel logs.

Back to Top ⏫

6. Misc

6.1. Delegate

Delegate controls whether systemd should delegate further control of resource management to the service's own resource management settings.

Type: Boolean.
Default: true. If not specified, systemd delegates control to the service's resource management settings.
Options:

  • true: Enables delegation and activates all supported controllers for the unit, allowing its processes to manage them.
  • false: Disables delegation entirely. Systemd retains control over resource management, potentially overriding the service's settings.

6.2. KeyringMode

KeyringMode specifies the handling mode for session keyrings by the service, controlling how it manages encryption keys and credentials.

Type: String.
Default: private. If not specified, the service manages its session keyrings privately.
Options:

  • private: The service manages its session keyrings privately.
  • shared: The service shares its session keyrings with other services and processes.
  • inherit: The service inherits session keyrings from its parent process or environment.

6.3. NoNewPrivileges

NoNewPrivileges controls whether the service and its children processes are allowed to gain new privileges (capabilities).

Type: Boolean.
Default: false. If not specified, the service and its children's processes can gain new privileges.
Options:

  • true: Prevents the service and its children processes from gaining new privileges.
  • false: Allows the service and its children processes to gain new privileges.

important

Some configurations may override this setting and ignore its value.

6.4. UMask

UMask sets the file mode creation mask (umask) for the service, controlling the default permissions applied to newly created files and directories.

Type: Octal numeric value.
Default: If not specified, inherits the default umask of the systemd service manager(0022).
Example: UMask=027.

6.5. ProtectHostname

ProtectHostname controls whether the service can modify its own hostname.

Type: Boolean.
Default: false.
Options:

  • true: Sets up a new UTS namespace for the executed processes. It prevents changes to the hostname or domainname.
  • false: Allows the service to modify its own hostname.

6.6. ProtectClock

ProtectClock controls whether the service is allowed to manipulate the system clock.

Type: Boolean.
Default: false.
Options:

  • true: Prevents the service from manipulating the system clock. It removes CAP_SYS_TIME and CAP_WAKE_ALARM from the capability bounding set for this unit. Also creates a system call filter to block calls that can manipulate the system clock.
  • false: Allows the service to manipulate the system clock.

6.7. ProtectControlGroups

ProtectControlGroups controls whether the service is allowed to modify control groups (cgroups) settings.

Type: Boolean.
Default: false.
Options:

  • true: Prevents the service from modifying cgroups settings. Makes the Linux Control Groups (cgroups(7)) hierarchies accessible through /sys/fs/cgroup/ read-only to all processes of the unit.
  • false: Allows the service to modify cgroups settings.

6.8. RestrictNamespaces

RestrictNamespaces controls the namespace isolation settings for the service, restricting or allowing namespace access.

Type: Boolean or space-separated list of namespace type identifiers. Default: false.
Options:

  • false: No restrictions on namespace creation and switching are imposed.
  • true: Prohibits access to any kind of namespacing.
  • Otherwise: Specifies a space-separated list of namespace type identifiers, which can include cgroup, ipc, net, mnt, pid, user, and uts. When the namespace identifier is prefixed with '~', it inverts the action.

6.9. LockPersonality

LockPersonality applies restriction on the service's ability to change its execution personality.

Type: Boolean.
Default: false.
Options:

  • true: Prevents the service from changing its execution personality. If the service runs in user mode or in system mode without the CAP_SYS_ADMIN capability (e.g., setting User=), enabling this option implies NoNewPrivileges=yes.
  • false: Allows the service to change its execution personality.

6.10. MemoryDenyWriteExecute

MemoryDenyWriteExecute controls whether the service is allowed to execute code from writable memory pages.

Type: Boolean.
Default: false.
Options:

  • true: Prohibits attempts to create memory mappings that are writable and executable simultaneously, change existing memory mappings to become executable, or map shared memory segments as executable. This restriction is implemented by adding an appropriate system call filter.
  • false: Allows the service to execute code from writable memory pages.

6.11. RestrictRealtime

RestrictRealtime controls whether the service is allowed to utilize real-time scheduling policies.

Type: Boolean.
Default: false.
Options:

  • true: Prevents the service from utilizing real-time scheduling policies. Refuses any attempts to enable realtime scheduling in processes of the unit. This restriction prevents access to realtime task scheduling policies such as SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE.
  • false: Allows the service to utilize real-time scheduling policies.

6.12. RestrictSUIDSGID

RestrictSUIDSGID controls whether the service is allowed to execute processes with SUID and SGID privileges.

Type: Boolean.
Default: false.
Options:

  • true: Prevents the service from executing processes with SUID and SGID privileges. Denies any attempts to set the set-user-ID (SUID) or set-group-ID (SGID) bits on files or directories. These bits are used to elevate privileges and allow users to acquire the identity of other users.
  • false: Allows the service to execute processes with SUID and SGID privileges.

6.13. RemoveIPC

RemoveIPC controls whether to remove inter-process communication (IPC) resources associated with the service upon its termination.

Type: Boolean.
Default: false.
Options:

  • true: Removes IPC resources (System V and POSIX IPC objects) associated with the service upon its termination. This includes IPC objects such as message queues, semaphore sets, and shared memory segments.
  • false: Retains IPC resources associated with the service after its termination.

6.14. SystemCallArchitectures

SystemCallArchitectures specifies the allowed system call architectures for the service to include in system call filter.

Type: Space-separated list of architecture identifiers.
Default: Empty list. No filtering is applied.
Options:

  • List of architectures: Processes of this unit will only be allowed to call native system calls and system calls specific to the architectures specified in the list. e.g. native, x86, x86-64 or arm64 etc.

6.15. NotifyAccess

NotifyAccess specifies how the service can send service readiness notification signals.

Type: Access specifier string.
Default: none.
Options:

  • none (default): No daemon status updates are accepted from the service processes; all status update messages are ignored.
  • main: Allows sending signals using the main process identifier (PID).
  • exec: Only service updates sent from any main or control processes originating from one of the Exec*= commands are accepted.
  • all: Allows sending signals using any process identifier (PID).

Back to Top ⏫

7. Capabilities

7.1. AmbientCapabilities

AmbientCapabilities specifies which capabilities to include in the ambient capability set for the service, which are inherited by all processes within the service.

Type: Space-separated list of capabilities.
Default: Processes inherit ambient capabilities from their parent process or the systemd service manager unless explicitly set.
Options:

  • List of capabilities: Specifies the capabilities that are set as ambient for all processes within the service.

This option can be specified multiple times to merge capability sets:

  • If capabilities are listed without a prefix, those capabilities are included in the ambient capability set.
  • If capabilities are prefixed with "~", all capabilities except those listed are included (inverted effect).
  • Assigning the empty string ("") resets the ambient capability set to empty, overriding all prior settings.

7.2. CapabilityBoundingSet

CapabilityBoundingSet specifies the bounding set of capabilities for the service, limiting the capabilities available to processes within the service.

Type: Space-separated list of capabilities.
Default: If not explicitly specified, the bounding set of capabilities is determined by systemd defaults or the system configuration.
Options:

  • List of capabilities: Specifies the capabilities that are allowed for processes within the service. If capabilities are prefixed with "~", all capabilities except those listed are included (inverted effect).
CapabilityDescription
CAP_AUDIT_CONTROLAllows processes to control kernel auditing behavior, including enabling and disabling auditing, and changing audit rules.
CAP_AUDIT_READAllows processes to read audit log via unicast netlink socket.
CAP_AUDIT_WRITEAllows processes to write records to kernel auditing log.
CAP_BLOCK_SUSPENDAllows processes to prevent the system from entering suspend mode.
CAP_CHOWNAllows processes to change the ownership of files.
CAP_DAC_OVERRIDEAllows processes to bypass file read, write, and execute permission checks.
CAP_DAC_READ_SEARCHAllows processes to bypass file read permission checks and directory read and execute permission checks.
CAP_FOWNERAllows processes to bypass permission checks on operations that normally require the filesystem UID of the file to match the calling process's UID.
CAP_FSETIDAllows processes to set arbitrary process and file capabilities.
CAP_IPC_LOCKAllows processes to lock memory segments into RAM.
CAP_IPC_OWNERAllows processes to perform various System V IPC operations, such as message queue management and shared memory management.
CAP_KILLAllows processes to send signals to arbitrary processes.
CAP_LEASEAllows processes to establish leases on open files.
CAP_LINUX_IMMUTABLEAllows processes to modify the immutable and append-only flags of files.
CAP_MAC_ADMINAllows processes to perform MAC configuration changes.
CAP_MAC_OVERRIDEBypasses Mandatory Access Control (MAC) policies.
CAP_MKNODAllows processes to create special files using mknod().
CAP_NET_ADMINAllows processes to perform network administration tasks, such as configuring network interfaces, setting routing tables, etc.
CAP_NET_BIND_SERVICEAllows processes to bind to privileged ports (ports below 1024).
CAP_NET_BROADCASTAllows processes to transmit packets to broadcast addresses.
CAP_NET_RAWAllows processes to use raw and packet sockets.
CAP_SETGIDAllows processes to change their GID to any value.
CAP_SETFCAPAllows processes to set any file capabilities.
CAP_SETPCAPAllows processes to set the capabilities of other processes.
CAP_SETUIDAllows processes to change their UID to any value.
CAP_SYS_ADMINAllows processes to perform a range of system administration tasks, such as mounting filesystems, configuring network interfaces, loading kernel modules, etc.
CAP_SYS_BOOTAllows processes to reboot or shut down the system.
CAP_SYS_CHROOTAllows processes to use chroot().
CAP_SYS_MODULEAllows processes to load and unload kernel modules.
CAP_SYS_NICEAllows processes to increase their scheduling priority.
CAP_SYS_PACCTAllows processes to configure process accounting.
CAP_SYS_PTRACEAllows processes to trace arbitrary processes using ptrace().
CAP_SYS_RAWIOAllows processes to perform I/O operations directly to hardware devices.
CAP_SYS_RESOURCEAllows processes to override resource limits.
CAP_SYS_TIMEAllows processes to set system time and timers.
CAP_SYS_TTY_CONFIGAllows processes to configure tty devices.
CAP_WAKE_ALARMAllows processes to use the RTC wakeup alarm.

Back to Top ⏫

8. System Calls

8.1. SystemCallFilter

SystemCallFilter specifies a system call filter for the service, restricting the types of system calls that processes within the service can make.

Type: Space-separated list of system calls.
Default: If not explicitly specified, there are no restrictions imposed by systemd on system calls.
Options:

  • List of system calls: Specifies the allowed system calls for processes within the service. If the list begins with "~", the effect is inverted, meaning only the listed system calls will result in termination.

tip

Predefined sets of system calls are available, starting with "@" followed by the name of the set.

Filter SetDescription
@clockAllows clock and timer-related system calls, such as clock_gettime, nanosleep, etc. This is essential for time-related operations.
@cpu-emulationAllows CPU emulation-related system calls, typically used by virtualization software.
@debugAllows debug-related system calls, which are often used for debugging purposes and may not be necessary for regular operations.
@keyringAllows keyring-related system calls, which are used for managing security-related keys and keyrings.
@moduleAllows module-related system calls, which are used for loading and unloading kernel modules. This can be restricted to prevent module loading for security purposes.
@mountAllows mount-related system calls, which are essential for mounting and unmounting filesystems.
@networkAllows network-related system calls, which are crucial for networking operations such as socket creation, packet transmission, etc.
@obsoleteAllows obsolete system calls, which are no longer in common use and are often deprecated.
@privilegedAllows privileged system calls, which typically require elevated privileges or are potentially risky if misused.
@raw-ioAllows raw I/O-related system calls, which provide direct access to hardware devices. This can be restricted to prevent unauthorized access to hardware.
@rebootAllows reboot-related system calls, which are necessary for initiating system reboots or shutdowns.
@swapAllows swap-related system calls, which are used for managing swap space.
@syslogAllows syslog-related system calls, which are used for system logging.
@system-serviceAllows system service-related system calls, which are used for managing system services.
@timerAllows timer-related system calls, which are essential for setting and managing timers.

Back to Top ⏫