systemd troubleshooting
- Added option to set systemd log level
- Added systemd troubleshooting guide for Ghaf

Signed-off-by: Ganga Ram <[email protected]>
gngram authored and brianmcgillion committed Aug 14, 2024
1 parent 149d307 commit 5a0fc2d
Showing 8 changed files with 324 additions and 3 deletions.
8 changes: 8 additions & 0 deletions docs/src/troubleshooting/README.md
@@ -0,0 +1,8 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# GhafOS Troubleshooting Guide

### 1. [systemd troubleshooting](systemd/index.md)
20 changes: 20 additions & 0 deletions docs/src/troubleshooting/systemd/early-shell.md
@@ -0,0 +1,20 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# Early shell access

In some cases, the system may fail to boot due to the failure of a critical service. If this happens, you can follow these steps to diagnose the issue with systemd services:

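1. Increase the systemd log level using the `ghaf.systemd.logLevel` option (see [Analyze system log](system-log.md)), then rebuild and load the image.
2. Reboot the system. The boot failure is expected to occur again, now with more detailed logs.
3. Force a reboot of the machine. When the machine starts again, interrupt the bootloader and append the following parameters to the kernel command line:

```text
rescue systemd.setenv=SYSTEMD_SULOGIN_FORCE=1
```

`rescue` boots into the rescue target, and `SYSTEMD_SULOGIN_FORCE=1` lets `sulogin` open a root shell even when the root account is locked. To edit the kernel command line, select the boot entry and press the `e` key.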

4. You will now be dropped into an early (rescue) shell. From there, use `journalctl` to read the logs from the previous boot and identify the failed services.
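
As an illustration, the commands below list the recorded boots and then narrow the previous boot's journal to error-level messages and to a single unit. This is a sketch that assumes the journal is persistent (stored under `/var/log/journal`); if it is volatile, logs from the failed boot do not survive the reboot. `<service_name>` is a placeholder.

```bash
$> journalctl --list-boots                       # the failed boot is usually index -1
$> journalctl -b -1 -p err                       # errors (and worse) from the previous boot
$> journalctl -b -1 -u <service_name>.service    # full log of one suspect unit
```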
14 changes: 14 additions & 0 deletions docs/src/troubleshooting/systemd/index.md
@@ -0,0 +1,14 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# GhafOS: systemd troubleshooting guide

Ghaf OS uses systemd and systemctl to manage services. Since security is the utmost priority, every service has restricted access to resources, which is achieved through hardened service configurations. While these restrictions enhance security, they may also limit the functionality of certain services. If a service fails, it may be necessary to adjust its configuration to restore functionality. This document focuses on troubleshooting common issues with systemd services on Ghaf OS.

1. [Analyze system log](system-log.md)
2. [Use 'systemctl'](systemctl.md)
3. [Use `systemd-analyze`](systemd-analyzer.md)
4. [Use `strace` to debug system call and capability restrictions](strace.md)
5. [Early shell access](early-shell.md)
26 changes: 26 additions & 0 deletions docs/src/troubleshooting/systemd/strace.md
@@ -0,0 +1,26 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# Use `strace` to debug the initialization sequence

`strace` gives detailed insight into the system calls made by a service. This is very helpful when debugging the system call and capability restrictions applied to a service. Although `strace` can be attached to the PID of a running process, sometimes you also need to debug the initialization sequence of the service.

To debug the initialization sequence, run the service binary from `ExecStart` under `strace`. First, find the existing `ExecStart` of the service:

```bash
$> systemctl cat <service-name>.service | grep ExecStart
```

This shows the command-line options used with the service binary. Next, override the `ExecStart` of the service so that it runs under `strace`, passing the same options to reproduce the same scenario. For example, to attach `strace` to the `auditd` service, add the following configuration in a suitable location:

```Nix
systemd.services."auditd".serviceConfig.ExecStart = lib.mkForce "${pkgs.strace}/bin/strace -o /etc/auditd_trace.log ${pkgs.audit}/bin/auditd -l -n -s nochange";
```

The command `${pkgs.audit}/bin/auditd -l -n -s nochange` is the regular `ExecStart` of the `auditd` service. The override above runs it under `strace`, which writes the system call trace to `/etc/auditd_trace.log`.

After modifying the configuration, rebuild and load the Ghaf image.

The trace log may reveal which system call restriction caused the service to fail. You can then tune the service configuration accordingly.
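
To spot restriction-related failures in a large trace quickly, the log can be filtered for system calls that failed with `EPERM` or `EACCES`, which is how missing capabilities and seccomp filters configured with `SystemCallErrorNumber=` typically surface. The helper below is a hypothetical sketch, not part of Ghaf; it assumes the standard `strace` line format, optionally prefixed with a PID when `-f` is used. A `SystemCallFilter=` without `SystemCallErrorNumber=` kills the process with `SIGSYS` instead, so the helper also prints those lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Ghaf): summarize denied system calls in an strace log.
# Usage: summarize_denied_syscalls /etc/auditd_trace.log
summarize_denied_syscalls() {
  local log="$1"

  # Count failing calls per "syscall ERRNO" pair, e.g. from a line like
  #   openat(AT_FDCWD, "/etc/x", O_RDONLY) = -1 EACCES (Permission denied)
  # An optional PID prefix ("1234  " or "[pid 1234] ") is stripped first.
  grep -E '= -1 (EPERM|EACCES) ' "$log" \
    | sed -E 's/^(\[pid +[0-9]+\] +|[0-9]+ +)?([a-z0-9_]+)\(.*= -1 ([A-Z]+) .*/\2 \3/' \
    | sort | uniq -c | sort -rn

  # A seccomp filter without SystemCallErrorNumber= kills the process with SIGSYS.
  grep -F 'SIGSYS' "$log" || true
}
```

The output lists the most frequently denied calls first, which usually points at the directive (for example a capability or system call group) that needs adjusting.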
74 changes: 74 additions & 0 deletions docs/src/troubleshooting/systemd/system-log.md
@@ -0,0 +1,74 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# Analyze system log

`systemd` has a centralized logging mechanism, called the journal, which collects logs from all user processes as well as the kernel. The journal daemon, `systemd-journald`, collects messages from the kernel, initrd, services, and so on.

Analyzing logs is the most effective way to diagnose issues with any systemd service. In Ghaf OS, the default systemd log level is set to `info`. To gain deeper insights into the service state, the log level can be elevated to `debug` using the following option:

```nix
ghaf.systemd.logLevel = "debug";
```

While it is possible to elevate the log level on a live system using `systemctl`, this option is particularly useful when you need to inspect the startup sequence of critical services that cannot be restarted in a live environment.

To change the log level to `debug`, you can run the following `systemctl` command:

```bash
$> sudo systemctl log-level debug
```

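This command changes the log level of the systemd service manager itself, so that it logs in more detail about starting, stopping, and failing units. It does not change the verbosity of the services' own output.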

After adjusting the log level, it is recommended to reload the systemd daemon and restart the service you are debugging.

## `journalctl`

When the `journalctl` command is run without any options, it shows all messages, which can be very long.

1. To see the logs of a specific boot, use the `-b` option, for example:

```bash
$> journalctl -b      # Logs from the current boot
$> journalctl -b -1   # Logs from the previous boot
```

2. To list the available boots, use the following command:

```bash
$> journalctl --list-boots
```

3. To view the logs generated by a specific systemd unit, use the `-u` option. For example, the command below displays all logs recorded by the logind service. You can specify multiple units by using the `-u` switch more than once.

```bash
$> journalctl -u systemd-logind.service
```

4. You can see log messages in real time, similar to the `tail -f` command in Linux. To do this, use the `-f` option:

```bash
$> journalctl -f
```

5. Similar to the `tail` command, the `-n` option displays a specific number of the most recent log entries. The following command shows the last 50 messages logged:

```bash
$> journalctl -n 50
```

6. Log messages can be filtered by priority using the `-p` option. For example, the following command shows only messages of priority `err` or higher from the logind service:

```bash
$> journalctl -p err -u systemd-logind.service
```

7. To see kernel messages, use either of the following options:

```bash
$> journalctl -k
$> journalctl -t kernel
```

8. The `-r` option displays log entries in reverse chronological order, with the latest messages shown first.
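
For example, the first command below shows the whole journal newest-first; the second combines `-r` with the options above to show only the current boot's warnings (and more severe messages) from the logind service, newest first:

```bash
$> journalctl -r
$> journalctl -b -r -p warning -u systemd-logind.service
```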
79 changes: 79 additions & 0 deletions docs/src/troubleshooting/systemd/systemctl.md
@@ -0,0 +1,79 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# Debugging systemd using `systemctl`

To debug failed services using `systemctl`, follow these steps:

1. List the failed services in the system:

```bash
$> sudo systemctl --failed
```

The above command lists the failed services. To list all the services in the system, use:

```bash
$> sudo systemctl list-unit-files --type=service
```

2. Check the status of the failed service to get more detailed information:

```bash
$> sudo systemctl status <service_name>.service
```

3. Check the service logs for more insight:

```bash
$> sudo journalctl -b -u <service_name>.service
```

4. You can further increase the log level to get debug-level information:

```bash
$> sudo systemctl log-level debug
```

Reload the systemd daemon and restart service:

```bash
$> sudo systemctl daemon-reload
$> sudo systemctl restart <service_name>.service
```

Now debug-level messages from systemd about the service appear in the service log.

5. You can also attach `strace` to the service's main process to trace its system calls and signals:

- Get the PID of main process from service status. It is listed as `Main PID:`
- Attach strace with the PID:

```bash
$> sudo strace -f -s 100 -p <Main_PID>
```

6. Tune the service configuration at runtime:

```bash
$> sudo systemctl edit --runtime <service_name>.service
```

- Uncomment the `[Service]` section and the settings you want to enable or disable. You can also add new settings. These settings override the base configuration.
- When you save and exit the editor, the override is written to `/run/systemd/system/<service_name>.service.d/override.conf`. Because of `--runtime`, it is lost on reboot.
- Reload the systemd daemon and restart the service as described in step 4.
- To check whether the service is using the new configuration, run:

```bash
$> sudo systemctl show <service_name>.service
```

- To view the base unit file together with its drop-in overrides, run:

```bash
$> sudo systemctl cat <service_name>.service
```

7. If the new configuration works for you, check the exposure level of the service:

```bash
$> systemd-analyze security
$> systemd-analyze security <service_name>.service #For detailed information
```

8. Update the configuration in the Ghaf repository and rebuild. Hardened service configurations are located in the `ghaf/modules/common/systemd/hardened-configs` directory.
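
Step 6 can also be done without an interactive editor by writing the drop-in file directly. The sketch below assumes you want to relax a single directive for the current boot only; `ProtectHome=no` is just a placeholder for whichever hardening option you are testing, and `<service_name>` must be replaced. Because the file lives under `/run`, it disappears on reboot, just like an override created with `systemctl edit --runtime`.

```bash
$> sudo mkdir -p /run/systemd/system/<service_name>.service.d
$> sudo tee /run/systemd/system/<service_name>.service.d/override.conf <<'EOF'
[Service]
# Placeholder: relax one hardening option while testing
ProtectHome=no
EOF
$> sudo systemctl daemon-reload
$> sudo systemctl restart <service_name>.service
$> systemctl show -p ProtectHome <service_name>.service
```

For list-type settings such as `SystemCallFilter=`, an empty assignment (`SystemCallFilter=`) in the drop-in resets the value inherited from the base unit before any new values are added.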
88 changes: 88 additions & 0 deletions docs/src/troubleshooting/systemd/systemd-analyzer.md
@@ -0,0 +1,88 @@
<!--
Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# `systemd-analyze` Tool

`systemd-analyze` is a powerful tool that helps diagnose and troubleshoot issues related to systemd services. It provides various commands to analyze the performance and dependencies of services, as well as to pinpoint issues during the boot process.

### Steps to Analyze Systemd Services

#### 1. Analyze Boot Performance

`systemd-analyze` can help you understand how long each service takes to start during boot. This is useful for identifying services that are slowing down the boot process.

* To get a summary of the boot time:

```bash
$> systemd-analyze
```

This command shows the overall time taken to boot, including the kernel, initrd, and userspace times.

* To see a detailed breakdown of how long each service took to start:

```bash
$> systemd-analyze blame
```

This lists all services in order of their startup time, with the slowest services listed first.

* For a graphical representation of the boot process, you can use:

```bash
$> systemd-analyze plot > boot-time.svg
```

This command generates an SVG file that visually represents the startup times of all services. You can view this file in any web browser.

#### 2. View Service Dependencies

To troubleshoot issues related to service dependencies, you can display the time-critical chain of units that a specific service waits on during startup:

```bash
$> systemd-analyze critical-chain <service_name>.service
```

This command shows the critical path that affects the startup time of the service, highlighting any dependencies that may delay its startup.
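
`critical-chain` only follows the time-critical path. To see the complete dependency tree of a unit, or which units depend on it, `systemctl list-dependencies` can be used, and `systemd-analyze dot` exports the dependency graph in Graphviz format. The last line assumes the Graphviz `dot` tool is available on the machine where you render the graph.

```bash
$> systemctl list-dependencies <service_name>.service            # units the service pulls in
$> systemctl list-dependencies --reverse <service_name>.service  # units that depend on the service
$> systemd-analyze dot '<service_name>.*' | dot -Tsvg > deps.svg
```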


#### 3. Verify Unit Files

To verify the configuration of a service's unit file:

```bash
$> systemd-analyze verify <service-name>.service
```

This command checks the syntax and can help identify configuration issues.

#### 4. Check for Cyclic Dependencies

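Cyclic dependencies can cause services to fail or hang during boot. Besides checking syntax, `systemd-analyze verify` simulates starting the unit, so ordering cycles are reported as warnings:

- To check a unit for cyclic dependencies (`--man=no` skips checking for the man pages referenced in `Documentation=`):

```bash
$> systemd-analyze verify --man=no <service-name>.service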
```

This will warn you about any loops or issues within the unit's dependency tree.



#### 5. Analyze Security Settings

`systemd-analyze` can also assess the security of your service’s configuration:

- To evaluate the overall threat exposure of systemd services, use:

```bash
$> systemd-analyze security
```

- To evaluate the security of a specific service:

```bash
$> systemd-analyze security <service-name>.service
```

This command provides a security assessment, scoring the service based on various hardening options and highlighting potential weaknesses.
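
When many services are listed, it can help to show only those above a chosen exposure score. The function below is a hypothetical helper, not part of Ghaf or systemd; it assumes the default table layout of `systemd-analyze security` (a header row, then the unit name and exposure score as the first two columns). Scores range from 0.0 to 10.0, where higher means more exposed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print units whose exposure score is above a threshold.
# Reads `systemd-analyze security` table output on stdin, for example:
#   UNIT                 EXPOSURE PREDICATE HAPPY
#   auditd.service            9.6 UNSAFE    ...
units_above_exposure() {
  local threshold="$1"
  # Skip the header row; column 2 is the exposure score, compared numerically.
  awk -v t="$threshold" 'NR > 1 && NF >= 2 && ($2 + 0) > (t + 0) { print $1, $2 }'
}

# Usage on a live system:
#   systemd-analyze security --no-pager | units_above_exposure 7.0
```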
18 changes: 15 additions & 3 deletions modules/common/systemd/harden.nix
@@ -24,6 +24,7 @@ in {
 type = lib.types.bool;
 default = false;
 };
+
 excludedHardenedConfigs = lib.mkOption {
 default = [];
 type = lib.types.listOf lib.types.str;
@@ -34,15 +35,26 @@ in {
 configurations for fast debugging and problem resolution.
 '';
 };
+
+logLevel = lib.mkOption {
+description = '' Log Level for systemd services.
+Available options: "emerg", "alert", "crit", "err", "warning", "info", "debug"
+'';
+type = lib.types.str;
+default = "info";
+};
 };
 
-config = lib.mkIf cfg.withHardenedConfigs {
+config = {
 systemd = lib.mkMerge [
 # Apply hardened systemd service configurations
-(apply-service-configs ./hardened-configs/common)
+(lib.mkIf cfg.withHardenedConfigs (apply-service-configs ./hardened-configs/common))
 
 # Apply release only service configurations
-(lib.mkIf (!cfg.withDebug) (apply-service-configs ./hardened-configs/release))
+(lib.mkIf (!cfg.withDebug && cfg.withHardenedConfigs) (apply-service-configs ./hardened-configs/release))
+
+# Set systemd log level
+{services."_global_".environment.SYSTEMD_LOG_LEVEL = cfg.logLevel;}
 ];
 };
 }
