
The /sys/devices/system/cpu/isolated file contains (null), node_exporter fails to start #3152

Open
viviaiyc opened this issue Oct 10, 2024 · 4 comments

Comments

@viviaiyc

sfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
panic: Couldn't create metrics handler: couldn't create collector: Unable to get isolated cpus: strconv.Atoi: parsing " (null)": invalid syntax

goroutine 1 [running]:
main.newHandler(0x1, 0x28, {0xcef880, 0xc00004a4c0})
/app/node_exporter.go:69 +0x2ab
main.main()
/app/node_exporter.go:201 +0x128d

[Screenshot: WeChat image_20241010185623]
The /sys/devices/system/cpu/isolated file contains (null).
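The error text in the panic can be reproduced outside node_exporter. The sketch below is only an illustration: `parseCPU` is a hypothetical helper, not a function from node_exporter, and the input is assumed to be exactly the string quoted in the panic message.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCPU is a hypothetical helper (not taken from node_exporter). It
// converts one token from the isolated-CPU file into an integer the way a
// strict parser would, with no special handling for placeholder text.
func parseCPU(token string) (int, error) {
	return strconv.Atoi(token)
}

func main() {
	// Assumed input: the exact text quoted in the panic message.
	_, err := parseCPU(" (null)")
	fmt.Println(err)
}
```

The printed error should carry the same `strconv.Atoi: parsing " (null)": invalid syntax` text as the panic, which suggests the failure comes from the file content itself rather than from a missing file.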

@dswarbrick
Contributor

Could you perhaps provide a bit more information about your system (i.e., fill in some of the questions in the issue template)?

At a bare minimum, it would be helpful to paste the output of uname -a (preferably not as a screenshot).

@viviaiyc
Author

[Screenshot: WeChat image_20241011100356]

uname -a

@dswarbrick
Contributor

dswarbrick commented Oct 11, 2024

Please stop posting screenshots of text output. They completely defeat text searches and are thus totally unsuited to posting textual data. You'll save yourself the effort of having to blur sensitive data if you simply copy & paste the relevant text content from your terminal.

In order to avoid a lot of guesswork about your environment, please fill out all the questions from the issue template. The template exists for a reason.

Are you specifying the isolcpus option on your kernel command line? If so, please provide its value.
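One way to answer the isolcpus question without a screenshot is to read the running kernel's command line from /proc/cmdline. The following is a minimal sketch; `isolcpusValue` is a hypothetical helper name, and it returns the raw parameter value without interpreting any flags it may contain.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isolcpusValue scans a kernel command line for an isolcpus= parameter and
// returns its raw value. The boolean is false when the parameter is absent.
// If the parameter appears more than once, only the first match is returned.
func isolcpusValue(cmdline string) (string, bool) {
	const prefix = "isolcpus="
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, prefix) {
			return strings.TrimPrefix(field, prefix), true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("cannot read /proc/cmdline:", err)
		return
	}
	if v, ok := isolcpusValue(string(data)); ok {
		fmt.Printf("isolcpus=%s\n", v)
	} else {
		fmt.Println("isolcpus is not set on the kernel command line")
	}
}
```

The output is plain text that can be pasted directly into the issue.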

@dswarbrick
Contributor

dswarbrick commented Oct 18, 2024

In the meantime, I did a little digging to find the kernel source code responsible for outputting the contents of /sys/devices/system/cpu/isolated. The function is this (matched to the user's reported kernel version):

static ssize_t print_cpus_isolated(struct device *dev,
				  struct device_attribute *attr, char *buf)
{
	int n = 0, len = PAGE_SIZE-2;
	cpumask_var_t isolated;

	if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
		return -ENOMEM;

	cpumask_andnot(isolated, cpu_possible_mask,
		       housekeeping_cpumask(HK_FLAG_DOMAIN));
	n = scnprintf(buf, len, "%*pbl\n", cpumask_pr_args(isolated));

	free_cpumask_var(isolated);

	return n;
}

The relevant part is the scnprintf. The call-chain is basically scnprintf -> vscnprintf -> vsnprintf, where we can see:

 * This function generally follows C99 vsnprintf, but has some
 * extensions and a few limitations:
 *
 *  - ``%n`` is unsupported
 *  - ``%p*`` is handled by pointer()
 *
 * See pointer() or Documentation/core-api/printk-formats.rst for more
 * extensive description.

This takes us to pointer, where we can finally see what would produce the output (null):

	if (!ptr && *fmt != 'K' && *fmt != 'x') {
		/*
		 * Print (null) with the same width as a pointer so it makes
		 * tabular output look nice.
		 */
		if (spec.field_width == -1)
			spec.field_width = default_width;
		return string(buf, end, "(null)", spec);
	}

Unsurprisingly, passing a null pointer results in the output of (null). If a valid pointer had been passed, the format string %*pbl would have triggered this code instead:

	case 'b':
		switch (fmt[1]) {
		case 'l':
			return bitmap_list_string(buf, end, ptr, spec, fmt);
		default:
			return bitmap_string(buf, end, ptr, spec, fmt);
		}

I won't go into further detail, but here's the definition of bitmap_list_string for the curious.

So the question is basically whether it is normal for the cpumask_var_t isolated pointer in scnprintf(buf, len, "%*pbl\n", cpumask_pr_args(isolated)) to be null, and if so, under what circumstances. In other words, if the (null) content is the result of a legitimate user configuration, then node_exporter needs to handle it gracefully.
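To make "handle it gracefully" concrete, here is a rough sketch of a tolerant parser for the CPU list format. It is not node_exporter's actual code or fix: `parseIsolatedCPUs` is a hypothetical name, and treating (null) as "no isolated CPUs" is an assumption that depends on the answer to the question above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseIsolatedCPUs parses a kernel CPU list such as "1-3,5".
// ASSUMPTION: an empty file or the literal "(null)" means "no isolated
// CPUs" and yields an empty result instead of an error.
func parseIsolatedCPUs(raw string) ([]int, error) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "(null)" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		// Each element is either a single CPU ("5") or a range ("1-3").
		bounds := strings.SplitN(part, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("invalid CPU list element %q: %w", part, err)
		}
		last := first
		if len(bounds) == 2 {
			last, err = strconv.Atoi(bounds[1])
			if err != nil {
				return nil, fmt.Errorf("invalid CPU list element %q: %w", part, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("invalid CPU range %q", part)
		}
		for cpu := first; cpu <= last; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	data, err := os.ReadFile("/sys/devices/system/cpu/isolated")
	if err != nil {
		fmt.Println("cannot read isolated CPU file:", err)
		return
	}
	cpus, err := parseIsolatedCPUs(string(data))
	if err != nil {
		fmt.Println("cannot parse isolated CPU file:", err)
		return
	}
	fmt.Println("isolated CPUs:", cpus)
}
```

Malformed content still returns an error, so the caller can decide whether to log it and continue or to stop. Only the two placeholder cases are mapped to an empty list.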

The only reference I have found (so far) to that sysfs entry outputting (null) is https://access.redhat.com/solutions/3875421.
