Improve performance - skip manifest read if no components/targets requested #2917

JoshMcguigan · 2021-12-04T05:54:41Z

I was looking into why rustup had a performance penalty over calling cargo directly in simple cases, and found most of the time was spent parsing the multirust-channel-manifest.toml file. Looking more closely, it seemed parsing that file was not really required in many cases. This PR tweaks the ordering of the code to skip parsing this file when it isn't needed.

I do not intend to make any behavioral change with this, other than potentially if some code paths handle an error while reading this file (and now they won't see that error in cases where we skip reading the file entirely). I believe this is the reason the heal_damaged_toolchain test is now failing. I need to look into this more closely (this PR is marked as draft until I figure this out), but I'd appreciate any insight into whether that test was intended to cover this behavior.

The performance changes were tested in a brand new hello world cargo project (cargo new hello), after building rustup in release mode and installing it into a rustup/home directory as described in the contributing guidelines. The before tests were done on the current master branch, because it seems to already be much faster than the most recent official release.

Before this change a repeated call to cargo build (through the rustup proxy) would take ~90ms. After this change it takes ~35ms. Running cargo directly takes ~22ms. So this change reduces the rustup overhead in this case from 68ms to 13ms (a reduction of 80%).

JoshMcguigan · 2021-12-04T17:17:59Z

It looks like the heal_damaged_toolchain test is failing because it relies on rustup show active-toolchain having previously read/parsed the manifest, which is specifically what this PR removes (since reading/parsing the manifest seems to not be strictly required for that operation, and it takes meaningful time).

Is my understanding correct here? And is it reasonable to a command like rustup show active-toolchain to NOT heal the damaged toolchain manifest file (since it doesn't even need to look at that file)? If so, is there a command which would be more appropriate to run as part of this test to demonstrate the healing?

kinnison · 2021-12-27T10:52:13Z

Personally I think the request to show the active toolchain (which implies checking for components in the rust-toolchain.toml being available) should be able to succeed at healing things.

I remain convinced that the correct way to ameliorate the incredible slowness of parsing the TOML is to, instead, keep a cached faster-to-parse copy alongside it, e.g. BINCODE.

dralley · 2022-01-12T04:35:35Z

Has any profiling been done on the TOML parsing itself? Just glancing through the file, the parsing code doesn't seem especially efficient, it could probably benefit from a little focused attention.

edit: I just created one and the main thing that stands is tons and tons of individual allocations and drops

JoshMcguigan · 2022-01-12T06:31:01Z

I created a small one.

use std::fs::read_to_string;

fn main() {
    let s = read_to_string("/home/josh/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/multirust-channel-manifest.toml").unwrap();

    for _ in 0..10 {
        s.parse::<toml::Value>().unwrap();
    }
}

This runs in ~450ms on my machine. I found I can improve this by ~100ms by bypassing the \r\n windows style newline handling in src/tokens.rs, which I actually wonder if that is where you are seeing all the allocations. However I'm not sure how we could get this improvement while also handling windows style newlines (short of replacing all \r\n with \n before parsing, but there is a memory use cost there).

edit: I just created one and the main thing that stands is tons and tons of individual allocations and drops

I'd like to poke around at this. How did you gather this data?

dralley · 2022-01-12T16:47:08Z

I'd like to poke around at this. How did you gather this data?

If you pass the --reverse flag it groups together all of the tiny calls at the bottom of the call stack instead of the ones at the top, if you do it seems that about 25% of the total time is spent on allocs, frees and moves. Another 8% is spent computing hashes.

JoshMcguigan · 2022-01-13T21:23:51Z

I've been using frida tracing to count malloc calls, and its about 100,000 allocations to parse the manifest file a single time (so the example above would be 10x that). I replaced Vec in one place with TinyVec and cut the allocations by ~10%, but the runtime didn't really change.

I'm probably not going to continue down this path right now, but I'd be very curious to see if anyone else comes up with anything fruitful here.

skip manifest read if no components/targets requested

40f9d28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance - skip manifest read if no components/targets requested #2917

Improve performance - skip manifest read if no components/targets requested #2917

JoshMcguigan commented Dec 4, 2021 •

edited

Loading

JoshMcguigan commented Dec 4, 2021

kinnison commented Dec 27, 2021

dralley commented Jan 12, 2022 •

edited

Loading

JoshMcguigan commented Jan 12, 2022 •

edited

Loading

dralley commented Jan 12, 2022 •

edited

Loading

JoshMcguigan commented Jan 13, 2022

Improve performance - skip manifest read if no components/targets requested #2917

Are you sure you want to change the base?

Improve performance - skip manifest read if no components/targets requested #2917

Conversation

JoshMcguigan commented Dec 4, 2021 • edited Loading

JoshMcguigan commented Dec 4, 2021

kinnison commented Dec 27, 2021

dralley commented Jan 12, 2022 • edited Loading

JoshMcguigan commented Jan 12, 2022 • edited Loading

dralley commented Jan 12, 2022 • edited Loading

JoshMcguigan commented Jan 13, 2022

JoshMcguigan commented Dec 4, 2021 •

edited

Loading

dralley commented Jan 12, 2022 •

edited

Loading

JoshMcguigan commented Jan 12, 2022 •

edited

Loading

dralley commented Jan 12, 2022 •

edited

Loading