-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suboptimal performance when a large folder is listed in .markdownlintignore #108
Comments
This is great detail, thank you very much! I‘be been working on an experiment that I’d like you to try in a week or so that may help with this. I’ll follow up here when it’s ready. |
I've published something new: https://github.com/DavidAnson/markdownlint-cli2 It's a slightly different take on a CLI, please see the README for context. For your purposes, something like the following should be equivalent to what you have today. But faster (according to my own limited testing). Could you please give it a try and let me know what you find? .markdownlintignore:
Command-line:
Aside: The pattern you show of including all files and then excluding Markdown files in the ignore file is not supported by CLI2 – and kind of weird anyway. I wouldn't guess it's doing much to help with performance, though I agree that in the abstract it is equivalent to what I show above. |
By the way, you could put the two ignore paths on the command-line, but I was trying to match your current configuration as closely as possible. |
Hey David, thanks for the follow up! I’ll try testing the new CLI performance over the weekend, fingers crossed! I understand that the If for Markdownlint the rule is simply to include See this project as an example: https://github.com/kachkaev/njt. It is quite small so does not suffer from CLI performance issues, but it has a full-blown config for the linters. Hope it helps illustrate the coherence being achieved with the described approach to the |
That reasoning makes sense, but interferes with what I’m trying to do which is represent all of the positive and negative pattern matching on the command line so it can be passed into the glob tool for best performance. As you suggested above, the slow implementation of ignore is to apply the ignore file rules with a filter after using the globs to search the disk. I looked briefly at ESLint and it seems that’s what they do, though they apply some default ignore rules at glob time to filter out, for example, the node_modules directory which is probably why it’s not also slow. Though I didn’t analyze it in depth, allowing negative patterns in the ignore file didn’t seem to work with this approach because they are supposed to be scoped to just the ignore operation (“!*.md” in an ignore file does not mean the same as “*.md” on the command line). I may be wrong about this or you may point out a way to handle it better. I’d be happy to consider either! But for now, this compromise seemed to have a fairly minor effect on users but a potentially significant effect on performance and so that’s what I’ve gone with. |
FYI, I've just published version |
I've got a node_modules
/prototypes I've got 35 I tried constructing the command differently, not depending on
Is this the same issue reported here? |
@borekb Yes, seems like the same thing. However, both approaches should be fast in |
Thanks, I'll try to experiment with it sometimes soon. |
@DavidAnson I can confirm that cli2 doesn't have this problem 🎉. |
I ran my reproduction steps, but replace |
I can also confirm
Edit: I now realized the original CLI is from different author, sorry 🤦. |
@adamkudrna Thanks for confirming that! My motivation for doing CLI2 was that there were a number of features/improvements that I didn't see a clean way of adding to CLI1. I will continue updating the markdownlint dependency and keep the tests passing and review pull requests for CLI1, but will probably dedicate most of my time on features/requests to CLI2. |
Just tried
The stack points to this line: |
Great, thank you for letting me know! We should probably move this investigation to an issue in the repository for CLI2. Could you please open something there? Looking at the info above, that’s not a particularly deep call stack, so I’m guessing this is more to do with Promise chains? Also, it might help to know which repository you are seeing this problem with. If you aren’t comfortable sharing that publicly, we can work something out. |
Quick update, I don’t need an example to reproduce this, I think I figured out what’s going on by inspection. (The message is a little bit misleading.) |
@kachkaev, that Some info about the bug: https://twitter.com/DavidAns/status/1297361113386872833 |
I have a project with a local data folder, which is listed in all
.*ignore
files. When this temp folder gets quite large (more than 10K files), Markdownlint seems to become much slower than other linting tools (Prettier and ESLint). This suggests that the ignore logic could be improved.Here are the reproduction steps for macOS Catalina and zsh. I only compare Markdownlint with Prettier here for simplicity, but according to my observations, ESLint also remains quite performant.
Init a project with
README.md
, Prettier, Markdownlint and identical.prettierignore
+.markdownlintignore
files.Run Prettier and Markdownlint and observe reasonable performance.
Create a large folder, which is already listed in the ignore files.
Run Prettier and Markdownlint again.
Delete the project to release disk space.
It feels like Markdownlint CLI still lists all the files in the ignored folder and then matches each path against the ignore list, while Prettier does not even look into
my-big-temp-folder
.I don’t know Markdownlint internals well enough to submit a PR, but I hope that my repro steps may help with the investigation 🙌
I believe that having large local data / var / temp folders is not that rare in software development, so I won’t be the only person who would benefit from performance improvements 🙂
The text was updated successfully, but these errors were encountered: