-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to gengo #1305
base: main
Are you sure you want to change the base?
Switch to gengo #1305
Conversation
To get this working with one language and then expand from there.
This doesn't do much but make the parts that used to use tokei panic.
This reverts commit d601ee7.
Languages that aren't yet supported are commented out.
languages.get_statistics(&[&dir], &ignored, &tokei_config); | ||
languages | ||
) -> Result<gengo::Analysis, Box<dyn Error>> { | ||
// TODO Determine best way to ignore files (and if that should continue to be handled by onefetch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have language type and glob filtering for the V1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is this going to be handled via .gitattributes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That depends on the "file source." Right now the Git
file source is highly opinionated and would use .gitattributes
for ignoring files, overrides, etc.
I plan on the Directory
file source being much less opinionated (can't get much more generic than just reading a folder 🙂). The functionality isn't there yet, but I was considering adding ways to configure excluding/including files. Right now .gitignore
and .ignore
files would ignore files via the ignore
crate's default behavior. I was hoping to discuss the usage sometime, but it would probably either be like this:
let file_source = Directory::with_config("./", Config { ignored_files: Some(vec![]) });
or like this:
let file_source = Directory::new("./").extend_ignored_files(&[]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clear, and it also aplies to the language type filter ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default Gengo marks Programming, Markup, and Query as detectable, and Prose and Data as not detectable. detectable
is basically a boolean that answers "should this factor in to the stats?" Also, documentation, generated files, and vendored files are not detectable by default. Assuming we use the Git
file source with .gitattributes
, this is example usage:
# Markdown is Prose, but we still want it in the stats
*.md gengo-detectable
# JavaScript is Programming, but we want to exclude it
*.js -gengo-detectable
# The contents of dist/ are not generated, so they will be detectable
dist/* -gengo-generated
# OR
# Even though dist/* is generated, it should still be included in the stats
dist/* gengo-detectable
By default any file that is not detectable is excluded from the Summary
.
Again, the Git
file source is highly opinionated and tries to behave a lot like github-linguist. The usage with the Directory
file source is still undecided, and right now it doesn't implement any overrides. Or we could write our own file source (#1303 (reply in thread)) if we want very specific behavior that wouldn't be suitable for the gengo
crate.
But, however we implement it, IMO the best way would be to inform gengo what files are and aren't detectable, and then make use of either the Analysis
(detailed results) or Summary
(simplified results).
tl;dr Yes, gengo would be the one handling included/excluded types, and we'd just pass configuration to it somehow.
languages | ||
) -> Result<gengo::Analysis, Box<dyn Error>> { | ||
// TODO Determine best way to ignore files (and if that should continue to be handled by onefetch) | ||
let file_source = Git::new(dir, "HEAD")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the pending changes taken into count for the language analysis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be handled by gengo.
Remap languages
Alright, after a long time procrastinating, I implemented the last of onefetch's languages (besides combining Bash, Zsh, and Sh into Shell). One thing to note is that the MSRV has been bumped. |
Bumping gix resulting in compiler errors (new usage). Just a moment. |
afa0c1a
to
47136f7
Compare
After thinking about this a bit, I think an issue with tokei and other line-counting tools is that they specialize in counting lines, and, while language ID is a priority, not the top priority. Which leads to e.g. tokei having poor out-of-the-box support for Verilog and V (their So I'd like to add line counts back at some point, possibly by using gengo as a dependency of some crate and doing something vaguely like match Language {
JupyterNotebook => my_super_special_jupyter_notebook_counter()
_ => generic_counter(Language) // would use community-provided data similar to tokei's languages.json
} And on that topic, I'd recommend looking into |
See #1152 for previous conversation. With the drastic API changes to gengo it started to feel like more work to modify the existing PR instead of starting over.
To Do
Language
andtokei::LanguageType
enums.Resolves #26
Closes #1152