
Improve Markup Language Support #7

Open
hesstobi opened this issue Apr 4, 2017 · 17 comments · May be fixed by #23

Comments

@hesstobi
Contributor

hesstobi commented Apr 4, 2017

This linter is doing a great job. When writing a document in a markup language like LaTeX it could be improved, because it shows errors on every LaTeX command. As a quick and dirty solution I added the following lines:

  # Blank LaTeX commands with spaces so character offsets stay intact.
  # Commands whose brace arguments contain prose (sections, captions,
  # text..., mbox) keep the argument text; everything else becomes whitespace.
  editorContent = editorContent.replace /(\\\w+)((?:\{[^\}]*\})*)((?:\[[^\]]*\])*)((?:\{[^\}]*\})*)/g, (match, name, group1, group2, group3, index, input) ->
    if /\\(\w*section|\w*caption|text\w*|mbox)/.test(name)
      # keep the words inside the braces; blank the command name,
      # the optional [...] arguments, and the braces themselves
      output = Array(name.length + 1).join(" ") +
        group1.replace(/[\{\}]/g, " ") +
        Array(group2.length + 1).join(" ") +
        group3.replace(/[\{\}]/g, " ")
    else
      output = Array(match.length + 1).join(" ")
    return output

This replaces most of the LaTeX markup with spaces. I then disabled the WHITESPACE_RULE.
A more general approach would be to ignore grammar scopes and patterns with an API like the one linter-spell provides.
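For readers outside CoffeeScript, the same blanking idea can be sketched in TypeScript. This is a simplified sketch, not the plugin's actual code: the optional `[...]` argument groups are omitted, and only the whitelist regex is taken verbatim from the snippet above.

```typescript
// Commands whose brace arguments contain natural-language text.
const KEEP_ARGUMENT = /\\(\w*section|\w*caption|text\w*|mbox)/;

function blankLatex(source: string): string {
  return source.replace(
    /(\\\w+)((?:\{[^}]*\})*)/g,
    (match, name: string, args: string) => {
      if (KEEP_ARGUMENT.test(name)) {
        // keep the words inside the braces; blank only the command and braces
        return " ".repeat(name.length) + args.replace(/[{}]/g, " ");
      }
      // blank the whole command, argument and all
      return " ".repeat(match.length);
    }
  );
}

const input = "\\section{Intro} \\label{sec:intro} Some text.";
const output = blankLatex(input);
console.log(output.length === input.length); // prints true: offsets are preserved
```

Because every replacement has exactly the length of the text it replaces, every error offset LanguageTool reports still points at the right place in the original buffer.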

@wysiib wysiib added this to the 0.5.0 milestone Apr 5, 2017
@wysiib wysiib self-assigned this Apr 5, 2017
@wysiib
Owner

wysiib commented Apr 5, 2017

We should go for a proper solution following the linter-spell one. Will look at it in the coming days.

@wysiib
Owner

wysiib commented Jun 20, 2017

I am unsure whether the core plugin should include language-specific features. The same goes for issue #7. However, I am not sure about an API for connecting language definitions as separate packages either. Any suggestions?

@zoenglinghou

linter-spell-latex has actually compiled a list of excluded scopes for LaTeX. Might be helpful.

@wysiib
Owner

wysiib commented Mar 5, 2018

We have thought about porting the solution from the linter-spell package for quite some time. Currently I am switching jobs and thus do not have the time to implement things myself. But I will look into it, probably around the end of May.

@29antonioac

> (quoting @hesstobi's original workaround from above)

Hi! How could I use this workaround until a final solution is found?

Thanks!

@hesstobi
Contributor Author

hesstobi commented Nov 3, 2018

You can use my branch, which adds basic support for markup languages using the linter-spell API. I use this a lot for LaTeX. There are still a lot of things missing...
https://github.com/hesstobi/linter-languagetool/tree/linter-spell-api

@29antonioac

Thanks for your work! It works pretty well :).

Only one question: in my documents the \gls{} commands for handling acronyms are not correctly filtered. Is this a problem with your plugin or with linter-spell?

Thanks for all!
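One plausible way to close the \gls{} gap with the quick-and-dirty approach from the top of this thread is to blank such commands entirely, since their arguments are lookup keys rather than prose. A hypothetical TypeScript sketch (the command list here is invented for illustration; it is not part of any plugin):

```typescript
// Commands whose arguments are identifiers, not natural language.
const BLANK_WHOLE_COMMAND = /\\(gls\w*|ref|cite\w*|label)/;

function blankNonText(source: string): string {
  return source.replace(/(\\\w+)(\{[^}]*\})?/g, (match, name: string) =>
    // overwrite the whole command (and its argument) with spaces
    BLANK_WHOLE_COMMAND.test(name) ? " ".repeat(match.length) : match
  );
}

console.log(blankNonText("see \\gls{api} here")); // \gls{api} becomes spaces
```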

@73

73 commented Nov 20, 2018

I would like to give this thumbs up. Support for LaTeX would be so awesome!

@davidlday

I don't know if this helps or not, but the LanguageTool Server now has support for processing annotated text. Not sure when exactly they implemented it. You can see the data parameter of the API at SwaggerHub for an example. It takes a value like:

{"annotation":[
 {"text": "A "},
 {"markup": "<b>"},
 {"text": "test"},
 {"markup": "</b>"}
]}

Using the linter-spell approach, perhaps the different formats could be mapped to this annotated format? This would preserve offsets, I believe, and potentially be easier than trying to reduce to pure text.
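For illustration, a payload in that shape could be built from simple tagged text like this. This is a hedged sketch: the tag-splitting heuristic and the function name are invented; only the `{text}`/`{markup}` annotation shape comes from the LanguageTool API example above.

```typescript
interface AnnotationPart { text?: string; markup?: string }

// Split a string into LanguageTool annotated-text parts:
// anything that looks like <...> is markup, everything else is text.
function toAnnotatedText(input: string): { annotation: AnnotationPart[] } {
  const annotation: AnnotationPart[] = [];
  for (const piece of input.split(/(<[^>]+>)/).filter(p => p !== "")) {
    if (piece.startsWith("<")) {
      annotation.push({ markup: piece });
    } else {
      annotation.push({ text: piece });
    }
  }
  return { annotation };
}

const data = toAnnotatedText("A <b>test</b>");
console.log(JSON.stringify(data));
// {"annotation":[{"text":"A "},{"markup":"<b>"},{"text":"test"},{"markup":"</b>"}]}
```

The resulting JSON would then be sent in the `data` parameter of the `/v2/check` request instead of `text`, so LanguageTool itself keeps track of the markup offsets.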

@wysiib
Owner

wysiib commented Dec 23, 2018

That sounds like another nice way to proceed. I agree, reducing to pure text while keeping offsets intact might be quite a hassle. However, I haven't found a list of "all" the annotations in, say, LaTeX. Could this be derived from the language tokens Atom creates anyway? @hesstobi since this is somewhat related to what you are doing: any input?

@hesstobi
Contributor Author

Yes, I think this is a good way to go. But I currently do not have any time to work on it.

@davidlday

davidlday commented Dec 30, 2018

I created a few stand-alone packages that convert markup into LanguageTool's annotated text that might help:

My quick search for a LaTeX parser turned up a couple of packages, but also several SO posts on how challenging it is to create a parser. If you all know of a good parser, I can see about creating another package to handle it. Or you're free to leverage the above to create one as well. :)

@hesstobi
Contributor Author

Nice work! But I think this is more useful outside of Atom, because you would need a parser for every grammar. Atom already includes parsers for all major grammars. With the linter-spell API it is possible to choose which scopes should be checked by LanguageTool. This would enable LanguageTool to check comments in programming languages, and so on.

@davidlday

Thank you. I see where I misunderstood the parsing in Atom. Should have looked a little closer. :( Anyhow, I'll dig in a little deeper on the grammars & linter-spell as I have time and see if I can help out.

@davidlday

I've been watching and commenting on an issue on atom-wordcount that feels like a similar problem: basically trying to eliminate all non-natural-language text from a document's word count. Getting tokenized lines seems to be possible using Atom's public API:

editorGrammar = editor.getGrammar()
editorGrammar.tokenizeLines(editor.getText())

See the early snippet in the issue for an example of filtering out scopes using first-mate. This doesn't work for tree-sitter grammars, but a similar approach should be possible.
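Assuming the `{ value, scopes }` token shape that `tokenizeLines()` produces, scope filtering could look roughly like the following sketch. The ignore list and the concrete scope names are illustrative, not taken from any existing package.

```typescript
interface Token { value: string; scopes: string[] }

// Scope prefixes whose tokens should not be spell/grammar checked.
const IGNORED_SCOPES = ["support.function", "keyword.control", "punctuation"];

function stripIgnoredScopes(lines: Token[][]): string {
  return lines
    .map(tokens =>
      tokens
        .map(t =>
          // blank ignored tokens with spaces so character offsets survive
          t.scopes.some(s => IGNORED_SCOPES.some(ig => s.startsWith(ig)))
            ? " ".repeat(t.value.length)
            : t.value
        )
        .join("")
    )
    .join("\n");
}

// Mock token data in the shape tokenizeLines() would return for LaTeX.
const mock: Token[][] = [[
  { value: "\\textbf", scopes: ["text.tex.latex", "support.function.general.tex"] },
  { value: "{", scopes: ["text.tex.latex", "punctuation.definition.arguments.begin.latex"] },
  { value: "bold words", scopes: ["text.tex.latex"] },
  { value: "}", scopes: ["text.tex.latex", "punctuation.definition.arguments.end.latex"] },
]];
console.log(stripIgnoredScopes(mock)); // "bold words" padded with spaces, same length as the line
```

Because the grammar has already done the parsing, this works for any language Atom can tokenize, which is exactly the advantage over per-format parsers discussed above.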

@hesstobi
Contributor Author

This is the API we need! I added this to #23. But we should also find a way for tree-sitter.

@mbroedl

mbroedl commented Jan 17, 2019

@hesstobi Have a look at this commit where I try to use the editor.tokensForScreenRow() API. Note that this API is undocumented and thus subject to change! (See also the discussion in atom-wordcount again.)


7 participants