fix: diagnostics in lines with multi-byte chars
There's a conflict between the way Lua measures strings with multi-byte characters and the way we pass the `col` field through the patterns. For example, the string `· example typox` from the test file below has `15` printable characters, which is the length every other language would report, but Lua counts the bytes in the string, not the number of printable characters, so for that same string Lua returns `16` as its length.

The report coming from CSpell also counts only printable characters, so for a file like this:

`test.md`

```markdown
* example typox
· example typox
```

the report will be:

`npx cspell -c cspell.json lint --language-id markdown test.md`

```
1/1 ./test.md 163.45ms X
./test.md:1:11 - Unknown word (typox)
./test.md:2:11 - Unknown word (typox)
```

Both lines report the same column for the start of the unknown word, because CSpell doesn't count bytes when reporting the position of the error. So when we read the column from the report, we just forward whatever CSpell gave us. The `end_col` ends up with the correct position because we calculate it with the custom `from_quote` adapter, which finds the end column programmatically.

To counter that discrepancy, I'm using the column reported by CSpell only as an index to start looking for the word reported as an error in the `end_col` function, and mutating the entries table to define the `col` property in that same function (sketched below). I have a proof of concept that seems to work as expected; I'll test a few scenarios before I push anything.

IMO, that feels a bit too hacky to keep as a long-term solution; we should look into validating the `col` property in none-ls.
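For reference, here's a minimal sketch of that idea in plain Lua; the `resolve_cols` name, the argument shapes, and the end-exclusive `end_col` are just illustrative, not the actual none-ls adapter code:

```lua
-- Hypothetical helper illustrating the workaround: use CSpell's
-- character-based column only as a starting offset, then locate the
-- word by bytes and overwrite `col`/`end_col` on the entries table.
local function resolve_cols(entries, line, word)
  -- CSpell's character-based column can never be past the byte-based
  -- position of the word, so it's a safe offset to start searching from.
  local init = entries.col or 1
  -- Plain find (4th argument = true) from that offset, counting bytes.
  local start_byte, end_byte = string.find(line, word, init, true)
  if start_byte then
    entries.col = start_byte       -- byte-based start column
    entries.end_col = end_byte + 1 -- end-exclusive, adjust to taste
  end
  return entries.end_col
end

-- "·" is two bytes in UTF-8, so the byte column shifts by one:
local line = "· example typox"
local entries = { col = 11 }       -- what CSpell reports (characters)
resolve_cols(entries, line, "typox")
print(entries.col, entries.end_col) --> 12  17 (bytes)
```

Because a character index can never exceed the corresponding byte index in UTF-8, the column CSpell reports is always a safe lower bound for the byte-wise search.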