Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide richer information when backend asks about active editor #2132

Merged
merged 14 commits into from
Jan 31, 2024

Conversation

jennybc
Copy link
Member

@jennybc jennybc commented Jan 25, 2024

Part of #1312. Builds on the POC that was included in #2019. In particular, last_active_editor_context is expanded from providing just a filepath to essentially all of the data members of vscode.TextEditor (or, at least, all of the ones I think are conceivably useful to a backend). This means we get coordinates and text for all of the current selections, as well as the entire contents of the editor.

Companion PRs:
posit-dev/ark#213
https://github.com/posit-dev/positron-python/pull/329

positron/comms/ui-frontend-openrpc.json Show resolved Hide resolved
],
"components": {
"schemas": {
"uiTextDocument": {

This comment was marked as outdated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure I have to invent a name for this schema. The rationale for uiTextDocument is that this schema is based on vscode.TextDocument and it exists for use in our "ui" comms.

This comment was marked as resolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All names in these contracts should be in snake_case

I have done this now.

I'm not sure we need a comm prefix

I've removed the 'ui' prefix as well. It means we have, for example, positron.TextDocument that is ... shall we say, inspired-by-but-not-the-same-as vscode.TextDocument. But I guess this is fine (?).

positron/comms/ui-frontend-openrpc.json Outdated Show resolved Hide resolved
// Related to https://github.com/posit-dev/positron/issues/12
type TEC = import('./ui-comm').TextEditorContext;
export type TextEditorContext = TEC;

Copy link
Member Author

@jennybc jennybc Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't want to manually copy/paste/tweak the contents of the automatically generated src/vs/workbench/services/languageRuntime/common/positronUiComm.ts here. Consider this an opening move towards a possible future where generate-comms.ts also places a file in src/positron-dts/ that is brought into the positron namespace kinda like I'm doing here. I'm probably not doing this in an optimal way but I was elated to just find a way to keep these in a separate file!

Copy link
Member Author

@jennybc jennybc Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current, working state of this experiment (and this PR) is that I copy/paste the necessary excerpt of src/vs/workbench/services/languageRuntime/common/positronUiComm.ts into src/positron-dts/ui-comm.d.ts. The two lines above are as close as I can come to inlining that file. This StackOverflow answer was extremely helpful (https://stackoverflow.com/a/51114250) in understanding what is and is not possible in a so-called ambient module declaration. AFAICT this is the only way to import types into a global module declaration.

Our long term hopes are to rationalize this further (#12, #265), but I accept that this is as far as we can come in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks for that SO link!

export interface TextEditorContext {
/** URI of the edited document */
readonly path: string;
}
Copy link
Member Author

@jennybc jennybc Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the direct inlining that felt OK when the interface contains a single string (readonly path: string). But it feels very unsustainable when dealing with the hundreds of automatically generated lines needed to bring in the interfaces I'm adding here.

@@ -32,6 +32,7 @@
// --- Start Positron ---
"./vs/**/*.tsx",
"positron-dts/positron.d.ts",
"positron-dts/ui-comm.d.ts",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary because I keep the new interfaces in their own file.

@@ -40,7 +40,7 @@ export class ExtHostMethods implements extHostProtocol.ExtHostMethodsShape {
async call(method: UiFrontendRequest, params: Record<string, any>): Promise<JsonRpcResponse> {
try {
if (!Object.values(UiFrontendRequest).includes(method)) {
return <JsonRpcError> {
return <JsonRpcError>{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic formatting did this.

interface ITextEditorContext {
readonly path: string;
}
type ITextEditorContext = EditorContextResult;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many copies! But at least this one is just an alias.

// * Reduce the manual proliferation of these generated types.
// * Ideally a file is meant to edited by humans or by robots, but not both.
// Related to https://github.com/posit-dev/positron/issues/12
type TEC = import('./ui-comm').TextEditorContext;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially assuming we'd add a fenced area in positron.d.ts that generate-comms.ts would update, to preserve the file self-contained just like vscode.d.ts is.

Do we risk making imports management a bit harder in extensions? Currently they are copying that interface file locally and will now have to import two of them and remember to keep both in sync.

That said, eventually these types will probably be exported in an npm package for extensions to use, e.g. like https://www.npmjs.com/package/@types/vscode. When I jump to definition from import * as vscode from 'vscode';, I end up in a node_modules folder where the vscode.d.ts file has become an index.d.ts file inside a vscode folder. So I guess in that configuration multiple files will work out transparently (I'm a web dev neophyte so I'm not sure).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all good points, so my main goal is to encourage us to discuss and figure this out. I think we do have lots of evidence from R package development that it's beneficial to draw a hard line between generated files and human-curated files, whenever possible. But if we can't do that here or choose not to, yeah we should fence off a special area in positron.d.ts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've still got this in a separate file for the moment.

assuming we'd add a fenced area in positron.d.ts

If I did that, I'd be adding >100 lines which is >10 of that file. Then there's the higher degree of difficulty of managing a file authored by both humans and computers. And I think this problem will just get worse. But I will put this directly in positron.d.ts if that's what folks want.

Do we risk making imports management a bit harder in extensions? Currently they are copying that interface file locally ...

So I see this in positron-python (https://github.com/posit-dev/positron-python/blob/98da4f51910f36c1d898a96461acdf19b662d19c/positron-dts/positron.d.ts), but not in positron-r. Why do we need to do that in positron-python but not positron-r? Is this also a manual copy paste into positron-python?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question about why positron-python maintains its own copy of positron.d.ts is answered here: https://github.com/posit-dev/positron-python/pull/329/files#r1472174669

@jennybc

This comment was marked as resolved.

@jennybc

This comment was marked as resolved.

@jennybc jennybc force-pushed the more-editor-context branch from 304175e to 34ce140 Compare January 26, 2024 19:09
@jennybc

This comment was marked as outdated.

],
"components": {
"schemas": {
"uiTextDocument": {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure I have to invent a name for this schema. The rationale for uiTextDocument is that this schema is based on vscode.TextDocument and it exists for use in our "ui" comms.

"type": "string",
"description": "URI of the resource viewed in the editor"
},
"eol": {
Copy link
Member Author

@jennybc jennybc Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this an actual enum, but it created lots of problems with types. So eol is a simple string now and it will be either \n or \r\n.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the only reasonable way to make this a "real" enum would be to use enum values like "windows" and "unix" and then replace them with the physical sequences after/before serialization. Definitely not a high priority change.

positron/comms/ui-frontend-openrpc.json Outdated Show resolved Hide resolved
// Related to https://github.com/posit-dev/positron/issues/12
type TEC = import('./ui-comm').TextEditorContext;
export type TextEditorContext = TEC;

Copy link
Member Author

@jennybc jennybc Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current, working state of this experiment (and this PR) is that I copy/paste the necessary excerpt of src/vs/workbench/services/languageRuntime/common/positronUiComm.ts into src/positron-dts/ui-comm.d.ts. The two lines above are as close as I can come to inlining that file. This StackOverflow answer was extremely helpful (https://stackoverflow.com/a/51114250) in understanding what is and is not possible in a so-called ambient module declaration. AFAICT this is the only way to import types into a global module declaration.

Our long term hopes are to rationalize this further (#12, #265), but I accept that this is as far as we can come in this PR.

src/positron-dts/positron.d.ts Outdated Show resolved Hide resolved
*--------------------------------------------------------------------------------------------*/

//
// Copied from src/vs/workbench/services/languageRuntime/common/positronUiComm.ts; do not edit.
Copy link
Member Author

@jennybc jennybc Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a verbatim copy and paste of the relevant interfaces from positronUiComm.ts. Maybe one day generate-comms.ts will eliminate the need for this (#265, #12).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most recent renaming really confirms that it's no fun to manually maintain this copy.

import { JsonRpcErrorCode } from 'vs/workbench/services/languageRuntime/common/positronBaseComm';

import { EndOfLine } from '../extHostTypeConverters';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end-of-line enum is surprisingly tricky to handle! It is defined in vscode.d.ts but you can't import from vscode in the extension host. I learned that the hard way. You don't see any PROBLEMS but the extension host becomes DOA in dev positron.

I assume that this explains the existence of all the type conversion machinery in https://github.com/posit-dev/positron/blob/main/src/vs/workbench/api/common/extHostTypeConverters.ts, which is what I use below.

Comment on lines +91 to +96
// The gymnastics here are so that we return character positions with respect to
// Unicode code points. Otherwise, the native Position type provides offsets with respect to
// UTF-16 encoded text. That would be confusing for downstream consumers, who probably
// ultimately receive this text as UTF-8 and want to operate on this text in terms of
// as user-perceivable "characters". This only matters when the selection's neighborhood
// includes Unicode characters in the astral plane.
Copy link
Member Author

@jennybc jennybc Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces position data that is (a) actually correct and (b) makes the most sense in R. If these are the active selections:

Screenshot 2024-01-30 at 3 42 25 PM

here's the data we expose via rstudioapi::getActiveDocumentContext():

> f <- function(sel) {
+   x <- list(
+     text = sel$text,
+     `nchar(text)` = nchar(sel$text),
+     `range` = glue::glue('
+       [{sel$range$start["row"]}, {sel$range$start["column"]}] --\\
+       [{sel$range$end["row"]}, {sel$range$end["column"]}]'),
+     `end column - start column` =
+       sel$range$end['column'] -  sel$range$start['column'],
+     `strsplit(text, split = '')[[1]]` = strsplit(sel$text, split = '')[[1]],
+     `substr(text, 1, 1)` = substr(sel$text, 1, 1),
+     `substr(text, 2, 2)` = substr(sel$text, 2, 2)
+   )
+   fr <- function(x) format(x, justify = "right")
+   fl <- function(x) format(x, justify = "left")
+   print(glue::glue("{fr(names(x))}: {fl(x)}"))
+   print(glue::glue("\n"))
+   invisible()
+ }
> purrr::walk(x$selection, f)
                           text: 🤖🌷
                    nchar(text): 2
                          range: [1, 1] --[1, 3]
      end column - start column: 2
strsplit(text, split = '')[[1]]: 🤖, 🌷
             substr(text, 1, 1): 🤖
             substr(text, 2, 2): 🌷

                           text: 😱
                    nchar(text): 1
                          range: [2, 1] --[2, 2]
      end column - start column: 1
strsplit(text, split = '')[[1]]: 😱
             substr(text, 1, 1): 😱
             substr(text, 2, 2): 

                           text: ?💅
                    nchar(text): 2
                          range: [3, 1] --[3, 3]
      end column - start column: 2
strsplit(text, split = '')[[1]]: ? , 💅
             substr(text, 1, 1): ?
             substr(text, 2, 2): 💅

                           text: Ab
                    nchar(text): 2
                          range: [4, 1] --[4, 3]
      end column - start column: 2
strsplit(text, split = '')[[1]]: A, b
             substr(text, 1, 1): A
             substr(text, 2, 2): b

I like the way the reported range is consistent with R's notion of the selection length and positions, as reported by nchar() and substr().

Note that things get a little tricky in the presence of an Emoji ZWJ Sequence. But the data we return is still consistent with what R does.

Screenshot 2024-01-30 at 3 51 34 PM
> x <- rstudioapi::getActiveDocumentContext()
> purrr::walk(x$selection, f)
                           text: 🤷‍♀️😭
                    nchar(text): 5
                          range: [1, 1] --[1, 6]
      end column - start column: 5
strsplit(text, split = '')[[1]]: 🤷, ‍  , ♀ , ️  , 😭
             substr(text, 1, 1): 🤷
             substr(text, 2, 2):text: 🤖🌷
                    nchar(text): 2
                          range: [2, 1] --[2, 3]
      end column - start column: 2
strsplit(text, split = '')[[1]]: 🤖, 🌷
             substr(text, 1, 1): 🤖
             substr(text, 2, 2): 🌷

Note that the "woman shrugging" emoji is is a ZWJ sequence combining 🤷 Person Shrugging, 🏻 Light Skin Tone, Zero Width Joiner and ♀️ Female Sign.(https://emojipedia.org/female-sign). Our range suggests the first line contains 5 characters which matches nchar(). To do any better, all the tools involved would need an awareness of Unicode normalization and of graphemes. I'm not pursuing that at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. IIUC, the Javascript string iterator and the R string utilities are aligned in the way they interpret unicode characters, which seems like a happy outcome.

If we want to do better in the future for other backends or specialised consumers of rstudioapi, we could add a parameter to the document request to require positions and ranges in normalised unicode and whatnots.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the Javascript string iterator and the R string utilities are aligned in the way they interpret unicode characters, which seems like a happy outcome

Yes ☝️ that's a good summary.

// The selections in this text editor. The primary selection is always at index 0.
//
// The gymnastics here are so that we return character positions with respect to
// Unicode code points. Otherwise, the native Position type provides offsets with respect to
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this helpful: JavaScript has a Unicode problem.

Another good resource that @lionel- found that supports this approach: https://jupyter-client.readthedocs.io/en/latest/messaging.html#notes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to the Jupyter notes in the comment?

I found this helpful: JavaScript has a Unicode problem.

oh my

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔗 to the Jupyter notes is now added to the comment


// it's surprisingly fiddly to finesse this vscode.EndOfLife enum, which we don't have
// direct access to here
const eolSequenceEnum = EndOfLine.from(editor.document.eol);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I use existing extension host machinery to get usable information out of a vscode.EndOfLine enum.

@jennybc jennybc marked this pull request as ready for review January 31, 2024 00:13
@jennybc jennybc requested review from jmcphers and lionel- January 31, 2024 00:13
src/positron-dts/positron.d.ts Outdated Show resolved Hide resolved
],
"components": {
"schemas": {
"uiTextDocument": {

This comment was marked as resolved.

positron/comms/ui-frontend-openrpc.json Outdated Show resolved Hide resolved
positron/comms/ui-frontend-openrpc.json Outdated Show resolved Hide resolved
// Related to https://github.com/posit-dev/positron/issues/12
type TEC = import('./ui-comm').TextEditorContext;
export type TextEditorContext = TEC;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks for that SO link!

// The selections in this text editor. The primary selection is always at index 0.
//
// The gymnastics here are so that we return character positions with respect to
// Unicode code points. Otherwise, the native Position type provides offsets with respect to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to the Jupyter notes in the comment?

I found this helpful: JavaScript has a Unicode problem.

oh my

Comment on lines +91 to +96
// The gymnastics here are so that we return character positions with respect to
// Unicode code points. Otherwise, the native Position type provides offsets with respect to
// UTF-16 encoded text. That would be confusing for downstream consumers, who probably
// ultimately receive this text as UTF-8 and want to operate on this text in terms of
// as user-perceivable "characters". This only matters when the selection's neighborhood
// includes Unicode characters in the astral plane.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. IIUC, the Javascript string iterator and the R string utilities are aligned in the way they interpret unicode characters, which seems like a happy outcome.

If we want to do better in the future for other backends or specialised consumers of rstudioapi, we could add a parameter to the document request to require positions and ranges in normalised unicode and whatnots.

@jennybc jennybc force-pushed the more-editor-context branch from c810596 to a9775d8 Compare January 31, 2024 19:34
"type": "string",
"description": "URI of the resource viewed in the editor"
},
"eol": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the only reasonable way to make this a "real" enum would be to use enum values like "windows" and "unix" and then replace them with the physical sequences after/before serialization. Definitely not a high priority change.

@jennybc jennybc merged commit e187d12 into main Jan 31, 2024
1 check passed
@jennybc jennybc deleted the more-editor-context branch January 31, 2024 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants