Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in nimlangserver when -d:nimOptIter is defined #24094

Open
jmgomez opened this issue Sep 10, 2024 · 16 comments
Open

Segfault in nimlangserver when -d:nimOptIter is defined #24094

jmgomez opened this issue Sep 10, 2024 · 16 comments
Labels
Closures Documentation Content Related to documentation content (not generation). Iterators Threads

Comments

@jmgomez
Copy link
Collaborator

jmgomez commented Sep 10, 2024

Description

Download and build latest https://github.com/nim-lang/langserver
Remove the lock file and the requires from the nimlangserver.nimble so it uses the latest Nim to build.
Run nimble build
Open a nim project with the vscode extension (make sure the built nimlangserver is the one used)

It will crash with the following (you can see the output in the Nim Language Server tab or Nim Lsp if you are running a newer version)

Works fine with 2.0.8 and with 2.0.9

Traceback (most recent call last)
/Volumes/Store/Nim/lib/system/deepcopy.nim(190) genericDeepCopy
/Volumes/Store/Nim/lib/system/deepcopy.nim(132) genericDeepCopyAux
/Volumes/Store/Nim/lib/system/deepcopy.nim(72) genericDeepCopyAux
/Volumes/Store/Nim/lib/system/deepcopy.nim(68) genericDeepCopyAux
/Volumes/Store/Nim/lib/system/deepcopy.nim(153) genericDeepCopyAux
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Nim Version

Nim Compiler Version 2.1.99 [MacOSX: arm64]
Compiled at 2024-09-10
Copyright (c) 2006-2024 by Andreas Rumpf

git hash: baec1955b5453ec71fc12355142dd9a813fa02fb

Current Output

No response

Expected Output

No response

Known Workarounds

No response

Additional Information

No response

@metagn
Copy link
Collaborator

metagn commented Sep 11, 2024

When trying to reproduce I didn't get a stacktrace, only "out of memory".

@jmgomez
Copy link
Collaborator Author

jmgomez commented Sep 11, 2024

hmm In what OS are you testing? Do you get that same issue in version-2.0 too?

There is proc raiseOutOfMem() {.noinline.} = maybe you could add a stacktrace there

@metagn
Copy link
Collaborator

metagn commented Sep 11, 2024

hmm In what OS are you testing? Do you get that same issue in version-2.0 too?

Windows; no; with refc/arc it gives the segfault exit code (as opposed to orc). The above stacktrace should only be reachable in refc from my understanding since usrToCell is not declared otherwise.

I'm out of my depth here, I thought maybe a type graph change or cycle was the cause, I also don't know where a deep copy could be happening

@Araq
Copy link
Member

Araq commented Sep 11, 2024

@ringabout @metagn Please bisect it.

@jmgomez
Copy link
Collaborator Author

jmgomez commented Sep 11, 2024

Windows; no; with refc/arc it gives the segfault exit code (as opposed to orc). The above stacktrace should only be reachable in refc from my understanding since usrToCell is not declared otherwise.

I got the stacktrace in MacOs. Yes, refc is used

@metagn

This comment was marked as outdated.

@metagn
Copy link
Collaborator

metagn commented Sep 11, 2024

Bisected to 05df263 (commit history was messed up the first time), @yglukhov any idea what the problem could be?

@metagn
Copy link
Collaborator

metagn commented Sep 14, 2024

The type that deepcopy is iterating over during the segfault is tyRef. #23787 changes the creation of the environment ref type here. This might not have anything to do with it, but I tried to stress the added condition by mixing proc/iterator closures in the same owner context. I noticed this behavior difference:

proc foo() =
  let x = 123
  iterator bar2(): int {.closure.} =
    yield x
  proc bar() =
    let z = bar2
    for y in z(): # just doing bar2() gives param not in env: x
      echo y
  bar()

foo()

The output of this is now 1 when it used to be 0. Both are invalid, it should be 123. This is the only difference I've found that gives a runtime difference versus failing to compile. There is no problem when a closure iterator calls a closure proc, only when a closure proc calls a closure iterator.

Changing foo to be a closure iterator crashes the compiler with all permutations now, but is different before #23787. This used to give a codegen error, now segfaults when getEnvParam calls getEnvTypeForOwner:

iterator foo(): int {.closure.} =
  let x = 123
  iterator bar2(): int {.closure.} =
    yield x
  proc bar() =
    let z = bar2
    for y in z():
      echo y
  bar()
  yield x

for x in foo(): echo x

And this used to work and give the correct output but now also segfaults:

iterator foo(): int {.closure.} =
  let x = 123
  proc bar() =
    echo x
  iterator bar2(): int {.closure.} =
    bar()
    yield x
  for y in bar2():
    yield y

for x in foo(): echo x

I wouldn't even know where to start to try and minimize the segfault in nimlangserver, but the first closure proc/iterator mixing example seems realistic to encounter in async code. It was broken before too but maybe it zeroing made it more managable.

metagn added a commit to metagn/Nim that referenced this issue Sep 15, 2024
Araq pushed a commit that referenced this issue Sep 16, 2024
… enabled (#24108)

refs #24094, soft reverts #23787

#23787 turned out to cause issues as described in #24094, but the
changes are still positive, so it is now only enabled if compiling with
`-d:nimOptIters`. Unfortunately the changes are really interwoven with
each other so the checks for this switch in the code are a bit messy,
but searching for `nimOptIters` should give the necessary clues to
remove the switch properly later on.

Locally tested that nimlangserver works but others can also check.
@yglukhov
Copy link
Member

The issue is in copy codegen that violates memory, which was present before #23787, but was revealed likely because the memory layout changed.
If i take nim version 051a536 (just before my changes), and compile the following

proc foo() =
  let x = 123
  iterator bar2(): int {.closure.} =
    yield x
  proc bar() =
    let z = bar2
    for y in z(): # just doing bar2() gives param not in env: x
      echo y
  bar()

foo()

with gcc, I'm getting codegen error:

~/.cache/nim/test2_d/@mtest2.nim.c:330:70: error: passing argument 2 of ‘eqcopy___test50_u113’ from incompatible pointer type [-Wincompatible-pointer-types]
  330 |         nimln_(79);     eqcopy___test50_u113(&(*colonenv_).colonup_, colonenv_);

Take a careful look at this line:

 eqcopy___test50_u113(&(*colonenv_).colonup_, colonenv_);

It tries to copy colonenv_ into a field of itself, which is obviously wrong.

Unfortunately I'm not quite familiar with this logic, can anyone provide any input?

@Araq
Copy link
Member

Araq commented Sep 19, 2024

It's not "obviously wrong", closure environments can be self-referencing which is why you need a cycle collector for them. This is even more true for the so called "up" pointers that are so complex that I keep forgetting how they work.

@yglukhov
Copy link
Member

Not pretending to understand, but my thinking was that "ups" always point to the parent closure env and "ups" can never be cyclic. Sure, cycles can form, just not through "ups". Am I wrong here? Also the pointer type mismatch kinda hints that this particular line is fishy at least?

@Araq
Copy link
Member

Araq commented Sep 19, 2024

Also the pointer type mismatch kinda hints that this particular line is fishy at least?

Sure, yes.

@metagn metagn changed the title SIGSEGV: Illegal storage access deep copy Segfault in nimlangserver when -d:nimOptIter is defined Sep 30, 2024
@metagn metagn mentioned this issue Oct 10, 2024
11 tasks
yglukhov added a commit to yglukhov/Nim that referenced this issue Oct 16, 2024
yglukhov added a commit to yglukhov/Nim that referenced this issue Oct 16, 2024
@yglukhov
Copy link
Member

I've added a fix (#24316) for all the cases suggested by @metagn. @jmgomez Could you please verify? @metagn just to double check, were you able to reproduce the originally reported stack trace?

yglukhov added a commit to yglukhov/Nim that referenced this issue Oct 16, 2024
@metagn
Copy link
Collaborator

metagn commented Oct 16, 2024

Langserver seems to have circumvented the original issue in the master branch. Unfortunately, for the commit nim-lang/langserver@8839357 which I was testing on, #24316 does not fix it in my reproduction, the minimizations above were probably not correct.

Running nimble --nim:... test in langserver also reproduces the issue in tsuggestapi, VS Code isn't needed.

It crashes right before this call which was removed in this commit. The full log from langserver in VS Code (before restarts) is:

DBG Starting nimlangserver                     params="(clientProcessId: none(int), transport: some(stdio), port: 0)"
DBG Starting stdio server                     
DBG [Processsing Message]                      request="\"initialize\""
DBG Initialize received...                    
DBG Registering monitor for process            pid=25960
DBG Initialize completed. Trying to start nimsuggest instances
DBG [Processsing Message]                      request="\"initialized\""
DBG Client initialized.                       
DBG Requesting configuration from the client  
DBG [Processsing Message]                      request="\"textDocument/didOpen\""
DBG New document opened for URI:               uri=file:///test.nim
DBG Auto-guessing project file for             file="test.nim"
DBG getProjectFile                             project="test.nim" fileUri="test.nim"
DBG Document associated with the following projectFile uri=file:///test.nim projectFile="test.nim"
DBG Will create nimsuggest for this file       uri=file:///test.nim
DBG ShowMessage                                message="Using Nim Compiler Version 2.2.0 [Windows: amd64] from nim-2.2.0/bin"
INF Starting nimsuggest                        root="test.nim" timeout=120000 path="nim-2.2.0/bin/nimsuggest" workingDir="..."
DBG Parsing nimsuggest capability              capability=con
DBG Parsing nimsuggest capability              capability=exceptionInlayHints
DBG Parsing nimsuggest capability              capability=unknownFile
DBG Nimsuggest Capabilities                    capabilities="{con, exceptionInlayHints, unknownFile}"
out of memory
[Error - 3:52:39 PM] Server process exited with code 1.
[Info  - 3:52:39 PM] Connection to server got closed. Server will restart.
true

It seems like neither thread created above it actually starts, and not creating the threads does not cause the issue. Just creating a thread that takes a tuple[] argument and does nothing also causes the crash.

@yglukhov
Copy link
Member

Ok, finally managed to reproduce, thanks @metagn.

Now, TLDR. This issue should be fixed not by reverting my pr, but in any other suitable way!))

Long description:
There are a few problems I'm seeing here.

  1. Nim's threading api requires the thread object to exist while the thread is running. This was indeed the case before my optimization, but now - not so, because the thread object disappears right before the await after createThread. To put it bluntly, the threading API has been misused. Whether this api is good is a question for another issue.
  2. Nim performs the deepCopy of thread arguments inside the thread, which is wrong, because the arguments can mutate by this time. Instead the deepCopy should be performed before the thread is started. In this particular case this issue is not as relevant, but again, a nice question for another issue.

The statements above can be easily backed by one of the following experiments:

  1. Mark thread objects with {.liftLocals.} pragma, which puts them into the async function env. Note that this pragma should not be used in production, as it is not documented, and it will most likely be removed in the future, but it's perfect for the demo)
  2. Insert a sleep(100) after all the createThreads. It gives the threads enough time to perform the aforementioned deepCopy.

Finally I suggest closing this issue for the reason of misused threading api.

@metagn
Copy link
Collaborator

metagn commented Oct 17, 2024

Sounds reasonable, in the worst case maybe a changelog entry for the new behavior and disclaimer.

Nim's threading api requires the thread object to exist while the thread is running.

It uses a potentially escaping address for this. If we don't change this it should at least still be documented in createThread, something like "the address of the thread object should be available throughout the thread's lifetime".

We can track this in this issue too since this is the root problem and the iterator changes were only tangential, but closing it is fine too.

@metagn metagn added the Threads label Oct 17, 2024
Araq added a commit that referenced this issue Oct 18, 2024
…24316)

The first commit reverts the revert of #23787.
The second fixes lambdalifting in convolutedly nested
closures/closureiters. This is considered to be the reason of #24094,
though I can't tell for sure, as I was not able to reproduce #24094 for
complicated but irrelevant reasons. Therefore I ask @jmgomez, @metagn or
anyone who could reproduce it to try it again with this PR.

I would suggest this PR to not be squashed if possible, as the history
is already messy enough.

Some theory behind the lambdalifting fix:
- A closureiter that captures anything outside its body will always have
`:up` in its env. This property is now used as a trigger to lift any
proc that captures such a closureiter.
- Instantiating a closureiter involves filling out its `:up`, which was
previously done incorrectly. The fixed algorithm is to use "current" env
if it is the owner of the iter declaration, or traverse through `:up`s
of env param until the common ancestor is found.

---------

Co-authored-by: Andreas Rumpf <[email protected]>
@metagn metagn added the Documentation Content Related to documentation content (not generation). label Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closures Documentation Content Related to documentation content (not generation). Iterators Threads
Projects
None yet
Development

No branches or pull requests

4 participants