Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Savestate Loading doesn't work in Celeste64, causes SIGSEGV in ThreadManager.cpp #591

Open
jakobhellermann opened this issue Feb 1, 2024 · 3 comments

Comments

@jakobhellermann
Copy link

When I start the game without gdb, save a state and then load it, one of messages gets printed:

[f:70 t:17123M] ERROR (checkpoint/Checkpoint.cpp:702): mprotect(saved_area.addr, saved_area.size, saved_area.prot | PROT_WRITE) == 0 failed in readAnArea with error Cannot allocate memory
free(): corrupted unsorted chunks
recv() returns 0 -> socket closed

or

[f:154 t:17453M] Saved state 1 of size 116621852 in 0.481466 seconds
malloc(): mismatching next->prev_size (unsorted)

or

[f:240 t:17673M] Saved state 1 of size 130795131 in 0.497735 seconds
malloc(): invalid size (unsorted)

or

[f:114 t:18044M] Saved state 2 of size 115862809 in 0.485308 seconds
malloc(): unsorted double linked list corrupted

or just

recv() returns 0 -> socket closed
Got wrong message after state loading

When I start the game in gdb, I get this backtrace

Thread 1 "Celeste64" received signal SIGSEGV, Segmentation fault.
libtas::ThreadManager::getThreadTid () at checkpoint/ThreadManager.cpp:106
106	       return current_thread->tid;
(gdb) br
Breakpoint 1 at 0x7ffff7c47a2e: file checkpoint/ThreadManager.cpp, line 106.
(gdb) bt
#0  libtas::ThreadManager::getThreadTid () at checkpoint/ThreadManager.cpp:106
#1  0x00007ffff7c2a06a in libtas::debuglogfull (lcf=lcf@entry=1028, file=file@entry=0x7ffff7d6741b "checkpoint/Checkpoint.cpp", line=line@entry=672) at logging.cpp:109
#2  0x00007ffff7c447d9 in libtas::reallocateArea (current_area=0x7ffff72c3070, saved_area=0x7ffff72c34d0) at checkpoint/Checkpoint.cpp:672
#3  libtas::readAllAreas () at checkpoint/Checkpoint.cpp:482
#4  libtas::Checkpoint::handler (signum=<optimized out>, info=<optimized out>, ucontext=0x7ffff7300480) at checkpoint/Checkpoint.cpp:332
#5  <signal handler called>
#6  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=31, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7  0x00007ffff76ac8a3 in __pthread_kill_internal (signo=31, threadid=<optimized out>) at pthread_kill.c:78
#8  0x00007ffff765c668 in __GI_raise (sig=31) at ../sysdeps/posix/raise.c:26
#9  0x00007ffff7c47086 in libtas::SaveStateManager::restore (slot=<optimized out>) at checkpoint/SaveStateManager.cpp:379
#10 0x00007ffff7c261df in libtas::receive_messages (hud=..., draw=...) at frame.cpp:822
#11 libtas::frameBoundary(std::function<void ()>, libtas::RenderHUD&) (draw=..., hud=...) at frame.cpp:457
#12 0x00007ffff7c8e9cd in libtas::SDL_GL_SwapWindow (window=<optimized out>) at sdl/sdlwindows.cpp:100
#13 0x00005554d74103ab in ?? ()
#14 0x00000000000003e8 in ?? ()
#15 0x0000555555f90a58 in ?? ()
#16 0xffffffffffffffff in ?? ()
#17 0x00005554d6b2ddf0 in ?? ()
#18 0x00007ffff7e1b5a0 in ?? () from /usr/local/bin/libtas.so
#19 0x00007fffffffc490 in ?? ()
#20 0x00005554d74103ab in ?? ()
#21 0x00007fffffffc530 in ?? ()
#22 0x00007fbfda039f10 in ?? ()
#23 0x00007fbfda0079b0 in ?? ()
#24 0x0000000000000000 in ?? ()

if (current_thread)
return current_thread->tid;

There's a null check, so maybe this is related to the fact that current_thread is a thread_local?

@clementgallet
Copy link
Owner

One of the issues is present when enabling the checkpoint log, because appending the log line to ImGui log window may cause a malloc. However, no memory allocation must be performed during checkpoint code, nothing that can alter the process memory mapping.

But the main issue lies in the following line:

MYASSERT(mprotect(saved_area.addr, saved_area.size, saved_area.prot | PROT_WRITE) == 0)

Knowing that in Celeste64, one of the memory segments has a size of 274 GB! Adding the write flag commits the allocated memory, hence the memory allocation error.

I could fix it already, I'm only calling the mprotect() function on each individual page when I know I will write to it, instead of on the whole memory segment. One issue is that it leads to more fragmented Virtual Memory Areas (as seen in /proc/PID/maps ). This is because, even when two areas share the same protection flags, they are not merged if one has the write flag once and was removed later (there seems to be a recent fix).

This is related to a more general issue that I couldn't really solve. I would really like to be able to see if a memory page is mapped to a physical page or not. There is the present flag in /proc/PID/pagemap, see doc that I'm checking, and is handled by the option Runtime > Savestates > Skip Unmapped Pages . But this flag does not seem to be perfectly robust:

When I implemented the above fix, the 274GB segment got fragmented multiple times, which should mean that some memory pages where committed because they are flagged as present. However, when running the game natively, the huge memory segment is not fragmented, which should mean that it was never written to?

We would have more understanding if we could look at /proc/PID/kpageflags file, to get a flag of the physical memory page, but this requires CAP_SYS_ADMIN (basically root) because of some vulnerabilities.

@jakobhellermann
Copy link
Author

Interesting. After that change, I can now savestate (with skip unmapped pages at least, without that option I get "state invalid because new threads were created" 100% of the time).

When launching with GDB, I still get the ThreadManager::getThreadTid segfault consistently, but without gdb everything works as expected.

@clementgallet
Copy link
Owner

The state invalid because new threads were created is another issue: .NET threadpool is creating and destroying threads depending on tasks given to them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants