Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mednafen improvements #667

Merged
merged 3 commits into from
Feb 4, 2018
Merged

Conversation

braindx
Copy link
Contributor

@braindx braindx commented Jan 24, 2018

Posting this PR for consideration as discussed in #566

Renderer will now function correctly when an emulator core does not write its final image to the top-left of the frame buffer, as is the case with PSX in Mednafen. As a result, I've removed the "wideScreen" flag as I believe that was hack to try and correct this behavior.

Cores can now implement double buffering and return YES from isDoubleBuffered to have the renderer take advantage of that fact and render in parallel to the emulation when needed. Mednafen now has double buffering implemented.

Replaced the mednafenCoreTiming hack with a call to setGameSpeed each frame as this will ensure that gameInterval stays in sync and it seems to fix various speed issues I was getting with PSX emulation.

Added some NULL checks to fix crashes I ran into in cases where a Mednafen ROM failed to load.

… the lock on it can be released while the data is uploaded to the GPU.
…rite its final image to the top-left of the frame buffer. As a result, I've removed the "wideScreen" flag. Cores can now implement double buffering and return YES from isDoubleBuffered to have the renderer take advantage of that fact and render in parallel to the emulation when needed. Mednafen now has double buffering implemented. I also replaced the mednafenCoreTiming hack with a call to setGameSpeed each frame as this will ensure that gameInterval stays in sync and it seems to fix various speed issues I was getting with PSX emulation.
@jasarien
Copy link
Member

This looks like good work. I’ll try to test it out asap

@JoeMatt JoeMatt mentioned this pull request Jan 27, 2018
@JoeMatt
Copy link
Member

JoeMatt commented Jan 27, 2018

Awesome work. Tested 32x and PSX (Crash Bandicoot Racing) with CRT filter on and both running full 60fps. These are the toughest cores we have. Looking forward to adding some more difficult cores now in the future.

@JoeMatt
Copy link
Member

JoeMatt commented Jan 27, 2018

Only thing I do notice is some horizontal taring in motion scenes. Does the double buffer need a v-sync?

@sevdestruct
Copy link
Collaborator

sevdestruct commented Jan 27, 2018

Looking forward to adding some more difficult cores now in the future.

Thinking to continue your progress on PPSSPP core integration, @JoeMatt ?? 😉

@braindx
Copy link
Contributor Author

braindx commented Jan 27, 2018

As I noticed shortly after creating this PR and as @JoeMatt noted, there are some issues with this. Let me outline the problem I'm solving first and then I'll speak to this PR and what I'm doing about it.

The way Provenance works (before this change), the emulator core locks a mutex while it's emulating a frame and then unlocks it for a few instructions while it computes the FPS, updates input, and (if it has time) sleeps the thread while it waits for the next frame. The renderer (which runs in parallel on another thread), locks that same mutex while it copies the framebuffer that the emulator outputs to the OpenGL driver and issues the rendering commands to draw the frame. Issuing the rendering commands takes basically no CPU time (whether you're using the CRT shader or not), but copying the framebuffer to the driver/GPU does take a few milliseconds and these are milliseconds that the CPU could theoretically be using to already start emulating the next frame rather than sitting there and waiting for the data to copy. Eliminating that stall would potentially allow more computationally expensive cores to run a little faster since they can use that extra time.

I believe the reason we see the performance drops right now with the CRT shader and the PSX core is a compound effect made up of 3 factors. First, the PSX takes more CPU time to emulate and, as a result, it can't afford to spend much time (if any depending on the frame) sleeping the thread while the renderer locks it and uploads the data. Second, (although I haven't checked it) I believe the framebuffer may be larger for the PSX core at times and so takes even longer to copy than other cores. Third, the GPU time spent in the CRT shader plus the time to upload the data exceeds 16.6ms and so, as the rendering falls below 60fps, the render thread isn't always ready to lock the mutex and upload the framebuffer as soon as the emulation core finishes a frame. This delays when the emulator core can begin the next frame.

Ok, so with that out of the way, let me discuss this PR specifically. Primarily, this PR as-is almost completely decouples the rendering thread from the emulator core's thread when the core supports double-buffering (Mednafen only right now). It does this by only locking the mutex for the time it takes to get a pointer to the emulator's frontbuffer (the framebuffer it's not currently writing to). This results in two things, one good and one bad. The good is that the emulation core is free to run faster since the render thread doesn't block it for more than a few instructions. The bad is that if the rendering falls below the emulator's frame rate then it's possible for the emulator to finish the first frame and begin writing a second frame to that framebuffer while it's being copied to the GPU. This is what @JoeMatt noted when he mentioned tearing. It also means that the FPS displayed in the corner no longer represents the true FPS, only the number of frames per second being emulated on the CPU.

I'm working on an update to this PR to do a couple things. One is to further optimize the CRT shader so it can hopefully run at 60fps on all devices. The other is to have the renderer no longer lock that main mutex at all, but instead just lock a mutex for the frontbuffer while it's copying it, which would prevent the tearing/corruption. To prevent stalling the emulation thread, it will also have the emulator avoid swapping the front and back buffers if it can't afford to wait (i.e. if the thread doesn't have any time before it needs to begin the next emulation cycle and the renderer currently has the frontbuffer locked). This means that the renderer might miss/drop a frame here and there but it wouldn't chug the emulation core which could stutter the audio as well. I think this is probably the best solution but I'm open to alternatives and discussion!

…ramerate for double-buffered cores. It will attempt to keep the renderer in sync with the emulation, but if the renderer is too slow the emulation will continue and the renderer might end up rendering frames that are only partially written. Most often this will result in tearing. However, as long as the shaders used in the renderer are performant enough, it will stay in sync. The FPS display will now show one number as long as the threads are synced but will display "RendererFPS (EmulationFPS)" as two numbers side by side when they diverge. Optimized CRT shader to run about 33% faster. Fixed a crash when Mednafen gets called to create an automatic save but no game is loaded.
@braindx
Copy link
Contributor Author

braindx commented Jan 28, 2018

Ok, I've just updated this PR with the changes I mentioned above and a few more fixes. Way better now. Try it out and let me know how it goes and definitely let me know if you have any questions!

@JoeMatt
Copy link
Member

JoeMatt commented Jan 28, 2018

OpenEMU at some point in the last year or so changed their rendering model. I've been looking at it for my N64 port, which is completely broken at this point. They have a 2 mutex model when using cores that render on their own thread. Maybe check out their work in OEOpenGL{2,3}GameRenderer.m
I tried to replicate this in my N64 core since it creates and renders to it's own context in a separate thread and I just needed to copy the texture from one context to the other. Somewhere along the way merging with our main the whole thing broke though, plus I more or less have very little idea what Im' doing in GLES.

@JoeMatt
Copy link
Member

JoeMatt commented Jan 28, 2018

Looks and plays incredible for my on ATV4. Tested all the cores, everything lines up great now across all sizes. Shader makes it feel way more like the real thing.

@braindx
Copy link
Contributor Author

braindx commented Jan 29, 2018

Awesome! Glad that's working!

I took a brief look at that OpenEMU source. Some of what they're doing doesn't apply directly to iOS since they use OpenGL extensions only available on Mac to speed up the framebuffer upload, etc. In terms of the synchronization, I also used two mutexes (one NSLock and one NSCondition) which allows the GPU to wait for the emulation core to finish rendering a frame while at the same allowing the emulation core to skip waiting for the renderer in order to maintain emulation speed, which I think is the ideal compromise. Of course, as long as the renderer doesn't use a shader that's too expensive it shouldn't matter.

Would love to have a working N64 core! If you get it to a point where you just need to get the OpenGL stuff sorted out I'm happy to pull it down and see if I can get that working.

@JoeMatt
Copy link
Member

JoeMatt commented Jan 29, 2018

@braindx Cool yeah after looking at your code see it's the same but different from them. I have a mupen n64 branch that compiles and loads roms. it used to play but stutter and the rendering was kind of glitchy and would eventually crash. I tried copying OpenEMUs multi-threaded double OpenGL context shared texture module for cores that self render using OpenGL in their own context. It kind of worked, but now it conflicts with the updates and I either get deadlocks, or if I remove my locks, I get OpenGL ES errors on the first frame. I had no idea what I was doing and was happy just to get an image on screen a while back. You can see my branch here,

https://github.com/JoeMatt/Provenance/tree/mupen

It's up to date with Master, which is what broke my already non perfect implimentation.
There's also a new graphics plugin called GlideN64 to replace the older Rice plugin that I've been following. I believe it has an ARM port as well for Raspberry Pi,
http://gliden64.blogspot.com

Haven't had a chance to import that in, figure that would follow just getting it to run correctly.

@braindx
Copy link
Contributor Author

braindx commented Jan 29, 2018

Oh, cool! I'll try to pull that down and take a look when I find some time.

@braindx
Copy link
Contributor Author

braindx commented Feb 1, 2018

Hey @JoeMatt, I tried to build your mupen branch but I'm having some issues loading the framework for the video plugin dynamically. I'm not super familiar with Xcode (I normally develop on Windows), so any help/guidance would be appreciated. We can take this conversation offline though. My email address is public on my Github profile if you want to contact me directly.

@JoeMatt
Copy link
Member

JoeMatt commented Feb 1, 2018

it'll load in the simulator. It used to load on device too but I noticed at some point that broke. DYLD used to be completely forbidden in iOS but when they added framework support (iOS 7?) it was added, but only for signed code. Off the top of my head, I'm going to assume maybe the framework isn't getting signed so that's why device won't load it but sim will, since sim doesn't check code signing. I could be completely off base here, but it kind of makes sense to me in my head without actually looking at it.

@JoeMatt
Copy link
Member

JoeMatt commented Feb 1, 2018

If you don't want to futz with getting it to code sign and load in device I'd suggest just running off of sim for now. Worst case scenario, I can hack the lib to be static and modify mupen that way, since we don't even need the ability for users to select different plugins since we'll only support one for each of audio, input, gfx.

@braindx
Copy link
Contributor Author

braindx commented Feb 2, 2018

I have it running on the simulator but need to do some OpenGL debugging which I can only do on-device. Have you been able to get the dlopen call to succeed on the AppleTV? If so, what are the steps? I tried setting up code-signing for the plugin but it didn't seem to do the trick. Is there an email I can hit you up at or a Slack where we could have this conversation in a more appropriate forum?

@braindx
Copy link
Contributor Author

braindx commented Feb 2, 2018

I've been able to get dlopen working on my AppleTV by deriving the path from [NSBundle mainBundle] instead of coreBundle. Not sure if that's correct, but it's allowed me to proceed. I'm now looking into glsl compile errors generated by the mupen rice plugin when it generates shaders at runtime. I believe the iOS GLSL compiler is stricter than others.

@braindx
Copy link
Contributor Author

braindx commented Feb 3, 2018

Ok, I managed to get Mupen rendering and it renders to a double-buffered texture so it should still work with the CRT shader and stuff like that. I've pushed it to the mupen-rendering branch on my GitHub fork so you can check it out. Still work needing to be done for input and whatnot. Let me know if you want me to take a crack at that or if you want to take that on. Unfortunately, the mupen-rendering branch got mixed with this pull request, so sorry about not keeping that separate but I think we'll want both anyway.

@braindx
Copy link
Contributor Author

braindx commented Feb 3, 2018

I've updated my mupen-rendering branch again, simplifying the rendering pipeline even further and fixing some rendering bugs. I finally realized you had changed the screen update timing config in Mupen from the way it was set up in OpenEmu, but changing it back seemed necessary to get it rendering correctly when it's double-buffered.

One thing to note: it doesn't seem to work in release unless I turn dynarec off (switch it to the pure interpreter) and wrap the #include "interpreter.def" line with #pragma clang optimize off/on, so something is going on with the codegen. Not sure if you guys have any experience dealing with stuff like that.

@jasarien
Copy link
Member

jasarien commented Feb 3, 2018

I could be wrong, but if the dynarec has anything to do with JIT, it’s unlikley to work on a non jailbroken device as JIT is disabled for apps in iOS.

@XeresRazor
Copy link

XeresRazor commented Feb 3, 2018 via email

@braindx
Copy link
Contributor Author

braindx commented Feb 3, 2018

Ah, right. Duh. Then the only issue is the one of compiler optimization messing up the interpreter somehow. On the audio side, sound is a bit warmly and input isn't hooked up yet but I haven't looked into either yet.

Side note: I don't believe you can add the dynamic-codesigning entitlement because Xcode won't allow you to build with that unless that entitlement is also present in your provisioning profile. As far as I know, there's no way to add it to that.

@braindx
Copy link
Contributor Author

braindx commented Feb 3, 2018

Ok, just pushed fixes for the audio and input to my mupen-rendering branch. I think it just needs a little bit of polish for resetting the emulator, switching roms, haven't checked savestates, etc. but it's pretty functional. Should I think about opening a pull request when I get it there?

@jasarien
Copy link
Member

jasarien commented Feb 4, 2018

@braindx did you mention that the mupen stuff was added to this PR? I don’t see any related commits to the mupen stuff only the double buffering

@braindx
Copy link
Contributor Author

braindx commented Feb 4, 2018

No, no, the other way around. This PR is just the double-buffering stuff, but my mupen-rendering branch has the mupen core and this PR in it.

@jasarien
Copy link
Member

jasarien commented Feb 4, 2018

Ah, I see, that’s fine.

@jasarien jasarien merged commit bbbafbc into Provenance-Emu:master Feb 4, 2018
@JoeMatt
Copy link
Member

JoeMatt commented Feb 5, 2018

Wow that's pretty awesome @braindx. I've been working on this for over a year now and have been begging someone with some OpenGL ES knowledge to figure out the remaining issues. This is pretty awesome stuff. If it's working and runnable we can work together on any remaining issues. I'm taking a look right now but couldn't wait to comment when I saw the notifications of progress. This discussion should probably continue in the existing ticket though, #17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants