Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add netpacket interface (request for comments) #15413

Merged
merged 11 commits into from
Jun 29, 2023

Conversation

schellingb
Copy link
Contributor

Description

This PR adds a new libretro interface for a core to send and receive custom network packets. This can be used to implement a communication based multiplayer system instead of using the default state serialization based multiplayer. Connection management is still done by the frontend while a core gains the ability to easily support tunneling of multi-console data communication traffic.

I have implemented this interface in a version of my DOSBox Pure core (in this branch) and tested it in a few configurations. I was able to successfully play DOS and Windows 9x multiplayer games that use IPX networking, Ethernet networking as well as Null-modem serial based communication. I connected a Windows PC, Linux desktop VM, Raspberry Pi and an Android phone and they all could play together which was quite exciting :-)

This all builds on the existing netplay feature of RetroArch. I cannot say I understand much of its inner workings but all my changes have been done very carefully to not change any of the existing behavior UNTIL a core passes this new interface with the new environment callback RETRO_ENVIRONMENT_SET_NETPACKET_INTERFACE. There is no change to UI, serialization or the core info files. The netplay feature acts different if the new environment callback is called or everything happens as it does now.

The new interface is fairly high level:

  • Connection management and client/host handshaking are done by the frontend and are transparent to the core (all using the existing RetroArch netplay protocol)
  • Once a netplay session has been started on a host, or a client has successfully connected to a host, the core gets informed about netplay having started
  • The frontend passes the core a function pointer that can be used to send a network packet containing up to 64kb of data
  • The host can send packets to any client, but a client can only send a packet to the host
  • Any number of packets can be sent and they will arrive reliably in sequential order without packet loss
  • The frontend passes incoming packets to the core
  • The core running on the host is informed about players connecting and leaving. No details like user name or IP addresses are shared with the core, only a numerical "client id"

Before merging

While this is tested and working for me and my use case, ideally (though not required) this would only be merged once there is a use case besides the core I'm developing. The interface and its feature set are not set in stone and more flexibility could be added to make this further usable.

Related Issues

Related Pull Requests

Reviewers

Any feedback welcome!

Especially from other core maintainers interested in this and especially especially from anybody who knows the inner workings of netplay_frontend.c.

Adds a new libretro interface for a core to send and receive custom network packets for implementing a communication based multiplayer system instead of using the default state serialization based multiplayer. Connection management is still done by the frontend while a core gains the ability to easily support tunneling of multi-console data communication traffic.
* Packets sent with this interface arrive at this callback in a reliable
* manner, meaning in the same order they were sent and without packet loss.
*/
typedef void (RETRO_CALLCONV *retro_netpacket_receive_t)(const void* buf, size_t len, uint16_t client_id);
Copy link
Contributor

@JesseTG JesseTG Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you specify requirements for the transport layer somewhere? Cores don't need to care, but frontends will so they can connect to other frontends. "Reliable" could mean TCP, WebSockets, something custom on top of UDP...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, ideally retro_netpacket_send_t would have something like a enum retro_netpacket_flag flag argument which could be set to various modes of sending (i.e. reliable, unreliable, unsequenced). But at least in the implementation I can contribute it will be limited to what RetroArch netplay currently can do, which is only reliable as it goes over TCP.

Maybe we add the flag for now for future expansion but only have reliable as an option? Or nothing and have reliable be the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we add the flag for now for future expansion but only have reliable as an option?

I like this idea, but I still think you should explicitly document how the protocol (at the application level) works. You might say "TCP", and that's fine, but what can other frontends expect in the payload? Just the buffer? Some preceding metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the interface stands now it is all transparent to the core. The core sends 10 bytes with retro_netpacket_send_t, the other side will receive that 10 bytes with retro_netpacket_receive_t. By design the core shouldn't care what communication was used to send the packet. It could be pigeon post :-)
This is what I meant with this part of the spec:

* The frontend will take care of connecting players together.
* The core only needs to send the actual data as needed for the
* emulation while handshake and connection management happens in
* the background.

A frontend other than RetroArch could use websockets to transmit the data, or basing it on a library like ENET. In the end it doesn't matter if it's TCP or UDP or something else. This is not meant to connect frontends together. If another frontend wants to support netplay with RetroArch, then it needs to follow the specs of netplay in RetroArch - which are not part of libretro.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new commit c0f85f8 I added a flags argument to retro_netpacket_send_t with the following options:

/* Netpacket flags for retro_netpacket_send_t */
#define RETRO_NETPACKET_UNRELIABLE  0        /* Packet to be sent unreliable, depending on network quality it might not arrive. */
#define RETRO_NETPACKET_RELIABLE    (1 << 0) /* Reliable packets are guaranteed to arrive at the target in the order they were send. */
#define RETRO_NETPACKET_UNSEQUENCED (1 << 1) /* Packet will not be sequenced with other packets and may arrive out of order. Cannot be set on reliable packets. */

Also I extended the comment for retro_netpacket_send_t with this:

 * A frontend must support sending of reliable packets (RETRO_NETPACKET_RELIABLE).
 * Unreliable packets might not be supported by the frontend but the flags can
 * still be specified, reliable transmission will be used instead.

Because NetPlay in RetroArch can only send packets reliable. Other frontends may use communication over something like UDP which would enable the other reliability modes.

* This function is not guaranteed to be thread-safe and must be called during
* retro_run or any of the netpacket callbacks passed with this interface.
*/
typedef void (RETRO_CALLCONV *retro_netpacket_send_t)(const void* buf, size_t len, uint16_t client_id, bool broadcast);
Copy link
Contributor

@JesseTG JesseTG Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of a use case for all clients being able to talk to each other (rather than just the host). This thread on Discord has the details, but in a nutshell:

  • Handheld consoles like the Nintendo DS or Game Boy support local multiplayer via link cables or custom wireless protocols.
  • These protocols usually have extremely tight latency requirements that can't be met over the Internet.
  • The best general solution (currently adopted by at least one Game Boy core) is for all clients to emulate all consoles, and synchronize the results via rollback.
  • Not to be confused with true Internet connectivity, like the Wi-fi services offered by the Wii and DS. The original hardware had to account for Wi-fi latency, so Wi-fi can be emulated without rollback tricks.

Given this, a practical use case for P2P communication would be to send savestates of all consoles to all other participants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be feasible to allow clients talking to clients via this directly.
In RetroArch netplay this always will have to relayed via the host (i.e. CLIENTA -> HOST -> CLIENTB) but perhaps in another frontend it could be a different kind of network setup.
Currently in the implementation I did in my DOS core I have such a packet relaying done inside the core, which is aware if it acts as the host and it does the packet relaying there. But we could move that into the frontend and change the API to allow a client talking to another client. I'll see if that can be done.

As for using this in a tight latency requirements, an easy first try might be to do lockstep without rollback. This new interface has this specified:

* When 2 or more players are connected and this interface has been
* set, time manipulation features (pausing, slow motion, fast forward,
* rewinding, save state loading, etc.) are disabled to not interrupt
* communication.

So when this is set up and cores can talk to each other, the frontend will not mess with timing. Thus a core could take over and only advance emulation when the data over the emulated local wi-fi or link cable has arrived.
Once that works, rollback could be implemented on top of that. Don't lockstep anymore but roll back when wifi/cable data from the past arrives.

As for sending whole savestates over this, the limitation of 64kb of data per packet might be annoying. Though splitting it up into multiple packets wouldn't be too hard. The 64kb limit I chose pretty arbitrarily, we could even remove it and have the frontend realloc the buffer it uses for this once a core wants to send a bigger packet at once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new commit c0f85f8 now adds the option for clients to send packets directly to other clients.

This means all peers are equal in regards to what retro_netpacket_send_t can do. Anyone can send a packet to any other peer as well as send broadcasts to everyone connected.

Because in RetroArch netplay clients don't connect to each other (they only connect to the host) it is implemented by having the host relay packets as needed. If the core on client A sends a packet addressed to client B, the netplay host will relay that packet but it won't call retro_netpacket_receive_t locally. Another frontend could have a mesh connection where clients can directly talk to each other, resulting in the core experiencing the same behavior.

@anzz1
Copy link

anzz1 commented Jun 21, 2023

The ability to send/receive custom packets as an idea definitely makes sense and is a good addition ! 👍
The code looks pretty solid to me, however in my opinion the approach needs rethinking.

First of all, intermingling the new interface code along with the old is hacky and results in nonsensical code flow making it complicated and hard to follow. Adding a if (networking_driver_st.core_netpacket_interface) return; in functions like netplay_sync_post_frame() and send_input_frame() makes no sense, as those functions shouldn't be called when using the interface in the first place.
Those are the examples, but similar problems permeate the code as a result of approaching the problem from the wrong angle.
In my opinion, refactoring is necessary.

The way I see it, there's two options to approach this which make sense:

  1. Keep the new interface. In this case, do not intermingle the two and use hacky constructs like returns on top of functions. Instead, refactor the code as so that it's clear that there are now two interfaces. Move each of the "state-sync" & "netpacket" interfaces in their own files, and call their functions from the main netplay code accordingly. Functions which aren't used should not be called.

  2. Lose the new interface. In this case, you could use the pre-existing netplay command interface (i.e. NETPLAY_CMD_PLAYER_CHAT), adding something like NETPLAY_CMD_CUSTOM_PACKET and then following it with the custom data. No need for a new interface or "mode" there, and you could send those custom packets even if the state syncing is enabled. Whether the state is synced or not, could be controlled by a configuration option named as such.

@schellingb
Copy link
Contributor Author

schellingb commented Jun 21, 2023

@anzz1 Thanks for the detailed feedback! I do agree my approach to modifying netplay_frontend.c (which already has 10k lines of code) is somewhat hacky... But if there is no one around that knows the code, it really is hard to add something to it without a high chance of breaking existing stuff. Because I don't know much about the existing netplay implementation I tried to make as few changes as possible. The main argument there is so someone can read through it and state with a certain degree of confidence "Yes, this doesn't break anything existing".

Overall I do think the approach chosen here is close to your 2. suggestion. It does use a new netplay cmd which I named NETPLAY_CMD_NETPACKET to transmit the data. A lot of the existing commands used for connection management and handshaking are still used (ACK, CONNECT, DISCONNECT, NICK, PASSWORD, INFO, PLAY, MODE, PLAYER_CHAT, etc.) while some specific cmds to the state serialize based approach are unused while the new interface is active (REQUEST_SAVESTATE, LOAD_SAVESTATE, NETPLAY_CMD_CRC`, etc.).

I can give it another pass to make code flow a bit more reasonable and make more sense. I also think accessing networking_driver_st in all the checks I added is not very clean. I will bite the bullet and add a bool to struct netplay and use that in my checks instead. Or maybe a enum, something like NETPLAY_CONDUCT_INPUTSERIALIZATION and NETPLAY_CONDUCT_PACKET? There already is something called "mode" and "protocol".

@anzz1
Copy link

anzz1 commented Jun 21, 2023

I definitely understand the decision that you'd rather not go route 1) and agree with the points made.
However, you are also making the arguments why precisely you can't just tack on more code which'd make it more confusing.
It is a problem largely present in libretro cores too, original developers long gone and then new features are added without a proper refactor or hard look how it fits into the bigger picture, many of the cores then ultimately ending up a completely unmaintainable and undecipherable mess.

Like said, if you were to go with the option 2), in my opinion you should drop the "interface" moniker and mindset altogether, which naming it "conduct" is just same thing with different shoes and doesn't address the issue at all. Having two different modes/interfaces/conducts in the same, already large and complex file and set of functions definitely veers the whole thing towards unmaintainability.

To further explain the intention what I meant by 2), I'll try to elaborate with some pointers:

  • remove the mode, the whole idea of 2) is that you can't tack on a mode on top of another mode and finding another way. for keeping the mode, that's what route 1 is for.
  • instead, use a custom command to send custom packets, whenever, as there is no mode
  • for the parts which you need to disable, add new configuration options with descriptive names, something like NETPLAY_SHARE_STATE and NETPLAY_SHARE_INPUTS or something like that. this ultimately achieves same thing as disabling them with the current "mode", but makes the code much more readable and understandable.

Also I am not a maintainer on this project, so don't take this as approval of method 2).
I just made comments which could make method 2) acceptable in my opinion, but even with the proposed changes it's still inherently messier than 1) which the maintainers might not approve and require a proper refactor isntead, which would be understandable.

The idea is good and your code quality looks on point. However, the implementation still needs some thought.
I wouldn't go too hard yet towards 2) until hearing the maintainers' opinion on the approach.

You had an excellent point there with the "Yes, this doesn't break anything existing" , which is a good guiding principle to use. Just add a "Yes, this is just as understandable and maintainable as before" as another guidance, and code which follows those two principles are fit for large projects where you aren't going to be around forever.

@schellingb
Copy link
Contributor Author

For clarity, "interface" and "mode" (or "conduct") are not the same thing.

The "interface" is an API interface between the frontend and the core. Which did not exist before and nothing similar did as before this PR netplay was the frontend doing things without the core even knowing it is being played in netplay.

The "mode" is the two different ways netplay can operate once fully connected. In one mode, the frontend is using save states of cores in a clever way to add online play without the cores knowing. It does so by having multiple devices each getting control of one emulated controller and synchronizing the states between them. All connected devices show the same screen, each device controls a different input channel. This is used to netplay games that originally were played with multiple controllers on one console (one TV). In the new mode I'm adding the startup process is still the same (how devices connect together), but then instead of sharing save states and input device data the core is aware and can share data. In this mode each device still runs independently. This would most likely be used to netplay games that originally were played with multiple linked consoles/computers (multiple TVs/monitors).

until hearing the maintainers' opinion on the approach

As much as I'd love that I think that's impossible though. As far as I know the author of the netplay feature is long long gone and someone who took over also left a while ago. Yes, tacking new stuff onto code which we don't understand isn't great. But I fear it's either that or we're stuck without supporting multiplayer in a core like my DOS core. I will give a fair shot to making things a bit more concise. But I admit I don't understand the whole 10k lines of code of netplay_frontend.c. So if I were asked why send_input_frame() is called from 5 places, or what the use of a function like netplay_sync_pre_frame is I could not give an answer. If I could I'd much more likely be inclined to refactor things properly.

I really appreciate you chiming in though because you are 100% correct about how cores and parts of a big long lasting project like RA can become very hard to maintain. I fully admit I'm being selfish with wanting this feature now primarily for my core (though I think it is far more useful for others than another libretro extension I contributed that is only used by my core... shhh...).

Though honestly I would say "this is just as understandable and maintainable as before". My changes are minimal and I added comments to everything that isn't inherently clear. I'm not convinced having 50 changes that use an enum or variables named after the code comment I wrote makes my part more understandable and maintainable than the 30 changes I currently have. But I'm clearly biased because I wrote the 30 changes and not the 10k lines around it. Can I go and refactor the 10k lines and make them more understandable? Maybe with time...

Sorry for the wall of text but again thanks for your time for writing here!

@schellingb
Copy link
Contributor Author

@anzz1 In the commit b20e3e6 and ed4db42 I did some clarifications to my changes part of this PR. Most notably the addition of this enum with clear description what the 2 different modes of operation are:

enum netplay_modus
{
   /* Netplay operates by having all participants send input data every
      frame and run cores deterministically in sync on all connected devices.
      It will rewind frames when input data from the past arrives. */
   NETPLAY_MODUS_INPUT_FRAME_SYNC = 0,

   /* Netplay operates by having the active core send and receive custom
      packets once connection setup and handshake has been completed.
      Time skips (pausing, fast forward, save state loading) are refused. */
   NETPLAY_MODUS_CORE_PACKET_INTERFACE = 1
};

Some of this is repeating the specs in libretro.h but it seems good to clarify things with this.

The if (x) return; at the top of functions has been removed and code flows more clearly. Also access to the global networking_driver_st has been severely reduced with the new enum being part of the (usually local) netplay variable.

At this point it's much more clear which functionality are shared between the two modes and which parts are exclusive to one of the two. A further clarification could maybe be achieved by splitting up the large file "netplay_frontend.c" into 3 files "netplay_frontend.c" (shared functionality) "netplay_input_frame_sync.c" (functionality exclusive to NETPLAY_MODUS_INPUT_FRAME_SYNC) and "netplay_core_packet_interface.c" (functionality exclusive to NETPLAY_MODUS_CORE_PACKET_INTERFACE). But I think this could be part of a future PR.

I have locally annotated exclusive functions with NETPLAY_ASSERT_MODUS(X) macros which result in an assertion error were these functions used while in the wrong mode. But RetroArch code generally never uses asserts so I'm not sure this should be part of this PR.

- Enable core host to refuse connecting new players to limit the number of connected players
- Enable a core to flush outgoing packets and read incoming packets without waiting for the next frame (can be used for lower latency or blocking reads)
@schellingb
Copy link
Contributor Author

I added some more possibilities to the API for more use cases of this feature in commit 356f31b:

  • Enable core host to refuse connecting new players to limit the number of connected players
  • Enable a core to flush outgoing packets and read incoming packets without waiting for the next frame (can be used for lower latency or blocking reads)

I used these to convert the existing direct socket communication in gambatte-libretro to this new interface: schellingb/gambatte-libretro@7a4bc19

Until now the direct socket communication required 14 core options (server/client mode, server port and 12 options to enter an IP address). Now with this it's all based on RetroArch's netplay feature so the core doesn't need these options or platform dependent code (which is actually broken in its current form (on Windows at least) as reported in libretro/gambatte-libretro#201).

@schellingb
Copy link
Contributor Author

I attempted the same in the new melonDS DS core to emulate local wireless multiplayer. But that target has an even stricter latency requirement than the target of gambatte, so it doesn't work very well. Even running through a local loopback the TCP based netplay socket communication adds too much delay that emulation drop from 60 FPS to 30 FPS while multiplayer communication is active.

In the end, for consoles with very strict low latency communication, the only practical solution seems to be what TGB Dual does. It simulates multiple full devices on all participants and runs netplay via the regular (existing) player input synchronizing modus.

Other potential candidates for this would be PUAE or VICE. These both emulate a serial device which could be used on original hardware to connect two computers together for multiplayer games.

@schellingb
Copy link
Contributor Author

Here's the code of my attempt on melonDS DS: schellingb/melonds-ds@004d3fb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants