Piero V.

WebRTC on the server side with Pion and FFI

WebSockets and WebRTC

At the beginning of the year, I played a little bit with WebSockets and Rust. I tried to create a backend for a browser game implemented with Three.js.

However, things did not work very well. A friend and I did some tests, but his connection was underperforming, which is actually an advantage for testing. His latency was very high, so packets kept accumulating and were then applied all at once. Moreover, I had implemented everything with unbounded queues, so at a certain point the server program running on my PC occupied several GB of RAM. Good thing we found all these problems early.

The first suggestion you find when you start programming multiplayer games is that you should use UDP if you can. So, I finally decided to try WebRTC.

I soon discovered that WebSocket and WebRTC are completely different. The former is a protocol for full-duplex communication, usually between a browser and a web server. The latter is a set of APIs for real-time communication, ranging from media capture, streams, and screen sharing to data channels, which is the capability I was interested in.

RTCPeerConnection is at the heart of all these features. As its name states, it is a peer-to-peer connection built with ICE, STUN, TURN, and/or other protocols. Notice that WebRTC does not need to run on UDP: it can fall back to TCP, at least from one peer to the TURN server. Anyway, even if you use it on a server with a public IP address, you will have to create a connection in this way, and there is no way to use a fixed port number.

As a consequence, WebRTC alone is not enough. First, it needs an ICE server (otherwise, only local communication will be available), but Google’s or other public ones are okay. Then it requires an additional channel for signaling: one of the peers creates an offer and sends it to the other one using the signaling channel, which is then used by the other peer to send back an answer. After this exchange, the rest of the communications can be carried on with WebRTC.

HTTP is a great candidate for signaling, and probably you are already using it if you are considering WebRTC. Still, it involves additional code and dependencies.

Go

At that point, I was working in Rust. Hence, the obvious choice would have been WebRTC.rs. However, I was a bit tired of Arc, mutexes, etc… And my experience with hyper had not been great, either. Thus, I went directly to the origin: Pion, a Go WebRTC implementation.

I had never used Go before. I only knew it was a bit different from other languages, that lightweight coroutines (goroutines) were first-class citizens, and that it was well regarded for networking applications.

After trying it, I understood why. Writing Go felt almost as fast as writing JavaScript… with the exception that Go is statically typed and compiled into native binaries. It is higher-level than other compiled languages, thus possibly slower (one reason above all is that Go is garbage collected). I have not looked for benchmarks, though. In any case, I expect the language’s performance to be the least concern, given all the delays and latencies that networking introduces.

Another advantage of Go is that the resulting binary can be either an executable or a library (both static and dynamic). And any language with a C FFI can interact with the latter! My only complaint about this feature is that it abuses the comment syntax.

Finally, Go has an HTTP server in its standard library. And it is trivial to use, thanks to its sensible API.

The grand plan

I tried to keep my plan very easy. I thought of exposing only 4 functions with FFI.

  1. WebrtcServerRun: a blocking runner. It really does everything. Every other function or callback sends data to it using thread-safe channels, so I could avoid additional synchronization mechanisms. (And FFI cannot access Go’s channels directly).
  2. WebrtcSend: send a message to one peer. It accepts a binary buffer and its size. First, it copies it to a buffer managed by Go, then it pushes it to the channel so the caller can immediately free the memory it used.
  3. WebrtcBroadcast: broadcast a message to all peers. Calling WebrtcSend for each peer could have worked as well, but it would have involved multiple copies of the same message.
  4. WebrtcServerStop: stop the runner.
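The copy-then-enqueue behavior described for WebrtcSend can be sketched in plain Go. This is only an illustration under assumptions: the names PeerID, outgoing, sendCh, and Send are hypothetical, the FFI layer is left out, and the channel is buffered only so the demo works without a runner goroutine.

```go
package main

import "fmt"

// PeerID mirrors the 16-byte identifier the article uses for peers.
type PeerID [16]byte

// outgoing is a hypothetical message type carried by the channel.
type outgoing struct {
	to   PeerID
	data []byte
}

// sendCh is a hypothetical channel read by the runner. It is buffered here
// only so that this demo works without a runner goroutine.
var sendCh = make(chan outgoing, 1)

// Send copies buf into Go-owned memory and hands the copy to the runner.
// After it returns, the caller may reuse or free buf.
func Send(to PeerID, buf []byte) {
	owned := make([]byte, len(buf))
	copy(owned, buf)
	sendCh <- outgoing{to: to, data: owned}
}

func main() {
	buf := []byte("hello")
	Send(PeerID{1}, buf)
	buf[0] = 'X' // the caller reuses its buffer right away

	msg := <-sendCh
	fmt.Println(string(msg.data)) // prints "hello": the queued copy is unaffected
}
```

A broadcast variant would make the same single copy and let the runner hand it to every peer, which is the saving mentioned for WebrtcBroadcast.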

The communication is bidirectional. I decided to use function pointers to handle the other way. The runner needs 3 callbacks:

  1. a hello function, called when a peer connects;
  2. a message function, called when a peer sends a message;
  3. a bye function, called when a peer disconnects.

All these functions block Go, so implementations could just push the events to other queues managed by the main program. In that case, be sure to copy the received data in the message callback: Go retains ownership of it, so the buffer may already have been freed or reused by the time the message is processed.

Peers need a way to be identified across all these operations. I opted for random UUIDs (as uint8_t[16] for FFI), but you may want to do something else, especially if you authenticate your users first.
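As an illustration, a random UUID that fits in a uint8_t[16] can be generated with the standard library alone. This is a sketch, not necessarily how the downloadable code does it; PeerID and NewPeerID are names made up for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// PeerID holds the raw 16 bytes, matching uint8_t[16] on the C side.
type PeerID [16]byte

// NewPeerID returns a random (version 4) UUID as raw bytes.
func NewPeerID() (PeerID, error) {
	var id PeerID
	if _, err := rand.Read(id[:]); err != nil {
		return id, err
	}
	id[6] = (id[6] & 0x0f) | 0x40 // set the version field to 4
	id[8] = (id[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return id, nil
}

// String formats the ID in the usual 8-4-4-4-12 hexadecimal form.
func (id PeerID) String() string {
	h := hex.EncodeToString(id[:])
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32]
}

func main() {
	id, err := NewPeerID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Because the type is a fixed-size array, it is comparable and can be used directly as a map key for the peer list.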

The HTTP server

My web server handles only one resource, /, and it accepts only two methods: OPTIONS and POST. The former is needed only for CORS (I used localhost with two different ports for development), and the latter is for the real WebRTC signaling.
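The method handling can be sketched with net/http as follows. It is a simplified stand-in rather than the actual server: the handler name signal, the permissive CORS headers, and the placeholder reply are assumptions, and the WebRTC part is omitted entirely.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// signal handles the single resource "/". The allowed origin and the reply
// body are placeholders chosen for this sketch.
func signal(w http.ResponseWriter, r *http.Request) {
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}
	// CORS headers, needed when the page is served from another port.
	w.Header().Set("Access-Control-Allow-Origin", "*")
	w.Header().Set("Access-Control-Allow-Methods", "POST, OPTIONS")
	w.Header().Set("Access-Control-Allow-Headers", "Content-Type")

	switch r.Method {
	case http.MethodOptions:
		w.WriteHeader(http.StatusNoContent) // preflight: headers only
	case http.MethodPost:
		offer, err := io.ReadAll(r.Body)
		if err != nil || len(offer) == 0 {
			http.Error(w, "missing offer", http.StatusBadRequest)
			return
		}
		// A real server would create the peer connection here and reply
		// with its answer; this sketch only acknowledges the offer.
		fmt.Fprintf(w, "received %d bytes", len(offer))
	default:
		w.Header().Set("Allow", "POST, OPTIONS")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// statusFor runs one request through the handler and returns the status code.
func statusFor(method, path, body string) int {
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	rec := httptest.NewRecorder()
	signal(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodOptions, "/", ""))
	fmt.Println(statusFor(http.MethodPost, "/", "offer"))
	fmt.Println(statusFor(http.MethodGet, "/", ""))
}
```

To serve real traffic, the same handler would be registered with http.HandleFunc("/", signal) and started with http.ListenAndServe.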

My code should look quite standard: it goes through the needed phases, and sends the data required to initialize the communication, when available, or some HTTP error, in case any happens.

I also set up the event callbacks in the HTTP response handler. In particular, the runner is informed of the new peer only when it opens the correct data channel. The peer has 30 seconds to open it after receiving our answer, or we disconnect it. I have implemented this mechanism with a select between a timeout and a channel in a goroutine.
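The timeout mechanism is a common Go idiom, sketched below. The function name waitForOpen and the channel are hypothetical, and the demo uses a short timeout instead of 30 seconds.

```go
package main

import (
	"fmt"
	"time"
)

// waitForOpen reports whether opened was signaled before the timeout expired.
func waitForOpen(opened <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-opened:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	opened := make(chan struct{})
	result := make(chan bool)

	// Watchdog goroutine: in a real server, a false result would lead to
	// closing the peer connection.
	go func() { result <- waitForOpen(opened, 50*time.Millisecond) }()

	close(opened) // simulate the data channel's open callback firing
	fmt.Println("opened in time:", <-result)
}
```

Closing the channel, rather than sending on it, means the open callback never blocks even if the watchdog has already timed out.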

One of the parts that could especially be improved is the gathering of local candidates. But of course, probably all the code can be improved since it was my first Go program 😁️.

The channels

The runner is an infinite loop with a select that goes through a few channels. A pair is used for messages from the server to the clients: one for direct messages and one for broadcasts. The relay mechanism uses a list of peers, which is updated through another pair of channels when a peer connects or disconnects. Finally, a channel is used to stop the machinery.
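A loop of this shape could look like the sketch below. All type and field names are hypothetical, and a plain function stands in for a Pion data channel; the point is only that the peer list is touched by a single goroutine, so it needs no mutex.

```go
package main

import "fmt"

type PeerID [16]byte

// SendFunc stands in for a connected data channel.
type SendFunc func(data []byte)

type direct struct {
	to   PeerID
	data []byte
}

type joined struct {
	id   PeerID
	send SendFunc
}

// Runner groups the channels; only Run touches the peer list.
type Runner struct {
	Direct    chan direct
	Broadcast chan []byte
	Join      chan joined
	Leave     chan PeerID
	Stop      chan struct{}
}

func NewRunner() *Runner {
	return &Runner{
		Direct:    make(chan direct),
		Broadcast: make(chan []byte),
		Join:      make(chan joined),
		Leave:     make(chan PeerID),
		Stop:      make(chan struct{}),
	}
}

// Run blocks until a value arrives on Stop.
func (r *Runner) Run() {
	peers := map[PeerID]SendFunc{}
	for {
		select {
		case m := <-r.Direct:
			if send, ok := peers[m.to]; ok {
				send(m.data)
			}
		case data := <-r.Broadcast:
			for _, send := range peers {
				send(data)
			}
		case j := <-r.Join:
			peers[j.id] = j.send
		case id := <-r.Leave:
			delete(peers, id)
		case <-r.Stop:
			return
		}
	}
}

func main() {
	r := NewRunner()
	done := make(chan struct{})
	go func() { r.Run(); close(done) }()

	printer := func(name string) SendFunc {
		return func(data []byte) { fmt.Printf("%s <- %s\n", name, data) }
	}
	r.Join <- joined{id: PeerID{1}, send: printer("alice")}
	r.Join <- joined{id: PeerID{2}, send: printer("bob")}
	r.Direct <- direct{to: PeerID{1}, data: []byte("hi")}
	r.Broadcast <- []byte("to everyone")
	r.Stop <- struct{}{}
	<-done
}
```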

Currently, they are unbuffered, which means that they are synchronous! This may become a problem, but I have not encountered any issue in my (shallow) tests. Making channels buffered is super easy, if needed, but might require some testing to find the correct size.
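The difference is easy to observe with a non-blocking send. In this small sketch, trySend is a helper written for the example and the capacity of 2 is arbitrary.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 2)

	fmt.Println(trySend(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, 1))   // true: stored in the buffer
	fmt.Println(trySend(buffered, 2))   // true: the buffer is now full
	fmt.Println(trySend(buffered, 3))   // false: a plain send would block here
}
```

So a buffer only postpones the blocking: once it fills up, senders wait exactly as they do with an unbuffered channel.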

Events generated by a client (connections, disconnections, messages) are handled directly in Pion’s callbacks, so the C functions need to manage any possible race. The implementation must support multiple producers because goroutines can run on several system threads.

Caveats of Cgo

I started from Pion’s c-data-channel example to write my code. It has been illuminating: I learned Go cannot call C function pointers directly, so I adopted the same trick of creating inline functions to call them in the C code compiled by Go.

Actually, my original idea was to declare some extern callbacks, create a static library and let the linker do its job, not using function pointers. Sadly, this is impossible because Go tries to link the code also when creating static libraries. It is a known issue/request. A workaround is to mess with the linker’s flags. I think function pointers are cleaner, at this point.

At the moment, many channels are stored in global variables: C code cannot store Go pointers because of the garbage collector. Using Handles may work, but I discovered them only while writing this article.

Another caveat is that Go’s slices paired with the unsafe package are wrappers around the original buffers, rather than copies! The result is better performance, but it also means that C may free them before they are used. This could lead to segmentation faults, use-after-free bugs, or delivery of completely wrong messages because the memory area has been reused. Easy to happen, much harder to debug 😖️.
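The aliasing can be reproduced without any C code. In this sketch a Go array stands in for a buffer owned by C, and view and snapshot are names made up for the example; with real C memory, cgo's C.GoBytes is the usual way to get a copy.

```go
package main

import (
	"fmt"
	"unsafe"
)

// view wraps n bytes starting at p without copying (requires Go 1.17+).
func view(p *byte, n int) []byte {
	return unsafe.Slice(p, n)
}

// snapshot copies the same bytes into memory owned by a new slice.
func snapshot(p *byte, n int) []byte {
	out := make([]byte, n)
	copy(out, unsafe.Slice(p, n))
	return out
}

func main() {
	// foreign stands in for a buffer owned by C code.
	foreign := [5]byte{'h', 'e', 'l', 'l', 'o'}

	v := view(&foreign[0], len(foreign))
	s := snapshot(&foreign[0], len(foreign))

	// Simulate the owner reusing its buffer for something else.
	copy(foreign[:], "XXXXX")

	fmt.Println(string(v)) // prints "XXXXX": the view follows the buffer
	fmt.Println(string(s)) // prints "hello": the copy keeps the message
}
```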

Finally, Cgo does not care about const-correctness. Pointer arguments will be taken as mutable, even if you just read the data without writing. But symbol names are not mangled, so instead of using the header generated by Go, you could write your own function declarations in which the arguments are declared const. (Provided, of course, you maintain const-correctness in your Go code).

Show me the code!

Sure! You can download it here. While I started by reading Pion’s example, I think I have only copied the trick of the intermediate function to call pointers. Anyway, Pion’s examples are released under the MIT license, which should be liberal enough for everyone, and releasing my code under the same license is okay for me. We can talk if you need something else.

You can compile it with these steps:

go mod init webrtcserver # Or whatever you prefer
go mod tidy # Fill go.mod and create go.sum with the dependency information
go build -o libwebrtc.a -buildmode=c-archive webrtc.go # Instead of c-archive you might prefer c-shared
strip libwebrtc.a # Optional: on my Linux x86_64 libwebrtc.a was initially 31.5MB, and 9MB after strip