In-depth: Writing your own WebSocket server

In this reprinted <a href="http://altdevblogaday.com/">#altdevblogaday</a> in-depth piece, SCEA's senior developer support engineer John McCutchan explains how WebSocket works, how to implement your own server, and lessons he's learned from the protocol.

January 25, 2012

Author: John McCutchan

The WebSocket protocol has applications beyond plain vanilla web development. I will explain how the protocol works and how to implement your own server, and share some insights I had along the way. Before we get down and dirty, I will explain what I've been doing with it.

At this point I expect many of you are saying, "I'm not working on a web game; this doesn't seem relevant to me." Well, neither am I. I embed a WebSocket server in my game engine and use a local web application, talking over the WebSocket protocol, to control, configure, and monitor the engine. Some concrete examples of what I've done so far:

  • Monitor memory allocation statistics

  • Monitor performance of subsystems

  • Set and query configuration variables

  • Live edit the world

  • Preview loaded assets

Soon, I will attempt to stream the game's display to the web browser and stream mouse and keyboard data back to the game. In other words, remote desktop for the game engine. Another use case I would like to investigate is writing unit tests for the engine in JavaScript and driving it from the web browser. The possibilities are endless.

Another benefit of this approach is that your development UI is platform-independent. This is nice when you are developing a title for many architectures. Consoles, for example, are harder to interact with because the target usually has no keyboard or mouse input; moving your UI to the web browser avoids the problem.

Hang in there, this article is about the plumbing. My next article will be about the above applications.

WebSocket is a communication protocol that allows for bi-directional text and binary message passing. The client is a JavaScript application running inside a web browser and, typically, the server is a web server. I say 'typically' because that is not how I use it. WebSocket was developed because sending data between the web server and web application over HTTP was inefficient. The WebSocket protocol is very bandwidth efficient (message framing is at most 14 bytes) and the payloads are custom to the application.

A WebSocket connection begins life as a regular HTTP connection, which is then upgraded from HTTP to WebSocket. This upgrade is one-way: you can't revert to an HTTP connection. You can read more about WebSocket on Wikipedia, and the complete specification is available as RFC 6455. The WebSocket protocol was finalized only recently, in December 2011.

Okay, let's dive into how WebSocket works. I will cover creating a connection, sending and receiving messages, and responding to pings.

Connecting

A WebSocket connection is initiated by the client sending the following upgrade request:

       GET /servicename HTTP/1.1
       Host: server.example.com
       Upgrade: websocket
       Connection: Upgrade
       Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
       Origin: http://example.com

The server responds with:

       HTTP/1.1 101 Switching Protocols
       Upgrade: websocket
       Connection: Upgrade
       Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
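
On the server side, this exchange amounts to pulling Sec-WebSocket-Key out of the request text and writing back the four response lines. The following is a minimal sketch, assuming the whole request has already been read into a std::string. FindHeaderValue and BuildHandshakeResponse are hypothetical helper names, not part of the author's server, and the header lookup is case-sensitive for brevity even though real HTTP header names are not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the value of header `name` from a raw HTTP
// request, or an empty string when the header is absent.
// Simplification: the name match is case-sensitive.
std::string FindHeaderValue(const std::string& request, const std::string& name) {
  assert(!name.empty());
  // Headers always start on a new line, so anchor the search on CRLF.
  const std::string needle = "\r\n" + name + ":";
  std::size_t pos = request.find(needle);
  if (pos == std::string::npos) return "";
  pos += needle.size();

  std::size_t end = request.find("\r\n", pos);
  if (end == std::string::npos) end = request.size();

  // Trim spaces and tabs around the value.
  const std::size_t first = request.find_first_not_of(" \t", pos);
  if (first == std::string::npos || first >= end) return "";
  const std::size_t last = request.find_last_not_of(" \t", end - 1);
  return request.substr(first, last - first + 1);
}

// Hypothetical helper: assembles the 101 response shown above around an
// already computed Sec-WebSocket-Accept value.
std::string BuildHandshakeResponse(const std::string& accept_value) {
  return "HTTP/1.1 101 Switching Protocols\r\n"
         "Upgrade: websocket\r\n"
         "Connection: Upgrade\r\n"
         "Sec-WebSocket-Accept: " + accept_value + "\r\n\r\n";
}
```

The blank line (the final "\r\n\r\n") ends the HTTP response; everything sent after it is WebSocket framing.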

Most of these HTTP fields are self-explanatory, but Sec-WebSocket-Key and Sec-WebSocket-Accept are not. Sec-WebSocket-Key is a string sent by the client as a challenge to the server. So how does the server calculate the value of Sec-WebSocket-Accept and complete the challenge? It is quite simple. The server takes Sec-WebSocket-Key and concatenates it with a GUID string from the WebSocket specification. It then computes the SHA-1 hash of the resulting string, and Sec-WebSocket-Accept is the base64 encoding of that hash. Let's work through an example:

  SpecificationGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
  FullWebSocketKey = concatenate(Sec-WebSocket-Key, SpecificationGUID);
    // dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11
  KeyHash = SHA-1(FullWebSocketKey);
    // 0xb3 0x7a 0x4f 0x2c 0xc0 0x62 0x4f 0x16 0x90 0xf6 0x46 0x06 0xcf
    // 0x38 0x59 0x45 0xb2 0xbe 0xc4 0xea
  Sec-WebSocket-Accept = base64(KeyHash);
    // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
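
For readers who want to run the steps above, here is a self-contained sketch in C++. It is not the author's code: Sha1, Base64Encode and ComputeAcceptKey are illustrative names, and the SHA-1 and base64 routines are plain textbook implementations included only so the example has no dependencies. In a real engine you would more likely call whatever crypto library you already ship.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::uint32_t RotateLeft(std::uint32_t value, int bits) {
  return (value << bits) | (value >> (32 - bits));
}

// Textbook SHA-1 over a byte string; returns the 20-byte digest.
std::array<std::uint8_t, 20> Sha1(const std::string& input) {
  std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

  // Padding: 0x80, zeros up to 56 mod 64, then the bit length (64-bit big-endian).
  std::vector<std::uint8_t> data(input.begin(), input.end());
  const std::uint64_t bit_length = static_cast<std::uint64_t>(input.size()) * 8;
  data.push_back(0x80);
  while (data.size() % 64 != 56) data.push_back(0x00);
  for (int shift = 56; shift >= 0; shift -= 8)
    data.push_back(static_cast<std::uint8_t>(bit_length >> shift));

  for (std::size_t block = 0; block < data.size(); block += 64) {
    std::uint32_t w[80];
    for (int i = 0; i < 16; ++i) {
      const std::uint8_t* p = &data[block + i * 4];
      w[i] = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
             (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
    }
    for (int i = 16; i < 80; ++i)
      w[i] = RotateLeft(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; ++i) {
      std::uint32_t f, k;
      if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
      else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
      const std::uint32_t temp = RotateLeft(a, 5) + f + e + k + w[i];
      e = d; d = c; c = RotateLeft(b, 30); b = a; a = temp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
  }

  std::array<std::uint8_t, 20> digest{};
  for (int i = 0; i < 5; ++i)
    for (int j = 0; j < 4; ++j)
      digest[i * 4 + j] = static_cast<std::uint8_t>(h[i] >> (24 - 8 * j));
  return digest;
}

// Standard base64 with '=' padding.
std::string Base64Encode(const std::uint8_t* bytes, std::size_t length) {
  assert(bytes != nullptr || length == 0);
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  for (std::size_t i = 0; i < length; i += 3) {
    const std::size_t remaining = length - i;
    std::uint32_t chunk = std::uint32_t(bytes[i]) << 16;
    if (remaining > 1) chunk |= std::uint32_t(bytes[i + 1]) << 8;
    if (remaining > 2) chunk |= std::uint32_t(bytes[i + 2]);
    out += kAlphabet[(chunk >> 18) & 0x3F];
    out += kAlphabet[(chunk >> 12) & 0x3F];
    out += remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=';
    out += remaining > 2 ? kAlphabet[chunk & 0x3F] : '=';
  }
  return out;
}

// Sec-WebSocket-Accept = base64(SHA-1(client key + specification GUID)).
std::string ComputeAcceptKey(const std::string& client_key) {
  static const char kGuid[] = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
  const std::array<std::uint8_t, 20> digest = Sha1(client_key + kGuid);
  return Base64Encode(digest.data(), digest.size());
}
```

Calling ComputeAcceptKey("dGhlIHNhbXBsZSBub25jZQ==") should reproduce the s3pPLMBiTxaQ9kYGzzhZRbK+xOo= value from the worked example.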

Transmission

WebSocket is a message-based protocol. Each message begins with a header defining the length of the message, the type (text, binary, or control), and other metadata. The payload immediately follows the header. All incoming messages (those sent from client to server) include a 32-bit mask, which must be applied to the entire payload with an XOR operation. Each message has a different mask. The mask is not encryption, since it travels in the same header as the data it masks; the specification requires it so that a malicious page cannot choose the exact bytes that appear on the wire and use them to confuse intermediaries such as caching proxies.

The header begins with a fixed 16-bit field, followed by up to 12 bytes of optional header. The 16-bit field indicates whether this is the final fragment of a message (messages can be split into fragments), the op-code, whether a mask is present, and the payload length.

The payload length field plays double duty. For small messages (125 bytes or fewer) it is the length of the message, but for longer messages it acts as a flag indicating how large the extended payload length field is. The extended payload length follows immediately after the first 16 bits of the header (it comes before the mask). When payload length is equal to 126, the extended payload length is 16 bits, and when it is equal to 127, the extended payload length is 64 bits.

WebSocket op-codes are split into three categories: continuation, non-control, and control. Continuation and non-control op-codes indicate user messages, while control op-codes are used to manage the connection itself. Presently the following op-codes are defined:

  Op-code   Meaning
  0x0       Message continuation [continuation]
  0x1       Text message [non-control]
  0x2       Binary message [non-control]
  0x8       Connection Close [control]
  0x9       Ping [control]
  0xA       Pong [control]
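
As a small illustration of the op-code table and the XOR masking rule from the Transmission section, the sketch below names the op-codes and unmasks a payload in place. OpCode, IsControl and ApplyMask are hypothetical names chosen for this example. The sketch treats the 32-bit mask as four separate bytes in wire order, which avoids any byte-order questions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The op-codes currently defined, as listed in the table above.
enum class OpCode : std::uint8_t {
  kContinuation = 0x0,
  kText = 0x1,
  kBinary = 0x2,
  kClose = 0x8,
  kPing = 0x9,
  kPong = 0xA,
};

// Control op-codes are those with the top bit of the 4-bit field set
// (0x8 through 0xF).
bool IsControl(OpCode op) {
  return (static_cast<std::uint8_t>(op) & 0x8) != 0;
}

// XORs the payload in place with the 4-byte mask: payload byte i is paired
// with mask byte i % 4. Applying the same mask twice restores the input.
void ApplyMask(std::uint8_t* payload, std::size_t length, const std::uint8_t mask[4]) {
  assert(payload != nullptr || length == 0);
  for (std::size_t i = 0; i < length; ++i) {
    payload[i] ^= mask[i % 4];
  }
}
```

Because XOR is its own inverse, the same function serves for masking and unmasking.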

Once you have parsed the header, extracting the payload is trivial. Do not forget to XOR in the mask. Parsing the header is made interesting by the fact that its size and layout are variable and thus cannot be mapped directly to a C structure. Or can it?

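#include <cstddef>
#include <cstdint>

// NOTE: bit-field layout is implementation-defined. The ordering below
// assumes a little-endian target whose compiler allocates bit-fields from
// the least significant bit upward, so that the first two bytes on the wire
// can be overlaid directly.
struct WebSocketMessageHeader {
  union {
    struct {
      unsigned int OP_CODE : 4;  // byte 0, bits 0-3: op-code
      unsigned int RSV3 : 1;     // byte 0, bit 4: reserved
      unsigned int RSV2 : 1;     // byte 0, bit 5: reserved
      unsigned int RSV1 : 1;     // byte 0, bit 6: reserved
      unsigned int FIN : 1;      // byte 0, bit 7: final fragment
      unsigned int PAYLOAD : 7;  // byte 1, bits 0-6: length, or the 126/127 flag
      unsigned int MASK : 1;     // byte 1, bit 7: mask present
    } bits;
    uint16_t short_header;
  };

  size_t GetMessageLength() const;
  size_t GetPayloadOffset() const;
  size_t GetPayloadLength() const;
  uint32_t GetMask() const;
  uint8_t GetOpCode() const;
  bool IsFinal() const;
  bool IsMasked() const;
  // …
};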
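
The article does not show the bodies of these accessors. As a rough sketch of the offset arithmetic they need, here is a byte-wise version that reads the same fields with shifts and masks instead of bit-fields, so it does not depend on compiler bit-field layout. FrameLayout and ReadFrameLayout are hypothetical names for this example, not the author's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Where each part of a frame sits, measured from the first header byte.
struct FrameLayout {
  bool final_fragment;
  std::uint8_t op_code;
  bool masked;
  std::uint64_t payload_length;
  std::size_t mask_offset;     // meaningful only when masked is true
  std::size_t payload_offset;  // equals the total header size
};

// Fills `out` from the bytes at `data`. Returns false when fewer than a full
// header's worth of bytes is available yet, so the caller can wait for more.
bool ReadFrameLayout(const std::uint8_t* data, std::size_t available, FrameLayout* out) {
  assert(data != nullptr && out != nullptr);
  if (available < 2) return false;

  out->final_fragment = (data[0] & 0x80) != 0;
  out->op_code = data[0] & 0x0F;
  out->masked = (data[1] & 0x80) != 0;
  const std::uint8_t short_length = data[1] & 0x7F;

  // 126 and 127 are flags for a 16-bit or 64-bit extended length field.
  std::size_t extended_bytes = 0;
  if (short_length == 126) extended_bytes = 2;
  else if (short_length == 127) extended_bytes = 8;

  const std::size_t header_size = 2 + extended_bytes + (out->masked ? 4 : 0);
  if (available < header_size) return false;

  std::uint64_t length = short_length;
  if (extended_bytes != 0) {
    length = 0;
    for (std::size_t i = 0; i < extended_bytes; ++i) {
      length = (length << 8) | data[2 + i];  // network (big-endian) byte order
    }
  }

  out->payload_length = length;
  out->mask_offset = 2 + extended_bytes;  // the mask follows the extended length
  out->payload_offset = header_size;
  return true;
}
```

The largest case, a masked frame with a 64-bit length, gives a payload offset of 2 + 8 + 4 = 14, which is the 14-byte maximum framing overhead mentioned earlier.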

A WebSocketMessageHeader will always be at least 16 bits long, so the only data element defined inside the struct is short_header. Accessing the mask, the extended payload length, or the payload is done with an offset from &short_header. When I want to parse a header, I simply do this (the cast assumes incoming_buffer is a byte array):

  WebSocketMessageHeader* header = reinterpret_cast<WebSocketMessageHeader*>(&incoming_buffer[read_index]);

I found this to be a very clean approach, and it is generally useful when dealing with structures that do not have a fixed length or layout.

Messages can be split into multiple fragments. When this happens, the final-fragment (FIN) bit is zero on every fragment except the last. The first fragment carries the op-code indicating either a text (0x1) or binary (0x2) message, and the remaining fragments carry the continuation op-code (0x0).

Ping pong

The protocol supports ping (0x9) and pong (0xA) messages. When a ping message has a payload, the resulting pong message must have an identical payload. If more than one ping arrives before you have answered, you are only required to pong the most recent one.

Server design

Finally, I want to describe the high-level design of my WebSocket server. My server uses three buffers: one for incoming WebSocket data, one for outgoing WebSocket data, and one to store fully parsed incoming messages. An outline of the API:

class WebSocketServer {
public:
  WebSocketServer();
  int AcceptConnection(TcpListener* listener);
  int CloseConnection();
  void Update();

  int SendTextMessage(const char* msg);
  int SendTextMessage(const char* msg, size_t msg_length);

  uint64_t PendingMessageCount() const;
  void ProcessMessages(OnMessageDelegate del, void* userdata);
  void ClearMessages();
};
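
Before a call such as SendTextMessage can put anything in the outgoing buffer, the text has to be wrapped in a frame header using the length rules from the Transmission section. The sketch below shows one way to do that, and reuses the same routine to answer a ping. EncodeFrame, EncodeTextFrame and EncodePong are hypothetical free functions, not the author's implementation. Frames are written unmasked because, under the specification, only client-to-server frames carry a mask, and each message is sent as a single final fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds one unmasked, unfragmented frame: header followed by the payload.
std::vector<std::uint8_t> EncodeFrame(std::uint8_t op_code, const void* payload,
                                      std::size_t length) {
  assert(payload != nullptr || length == 0);
  std::vector<std::uint8_t> frame;
  frame.push_back(static_cast<std::uint8_t>(0x80 | (op_code & 0x0F)));  // FIN + op-code

  if (length <= 125) {
    frame.push_back(static_cast<std::uint8_t>(length));
  } else if (length <= 0xFFFF) {
    frame.push_back(126);  // flag: 16-bit extended length follows
    frame.push_back(static_cast<std::uint8_t>(length >> 8));
    frame.push_back(static_cast<std::uint8_t>(length & 0xFF));
  } else {
    frame.push_back(127);  // flag: 64-bit extended length follows
    const std::uint64_t wide = length;
    for (int shift = 56; shift >= 0; shift -= 8) {
      frame.push_back(static_cast<std::uint8_t>(wide >> shift));
    }
  }

  const std::uint8_t* bytes = static_cast<const std::uint8_t*>(payload);
  frame.insert(frame.end(), bytes, bytes + length);
  return frame;
}

// Text message (op-code 0x1).
std::vector<std::uint8_t> EncodeTextFrame(const char* msg, std::size_t msg_length) {
  return EncodeFrame(0x1, msg, msg_length);
}

// Pong (op-code 0xA) echoing the payload of the ping being answered.
// Control frames are limited to 125 payload bytes by the specification.
std::vector<std::uint8_t> EncodePong(const std::uint8_t* ping_payload, std::size_t length) {
  assert(length <= 125);
  return EncodeFrame(0xA, ping_payload, length);
}
```

For example, EncodeTextFrame("Hello", 5) yields the seven bytes 0x81 0x05 followed by the five characters of the text.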

NOTE: I've trimmed a bunch of trivial methods from the outline and left only the ones worth discussing.

Connection handling

It is important to decouple listening for a connection over TCP from the WebSocket server itself. Each instance of WebSocketServer is responsible for only one client. This keeps the code and resource allocation simple. A higher-level system should manage multiple connection requests and multiple live WebSocket connections.

Updating

My WebSocket server has a single Update method. This method pumps the connection: it is responsible for sending any pending messages, receiving any new messages (ultimately moving them to the message buffer), and updating status flags (connection opened, connection closed, connection error).

Message processing

Complete incoming messages are stored in their own buffer. When the engine is ready to process incoming messages, it calls ProcessMessages and passes in a delegate function. The WebSocketServer iterates over all messages in the buffer and calls this delegate for each one. When the engine is done with the messages, they must be cleared by calling ClearMessages.

Conclusion

Hopefully, you are still with me and have a clear grasp of how the WebSocket protocol works and how I designed my WebSocket server. In my next article, I will take this a step further: using WebSocket inside my engine as a remote procedure call medium and controlling my engine from a web browser. Next month I will be speaking at AltDevConf on this subject. Hope to see you there.

[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]
