Aesel Zero Latency Streaming Design

Section 1: Abstract

This document outlines the design for Phase 3 of the Distributed Visualization Server. This effort has multiple focuses:

  1. Allow for deployment configurations which support zero-latency event processing.
  2. Allow end-to-end HTTPS encryption of transactions.
  3. Enable the transaction network to utilize existing load balancers.

One of the primary benefits of Aesel is that it is a back-end server that can be deployed across cloud networks, with enough speed to power a traditional multi-player video game. A single-server deployment should be capable of meeting the performance requirements, but it does not scale. However, once a deployment spans multiple servers, the network latencies expected within a cloud provider's network mean that we cannot guarantee it will meet the performance requirements. We need to support a cloud-native, zero-latency solution.

As a part of this enhancement, we will be introducing several new concepts, outlined in the Vocabulary section below.

Section 2: Vocabulary

Term | Description
User Device | A unique hardware device used by an end user to render the objects received from Aesel.
Object | An object in 3-space that is being rendered and/or interacted with by user devices.
Scene | A logical grouping of objects to which devices can register. Each scene is associated with a particular coordinate system.
Coordinate System | A set of three axes (X, Y, and Z), as well as an origin. Coordinate systems are stored in relative terms, using matrix transformations to move from one to another.
Transaction | A single interaction with Aesel by a client through the HTTP interface. Strong consistency and atomicity are guaranteed, with slower performance profiles than events. Confirmation is received synchronously.
Event | A single interaction with Aesel by a client through the UDP interface. Consistency is weak and atomicity is not guaranteed, with much faster performance profiles than transactions. Confirmation is received asynchronously.
Scene Cluster | A cluster consisting of one or more instances (and/or clusters) of CLyman, Crazy Ivan, Mongo, and Consul Agents. Responsible for serving up Object information for one or more particular scenes.

The following components are utilized within phase 3 of development:

Type | Name | Description
Cluster Manager | Adrestia | Tracks available Scene Clusters, deciding which instance manages which Scene(s). Handles routing of requests for Objects and Scene Registrations.
Data Storage | Mongo | Storage of text-based data to hold Mesh data, Shader data, etc.
Transformation Updates | CLyman | Storage for Object Transformations with real-time change feeds.
Scenes & Coordinate Systems | CrazyIvan | Tracks sets of objects and user devices which form scenes to know what to load. Tracks coordinate systems to resolve the differences between devices.

Section 3: Events & Transactions

Events and Transactions are the means by which clients make changes to the state of an Aesel Scene. Generally, events are used to update an active scene (i.e. live updates during games or simulations), while transactions are used for inactive scenes. This is not, however, a hard rule.

The difference between events and transactions comes down to consistency versus speed. In cases where speed is critical (such as live updates during games/simulations), events are the preferred method of communication. In other cases, transactions offer stronger consistency guarantees at the cost of more time spent making the request. Each client is expected to make use of both APIs at various times throughout its lifecycle.

Section 3.1: Transactions

A transaction generally passes through an external load balancer and is routed to either Crazy Ivan or Adrestia. Scene API requests are passed directly to Crazy Ivan. In the case of Object updates, Adrestia forwards the message on to the correct instance of CLyman.

In the case of Scene Registration messages, Adrestia does the same but may need to decide which cluster is acting as the primary for the given scene. Either way, it then responds with the address of the cluster to which the device has been assigned. The device can use this address to send events to the cluster.

../_images/transaction_interface.png
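
The scene registration step described above can be sketched roughly as follows. This is a minimal illustration, not Adrestia's actual logic: the `SceneRouter` class, the address strings, and the "fewest scenes" selection policy are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneRouter:
    """Hypothetical stand-in for Adrestia's scene-to-cluster mapping."""

    # Known Scene Cluster addresses (illustrative values only).
    clusters: list[str]
    # Scene ID -> address of the cluster acting as primary for that scene.
    primary: dict[str, str] = field(default_factory=dict)

    def register(self, scene_id: str) -> str:
        """Return the cluster address for a scene, choosing a primary if needed."""
        if scene_id not in self.primary:
            # Assumed policy: pick the cluster currently serving the fewest scenes.
            load = {address: 0 for address in self.clusters}
            for address in self.primary.values():
                load[address] += 1
            self.primary[scene_id] = min(self.clusters, key=lambda a: load[a])
        return self.primary[scene_id]
```

Repeated registrations for the same scene return the same address, so every device in a scene ends up sending events to the same cluster.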

Section 3.2: Events

An event is sent directly from the client to a CLyman instance, via an exposed UDP port (more details in section 4). Events are currently limited to a simple Object update, overwriting the transformation, location, and scale of the object. Future development work will expand on the Event API to provide configurable event types and event stream logic.
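
As a rough illustration of what a client-side event might look like, the sketch below encodes a single-object overwrite and sends it as one UDP datagram. The JSON wire format, the field names, and the function names are assumptions for this example; the actual message format accepted by CLyman is not specified in this document.

```python
import json
import socket


def build_object_update(key, transform, location, scale):
    """Encode a single-object overwrite as JSON bytes (assumed wire format)."""
    message = {
        "key": key,              # hypothetical object identifier
        "transform": transform,  # overwrites the object's transformation
        "location": location,    # overwrites the object's location
        "scale": scale,          # overwrites the object's scale
    }
    return json.dumps(message).encode("utf-8")


def send_event(payload, address):
    """Send one event datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

Because the send is fire-and-forget, any confirmation has to arrive asynchronously, which matches the Event definition in the Vocabulary section.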

Because we are only supporting single-object events in this implementation, we do not have to worry about cross-scene and cross-cluster events until the next phase of development.

Section 4: Zero-Latency Deployments

Zero-latency means that no more than sub-millisecond network latency is incurred from receipt of an Object Update (as detailed in the Aesel High Speed Streaming Design) to output of the Change Stream to other registered devices.

To accomplish this, CLyman and CrazyIvan can be deployed with several types of event streams: in-memory, UDP, and Kafka. These strategies vary in terms of speed versus reliability.
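
One way to expose this choice is a single configuration value that is validated at startup. The sketch below is only an assumption about how such an option could be parsed; the `EventStream` enum, its string values, and `parse_event_stream` are hypothetical names, not existing CLyman or CrazyIvan settings.

```python
from enum import Enum


class EventStream(Enum):
    """The three event stream types named in this design."""

    IN_MEMORY = "in-memory"
    UDP = "udp"
    KAFKA = "kafka"


def parse_event_stream(value):
    """Map a configuration string to an EventStream, ignoring case and separators."""
    normalized = value.strip().lower().replace("_", "-").replace(" ", "-")
    # Raises ValueError for anything other than the three supported types.
    return EventStream(normalized)
```

Failing fast on an unknown value keeps a misconfigured pair from silently falling back to a slower or less reliable stream.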

Section 4.1: In-Memory Streaming

In-memory streaming uses a block of memory shared by CLyman and Crazy Ivan as a queue. This is the fastest method of deployment, but also the least flexible. Each physical server should have one pair of CLyman and Crazy Ivan running, and each pair can manage separate scenes.

../_images/inmemory_event_stream.png
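
To make the idea of a shared block used as a queue concrete, here is a minimal single-producer, single-consumer ring buffer over a flat byte buffer. It is a sketch under stated assumptions: the layout (two 64-bit counters followed by fixed-size slots), the `RingQueue` name, and the fixed record size are invented for illustration. A `bytearray` stands in for the shared block; in a real deployment the buffer could be something like the `buf` of a `multiprocessing.shared_memory.SharedMemory` segment, and proper atomics or memory barriers would be needed, which this sketch omits.

```python
import struct

_COUNTER = struct.Struct("<Q")  # one unsigned 64-bit counter
_HEAD_AT = 0    # read position, written only by the consumer
_TAIL_AT = 8    # write position, written only by the producer
_DATA_AT = 16   # first byte of slot storage


class RingQueue:
    """Fixed-capacity FIFO of fixed-size records stored in one flat buffer."""

    def __init__(self, buf, slot_size):
        self.buf = memoryview(buf)
        self.slot_size = slot_size
        self.capacity = (len(self.buf) - _DATA_AT) // slot_size

    def _read(self, at):
        return _COUNTER.unpack_from(self.buf, at)[0]

    def _slot(self, counter):
        start = _DATA_AT + (counter % self.capacity) * self.slot_size
        return slice(start, start + self.slot_size)

    def push(self, record):
        """Append a record; return False if it is the wrong size or the queue is full."""
        head, tail = self._read(_HEAD_AT), self._read(_TAIL_AT)
        if len(record) != self.slot_size or tail - head >= self.capacity:
            return False
        self.buf[self._slot(tail)] = record
        _COUNTER.pack_into(self.buf, _TAIL_AT, tail + 1)
        return True

    def pop(self):
        """Remove and return the oldest record, or None if the queue is empty."""
        head, tail = self._read(_HEAD_AT), self._read(_TAIL_AT)
        if head == tail:
            return None
        record = bytes(self.buf[self._slot(head)])
        _COUNTER.pack_into(self.buf, _HEAD_AT, head + 1)
        return record
```

All queue state lives in the buffer itself, so two `RingQueue` objects created over the same buffer (one per process) see the same records.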

Section 4.2: UDP Streaming

UDP Streaming involves CLyman instances sending UDP messages to Crazy Ivan instances within the same cluster. This provides a good mix of speed and scalability, as CLyman and Crazy Ivan can be scaled horizontally within a cluster.

../_images/udp_event_stream.png

Section 4.3: In-Memory Caching

In-memory caching can be used to prevent repeated network calls within Crazy Ivan:

  • On receipt of a Kafka update, add the scene to a time-expired cache of scene IDs.
  • Check the Scene-Device cache (a separate, time-expired cache of scene IDs and device info) for the scene.
  • If not present, execute the query and populate the cache. If configured to still send updates, do so.
  • If present, send the update.
  • On a background thread, periodically query Neo4j and populate the Scene-Device cache.
  • Crazy Ivan should always keep in memory those scenes in its region (per region-specific server pairs).
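
The lookup path in the list above can be sketched with a small time-expired cache. This is an illustrative outline only: `TimedCache`, `devices_for_update`, and the `query_devices` callback (standing in for the Neo4j query) are hypothetical names, and the background refresh thread is left out.

```python
import time


class TimedCache:
    """Key-value cache whose entries expire a fixed time after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it so the caller re-queries.
            del self._entries[key]
            return None
        return value


def devices_for_update(scene_id, scene_devices, query_devices):
    """Return the devices to notify for a scene, querying only on a cache miss."""
    devices = scene_devices.get(scene_id)
    if devices is None:
        devices = query_devices(scene_id)  # assumed stand-in for the database query
        scene_devices.put(scene_id, devices)
    return devices
```

Passing the clock in as a parameter is a design choice made for the sketch, so that expiry can be exercised without sleeping.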

Section 4.4: CLyman UDP Endpoint

CLyman needs to be able to accept Object Overwrite messages via UDP, mimicking the UDP API exposed by Adrestia.

Section 4.5: Region-Specific Server Pairs

CLyman/CrazyIvan pairs will need to be able to register as pairs with Consul, making them distinct enough to register the pairing within Adrestia. They will also need to register to the specified region.

Adrestia will need to provide the address of particular pairs of CLyman/CrazyIvan to devices upon registration. The devices can then send UDP updates directly to CLyman. Adrestia will also need logic to provide alternate server pairs upon request when devices detect that the server is down.
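
A minimal sketch of the assignment and failover behaviour described above is shown below. It is an assumption about shape, not Adrestia's design: `ServerPair`, `PairRegistry`, and the address fields are hypothetical, and the registry is a plain in-process list rather than anything backed by Consul.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerPair:
    """One CLyman/CrazyIvan pair registered to a region (illustrative fields)."""

    clyman_udp: str
    crazyivan: str
    region: str


class PairRegistry:
    """Hypothetical stand-in for Adrestia's view of registered server pairs."""

    def __init__(self, pairs):
        self.pairs = list(pairs)
        self.down = set()

    def assign(self, region):
        """Return the first pair in the region not reported down, or None."""
        for pair in self.pairs:
            if pair.region == region and pair not in self.down:
                return pair
        return None

    def report_failure(self, pair):
        """Record a pair as down and return an alternate from the same region."""
        self.down.add(pair)
        return self.assign(pair.region)
```

A device would call `assign` at registration and `report_failure` when its UDP updates stop being acknowledged. Whether to fall back to another region when a region has no healthy pairs is left open here.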

Section 5: Required Changes

  • Update CLyman and CrazyIvan to use HTTP(S) interfaces rather than ZMQ
  • Add UDP Endpoint to CLyman
  • Add configurable options for event stream in CLyman and Crazy Ivan (Kafka, UDP, Shared Memory)
  • Add logic in Adrestia to maintain a mapping of Scenes to clusters
  • Add configurable logic in Adrestia to pass UDP endpoints to devices
  • Add Scene-Device Caching logic in CrazyIvan so no DB calls need to be made
  • Add logic in Adrestia to provide an updated UDP endpoint to devices on failure