Engineering

Implementing Cascading to Achieve Better Scalability at Huddle01

Mar 7, 2024

Om Gupta

Huddle01 faced a Herculean challenge last year: scale up to host live interactions for up to 10,000 concurrent participants from around the globe. For live interactions, latency has to stay under 100ms to avoid degrading the UX. For ordinary API calls, a latency of around 450ms or less is usually enough to keep the experience feeling responsive. Here, we are targeting a latency of under 100ms while maintaining good video and audio quality across all sorts of devices in different network conditions.

Our initial thought was straightforward: why not deploy one massive media server node? But much to our surprise, we exceeded our bandwidth fairly quickly, even on the biggest servers the cloud providers offer. The bandwidth usage was enormous, making it fairly expensive, and the connection quality was still terrible in some geographies.

This experience laid bare the constraints we were operating within and underscored the importance of our soon-to-be-launched decentralised Real-Time Communication (dRTC) network, where bandwidth is provided by globally distributed, community-run nodes that are incentivized in $HUDL tokens for their service.

It's evident that our entire vision hinges on how we distribute bandwidth. With that clear, we drew on the well-known system design of sharding, where you divide a database into pieces, and applied the same idea to dividing bandwidth, using a system called CASCADING.

How does Cascading Work?

Cascading, despite its complexity, is grounded in a straightforward principle: two nodes communicating directly with each other. This approach resonates deeply with my belief in the wisdom of Edsger W. Dijkstra: “Simplicity is the prerequisite for reliability.”

Before delving into the concept of Cascading, let's understand the outcomes we anticipate upon its successful implementation. Currently, each Media Node that a client connects to via WebRTC has access solely to the RTC stream it manages. This setup means the audio and video of users are confined to the specific Media Node they are connected to.

However, when a single meeting, room, or session spans multiple Media Nodes (which is what we mean by distributing bandwidth), streams managed by one Media Node need to be accessible from another. To facilitate this, we must establish a mechanism for these Media Nodes to share media between them. This interconnection of Media Nodes is what Cascading achieves.

The entire architecture of our system leverages the elegantly simple offer and answer model on which WebRTC operates. The process begins when one node initiates communication by opening a port, dedicating that port exclusively to RTC-related traffic with the other node. But it goes beyond that, incorporating essential information such as the maximum buffer size, the maximum SCTP message size, IP addresses, and other critical data that needs to be negotiated with the counterpart node before making a connection.

In a reciprocal manner, upon receiving this initial offer, the other node prepares and shares its own set of parameters. Once both nodes have each other's data, they proceed to establish a connection, each knowing where and how to send RTC packets to the other party. Upon successful exchange and alignment of data, what emerges is a pipeTransport. That's it, that's Cascading.
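To make the exchange concrete, here is a minimal TypeScript sketch of that offer/answer dance between two Media Nodes. The class, field names, and port allocation are illustrative assumptions rather than our actual implementation; the real negotiation carries more parameters and runs over our internal signalling.

```typescript
// Hypothetical shape of the data each node shares before a pipe is made.
interface PipeTransportParams {
  ip: string;                 // where the node listens for packets from its peer
  port: number;               // the port it opened for this pipe only
  maxSctpMessageSize: number; // negotiated so data-channel messages are never truncated
  maxBufferSize: number;      // how much the receiver is willing to buffer
}

// Illustrative MediaNode wrapper: open a port, describe it, then connect.
class MediaNode {
  constructor(private readonly publicIp: string) {}

  // "Offer": open a local port and describe how the peer should reach us.
  createPipeTransport(): PipeTransportParams {
    const port = 40000 + Math.floor(Math.random() * 10000); // placeholder allocation
    return {
      ip: this.publicIp,
      port,
      maxSctpMessageSize: 262144,
      maxBufferSize: 4194304,
    };
  }

  // "Answer"/connect: remember where to send RTC packets for this pipe.
  connectPipeTransport(remote: PipeTransportParams): void {
    console.log(`piping to ${remote.ip}:${remote.port}`);
  }
}

// Cascading two nodes is a symmetric exchange of these parameters.
const nodeA = new MediaNode("10.0.0.1");
const nodeB = new MediaNode("10.0.0.2");

const offerA = nodeA.createPipeTransport();
const offerB = nodeB.createPipeTransport();

nodeA.connectPipeTransport(offerB);
nodeB.connectPipeTransport(offerA);
```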

Well that was it. Or was it? 👀
After doing this you have created more problems for yourself than you solved 😂 If you just naively keep connecting pairs of Media Nodes, you end up building a mesh, and as it goes in computer science, mesh networks hardly scale. Thankfully we saw that problem early on. So, what did we do differently?

We simply added an element called an Orchestrator. By adding one more entity to the whole system, we vastly improved upon the connection nightmare. In this design, every Media Node supplies bandwidth, and the Orchestrators tap into them to provide it where it's needed.

Orchestrators and Media Nodes together make the RTC infra complete. One part is the orchestration, i.e. the connection establishment that decides who will supply the requested bandwidth, and the Media Nodes are what actually supply it. With this split, a Media Node cares only about what the Orchestrator tells it: which clients it should serve bandwidth to, and which Media Nodes need to be connected to each other for optimal supply of bandwidth.
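As a rough illustration, this split can be thought of as a small command protocol from the Orchestrator down to a Media Node. The message names and fields below are hypothetical, not our actual protocol; they only sketch the two responsibilities described above.

```typescript
// Illustrative instructions a Media Node might receive from the Orchestrator.
type OrchestratorCommand =
  | {
      type: "serveClient";   // serve bandwidth to this client for this room
      roomId: string;
      clientId: string;
    }
  | {
      type: "pipeToNode";    // cascade with another Media Node for this room
      roomId: string;
      remoteNodeId: string;
      remoteIp: string;
      remotePort: number;
    };

// The Media Node stays simple: it only reacts to what the Orchestrator says.
function handleCommand(cmd: OrchestratorCommand): void {
  switch (cmd.type) {
    case "serveClient":
      console.log(`room ${cmd.roomId}: accept WebRTC connection from ${cmd.clientId}`);
      break;
    case "pipeToNode":
      console.log(`room ${cmd.roomId}: open pipe to ${cmd.remoteNodeId} at ${cmd.remoteIp}:${cmd.remotePort}`);
      break;
  }
}
```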

What is Orchestration?

Connecting two Media Nodes is a fairly simple task; deciding which two Media Nodes to connect is the real question. This doesn't have to be overly complicated if we take some cues from the strategies employed by cloud providers, who have divided the world into regions to streamline things. In our case, we operated under the assumption that if one Media Node was responsible for one region, we could maintain latency below 100 ms. Our testing confirmed this was generally achievable. Yet, it's important to note that this wasn't consistently the case across all regions. For instance, in North America we discovered that a single region wasn't sufficient to meet our latency goals, necessitating the establishment of multiple regions to ensure optimal performance.

As in any distributed system, we need a Discovery Map that knows which Media Node is where and how to reach it. In our case, whenever a new Media Node comes online, it registers itself, declaring how many CPU cores, how much bandwidth, and what other resources it has.
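Here is a minimal sketch of what such a Discovery Map could look like, assuming a simple in-memory registry. The field names, the region labels, and the least-loaded lookup are illustrative assumptions; our actual registry is more involved.

```typescript
// Illustrative record a Media Node registers on boot.
interface MediaNodeInfo {
  nodeId: string;
  region: string;          // e.g. "ap-south", "us-east" (cloud-style region splits)
  address: string;         // how to reach this node
  cpuCores: number;
  bandwidthMbps: number;
  activeConnections: number;
}

const discovery = new Map<string, MediaNodeInfo>();

// Called by a Media Node when it comes online.
function register(info: MediaNodeInfo): void {
  discovery.set(info.nodeId, info);
}

// Called when a node hits capacity or shuts down, so no new load reaches it.
function deregister(nodeId: string): void {
  discovery.delete(nodeId);
}

// Orchestrator-side lookup: least-loaded node in the client's region.
function pickNode(region: string): MediaNodeInfo | undefined {
  return [...discovery.values()]
    .filter((n) => n.region === region)
    .sort((a, b) => a.activeConnections - b.activeConnections)[0];
}
```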

Every system must adhere to certain rules. After thoroughly analysing the network conditions and requirements, we developed our "Orchestration System":

  • Each new meeting is assigned to a single Orchestrator. This Orchestrator does not necessarily need to be the closest to any specific location; it is selected either at random or as whichever has the least load.


  • Every client establishes a WebSocket connection with an Orchestrator before initiating any RTC (Real-Time Communication) connections with any Media Node. The Orchestrator is responsible for managing the RTC connections between all entities in the system, including Clients and Media Nodes; this is also called Signalling.


  • The latency to reach an Orchestrator may vary between roughly 100ms and 450ms; we prioritise the ongoing RTC latency far more. Considering that in any meeting there won't be more than 10 to 20 active speakers, orchestration traffic, even though it needs to reach every client, is far less frequent than media traffic.


  • Each Media Node is considered at full capacity once it reaches its limit, and we are still determining the most ideal parameters for that limit. For the time being, we have adopted the simple formula of CPU cores * 500 (sketched after this list). However, we are also exploring the inclusion of the number of video and audio streams a Media Node is managing, given that video streams tend to require roughly 50x more bandwidth than audio streams.


  • Although a Media Node can handle substantial load, it will only be assigned 1,000 connections for a particular meeting by the Orchestrator. This means 1,000 connections for Room abc, another 1,000 for Room xyz, and so on, until it is considered to have reached its capacity and deregisters itself from Discovery, after which new incoming load starts being routed to a new Media Node in that region.


  • Not every Media Node will be piped with another Media Node. We have taken the philosophy of Pipe on Demand. The use cases of RTC go way beyond just video conferencing, and we also consider use cases where not everyone is concerned with the audio or video of every other person, e.g. RPG games. Only if a connected user on Media Node A needs to hear someone on Media Node B will these two Media Nodes be piped, and they share their RTP for that particular requested media only, which saves tons of bandwidth between the two nodes.
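Putting the capacity and pipe-on-demand rules above into a small sketch: the constants follow the CPU * 500 formula and the 1,000-connections-per-room cap from the list, while the helper names and in-memory bookkeeping are illustrative assumptions.

```typescript
const CONNECTIONS_PER_CORE = 500;     // current placeholder capacity formula: CPU * 500
const MAX_CONNECTIONS_PER_ROOM = 1000; // per-room cap on a single Media Node

interface NodeLoad {
  cpuCores: number;
  totalConnections: number;
  connectionsByRoom: Map<string, number>;
}

function nodeCapacity(load: NodeLoad): number {
  return load.cpuCores * CONNECTIONS_PER_CORE;
}

// Can this node take one more client for this room, or should the
// Orchestrator spill over to a fresh node in the region?
function canAccept(load: NodeLoad, roomId: string): boolean {
  const roomCount = load.connectionsByRoom.get(roomId) ?? 0;
  return (
    load.totalConnections < nodeCapacity(load) &&
    roomCount < MAX_CONNECTIONS_PER_ROOM
  );
}

// Pipe on demand: only cascade two nodes, and only for the requested media,
// when a consumer on one node asks for a stream produced on the other.
const existingPipes = new Set<string>();

function needsPipe(consumerNode: string, producerNode: string, mediaId: string): boolean {
  if (consumerNode === producerNode) return false; // same node, no pipe needed
  const key = [consumerNode, producerNode, mediaId].join(":");
  if (existingPipes.has(key)) return false;        // already piped for this media
  existingPipes.add(key);
  return true;
}
```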

Indeed, the system we're discussing involves numerous complex components, and there's a wealth of detail that, unfortunately, cannot be fully explored here. Anticipation will have to build until we're ready to open source the entire framework. What I can share at this moment is our foundational approach: simplicity.

From day one, our philosophy has been to create a straightforward base system that can be iteratively enhanced, rather than striving for an unattainable perfection from the start. Our current attempt has shown great results, making the product ready for the requirements brought before us and more, and I think that's the most important part for engineers building systems like these.

Conclusion

At the end of the day, we have to show the results we accomplished after all these changes 🫡

We started out with two goals: bring average latency in global meetings below 100ms, and scale above 10,000 concurrent connections in a single meeting. We are happy to report we have achieved both and more. You can see it in the usage bump since we rolled out the new system, which is being used by communities like Phaver, Unlock, Learnweb3DAO and more.

Nothing can be more wholesome for us than seeing you use our systems. Soon you'll be able to earn rewards and power your own meetings and of others. The time ahead for Huddle01 is so exciting, and we are so close to launching the 1st People Powered Communication Network.

Just for the sake of keeping a record, here is an internal picture of our load tests 1️⃣ 0️⃣ 0️⃣ 0️⃣ 0️⃣

This is when we reached 10,000 for the first time. We didn't expect we would be able to, so we removed logging from the Media Nodes and suddenly we scaled. I know it was stupid, but everyone learns.

I hope this gave you a sense of what's been happening in Engineering inside Huddle01. The narrative of Huddle01 is far from complete. As we go ahead, our vision for a People Powered Communication network continues to guide our path.

————- x

PS: We are on the lookout for like-minded individuals who are inspired by our mission and eager to contribute to our collective journey. If you find resonance with our vision and possess the zeal, your place might just be among us. You can reach out to me at om@huddle01.com or via our careers page: here