Who we are and what we do 

Tango is an internationally operating social network with a focus on live streaming. The company had been in the messenger business for over a decade, with a strong emphasis on video calling. Eventually, Tango’s business turned to broadcasting.

Right now our product works as follows. Streamers create content and monetize it: they play musical instruments, sing karaoke, draw, or simply chat with viewers. According to recent data, more than 50,000 unique streamers run more than 250,000 streaming sessions a day. The volume of data is enormous, exceeding 10 petabytes per month.

We really care about what we call the “Emotional Touch” – or the ability to deliver the reaction to the stream participants’ actions as quickly as possible. This erases boundaries in communication, making streaming more fun and business more efficient.

The main traffic comes from mobile platforms, but a web version is also available. The iOS and Android applications share a common part written in C++. Since the project has existed for quite a long time, there is also code in Objective-C and Java. New code is mostly written in Swift and Kotlin.

Our Product at the Start 

When streaming first began, we used the following protocols:

  • RTMP (Real Time Messaging Protocol) for transmitting media content from a streamer to the servers. This is a protocol on top of TCP (Transmission Control Protocol). For real-time video transmission, TCP has a number of disadvantages. For instance, its delivery guarantee works against it. If the network fails and then recovers, the viewer gets the picture from the moment the failure began, not the picture that the streamer’s camera captures here and now. Latency also grows during this time.
  • HLS (HTTP Live Streaming) protocol for viewing media. AVPlayer was used on iOS, and ExoPlayer on Android. This protocol works as follows:

At the beginning of a session, the player downloads a playlist from the CDN (Content Delivery Network) that describes all available media streams. Based on the estimated bandwidth and its settings, the player decides which stream to play. It then downloads the playlist of the selected stream, which lists the individual media segments and their durations, and starts downloading some of those segments one by one. Playback begins only when the buffer is full; until then there is no picture.

So the delay in HLS depends on the segment duration and the buffer size required to start playback, as well as on the policy for keeping playback uninterrupted and preventing freezes. A delay of 4-6 seconds is considered a good value. With some modifications, we could reduce it to 2-3 seconds. Recovering from a freeze often adds to the delay, and without additional handling the delay can keep growing even when the network is fairly stable.
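
The flow above can be illustrated with a minimal sketch. It is a simplified model, not the logic of AVPlayer or ExoPlayer: the names `Variant`, `pick_variant` and `min_startup_delay`, the 0.8 safety factor, and the bitrates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    bandwidth_bps: int  # bitrate advertised for this stream in the first playlist


def pick_variant(variants, estimated_bps, safety=0.8):
    """Pick the highest-bitrate variant that fits the estimated bandwidth.

    The safety factor (an assumption) leaves headroom for estimation errors.
    """
    budget = estimated_bps * safety
    fitting = [v for v in variants if v.bandwidth_bps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest-bitrate variant.
        return min(variants, key=lambda v: v.bandwidth_bps)
    return max(fitting, key=lambda v: v.bandwidth_bps)


def min_startup_delay(segment_seconds, segments_before_play):
    """Lower bound on the delay behind live: media buffered before playback starts."""
    return segment_seconds * segments_before_play


# Hypothetical ladder of streams.
variants = [
    Variant("240p", 400_000),
    Variant("480p", 1_200_000),
    Variant("720p", 2_500_000),
]
chosen = pick_variant(variants, estimated_bps=2_000_000)
delay = min_startup_delay(segment_seconds=2.0, segments_before_play=3)
```

With 2-second segments and three segments buffered before playback, the viewer is at least 6 seconds behind the streamer before any network delay is counted, which is why shortening segments or shrinking the start buffer is the usual lever for reducing HLS latency.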

Reasons for changing

It soon became clear that, as the business grew and developed, these growing delays were critical: on unstable networks it was impossible to achieve the desired emotional touch. For that, the delay had to be under a second.

At the same time, we identified other shortcomings in the existing system:

  • Opening time proved quite hard to optimize. Generally, the player needs three requests to get the first frame of the video (the playlist of available streams, the playlist of the selected stream, and the first segment), and with buffering it almost certainly makes more. On top of that, the iOS player is closed source, so we could not change it to fit our needs.
  • Simulcast. To support multiple resolutions without transcoding on the server side, the streamer sends several video streams over RTMP. On the one hand, this makes the system cheaper, eliminating the need for transcoding. On the other hand, it overloads the streamer’s already strained phone, which has to encode video in multiple resolutions. This transmission method also does not make the best use of the bandwidth, and the result is increased battery consumption and phone overheating.
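
To make the simulcast cost concrete, here is a back-of-the-envelope sketch. The bitrates are hypothetical and the function names are made up for the example; it only compares the uplink needed when the phone sends every resolution itself with the uplink needed when it sends one stream and the server does the rest.

```python
def simulcast_upload_kbps(rendition_kbps):
    """Uplink needed when the phone encodes and sends every resolution itself."""
    return sum(rendition_kbps)


def single_stream_upload_kbps(rendition_kbps):
    """Uplink needed when only the top resolution is sent and the server transcodes."""
    return max(rendition_kbps)


# Hypothetical bitrates for 720p, 480p and 240p streams.
renditions = [2500, 1200, 400]
simulcast = simulcast_upload_kbps(renditions)
single = single_stream_upload_kbps(renditions)
overhead = simulcast / single - 1  # extra uplink spent on the lower resolutions
```

With these numbers the phone uploads 4,100 kbps instead of 2,500 kbps, about 64% more traffic, and runs three encoders instead of one.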

Having analyzed these problems and challenges, we decided to make radical changes and build our own video platform.

New Platform Tasks 

Together with the business, we selected the following metrics to assess the quality of the future system:

  • Latency. Broadcast latency should be consistently low (under a second). The viewer should see and hear what the streamer does almost immediately.
  • Time to first frame (cold start). This is the time between the moment the user taps a particular broadcast and the moment the first picture appears. Ideally it should be no longer than the opening animation (200-300 milliseconds).
  • Video quality. Uploaded content should meet certain parameters (resolution, bitrate, FPS). The higher the picture quality, the longer viewers watch, all other things being equal. Improving video quality has also proven profitable for the business.
  • Crucially, as with all things in business, the solution must be cost-effective.

Where We Started

The first stage of the transition to the new technology was to replace the protocol. As an alternative to RTMP we considered WebRTC and SRT.

WebRTC is a technology for real-time communication. It is used for calls and conferences in products such as Google Meet/Hangouts, Viber, and WhatsApp. However, such formats usually limit the number of participants.

The benefits of the solution are that WebRTC uses a floating buffer to minimize latency and adapts to almost any network conditions. However, there are significant disadvantages. The complex connection setup may slow down opening. Also, a WebRTC solution is difficult to scale when delivering content to a large audience.

As for SRT (Secure Reliable Transport), it is an application layer protocol for efficient transmission of media over the Internet. It works on top of UDP and uses some TCP-like mechanisms to confirm delivery. Its key properties are:

  • Delayed Delivery Guarantee. Data will be delivered in the allotted interval or not delivered at all. 
  • Fast connection establishment. All packets transmitted via SRT are numbered and time-stamped. 
  • Automatic retransmission for loss recovery (within the delay interval). If a packet is lost, only the lost packet will be retransmitted, not the whole group as with TCP.
  • Openness of the protocol source code.
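
The first and third properties can be sketched in a few lines. This is a simplified model of the idea, not the API of the SRT library: the function names, the millisecond timestamps, and the 120 ms latency window are assumptions chosen for the example.

```python
def missing_sequence_numbers(received, expected_first, expected_last):
    """Return the sequence numbers to request again.

    Only the gaps are retransmitted; packets that already arrived are kept.
    """
    got = set(received)
    return [seq for seq in range(expected_first, expected_last + 1) if seq not in got]


def is_deliverable(sent_at_ms, now_ms, latency_ms):
    """A packet is handed to the player only while it is inside the latency window.

    Anything older is dropped instead of delaying the live picture.
    """
    return now_ms - sent_at_ms <= latency_ms


# Packets 3, 5 and 6 were lost on the way.
to_retransmit = missing_sequence_numbers([1, 2, 4, 7], expected_first=1, expected_last=7)

# Hypothetical 120 ms latency window.
on_time = is_deliverable(sent_at_ms=1000, now_ms=1100, latency_ms=120)
too_late = is_deliverable(sent_at_ms=1000, now_ms=1200, latency_ms=120)
```

Because every packet carries a sequence number and a timestamp, the receiver can both name the exact packets it is missing and decide when a retransmission is no longer worth waiting for. That bounded wait is what keeps the latency from growing after a network hiccup, unlike the TCP behavior described earlier.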

As a result, we chose SRT because:

  • It is an application layer protocol, and we can implement almost any logic on top of it.
  • Scalability can be built into the solution at the system design stage.
  • Fast connection establishment lets us achieve an acceptable cold start time.
  • No need to patch a huge third-party project.
  • The existing platform can be improved iteratively, with its parts moved to the new stack while running in production.

New System Architecture

In the process of creating our own video platform, we’ve already taken several important steps.

1) SRT has been chosen as the transport on both the sending and receiving sides. Moreover, the “server-to-server” interaction also uses SRT. This allowed us to proceed iteratively. In the first iteration, we only replaced the protocol for uploading content to the server, while the rest of the system stayed the same.

2) We implemented our own players, which also use SRT as the transport. This allowed us to measure the entire content flow, optimize opening times, and fine-tune the player step by step with full control over the codebase.

3) Full control over the codebase let us build monitoring into every stage, from capturing picture and sound on the streamer’s side to rendering them on the viewer’s device. The monitoring also made it possible to track traffic paths and optimize routes for cost and quality at any given site.

4) The multicloud SRT traffic delivery network is the most complex part of the system and took the most time to implement and deploy. From the start we planned for working with multiple cloud providers so that we could use the best channels in each region of the world in terms of price and quality.

5) We left HLS as a backup scheme in case a particular user has problems with SRT (couldn’t establish a connection, something went wrong on the server side, etc.). The same fallback applies when our servers are not ready for an explosive influx of viewers. In this situation, part of the load is transferred to the CDN, which helps with scalability.
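
The fallback rule from step 5 can be sketched as a small decision function. This is an illustration only: `Transport`, `choose_transport`, the `server_load` value, and the 0.9 threshold are hypothetical names and numbers, not part of the actual platform.

```python
from enum import Enum


class Transport(Enum):
    SRT = "srt"
    HLS = "hls"


def choose_transport(srt_connected, server_load, max_load=0.9):
    """Prefer SRT; fall back to HLS over the CDN when SRT is unavailable or overloaded.

    server_load is assumed to be a 0..1 utilization figure reported by the backend.
    """
    if not srt_connected:
        # The viewer could not establish an SRT connection.
        return Transport.HLS
    if server_load >= max_load:
        # Servers are saturated: move this viewer to the CDN.
        return Transport.HLS
    return Transport.SRT


normal = choose_transport(srt_connected=True, server_load=0.4)
no_connection = choose_transport(srt_connected=False, server_load=0.4)
spike = choose_transport(srt_connected=True, server_load=0.95)
```

Keeping the decision in one place makes it easy to see both triggers from the text: a per-user connection failure and a platform-wide load spike lead to the same HLS path.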

The First Results of the New System’s Work 

The implementation of the content delivery system through SRT allowed us to achieve a latency of less than 1 second during streaming. As a reminder, with the RTMP+HLS bundle the latency was 2-3 seconds at best. If two streamers are located in the same region, latency can be as low as 200-300 milliseconds. Our experience with the backend also allows us to reduce latency in communication between streamers who are far apart and located in different regions.

In addition, we can highlight the following advantages of the new system:

  • The native SRT content delivery network enables load balancing between multiple nodes in each region, which improves scalability and cost efficiency.
  • The ability to fine-tune and optimize the new system significantly reduces cold start time.
  • Optional video transcoding. It reduces the load on the streamer application and makes the best use of its bandwidth, which lowers battery drain and the risk of the phone overheating.
  • HLS compatibility built in at the design stage allowed users to migrate seamlessly from the old video platform to the new one and to watch broadcasts on the web, where SRT support is not available. Finally, this also made it possible to maintain a stable fallback path, which, combined with the replacement of RTMP with SRT, already shows better results than the old system.

There is still a lot of work ahead of us in order to improve the platform. We want to reduce latency even more along with continuing to optimize cold start and video quality. We will explain all of this in future posts. In the meantime, please leave comments with any remarks or questions and I will be happy to get back to you.


Yury Kamyshenko

iOS Architect
