December 2, 2021
Hello everyone! My name is Yuri, and I'm a video team engineer here at Tango Live, one of the top livestreaming platforms in the world! Not so long ago we came to a sudden realization: modern off-the-shelf solutions for video livestreaming, sadly, were not meeting our business objectives. It became clear we needed to build our own video platform. Today we're going to go over how we started this process and the progress we've made in rethinking live video streaming!
Tango is an internationally operating social network with a focus on live streaming. The company had been in the messenger business for over a decade, with a strong emphasis on video calling. Eventually, Tango’s business turned to broadcasting.
Here is what our product looks like today. Streamers create content and monetize it: they play musical instruments, sing karaoke, draw, or simply chat with users who are ready to consume that content. According to recent data, more than 50,000 unique streamers run more than 250,000 streaming sessions a day. The volume of data is enormous – exceeding 10 petabytes per month.
We really care about what we call the “Emotional Touch” – or the ability to deliver the reaction to the stream participants’ actions as quickly as possible. This erases boundaries in communication, making streaming more fun and business more efficient.
The main traffic comes from mobile platforms, but there is also a web version available. The iOS and Android applications share a common core written in C++. Since the project has existed for quite a long time, there is also code in Objective-C as well as in Java; new code is mostly written in Swift and Kotlin.
Our Product at the Start
When streaming first began, we used RTMP to upload content from the streamer to the server and HLS to deliver it to viewers.
At the beginning of a session, the player downloads a master playlist from the CDN (Content Delivery Network) that lists all available media streams. Based on the estimated bandwidth and its settings, the player decides which stream to play. It then downloads the playlist of the selected media stream, which lists the individual media segments and their durations, and starts downloading those segments one by one. Playback begins only once the buffer is full; until then there is no picture. The delay in HLS therefore depends on the segment duration, the size of the startup buffer, and the policy for keeping playback uninterrupted and preventing freezes. A delay of 4-6 seconds is considered a good value; with some modifications we can reduce it to 2-3 seconds. The player's state after a freeze often adds to the delay as well, and without corrective action that delay can keep growing even on a fairly stable network.
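To see why these numbers come out the way they do, here is a minimal sketch (not our production player; the segment durations and buffer policy are invented for the example) that estimates HLS startup delay from the playlist's segment durations and the player's startup-buffer target:

```python
def hls_startup_delay(segment_durations, min_buffer_s):
    """Estimate seconds of content the player must download before
    playback can begin: segments are fetched whole, so the player keeps
    taking segments until the buffered duration reaches the target."""
    buffered = 0.0
    for d in segment_durations:
        buffered += d
        if buffered >= min_buffer_s:
            return buffered
    return buffered  # playlist shorter than the buffer target

# Typical setup: 2-second segments, 6 seconds of startup buffer
print(hls_startup_delay([2.0] * 10, 6.0))  # 6.0 -> about 6 s to first frame
```

Because whole segments are the unit of download, shrinking the segment duration is one of the "modifications" that brings the delay down toward 2-3 seconds, at the price of more playlist and request overhead.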
Reasons for Changing
It soon became clear that, as the business grew, such delays were critical: on unstable networks they made the desired emotional touch impossible to achieve. For that, the delay had to be under one second.
At the same time, we identified other shortcomings within the existing system:
Having analyzed these problems and challenges, we decided to make radical changes and build our own video platform.
New Platform Tasks
Together with the business, we selected the following metrics to assess the quality of the future system:
Where We Started
The first stage of the transition to the new technology was to replace the protocol. As an alternative to RTMP we considered WebRTC and SRT.
WebRTC is a technology for real-time communication. It is used for calls and conferences in products such as Google Meet/Hangouts, Viber, and WhatsApp. However, these video formats are usually limited in the number of participants.
The benefits of the solution are that WebRTC uses a floating (adaptive) jitter buffer to minimize latency, and the protocol adapts to almost any network conditions. However, there are significant disadvantages: the complex connection setup can slow down stream opening, and a WebRTC solution is difficult to scale when delivering content to a large audience.
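To give a flavor of the floating-buffer idea, here is a simplified sketch (not WebRTC's actual NetEQ implementation; the smoothing constant and safety factor are assumptions): the playout delay target tracks the jitter observed in packet arrivals, so the buffer grows on jittery networks and shrinks back on stable ones.

```python
class FloatingJitterBuffer:
    """Toy adaptive jitter buffer: keeps an exponential moving average
    of inter-arrival jitter and sets the playout delay target to a
    multiple of it, trading latency for freeze protection."""
    def __init__(self, alpha=0.1, safety_factor=3.0):
        self.alpha = alpha            # EMA smoothing constant
        self.safety = safety_factor   # delay margin over average jitter
        self.avg_jitter_ms = 0.0

    def on_packet(self, expected_gap_ms, actual_gap_ms):
        jitter = abs(actual_gap_ms - expected_gap_ms)
        self.avg_jitter_ms += self.alpha * (jitter - self.avg_jitter_ms)

    def target_delay_ms(self):
        return self.safety * self.avg_jitter_ms

buf = FloatingJitterBuffer()
for gap in [20, 25, 20, 40, 20]:   # packets expected every 20 ms
    buf.on_packet(20, gap)
print(round(buf.target_delay_ms(), 1))  # 6.5
```

On a perfectly stable network the target collapses toward zero, which is exactly why WebRTC can run at such low latencies for calls.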
As for SRT (Secure Reliable Transport), it is an application-layer protocol designed for efficient transmission of media over the Internet. It works on top of UDP, using some mechanisms similar to TCP, such as acknowledgements and retransmission, to confirm content delivery.
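A simplified illustration of that idea (a sketch of selective retransmission over a UDP-like stream, not the real SRT state machine): the receiver tracks sequence numbers and reports gaps back to the sender as loss reports, so only the missing packets are resent instead of stalling the whole stream the way TCP would.

```python
def detect_losses(received_seqs):
    """Given packet sequence numbers in arrival order, return the set
    of sequence numbers the receiver should NAK (request again): every
    gap behind the highest sequence seen that has not yet arrived."""
    seen = set()
    highest = -1
    naks = set()
    for seq in received_seqs:
        seen.add(seq)
        naks.discard(seq)  # a late or retransmitted packet fills its gap
        if seq > highest:
            # every skipped number between highest and seq is a loss
            naks.update(s for s in range(highest + 1, seq) if s not in seen)
            highest = seq
    return naks

# Packets 2 and 5 never arrive; 3 arrives late and fills its own gap
print(sorted(detect_losses([0, 1, 4, 3, 6])))  # [2, 5]
```

In real SRT this loss-report loop is bounded by a configurable latency window: packets that cannot be recovered in time are dropped rather than delaying live playback.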
As a result, we chose SRT because:
New System Architecture
In the process of creating our own video platform, we’ve already made several important steps.
1) SRT has been chosen as the transport on both the sending and receiving sides. Moreover, the “server-to-server” interaction also uses SRT. This allowed us to proceed iteratively. In the first iteration, we only replaced the protocol for uploading content to the server, while the rest of the system stayed the same.
2) We implemented our own players, where the transport is also SRT. This allowed us to measure the entire content flow, optimize opening times, and fine-tune the player step by step with full control over the codebase.
3) Full codebase control allowed us to build monitoring into every stage: from the moment the picture and sound are captured on the streamer's side to the moment they are rendered on the client. The monitoring also made it possible to track traffic delivery paths and optimize routes based on the cost and quality of any given site.
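As an illustration of what per-stage monitoring can look like (the stage names here are hypothetical, and the sketch assumes all hosts share a synchronized clock, e.g. via NTP), each frame carries a capture timestamp and every stage records when it handled the frame, yielding per-hop and total "glass-to-glass" latency:

```python
# Hypothetical pipeline stages, in order, for one frame's journey
STAGES = ["captured", "encoded", "ingested", "relayed", "decoded", "rendered"]

def stage_latencies_ms(timestamps):
    """timestamps: dict mapping stage name -> epoch milliseconds for one
    frame. Returns latency per hop plus the total end-to-end delay, so a
    regression at any single stage is immediately visible."""
    hops = {}
    for prev, curr in zip(STAGES, STAGES[1:]):
        hops[f"{prev}->{curr}"] = timestamps[curr] - timestamps[prev]
    hops["total"] = timestamps[STAGES[-1]] - timestamps[STAGES[0]]
    return hops

frame = {"captured": 0, "encoded": 35, "ingested": 95,
         "relayed": 140, "decoded": 180, "rendered": 215}
print(stage_latencies_ms(frame)["total"])  # 215
```

Aggregating these per-hop numbers across sessions is what lets latency problems be attributed to a specific stage or route rather than to "the network" in general.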
4) The SRT traffic delivery network in Multicloud is the most complex part of the system, and it required the most time to implement and deploy. We designed for multiple cloud providers from the start, so that in each region of the world we can use the channels that are best in terms of price and quality.
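As a toy illustration of that price-versus-quality trade-off (the metrics, weights, and route names here are invented for the example, not our production values), candidate relay routes can be ranked by a score that blends measured network quality with traffic cost:

```python
def pick_route(routes, cost_weight=0.5):
    """routes: list of dicts with 'name', 'rtt_ms', 'loss_pct', and
    'cost_per_gb'. Lower score is better; cost_weight in [0, 1] shifts
    the choice between cheap routes and high-quality ones."""
    def score(r):
        # Penalize latency directly and packet loss heavily
        quality_penalty = r["rtt_ms"] + 50 * r["loss_pct"]
        cost_penalty = 1000 * r["cost_per_gb"]
        return (1 - cost_weight) * quality_penalty + cost_weight * cost_penalty
    return min(routes, key=score)

routes = [
    {"name": "cloud-a-direct", "rtt_ms": 180, "loss_pct": 0.1, "cost_per_gb": 0.02},
    {"name": "cloud-b-relay",  "rtt_ms": 90,  "loss_pct": 0.5, "cost_per_gb": 0.05},
]
print(pick_route(routes)["name"])  # cloud-b-relay
```

With the monitoring data from the previous step feeding the quality metrics, a scorer like this can be re-evaluated continuously as conditions on each provider's channels change.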
5) We left HLS as a backup scheme in case a particular user has problems with SRT (the connection could not be established, something went wrong on the server side, etc.). It also applies when our servers are not ready for an explosive influx of viewers: in that situation, part of the load is shifted to the CDN, which helps with scalability.
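The fallback logic can be sketched as follows (the function names and timeout are assumptions for the example, not our actual client API): try the low-latency SRT path first with a short timeout, and drop back to HLS over the CDN on any failure so the viewer still gets a picture.

```python
def open_stream(connect_srt, connect_hls, srt_timeout_s=2.0):
    """Attempt the SRT transport first; if the connection cannot be
    established within the timeout (or fails outright), fall back to
    the HLS/CDN path. Returns (transport_name, session)."""
    try:
        return ("srt", connect_srt(timeout=srt_timeout_s))
    except Exception:
        return ("hls", connect_hls())

# Simulated connectors for illustration
def failing_srt(timeout):
    raise ConnectionError("SRT handshake timed out")

transport, session = open_stream(failing_srt, lambda: "hls-session")
print(transport)  # hls
```

The same switch can be flipped deliberately on the server side during a traffic spike, steering a fraction of new viewers to the CDN path before the SRT delivery network saturates.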
First Results of the New System
The implementation of the content delivery system over SRT allowed us to achieve a latency of less than 1 second during streaming. May I remind you that, with the RTMP+HLS bundle, the latency was 2-3 seconds at best. If two streamers are located in the same region, latency may be as low as two to three hundred milliseconds. Our backend work also lets us reduce latency between streamers who are far apart, in different regions.
In addition, we can highlight the following advantages of the new system:
There is still a lot of work ahead of us in order to improve the platform. We want to reduce latency even more along with continuing to optimize cold start and video quality. We will explain all of this in future posts. In the meantime, please leave comments with any remarks or questions and I will be happy to get back to you.