
The popularity of streaming services is growing around the world. While this is fantastic for companies and content creators, it challenges developers to handle high load, improve fail-safety, streaming quality, and app user-friendliness, and, most importantly, to moderate incoming content so that the environment stays safe and comfortable for streamers and viewers.

Today we’re going to dive into Tango’s moderation system for video and audio streaming and the solutions and processes behind it. Content moderation has been Tango’s top priority since we went live in 2018: we work around the clock to provide a safe environment and protect our community from illegal and offensive content.

In our work, we are driven not only by our internal criteria for safe content; our solutions are also evaluated by Apple, Google, Morgan Stanley, Payoneer, PwC, EY, Barclays, and many of our other business partners. On a typical day, about 53,000 unique streamers launch over 250,000 stream sessions on our service. However, the process also has a flip side: we issue about 6,000 warnings daily, shut down around 5,000 streams, and ban approximately 100 streamers.

That’s where the numbers stand now, but they are continuously growing.

Streaming content includes:

  • Video signal
  • Audio signal
  • Photos
  • Text

In this article, we’re going to examine the most complicated streaming types for moderation: video and audio content.

Filters for moderating video and audio content

We need to identify what’s happening on the screen and find violations across different content types within seconds. The streamer’s avatar and stream preview also need to be checked.

Examples of categories identified by Tango moderation:

  • Sexual activity
  • Underwear
  • Sex toys
  • Nudity
  • Extremist conduct, violence, and/or weapons
  • Babies and minors with no adults 

Hive is the first filter that analyzes the video stream. We implemented the tool last year, and it is now an indispensable part of the Tango moderation system. Hive is a hybrid visual, audio, and text content moderation technology that combines advanced AI with manual moderation. It relies on face recognition and identification and can even evaluate a stream based on context.

Besides Hive, we at Tango also have our own moderation team that works 24/7.
How it works: Hive monitors a stream for prohibited objects and/or actions. When we receive an alert, we can process it in several ways.

For example, if Hive is 80-90% sure that a stream includes nudity, that stream is automatically blocked. If Hive is less certain, or we want to apply internal rules to check the content, our manual moderation team is alerted.

Manual moderation is our second filter for ensuring that no prohibited content is streamed.

This combination allows us to predict user behavior by analyzing “red flags”: what is being streamed, the streamer’s previous behavior, and other streams that went live from the same device, all to determine whether the content is prohibited. If we believe that a stream potentially contains prohibited objects or behavior, we send it for moderation as a high priority.
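To make the routing concrete, here is a minimal sketch of how such an alert could be processed. This is not our production code: the thresholds, labels, and helper functions are illustrative, with only the 80-90% nudity example above taken from real life.

```python
AUTO_BLOCK_THRESHOLD = 0.8  # from the 80-90% nudity example above
REVIEW_THRESHOLD = 0.4      # hypothetical lower bound for manual review

def block_stream(stream_id: str) -> None:
    """Stub: would ask the signaling service to cut off the stream."""
    print(f"blocking stream {stream_id}")

def enqueue_for_manual_review(stream_id: str, label: str, confidence: float) -> None:
    """Stub: would push the alert to the manual moderation queue as high priority."""
    print(f"queueing {stream_id} ({label}, {confidence:.2f}) for review")

def route_alert(stream_id: str, label: str, confidence: float) -> str:
    """Decide what to do with a single Hive alert (illustrative only)."""
    if label == "nudity" and confidence >= AUTO_BLOCK_THRESHOLD:
        block_stream(stream_id)
        return "auto_blocked"
    if confidence >= REVIEW_THRESHOLD:
        enqueue_for_manual_review(stream_id, label, confidence)
        return "manual_review"
    return "ignored"
```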

Another challenge is deploying a filter able to determine whether any minors are streaming without adults, something Tango prohibits.

Microsoft Azure’s age-detection model, based on Computer Vision, helps us apply this filter, and it is switched on for every stream. In case of suspicious activity, the moderation team joins and asks for an ID document with a photo to prove the streamer’s age.

A more recently implemented tool is our audio streaming moderation. This system recognizes “special” sounds (sounds of a sexual nature). Hive also assists in this process and helps our in-house AI analyze a 5-second clip from a stream. The audio moderation system is still in beta testing and has not been released yet, so the streams it filters are mostly subjected to manual moderation.
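For illustration, age estimation on a stream screenshot could look like the sketch below. We won’t detail the exact Azure service here; this assumes the azure-cognitiveservices-vision-face client and its age attribute (which Microsoft has since restricted), and the endpoint, key, and threshold are placeholders rather than our real configuration.

```python
from azure.cognitiveservices.vision.face import FaceClient
from azure.cognitiveservices.vision.face.models import FaceAttributeType
from msrest.authentication import CognitiveServicesCredentials

# Placeholder endpoint and key, not real configuration.
face_client = FaceClient(
    "https://<your-region>.api.cognitive.microsoft.com/",
    CognitiveServicesCredentials("<your-key>"),
)

def looks_underage(screenshot_url: str, threshold: float = 18.0) -> bool:
    """Flag a screenshot if any detected face is estimated below the age threshold."""
    faces = face_client.face.detect_with_url(
        url=screenshot_url,
        return_face_attributes=[FaceAttributeType.age],
    )
    return any(face.face_attributes.age < threshold for face in faces)
```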

These four filter systems run in parallel, and the frequency of checks is set by our scoring model.

Streamer and stream score

We have a huge body of data about streamers’ behavior and all the checks they’ve undergone, and we use it to evaluate how “dangerous” each streamer is. A streamer’s “danger” is rated from 1 to 10, taking into account previous cut-offs, blocked accounts on the same device, and warnings issued by our moderation team. Using the streamer score, we then evaluate each of their streams and rate it from 0 to 400.

We also follow how these scores change over time: a streamer’s score can go both up and down, and when it goes down, the streamer’s content gets a lower moderation priority. We follow two rules when blocking content or a streamer: a stream is cut off after two warnings, and an account is blocked after two cut-offs. To stop streamers who use multiple accounts, we will sometimes simply block the device.
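We won’t publish the exact scoring model, but a toy version conveys the idea. The weights below are invented for illustration; only the ranges (1-10 for streamers, 0-400 for streams) and the two-warnings/two-cut-offs rules come from the text above.

```python
from dataclasses import dataclass

@dataclass
class StreamerHistory:
    cutoffs: int            # streams previously cut off
    blocked_on_device: int  # blocked accounts seen on the same device
    warnings: int           # warnings issued by moderation

def streamer_danger(history: StreamerHistory) -> int:
    """Rate a streamer from 1 (safe) to 10 (dangerous). Weights are illustrative."""
    raw = 1 + 2 * history.cutoffs + 3 * history.blocked_on_device + history.warnings
    return min(raw, 10)

def stream_score(danger: int, live_warnings: int) -> int:
    """Rate a single stream from 0 to 400, combining danger and live warnings."""
    return min(danger * 30 + live_warnings * 50, 400)

# The two blocking rules described above:
def should_cut_off(live_warnings: int) -> bool:
    return live_warnings >= 2

def should_block_account(cutoffs: int) -> bool:
    return cutoffs >= 2
```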

To avoid confusion, we give the user evidence of the violation with every warning or block.

Technical aspects of moderation

We evaluate a stream from the point of view of its content, but the engineering challenge also depends on video quality. Our streamers broadcast from all around the world with varying internet speeds, so we moderate SD and even LD video. On top of that, a stream can last several hours, and prohibited content can appear at any moment. We can send only 5 PB of our traffic to Hive, which leaves a lot that still falls on our shoulders.

The main challenge is determining the right frequency of checks depending on the streamer’s “danger”. This frequency defines how often we send screenshots from a stream to an external provider, receive responses, and decide on our next steps. Over 5,000 streams can be live at the same time, so we have to cope with a significant load. Besides ongoing stream moderation, we keep improving the quality of moderation to reduce the number of false-positive and false-negative stream cut-offs, i.e. cases where banned content was actually appropriate, or where we identified inappropriate content too late (but banned it anyway).
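A simplified way to derive a check interval from the danger score might look like this; the concrete intervals are invented for illustration and are not our real settings.

```python
def check_interval_seconds(danger: int) -> int:
    """Map a streamer danger score (1-10) to a screenshot sampling interval.

    The intervals themselves are hypothetical.
    """
    if danger >= 8:
        return 5   # near real-time checks for the riskiest streamers
    if danger >= 4:
        return 20
    return 60      # relaxed cadence for streamers with a clean history
```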

Tango has a QA Moderation team that works on improving moderation quality. Within a year, the team reduced the share of appropriate content that was banned from 19% to 5.2%, and the share of inappropriate content we were too late to stop from 5% to 0.87%. Additionally, the share of auto-moderation (without manual review) increased from 63% to 73%.

We keep improving these processes, including training the object recognition system on video streams with latency issues, for example, teaching our AI to distinguish between a microphone and a sex toy even on a blurry stream.
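A common way to make a classifier robust to blurry frames is blur augmentation at training time. We won’t go into our training stack here; the sketch below assumes a PyTorch/torchvision pipeline purely for illustration.

```python
import torchvision.transforms as T

# Augment training frames so the model also sees degraded versions of each image,
# mimicking low-bitrate, blurry streams.
train_transforms = T.Compose([
    T.Resize((224, 224)),
    T.RandomApply([T.GaussianBlur(kernel_size=9, sigma=(0.5, 3.0))], p=0.5),
    T.ColorJitter(brightness=0.3, contrast=0.3),
    T.ToTensor(),
])
```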

Currently, the ML model is as good as a human at judging whether a picture’s content is appropriate or inappropriate.

Architecture and technology stack

Tango’s architecture is based on Google Cloud Functions (similar to AWS Lambda). At first, we assumed this would help us avoid scalability problems by allowing us to send huge request volumes without system errors:

  • All operations actively use the Stream component (our core signaling service).
  • Stream sends a notice when a stream starts and when a stream should be cut off.
  • The cloud function interacts with the image recognition service API.
  • The API sends a response to the cloud function.
  • Based on the response, a command is sent to the Stream service.

Later, we realized that Stream had turned into a macroservice and that its functionality should be split. We created a separate moderation service responsible for signaling and for processing responses from the image recognition service. The moderation system has a lot of integrations with other services, including streamer notification, authorization, and more. Communication between services is generally asynchronous: we use Pub/Sub for communication between server microservices and Google Cloud entities, and Kafka for sending messages between Java microservices.
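For instance, publishing a moderation event to Pub/Sub from Python takes only a few lines. This is a minimal sketch: the project, topic, and payload below are hypothetical names, not our actual identifiers.

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("tango-moderation", "stream-checks")

event = {"stream_id": "abc123", "action": "check_requested"}
future = publisher.publish(
    topic_path,
    data=json.dumps(event).encode("utf-8"),
    origin="moderation-service",  # message attributes must be strings
)
print(f"published message {future.result()}")
```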

Cloud functions allow writing small pieces of logic in any supported language (we use Python). The created script is applied to each sent message, in our case, to each message in the Pub/Sub topic. It is essentially a worker that executes a bounded piece of logic whenever we request it with a message carrying the input parameters.

Most of the resources of a GCP project share the same credentials for accessing the various subsystems of the cloud, so access is seamless thanks to the Google Cloud packages used inside the cloud function. Google Cloud Functions can be set up so that the system is launched when an event occurs:

  • A file is created/updated/deleted in a bucket
  • A message is published to Pub/Sub
  • An HTTP request is made to a cloud endpoint
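Put together, a Pub/Sub-triggered function looks roughly like the sketch below. The decoding boilerplate matches the background-function signature; check_frame is a hypothetical stand-in for the image recognition call, not our actual code.

```python
import base64
import json

def check_frame(url: str) -> str:
    """Stub standing in for the external image recognition API call."""
    return "ok"

def moderate_frame(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    `event` carries the base64-encoded message payload; `context` carries
    event metadata such as the event ID and timestamp.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    stream_id = payload["stream_id"]
    screenshot_url = payload["screenshot_url"]

    # Decide what to do with the frame and log the verdict.
    verdict = check_frame(screenshot_url)
    print(f"stream {stream_id}: {verdict}")
```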

A BI pipeline built on Dataflow is applied here. We use BigQuery storage to process huge amounts of data, so we know whether there were shutdowns and which checks the image recognition service applied to a stream. The logs also include the time each check took place.
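Pulling this log data back out of BigQuery from Python might look like the following; the dataset and table names are hypothetical, since we aren’t publishing the real schema.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical dataset and table names.
query = """
    SELECT stream_id, check_type, checked_at
    FROM `tango-moderation.logs.stream_checks`
    WHERE checked_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    ORDER BY checked_at DESC
"""

for row in client.query(query).result():
    print(row.stream_id, row.check_type, row.checked_at)
```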

We recalculate the streamer’s and the stream’s risk rate every few seconds to have real-time information about whether the streamer is currently dangerous. Apache Beam wrapped in Google Cloud Dataflow helps us with this: the system collects all data in real time and transforms it into a discrete risk score from 0 to 400. In this form, it is much easier to analyze and group the data and to test hypotheses.
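A stripped-down Beam pipeline for this kind of scoring could look like the sketch below. The topic, table, and scoring logic are placeholders; the real pipeline is considerably more involved.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

def to_score(message: bytes):
    """Turn a raw moderation event into (streamer_id, partial risk). Illustrative."""
    event = json.loads(message.decode("utf-8"))
    return event["streamer_id"], event.get("risk_delta", 0)

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read events" >> beam.io.ReadFromPubSub(topic="projects/tango-moderation/topics/events")
        | "Parse" >> beam.Map(to_score)
        | "Window" >> beam.WindowInto(FixedWindows(5))  # recalculate every few seconds
        | "Sum per streamer" >> beam.CombinePerKey(sum)
        | "Clamp to 0-400" >> beam.MapTuple(lambda sid, s: {"streamer_id": sid, "score": min(s, 400)})
        | "Write" >> beam.io.WriteToBigQuery(
            "tango-moderation:scores.stream_risk",
            schema="streamer_id:STRING,score:INTEGER",
        )
    )
```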

Our technology stack is a combination of Java and Python. The cloud functions are written in Python; their use is becoming more and more expensive for us, and they will eventually have to be rewritten. The rest of the microservices are written in Java.

Production issues and solutions

Production queue issue case

We used Google Pub/Sub as a message queue to connect the Google Cloud Functions and the backend service.

Queues have their pros and cons, and we have faced some difficulties using them.

At a certain point, we began to moderate a lot of content, gradually increasing the load on our backend and cloud functions. We realized that some of the checks (i.e. messages in Pub/Sub) sent by the cloud function to the backend were not being processed.

The problem worsened as the load grew.

To solve this issue, we scrutinized Google Cloud’s documentation and found a useful troubleshooting section. Our symptoms matched the “Both oldest_unacked_message_age and num_undelivered_messages are growing in tandem” case. To fix the problem, we increased the number of threads in the subscriber service, the service that listens to the messages in the queue.

Sample monitoring diagrams for an active issue:

A scenario in which the number of undelivered messages grows while the number of queued messages with expired lifetimes peaks usually indicates one of two things:

  • A bug in the subscriber’s code.
  • Not enough threads for processing messages.

We solved the issue by adding more threads to the message processor.
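In the Python Pub/Sub client, “adding more threads” means giving the subscriber a bigger thread pool and looser flow-control limits. Here is a minimal sketch with placeholder subscription names and limits (our actual subscriber services are Java microservices):

```python
from concurrent import futures

from google.cloud import pubsub_v1
from google.cloud.pubsub_v1.subscriber.scheduler import ThreadScheduler

subscriber = pubsub_v1.SubscriberClient()
# Hypothetical project and subscription names.
subscription_path = subscriber.subscription_path("tango-moderation", "check-results")

def handle(message):
    """Process one check result and ack it before its lifetime expires."""
    print(f"received {message.message_id}")
    message.ack()

# More worker threads and higher flow-control limits let the subscriber keep up,
# so oldest_unacked_message_age stops growing.
scheduler = ThreadScheduler(futures.ThreadPoolExecutor(max_workers=32))
flow_control = pubsub_v1.types.FlowControl(max_messages=500)

future = subscriber.subscribe(
    subscription_path,
    callback=handle,
    flow_control=flow_control,
    scheduler=scheduler,
)
future.result()  # block the main thread while messages stream in
```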

This article only scratches the surface of Tango’s complicated moderation flow, which is constantly evolving. While we regularly face serious engineering challenges, we keep adapting and improvising, with a plan to make Tango the biggest and safest place to enjoy going live.


Artem Anin

Head of Web and Moderation


Stay tech!