High-quality Video Transcoding at Scale

Transcoding is the process of converting a media file from one format to another, for example from a higher resolution to a lower resolution or from a higher bitrate to a lower bitrate, and vice versa. The growing variety of technologies, including different generations of equipment, low-speed to high-speed networks and over-the-top (OTT) services, is creating the need to transcode videos to a common format in order to maintain interoperability across devices and ensure a higher quality of experience.

Due to the diversity of viewing devices and varying bandwidth scenarios, multiple quality representations at different bitrates, ranging from 100 kbps to 16 Mbps, need to be generated at the same time to deliver the best experience to users. Transcoders can also perform audio conversion, caption conversion, packaging and metadata transfer, allowing a provider to offer various audio formats alongside multiple video formats, such as MPEG-2, H.264 and HEVC.

The majority of OTT service providers, including Netflix, use real-time transcoders whenever a request is placed by a user to view a video. The transcoder processes the request and transcodes the video based on the capability of the end user’s device. Depending on the type of viewing device requesting the content, the transcoder can repackage it into the appropriate adaptive bitrate format, such as Adobe HTTP Dynamic Streaming (HDS), Microsoft Smooth Streaming (MSS), MPEG-DASH or HTTP Live Streaming (HLS). Client devices adaptively select the optimal stream based on the available bandwidth.
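To make the adaptive selection step concrete, here is a minimal sketch of how a client might pick a stream from a bitrate ladder. The ladder entries, the `Rendition` type, the `select_rendition` function and the 0.8 safety margin are all illustrative assumptions, not values taken from any particular player or service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Hypothetical ladder spanning the 100 kbps to 16 Mbps range mentioned above.
LADDER = [
    Rendition("240p", 240, 100),
    Rendition("360p", 360, 600),
    Rendition("720p", 720, 3000),
    Rendition("1080p", 1080, 6000),
    Rendition("2160p", 2160, 16000),
]


def select_rendition(ladder, bandwidth_kbps, safety=0.8):
    """Pick the highest-bitrate rendition that fits the measured bandwidth.

    Only a fraction (safety) of the measured bandwidth is budgeted, and the
    lowest rendition is used as a fallback when nothing fits.
    """
    budget = bandwidth_kbps * safety
    candidates = [r for r in ladder if r.bitrate_kbps <= budget]
    if not candidates:
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(candidates, key=lambda r: r.bitrate_kbps)
```

With a measured bandwidth of 5000 kbps, for instance, the budget is 4000 kbps, so this sketch would choose the 720p rendition. Real players typically also consider buffer level and screen size.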

Transcoding Flow

Traditional transcoding solutions are limited in their ability to provide the processing power required for transcoding high-definition media content such as 4K, UHD and HD for video services. With the steady growth of OTT services, it is essential to build a video transcoding pipeline that is highly robust, efficient and scalable to guarantee a high-quality experience for end users.

Figure 1 below shows a high-level overview of a transcoding system. High-quality video sources are ingested through transcoding requests along with input and output metadata. Output videos with various codec profiles and multiple quality representations are generated based on the metadata and deployed to a content delivery network for streaming. During a streaming session, the client requests the format that its device can play and adaptively switches between quality levels based on network conditions.

Figure 1. Traditional Transcoding System.

In traditional transcoding solutions, the application server receives the transcoding request from the user, downloads the original source content to the shared file system and queues the transcoding job based on priority. The transcoder picks jobs one at a time from the queue and processes each request. The time the transcoder takes to complete one job varies with factors such as the input and output configurations, the hardware processing power and the transcoder’s computational complexity.
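The priority-based queue described above can be sketched with a binary heap. This is only an illustration: the `JobQueue` class, its method names and the convention that a lower number means a higher priority are assumptions of this sketch.

```python
import heapq
import itertools


class JobQueue:
    """Priority queue of transcoding jobs, drained one job at a time."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: keeps first-in, first-out order within a priority.
        self._counter = itertools.count()

    def submit(self, job_id, priority):
        # Assumption of this sketch: lower number means higher priority.
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def next_job(self):
        """Return the next job id, or None when the queue is empty."""
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        return job_id

    def __len__(self):
        return len(self._heap)
```

Because a single transcoder calls `next_job` only after finishing the previous job, every queued job waits for all higher-priority work ahead of it, which is the bottleneck the following sections address.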

The main obstacle with software decoding and encoding is its poor processing speed compared to hardware processing. When the number of transcoding requests is high, jobs wait longer in the queue. This limitation can be addressed by deploying a parallel transcoding architecture.

Parallel Video Transcoding

Parallel video transcoding uses a scheduler-worker architecture in which the scheduler and the workers run on separate machines. (See Figure 2.) A parallel transcoding system is composed of an application server, a job scheduler (scheduler), a shared file system and multiple transcoder engines (workers). Scalability is achieved by increasing the number of worker machines attached to each scheduler, which allows more transcoding requests to be processed in parallel.

Figure 2. Parallel Video Transcoding.
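As a rough sketch of the scheduler-worker pattern, the snippet below dispatches jobs to a pool of workers. Threads stand in for the separate worker machines, and `transcode` is a hypothetical placeholder rather than a call into a real encoder.

```python
from concurrent.futures import ThreadPoolExecutor


def transcode(job_id):
    """Placeholder for one worker's transcode step (hypothetical)."""
    return f"{job_id}:done"


def run_scheduler(job_ids, num_workers):
    """Dispatch jobs to a pool of workers.

    Results come back in submission order, even though the jobs themselves
    may run concurrently.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(transcode, job_ids))
```

Raising `num_workers` is the in-process analogue of adding worker machines to a scheduler: more requests are in flight at once, while each individual job still takes as long as before.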

Processing more requests requires a lot of computational power and resources. For on-premises deployments, this makes an energy-efficient, low-cost video transcoding solution critical. For scaling transcoding capacity, however, a robust cloud-based system is the most cost-effective option.

Using Cloud for Transcoding

The capabilities of the cloud help to achieve scalability, security and high availability by seamlessly scaling up the transcoder instances when more transcoding requests need to be processed and scaling them down to free up cloud resources when transcoding requests decline. Since the number of transcoder instances can be dynamically scaled as needed, video transcoding solutions don’t require any special hardware capability and can run on several cloud instances.
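One simple way to express this scale-up and scale-down behavior is a rule that sizes the fleet from the queue depth. The function name, the `jobs_per_instance` parameter and the instance limits below are assumptions made for illustration; a real deployment would rely on its cloud provider's autoscaling features.

```python
import math


def desired_instances(queued_jobs, jobs_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many transcoder instances to run for the current backlog.

    The result is clamped to [min_instances, max_instances] so the fleet
    never scales to zero and never grows past a cost ceiling.
    """
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With 10 queued jobs and 4 jobs per instance, this rule asks for 3 instances; with an empty queue it falls back to the minimum of 1.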

Transcoding large files can take a significant amount of time. If the job has to be completed within a few minutes, then distributed video transcoding becomes a more realistic option.

Distributed Video Transcoding

Modern transcoding solutions use distributed systems for fast transcoding by dividing a given transcoding job into multiple sub-jobs and using multiple transcoding engines for processing.

Distributed video transcoding systems are composed of an application server, a shared file system, a job splitter, a job merger, a scheduler and multiple transcoder engines. (See Figure 3.) The application server receives the transcoding request from the content delivery network (CDN) service provider or the user and downloads the source content to the shared file system.

The job splitter creates multiple transcoding sub-jobs based on the nature of the request, and the scheduler assigns each sub-job to one cloud instance to transcode the content. Job splitting can be performed in several ways, including splitting the input file into multiple chunks, or marking split points using a GOP (group of pictures) table and letting each transcoder use the split points to generate partial output without splitting the input file. The job merger collects the intermediate outputs generated for each sub-job and stitches them into a single output. Figure 3 shows the high-level architecture of the proposed distributed transcoding solution using Amazon Cloud Services.
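The split-point approach can be sketched as follows, assuming the GOP table is available as a list of keyframe timestamps in seconds. The function names, the target chunk length and the shape of the inputs are assumptions of this sketch and do not describe a specific splitter implementation.

```python
def plan_chunks(keyframe_times, duration, target_chunk_s):
    """Return (start, end) time ranges for sub-jobs.

    Each range starts on a keyframe, so a transcoder can begin decoding
    there without needing frames from the previous range.
    """
    starts = []
    for t in sorted(keyframe_times):
        # Open a new chunk at the first keyframe that lies at least
        # target_chunk_s after the start of the current chunk.
        if not starts or t - starts[-1] >= target_chunk_s:
            starts.append(t)
    ends = starts[1:] + [duration]
    return list(zip(starts, ends))


def merge_outputs(partials):
    """Order partial outputs by chunk index, ready for stitching.

    partials maps a chunk index to the name of its intermediate output.
    """
    return [partials[i] for i in sorted(partials)]
```

For keyframes every 2 seconds in a 14-second source and a 5-second target, `plan_chunks` yields the ranges (0, 6), (6, 12) and (12, 14). Sub-jobs may finish in any order, so the merge step sorts by chunk index before stitching.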

Figure 3. Distributed Video Transcoding.

The main advantage of distributed transcoding is that if a problem is detected at any point in the job, it can be resolved immediately, without waiting for the entire video to be transcoded. When all the chunks corresponding to a stream have completed successfully, they are stitched together by the job merger.
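A minimal sketch of this idea is to retry only the chunk that failed rather than restarting the whole job. The `transcode_chunk` callable, the use of `RuntimeError` to signal a failed chunk and the retry limit are assumptions for illustration.

```python
def transcode_with_retry(chunks, transcode_chunk, max_attempts=3):
    """Transcode each chunk, retrying a failed chunk on its own.

    Returns a dict mapping chunk index to that chunk's output. If a chunk
    still fails after max_attempts, the last error is re-raised.
    """
    results = {}
    for index, chunk in enumerate(chunks):
        for attempt in range(1, max_attempts + 1):
            try:
                results[index] = transcode_chunk(chunk)
                break  # this chunk succeeded; move on to the next one
            except RuntimeError:
                if attempt == max_attempts:
                    raise
    return results
```

The chunks that already succeeded are never redone, so a late failure costs one chunk's worth of work instead of the full encode.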

Quality of Service

Prior to implementing distributed transcoding, a full HD or 4K movie took days to encode, and a failure occurring late in the process would further delay the transcoding. With the distributed transcoding approach, a video can be fully inspected and transcoded at the different profiles and quality representations within a fixed duration. Also, since the work is done in parallel, processing time does not grow with the length of the source, provided enough transcoder instances are available to handle the additional chunks.
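A back-of-the-envelope estimate shows why the wall-clock time can stay roughly constant. The model below is a simplification that ignores splitting, merging and data-transfer overhead, and the parameter names are assumptions: `speed_factor` is the number of seconds of video one worker transcodes per second of compute.

```python
import math


def estimated_wall_clock_s(source_s, chunk_s, speed_factor, workers):
    """Estimate wall-clock seconds to transcode a source split into chunks.

    Chunks are processed in waves of `workers` at a time; each wave takes
    as long as one full chunk takes on one worker.
    """
    chunks = math.ceil(source_s / chunk_s)
    waves = math.ceil(chunks / workers)
    return waves * chunk_s / speed_factor
```

Under this model, a 2-hour source (7200 s) cut into 60-second chunks at a speed factor of 0.5 takes 120 s with 120 workers, and a 4-hour source takes the same 120 s with 240 workers. On a single worker, the 2-hour source would take 14400 s.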


Whether they are broadcasters or content owners, media providers are looking to adapt video content so that it is compatible with the capabilities of the user’s device (such as screen size, computation power and network conditions), and they can harness the capabilities of the cloud to create high-performance, flexible, scalable and affordable transcoding solutions. The system design for high-quality distributed video transcoding proposed in this blog has been created with these objectives in mind, to make it quick and easy for video providers to scale their transcoding workflows in line with business growth.


About the Author


Vivekanandhan Subramanian
Senior Technical Leader

Vivekanandhan Subramanian is a Senior Technical Leader at Aricent and has worked on key technologies like IoT, OTT, Automotive, Multimedia, Content Adaptation, Connectivity and Security. He is motivated by problems and challenges at work and loves Coding, Gardening and Driving.

