How to Burn Subtitles with Transparent Background with VideoLocalize

Why is a transparent background not standard?
When subtitles are burned into a video, they display by default as white text with a black border around the letters for visibility, and with no background. Because the text has no background, it may blend into or clash with the visuals of the video, making the subtitles hard to read.

Why use a transparent background?
Subtitles are much clearer when a background box adds contrast between the subtitle text and the visuals. A transparent background is a good solution: the rectangular box around the text provides that contrast, while its transparency means the visuals behind it are not obscured (one can still see through the box).

Yet creating a transparent background is not a standard function of many subtitle editing tools. Although many popular editing tools can help you create one, the process is not straightforward and usually requires many steps and trials to get right.

How VideoLocalize makes it easy
With the VideoLocalize tool, creating a transparent background around subtitles is done automatically. You would need to upload your video and subtitle file first. The subtitle file needs to be in the proper .srt or .ass file format. Then you are ready to select the background style.

Under the “Style” section, you just choose “Transparent”, and then click “Start Processing”. Once the processing is done, your subtitle will be embedded with a transparent background.
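For readers who want to approximate the same effect outside the platform, here is a minimal sketch using FFmpeg's subtitles filter, which renders through libass. It is not how VideoLocalize works internally; the function name, file names, and default opacity are illustrative assumptions. It relies on two libass conventions: BorderStyle=3 draws a box behind the text in the outline colour, and ASS colours are written &HAABBGGRR, where an alpha of 00 is opaque and FF is fully transparent.

```python
import subprocess


def build_burn_command(video, subtitles, output, opacity=0.5):
    """Build an FFmpeg command that burns subtitles over a semi-transparent black box.

    opacity is the box opacity, from 0.0 (invisible) to 1.0 (solid black).
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    # ASS alpha runs the other way round: 00 is opaque, FF is fully transparent.
    alpha = 255 - int(opacity * 255)
    box_colour = f"&H{alpha:02X}000000"  # AABBGGRR: alpha, then black
    style = f"BorderStyle=3,OutlineColour={box_colour},Shadow=0"
    # Assumes a simple subtitle file name; paths containing ':' or quotes
    # need extra escaping inside an FFmpeg filter string.
    video_filter = f"subtitles={subtitles}:force_style='{style}'"
    return ["ffmpeg", "-i", video, "-vf", video_filter, "-c:a", "copy", output]


if __name__ == "__main__":
    command = build_burn_command("input.mp4", "subs.srt", "output.mp4")
    print(" ".join(command))
    # Uncomment to run; requires an FFmpeg build with libass support.
    # subprocess.run(command, check=True)
```

Raising or lowering opacity trades contrast against how much of the picture stays visible behind the box, which is the same balance described above.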

We have made our Subtitle Burning tool easy to use and fully automated – for your convenience. Try it out yourself at

Video Localization: Driving Down Costs, Enabling Scale

Video, the content of tomorrow

Imagine you’ve bought an espresso machine, one of those fancy ones, with many moving parts and multiple configurations. After wrestling with the content in the instruction manual, you turn to Google. Fortunately, the manufacturer is video savvy, and has uploaded a video installation guide to YouTube. That video appears at the top of the search results, so within minutes, you’re assembling the machine, thankful for the clear guidance, your mouth watering as you anticipate your first espresso.

This scenario is not imagined; it’s a true story, one that plays out thousands of times each day. YouTube, today’s video giant, has more than 1.3 billion users who watch more than 5 billion videos each day. According to Cisco, video will grow four-fold by 2020, with business videos accounting for 66 percent of business traffic.

Facing the video complexity-cost barrier

But there’s a problem. Keeping up with consumer demand for video is complicated and costly, especially for companies selling into international markets. Each video must be transcribed, timed, and translated. Voiceover talents must record new audio, which must then be synchronized with the original video. These tasks are manual and labor intensive, and they bring skyrocketing costs and shrinking margins.

The highest cost involved lies with human voiceover.  Although some companies use multilingual subtitles to avoid the cost of voiceover, subtitles are not appropriate for many formats, such as for software training videos. Viewers of training videos need to see the operation of the software, and it’s unreasonable to expect them to read subtitles while watching — a distraction that irritates users, disrupts learning and weakens trust.

Answering the question: Why is voiceover so expensive?

Voiceover is expensive for two reasons. The first is the expense of resolving the synchronization issues that arise from the difference in length of spoken sentences (Figure 1).

Figure 1: An illustration of the voiceover synchronization problem. The pink track shows the length of the original voice. The blue track shows the length of the longer, translated voice.

For example, a sentence that takes 10 seconds to speak in English may take 15 seconds to speak in a language like French or German. Resolving this issue pushes up costs, as it takes 30 minutes of engineering time on average to sync one minute of video.
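The arithmetic behind these figures can be sketched in a few lines. The 10-second and 15-second durations and the 30-minutes-per-minute rate come from the paragraph above; the 20-minute video length and the function names are hypothetical, chosen only for illustration.

```python
def expansion_ratio(original_seconds, translated_seconds):
    """How much longer the translated speech is than the original."""
    return translated_seconds / original_seconds


def manual_sync_hours(video_minutes, engineering_minutes_per_video_minute=30):
    """Engineering hours needed to sync a video by hand at the quoted average rate."""
    return video_minutes * engineering_minutes_per_video_minute / 60


if __name__ == "__main__":
    print(expansion_ratio(10, 15))  # 1.5
    print(manual_sync_hours(20))    # 10.0
```

At that rate, a hypothetical 20-minute video would need about 10 hours of manual synchronization per target language.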

The second reason is that voice actors are expensive to hire. Hiring expensive voiceover talent has been (until now) the only route to voice localization, a service provided only by high-cost voiceover and dubbing studios that typically create high-quality, “fancy” productions, such as feature films and TV advertisements (Figure 2).

Figure 2: Adding to the high cost of video localization: Voiceover studios charge the same high rates no matter the production, whether a “fancy” TV ad or movie, or a far less complicated business video.

Introducing a cost-effective solution:

VideoLocalize is the world’s first video translation management system (VTMS) that uses hybrid technology to automate and synchronize the tasks involved in video localization, as well as to address the voiceover issues that drive up costs.

Synchronization issues are resolved through the system’s patent-pending hybrid technology that automatically applies segment-by-segment syncing to eliminate the need for post-engineering. In other words, if the length of a new segment is longer than that same segment in the original, the system adjusts, editing the audio and video by imperceptible amounts, slowing down and speeding up the individual components as needed. The work is completely automatic, saving countless editing hours and many budget dollars.
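The general idea of segment-by-segment adjustment can be illustrated with a short sketch. The platform's actual method is patent-pending and not described here, so everything below is an assumption: the function and class names, the cap on audio speed-up, and the rule of speeding up the audio first and then slowing the video to cover whatever difference remains.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    audio_speed: float  # playback rate for the translated audio (above 1 is faster)
    video_speed: float  # playback rate for the video segment (below 1 is slower)
    duration: float     # final length of the synced segment, in seconds


def plan_segment(original, translated, max_audio_speed=1.1):
    """Plan speed changes so one translated audio segment fits its video segment."""
    if original <= 0 or translated <= 0:
        raise ValueError("durations must be positive")
    if translated <= original:
        # The translation already fits; leave both tracks untouched.
        return SyncPlan(1.0, 1.0, original)
    # Speed the audio up, but never beyond the cap.
    audio_speed = min(translated / original, max_audio_speed)
    duration = translated / audio_speed
    # Slow the video down to absorb the remaining difference.
    video_speed = original / duration
    return SyncPlan(audio_speed, video_speed, duration)


if __name__ == "__main__":
    # One (original, translated) duration pair per segment, in seconds.
    for original, translated in [(10.0, 15.0), (8.0, 8.4), (6.0, 5.0)]:
        print(plan_segment(original, translated))
```

Keeping each adjustment small and local to its segment is what lets the changes stay hard to notice; a single global stretch would have to be sized for the worst segment.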

VideoLocalize also expands voiceover options by providing built-in text-to-speech engines and a hiring marketplace that offers access to lower-cost voice talent (Figure 3). With VideoLocalize, you can save your use of costly studio voice actors for your highest-value projects.

Figure 3: Voiceover options available through

Lowering cost, enabling scale

Today, start-to-finish, automatic video localization is possible and profitable, even at scale, thanks to the VideoLocalize VTMS. Future-thinking language service providers may now expand their offerings by bringing video localization services to their clients.

In addition to handling and automating all the tasks involved in video localization, VideoLocalize also provides a voiceover talent pool and project management environment, giving translators, project managers, voice talent and clients the tools and online workspace they need to complete video localization projects at scale, faster and at a far lower cost than before.


About VideoLocalize

VideoLocalize is the world’s first video translation management system that allows you to manage
the entire process of video localization on a single platform. The system automates transcription,
timing, subtitling, translation, text-to-speech and audio-video synchronization. Winner of the TAUS
Innovation Excellence Award in 2016 and the Process Innovation Challenge (PIC) at LocWorld in
2017, VideoLocalize is on a mission to make video localization faster and more cost-effective.
For more information and a free trial, please visit

How to Control Text-to-Speech Pronunciation Using SSML

The pronunciation of a certain word or sound unit may be different between various languages.  For example, in the Japanese word “genba”, the first syllable is pronounced with a hard “g” sound as in “get”, not a soft “g” sound as in “gem”.  So, how can you manipulate your Text-to-Speech (TTS) engine to reflect such differences?

Speech Synthesis Markup Language (SSML) is a markup language that provides a standard way to mark up text for the generation of synthetic speech. Using SSML tags to format the text content of a prompt, you can control many aspects of synthetic speech production, such as pronunciation, pitch, pauses, and rate of speech. Here’s an example of how an SSML tag can guide pronunciation:

Original written script:

Genba is a Japanese term meaning “the actual place”.

Without SSML tags, here is what the original script sounds like in TTS:

In order for the TTS engine to generate the correct pronunciation, this is the formatted SSML tag and the correct audio:

  <phoneme alphabet="x-sampa" ph="gInbA">Genba</phoneme> is a Japanese term meaning "the actual place".

As you can see, by inserting a simple tag, the word “genba” is now pronounced correctly.
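If you generate such tags from a script rather than typing them by hand, a small helper keeps the markup well-formed. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the enclosing speak element is the root that standard SSML documents use, though an engine or tool may add it for you. Which phonetic alphabets are accepted varies by engine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def phoneme(word, ph, alphabet="x-sampa"):
    """Wrap a word in an SSML phoneme tag with the given phonetic spelling."""
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme>"
    )


def speak(body):
    """Wrap SSML content in the root speak element."""
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    sentence = phoneme("Genba", "gInbA") + ' is a Japanese term meaning "the actual place".'
    ssml = speak(sentence)
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    print(ssml)
```

Parsing the result before sending it to a TTS engine catches unclosed tags and unescaped characters early.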

SSML has many other capabilities when working with TTS.  It can add pauses between sentences and/or paragraphs, emphasize certain words, select a speaking voice by attributes, as well as set the pitch, rate, and volume of the speaking voice.  Read more about SSML and its features at either the Microsoft or Amazon Speech Platforms.
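A few of these capabilities can be sketched with the standard SSML elements break, emphasis, and prosody. The sample sentences and the function name are made up for illustration, and support for individual elements and attribute values differs between TTS engines, so check your engine's documentation before relying on any of them.

```python
import xml.etree.ElementTree as ET


def build_ssml():
    """Build a small SSML document with a pause, an emphasized word, and slowed speech."""
    speak = ET.Element("speak")

    first = ET.SubElement(speak, "s")
    first.text = "Welcome to the demo."

    # Insert a half-second pause between the two sentences.
    ET.SubElement(speak, "break", {"time": "500ms"})

    second = ET.SubElement(speak, "s")
    second.text = "This word is "
    strong = ET.SubElement(second, "emphasis", {"level": "strong"})
    strong.text = "important"
    strong.tail = "."

    # Change the rate, pitch, and volume for one passage.
    slow = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "low", "volume": "loud"})
    slow.text = "This sentence is spoken slowly."

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml())
```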

Using SSML in VideoLocalize

When working on your project in VideoLocalize, you can use SSML formatted text to control various aspects of synthetic speech production very easily.

To insert your own SSML tags, take a look at the sample screenshot below and follow these steps:

  • Step 1: Click on the “Pronunciation Guide” of the segment you want formatted.  Once you click on it, a text box will appear beneath the translated Chinese text.
  • Step 2: Type in your SSML tag, and then click the “Confirm” button.
  • Step 3: If there was a previous recording of that segment, you should “erase” it first. If there is no previous recording, skip this step.
  • Step 4: Generate a new recording by clicking on the “Text-to-Speech” button.


VideoLocalize Integrates Voice Over and Transcription Services

Markham, Ontario – March 23, 2018 – VideoLocalize, the world’s first fully automatic video localization platform, is announcing the release of its platform’s latest features. In addition to automating the dubbing process, users can now order voice over and transcription services directly from its platform.

“Even before this updated version, people called VideoLocalize ‘brilliant’ and ‘a disruptor and game changer,’” says George Jie Zhao, CEO of VideoLocalize. “Now, a fully automatic video localization process is available, giving companies everything they need to manage low-cost video localization projects in a world increasingly dominated by video.”

A project manager can manage the entire video localization project on this one platform: order a transcription service, add subtitles, select a voice talent and book a voice over recording, and then automatically synchronize the recording with the video. Alternatively, the TTS (text-to-speech) feature can generate a localized video in another language almost instantly.

Says Zhao, “All the services are integrated, so it’s easy to order the services you need and manage the whole project all on one platform. You don’t need to contact services separately anymore.” It’s a platform where users can manage the whole process and the resources needed, simplifying communication and reducing project management costs.

To learn more about the platform, visit

VideoLocalize wins the Process Innovation Challenge at LocWorld Barcelona

June 20, 2017, Toronto –  Boffin, an Asian language provider, is pleased to announce that it has been named the LocWorld Process Innovator 2017 by winning the prestigious Process Innovation Challenge (PIC) at LocWorld in Barcelona, Spain.  George Zhao, President and co-founder of Boffin, demonstrated a simpler, less time-consuming process of localizing video using its new VideoLocalize platform.  Calling this new innovative process “Interpreting Video”, Zhao combined the translation, voiceover and post-engineering aspects of the video localization process into one simple step.


Held on June 15-16, 2017, the Process Innovation Challenge (PIC) is a fast and furious competition, taking the top 6 out of the 35 innovation entries and shortlisting them down to two finalists. The top two then go through to the final round on day two. The winner, chosen by three Process Dragons and the audience, is then named the LocWorld Process Innovator 2017. Boffin beat out the other finalist, a Microsoft innovation, with its cost-saving and time-saving approach to video localization, to take this coveted title.

“Video localization has always been a complicated process,” says Zhao.  “My idea to merge translation and voiceover recording by using an interpreter instead has never been done before.  It’s only possible because of the VideoLocalize platform.”

With VideoLocalize’s innovative technology, one can significantly simplify the video localization process by merging the translation and voice-over recording aspects into one process: direct interpretation into the tool.  The synchronization feature automatically adjusts the video and voice over recording lengths so that they match.  Usually, this part of the process would have to be done manually, requiring hours of tedious work.

Boffin launches

January 25, 2017, Toronto – Boffin, the Asian language service provider, today announced the launch of its platform, winner of the coveted TAUS Innovation Excellence Award. The platform, deemed “brilliant” and “a disruptor and game changer,” is an all-in-one video localization solution addressing the market’s growing need for multi-lingual voice talent and low-cost video localization in a world increasingly dominated by video.

“We are pleased to bring VideoLocalize to the industry,” says George Jie Zhao, co-founder and president of Boffin. “People who find themselves doing more video work can expect significant financial savings using the platform, as it eliminates the need for costly post-editing work and provides a central, online location for identifying, hiring, and managing voice-over talent.”

The platform, part synchronization tool, part voice-talent pool, and part project management environment, gives translators, project managers, voice talent, and clients the tools they need to complete video localization projects, and a way to work together online, simplifying communication and project management.

As a tool, VideoLocalize provides two methods for synchronizing audio and video. The first, Magic Sync, stretches a video to match a voiceover. The second, Karaoke Recording, presents voice-over talent with a line or two at a time so that each spoken sentence matches in every language the first time.

As a platform, VideoLocalize is a project-management environment and voiceover talent pool combined. “Karaoke Recording, which won us the TAUS Innovation award, was really the first step,” says Zhao. “By adding Magic Sync, the talent pool, and the co-working system, we have effectively created the world’s first method for scaling video translation and voiceover.”

To learn more about the platform, and for a free, online demo, visit

ABOUT VideoLocalize

Video localization is complicated. There are translators and voice-over talent to source and manage. Translation processes and graphic engineering to oversee. And, costly post-production audio/video editing to drive up the budget. Until now. Until

VideoLocalize, the brainchild of Boffin Language, an Asian-language service provider, is part synchronization tool, part voiceover talent pool, and part project management environment, giving translators, project managers, voice talent, and clients the tools and online workspace they need to complete video localization projects, faster and at a far lower cost than before.

Try it yourself for free at

ABOUT Boffin

Founded in China in 1996 and headquartered in Toronto, Boffin Language Group Inc. has secured a strong niche position and a reputation as a reliable provider of high-quality translation and localization services, primarily for the Chinese Simplified, Chinese Traditional, Korean and Japanese languages. Boffin is also an established engineering service provider, handling software localization, QA testing, DTP, and audio/video engineering as well.

To learn more about Boffin, visit

VideoLocalize wins the TAUS Innovation Excellence Award

November 8, 2016, Toronto – Boffin, the Asian language service provider, today announced that its new VideoLocalize platform has won the coveted TAUS Innovation Excellence Award in the Insider category. TAUS, the resource center for the global language and translation industries, presented the award to President and Co-founder George Zhao following his presentation at the organization’s Annual Conference in Portland, Oregon.

“We are very pleased to be recognized as one of the translation industry’s ‘Game Changers’ of 2016,” says Zhao. “Our VideoLocalize platform grew from our clients’ expanding need for voice-over video work, as many clients are producing more and more video, yet it is expensive and time-consuming to map translated voice to video.”

VideoLocalize, according to Zhao, solves that problem by presenting voice-over talent with just a line at a time, Karaoke style, so that each spoken sentence matches in every language the first time. “When you can record just a sentence at a time, you save many hours of post audio-video editing work,” says Zhao. “And that translates into big financial savings.”

It was that combination of innovation and benefit to the industry at large that led to Boffin’s receipt of the Award. “All industries must change to keep pace with a fast-moving world, and the translation industry is no different,” says Zhao. “With text increasingly shifting to video, it was clear to us that low-cost video localization is a must, which in turn led us to create the industry’s first video localization platform.”

To learn more about the platform, and to try it yourself, visit