Browser-Based Voice-Over & Dubbing Training Platform
We built a web platform for recording voice over video with real-time synchronization.
Client
The client runs a voice-over and dubbing school with a team of instructors and a professional recording studio. Students typically train in a studio setting, similar to a traditional vocal coaching session.
When COVID-19 disrupted in-person sessions, the school decided to extend its training model online and build a browser-based platform that would allow students to record voice over video with precise synchronization.
Challenges
The client already had an MVP built around the Video.js library. At that stage, the assumption was that the existing solution needed further development and refinement before it could be released.
For that reason, the client deliberately searched for a team with hands-on experience in Video.js. However, most developers use the library as a ready-made video player, while this project required building a full-featured browser tool with voice recording over video and precise synchronization. The search took time, and the client found the right expertise only in 2021.
During our initial discovery meeting, the client shared the project roadmap and demonstrated the existing MVP. Although labeled an MVP, the prototype could hardly be considered viable in real-world use. Video playback froze under load, voice recording started with delays of up to 30 seconds, and each take was uploaded only after completion, which made the workflow unsuitable for any audio-based training.
After reviewing the prototype, we prepared a technical assessment and roadmap. The root cause became clear: the recording pipeline relied entirely on server-side processing, with each file transmitted in full only after recording ended.
We had a strong frontend portfolio, but more importantly, deep hands-on expertise with Video.js beyond standard player implementations.

Resource Constraints
The noticeable lag in the MVP immediately pointed to resource consumption issues. Delays reached up to 30 seconds even on our development machines, which meant they would be even higher on typical user devices.

Reliable Synchronization
Precise alignment between video and voice recording is a must-have for dubbing tools. We were confident we could implement this reliably, but it required deep customization of the underlying library. In addition, synchronization had to remain consistent across different browsers.
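One common way to keep a recorded voice track aligned with video during playback is to treat the video clock as the reference and correct the audio whenever the two drift apart. The sketch below illustrates that idea only; it is not taken from the project. The names `decideSyncAction` and `SyncAction`, both thresholds, and the playback-rate values are hypothetical.

```typescript
// Hypothetical drift check: the video clock is the reference, audio follows.
type SyncAction =
  | { kind: "none" }
  | { kind: "nudge"; playbackRate: number }
  | { kind: "seek"; toSeconds: number };

// Threshold values are illustrative assumptions, not measured figures.
const NUDGE_THRESHOLD_S = 0.04; // below this, drift is treated as negligible
const SEEK_THRESHOLD_S = 0.25; // at or above this, jump instead of nudging

// videoTime:  current position of the video, in seconds
// audioTime:  current position inside the recorded take, in seconds
// takeOffset: where the take starts on the video timeline, in seconds
function decideSyncAction(videoTime: number, audioTime: number, takeOffset: number): SyncAction {
  // Where the audio should be, given the position of the video.
  const expectedAudioTime = videoTime - takeOffset;
  const drift = audioTime - expectedAudioTime; // positive: audio is ahead

  if (Math.abs(drift) < NUDGE_THRESHOLD_S) return { kind: "none" };
  if (Math.abs(drift) >= SEEK_THRESHOLD_S) {
    return { kind: "seek", toSeconds: Math.max(0, expectedAudioTime) };
  }
  // Small drift: slow the audio down if it is ahead, speed it up if behind.
  return { kind: "nudge", playbackRate: drift > 0 ? 0.97 : 1.03 };
}
```

In a page, `videoTime` could come from the Video.js player's `currentTime()` and `audioTime` from the audio element's `currentTime`, with the check run on a timer or on `timeupdate` events.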
Development process
How Dubtrainer Works
For students, the process is straightforward: click “Record,” perform the line, and within seconds the completed take appears in their dashboard. Instructors highlight the convenience of the workflow: recordings are saved automatically, and the platform supports storing multiple takes and comparing results.
Instructor
The instructor prepares the materials by uploading a video with the original audio track and a script segmented by timestamps.
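A script segmented by timestamps can be modeled as a list of time ranges, which makes it easy to show the current line while the video plays. The shape below is an assumption for illustration; the field names and `findActiveSegment` are hypothetical, and the sample lines are invented.

```typescript
// Hypothetical shape of one script line; field names are assumptions.
interface ScriptSegment {
  startSeconds: number; // inclusive
  endSeconds: number; // exclusive
  character: string;
  text: string;
}

// Returns the segment active at `timeSeconds`, or undefined between lines.
// Assumes segments do not overlap.
function findActiveSegment(
  segments: readonly ScriptSegment[],
  timeSeconds: number,
): ScriptSegment | undefined {
  return segments.find(
    (s) => timeSeconds >= s.startSeconds && timeSeconds < s.endSeconds,
  );
}

// Invented sample data.
const script: ScriptSegment[] = [
  { startSeconds: 1.0, endSeconds: 3.5, character: "ANNA", text: "Where have you been?" },
  { startSeconds: 4.0, endSeconds: 6.2, character: "MARK", text: "I missed the train." },
];
```

Calling `findActiveSegment(script, 2)` returns Anna's line, while a time of 3.7 falls in the pause between the two lines and returns `undefined`.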
Student
The student joins the online session and begins recording. Voice is captured directly in the browser using built-in Web APIs and layered over the video.
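The source does not name the specific Web APIs. If the standard `MediaRecorder` API is used, one cross-browser detail is that browsers differ in which audio containers they can record, so the client usually has to probe for a supported type. The sketch below assumes that approach; `pickRecordingMimeType` and the preference order are hypothetical.

```typescript
// Candidate container/codec strings in order of preference (an assumption).
const PREFERRED_AUDIO_TYPES = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// `isSupported` is injected so the function can run outside a browser;
// in a page it would be (t) => MediaRecorder.isTypeSupported(t).
function pickRecordingMimeType(
  candidates: readonly string[],
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  return candidates.find((t) => isSupported(t));
}
```

The chosen string would then be passed as the `mimeType` option when constructing the recorder. Differences in recorded formats between browsers are one reason a server-side normalization step is useful.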
Cloud Upload
The take is automatically uploaded to the cloud, where files are normalized into a unified format.
Review
The instructor can review all takes in the dashboard, evaluate the performance, and go through a take together with the student to give immediate feedback.
Developing a Real-Time Voice Recording Platform
1. Re-engineered the MVP
We assembled a dedicated team: a frontend developer, a backend developer, and a PM. Later, a DevOps engineer joined to address account security and payment-related concerns.
We rebuilt the MVP into a stable web application, initially optimized for Chrome. It launched instantly and ran reliably. After the demo, the client tested the recording himself by overlaying voice onto a sample video and confirmed the improvement.
2. Prepared the Product for Market Launch
At this stage, the focus shifted to preparing the product for its initial market release.
– We defined the supported browsers, selecting the most widely used ones to avoid unnecessary costs:
• Google Chrome, which holds a leading position globally;
• Firefox, which held a notably strong desktop share in Germany in 2021;
• Safari, to ensure coverage for Apple device users.
– We expanded the functionality step by step: first adding a lesson management module for uploading and editing materials, then implementing an admin panel for user management.
3. Optimized Performance
Although the interface appears simple, Dubtrainer operates under significant media load.
To ensure stable performance even on lower-end devices, we optimized resource consumption:
– refined the recording plugin and optimized client-side audio processing;
– configured background server tasks for final audio processing;
– integrated and configured a CDN to accelerate content delivery.
4. Streaming Upload
In the original MVP, a recording was processed only after the take was completed. The entire file was sent to the server at once, which meant each submission could take one to two minutes.
We redesigned the upload flow so that audio is transmitted incrementally during recording instead of waiting for the take to finish.
This reduced the delay between recording completion and result availability to just a few seconds and removed a major bottleneck in the user workflow.
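The incremental flow can be sketched as numbered chunks that the server reassembles once the take ends. This is a minimal illustration under assumptions, not the project's code: `TakeChunk`, `ChunkSequencer`, and `TakeAssembler` are hypothetical names, and the transport between client and server is left out.

```typescript
// Hypothetical chunk envelope. In a browser the payload would typically come
// from MediaRecorder's `dataavailable` event when start() is given a timeslice.
interface TakeChunk {
  takeId: string;
  sequence: number; // 0-based position within the take
  payload: Uint8Array;
}

// Client side: labels each chunk so the server can order them.
class ChunkSequencer {
  private next = 0;
  private readonly takeId: string;
  constructor(takeId: string) {
    this.takeId = takeId;
  }
  wrap(payload: Uint8Array): TakeChunk {
    return { takeId: this.takeId, sequence: this.next++, payload };
  }
}

// Server side: collects chunks as they arrive, in any order.
class TakeAssembler {
  private readonly chunks = new Map<number, Uint8Array>();

  accept(chunk: TakeChunk): void {
    this.chunks.set(chunk.sequence, chunk.payload); // a re-sent chunk overwrites
  }

  // Called when the client signals the end of the take with its chunk count.
  // Returns the joined bytes, or the sequence numbers that never arrived.
  finish(expectedCount: number): { ok: true; data: Uint8Array } | { ok: false; missing: number[] } {
    const missing: number[] = [];
    let total = 0;
    for (let i = 0; i < expectedCount; i++) {
      const part = this.chunks.get(i);
      if (part === undefined) missing.push(i);
      else total += part.length;
    }
    if (missing.length > 0) return { ok: false, missing };

    const data = new Uint8Array(total);
    let offset = 0;
    for (let i = 0; i < expectedCount; i++) {
      const part = this.chunks.get(i)!;
      data.set(part, offset);
      offset += part.length;
    }
    return { ok: true, data };
  }
}
```

Because most of the audio has already reached the server by the time the student stops recording, only the last chunk and the final processing remain, which is consistent with the delay dropping to a few seconds.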
5. Added Group Sessions
Initially, the platform was designed for perfectly synchronized one-to-one sessions between an instructor and a student. In real dubbing practice, however, multiple actors often participate in the same take. To support this workflow, we added group sessions for up to five participants (one instructor and up to four students).
– Each participant joins a shared session. Their media stream is sent to the server, and the required streams are redistributed accordingly.
– The server sends a synchronization signal to initiate playback, ensuring that video starts simultaneously across all browsers.
– The instructor can monitor students in real time and provide immediate feedback, while completed takes are automatically saved to the students' dashboards.
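One way a synchronization signal can produce a simultaneous start is for the server to announce a start time slightly in the future and for each browser to convert it to its own clock. The source states only that the server sends the signal; the clock-offset estimate below is an assumed technique, and `PingSample`, `estimateClockOffset`, and `delayUntilStart` are hypothetical names.

```typescript
// One round trip of a hypothetical time-sync ping (all values in milliseconds).
interface PingSample {
  clientSent: number; // client clock when the ping left
  serverTime: number; // server clock when it replied
  clientReceived: number; // client clock when the reply arrived
}

// Estimates (server clock - client clock), assuming symmetric network delay.
function estimateClockOffset(sample: PingSample): number {
  const roundTrip = sample.clientReceived - sample.clientSent;
  return sample.serverTime + roundTrip / 2 - sample.clientReceived;
}

// Converts a server-announced start time into a local wait before playback.
// Returns 0 if the start time has already passed on this client.
function delayUntilStart(serverStartAt: number, clientNow: number, offset: number): number {
  const localStartAt = serverStartAt - offset;
  return Math.max(0, localStartAt - clientNow);
}
```

Each client would wait for the computed delay and then start the video, so participants with different network latency still begin at roughly the same moment.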
Technologies
Backend: Node.js
Frontend: React
Video: Video.js
Audio: Browser APIs
Database: MongoDB
Infra: AWS
Result
Over the course of a year, Dubtrainer evolved into a stable, production-ready web platform. The focus remained on functionality and performance rather than visual complexity, allowing the client to launch without unnecessary design overhead.
What began as a request to synchronize video and audio resulted in a full-featured training platform with a dedicated admin panel for instructors and a structured workflow for students.
- Individual sessions account for approximately 70% of all classes, with the primary format being one instructor and one student.
- Group sessions support up to four students simultaneously, expanding the training model beyond one-to-one lessons.
- A typical take lasts 15–30 seconds, and a full 60-minute session includes around 20 takes plus discussion time, enabling structured and repeatable training cycles.
The platform was initially conceived as an internal business tool rather than a standalone monetized product. However, it enabled the client to expand geographically and reach new audiences. Over time, Dubtrainer also attracted podcasters as an additional user segment.
students completed courses in the first year.