A paper edit can be the fastest way to get things moving in your editing timeline. These resources can get you there even faster.

A paper edit is a time-coded list of the bits and quotes that you want to use, in the order you’d like to use them, often accompanied by some notes on what illustrative footage (B-roll) you will use to cover the cuts and add depth. Paper edits are especially helpful in corporate or documentary projects, where much of the footage is interview-based. So what makes a good paper edit?

A good paper edit is accurate, specific, and well-structured. Getting to a good paper edit, especially with interviews, depends on good transcriptions of your interviews that you can quickly scan through and highlight.
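To make "accurate, specific, and well-structured" a little more concrete, here is a minimal Python sketch of a paper edit as structured data. Everything in it is an assumption for illustration: the `Selection` class, the clip names, the quotes, and the 25 fps non-drop-frame timecode are all hypothetical, not a format any particular NLE or service uses.

```python
from dataclasses import dataclass

FPS = 25  # assumed frame rate; match your own project settings


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


@dataclass
class Selection:
    """One line of a paper edit (hypothetical structure)."""
    clip: str        # source clip name
    tc_in: str       # timecode where the quote starts
    tc_out: str      # timecode where the quote ends
    quote: str       # the words you want, copied from the transcript
    broll: str = ""  # optional note on cutaway footage

    def duration_seconds(self, fps: int = FPS) -> float:
        frames = timecode_to_frames(self.tc_out, fps) - timecode_to_frames(self.tc_in, fps)
        return frames / fps


# Selections are listed in the order you want them in the cut.
paper_edit = [
    Selection("INT_01", "00:04:10:00", "00:04:22:12", "We started in a garage.", "archive photos"),
    Selection("INT_01", "00:17:03:05", "00:17:15:05", "Nobody believed it would work."),
]

total = sum(s.duration_seconds() for s in paper_edit)
print(f"{len(paper_edit)} selections, roughly {total:.1f} seconds of interview")
```

Keeping the in and out points next to each quote means you can total up the running time before you open the NLE, which is a quick sanity check on whether the structure will fit your target length.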

Historically, transcription was expensive, as you actually had to pay a typist to listen to the audio and manually write out everything, word by word. These days, computer-based speech-to-text services can provide highly accurate transcripts at up to ten times the speed of real time. This improvement in the technology, along with the falling costs of cloud-based services, means that now more than ever, transcription is readily available to everyone.

In this post, I’ve rounded up some of the best online transcription services, including a couple of free options.


Common Pitfalls of Using Transcriptions


Before we actually get into the different services and software you can use, it’s important to know what pitfalls can arise if you rely on transcriptions and paper edits too heavily.

Working from a paper edit means you’re less likely to have sat through the entire interview and absorbed everything. Instead, you are simply jumping from timecode to timecode, collecting clips, so you may not know the footage all that well. This saves time in the beginning, but in the long run, finding “that bit we really need” can mean wading through the transcripts again or starting over with the footage.

The biggest pitfall when relying on transcriptions is the two-fold problem of inaccurate transcription and text-based thinking. The downside of inaccuracy is obvious — when you get to that part in the video, the interviewee simply didn’t say what you see in the transcript. The downside to text-based thinking is that you’re editing the words in the way you want to hear them (in your head), which may not reflect how the interviewee said them.

This leads to another pitfall, common (I’ve found) among corporate clients who try to edit the text too tightly, stealing instances of “and” and “because” from random parts of a sentence and trying to “franken-bite” them together. Sometimes you can get away with this, especially if you can hide the composite phrase behind some cutaways, but most of the time, you’ll paper-edit yourself into a corner.


A Free Transcription Service for Video Editors


Let’s start with the free option to give you a taste of why digital services are superior to transcribing manually. 

The free option is … you. You listen to the audio and type out the transcript yourself. I used to do this often for interviews on my blog. This epic interview with editor Vashi Nedomansky, on how he helped the Deadpool editors prep to cut on Premiere Pro, literally took days to transcribe by hand. More recently, I got a chance to check out the beta of SpeedScriber from Digital Heaven, which saved me so much time transcribing this interview with Sven Pape from This Guy Edits that I could hardly believe it.

If you are going to transcribe an interview for a blog post, then timecode stamps aren’t all that important. If you need timecodes, then something like InqScribe ($99) might be a better option than the following VLC-based workflow.

When transcribing by hand, I used to use VLC Media Player for a few reasons:

  • It’s a free multi-platform (Windows, macOS, Android, iOS) media player that will handle almost any file format.
  • You can slow down the audio playback so that you can keep up as you type (Playback > Playback Speed Scrubber).
  • Keyboard shortcuts like “Step Backward” (Alt+Command+Left Arrow) make repeating a section quick and easy.

My suggestion for getting through an interview as quickly as possible is to slow the sound down by about 50 percent, type as fast as you can, and don’t stop to fix any errors or typos until you’re at the end of a paragraph. That way, you can type fast, loose, and messy, which will get you through big chunks rather than stopping every few seconds.

Be aware, however, that no matter how fast you type, it will still take you ages to get through something like an hour-long interview.


Using YouTube for Transcription and Translation

Another free method is to upload your video file to your (highly subscribed) YouTube channel and allow your viewers to transcribe and translate your video for you.

To do this, you’ll need to enable Community Contributions, from your YouTube Creator dashboard, which you can learn about here.

This will likely work for a final video, rather than a raw interview. (You’re unlikely to want to publish your raw material online anyway.) You can also now purchase closed captions and translations (beta) via the YouTube dashboard.

Subtitles and translations make your video accessible to a wider audience. Viewers can watch it on the train, late at night in bed, or in a different language.

It’s always worth translating your video Titles and Descriptions into a few popular languages so they are more searchable on YouTube. Just use Google Translate.


Alternatively, you can rely on YouTube’s automatic speech recognition algorithm to do all this work for you. The results vary quite wildly, depending on the complexity, quality, and content of the audio in your video. A video with lots of music and noise under the dialogue isn’t going to produce results as accurate as well-recorded, clean dialogue. Even then, the speech-to-text results can be embarrassingly inaccurate. But hey, it’s free!


To get YouTube to transcribe your video for you, turn on Subtitles/Closed Captions (CC) for the video you want it to process and set it to the language spoken in the video. The first time you click on the Subtitles/CC tab in the YouTube Creator dashboard, a popup window will ask you to set the video language. Setting the language will start auto transcription, if it’s available in your language.

You can, of course, edit and review these closed captions manually — or get your viewers to do it.


To download the finished transcript from YouTube, just click the “… More” button on the video and jump to Transcript. This will give you a timecoded, line-by-line transcript of the video.


SpeedScriber Automated Transcription


Of all the services in this post, Digital Heaven’s SpeedScriber is the one I have the most personal experience with, and I have to say it lives up to its strapline. It is fast, surprisingly accurate, and almost magical to use!

If you want a more detailed look at the software, check out my long preamble to using it here. For a forty-second understanding of what SpeedScriber does and how it works, check out the demo/preview below.

SpeedScriber is currently a Mac-only standalone app, into which you drag and drop your audio and video. The program compresses these to an optimal format and uploads them to SpeedScriber’s secure cloud servers, where the speedy transcription occurs. The finished transcription then downloads to your machine.

From there, you can edit the transcribed file for errors in wording, speaker assignment, or grammar. Once you’re happy with the text, you can export the file as an Avid Script, FCPX XML, PDF, Plain Text, or SubRip (.srt) file. Exports for Adobe Premiere are in development.
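Once you have a SubRip (.srt) export, the transcript is just timecoded plain text, so you can search it with a few lines of code while building a paper edit. The sketch below is a minimal, hedged example: the `parse_srt` and `find_phrase` helpers and the sample cues are hypothetical, and it assumes a well-formed file in the standard SubRip layout (index line, `start --> end` line, then text).

```python
import re

# SubRip timing line, e.g. "00:00:01,000 --> 00:00:04,200"
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(text):
    """Split SubRip text into cues with 'start', 'end', and 'text' keys."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = SRT_TIME.search(line)
            if match:
                cues.append({
                    "start": match.group(1),
                    "end": match.group(2),
                    # Join multi-line captions into a single string.
                    "text": " ".join(part.strip() for part in lines[i + 1:]),
                })
                break
    return cues


def find_phrase(cues, phrase):
    """Return every cue whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [cue for cue in cues if needle in cue["text"].lower()]


# Made-up sample data standing in for a real export.
sample = """1
00:00:01,000 --> 00:00:04,200
We started the company in a garage.

2
00:00:04,500 --> 00:00:08,000
Nobody believed
it would work.
"""

hits = find_phrase(parse_srt(sample), "garage")
for cue in hits:
    print(cue["start"], cue["text"])
```

This only matches text within a single cue, so a phrase that straddles two captions will be missed. For a real project you would read the file from disk and probably search on shorter keywords.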


Some of the other key features of SpeedScriber include the following:

  • Supports American, Australian, or British accents and detects different speakers.
  • Automatic timestamps.
  • Unique tools for quickly making corrections to transcripts.

The app runs on a pay-per-minute basis (USD $0.50/min), so you only pay for the length of the files you’re uploading. This does mean that you may want to pre-edit your files to save some money if there are any big gaps or pre-interview waffle that you don’t need, although this is true of any transcription service.

Having used the app on a number of different accents, I can say it handles various twangs and idiosyncrasies very well.

Sign up for the beta at SpeedScriber.com.

In this video, you can see a quick demo of using SpeedScriber with Final Cut Pro X. Since the app is still in beta, there have been quite a few advancements to the integration since the creation of this video.

In this similar demo, you can see how easy it is to bring SpeedScriber transcriptions into Avid Media Composer as scripts.


Nexidia: The Power Behind Avid Phrase Find


The first speech-to-text application that I ever came across was Avid Media Composer’s Phrase Find and ScriptSync functionality. Nexidia’s proprietary Dialogue Search software powered this feature under the hood. There was a time when the rights to that underlying software were no longer available in Media Composer, so this superior functionality disappeared. But now it’s back in Avid Media Composer 8.8 and better than ever in version 2.0.

In fact, since Avid has acquired the exclusive license to develop and commercialize the technology, you now have to buy all of Nexidia’s media products from Avid.

This article on TheBroadcastBridge.com sheds some light on the newly rebranded suite of applications, including the following:

  • Avid Dialogue Search: lets you find any media clip during which any combination of words or phrases occurs
  • Avid Illuminate: verifies captioning, video description, and multiple language support
  • Avid Comply: ensures that the right caption appears with the right media in the right language
  • Avid QC: verifies that captions, language, and video description meet regulatory requirements
  • Avid Align: automatically adjusts caption timing errors
  • Avid Search Grid: the underlying engine that runs all of these products

So, if you want to use Avid Dialogue Search (A.K.A. Phrase Find) in the standalone app alongside your NLE of choice, say, FCPX or Adobe Premiere Pro, then you’ll need to buy it directly from Avid.

There used to be a Premiere Pro panel extension for direct integration in the NLE, but this no longer seems to be available. Either way, you can export the timecodes and transcripts from Avid Dialogue Search to work with them in other apps.

Michael Kammes has put together a “5 Things” episode on using Nexidia’s products that will give you more details.

In this twelve-minute walkthrough, Frontline editor Steve Audette, ACE explores the new ScriptSync and Phrase Find, which are now back inside Avid in version 8.8 of Media Composer. Steve demonstrates how to customize the Script settings to your own preferences and workflow needs and how to use Phrase Find and a few other tricks. Steve also highlights a few bugs in the beta that he is working with.

What’s the difference between Phrase Find and ScriptSync?

Media Composer | PhraseFind option is a powerful phonetic indexing and search engine that automatically indexes your audio media so that you can search for material based on spoken words.

Media Composer | ScriptSync option automatically phonetically indexes your media and links your clips and takes to your imported script text.

How much does it cost? Adding Phrase Find 2.0 to your Media Composer setup costs $199; adding ScriptSync is an additional $499. If you bundle them together, you can save $100.


CastingWords.com: Human Transcription


I’d not heard of CastingWords.com before researching this post, but they are a pretty nifty-looking transcription service with a few new technological twists.

First of all, you can set up an account and share a Dropbox folder, and then CastingWords will automatically transcribe any file you drop into that folder. This seems like a big time saver if you frequently need transcription services. Additionally, since their pricing is per minute, you’ll also have a good sense of what each job might cost, depending on the turnaround time.

Compared to SpeedScriber’s 50 cents a minute for real-time or faster transcription, $2.50 a minute for a 24-hour turnaround seems a bit steep. But you will need to factor in the time it will take you to edit the SpeedScriber transcript too. Depending on the quality of the audio, that could be quick or it could take a long time.

An hour-long interview through CastingWords with a one-day turnaround will cost about $150, while something like SpeedScriber will cost $30.
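Because both services charge per minute, the comparison is simple arithmetic, and it is easy to rerun for your own footage. Here is a small Python sketch using the two rates quoted in this post; the `transcription_cost` function and the eight minutes of trimmed waffle are hypothetical, and real prices may have changed.

```python
def transcription_cost(minutes, rate_per_minute):
    """Cost in dollars for a file of the given length at a per-minute rate."""
    return round(minutes * rate_per_minute, 2)


# Per-minute rates as quoted in this post (USD).
RATES = {
    "CastingWords (one-day turnaround)": 2.50,
    "SpeedScriber": 0.50,
}

interview_minutes = 60
for service, rate in RATES.items():
    print(f"{service}: ${transcription_cost(interview_minutes, rate):.2f}")

# Trimming pre-interview chatter before uploading lowers the bill.
# The 8 minutes here is an assumed figure, purely for illustration.
trimmed_minutes = interview_minutes - 8
saving = (transcription_cost(interview_minutes, RATES["SpeedScriber"])
          - transcription_cost(trimmed_minutes, RATES["SpeedScriber"]))
print(f"Saved by trimming: ${saving:.2f}")
```

The dollar figures leave out your own time spent correcting an automated transcript, which is the harder cost to estimate.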

Another technological twist is that CastingWords uses crowdsourcing to do the work. Hopefully, their approval processes and quality control are rigorous enough to ensure good results every time.

What does it mean that you crowdsource transcription?

CastingWords uses crowdsourcing to produce transcripts. This means that work (transcription, grading, editing) is distributed to our workers over the Internet. Source material and intermediate work are posted temporarily on the web so workers can access it.

Rummaging a little further into the FAQ, you’ll find a few more things worth noting:

What is CastingWords’ standard transcription style?

Our standard transcription product is non-verbatim, i.e., we clean up the language, omitting “um,” “er,” “uh,” etc.; filler words and phrases such as “I mean,” “you know,” “like,” etc. (legitimate uses of such words are left in), false starts and redundancies (unless spoken with that intent). Quotations are an exception, they are transcribed verbatim.

We also may leave out conjunctions that are used to begin a sentence, e.g., “And,” “So,” etc., as they usually tend to be a distraction in written language. We never summarize or paraphrase. Our goal is a readable, well-written transcript.

What is a verbatim transcription?

Our verbatim transcription product retains every utterance, including redundancies; false starts; filler words like, “um,” “uh,” “er,” etc., and “I mean,” “you know”; all slang, e.g., “gonna,” “kinda,” “sorta,” “cuz,” (or “coz”), etc.

You may request a certain level of verbatim, by leaving special instruction in the notes box, e.g., “Frank stutters a bit – please remove stutters – but keep all the filler words, and make sure you catch Frank’s use of ‘gonna,’ instead of ‘going to’ and his frequent use of ‘like,’ as we want to retain the speakers’ character/jargon.”

Selecting the right service could be a huge benefit, depending on what you’re doing with the final transcript and the level of accuracy you need. The fact that you can leave special instructions for a live human transcriber is obviously a huge boon, compared to the computer-based services.


If you’re a regular user of CastingWords.com, hit the comments and let us know about your experience with the service.