Smarter Video OCR

September 22, 2017

The search for smarter video OCR started many years ago. But it all really started with text.

Search engines today have made a science of indexing text. Modern spiders find and record every last written word — and return results so efficiently that some efficiency experts are recommending people give up their email filing systems and web browser favorites bars and simply rely on search to turn up what they need.

But for most organizations, that depth of search capability is reserved for text alone. Video, in particular, remains a black box, limited to manually entered metadata like titles and tags.

Video OCR is a problem that needs to be solved

According to a study by McKinsey and IDC, the average knowledge worker now spends nearly 20% of their time — nearly one whole day, every week — just searching for the information they need to do their job effectively. As businesses share more and more using video, that wasted time will only worsen without a video search solution in place.

That’s why today, more and more video platforms are expanding their video search capabilities. Yet as the field of solutions expands, it’s becoming more difficult for organizations to navigate. Why? Because not all video search engines are created equal.

Forrester Research recently commended Panopto as having “the best support for video search.” It’s easy to see why — no one goes deeper or broader than Panopto when it comes to video search, as shown on the following chart.

If a video is worth recording and storing, it’s worth finding. You want video search capabilities that can rise to that task. Modern video platforms are now finding creative ways to index the content inside videos by capturing metadata, audio inputs, and visual content.

So what capabilities should a video search engine have?

Fundamentally, if a video search tool is going to index your videos, it should be able to find and return all the words spoken and shown on-screen.

While there are a number of technical strategies to get at this information, they tend to fall into two groups — automated or manual.

Automated video indexing through ASR and OCR

Automated video indexing relies on one or more intelligent video technologies to capture and discern what’s happening in your video. These automated tools can often be applied to a video the very instant recording is completed, expediting the process of indexing the content.

Common automated video indexing systems include automatic speech recognition (ASR), optical character recognition (OCR), and slide content ingestion. These three systems do very different things, so let’s look into each a little more closely.

Automatic Speech Recognition (ASR) is a technology used to identify each word that is spoken in a recording. Once identified, the words are time stamped and added to a search index. Users can then search for spoken words, find the precise moment in the video when they were mentioned, and fast-forward to that point in the video. Since many viewers will be searching for a moment based on an idea or phrase they remember, ASR is an incredibly helpful part of your video search engine.
Optical Character Recognition (OCR) is a technology used to recognize text shown on-screen within videos. In modern presentations, a speaker will often switch between slides, live on-screen content, and even other videos. Without OCR, any text shown as part of those presentations cannot be indexed, because search engines like Google cannot recognize text that’s saved as an image. OCR technology, however, is designed to identify and decipher those words, allowing your viewers to search for any word that appears on-screen anywhere in a video.
Slide Content Ingestion refers to the technology that imports and indexes your actual PowerPoint or Keynote presentation slides when used in your video. Content ingestion differs from OCR in that it programmatically extracts the actual text strings from your slides, rather than taking a picture of the slide and attempting to identify words. Slide ingestion also extracts additional information that isn’t shown on-screen, such as speaker’s notes, so that your team can always find precise moments in video based on any word contained on any slide.
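
To make the idea of a time-stamped search index concrete, here is a minimal sketch in Python. It assumes that ASR, OCR, and slide ingestion have already produced pieces of text with a start time in seconds; the class name VideoIndex, the source labels, and the sample video are hypothetical and only for illustration, not a description of how any particular product works.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class VideoIndex:
    """A toy inverted index: word -> list of (video_id, seconds, source)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, video_id, seconds, text, source):
        """Record every word in `text` as occurring at `seconds` in the video.

        `source` is a free-form label such as "asr", "ocr", or "slides".
        """
        for word in tokenize(text):
            self._postings[word].append((video_id, seconds, source))

    def search(self, word):
        """Return all moments where `word` was spoken or shown, in time order."""
        return sorted(self._postings.get(word.lower(), []))


# Hypothetical sample data standing in for real ASR, OCR, and slide output.
index = VideoIndex()
index.add("town-hall", 12.5, "Welcome to the quarterly revenue review", "asr")
index.add("town-hall", 95.0, "Revenue by region", "ocr")
index.add("town-hall", 95.0, "Speaker notes: mention revenue forecast", "slides")

for video_id, seconds, source in index.search("revenue"):
    print(f"{video_id} at {seconds}s (found via {source})")
```

Because each hit carries its own timestamp, a player could jump straight to that moment, whichever of the three systems found the word.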

Manual video indexing

Manual video indexing, on the other hand, relies on human intervention that takes place after a video is completed in order to help index video content.

The usefulness of manual indexing processes varies based on the amount of information they can add. Some processes are quite comprehensive; others are much more limited. Let’s take a look at the two most common manual inputs:

Manual Metadata refers to the information added onto a video file, such as title, author, and a description. Viewer notes and comments may also be added here. These are a fundamental part of video search, but for business videos, which often last 30-60 minutes or more and cover a range of topics, manual metadata almost never provides enough description to be useful by itself.
Transcripts are a more comprehensive approach, done by simply appending an actual video transcript to your video files for indexing. Transcript production is an evolving field — while many services still produce these files manually, the process can be automated. However you develop it, the quality of your input is essential — complete transcripts will be more valuable than partial transcripts, and those transcripts that also include notes about the content shown on-screen will be more valuable than those that only recite the dialog.
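
As a rough illustration of how a transcript might be prepared for indexing, the Python sketch below parses a plain-text transcript into time-stamped entries. The "HH:MM:SS text" line format, the bracketed on-screen note, and the function name parse_transcript are assumptions made for this example; real transcript files come in many formats.

```python
import re

# Assumed line format: "HH:MM:SS some spoken or descriptive text"
LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(.*)$")


def parse_transcript(text):
    """Turn transcript text into a list of (start_seconds, text) entries.

    Lines that do not match the assumed format are skipped.
    """
    entries = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        hours, minutes, seconds, words = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        entries.append((start, words))
    return entries


# Hypothetical transcript that also notes what is shown on-screen.
sample = """00:00:05 Welcome to the quarterly review.
00:01:30 [Slide: Revenue by region] Revenue grew in every region.
"""

entries = parse_transcript(sample)
for start, words in entries:
    print(start, words)
```

Each entry pairs a start time with its text, so the words can be added to a search index alongside any automatically generated results.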

Which is better for video search: automatic or manual indexing?

The choice really depends on your needs. Automatic indexing systems that rely on technology offer faster results and can often be applied to every video, but the accuracy isn’t 100% with ASR and OCR. Manual, human-based approaches such as transcription typically offer improved accuracy but take longer to produce and often come at an added cost.

Fortunately, you don’t have to choose with Panopto.

Panopto’s Smart Search video search technology is the industry’s most comprehensive inside-video search engine. With Panopto, you can search through your video library the same way you’d search across the internet, or through your email:

By any keyword spoken in your videos, with ASR,
By any word that ever appears on-screen or anywhere else in your video, with OCR and Slide Content Ingestion,
By traditional and advanced metadata, including tags and titles, viewer notes and comments,
And optionally by complete manual transcriptions of your video content.

Try our video search engine for yourself!

Ready to see what your video search has been missing? Contact our team today to schedule a demo.