Caption and transcript standards


  • Introduction
  • Transcripts
  • Speech
  • Voice-overs
  • Written content
  • Music and songs
  • Sound effects
  • Social media
  • Example videos


Introduction

This page offers guidelines for captions and subtitles in University of St Andrews video content. Captions and transcripts must meet the accessibility guidelines.

Who benefits from captions?

Captions provide content to people who are deaf and others who cannot hear the audio. They are also used by people who process written information better than audio, and by people who simply prefer them.

What are captions?

Captions are a text version of the speech and non-speech audio information needed to understand the content. They are displayed within the media player and are synchronized with the audio.

There are two types of captions:

  • Closed captions can be turned on or off.
  • Open captions are always displayed and cannot be turned off.

Closed captions are the University's preference, as they can easily be switched to alternative languages. For best practice:

  • Captions must be an exact transcription of the dialogue.
  • They should be as well timed as possible.
  • Additional labels that convey important audio context and sound effects, such as an alarm going off or music, must be included.
  • When using captions, try to support as many major languages as possible accurately. English (UK) captions are essential.


Transcripts

Transcripts are the text-only version of the speech and non-speech audio from a video. Captions are usually created from transcripts. All video and audio content on the University website must also have a written transcript provided alongside it as an alternative format.

Transcripts are used by people who are deaf, are hard of hearing, have difficulty processing auditory information, and others.

Transcripts work best with videos that are less than an hour long with good sound quality and clear speech. The transcript file should be in the same language as the dialogue in the video.

The rest of this page details the formatting, presentation, and context required when writing captions and transcripts.


Speech

  • Identify the name of the person speaking on screen by prefixing their name with '>>' (two greater-than sign characters).
  • The speaker's name must be in all uppercase.
  • To force the start of a new caption, use a line break.
  • Each sentence should be a new caption.

>> JENNY: Hi, my name is Jenny, and this is John.

>> JOHN: We are the owners of JJ's Cupcakes.

>> JENNY: Today we'll be teaching you how to make
our famous chocolate chip cookies!

They really are world famous you know!

If the name of a speaker is unknown, prefix the new speaker's words with a white dash (–), not a hyphen.

>> JOHN: What do you think of our cupcakes?

– I think they are wonderful. Best cupcakes I've ever tasted!
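The speaker rules above can be sketched in a few lines of Python. This is only an illustration, not a University tool: the function name speaker_caption is hypothetical, and the use of an en dash for unknown speakers is an assumption taken from the example above.

```python
def speaker_caption(text, name=None):
    """Format one caption with a speaker prefix (hypothetical helper).

    A known speaker becomes '>> NAME: text', with the name in upper case.
    An unknown speaker is marked with a dash rather than a hyphen.
    """
    text = text.strip()
    if name:
        return f">> {name.strip().upper()}: {text}"
    # Assumption: the dash is an en dash (U+2013), as in the example above.
    return f"\u2013 {text}"


print(speaker_caption("What do you think of our cupcakes?", "John"))
print(speaker_caption("I think they are wonderful."))
```

Passing a name in any case produces the upper-case label, so the rule cannot be missed by accident when captions are prepared in bulk.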


Voice-overs

When a speaker or narrator is not visually shown in the video, use single quote marks.

Put a single quote mark at the beginning of each new voice-over subtitle, but do not close the single quotes at the end of each subtitle or segment. Only close them when the person has finished speaking, as is the case with paragraphs in a book.

'If you're thinking of applying to St Andrews, you definitely should.

'It's so rewarding. There are so many things to do here.

'Although it's a small town, the opportunities are completely endless.'
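As a rough sketch of the quoting rule above, the following Python function opens every voice-over subtitle with a single quote and closes only the last one. The function name voice_over_captions is hypothetical and is used here purely for illustration.

```python
def voice_over_captions(subtitles):
    """Apply voice-over quoting to a list of subtitles (hypothetical helper).

    Every subtitle opens with a single quote mark; only the final
    subtitle also closes it, as with paragraphs in a book.
    """
    captions = ["'" + subtitle.strip() for subtitle in subtitles]
    if captions:
        captions[-1] += "'"
    return captions


for caption in voice_over_captions([
    "If you're thinking of applying to St Andrews, you definitely should.",
    "It's so rewarding. There are so many things to do here.",
    "Although it's a small town, the opportunities are completely endless.",
]):
    print(caption)
```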

Mechanical speech

Double quotes can suggest mechanically reproduced speech, e.g. radio, loudspeakers etc.

"The Bell Pettigrew Museum is now closed."

Written content

All written information in a video must appear in the transcript. This includes titles and communications. This should be written as it appears in the video.

If written content is essential to the context of the video, it should also be included in the captions.

Music and songs

Describe incidental music

If the music is incidental music (not part of the action) and well known or identifiable, the label begins 'MUSIC:' followed by the name or title of the music (music titles should be fully researched).

'MUSIC' is in caps (to indicate a label), but the words following it are in sentence case, as these labels are often fairly long, and a large amount of text in upper case is hard to read.

MUSIC: "Hey There Delilah" by Plain White T's

MUSIC: "Flower of Scotland"

MUSIC: The Swedish National Anthem

If the title of the music is unknown, wrap the word 'MUSIC' in square brackets: [MUSIC]
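The music label rules above could be sketched in Python as follows. The function name music_label and its parameters are hypothetical. Putting the title in double quotes follows the first two examples above; a descriptive name such as the national anthem example would need handling by hand.

```python
def music_label(title=None, artist=None):
    """Build a label for incidental music (hypothetical helper).

    Known music: 'MUSIC:' in capitals, then the title as written.
    Unknown title: the word MUSIC wrapped in square brackets.
    """
    if not title:
        return "[MUSIC]"
    # Assumption: titles are shown in double quotes, as in the examples above.
    label = f'MUSIC: "{title.strip()}"'
    if artist:
        label += f" by {artist.strip()}"
    return label


print(music_label("Hey There Delilah", "Plain White T's"))
print(music_label("Flower of Scotland"))
print(music_label())
```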


Label mood music only when required

If the music is "incidental music" but is an unknown piece, written purely to add atmosphere or dramatic effect, do not label it.

If the music is not part of the action but is crucial for the viewer’s understanding of the plot, a sound-effect label should be used, wrapped in square brackets.


Indicate song lyrics with #

Song lyrics are almost always subtitled - whether they are part of the action or not. Every song subtitle starts with a hash mark (#) and the final song subtitle has a hash mark at the start and the end:

# Is this the real life? Is this just fantasy? #

There are three exceptions:

  • In cases where you consider the visual information on the screen to be more important than the song lyrics, leave the screen free of subtitles.
  • Where snippets of a song are interspersed with any kind of speech, and it would be confusing to subtitle both the lyrics and the speech, it is better to put up a music label and to leave the lyrics unsubtitled.
  • If there is speech over a song, then prioritise the speech over the music lyrics.
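The hash-mark convention for song lyrics described above can be sketched in Python. The function name song_captions is hypothetical, and the sketch does not attempt to handle the exceptions listed above, which need editorial judgement.

```python
def song_captions(lyric_lines):
    """Mark song lyric subtitles with hash marks (hypothetical helper).

    Every subtitle starts with '# '; the final subtitle also ends
    with ' #' to show that the song has finished.
    """
    captions = [f"# {line.strip()}" for line in lyric_lines]
    if captions:
        captions[-1] += " #"
    return captions


for caption in song_captions(["Is this the real life? Is this just fantasy?"]):
    print(caption)
```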

Sound effects

Sound effect labels describe sounds, not actions. Sound effects should be typed in full caps and wrapped in square brackets.
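A minimal Python sketch of the sound effect formatting rule follows. The function name sound_effect_label is hypothetical, and the sample description is invented for illustration. The code can only apply the capitals and brackets; choosing words that describe the sound rather than the action is still up to the writer.

```python
def sound_effect_label(description):
    """Format a sound effect label (hypothetical helper).

    The description is converted to full capitals and wrapped
    in square brackets.
    """
    return f"[{description.strip().upper()}]"


# Illustrative description only, not taken from the guidelines.
print(sound_effect_label("alarm rings"))
```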




Subtitles on social media

Here are guides on how to add subtitles and closed captions into social media posts.

Example video

Here is an example video that follows the caption standards correctly. A transcript is included below the video.

An introduction to the support provided by Student Services.