Technical notes: mixing speaker and slides recording with FFmpeg

Usual disclaimer: "technical notes" posts are probably of zero interest to the blog followers and are just meant for Google. If they annoy, tell me and I'll get a wiki or something.

In a past life I wrote FFmpeg filters, which has the interesting side effect of making you think of the FFmpeg filtergraph as sane. Colleagues who detect that won't fail to take advantage, so I ended up tasked with crafting an unholy command line to mix the CloudFlare London tech talk videos.

The inputs are pretty common:

  • a camera video of the speaker waving their hands
  • a Keynote recording of the slides
  • a nice background

The desired output is both streams scaled and placed on top of the background in opposite corners, with audio from one of them, DEFCON style.

Color background

Here's a first iteration with a black background:

ffmpeg -i slides.mp4 -i speaker.mp4 -filter_complex "  
color=size=hd1080:c=black [background];  
[0:v] setpts=PTS-STARTPTS, scale=w=960:h=-1 [slides];
[1:v] setpts=PTS-STARTPTS, scale=w=1240:h=-1 [speaker];
[background][speaker] overlay=shortest=1:x=main_w-overlay_w:y=main_h-overlay_h [background+speaker];
[background+speaker][slides] overlay=shortest=1 [mix]
" -map "[mix]" -map "1:a" video.mp4

Let's break it down a bit. There are three parts to the command: inputs, graph and outputs.

-i slides.mp4 -i speaker.mp4 are the inputs. Nice and easy. The ordering is important: inputs are numbered from zero, so we will refer to slides.mp4 as [0] and speaker.mp4 as [1] (their video streams are [0:v] and [1:v]).

The -filter_complex argument is the graph. It is composed of sources and filters. Each line, separated from the next by ;, is a filterchain: it takes zero or more labeled input streams, passes them through one or more sources or filters separated by commas, and labels one (or more) output streams.

In this graph we first use a color source to generate a [background] stream of the right color and size to work on. Then we take the video streams [0:v] and [1:v], sync them, and scale them to the final size we want while keeping the proportions (h=-1), generating the [slides] and [speaker] streams.

A note about that setpts filter: streams can have timestamps that say for example that the first frame of the video is meant to show at second 5. This is often the case when you previously cut the video, and since overlay respects that, one stream would start after the other. We fix that by passing the stream through a filter that sets each timestamp to "timestamp minus timestamp of the first frame" (PTS-STARTPTS).
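To check whether a file has such an offset, you can ask ffprobe for the start time of its first video stream. A small sketch, assuming a POSIX shell; probe_start_time is a hypothetical helper name:

```shell
# Hypothetical helper: print the start_time (in seconds) of the first
# video stream of a file. A clearly non-zero value, like 5.000000, means
# that stream would start late in the mix without setpts=PTS-STARTPTS.
probe_start_time() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=start_time \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

Run it as probe_start_time speaker.mp4 on each input before mixing.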

Finally we use the overlay filter twice. overlay slaps the second input stream on top of the first at the specified position. The first overlay places [speaker] at the bottom right corner of [background] using the parameters x=main_w-overlay_w:y=main_h-overlay_h, and the second places [slides] at the top left, which is overlay's default position. shortest=1 makes the output terminate as soon as the shortest input terminates, which matters because the color source never ends on its own. We call the final video result [mix].

Last part, the outputs. Again, argument order is everything: video.mp4 will contain the streams selected by the -map options that precede it, so [mix] for video and 1:a for audio.
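If you mix talks regularly, you might want to wrap the color background command in a shell function. Here's a minimal sketch, assuming a POSIX shell; mix_graph and mix_talk are hypothetical names, and the output size and the two widths are parameters defaulting to the values above. mix_graph only prints the filtergraph, so you can inspect it before handing it to ffmpeg.

```shell
# Hypothetical helper: print the color-background filtergraph.
# Optional arguments: output size, slides width, speaker width.
mix_graph() {
  size=${1:-hd1080}
  slides_w=${2:-960}
  speaker_w=${3:-1240}
  # printf reuses its format for every argument, so each of these
  # chains gets the "; " that separates it from the next one.
  printf '%s; ' \
    "color=size=${size}:c=black [background]" \
    "[0:v] setpts=PTS-STARTPTS, scale=w=${slides_w}:h=-1 [slides]" \
    "[1:v] setpts=PTS-STARTPTS, scale=w=${speaker_w}:h=-1 [speaker]" \
    "[background][speaker] overlay=shortest=1:x=main_w-overlay_w:y=main_h-overlay_h [background+speaker]"
  # The last chain takes no trailing separator.
  printf '%s\n' "[background+speaker][slides] overlay=shortest=1 [mix]"
}

# Hypothetical wrapper: mix_talk SLIDES SPEAKER OUTPUT
mix_talk() {
  ffmpeg -i "$1" -i "$2" -filter_complex "$(mix_graph)" \
    -map "[mix]" -map "1:a" "$3"
}
```

Usage: mix_talk slides.mp4 speaker.mp4 video.mp4, or mix_graph hd720 640 826 to just print a graph for a smaller canvas.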

Picture background

Here's how you can use a picture instead of a black background:

ffmpeg -i slides.mp4 -i speaker.mp4 -i background.jpg -filter_complex "  
[2:v] scale=s=hd1080, loop=loop=-1:size=1 [background];
[0:v] setpts=PTS-STARTPTS, scale=w=960:h=-1 [slides];
[1:v] setpts=PTS-STARTPTS, scale=w=1240:h=-1 [speaker];
[background][speaker] overlay=shortest=1:x=main_w-overlay_w:y=main_h-overlay_h [background+speaker];
[background+speaker][slides] overlay=shortest=1 [mix]
" -map "[mix]" -map "1:a" video.mp4

Note that you need to compile FFmpeg from master (brew install --HEAD ffmpeg) for that to work, or you will see a No such filter: 'loop' error. Apparently that's what "general users" are supposed to do anyway:

23:58:35 #ffmpeg <llogan> FiloSottile: your ffmpeg is too old. the filter is newer than the 3.0 branch.  
23:59:08 #ffmpeg <llogan> general users are recommended to use a build from git master instead of releases which are mainly for distributors  

If you are stuck with a release without the loop filter, here's a workaround: use the image2 demuxer's loop option, -loop 1 -i background.jpg, and drop the loop filter, like [2:v] scale=s=hd1080 [background];. Be advised, though, that it will be about twice as slow, since this way the background gets scaled again for each and every frame.
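If you want a script that works on both kinds of builds, here's a sketch that checks whether the installed ffmpeg lists a loop filter and picks the matching [background] chain. has_loop_filter and background_chain are hypothetical names, and the detection assumes the filter name is the second column of the ffmpeg -filters listing.

```shell
# Hypothetical helper: succeed if this ffmpeg build lists a filter
# named exactly "loop" (so "aloop" does not count).
has_loop_filter() {
  ffmpeg -hide_banner -filters 2>/dev/null |
    awk '$2 == "loop" { found = 1 } END { exit !found }'
}

# Hypothetical helper: print the [background] chain for input 2.
# Pass "yes" when the loop filter is available; otherwise the picture
# must be looped on input with -loop 1, and the chain only scales.
background_chain() {
  if [ "$1" = yes ]; then
    echo "[2:v] scale=s=hd1080, loop=loop=-1:size=1 [background]"
  else
    echo "[2:v] scale=s=hd1080 [background]"
  fi
}
```

For example: if has_loop_filter; then chain=$(background_chain yes); else chain=$(background_chain no); fi, remembering to put -loop 1 before -i background.jpg in the second case.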

Customization

The filtergraph doesn't need any change to handle different aspect ratios, but here are some things you might want to tweak.

Output size

The main stream is the background, so just change its size by editing size=hd1080 (color background) or scale=s=hd1080 (picture background) in the [background] stream definition. You can use a format like 1920x1080 instead of hd1080.

The two overlaid videos will stick to their corners.
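They keep their size, though: the widths 960 and 1240 are absolute pixels, so on a smaller canvas they take up proportionally more room. A sketch of a helper that rescales them from the 1920-pixel-wide layout used here, assuming a POSIX shell; scale_width is a hypothetical name, and shell integer arithmetic rounds down.

```shell
# Hypothetical helper: rescale an overlay width designed for a
# 1920-pixel-wide canvas to a canvas of width $2, rounding down.
scale_width() {
  echo $(( $1 * $2 / 1920 ))
}
```

For hd720 (1280x720), scale_width 960 1280 prints 640 and scale_width 1240 1280 prints 826, which you would then use in the scale=w= parameters.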

Positioning

You can change the two videos' positions by messing with the overlay filter parameters. x and y are the position of the top-left corner of the overlaid video, relative to the top-left corner of the whole canvas. There are a bunch of parameters you can use in those expressions (see the overlay filter documentation); good luck.
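As a starting point, here's a sketch of a helper that prints the x/y options pinning an overlaid video to a given corner, with an optional margin in pixels. corner_xy is a hypothetical name and a POSIX shell is assumed; the expressions only use main_w, main_h, overlay_w and overlay_h, like the command above.

```shell
# Hypothetical helper: print overlay x/y options for a corner of the
# canvas, keeping an optional margin (default 0) from its edges.
corner_xy() {
  m=${2:-0}
  case $1 in
    top-left)     echo "x=${m}:y=${m}" ;;
    top-right)    echo "x=main_w-overlay_w-${m}:y=${m}" ;;
    bottom-left)  echo "x=${m}:y=main_h-overlay_h-${m}" ;;
    bottom-right) echo "x=main_w-overlay_w-${m}:y=main_h-overlay_h-${m}" ;;
    *) echo "unknown corner: $1" >&2; return 1 ;;
  esac
}
```

For example, overlay=shortest=1:$(corner_xy bottom-right 40) keeps the speaker 40 pixels away from the bottom and right edges.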

To change the videos' size instead, change the scale parameters of the [slides] or [speaker] stream definitions. If you expressed the overlay position in terms of main_w, main_h, overlay_w and overlay_h, that won't mess with the alignment.

Output options

You can use all your usual output arguments by sticking them just before the target filename, like -c:a libfdk_aac video.mp4 to set the audio encoder.

Audio source

If you want the audio from the first video instead of the second, just change the second -map to -map "0:a".

You can also easily add an mp3 as an additional -i after all the others and take the audio from there with -map "3:a" (or -map "2:a" with the black background version, which has one input fewer).

Input options

I'm not sure which input options you might need, but they go right before the -i of the file they apply to:

ffmpeg [slides.mp4 options] -i slides.mp4 [speaker.mp4 options] -i speaker.mp4

Syncing

You can adjust the start time of one stream or the other by messing with its setpts expression. TB is the stream timebase, so 20/TB is 20 seconds expressed in timestamp units: for example, to cut the first 20 seconds do setpts=PTS-STARTPTS-20/TB.
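Keep in mind that setpts only touches the video: the audio mapped with -map "1:a" goes straight from the input to the output, so shifting the stream that carries the audio makes the two drift apart. Here's a sketch that cuts the speaker's video and audio together with the trim and atrim filters (a swapped-in technique, not the setpts trick above), assuming a POSIX shell; speaker_cut_chains is a hypothetical helper and the 1240 width comes from the command above.

```shell
# Hypothetical helper: print the chains that drop the first $1 seconds
# of input 1 (the speaker), video and audio alike, so they stay in sync.
# They replace the [1:v] chain, and "[audio]" replaces 1:a in -map.
speaker_cut_chains() {
  echo "[1:v] trim=start=$1, setpts=PTS-STARTPTS, scale=w=1240:h=-1 [speaker];"
  echo "[1:a] atrim=start=$1, asetpts=PTS-STARTPTS [audio];"
}
```

Paste the output of speaker_cut_chains 20 into the graph in place of the [1:v] line and use -map "[audio]" instead of -map "1:a". Alternatively, -ss 20 placed right before -i speaker.mp4 makes ffmpeg seek that input, which should cut its video and audio at once.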

Crazy cool effects

If you want to do cooler stuff, like cutting a stream or applying some slick perspective, you're on your own, but it's probably just a matter of picking a filter and chaining it to one of the [slides] or [speaker] stream definitions.

Have fun, and maybe follow me on Twitter for completely unrelated material. (I swear.)