When clipping a video, mind the GOP and hope the I-frame is an IDR-frame

Nicolás Dato

A picture is worth a thousand words, but when it is a B-frame it may be worth two hundred words.

One of our products ,ViDeus Auditor, lets you clip and join videos, showing a preview before doing the actual clipping. For doing it, we have to understand how an encoded video is composed. We usually work with H264.

When encoding each video picture you can get a I-frame, a P-frame or a B-frame.

The I-frame is the easy one, all the information for decoding the picture is within the I-frame.
The P-frame is a frame which needs previous decoded pictures for being decoded. So it uses information from old pictures.
And the B-frame needs decoded pictures from the past and from the future. So it uses information from old and future pictures.
For example, you can get something like this:

I B B P B B P B B P

The I-frame can be decoded instantaneously, then the second frame (B-frame) needs information from previous frames (the I-frame for example) and from future frames (like the P-frame).

As the B-frames may need information from the following frames, the stream is rearranged for decoding, in a way such as when a B-frame is being decoded, everything needed is there. So usually the frames are transmitted like this:

I P B B P B B P B B

This results in having a decoding time-stamp (DTS) less than the presentation time-stamp (PTS) in the rearranged frames.

That series of frames can be a GOP, a Group of Picture, a video is composed by a series of GOPs, each GOP starting with an I-frame, this would be three GOPs:

I B B P B B P B B P I B B P B B P B B P I B B P B B P B B P

As the I-frame doesn’t need any other information for decoding, that’s a good point for fast-clipping a video because all the information for decoding is within it; clipping a video in the middle of a GOP (when it’s not an I-frame) will most likely result in a corrupt output for a while until a new full GOP is decoded.

IDR-frame

But, clipping a video at a GOP start will not always result in a clean output.

The I-frame at the beginning will certainly be decoded fine, it doesn’t need anything special. However, the following B-frames and P-frames will probably need previous frames for being decoded correctly. Sometimes those needed frames are within the GOP which it is usefull, but sometimes they are outside the GOP which is bad for clipping, because it means they reference pictures which are before the I-frame where we cut the video, resulting in a corrupt output.

When frames from a GOP reference frames from another GOP it’s called Open GOP. If not, it is called a Closed GOP.

Hopefully, the video was encoded with IDR-frames. Those are a special case of I-frames. Apart from being an I-frame the IDR-frame ensures the following frames will not reference any frame before the IDR.
In a GOP the IDR-frame replace the I-frame, all IDR-frames are I-frames but not all I-frames are IDR-frames.

So, if an IDR is found that’s a good place for clipping, because that frame will be decoded without any other information and all the following frames will not require information from before the IDR-frame.

Next time you want to clip a video, mind the GOP and find an IDR-frame.