Hardware-Accelerated Video Transcoding

What exactly is Transcoding?

Transcoding is the technique of transforming one media encoding format into another.

This is typically done for compatibility purposes, such as when a media source provides a format that the intended target is not able to process; an in-between adaptation step is required:

Decode media from its originally encoded state into raw, uncompressed information.
Encode the raw data back, using a different codec that is supported by the end user.

DeFFcode's FFdecoder API, used together with VidGear's WriteGear API, can pass almost any FFmpeg parameter through to the underlying pipeline while still letting us process video frames in real time. Both APIs can use GPU-backed hardware acceleration, with FFdecoder API driving a hardware decoder and WriteGear API driving a hardware encoder, which can substantially improve transcoding performance. In addition, hardware-decoded frames from FFdecoder API are fully compatible with OpenCV's VideoWriter API for producing output video in real time.

Limitation: Bottleneck in Hardware-Accelerated Video Transcoding performance with Real-time Frame processing

As we know, using the -hwaccel cuda -hwaccel_output_format cuda flags in an FFmpeg pipeline keeps video frames in GPU memory. This eliminates memory transfers between system memory and video memory, so transcoding runs at the highest performance the available GPU hardware allows.
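For reference, a transcode that never leaves GPU memory can be sketched as a plain FFmpeg command. The helper below only assembles the argument list and does not run anything; the function name and file names are hypothetical, and it assumes that the h264_nvenc encoder is available in your FFmpeg build.

```python
import shlex


def build_gpu_transcode_cmd(src, dst, encoder="h264_nvenc"):
    """Assemble an FFmpeg command that decodes and encodes on the GPU."""
    return [
        "ffmpeg",
        "-y",                              # overwrite the output without asking
        "-hwaccel", "cuda",                # decode with the CUDA hwaccel
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", src,
        "-c:v", encoder,                   # encode with the hardware encoder
        dst,
    ]


# Hypothetical file names, used for illustration only
cmd = build_gpu_transcode_cmd("foo.mp4", "output_foo.mp4")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine that meets the assumptions listed for the recipes on this page.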

HW Acceleration: General Memory Flow with Hardware Acceleration

Unfortunately, to process real-time frames in a Python script with the FFdecoder and WriteGear APIs, we have to sacrifice this performance gain by explicitly copying raw decoded frames between system and GPU memory over the PCIe bus. These copies add transfer latency and consume PCIe bandwidth, and because the frames are uncompressed, they can quickly saturate the bus.
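To get a feel for how much data these copies involve, the raw data rate of an uncompressed stream can be estimated with simple arithmetic. This is a rough sketch: the function name is hypothetical, and the figures ignore bus protocol overhead and row padding.

```python
def raw_stream_bytes_per_second(width, height, fps, bytes_per_pixel):
    """Data rate of an uncompressed video stream, in bytes per second."""
    return width * height * bytes_per_pixel * fps


# 1080p at 60 fps: BGR uses 3 bytes per pixel, NV12 uses 1.5 bytes per pixel
bgr_1080p60 = raw_stream_bytes_per_second(1920, 1080, 60, 3)
nv12_1080p60 = raw_stream_bytes_per_second(1920, 1080, 60, 1.5)

print(f"BGR : {bgr_1080p60 / 1e6:.1f} MB/s")   # BGR : 373.2 MB/s
print(f"NV12: {nv12_1080p60 / 1e6:.1f} MB/s")  # NV12: 186.6 MB/s
```

Each direction of the round trip (download after decoding, upload before encoding) carries this load separately, which is one reason for downloading frames as NV12 rather than as a 3-byte-per-pixel format.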

HW Acceleration Limitation: Memory Flow with Hardware Acceleration and Real-time Processing

On the bright side, GPU-based hardware encoding and decoding is inherently faster and more efficient than software encoding and decoding, which is CPU intensive. It uses few CPU resources while frames stay in GPU memory, freeing up the CPU for other tasks. Scaling, de-interlacing, filtering, and similar tasks also run faster and more efficiently with hardware decoders/encoders than with software ones.

As you can see, the pros outweigh the cons: you get to process video frames in real time with a speed and flexibility that is hard to achieve otherwise.

We'll briefly discuss Hardware-Accelerated Video Transcoding with these APIs in the following recipes:

DeFFcode APIs require FFmpeg executable

DeFFcode APIs MUST have a valid FFmpeg executable for all of their core functionality, and any failure in detection will raise RuntimeError immediately. Follow the dedicated FFmpeg Installation doc ➶ for its installation.
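As a quick pre-check before running the recipes, you can look for an FFmpeg executable on your PATH from Python. This is only a sketch using the standard library: the helper name is hypothetical, and DeFFcode's own detection logic may also look in other locations, so treat the result as a hint rather than a guarantee.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the absolute path of the executable found on PATH, or None."""
    return shutil.which(executable)


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("FFmpeg was not found on PATH; install it before using DeFFcode APIs.")
else:
    print(f"FFmpeg found at: {ffmpeg_path}")
```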

Additional Python Dependencies for the following recipes

The following recipes require additional Python dependencies, which can be installed as shown below:

OpenCV: OpenCV is required for previewing video frames. You can easily install it directly via pip:
```
pip install opencv-python
```

OpenCV installation from source

You can also follow online tutorials for building and installing OpenCV manually from source on Windows, Linux, macOS, and Raspberry Pi machines.

Make sure not to install both the pip and source versions together. Otherwise the installation will fail to work!

Other OpenCV binaries

OpenCV maintainers also provide additional binaries via pip: opencv-contrib-python contains both the main modules and the contrib/extra modules, while opencv-python-headless and opencv-contrib-python-headless target server (headless) environments. You can install any one of them in a similar manner. More information can be found here.
VidGear: VidGear is required for lossless encoding of video frames into file/stream. You can easily install it directly via pip:
```
pip install vidgear[core]       
```

Always use FFdecoder API's terminate() method at the end to avoid undesired behavior.

CUDA-accelerated Video Transcoding with OpenCV's VideoWriter API

Example Assumptions

Please note that the following recipe explicitly assumes:

You're running a Linux operating system with a supported NVIDIA GPU.

You're using FFmpeg 4.4 or newer, configured with at least --enable-nonfree --enable-cuda-nvcc --enable-libnpp --enable-cuvid --enable-nvenc configuration flags during compilation. For compilation follow these instructions ➶

Verifying NVDEC/CUDA support in FFmpeg

To use the CUDA Video-decoder (cuda), check whether your FFmpeg was compiled with it by executing the following commands in your terminal and confirming that the output contains something similar to the following:

$ ffmpeg  -hide_banner -pix_fmts | grep cuda
..H.. cuda                   0              0      0

$ ffmpeg  -hide_banner -filters | egrep "cuda|npp"
... bilateral_cuda    V->V       GPU accelerated bilateral filter
... chromakey_cuda    V->V       GPU accelerated chromakey filter
... colorspace_cuda   V->V       CUDA accelerated video color converter
... hwupload_cuda     V->V       Upload a system memory frame to a CUDA device.
... overlay_cuda      VV->V      Overlay one video on top of another using CUDA
... scale_cuda        V->V       GPU accelerated video resizer
... scale_npp         V->V       NVIDIA Performance Primitives video scaling and format conversion
... scale2ref_npp     VV->VV     NVIDIA Performance Primitives video scaling and format conversion to the given reference.
... sharpen_npp       V->V       NVIDIA Performance Primitives video sharpening filter.
... thumbnail_cuda    V->V       Select the most representative frame in a given sequence of consecutive frames.
... transpose_npp     V->V       NVIDIA Performance Primitives video transpose
T.. yadif_cuda        V->V       Deinterlace CUDA frames

Verifying H.264 NVENC encoder support in FFmpeg

To use the NVENC Video-encoder (cuda), check whether your FFmpeg was compiled with H.264 NVENC encoder support. You can do this by executing the following one-liner in your terminal and confirming that the output contains something similar to the following:

$ ffmpeg  -hide_banner -encoders | grep nvenc 

V....D av1_nvenc            NVIDIA NVENC av1 encoder (codec av1)
V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
V....D hevc_nvenc           NVIDIA NVENC hevc encoder (codec hevc)

You can also use other NVENC encoders in a similar way, if supported.

You already have appropriate Nvidia video drivers and related software installed on your machine.
If the stream is not decodable in hardware (for example, it is an unsupported codec or profile) then it will still be decoded in software automatically, but hardware filters won't be applicable.

These assumptions MAY/MAY NOT suit your current setup. Kindly use suitable parameters based on your system platform and hardware settings only.
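The encoder check above can also be scripted. The sketch below parses the text printed by ffmpeg -encoders and reports whether a given encoder is listed. The helper names are hypothetical, and the parser assumes the row layout shown in the sample output above (a six-character flags column followed by the encoder name).

```python
import subprocess


def parse_encoder_names(encoders_output):
    """Collect encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: "V....D h264_nvenc  NVIDIA NVENC H.264 encoder ..."
        # Legend rows such as "V..... = Video" are skipped.
        if (
            len(parts) >= 2
            and len(parts[0]) == 6
            and parts[0][0] in "VAS"
            and parts[1] != "="
        ):
            names.add(parts[1])
    return names


def ffmpeg_encoder_names(ffmpeg="ffmpeg"):
    """Run FFmpeg and return the encoder names it reports (requires FFmpeg)."""
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_encoder_names(result.stdout)


# Sample rows copied from the output shown above
sample = """
 V....D av1_nvenc            NVIDIA NVENC av1 encoder (codec av1)
 V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
 V....D hevc_nvenc           NVIDIA NVENC hevc encoder (codec hevc)
"""
print("h264_nvenc" in parse_encoder_names(sample))  # True
```

On a machine with FFmpeg installed, "h264_nvenc" in ffmpeg_encoder_names() performs the same check against the real build.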

In this example, we will be:

Using Nvidia's CUDA Internal hwaccel Video decoder (cuda) in FFdecoder API to automatically detect the best NV-accelerated video codec and keep video frames in GPU memory (for applying hardware filters), achieving GPU-accelerated decoding of a given video file (say foo.mp4).
Scaling and Cropping decoded frames in GPU memory.
Downloading decoded frames into system memory as patched NV12 frames.
Converting NV12 frames into BGR pixel-format using OpenCV's cvtColor method.
Encoding BGR frames with OpenCV's VideoWriter API.

You can use FFdecoder's metadata property object, which dumps the source video's metadata information as a JSON string, to retrieve the source framerate.
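As a minimal sketch of reading that JSON string, the snippet below parses a hand-written stand-in for decoder.metadata. The sample values are hypothetical, and the real output contains many more fields; the two keys shown are the ones the recipes on this page read.

```python
import json

# Hypothetical stand-in for `decoder.metadata`; real output has more fields
sample_metadata = '{"output_framerate": 29.97, "output_frames_resolution": [640, 360]}'

# Convert the JSON string into a regular Python dict
metadata_dict = json.loads(sample_metadata)

# Pull out the values needed to configure a writer
framerate = float(metadata_dict["output_framerate"])
width, height = metadata_dict["output_frames_resolution"]

print(framerate, (width, height))
```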

With FFdecoder API, frames extracted with YUV pixel formats (yuv420p, yuv444p, nv12, nv21 etc.) are generally incompatible with OpenCV APIs such as imshow(). But you can easily make them compatible by using the exclusive -enforce_cv_patch boolean attribute of its ffparams dictionary parameter.
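For context, OpenCV's cvtColor with COLOR_YUV2BGR_NV12 expects a single-channel array of shape (height * 3 / 2, width): the full-resolution luma plane followed by the interleaved, half-height chroma plane. The sketch below computes that layout for a given resolution. The helper name is hypothetical, and it is an assumption here that the patched frames arrive in exactly this shape.

```python
def nv12_layout(width, height):
    """Describe the memory layout of one NV12 frame of the given resolution."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height (2x2 chroma subsampling)")
    luma_bytes = width * height     # one Y sample per pixel
    chroma_bytes = luma_bytes // 2  # interleaved U/V, subsampled 2x2
    return {
        "shape": (height * 3 // 2, width),  # 2D array shape OpenCV expects
        "luma_bytes": luma_bytes,
        "chroma_bytes": chroma_bytes,
        "total_bytes": luma_bytes + chroma_bytes,
    }


layout = nv12_layout(640, 360)
print(layout["shape"], layout["total_bytes"])  # (540, 640) 345600
```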

More information on Nvidia's NVENC Encoder can be found here ➶

# import the necessary packages
from deffcode import FFdecoder
import json
import cv2

# define suitable FFmpeg parameter
ffparams = {
    "-vcodec": None,  # skip source decoder and let FFmpeg choose
    "-enforce_cv_patch": True,  # enable OpenCV patch for YUV(NV12) frames
    "-ffprefixes": [
        "-vsync",
        "0",  # prevent duplicate frames
        "-hwaccel",
        "cuda",  # accelerator
        "-hwaccel_output_format",
        "cuda",  # output accelerator
    ],
    "-custom_resolution": "null",  # discard source `-custom_resolution`
    "-framerate": "null",  # discard source `-framerate`
    "-vf": "scale_cuda=640:360," # scale to 640x360 in GPU memory
    + "crop=80:60:200:100," # crop a 80×60 section from position (200, 100) in GPU memory
    + "hwdownload,"  # download hardware frames to system memory
    + "format=nv12",  # convert downloaded frames to NV12 pixel format
}

# initialize and formulate the decoder with `foo.mp4` source
decoder = FFdecoder(
    "foo.mp4",
    frame_format="null",  # discard source frame pixel format
    verbose = False, # to avoid too much clutter
    **ffparams # apply various params and custom filters
).formulate()

# retrieve JSON Metadata and convert it to dict
metadata_dict = json.loads(decoder.metadata)

# prepare OpenCV parameters
FOURCC = cv2.VideoWriter_fourcc("M", "J", "P", "G")
FRAMERATE = metadata_dict["output_framerate"]
FRAMESIZE = tuple(metadata_dict["output_frames_resolution"])

# Define writer with parameters and suitable output filename for e.g. `output_foo.avi`
writer = cv2.VideoWriter("output_foo.avi", FOURCC, FRAMERATE, FRAMESIZE)

# grab the NV12 frames from the decoder
for frame in decoder.generateFrame():

    # check if frame is None
    if frame is None:
        break

    # convert it to `BGR` pixel format,
    # since write() method only accepts `BGR` frames
    frame = cv2.cvtColor(frame, cv2.COLOR_YUV2BGR_NV12)

    # {do something with the BGR frame here}

    # writing BGR frame to writer
    writer.write(frame)

# close output window
cv2.destroyAllWindows()

# terminate the decoder
decoder.terminate()

# safely close writer (cv2.VideoWriter exposes release(), not close())
writer.release()

CUDA-NVENC-accelerated Video Transcoding with WriteGear API

WriteGear's Compression Mode support for FFdecoder API is currently in beta so you can expect much higher than usual CPU utilization!

Lossless transcoding with FFdecoder and WriteGear API

VidGear's WriteGear API implements a complete, flexible, and robust wrapper around FFmpeg in compression mode for encoding real-time video frames to a lossless compressed multimedia output file(s)/stream(s).

DeFFcode's FFdecoder API in conjunction with WriteGear API creates a high-level, high-performance Lossless FFmpeg Transcoding (Decoding + Encoding) Pipeline that can pass almost any FFmpeg parameter through to the underlying pipeline while allowing us to manipulate the real-time video frames with immense flexibility.

Example Assumptions

Please note that the following recipe explicitly assumes:

You're running a Linux operating system with a supported NVIDIA GPU.

You're using FFmpeg 4.4 or newer, configured with at least --enable-nonfree --enable-cuda-nvcc --enable-libnpp --enable-cuvid --enable-nvenc configuration flags during compilation. For compilation follow these instructions ➶

Verifying NVDEC/CUDA support in FFmpeg

To use the CUDA Video-decoder (cuda), check whether your FFmpeg was compiled with it by executing the following commands in your terminal and confirming that the output contains something similar to the following:

$ ffmpeg  -hide_banner -pix_fmts | grep cuda
..H.. cuda                   0              0      0

$ ffmpeg  -hide_banner -filters | egrep "cuda|npp"
... bilateral_cuda    V->V       GPU accelerated bilateral filter
... chromakey_cuda    V->V       GPU accelerated chromakey filter
... colorspace_cuda   V->V       CUDA accelerated video color converter
... hwupload_cuda     V->V       Upload a system memory frame to a CUDA device.
... overlay_cuda      VV->V      Overlay one video on top of another using CUDA
... scale_cuda        V->V       GPU accelerated video resizer
... scale_npp         V->V       NVIDIA Performance Primitives video scaling and format conversion
... scale2ref_npp     VV->VV     NVIDIA Performance Primitives video scaling and format conversion to the given reference.
... sharpen_npp       V->V       NVIDIA Performance Primitives video sharpening filter.
... thumbnail_cuda    V->V       Select the most representative frame in a given sequence of consecutive frames.
... transpose_npp     V->V       NVIDIA Performance Primitives video transpose
T.. yadif_cuda        V->V       Deinterlace CUDA frames

Verifying H.264 NVENC encoder support in FFmpeg

To use the NVENC Video-encoder (cuda), check whether your FFmpeg was compiled with H.264 NVENC encoder support. You can do this by executing the following one-liner in your terminal and confirming that the output contains something similar to the following:

$ ffmpeg  -hide_banner -encoders | grep nvenc 

V....D av1_nvenc            NVIDIA NVENC av1 encoder (codec av1)
V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
V....D hevc_nvenc           NVIDIA NVENC hevc encoder (codec hevc)

You can also use other NVENC encoders in a similar way, if supported.

You already have appropriate Nvidia video drivers and related software installed on your machine.
If the stream is not decodable in hardware (for example, it is an unsupported codec or profile) then it will still be decoded in software automatically, but hardware filters won't be applicable.

These assumptions MAY/MAY NOT suit your current setup. Kindly use suitable parameters based on your system platform and hardware settings only.

Additional Parameters in WriteGear API

WriteGear API only requires a valid Output filename (e.g. output_foo.mp4) as input, but you can easily control any output specifications (such as bitrate, codec, framerate, resolution, subtitles, etc.) supported by FFmpeg (in use).
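As an illustration of controlling output specifications, the sketch below builds a parameter dictionary of the kind passed to WriteGear in the recipes on this page. The helper name is hypothetical, and the -b:v and -preset values are only examples: whether a given flag or value is accepted depends on your FFmpeg build and encoder.

```python
def make_output_params(framerate, vcodec="h264_nvenc", bitrate=None, extra=None):
    """Build a dict of FFmpeg output parameters for WriteGear's compression mode."""
    params = {
        "-input_framerate": framerate,  # framerate of the frames being written
        "-vcodec": vcodec,              # video encoder to use
    }
    if bitrate is not None:
        params["-b:v"] = bitrate        # target video bitrate, e.g. "4M"
    if extra:
        params.update(extra)            # any other FFmpeg flags, passed as-is
    return params


# Example values only; adjust them to your own source and hardware
output_params = make_output_params(30.0, bitrate="4M", extra={"-preset": "p4"})
print(output_params)

# The dict would then be unpacked into the writer, for example:
# writer = WriteGear(output="output_foo.mp4", logging=True, **output_params)
```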

You can use FFdecoder's metadata property object, which dumps the source video's metadata information as a JSON string, to retrieve the source framerate.

Consuming BGR frames

In this example, we will be:

Using Nvidia's CUDA Internal hwaccel Video decoder (cuda) in FFdecoder API to automatically detect the best NV-accelerated video codec and keep video frames in GPU memory (for applying hardware filters), achieving GPU-accelerated decoding of a given video file (say foo.mp4).
Scaling and Cropping decoded frames in GPU memory.
Downloading decoded frames into system memory as patched NV12 frames.
Converting patched NV12 frames into BGR pixel-format using OpenCV's cvtColor method.
Encoding BGR frames with WriteGear API using Nvidia's Hardware accelerated H.264 NVENC Video-encoder(h264_nvenc) into lossless video file in the GPU memory.

# import the necessary packages
from deffcode import FFdecoder
from vidgear.gears import WriteGear
import json
import cv2

# define suitable FFmpeg parameter
ffparams = {
    "-vcodec": None,  # skip source decoder and let FFmpeg choose
    "-enforce_cv_patch": True,  # enable OpenCV patch for YUV(NV12) frames
    "-ffprefixes": [
        "-vsync",
        "0",  # prevent duplicate frames
        "-hwaccel",
        "cuda",  # accelerator
        "-hwaccel_output_format",
        "cuda",  # output accelerator
    ],
    "-custom_resolution": "null",  # discard source `-custom_resolution`
    "-framerate": "null",  # discard source `-framerate`
    "-vf": "scale_cuda=640:360,"  # scale to 640x360 in GPU memory
    + "crop=80:60:200:100," # crop a 80×60 section from position (200, 100) in GPU memory
    + "hwdownload,"  # download hardware frames to system memory
    + "format=nv12",  # convert downloaded frames to NV12 pixel format
}

# initialize and formulate the decoder with `foo.mp4` source
decoder = FFdecoder(
    "foo.mp4",
    frame_format="null",  # discard source frame pixel format
    verbose = False, # to avoid too much clutter
    **ffparams # apply various params and custom filters
).formulate()

# retrieve framerate from JSON Metadata and pass it as
# `-input_framerate` parameter for controlled framerate
output_params = {
    "-input_framerate": json.loads(decoder.metadata)["output_framerate"],
    "-vcodec": "h264_nvenc", # H.264 NVENC Video-encoder

}

# Define writer with default parameters and suitable
# output filename for e.g. `output_foo.mp4`
writer = WriteGear(output="output_foo.mp4", logging=True, **output_params)

# grab the NV12 frames from the decoder
for frame in decoder.generateFrame():

    # check if frame is None
    if frame is None:
        break

    # convert it to `BGR` pixel format
    frame = cv2.cvtColor(frame, cv2.COLOR_YUV2BGR_NV12)

    # {do something with the BGR frame here}

    # writing BGR frame to writer
    writer.write(frame)

# close output window
cv2.destroyAllWindows()

# terminate the decoder
decoder.terminate()

# safely close writer
writer.close()

Consuming NV12 frames

In this example, we will be:

Using Nvidia's CUDA Internal hwaccel Video decoder (cuda) in FFdecoder API to automatically detect the best NV-accelerated video codec and keep video frames in GPU memory (for applying hardware filters), achieving GPU-accelerated decoding of a given video file (say foo.mp4).
Scaling and Cropping decoded frames in GPU memory.
Downloading decoded frames into system memory as NV12 frames.
Encoding NV12 frames directly with WriteGear API using Nvidia's Hardware accelerated H.264 NVENC Video-encoder(h264_nvenc) into lossless video file in the GPU memory.

# import the necessary packages
from deffcode import FFdecoder
from vidgear.gears import WriteGear
import json
import cv2

# define suitable FFmpeg parameter
ffparams = {
    "-vcodec": None,  # skip source decoder and let FFmpeg choose
    "-ffprefixes": [
        "-vsync",
        "0",  # prevent duplicate frames
        "-hwaccel",
        "cuda",  # accelerator
        "-hwaccel_output_format",
        "cuda",  # output accelerator
    ],
    "-custom_resolution": "null",  # discard source `-custom_resolution`
    "-framerate": "null",  # discard source `-framerate`
    "-vf": "scale_cuda=640:360,"  # scale to 640x360 in GPU memory
    + "crop=80:60:200:100,"  # crop a 80×60 section from position (200, 100) in GPU memory
    + "hwdownload,"  # download hardware frames to system memory
    + "format=nv12",  # convert downloaded frames to NV12 pixel format
}

# initialize and formulate the decoder with `foo.mp4` source
decoder = FFdecoder(
    "foo.mp4",
    frame_format="null",  # discard source frame pixel format
    verbose = False, # to avoid too much clutter
    **ffparams # apply various params and custom filters
).formulate()

# retrieve framerate from JSON Metadata and pass it as
# `-input_framerate` parameter for controlled framerate
output_params = {
    "-input_framerate": json.loads(decoder.metadata)["output_framerate"],
    "-vcodec": "h264_nvenc", # H.264 NVENC Video-encoder
    "-input_pixfmt": "nv12", # input frames pixel format as `NV12`
}

# Define writer with default parameters and suitable
# output filename for e.g. `output_foo.mp4`
writer = WriteGear(output="output_foo.mp4", logging=True, **output_params)

# grab the NV12 frames from the decoder
for frame in decoder.generateFrame():

    # check if frame is None
    if frame is None:
        break

    # {do something with the NV12 frame here}

    # writing NV12 frame to writer
    writer.write(frame)

# close output window
cv2.destroyAllWindows()

# terminate the decoder
decoder.terminate()

# safely close writer
writer.close()

CUDA-NVENC-accelerated End-to-end Lossless Video Transcoding with WriteGear API

DeFFcode's FFdecoder API in conjunction with VidGear's WriteGear API creates a High-performance Lossless FFmpeg Transcoding Pipeline