Realtime GPU processing software (Windows, Linux, ARM) for machine vision camera applications. Performance benchmarks and Glass-to-Glass time measurements.
Camera sample application with realtime GPU image processing performance (Windows, Linux, Jetson)
The software is based on the following image processing pipeline for camera applications:
Processing is done on an NVIDIA GPU to speed up performance. The software can also work with raw Bayer images in PGM format, which you can use for testing if you don't have a camera or if your camera is not supported. More info about the project can be found here.
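For a quick smoke test without a camera, you can generate a minimal PGM file yourself (a sketch only; real raw Bayer frames from a camera will have the sensor's actual resolution and usually 8/12/16-bit depth):

```shell
# Write a tiny 4x4 8-bit binary PGM ("P5" format) as a stand-in for a
# raw Bayer frame. The header is: magic number, width height, max value.
printf 'P5\n4 4\n255\n' > test.pgm
# Append 4x4 = 16 bytes of dummy pixel data.
head -c 16 /dev/zero >> test.pgm
```

A raw Bayer mosaic is a single-channel image, which is exactly what the grayscale P5 format stores; demosaicing into color happens later in the GPU pipeline.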
Benchmarks on an NVIDIA Quadro RTX 6000 or GeForce RTX 4090 show that GPU-based raw image processing is very fast and can deliver high image quality at the same time. Total performance can reach 4 GPix/s for color cameras. Performance strongly depends on the complexity of the pipeline. Multi-GPU solutions can improve performance significantly.
Currently the software works with XIMEA cameras via the XIMEA SDK, FLIR cameras via the Spinnaker SDK, Imperx cameras via the Imperx SDK, and LUCID Vision Labs cameras via the Arena SDK.
Via GenICam the software can work with XIMEA, MATRIX VISION, Basler, FLIR, Imperx, JAI, LUCID Vision Labs, Daheng Imaging cameras.
The software also works with Leopard Imaging MIPI CSI cameras on Jetson. You need a proper driver to acquire raw frames from the MIPI camera for further image processing on the GPU with 16/32-bit precision. The software does not use the NVIDIA ISP via libargus.
Support for Emergent Vision Technologies, IDS Imaging Development Systems, Baumer, and Kaya Instruments cameras is coming soon. You can also add support for a desired camera yourself. The software works with the demo version of Fastvideo SDK, which is why you see a watermark on the screen. To get a Fastvideo SDK license for development and for deployment, please contact the Fastvideo company.
sudo apt-get install qtbase5-dev qtbase5-dev-tools qtcreator git
Jetson users have to build FFmpeg libraries from sources. See this shell script for details.
sudo apt-get install libavutil-dev libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavresample-dev libx264-dev
sudo apt-get install libjpeg-dev zlib1g-dev
git clone https://github.com/fastvideo/gpu-camera-sample.git
You can also download precompiled libs.
Hereafter we assume you put the source code into your home directory, so the project root is ~/gpu-camera-sample
chmod 755 ~/gpu-camera-sample/Scripts/make_links.sh
cd ~/gpu-camera-sample/Scripts
./make_links.sh ~/gpu-camera-sample/OtherLibsLinux/FastvideoSDK/fastvideo_sdk/lib/Linux64
For the x64 platform:
sudo ./Arena_SDK_Linux_x64.conf
For the arm64 platform:
sudo ./Arena_SDK_ARM64.conf
Run
./IpConfigUtility /list
If everything is OK, you will see something like this:
Scanning for devices...
index MAC IP SUBNET GATEWAY IP CONFIG
0 1C0FAF5A908A 169.254.139.144 255.255.0.0 0.0.0.0 DHCP= 1 Persistent Ip= 0 LLA = 1
To test that the camera is working, run
./Cpp_Acquisition
NVIDIA Jetson provides many features related to power management, thermal management, and electrical management. These features deliver the best possible user experience given the constraints of a particular platform.
The nvpmodel utility is used to change the power mode. The mode with maximum power consumption is MAXN. To activate this mode, call
sudo /usr/sbin/nvpmodel -m 0
You also have to call the jetson_clocks script to maximize Jetson performance by setting the static maximum frequencies of the CPU, GPU, and EMC clocks. You can also use the script to show the current clock settings, store them to a file, and restore them from a file.
sudo /usr/bin/jetson_clocks
NVIDIA Jetson TX2 has two CPU core types: Denver2 and A57. While benchmarking Fastvideo SDK we found that the J2K encoder and decoder perform better on the A57 cores. An affinity mask has to be assigned so the process runs only on A57 cores. The Linux taskset command assigns process affinity:
taskset -c 3,4,5 myprogram
TX2 has the following core numbers: 0 – A57; 1, 2 – Denver2; 3, 4, 5 – A57. Core 0 is used by Linux for interrupt processing, so we do not recommend including it in the user affinity mask.
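You can verify that the affinity mask took effect from inside the launched process (a sketch; on a desktop Linux machine substitute core numbers that exist on your system):

```shell
# Start a shell pinned to cores 3,4,5 (the TX2 A57 cores) and print the
# affinity mask that the child process actually received.
taskset -c 3,4,5 bash -c 'taskset -cp $$'
# On TX2 this prints a line like: pid <N>'s current affinity list: 3-5
```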
To measure system latency, we've implemented G2G (glass-to-glass) tests in the gpu-camera-sample application.
We have the following choices for G2G tests:
We can also measure the latency for the case when we stream compressed data from one PC to another over network. Latency depends on camera frame rate, monitor fps, NVIDIA GPU performance, network bandwidth, complexity of image processing pipeline, etc.
gpu-camera-sample is a multithreaded application. It consists of the following threads:
Here we've implemented the simplest approach for a camera application: the camera driver writes raw data to a memory ring buffer, and we copy data from that ring buffer to the GPU for processing. The full image processing pipeline runs on the GPU, so we just need to collect the processed frames at the output.
In general case, Fastvideo SDK can import/export data from/to SSD / CPU memory / GPU memory. This is done to ensure compatibility with third-party libraries on CPU and GPU. You can get more info at Fastvideo SDK Manual.
We also recommend checking the PCI-Express bandwidth for Host-to-Device and Device-to-Host transfers. For a GPU with Gen3 x16 it should be in the range of 10-12 GB/s, and for a GPU with Gen4 x16 in the range of 20-24 GB/s. GPU memory size can be a bottleneck for image processing from high-resolution cameras, so please check GPU memory usage in the software.
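Both checks can be done with standard NVIDIA tools (a sketch; the bandwidthTest path varies between CUDA versions, and on some installs you have to build it from the CUDA samples first):

```shell
# Measure Host-to-Device / Device-to-Host PCIe bandwidth with the CUDA
# bandwidthTest sample (path typical for a Linux CUDA toolkit install).
/usr/local/cuda/extras/demo_suite/bandwidthTest --memory=pinned

# Watch GPU memory usage (refreshed every second) while the app runs.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```

Pinned (page-locked) host memory is what the application itself uses for transfers, so `--memory=pinned` gives the relevant number.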
If you are working with images that reside on an HDD, please move them to an SSD or an M.2 drive.
For testing purposes you can utilize the latest NVIDIA GeForce RTX 2060/2070/2080ti, 3070/3080ti/3090, 4080/4090 or Jetson TX2, NX and AGX Xavier / Orin.
For continuous high performance applications we recommend professional NVIDIA Quadro RTX Ada GPUs.
To run the software in multi-camera setups, we recommend running one process per camera. If you have enough GPU memory and the processing performance is sufficient, this is the simplest solution, and it has been tested in many applications. It is also a good choice for Linux solutions; please don't forget to turn on CUDA MPS.
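Starting the CUDA MPS control daemon before launching the per-camera processes looks roughly like this (a sketch; the directory paths are conventional choices, not fixed):

```shell
# Choose pipe/log directories for the MPS daemon (any writable paths work).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"

# Start the MPS control daemon; per-camera processes launched with the
# same CUDA_MPS_PIPE_DIRECTORY will then share the GPU through MPS.
nvidia-cuda-mps-control -d

# To stop the daemon later:
# echo quit | nvidia-cuda-mps-control
```

With MPS, kernels from the separate per-camera processes can overlap on the GPU instead of being time-sliced, which helps keep latency stable.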
You can also create a software module to collect frames from different cameras and process them in the same pipeline with the gpu-camera-sample application. In that case you need less GPU memory, which can be important for embedded solutions.
Please bear in mind that this is just a sample application. It is intended to show how machine vision cameras can work with Fastvideo SDK to get high-performance image processing on the NVIDIA GPU.
To test a real application with XIMEA cameras (USB3 or PCIe), please have a look at the following page and download FastVCR software. That software with GenICam (GenTL) support will be released soon.