The acquisition and processing of a video stream can be very computationally expensive. Typical image processing applications split the work across multiple threads, one acquiring the images, and another one running the actual algorithms. In MATLAB we can get multi-threading by interfacing with other languages, but there is a significant cost associated with exchanging data across the resulting language barrier. In this blog post, we compare different approaches for getting data through MATLAB’s Java interface, and we show how to acquire high-resolution video streams in real-time and with low overhead.
For our booth at ICRA 2014, we put together a demo system in MATLAB that used stereo vision for tracking colored bean bags, and a robot arm to pick them up. We used two IP cameras that streamed H.264 video over RTSP. While developing the image processing and robot control parts went as expected, acquiring images from both video streams fast enough to be useful proved to be a challenge.
Since we did not want to switch to another language, we decided to develop a small library for acquiring video streams. The project was later open sourced as HebiCam.
In order to save bandwidth, most IP cameras compress video before sending it over the network. Since the resulting decoding step can be computationally expensive, it is common practice to move the acquisition to a separate thread in order to reduce the load on the main processing thread.
Unfortunately, doing this in MATLAB requires some workarounds due to the language’s single-threaded nature, i.e., background threads need to run in another language. Out of the box, there are two supported interfaces: MEX for calling C/C++ code, and the Java Interface for calling Java code.
While both interfaces have strengths and weaknesses, practically all use cases can be solved using either one. For this project, we chose the Java interface in order to simplify cross-platform development and the deployment of binaries. The diagram below shows an overview of the resulting system.
Starting background threads and getting the video stream into Java was relatively straightforward. We used the JavaCV library, which is a Java wrapper around OpenCV and FFmpeg that includes pre-compiled native binaries for all major platforms. However, passing the acquired image data from Java into MATLAB turned out to be more challenging.
The Java interface automatically converts between Java and MATLAB types by following a set of rules. This makes it much simpler to develop for than the MEX interface, but it does cause additional overhead when calling Java functions. Most of the time this overhead is negligible. However, for certain types of data, such as large and multi-dimensional matrices, the default rules are very inefficient and can become prohibitively expensive. For example, a 1080x1920x3 MATLAB image matrix gets translated to a byte[1080][1920][3] array in Java, which means that there is a separate array object for every single pixel in the image.
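To make that cost concrete, the following Java sketch counts the array objects behind a nested three-dimensional byte array and compares them with the payload a single flat byte[] would hold. The class and method names are made up for this illustration.

```java
// Illustration only: class and method names are hypothetical.
final class NestedArrayCost {

    /** Number of array objects behind a byte[height][width][channels] array. */
    static long nestedArrayObjects(int height, int width) {
        // 1 outer array + one byte[][] per row + one byte[] per pixel
        return 1L + height + (long) height * width;
    }

    /** Payload in bytes, which is all a single flat byte[] has to hold. */
    static long payloadBytes(int height, int width, int channels) {
        return (long) height * width * channels;
    }

    public static void main(String[] args) {
        System.out.println(nestedArrayObjects(1080, 1920));
        System.out.println(payloadBytes(1080, 1920, 3));
    }
}
```

For a 1080p color image this comes to roughly two million tiny arrays that each carry only three bytes of pixel data, so the per-object bookkeeping (whose exact size depends on the JVM) can easily outweigh the payload.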
As an additional complication, MATLAB stores image data in a different memory layout than most other libraries (e.g. OpenCV’s Mat or Java’s BufferedImage). While pixels are commonly stored in row-major order ([height][width][channels]), MATLAB stores images transposed and in column-major order ([channels][width][height]). For example, if the Red-Green-Blue pixels of a BufferedImage are laid out as [RGB][RGB][RGB]…, the same image is laid out as [RRR…][GGG…][BBB…] in MATLAB. Depending on the resolution, this conversion can become fairly expensive.
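The difference between the two layouts can be written as two offset formulas. The sketch below is illustrative only (the class and method names are hypothetical) and assumes zero-based row, column, and channel indices.

```java
// Illustration only: class and method names are hypothetical.
final class PixelIndex {

    /** Offset in an interleaved buffer stored row by row, with the channels of each pixel adjacent. */
    static int interleaved(int row, int col, int ch, int width, int channels) {
        return (row * width + col) * channels + ch;
    }

    /** Offset in MATLAB's column-major height x width x channels matrix. */
    static int matlab(int row, int col, int ch, int height, int width) {
        return row + col * height + ch * height * width;
    }
}
```

Moving one pixel to the right advances the interleaved offset by the number of channels, but advances the MATLAB offset by the image height. Every pixel therefore has to be moved individually when converting between the two.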
In order to process images at a frame rate of 30 fps in real-time, the total time budget of the main MATLAB thread is about 33 ms per cycle. Thus, the acquisition overhead imposed on the main thread needs to be low, ideally only a few milliseconds, to leave enough time for the actual processing.
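As a back-of-the-envelope check, the per-frame budget and the raw data rate follow directly from the frame rate and resolution. The helper names below are hypothetical.

```java
// Illustration only: class and method names are hypothetical.
final class FrameBudget {

    /** Time available per frame in milliseconds. */
    static double budgetMillis(double framesPerSecond) {
        return 1000.0 / framesPerSecond;
    }

    /** Uncompressed bytes that have to cross the language barrier each second. */
    static long bytesPerSecond(int width, int height, int channels, int framesPerSecond) {
        return (long) width * height * channels * framesPerSecond;
    }
}
```

At 30 fps the budget is about 33.3 ms per frame, and a single 1080p color stream amounts to 186,624,000 uncompressed bytes per second.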
We benchmarked five different ways to get image data from Java into MATLAB and compared their respective overhead on the main MATLAB thread. We omitted overhead incurred by background threads because it had no effect on the time budget available for image processing.
The full benchmark code is available here.
1. Default 3D Array
By default MATLAB image matrices convert to byte[height][width][channels] Java arrays. However, when converting back to MATLAB there are some additional problems:
- byte gets converted to int8 instead of uint8, resulting in an invalid image matrix
- changing the type back to uint8 is somewhat messy because the uint8(matrix) cast sets all negative values to zero, and the alternative typecast(matrix, 'uint8') only works on vectors
Thus, converting the data to a valid image matrix still requires several operations.
% (1) Get matrix from byte[height][width][channels]
data = getRawFormat3d(this.javaConverter);
[height,width,channels] = size(data);
% (2) Reshape matrix to vector
vector = reshape(data, width * height * channels, 1);
% (3) Cast int8 data to uint8
vector = typecast(vector, 'uint8');
% (4) Reshape vector back to original shape
image = reshape(vector, height, width, channels);
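The signedness problem originates on the Java side, where byte is always a signed 8-bit type. The following sketch (with hypothetical names) contrasts reinterpreting the bits, which is what typecast does, with a saturating conversion, which is what the uint8(matrix) cast does.

```java
// Illustration only: class and method names are hypothetical.
final class SignedBytes {

    /** Reinterprets the bits of a signed Java byte as an unsigned value (0..255). */
    static int reinterpret(byte value) {
        return value & 0xFF;
    }

    /** Saturating conversion to an unsigned range: negative values are clamped to zero. */
    static int saturate(byte value) {
        return Math.max(value, 0);
    }

    public static void main(String[] args) {
        byte bright = (byte) 200; // stored as -56 in a signed byte
        System.out.println(bright + " " + reinterpret(bright) + " " + saturate(bright));
    }
}
```

A pixel intensity of 200 is stored as -56, so a saturating cast turns every value above 127 into zero, while reinterpreting the bits recovers the original 200.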
2. Compressed 1D Array
A common approach to move image data across distributed components (e.g. ROS) is to encode the individual images using MJPEG compression. Doing this within a single process is obviously wasteful, but we included it because it is common practice in many distributed systems. Since MATLAB did not offer a way to decompress JPEG images in memory, we needed to save the compressed data to a file located on a RAM disk.
% (1) Get compressed data from byte[]
data = getJpegData(this.javaConverter);
% (2) Save as jpeg file
fileID = fopen('tmp.jpg','w+');
fwrite(fileID, data, 'int8');
fclose(fileID);
% (3) Read jpeg file
image = imread('tmp.jpg');
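On the Java side, producing the compressed bytes could look roughly like the sketch below. It uses the standard ImageIO API and is an assumption about how such a helper might be written, not the benchmark's actual code. The class name is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Illustration only: the class name is hypothetical.
final class JpegEncoder {

    /** Encodes an image as JPEG and returns the compressed bytes. */
    static byte[] encode(BufferedImage image) {
        ImageIO.setUseCache(false); // keep intermediate data in memory instead of temp files
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "jpg", out)) {
                throw new IllegalArgumentException("no JPEG writer for this image type");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The returned byte[] crosses into MATLAB as a single int8 vector, which is cheap to transfer. The cost of this approach lies in encoding, the file round trip, and decoding.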
3. Java Layout as 1D Pixel Array
Another approach is to copy the pixel array of Java’s BufferedImage and to reshape the memory using MATLAB. This is also the accepted answer for How can I convert a Java Image object to a MATLAB image matrix?.
% (1) Get data from byte[] and cast to correct type
data = getJavaPixelFormat1d(this.javaConverter);
data = typecast(data, 'uint8');
[h,w,c] = size(this.matlabImage); % get dim info
% (2) Reshape matrix for indexing
pixelsData = reshape(data, 3, w, h);
% (3) Transpose and convert from row major to col major format (RGB case)
image = cat(3, ...
    transpose(reshape(pixelsData(3, :, :), w, h)), ...
    transpose(reshape(pixelsData(2, :, :), w, h)), ...
    transpose(reshape(pixelsData(1, :, :), w, h)));
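A minimal sketch of the Java side, assuming the frames are held in a BufferedImage of type TYPE_3BYTE_BGR. The class name is hypothetical, and the actual converter may be implemented differently.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;

// Illustration only: the class name is hypothetical.
final class BackingArray {

    /** Returns a copy of the interleaved B,G,R bytes that back the image. */
    static byte[] copyPixels(BufferedImage image) {
        if (image.getType() != BufferedImage.TYPE_3BYTE_BGR) {
            throw new IllegalArgumentException("expected TYPE_3BYTE_BGR");
        }
        // The raster of this image type is backed by a single contiguous byte[]
        byte[] backing = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        return backing.clone();
    }
}
```

The bytes of each pixel appear in blue, green, red order, which is why the MATLAB code above picks channel 3 first when assembling the RGB image.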
4. MATLAB Layout as 1D Pixel Array
The fourth approach also copies a single pixel array, but this time the pixels are already stored in the MATLAB convention.
% (1) Get data from byte[] and cast to correct type
data = getMatlabPixelFormat1d(this.javaConverter);
[h,w,c] = size(this.matlabImage); % get dim info
vector = typecast(data, 'uint8');
% (2) Interpret pre-laid out memory as matrix
image = reshape(vector,h,w,c);
Note that the most efficient way we found for converting the memory layout on the Java side was to use OpenCV’s transpose functions. The code can be found in MatlabImageConverterBGR and MatlabImageConverterGrayscale.
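For readers without OpenCV at hand, the same re-ordering can be expressed as a plain Java loop. This is a sketch for illustrating the index mapping, not the converter code referenced above, and it is likely slower than the OpenCV-based version. The class name is hypothetical.

```java
// Illustration only: a plain-Java stand-in, not the OpenCV-based converter.
final class MatlabLayout {

    /**
     * Converts interleaved, row-major BGR bytes into planar, column-major RGB
     * bytes, i.e. the order of a height x width x 3 uint8 matrix in MATLAB.
     */
    static byte[] fromInterleavedBgr(byte[] bgr, int height, int width) {
        final int channels = 3;
        final int plane = height * width;
        if (bgr.length != plane * channels) {
            throw new IllegalArgumentException("unexpected buffer size");
        }
        byte[] out = new byte[bgr.length];
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                int src = (row * width + col) * channels; // B,G,R of this pixel
                int dst = row + col * height;             // column-major position
                out[dst] = bgr[src + 2];                  // red plane
                out[dst + plane] = bgr[src + 1];          // green plane
                out[dst + 2 * plane] = bgr[src];          // blue plane
            }
        }
        return out;
    }
}
```

Because the work happens on a background thread, its cost does not count against the time budget of the main MATLAB thread.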
5. MATLAB Layout as Shared Memory
The fifth approach is the same as the fourth with the difference that the Java translation layer is bypassed entirely by using shared memory via memmapfile. Shared memory is typically used for inter-process communication, but it can also be used within a single process. Running within the same process also simplifies synchronization since MATLAB can access Java locks.
% (1) Lock memory
lock(this.javaObj);
% (2) Force a copy of the data
image = this.memFile.Data.pixels * 1;
% (3) Unlock memory
unlock(this.javaObj);
Note that the code could be interrupted (ctrl+c) at any line, so the locking mechanism would need to be able to recover from bad states, or the unlocking would need to be guaranteed by using a destructor or onCleanup.
The multiplication by one forces a copy of the data. This is necessary because under the hood memmapfile only returns a reference to the underlying memory.
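The Java counterpart could be sketched as follows, assuming the acquisition thread writes each converted frame into a memory-mapped file that MATLAB opens with memmapfile. The class and its methods are hypothetical and only illustrate the combination of a mapped buffer with a lock that both sides can use.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: class and method names are hypothetical.
final class SharedFrame {
    private final ReentrantLock lock = new ReentrantLock();
    private final MappedByteBuffer buffer;

    SharedFrame(String path, int numBytes) {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileChannel channel = file.getChannel()) {
            // The mapping remains valid after the channel has been closed
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, numBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Exposed so that the reading side can guard its own copy. */
    void lock() { lock.lock(); }

    void unlock() { lock.unlock(); }

    /** Called by the acquisition thread for every new frame. */
    void write(byte[] frame) {
        lock.lock();
        try {
            buffer.position(0);
            buffer.put(frame);
        } finally {
            lock.unlock();
        }
    }

    /** Returns a copy of the current contents, e.g. for checking from Java. */
    byte[] read() {
        lock.lock();
        try {
            byte[] copy = new byte[buffer.capacity()];
            buffer.position(0);
            buffer.get(copy);
            return copy;
        } finally {
            lock.unlock();
        }
    }
}
```

The try/finally blocks guarantee the unlock on the Java side. The MATLAB side needs its own guarantee because of the interruption issue mentioned above.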
All benchmarks were run in MATLAB R2017b on an Intel NUC6I7KYK. The performance was measured using MATLAB’s timeit function. The background color of each cell in the result tables represents a rough classification of the overhead on the main MATLAB thread.
[Legend table: Color | Overhead | At 30 FPS]
The two tables below show the results for converting color (RGB) images as well as grayscale images. All measurements are in milliseconds.
The results show that the default conversion, as well as JPEG compression, are essentially non-starters for color images. For grayscale images, the default conversion works significantly better because the data is stored in a much more efficient 2D array (byte[height][width]), and because there is no need to re-order pixels by color. Unfortunately, we currently don’t have a good explanation for the ~10x cost increase (rather than ~4x) between 1080p and 4K grayscale. The behavior was the same across computers and various memory settings.
When copying the backing array of a BufferedImage we can see another significant performance increase due to the data being stored in a single contiguous array. At this point much of the overhead comes from re-ordering pixels, so by doing the conversion beforehand, we can get another 2-3x improvement.
Lastly, although accessing shared memory in combination with the locking overhead results in a slightly higher fixed cost, the copying itself is significantly cheaper, resulting in another 2-3x speedup for high-resolution images. Overall, going through shared memory scales very well and would even allow streaming of 4K color images from two cameras simultaneously.
Our main takeaway was that although MATLAB’s Java interface can be inefficient for certain cases, there are simple workarounds that can remove most bottlenecks. The most important rule is to avoid converting to and from large multi-dimensional matrices whenever possible.
Another insight was that shared memory provides a very efficient way to transfer large amounts of data to and from MATLAB. We also found it useful for inter-process communication between multiple MATLAB instances. For example, one instance can track a target while another instance uses its output for real-time control. This avoids coupling a fast control loop to the (usually lower) frame rate of a camera or sensor.
As for our initial motivation, after creating HebiCam we were able to develop and reliably run the entire demo in MATLAB. The video below shows the setup using old-generation S-Series actuators.