<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Software for Robots]]></title><description><![CDATA[Real-Time Control, Networking, Operating Systems, Languages]]></description><link>https://ennerf.github.io</link><image><url>https://raw.githubusercontent.com/ennerf/ennerf.github.io/master/images/cover-image.jpg</url><title>Software for Robots</title><link>https://ennerf.github.io</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 25 Jan 2018 14:20:46 GMT</lastBuildDate><atom:link href="https://ennerf.github.io/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Efficient Data Acquisition in MATLAB: Streaming HD Video in Real-Time]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>The acquisition and processing of a video stream can be very computationally expensive. Typical image processing applications split the work across multiple threads, with one thread acquiring the images and another running the actual algorithms. In MATLAB we can get multi-threading by interfacing with other languages, but there is a significant cost associated with exchanging data across the resulting language barrier. In this blog post, we compare different approaches for getting data through MATLAB&#8217;s Java interface, and we show how to acquire high-resolution video streams in real-time and with low overhead.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>For our booth at ICRA 2014, we put together a demo system in MATLAB that used stereo vision for tracking colored bean bags, and a robot arm to pick them up. We used two IP cameras that streamed <a href="https://de.wikipedia.org/wiki/H.264">H.264</a> video over <a href="https://en.wikipedia.org/wiki/Real_Time_Streaming_Protocol">RTSP</a>. While the image processing and robot control parts worked as expected, it proved to be a challenge to acquire images from both video streams fast enough to be useful.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://www.mathworks.com/hardware-support/ip-camera.html">IP Camera Support</a> did not exist at the time and only supports <a href="https://en.wikipedia.org/wiki/Motion_JPEG">MJPEG</a> over <a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">HTTP</a></p>
</li>
<li>
<p><a href="http://www.mathworks.com/hardware-support/matlab-webcam.html">USB Webcam Support</a> only supports USB cameras</p>
</li>
<li>
<p><a href="http://www.mathworks.com/help/matlab/ref/imread.html">imread</a> and <a href="http://www.mathworks.com/help/matlab/ref/webread.html">webread</a> are limited to HTTP and too slow for real-time</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Since we did not want to switch to another language, we decided to develop a small library for acquiring video streams. The project was later open sourced as <a href="http://www.github.com/HebiRobotics/HebiCam">HebiCam</a>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_technical_background">Technical Background</h2>
<div class="sectionbody">
<div class="paragraph">
<p>In order to save bandwidth, most IP cameras compress video before sending it over the network. Since the resulting decoding step can be computationally expensive, it is common practice to move the acquisition to a separate thread in order to reduce the load on the main processing thread.</p>
</div>
<div class="paragraph">
<p>Unfortunately, doing this in MATLAB requires some workarounds due to the language&#8217;s single threaded nature, i.e., background threads need to run in another language. Out of the box, there are two supported interfaces: <a href="https://www.mathworks.com/help/matlab/matlab_external/introducing-mex-files.html">MEX</a> for calling C/C++ code, and the <a href="https://www.mathworks.com/help/matlab/matlab_external/product-overview.html">Java Interface</a> for calling Java code.</p>
</div>
<div class="paragraph">
<p>While both interfaces have strengths and weaknesses, practically all use cases can be solved using either one. For this project, we chose the Java interface in order to simplify cross-platform development and the deployment of binaries. The diagram below shows an overview of the resulting system.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<img src="https://ennerf.github.io/images/streaming/stereocam-matlab.svg" alt="stereocam matlab.svg" width="80%">
</div>
<div class="title">Figure 1. System overview for a stereo vision setup</div>
</div>
<div class="paragraph">
<p>Starting background threads and getting the video stream into Java was relatively straightforward. We used the <a href="https://github.com/bytedeco/javacv">JavaCV</a> library, which is a Java wrapper around <a href="https://opencv.org/">OpenCV</a> and <a href="https://www.ffmpeg.org/">FFMpeg</a> that includes pre-compiled native binaries for all major platforms. However, passing the acquired image data from Java into MATLAB turned out to be more challenging.</p>
</div>
<div class="paragraph">
<p>The Java interface automatically converts between Java and MATLAB types by following a set of <a href="https://www.mathworks.com/help/matlab/matlab_external/handling-data-returned-from-java-methods.html">rules</a>. This makes it much simpler to develop for than the MEX interface, but it does cause additional overhead when calling Java functions. Most of the time this overhead is negligible. However, for certain types of data, such as large and multi-dimensional matrices, the default rules are very inefficient and can become prohibitively expensive. For example, a <code>1080x1920x3</code> MATLAB image matrix gets translated to a <code>byte[1080][1920][3]</code> in Java, which means that there is a separate array object for every single pixel in the image.</p>
</div>
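<div class="paragraph">
<p>To put this in perspective, a quick back-of-the-envelope calculation (plain Java; the class name is just for illustration) shows how many individual array objects such a mapping allocates:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">// Rough sketch (not part of HebiCam): counts the Java array objects
// allocated by the default byte[1080][1920][3] mapping of a 1080p image.
public class ArrayObjectCount {
    public static long countArrays(int height, int width) {
        // 1 outer array + one byte[width][] per row + one byte[3] per pixel
        return 1L + height + (long) height * width;
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For a 1080p color image this comes out to more than two million array objects per frame, each with its own object header and the associated garbage collection pressure.</p>
</div>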
<div class="paragraph">
<p>As an additional complication, MATLAB stores image data in a different memory layout than most other libraries (e.g. OpenCV&#8217;s <code>Mat</code> or Java&#8217;s <code>BufferedImage</code>). While pixels are commonly stored interleaved in row-major order (<code>[height][width][channels]</code>), MATLAB stores images transposed and in column-major order (<code>[channels][width][height]</code>). For example, if the Red-Green-Blue pixels of a <code>BufferedImage</code> are laid out as <code>[RGB][RGB][RGB]&#8230;&#8203;</code>, the same image is laid out as <code>[RRR&#8230;&#8203;][GGG&#8230;&#8203;][BBB&#8230;&#8203;]</code> in MATLAB. Depending on the resolution, this conversion can become fairly expensive.</p>
</div>
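<div class="paragraph">
<p>The reordering itself can be sketched in a few nested loops (a hypothetical helper shown for clarity; as noted later, we ended up using OpenCV functions for this step):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">// Sketch (hypothetical helper, not HebiCam's actual code): reorders an
// interleaved row-major RGB buffer into MATLAB's planar, transposed,
// column-major layout.
public class LayoutConverter {
    // src is [row][col][RGB] interleaved; dst is [RRR...][GGG...][BBB...],
    // with each channel plane stored column by column.
    public static byte[] toMatlabLayout(byte[] src, int h, int w) {
        byte[] dst = new byte[src.length];
        for (int c = 0; c != 3; c++) {
            for (int col = 0; col != w; col++) {
                for (int row = 0; row != h; row++) {
                    dst[c * w * h + col * h + row] = src[(row * w + col) * 3 + c];
                }
            }
        }
        return dst;
    }
}</code></pre>
</div>
</div>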
<div class="paragraph">
<p>In order to process images at a frame rate of 30 fps in real-time, the total time budget of the main MATLAB thread is 33ms per cycle. Thus, the acquisition overhead imposed on the main thread needs to be sufficiently low, i.e., a low number of milliseconds, to leave enough time for the actual processing.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_data_translation">Data Translation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>We benchmarked five different ways to get image data from Java into MATLAB and compared their respective overhead on the main MATLAB thread. We omitted overhead incurred by background threads because it had no effect on the time budget available for image processing.</p>
</div>
<div class="paragraph">
<p>The full benchmark code is available <a href="https://github.com/HebiRobotics/HebiCam/tree/benchmark">here</a>.</p>
</div>
<div class="paragraph">
<p><strong>1. Default 3D Array</strong></p>
</div>
<div class="paragraph">
<p>By default MATLAB image matrices convert to <code>byte[height][width][channels]</code> Java arrays. However, when converting back to MATLAB there are some additional problems:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><code>byte</code> gets converted to <code>int8</code> instead of <code>uint8</code>, resulting in an invalid image matrix</p>
</li>
<li>
<p>changing the type back to <code>uint8</code> is somewhat messy because the <code>uint8(matrix)</code> cast sets all negative values to zero, and the alternative <code>typecast(matrix, 'uint8')</code> only works on vectors</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Thus, converting the data to a valid image matrix still requires several operations.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">% (1) Get matrix from byte[height][width][channels]
data = getRawFormat3d(this.javaConverter);
[height,width,channels] = size(data);

% (2) Reshape matrix to vector
vector = reshape(data, width * height * channels, 1);

% (3) Cast int8 data to uint8
vector = typecast(vector, 'uint8');

% (4) Reshape vector back to original shape
image = reshape(vector, height, width, channels);</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>2. Compressed 1D Array</strong></p>
</div>
<div class="paragraph">
<p>A common approach to move image data across distributed components (e.g. <a href="http://www.ros.org/">ROS</a>) is to encode the individual images using <a href="https://en.wikipedia.org/wiki/Motion_JPEG">MJPEG</a> compression. Doing this within a single process is obviously wasteful, but we included it because it is common practice in many distributed systems. Since MATLAB did not offer a way to decompress JPEG images in memory, we needed to save the compressed data to a file located on a RAM disk.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">% (1) Get compressed data from byte[]
data = getJpegData(this.javaConverter);

% (2) Save as jpeg file
fileID = fopen('tmp.jpg','w+');
fwrite(fileID, data, 'int8');
fclose(fileID);

% (3) Read jpeg file
image = imread('tmp.jpg');</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>3. Java Layout as 1D Pixel Array</strong></p>
</div>
<div class="paragraph">
<p>Another approach is to copy the pixel array of Java&#8217;s <code>BufferedImage</code> and to reshape the memory using MATLAB. This is also the accepted answer for <a href="https://mathworks.com/matlabcentral/answers/100155-how-can-i-convert-a-java-image-object-into-a-matlab-image-matrix#answer_109503">How can I convert a Java Image object to a MATLAB image matrix?</a>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">% (1) Get data from byte[] and cast to correct type
data = getJavaPixelFormat1d(this.javaConverter);
data = typecast(data, 'uint8');
[h,w,c] = size(this.matlabImage); % get dim info

% (2) Reshape matrix for indexing
pixelsData = reshape(data, 3, w, h);

% (3) Transpose and convert from row major to col major format (RGB case)
image = cat(3, ...
    transpose(reshape(pixelsData(3, :, :), w, h)), ...
    transpose(reshape(pixelsData(2, :, :), w, h)), ...
    transpose(reshape(pixelsData(1, :, :), w, h)));</code></pre>
</div>
</div>
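<div class="paragraph">
<p>On the Java side, the backing pixel array of a <code>BufferedImage</code> can be obtained without a per-pixel copy through its raster (standard <code>java.awt.image</code> API; the wrapper class here is illustrative):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;

// Illustrative helper: exposes the single contiguous byte[] that backs
// a byte-based BufferedImage (e.g. TYPE_3BYTE_BGR).
public class PixelArrayAccess {
    public static byte[] getPixels(BufferedImage img) {
        return ((DataBufferByte) img.getRaster().getDataBuffer()).getData();
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Note that <code>TYPE_3BYTE_BGR</code> images store their channels in BGR order, which is why the MATLAB snippet above indexes the color planes in reverse.</p>
</div>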
<div class="paragraph">
<p><strong>4. MATLAB Layout as 1D Pixel Array</strong></p>
</div>
<div class="paragraph">
<p>The fourth approach also copies a single pixel array, but this time the pixels are already stored in the MATLAB convention.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">% (1) Get data from byte[] and cast to correct type
data = getMatlabPixelFormat1d(this.javaConverter);
[h,w,c] = size(this.matlabImage);  % get dim info
vector = typecast(data, 'uint8');

% (2) Interpret pre-laid out memory as matrix
image = reshape(vector,h,w,c);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Note that the most efficient way we found for converting the memory layout on the Java side was to use OpenCV&#8217;s <code>split</code> and <code>transpose</code> functions. The code can be found in <a href="https://github.com/HebiRobotics/HebiCam/blob/master/src/main/java/us/hebi/matlab/streaming/MatlabImageConverterBGR.java">MatlabImageConverterBGR</a> and <a href="https://github.com/HebiRobotics/HebiCam/blob/master/src/main/java/us/hebi/matlab/streaming/MatlabImageConverterGrayscale.java">MatlabImageConverterGrayscale</a>.</p>
</div>
<div class="paragraph">
<p><strong>5. MATLAB Layout as Shared Memory</strong></p>
</div>
<div class="paragraph">
<p>The fifth approach is the same as the fourth with the difference that the Java translation layer is bypassed entirely by using shared memory via <code><a href="https://mathworks.com/help/matlab/ref/memmapfile.html">memmapfile</a></code>. Shared memory is typically used for inter-process communication, but it can also be used within a single process. Running within the same process also simplifies synchronization since MATLAB can access Java locks.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">% (1) Lock memory
lock(this.javaObj);

% (2) Force a copy of the data
image = this.memFile.Data.pixels * 1;

% (3) Unlock memory
unlock(this.javaObj);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Note that the code could be interrupted (ctrl+c) at any line, so the locking mechanism would need to be able to recover from bad states, or the unlocking would need to be guaranteed by using a destructor or <a href="https://mathworks.com/help/matlab/ref/oncleanup.html">onCleanup</a>.</p>
</div>
<div class="paragraph">
<p>The multiplication by one forces a copy of the data. This is necessary because under-the-hood <code>memmapfile</code> only returns a reference to the underlying memory.</p>
</div>
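<div class="paragraph">
<p>The Java side of this approach can be sketched with a memory-mapped file (simplified; the class and file handling are made up for illustration, and the actual HebiCam code adds locking and more bookkeeping):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Simplified sketch: the Java side maps a file and writes pixel data into
// it; MATLAB maps the same file via memmapfile and reads the bytes without
// going through the Java/MATLAB conversion layer.
public class SharedPixelBuffer {
    private final MappedByteBuffer buffer;

    public SharedPixelBuffer(String path, int numBytes) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            // the mapping stays valid after the file handle is closed
            buffer = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, numBytes);
        }
    }

    public synchronized void write(byte[] pixels) {
        buffer.position(0);
        buffer.put(pixels); // becomes visible to MATLAB's memmapfile view
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>On the MATLAB side, the same file can then be mapped with, e.g., <code>memmapfile(path, 'Format', {'uint8', [h w c], 'pixels'})</code>, which matches the access pattern shown in the snippet above.</p>
</div>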
</div>
</div>
<div class="sect1">
<h2 id="_results">Results</h2>
<div class="sectionbody">
<div class="paragraph">
<p>All benchmarks were run in MATLAB 2017b on an <a href="https://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc6i7kyk.html">Intel NUC6I7KYK</a>. The performance was measured using MATLAB&#8217;s <code><a href="https://mathworks.com/help/matlab/ref/timeit.html">timeit</a></code> function. The background color of each cell in the result tables represents a rough classification of the overhead on the main MATLAB thread.</p>
</div>
<table class="tableblock frame-topbot grid-all spread">
<caption class="title">Table 1. Color classification</caption>
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top" style="background-color: white;">Color</th>
<th class="tableblock halign-left valign-top" style="background-color: white;">Overhead</th>
<th class="tableblock halign-left valign-top" style="background-color: white;">At 30 FPS</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>Green</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&lt;10%</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&lt;3.3 ms</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>Yellow</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&lt;50%</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&lt;16.5 ms</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>Orange</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&lt;100%</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&lt;33.3 ms</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>Red</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&gt;100%</p>
</div></div></td>
<td class="tableblock halign-left valign-top" style="background-color: white;"><div><div class="paragraph">
<p>&gt;33.3 ms</p>
</div></div></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>The two tables below show the results for converting color (RGB) images as well as grayscale images. All measurements are in milliseconds.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<img src="https://ennerf.github.io/images/streaming/table_performance.svg" alt="table performance.svg" width="100%">
</div>
<div class="title">Figure 2. Conversion overhead on the MATLAB thread in [ms]</div>
</div>
<br>
<div class="paragraph">
<p>The results show that the default conversion, as well as jpeg compression, are essentially non-starters for color images. For grayscale images, the default conversion works significantly better due to the fact that the data is stored in a much more efficient 2D array (<code>byte[height][width]</code>), and that there is no need to re-order pixels by color. Unfortunately, we currently don&#8217;t have a good explanation for the ~10x cost increase (rather than ~4x) between 1080p and 4K grayscale. The behavior was the same across computers and various different memory settings.</p>
</div>
<div class="paragraph">
<p>When copying the backing array of a <code>BufferedImage</code> we can see another significant performance increase due to the data being stored in a single contiguous array. At this point much of the overhead comes from re-ordering pixels, so by doing the conversion beforehand, we can get another 2-3x improvement.</p>
</div>
<div class="paragraph">
<p>Lastly, although shared memory access combined with the locking overhead results in a slightly higher fixed cost, the copying itself is significantly cheaper, resulting in another 2-3x speedup for high-resolution images. Overall, going through shared memory scales very well and would even allow streaming of 4K color images from two cameras simultaneously.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_final_notes">Final Notes</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Our main takeaway was that although MATLAB&#8217;s Java interface can be inefficient for certain cases, there are simple workarounds that can remove most bottlenecks. The most important rule is to avoid converting to and from large multi-dimensional matrices whenever possible.</p>
</div>
<div class="paragraph">
<p>Another insight was that shared memory provides a very efficient way to transfer large amounts of data to and from MATLAB. We also found it useful for inter-process communication between multiple MATLAB instances. For example, one instance can track a target while another instance can use its output for real-time control. This is useful for avoiding coupling a fast control loop to the (usually lower) frame rate of a camera or sensor.</p>
</div>
<div class="paragraph">
<p>As for our initial motivation, after creating <a href="https://github.com/HebiRobotics/HebiCam">HebiCam</a> we were able to develop and reliably run the entire demo in MATLAB. The video below shows the setup using old-generation S-Series actuators.</p>
</div>
<div class="videoblock">
<div class="content">
<iframe src="https://www.youtube.com/embed/R0nQSxt8uic?rel=0" frameborder="0" allowfullscreen></iframe>
</div>
</div>
<link rel="stylesheet" href="https://cdn.rawgit.com/ennerf/ennerf.github.io/master/resources/highlight.js/9.9.0/styles/matlab.css">
</div>
</div>]]></description><link>https://ennerf.github.io/2017/10/14/Efficient-Data-Acquisition-in-MATLAB-Streaming-HD-Video-in-Real-Time.html</link><guid isPermaLink="true">https://ennerf.github.io/2017/10/14/Efficient-Data-Acquisition-in-MATLAB-Streaming-HD-Video-in-Real-Time.html</guid><category><![CDATA[MATLAB]]></category><category><![CDATA[ MATLAB-Java Interface]]></category><category><![CDATA[ shared memory]]></category><category><![CDATA[ computer vision]]></category><category><![CDATA[ OpenCV]]></category><category><![CDATA[ JavaCV]]></category><category><![CDATA[ FFMpeg]]></category><dc:creator><![CDATA[Florian Enner]]></dc:creator><pubDate>Sat, 14 Oct 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Using MATLAB for hardware-in-the-loop prototyping #1 : Message Passing Systems]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>MATLAB&#174; is a programming language and environment designed for scientific computing. It is one of the best languages for developing robot control algorithms and is widely used in the research community. While it is often thought of as an offline programming language, there are several ways to interface with it to control robotic hardware 'in the loop'. As part of our own development, we surveyed a number of different projects that accomplish this by using a <a href="https://en.wikipedia.org/wiki/Message_passing">message passing</a> system and we compared the approaches they took. This post focuses on bindings for the following message passing frameworks: LCM, ROS, DDS, and ZeroMQ.</p>
</div>
<div class="paragraph">
<p>The main motivation for using MATLAB to prototype directly on real hardware is to dramatically accelerate the development cycle by reducing the time it takes to find out whether an algorithm can withstand ubiquitous real-world problems like noisy and poorly-calibrated sensors, imperfect actuator controls, and unmodeled robot dynamics. Additionally, a workflow that requires researchers to port prototype code to another language before being able to test on real hardware can often lead to weeks or months being lost in chasing down new technical bugs introduced by the port. Finally, programming in a language like C++ can pose a significant barrier to controls engineers who often have a strong electro-mechanical background but are not as strong in computer science or software engineering.</p>
</div>
<div class="paragraph">
<p>We have also noticed that over the past few years several other groups in the robotics community have experienced these problems and have started to develop ways to control hardware directly from MATLAB.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_the_need_for_external_languages">The Need for External Languages</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The main limitation when trying to use MATLAB to interface with hardware stems from the fact that its scripting language is fundamentally single threaded. It has been designed to allow non-programmers to do complex math operations without needing to worry about programming concepts like multi-threading or synchronization.  However, this poses a problem for real-time control of hardware because all communication is forced to happen synchronously in the main thread. For example, if a control loop runs at 100Hz and it takes a message ~8ms for a round-trip, the main thread ends up wasting 80% of the available time budget waiting for a response without doing any actual work.</p>
</div>
<div class="paragraph">
<p>A second hurdle is that while MATLAB is very efficient in the execution of math operations, it is not particularly well suited for byte manipulation. This makes it difficult to develop code that can efficiently create and parse binary message formats that the target hardware can understand. Thus, after having the main thread spend its time waiting for and parsing the incoming data, there may not be any time left for performing interesting math operations.</p>
</div>
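<div class="paragraph">
<p>As a concrete example, consider decoding a small binary feedback packet (the layout here is invented: a little-endian uint16 message id followed by three float32 positions). In Java this is a one-liner per field, while pure MATLAB would need <code>typecast</code> calls and index arithmetic for every field:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch with an invented packet layout: uint16 id + three float32 values.
public class FeedbackParser {
    public static float[] parsePositions(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet).order(ByteOrder.LITTLE_ENDIAN);
        int messageId = buf.getShort() & 0xFFFF; // header (validation omitted)
        float[] positions = new float[3];
        for (int i = 0; i != 3; i++) {
            positions[i] = buf.getFloat();
        }
        return positions;
    }
}</code></pre>
</div>
</div>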
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/matlab/comms-single-threaded.png"><img src="https://ennerf.github.io/images/matlab/comms-single-threaded.png" alt="comms single threaded.png" width="100%"></a>
</div>
<div class="title">Figure 1. Communications overhead in the main MATLAB thread</div>
</div>
<div class="paragraph">
<p>Pure MATLAB implementations can work for simple applications, such as interfacing with an Arduino to gather temperature data or blink an LED, but they are not feasible for controlling complex robotic systems (e.g. a humanoid) at high rates (e.g. 100Hz-1KHz). Fortunately, MATLAB does have the ability to interface with other programming languages, which allows users to create background threads that offload the communications aspect from the main thread.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/matlab/comms-multi-threaded.png"><img src="https://ennerf.github.io/images/matlab/comms-multi-threaded.png" alt="comms multi threaded.png" width="100%"></a>
</div>
<div class="title">Figure 2. Communications overhead offloaded to other threads</div>
</div>
<div class="paragraph">
<p>Out of the box MATLAB provides two interfaces to other languages:  <a href="https://www.mathworks.com/help/matlab/matlab_external/introducing-mex-files.html">MEX</a> for calling C/C++ code, and the <a href="https://www.mathworks.com/help/matlab/matlab_external/product-overview.html">Java Interface</a> for calling Java code. There are some differences between the two, but at the end of the day the choice effectively comes down to personal preference. Both provide enough capabilities for developing sophisticated interfaces and have orders of magnitude better performance than required.  There are additional interfaces to <a href="https://www.mathworks.com/help/matlab/calling-external-functions.html">other languages</a>, but those require additional setup steps.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_message_passing_frameworks">Message Passing Frameworks</h2>
<div class="sectionbody">
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/Message_passing">Message passing</a> frameworks such as <a href="http://www.ros.org/">Robot Operating System (ROS)</a> and <a href="https://lcm-proj.github.io/">Lightweight Communication and Marshalling (LCM)</a> have been widely adopted in the robotics research community. At the core they typically consist of two parts: a way to exchange data between processes (e.g. UDP/TCP), as well as a defined binary format for encoding and decoding the messages. They allow systems to be built with distributed components (e.g. processes) that run on different computers, different operating systems, and different programming languages.</p>
</div>
<div class="paragraph">
<p>The resulting systems are very extensible and provide convenient ways for prototyping. For example, a component communicating with a physical robot can be exchanged with a simulator without affecting the rest of the system. Similarly, a new walking controller could be implemented in MATLAB and communicate with external processes (e.g. robot comms) through the exchange of messages.  With ROS and LCM in particular, their flexibility, wide-spread adoption, and support for different languages make them a nice starting point for a MATLAB-hardware interface.</p>
</div>
<div class="sect2">
<h3 id="_lightweight_communication_and_marshalling_lcm">Lightweight Communication and Marshalling (LCM)</h3>
<div class="paragraph">
<p><a href="https://lcm-proj.github.io/tut_matlab.html">LCM</a> was developed in 2006 at <a href="http://www.mit.edu/">MIT</a> for their entry to DARPA&#8217;s Urban Challenge. In recent years it has become a popular alternative to ROS-messaging, and it was, as far as we know, the first message passing framework for robotics that supported MATLAB as a core language.</p>
</div>
<div class="paragraph">
<p>The snippet below shows what the MATLAB code for sending a command message might look like. The code creates a struct-like <em>message</em>, sets desired values, and publishes it on an appropriate channel.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for sending an LCM message
% Setup
lc = lcm.lcm.LCM.getSingleton();

% Fill message
cmd = types.command();
cmd.position = [1 2 3];
cmd.velocity = [1 2 3];

% Publish
lc.publish('COMMAND_CHANNEL', cmd);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Interestingly, the backing implementation of these bindings was done in pure Java and did not contain any actual MATLAB code. The exposed interface consisted of two Java classes as well as auto-generated message types.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The <a href="https://github.com/lcm-proj/lcm/blob/master/lcm-java/lcm/lcm/LCM.java">LCM</a> class provides a way to publish messages and subscribe to channels</p>
</li>
<li>
<p>The generated Java messages handle the binary encoding and expose fields that MATLAB can access</p>
</li>
<li>
<p>The <a href="https://github.com/lcm-proj/lcm/blob/master/lcm-java/lcm/lcm/MessageAggregator.java">MessageAggregator</a> class provides a way to receive messages on a background thread and queue them for MATLAB.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Thus, even though the snippet looks similar to MATLAB code, all variables are actually Java objects. For example, the struct-like <em>command</em> type is a Java object that exposes public fields as shown in the snippet below. Users can access them the same way as fields of a standard MATLAB struct (or class properties) resulting in nice syntax. The types are automatically converted according to the <a href="https://mathworks.com/help/matlab/matlab_external/passing-data-to-java-methods.html">type mapping</a>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">/**
 * Java class that behaves like a MATLAB struct
 */
public final class command implements lcm.lcm.LCMEncodable
{
    public double[] position;
    public double[] velocity;
    // etc. ...
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Receiving messages is done by subscribing an <em>aggregator</em> to one or more channels. The aggregator receives messages from a background thread and stores them in a queue that MATLAB can access in a synchronous manner using <em>aggregator.getNextMessage()</em>. Each message contains the raw bytes as well as some metadata for selecting an appropriate type for decoding.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for receiving an LCM message
% Setup
lc = lcm.lcm.LCM.getSingleton();
aggregator = lcm.lcm.MessageAggregator();
lc.subscribe('FEEDBACK_CHANNEL', aggregator);

% Continuously check for new messages
timeoutMs = 1000;
while true

    % Receive raw message
    msg = aggregator.getNextMessage(timeoutMs);

    % Ignore timeouts
    if ~isempty(msg)

        % Select message type based on channel name
        if strcmp('FEEDBACK_CHANNEL', char(msg.channel))

            % Decode raw bytes to a usable type
            fbk = types.feedback(msg.data);

            % Use data
            position = fbk.position;
            velocity = fbk.velocity;

        end

    end
end</code></pre>
</div>
</div>
<div class="paragraph">
<p>The snippet below shows a simplified version of the backing Java code for the aggregator class. Since Java methods are limited to a single return value, the <em>getNextMessage</em> call returns a Java type that contains the received bytes as well as metadata to identify the type, i.e., the source channel name.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">/**
 * Java class for receiving messages in the background
 */
public class MessageAggregator implements LCMSubscriber {

    /**
     * Value type that combines multiple return arguments
     */
    public static class Message {

        final public byte[] data; // raw bytes
        final public String channel; // source channel name

        public Message(String channel_, byte[] data_) {
            data = data_;
            channel = channel_;
        }
    }

    /**
     * Method that gets called from MATLAB to receive new messages
     */
    public synchronized Message getNextMessage(long timeout_ms) {

        if (!messages.isEmpty()) {
            return messages.removeFirst();
        }

        if (timeout_ms == 0) { // non-blocking
            return null;
        }

        // Wait for new message until timeout ...
    }

}</code></pre>
</div>
</div>
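<div class="paragraph">
<p>The elided wait itself is conceptually just a monitor wait with a deadline. The self-contained sketch below shows how such a queue could be implemented. Note that this is our own approximation, not the actual LCM source; <em>addMessage</em> stands in for LCM&#8217;s background receive callback.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import java.util.ArrayDeque;

/**
 * Sketch of a message queue with a blocking wait-with-timeout,
 * approximating the elided logic above (not the actual LCM source)
 */
public class MessageQueue {

    private final ArrayDeque messages = new ArrayDeque();

    /** Called from the background receive thread */
    public synchronized void addMessage(Object message) {
        messages.addLast(message);
        notifyAll(); // wake up a blocked getNextMessage() call
    }

    /** Called from MATLAB; returns null on timeout (0 = non-blocking) */
    public synchronized Object getNextMessage(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (messages.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining > 0) {
                try {
                    wait(remaining); // releases the monitor while waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            } else {
                return null; // non-blocking call or timeout expired
            }
        }
        return messages.removeFirst();
    }
}</code></pre>
</div>
</div>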
<div class="paragraph">
<p>Note that the <em>getNextMessage</em> method requires a timeout argument. In general it is important for blocking Java methods to have a timeout in order to prevent the main thread from getting stuck permanently. Being in a Java call prohibits users from aborting the execution (ctrl-c), so timeouts should be reasonably short, i.e., in the low seconds. Otherwise this could cause the UI to become unresponsive and users may be forced to close MATLAB without being able to save their workspace. Passing in a timeout of zero serves as a non-blocking interface that immediately returns empty if no messages are available. This is often useful for working with multiple aggregators or for integrating asynchronous messages with unknown timing, such as user input.</p>
</div>
<div class="paragraph">
<p>Overall, we thought that this was a well-thought-out API and a great example of a minimum viable interface that works well in practice. By receiving messages on a background thread and by moving the encoding and decoding steps to the Java language, the main thread is able to spend most of its time actually working with the data. The minimalistic implementation is comparatively simple, and we would recommend it as a starting point for developing similar interfaces.</p>
</div>
<div class="paragraph">
<p>Some minor points for improvement that we found were:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The decoding step <em>fbk = types.feedback(msg.data)</em> forces two unnecessary translations due to <em>msg.data</em> being a <em>byte[]</em>, which automatically gets converted to and from <em>int8</em>. This could result in a noticeable performance hit when receiving larger messages (e.g. images) and could be avoided by adding an overload that accepts a non-primitive type that does not get translated, e.g., <em>fbk = types.feedback(msg)</em>.</p>
</li>
<li>
<p>The Java classes did not implement <a href="https://mathworks.com/help/matlab/matlab_external/save-and-load-java-objects-to-mat-files.html">Serializable</a>, which could become bothersome when trying to save the workspace.</p>
</li>
<li>
<p>We would prefer to select the decoding type during the subscription step, e.g., <em>lc.subscribe('FEEDBACK_CHANNEL', aggregator, 'types.feedback')</em>, rather than requiring users to instantiate the type manually. This would clean up the parsing code a bit and allow for a less confusing error message if types are missing.</p>
</li>
</ul>
</div>
</div>
<div class="sect2">
<h3 id="_robot_operating_system_ros">Robot Operating System (ROS)</h3>
<div class="paragraph">
<p><a href="http://www.ros.org">ROS</a> is by far the most widespread messaging framework in the robotics research community and has been officially supported by MathWorks&#8217; <a href="https://www.mathworks.com/products/robotics.html">Robotics System Toolbox</a> since 2014. While the Simulink code generation uses ROS C++, the MATLAB implementation is built on the less common RosJava.</p>
</div>
<div class="paragraph">
<p>The API was designed such that each topic requires dedicated publishers and subscribers, which is different from LCM, where each subscriber may listen to multiple channels/topics. While this can result in more subscriber objects, specifying the expected type at initialization removes much of the boilerplate code necessary for dealing with message types.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for publishing a ROS message
% Setup Publisher
chatpub = rospublisher('/chatter', 'std_msgs/String');

% Fill message
msg = rosmessage(chatpub);
msg.Data = 'Some test string';

% Publish
chatpub.send(msg);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Subscribers support three different styles to access messages: blocking calls, non-blocking calls, and callbacks.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for receiving a ROS message
% Setup Subscriber
laser = rossubscriber('/scan');

% (1) Blocking receive
scan = laser.receive(1); % timeout [s]

% (2) Non-blocking latest message (may not be new)
scan = laser.LatestMessage;

% (3) Callback
callback = @(msg) disp(msg);
subscriber = rossubscriber('/scan', callback);</code></pre>
</div>
</div>
<div class="paragraph">
<p>In contrast to LCM, all objects that are visible to users are actual MATLAB classes. Even though the implementation uses Java underneath, all exposed functionality is wrapped in MATLAB classes that hide the Java calls. For example, each message type is associated with a generated wrapper class. The code below shows a simplified example of a wrapper for a message that has a <em>Name</em> property.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for wrapping a Java message type
classdef WrappedMessage

    properties (Dependent)
        % Exposed to users like a regular MATLAB property
        Name
    end

    properties (Access = protected)
        % The underlying Java message object (hidden from user)
        JavaMessage
    end

    methods

        function name = get.Name(obj)
            % value = msg.Name;
            name = char(obj.JavaMessage.getName);
        end

        function set.Name(obj, name)
            % msg.Name = value;
            validateattributes(name, {'char'}, {}, 'WrappedMessage', 'Name');
            obj.JavaMessage.setName(name); % Forward to Java method
        end

        function out = doSomething(obj)
            % msg.doSomething() and doSomething(msg)
            try
                out = obj.JavaMessage.doSomething(); % Forward to Java method
            catch javaException
                throw(WrappedException(javaException)); % Hide Java exception
            end
        end

    end
end</code></pre>
</div>
</div>
<div class="paragraph">
<p>Due to the implementation being closed-source, we were only able to look at the public toolbox files as well as the compiled Java bytecode. As far as we could tell they built a small Java library that wrapped RosJava functionality in order to provide an interface that is easier to call from MATLAB. Most of the actual logic seemed to be implemented in MATLAB code, but we also found several calls to various Java libraries for problems that would have been difficult to implement in pure MATLAB, e.g., listing networking interfaces or doing in-memory decompression of images.</p>
</div>
<div class="paragraph">
<p>Overall, we found that the ROS support toolbox looked very nice and was a great example of how seamlessly external languages can be integrated with MATLAB. We also really liked that they offered a way to load log files (rosbags).</p>
</div>
<div class="paragraph">
<p>One concern we had was that there did not seem to be a simple non-blocking way to check for new messages, e.g., a <em>hasNewMessage()</em> method or functionality equivalent to LCM&#8217;s <em>getNextMessage(0)</em>. We often found this useful for applications that combined data from multiple topics that arrived at different rates (e.g. sensor feedback and joystick input events). We checked whether this behavior could be emulated by specifying a very small timeout in the <em>receive</em> method (shown in the snippet below), but any value below 0.1s seemed to never successfully return.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% Trying to check whether a new message has arrived without blocking
try
    msg = sub.receive(0.1); % below 0.1s always threw an error
    % ... use message ...
catch ex
    % ignore
end</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_data_distribution_service_dds">Data Distribution Service (DDS)</h3>
<div class="paragraph">
<p>In 2014 MathWorks also added a <a href="https://www.mathworks.com/hardware-support/rti-dds.html">support package for DDS</a>, which is the messaging middleware that ROS 2.0 is based on. It supports MATLAB and Simulink, as well as code generation. Unfortunately, we did not have all the requirements to get it set up, and we could not find much information about the underlying implementation. After looking at some of the intro videos, we believe that the resulting code should look as follows.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for sending and receiving DDS messages
% Setup
DDS.import('ShapeType.idl','matlab');
dp = DDS.DomainParticipant;

% Create message
myTopic = ShapeType;
myTopic.x = int32(23);
myTopic.y = int32(35);

% Send Message
dp.addWriter('ShapeType', 'Square');
dp.write(myTopic);

% Receive message
dp.addReader('ShapeType', 'Square');
readTopic = dp.read();</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_zeromq">ZeroMQ</h3>
<div class="paragraph">
<p>ZeroMQ is another asynchronous messaging library that is popular for building distributed systems. It only handles the messaging aspect, so users need to supply their own wire format. <a href="https://github.com/smcgill3/zeromq-matlab">ZeroMQ-matlab</a> is a MATLAB interface to ZeroMQ that was developed at UPenn between 2013 and 2015. We were not able to find much documentation, but as far as we could tell the resulting code should look similar to the following snippet.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-matlab" data-lang="matlab">%% MATLAB code for sending and receiving ZeroMQ data
% Setup
subscriber = zmq( 'subscribe', 'tcp', '127.0.0.1', 43210 );
publisher = zmq( 'publish', 'tcp', 43210 );

% Publish data
bytes = uint8(rand(100,1));
nbytes = zmq( 'send', publisher, bytes );

% Receive data
receiver = zmq('poll', 1000); % polls for next message
[recv_data, has_more] = zmq( 'receive', receiver );

disp(char(recv_data));</code></pre>
</div>
</div>
<div class="paragraph">
<p>It was implemented as a single MEX function that selects appropriate sub-functions based on a string argument. State was maintained by using socket IDs that were passed in by the user at every call. The code below shows a simplified snippet of the send action.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c++" data-lang="c++">// Parsing the selected ZeroMQ action behind the MEX barrier
// Grab command String
if ( !(command = mxArrayToString(prhs[0])) )
	mexErrMsgTxt("Could not read command string. (1st argument)");

// Match command String with desired action (e.g. 'send')
if (strcasecmp(command, "send") == 0){
	// ... (argument validation)

	// retrieve arguments
	socket_id = *( (uint8_t*)mxGetData(prhs[1]) );
	size_t n_el = mxGetNumberOfElements(prhs[2]);
	size_t el_sz = mxGetElementSize(prhs[2]);
	size_t msglen = n_el*el_sz;

	// send data
	void* msg = (void*)mxGetData(prhs[2]);
	int nbytes = zmq_send( sockets[ socket_id ], msg, msglen, 0 );

	// ... check outcome and return
}
// ... other actions</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_other_frameworks">Other Frameworks</h3>
<div class="paragraph">
<p>Below is a list of APIs for other frameworks that we looked at but could not cover in more detail.</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 25%;">
<col style="width: 75%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Project</th>
<th class="tableblock halign-left valign-top">Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p><a href="https://github.com/ragavsathish/RabbitMQ-Matlab-Client">RabbitMQ-Matlab-Client</a></p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>Simple Java wrapper for RabbitMQ with callbacks into MATLAB</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p><a href="https://sourceforge.net/projects/urbi/?source=typ_redirect">URBI</a> (<a href="http://agents.csse.uwa.edu.au/aibosig/resources/downloads/tutorial_liburbiMatlab_0.1.pdf">tutorial</a>)</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>Seems to be deprecated</p>
</div></div></td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_final_notes">Final Notes</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Contrary to the situation a few years ago, nowadays there exist interfaces for most of the common message passing frameworks that allow researchers to do at least basic hardware-in-the-loop prototyping directly from MATLAB. However, if none of the available options work for you and you are planning on developing your own, we recommend the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>If there is no clear pre-existing preference between C++ and Java, we recommend starting with a Java implementation. MEX interfaces require a lot of conversion code that Java interfaces handle automatically.</p>
</li>
<li>
<p>We would recommend starting with a minimalistic LCM-like implementation and then adding complexity as necessary.</p>
</li>
<li>
<p>While interfaces that only expose MATLAB code can provide a better and more consistent user experience (e.g. help documentation), there is a significant cost associated with maintaining all of the involved layers. We would recommend holding off on creating MATLAB wrappers until the API is relatively stable.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Finally, even though message passing systems are very widespread in the robotics community, they do have drawbacks and are not appropriate for every application. Future posts in this series will focus on some of the alternatives.</p>
</div>
<link rel="stylesheet" href="https://cdn.rawgit.com/ennerf/ennerf.github.io/master/resources/highlight.js/9.9.0/styles/matlab.css">
</div>
</div>]]></description><link>https://ennerf.github.io/2017/06/25/Using-MATLAB-for-hardware-in-the-loop-prototyping-1-Message-Passing-Systems.html</link><guid isPermaLink="true">https://ennerf.github.io/2017/06/25/Using-MATLAB-for-hardware-in-the-loop-prototyping-1-Message-Passing-Systems.html</guid><category><![CDATA[MATLAB]]></category><category><![CDATA[ ROS]]></category><category><![CDATA[ LCM]]></category><category><![CDATA[ DDS]]></category><category><![CDATA[ ZeroMQ]]></category><category><![CDATA[ MEX]]></category><category><![CDATA[ Java]]></category><dc:creator><![CDATA[Florian Enner]]></dc:creator><pubDate>Sun, 25 Jun 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Analyzing the viability of Ethernet and UDP for robot control]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Ethernet is the most pervasive communication standard in the world. However, it is often dismissed for robotics applications because of its presumed non-deterministic behavior. In this article, we show that in practice Ethernet can be made to be extremely deterministic and provide a flexible and reliable solution for robot communication.</p>
</div>
<div class="paragraph">
<p>The network topologies and traffic patterns used to control robotic systems exhibit different characteristics from those studied in traditional networking research, which focuses on large, ad-hoc networks.
Below we present results from a number of tests and benchmarks, involving over 100 million transmitted packets. Over the course of all of our tests, no packets were dropped or received out of order.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_technical_background">Technical Background</h2>
<div class="sectionbody">
<div class="paragraph">
<p>One of the primary concerns that roboticists have when considering technologies for real-time control is the predictability of latency. The worst-case latency tends to be more important than the overall throughput, so the possibility of latency spikes and packet loss represents a significant red flag for any communication standard.</p>
</div>
<div class="paragraph">
<p>Much of the prevalent hesitance towards using Ethernet for real-time control originated in the early days of networking. Nodes used to communicate over a single shared medium that employed a control method with random elements for arbitrating access (<a href="https://en.wikipedia.org/wiki/Carrier_sense_multiple_access_with_collision_detection">CSMA/CD</a>). When two Frames collided during a transmission, the senders backed off for random timeouts and attempted to retransmit. After a number of failed attempts, frames could be dropped entirely. By connecting more nodes through <a href="https://en.wikipedia.org/wiki/Ethernet_hub">Hubs</a> the <a href="https://en.wikipedia.org/wiki/Collision_domain">Collision Domain</a> was extended further, resulting in more collisions and less predictable behavior.</p>
</div>
<div class="paragraph">
<p>In a process that started in <a href="https://en.wikipedia.org/wiki/Kalpana_(company)">1990</a>, Hubs have been fully replaced with <a href="https://en.wikipedia.org/wiki/Network_switch">Switches</a> that have dedicated full-duplex (separate lines for transmitting and receiving) connections for each port. This separates segments and isolates collision domains, which eliminates any collisions that were happening on the physical (wire) level. CSMA/CD is still supported for backwards compatibility and half-duplex connections, but it is largely obsolete.</p>
</div>
<div class="paragraph">
<p>Using dedicated lines introduces additional buffering and overhead for forwarding Frames to intended receivers. As of 2016, virtually all Switches implement the <a href="https://en.wikipedia.org/wiki/Store_and_forward">Store-and-Forward</a> switching architecture in which Switches fully receive packets, store them in an internal buffer, and then forward them to the appropriate receiver port. This adds a latency cost that scales linearly with the number of Switches that a packet has to go through.
In the alternative <a href="https://en.wikipedia.org/wiki/Cut-through_switching">Cut-through</a> approach, Switches can forward packets immediately after receiving the target address, potentially resulting in lower latency. While this is sometimes used in latency-sensitive domains, such as financial trading, it generally can&#8217;t be found in consumer grade hardware. It is more difficult to implement, only works well if both ports negotiate the same speed, and requires the receiving port to be idle. The benefits are also less significant for smaller packets due to the requirement to buffer enough data to evaluate the target address.</p>
</div>
<div class="paragraph">
<p>Another problem that many roboticists are often concerned about is  <a href="https://en.wikipedia.org/wiki/Out-of-order_delivery">Out-of-Order Delivery</a>, which means that a sequence of packets coming from a single source may be received in a different order. This is relevant for communicating over the Internet, but generally does not apply to local networks without redundant routes and load balancing. Depending on the driver implementation it can theoretically happen on a local network, but we have yet to observe such a case.</p>
</div>
<div class="paragraph">
<p>There are several competing networking standards that are built on Ethernet and can guarantee enough determinism to be used in industrial automation (<a href="https://en.wikipedia.org/wiki/Industrial_Ethernet">Industrial Ethernet</a>). They achieve this by enforcing tight control over the network layout and by limiting the components that can be connected. However, even cheap consumer grade network equipment can produce very good results if the network is controlled in a similar manner.</p>
</div>
<div class="paragraph">
<p>Note that this is not a new concept. We found several resources that discussed similar findings more than a decade ago, e.g., <a href="http://www.embedded.com/design/connectivity/4023291/Real-Time-Ethernet">Real-Time-Ethernet</a> (2001), <a href="https://www.researchgate.net/publication/4232548_Real-time_performance_measurements_using_UDP_on_Windows_and_Linux">Real-time performance measurements using UDP on Windows and Linux</a> (2005), <a href="http://literature.rockwellautomation.com/idc/groups/literature/documents/wp/enet-wp002_-en-p.pdf">Evaluating Industrial Ethernet</a> (2007), and  <a href="http://www.embedded.com/electronics-blogs/cole-bin/4406659/1/Deterministic-networking&#8212;&#8203;from-niches-to-the-mainstream-">Deterministic Networking: from niches to the mainstream</a> (2013).</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_benchmark_setup">Benchmark Setup</h2>
<div class="sectionbody">
<div class="paragraph">
<p>A common way to benchmark networks is to set up two computers and have a sender transmit a message to a receiver that echoes it back. That way the sender can measure the <a href="https://en.wikipedia.org/wiki/Round-trip_delay_time">round-trip time (RTT)</a> and gather statistics about the network. This generally works well, but large operating system stacks and device drivers can potentially add a lot of variation. In an attempt to reduce unwanted jitter, we decided to set up a benchmark using two embedded devices instead.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/io-board.jpg"><img src="https://ennerf.github.io/images/udp/io-board.jpg" alt="io board.jpg" width="100%"></a>
</div>
<div class="title">Figure 1. HEBI Robotics I/O Board</div>
</div>
<div class="paragraph">
<p>Our startup <a href="http://hebirobotics.com/">HEBI Robotics</a> builds a variety of building blocks that enable quick development of custom robotic systems. We mainly focus on actuators, but we&#8217;ve also developed other devices such as the I/O Board shown in the picture above. Each board has 48 pins that serve a variety of functions (analog and digital I/O, PWM, Encoder input, etc.) that can be accessed remotely via network. We normally use them in conjunction with our actuators to interface with external devices, such as a gripper or pneumatic valve, or to get various sensor input into MATLAB.</p>
</div>
<div class="paragraph">
<p>Each device contains a 168MHz ARM microcontroller (STM32f407) and a 100 Mbit/s network port, so we found them to be very convenient for doing network tests. We selected two I/O Boards to act as the sender and receiver nodes and developed custom firmware in order to isolate the network stack. The resulting firmware was based on <a href="http://www.chibios.org/">ChibiOS 2.6.8</a> and <a href="http://savannah.nongnu.org/projects/lwip/">lwIP 1.4.1</a>. The relevant code pieces can be found <a href="https://gist.github.com/ennerf/36a57d432bcff20a58efcdee10f91bd9">here</a>. The elapsed time was measured using a hardware counter with a resolution of 250ns.</p>
</div>
<div class="paragraph">
<p>Since there was no way to store multiple Gigabytes on these devices, we decided to log data remotely using a UDP service that receives measurement data and persists it to disk (see <a href="https://gist.github.com/ennerf/0ddc4396d15852d28e4eca4a8a923eb7">code</a>). In order to avoid stalls caused by disk I/O, the main socket handler wrote into a double-buffered structure that got persisted by a background thread. The synchronization between the threads was done using a <a href="http://stuff-gil-says.blogspot.com/2014/11/writerreaderphaser-story-about-new.html">WriterReaderPhaser</a>, which is a synchronization primitive that allows readers to flip buffers while keeping writers wait-free. We found this primitive to be very useful for persisting events that are represented by small amounts of data.</p>
</div>
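<div class="paragraph">
<p>The core idea of the double-buffered structure is the buffer flip. The sketch below illustrates it with plain locking for clarity; note that this is our own simplified illustration, whereas the actual implementation kept writers wait-free by using the WriterReaderPhaser.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import java.util.ArrayList;

/**
 * Simplified illustration of double-buffered logging. This version
 * uses plain locking; the actual implementation used a wait-free
 * WriterReaderPhaser instead.
 */
public class DoubleBufferedLog {

    private ArrayList active = new ArrayList();   // filled by the socket handler
    private ArrayList inactive = new ArrayList(); // drained by the disk thread

    /** Called by the socket handler for every incoming measurement */
    public synchronized void record(long measurement) {
        active.add(measurement);
    }

    /**
     * Called periodically by the background thread. Returns the full
     * buffer, which must be persisted and cleared before the next flip.
     */
    public synchronized ArrayList flip() {
        ArrayList full = active;
        active = inactive;
        inactive = full;
        return full;
    }
}</code></pre>
</div>
</div>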
<div class="paragraph">
<p>The step by step flow was as follows:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Sender wakes up at a fixed rate, e.g., 100Hz</p>
</li>
<li>
<p>Sender increments sequence number</p>
</li>
<li>
<p>Sender measures time ("transmit timestamp") and sends packet to receiver</p>
</li>
<li>
<p>Receiver echoes packet back to sender</p>
</li>
<li>
<p>Sender receives packet and measures time ("receive timestamp")</p>
</li>
<li>
<p>Sender sends measurement to logging server</p>
</li>
<li>
<p>Logging server receives measurement and persists to disk</p>
</li>
</ol>
</div>
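<div class="paragraph">
<p>For reference, the sender side of this loop (steps 1 through 5) can be approximated on a desktop machine with standard UDP sockets. The sketch below is our own host-side equivalent for illustration; the actual firmware ran on ChibiOS/lwIP and used a hardware counter instead of <em>System.nanoTime()</em>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

/**
 * Host-side approximation of the echo benchmark sender
 * (illustration only, not the embedded firmware)
 */
public class EchoBenchmark {

    /** Sends one packet and measures the time until the echo returns */
    public static long measureRoundTripNanos(DatagramSocket socket, InetAddress receiver,
                                             int port, byte[] payload) throws Exception {
        DatagramPacket request = new DatagramPacket(payload, payload.length, receiver, port);
        DatagramPacket response = new DatagramPacket(new byte[payload.length], payload.length);

        long transmitTime = System.nanoTime(); // "transmit timestamp"
        socket.send(request);                  // send packet to receiver
        socket.receive(response);              // blocks until the echo arrives
        long receiveTime = System.nanoTime();  // "receive timestamp"

        return receiveTime - transmitTime;     // round-trip time
    }
}</code></pre>
</div>
</div>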
<div class="paragraph">
<p>The resulting binary data was loaded into MATLAB&#174; for analysis and visualization. The code for reading the binary file can be found <a href="https://gist.github.com/ennerf/19b48406a066f6e946a0567a1a4de1ed">here</a>.</p>
</div>
<div class="paragraph">
<p>The round-trip time is the difference between the receive and transmit timestamps. We also recorded the sequence number of each packet and the IP address of the receiver node in order to detect packet loss and track ordering.</p>
</div>
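<div class="paragraph">
<p>The loss and ordering checks only need the logged sequence numbers. A post-processing sketch (our own illustration, not the actual analysis code) might look like:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">/**
 * Sketch of a post-processing check for packet loss and reordering
 * based on logged sequence numbers (illustration only)
 */
public class SequenceCheck {

    /** Counts missing sequence numbers (i.e. dropped packets) */
    public static int countDropped(long[] sequenceNumbers) {
        int dropped = 0;
        for (int i = 1; i != sequenceNumbers.length; i++) {
            long gap = sequenceNumbers[i] - sequenceNumbers[i - 1];
            if (gap > 1) {
                dropped += gap - 1; // gaps larger than one indicate losses
            }
        }
        return dropped;
    }

    /** Verifies that sequence numbers are strictly increasing */
    public static boolean isInOrder(long[] sequenceNumbers) {
        for (int i = 1; i != sequenceNumbers.length; i++) {
            if (sequenceNumbers[i] - sequenceNumbers[i - 1] > 0) continue;
            return false; // equal or decreasing: duplicate or out of order
        }
        return true;
    }
}</code></pre>
</div>
</div>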
</div>
</div>
<div class="sect1">
<h2 id="_udp_datagram_size">UDP datagram size</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Sending a UDP datagram over Ethernet incurs a minimum of 66 bytes of overhead for framing and headers. Additionally, Ethernet Frames have a minimum size of 84 bytes, which makes the minimum payload for a UDP Datagram 18 bytes. The rough structure is shown below. More detailed information can be found at <a href="https://en.wikipedia.org/wiki/Ethernet_frame">Ethernet II</a>,  <a href="https://en.wikipedia.org/wiki/IPv4">Internet Protocol (IPv4)</a>, and <a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol">User Datagram Protocol (UDP)</a>.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/ethernet-ip-udp-header.png"><img src="https://ennerf.github.io/images/udp/ethernet-ip-udp-header.png" alt="ethernet ip udp header.png" width="100%"></a>
</div>
<div class="title">Figure 2. UDP / IPv4 / Ethernet II Frame Structure</div>
</div>
<div class="paragraph">
<p>Although this overhead may seem high for traditional automation applications with small payloads (&lt;10 bytes), it quickly amortizes when communicating with smarter devices. For example, each one of our <a href="http://hebirobotics.com/products/">X-Series</a> actuators contains more than 40 sensors (position, velocity, torque, 3-axis gyroscope, 3-axis accelerometer, several temperature sensors, etc.) that get combined into a single packet that uses between 185 and 215 bytes of payload. Typical feedback packets from an I/O Board are even larger and require about 300 bytes. When comparing overhead it is also important to consider the available bandwidth, e.g., sending 100 bytes over Gigabit Ethernet (or even 100 Mbit/s) tends to be faster than sending a single byte using traditional non-Ethernet based alternatives such as RS485 or CAN Bus.</p>
</div>
<div class="paragraph">
<p>For these benchmarks we chose to measure the round-trip time for a payload of 200 bytes. After including all overhead, the actual size on the wire is 266 bytes, 12 of which are the idle interframe gap. The theoretical time it takes to transmit the remaining 254 bytes over 100 Mbit/s and 1Gbit/s Ethernet is 20.3us and 2.03us respectively.</p>
</div>
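<div class="paragraph">
<p>These numbers can be reproduced with a short calculation. The sketch below (class and method names are ours) assumes that the 12 byte interframe gap counts toward the wire size but, being idle line time, not toward the transmission time:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">/**
 * Worked example of the framing overhead for a UDP payload
 */
public class WireTime {

    static final int PREAMBLE_SFD   = 8;  // preamble + start-of-frame delimiter
    static final int ETH_HEADER     = 14; // destination + source MAC, EtherType
    static final int IP_HEADER      = 20; // IPv4 without options
    static final int UDP_HEADER     = 8;
    static final int ETH_FCS        = 4;  // frame check sequence
    static final int INTERFRAME_GAP = 12; // mandatory idle time between frames

    /** Total bytes occupied on the wire, including the interframe gap */
    public static int wireBytes(int payload) {
        return payload + PREAMBLE_SFD + ETH_HEADER + IP_HEADER
                + UDP_HEADER + ETH_FCS + INTERFRAME_GAP;
    }

    /** Transmission time in microseconds, excluding the idle interframe gap */
    public static double transmitMicros(int payload, double bitsPerSecond) {
        int transmitted = wireBytes(payload) - INTERFRAME_GAP;
        return transmitted * 8 / bitsPerSecond * 1e6;
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For a 200 byte payload this yields 266 bytes on the wire and transmission times of roughly 20.3us at 100 Mbit/s and 2.03us at 1Gbit/s.</p>
</div>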
<div class="paragraph">
<p>Note that while the size is representative of a typical actuator feedback packet, the round-trip times in production may be faster because outgoing packets (commands) tend to be significantly smaller than response packets (feedback).</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_baseline_single_switch">Baseline - Single Switch</h2>
<div class="sectionbody">
<div class="paragraph">
<p>We can establish a baseline of the best-case round-trip time by having the sender and receiver nodes communicate with each other through a single Switch that does not see any external traffic. We did not set up a point-to-point connection without any Switches because the logging server needed to be on the same network and because we rarely see this case in practice.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/setup-baseline.png"><img src="https://ennerf.github.io/images/udp/setup-baseline.png" alt="setup baseline.png" width="100%"></a>
</div>
<div class="title">Figure 3. Baseline setup using single Switch</div>
</div>
<div class="paragraph">
<p>We set the frequency to 100Hz and logged data for ~24 hours. We chose this frequency because it is a common control rate for sending high-level trajectories, and because 10ms is a safe deadline in case there are large outliers. During normal operations we typically used rates between 100-200Hz for updating the setpoints of controllers that get executed on-board each device (e.g. position/velocity/torque), and rates of up to 1KHz when bypassing local controllers and remotely controlling the output (e.g. PWM). The network would technically support even higher rates, but there are usually other limitations that come into play at around 1KHz (e.g. OS scheduler and limited sensor polling rates).</p>
</div>
<div class="paragraph">
<p>First, we looked at the jitter of the underlying embedded real-time operating system (RTOS). The figure below shows the difference between an idealized signal that ticks every 10ms and the measured transmit timestamps. 99% are within the lowest measurement resolution (250ns), and the total observed range is slightly below 6us. Note that this is significantly better than the 150us base jitter range we observed on real-time Linux (see <a href="https://ennerf.github.io/2016/09/20/A-Practical-Look-at-Latency-in-Robotics-The-Importance-of-Metrics-and-Operating-Systems.html">The Importance of Metrics and Operating Systems</a>).</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/os-jitter-embedded.png"><img src="https://ennerf.github.io/images/udp/os-jitter-embedded.png" alt="os jitter embedded.png" width="100%"></a>
</div>
<div class="title">Figure 4. OS jitter of ChibiOS 2.6.8 on STM32F407 (24h)</div>
</div>
<div class="paragraph">
<p>The two figures below show the round-trip time for all packets and the corresponding percentile distribution. There were a total of 8.5 million messages. None of them were lost and none of them arrived out of order.</p>
</div>
<div id="img-rtt-24h" class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-baseline.png"><img src="https://ennerf.github.io/images/udp/rtt-baseline.png" alt="rtt baseline.png" width="100%"></a>
</div>
<div class="title">Figure 5. RTT for 200 byte payload (24h)</div>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-baseline-zoomed.png"><img src="https://ennerf.github.io/images/udp/rtt-baseline-zoomed.png" alt="rtt baseline zoomed.png" width="100%"></a>
</div>
<div class="title">Figure 6. Zoomed in view of RTT for 200 byte payload (10min)</div>
</div>
<div class="paragraph">
<p>90% of all packets arrived within 194us, with less than 1 microsecond of jitter. Roughly 80us of this time was spent on the wire, so using chips that support Gigabit (rather than 100Mbit) could lower the round-trip time to ~120us. Beyond the common case, there were three periodically recurring modes that resulted in additional latency.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Mode 1 occurs consistently every ~5.3 minutes and lasts for ~15.01 seconds. During this time it adds up to 4 us of latency.</p>
</li>
<li>
<p>Mode 2 occurs exactly once every 5 seconds and is always at 210 us.</p>
</li>
<li>
<p>Mode 3 occurs roughly once an hour and adds linearly increasing latency of up to 150 us to 10 packets.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The zoomed-in view of a 10 minute time span highlights Modes 1 and 2. All three modes seemed to be tied to wall-clock time and independent of the packet rate and count. We were unable to find the root cause of these modes, but after several tests we strongly suspected that all of them were caused by the programmed firmware rather than by the Switch or the protocol itself.</p>
</div>
<div class="paragraph">
<p>Overall this initial data looked very promising for being able to use UDP for many real-time control tasks. With more tuning and a better implementation (e.g. lwIP with zero-copy and tuned options) it seems likely that the maximum jitter could be reduced to below 6us, and maybe even below 1us.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_switching_cost">Switching Cost</h2>
<div class="sectionbody">
<div class="paragraph">
<p>As mentioned in the background section, most modern Switches use the 'store-and-forward' approach, which requires the Switch to fully receive a packet before forwarding it. Therefore, the latency cost per Switch is the time a packet spends on the wire plus any switching overhead. The wire time is constant (2.03us at 1 Gbit/s or 20.3us at 100 Mbit/s for 266 bytes), but the overhead depends on the Switch implementation. It can be difficult to find good performance data for specific devices, so you may need to conduct your own benchmarks to evaluate hardware against your requirements.</p>
</div>
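<div class="paragraph">
<p>The wire time follows directly from the frame size and link speed. The sketch below shows the arithmetic; the 266 byte frame size is an assumption (~200 byte payload plus headers), and the result depends on which framing overheads (preamble, inter-frame gap) are counted, so the numbers differ slightly from the figures quoted above.</p>
</div>

```java
// On-wire (serialization) time of an Ethernet frame at a given link speed.
public class WireTime {

    public static double onWireMicros(int frameBytes, long bitsPerSecond) {
        return frameBytes * 8.0 * 1e6 / bitsPerSecond;
    }

    public static void main(String[] args) {
        // ~200 byte payload plus UDP/IP/Ethernet headers, assumed 266 frame bytes
        System.out.printf("100 Mbit/s: %.2f us%n", onWireMicros(266, 100_000_000L));   // 21.28 us
        System.out.printf("1 Gbit/s:   %.2f us%n", onWireMicros(266, 1_000_000_000L)); // 2.13 us
    }
}
```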
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/setup-switching-cost.png"><img src="https://ennerf.github.io/images/udp/setup-switching-cost.png" alt="setup switching cost.png" width="100%"></a>
</div>
<div class="title">Figure 7. Benchmark setup with additional Switch</div>
</div>
<div class="paragraph">
<p>For this benchmark we tested the following three Switches, adding each individually to the baseline setup as shown above:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8863MLL_FLL_RLL_DS.pdf">MICREL KSZ8863</a> (embedded in X-Series actuators)</p>
</li>
<li>
<p><a href="http://www.downloads.netgear.com/files/GDC/GS105/GS105_datasheet_04Sept03.pdf">NETGEAR ProSAFE GS105</a></p>
</li>
<li>
<p><a href="https://routerboard.com/RB750Gr2">MikroTik RB750Gr2 (RouterBOARD hEX)</a> (technically a Router, but disabling DHCP makes it act similar to a Switch)</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>In total there were about 1 million packets. Again, we did not observe any packet loss or out-of-order delivery.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-switch-comparison.png"><img src="https://ennerf.github.io/images/udp/rtt-switch-comparison.png" alt="rtt switch comparison.png" width="100%"></a>
</div>
<div class="title">Figure 8. Comparison of RTT through different Switches (35min)</div>
</div>
<div class="paragraph">
<p>The figure below shows a zoomed view of the time series highlighting the added jitter characteristics. Modes 1 and 3 do not seem to be affected by additional Switches. Mode 2 remains constant at 210 us and disappears for higher round-trip times, indicating an issue on the receive side of the sender.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/comparison-switch-latency.png"><img src="https://ennerf.github.io/images/udp/comparison-switch-latency.png" alt="comparison switch latency.png" width="100%"></a>
</div>
<div class="title">Figure 9. Zoomed in view of Switch comparison (10min)</div>
</div>
<div class="paragraph">
<p>The KSZ8863 and the RB750Gr2 add constant switching latencies of 2.9 us and 3.6 us respectively, on top of the additional wire times of 40.6 us and 4.06 us that their links add to the RTT. The added jitter seems to be negligible at well below 1us. Surprisingly, the GS105 seems to have problems with this use case, resulting in higher latency and more jitter than the KSZ8863 even though it was connected via Gigabit. More details are in the table below.</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 50%;">
<col style="width: 16.6666%;">
<col style="width: 16.6666%;">
<col style="width: 16.6668%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Switch</th>
<th class="tableblock halign-left valign-top">Connection</th>
<th class="tableblock halign-left valign-top">90%-ile RTT</th>
<th class="tableblock halign-left valign-top">Overhead (not-on-wire)</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>Baseline</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>2x 100 MBit/s</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>193.8 us</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>112.6 us</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>MICREL KSZ8863</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>100 Mbit/s</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>+43.5 us</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>2.9 us</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>NETGEAR ProSAFE GS105</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>1 Gbit/s</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>+51.0 us</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>47 us</p>
</div></div></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>MikroTik RB750Gr2 (RouterBOARD hEX)</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>1 Gbit/s</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>+7.7 us</p>
</div></div></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>3.6 us</p>
</div></div></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>According to the <a href="http://www.downloads.netgear.com/files/GDC/GS105/GS105_datasheet_04Sept03.pdf">GS105 spec sheet</a>, the added network latency should be below 10us for 1 Gbit/s and 20us for 100 Mbit/s connections. We did additional tests and the GS105 did seem to perform according to spec when using exclusively 100 Mbit/s or 1 Gbit/s on all ports.</p>
</div>
<div class="paragraph">
<p>We also conducted another baseline test that replaced the GS105 with a RB750Gr2. While we found a consistent improvement of 0.5us, we did not consider this significant enough to rerun all tests.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_scaling_to_many_devices">Scaling to Many Devices</h2>
<div class="sectionbody">
<div class="paragraph">
<p>So far all tests were measuring the round-trip time between a sender node and a single target node. Since real robotic systems can contain many devices, e.g., one per axis or degree of freedom, we also looked at how UDP performs with multiple devices on the same network. In conversations with other roboticists we often found an expectation that there would be significant packet loss if multiple packets were to arrive at a Switch at the same time. The worst case would occur if all devices were connected to a single Switch as shown below.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/setup-bursting.png"><img src="https://ennerf.github.io/images/udp/setup-bursting.png" alt="setup bursting.png" width="100%"></a>
</div>
<div class="title">Figure 10. Multiple devices connected to a single Switch</div>
</div>
<div class="paragraph">
<p>In order to test the actual behavior we set up a test consisting of 40 HEBI Robotics I/O boards connected to a single 48-port Ethernet Switch (<a href="http://www.downloads.netgear.com/files/GDC/GS748Tv1/GS748T_ds_03Feb05.pdf">GS748T</a>). All devices were running the same (receiver) firmware as before, so sending a single broadcast message triggered 40 response packets that caused more than 10 KB of total traffic to arrive at the Switch, occasionally within less than 250 nanoseconds. These <a href="https://en.wikipedia.org/wiki/Micro-bursting_(networking)">Microbursts</a> were well beyond the sustainable bandwidth of Gigabit Ethernet. The setup shown below is representative of a high degree-of-freedom system, such as a full-body humanoid robot, without daisy-chaining.</p>
</div>
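<div class="paragraph">
<p>The burst size is easy to estimate. The sketch below assumes ~266 bytes per response frame (the payload used in the earlier benchmarks plus headers, an assumed figure); even a Gigabit uplink needs tens of microseconds to drain such a burst, so the Switch has to buffer most of it.</p>
</div>

```java
// Aggregate size of a response microburst and the time a single
// store-and-forward uplink port needs to drain it.
public class Microburst {

    public static int totalBytes(int devices, int frameBytes) {
        return devices * frameBytes;
    }

    public static double drainMicros(int devices, int frameBytes, long uplinkBps) {
        return totalBytes(devices, frameBytes) * 8.0 * 1e6 / uplinkBps;
    }

    public static void main(String[] args) {
        int devices = 40, frameBytes = 266; // assumed frame size
        System.out.println(totalBytes(devices, frameBytes) + " bytes"); // 10640 bytes
        System.out.printf("%.2f us%n", drainMicros(devices, frameBytes, 1_000_000_000L)); // 85.12 us
    }
}
```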
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/multiple-boards.jpg"><img src="https://ennerf.github.io/images/udp/multiple-boards.jpg" alt="multiple boards.jpg" width="100%"></a>
</div>
<div class="title">Figure 11. Network test setup with 40 HEBI Robotics I/O Boards</div>
</div>
<div class="paragraph">
<p>We would also like to mention that this setup heavily benefited from two side effects of using a standard Ethernet stack. First, there was no need for any manual addressing thanks to <a href="https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol">DHCP</a> and device-specific, globally unique MAC addresses. Second, we were able to re-program the firmware on all 40 devices simultaneously within 3-6 seconds because we had a bootloader with TCP/IP support. It would have been very tedious to set up such a system if any step had required manual intervention.</p>
</div>
<div class="paragraph">
<p>Since the combined responses resulted in more load than the sender device could comfortably handle, we replaced the sender I/O Board with a <a href="http://www.gigabyte.com/products/product-page.aspx?pid=4888#ov">Gigabyte Brix i7-4770R</a> desktop computer running Scientific Linux 6.6 with a real-time kernel. We set up the system as described in <a href="https://ennerf.github.io/2016/09/20/A-Practical-Look-at-Latency-in-Robotics-The-Importance-of-Metrics-and-Operating-Systems.html">The Importance of Metrics and Operating Systems</a> and disabled the firewall.</p>
</div>
<div class="paragraph">
<p>Running the benchmark at 100Hz for ~90 minutes resulted in more than 20 million measurements.</p>
</div>
<div class="paragraph">
<p>Again, we first looked at the jitter of the underlying operating system. The figure below shows the difference between an idealized signal that ticks every 10ms and the measured transmit timestamps. It shows that this setup suffers from more than an order of magnitude more jitter than the embedded RTOS. Note that the corresponding jHiccup control chart looks identical to the one in the OS blog post.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/os-jitter-linux-rt.png"><img src="https://ennerf.github.io/images/udp/os-jitter-linux-rt.png" alt="os jitter linux rt.png" width="100%"></a>
</div>
<div class="title">Figure 12. Operating system jitter of Scientific Linux 6.6 with MRG Realtime</div>
</div>
<div class="paragraph">
<p>The two figures below show the round-trip time for each measurement. It may be surprising, but there was again no packet loss or re-ordering of packets from any single source.</p>
</div>
<div class="paragraph">
<p>Rather than packets being dropped, what actually happened was that all packets were stored in the internal 1.6 MB buffer of the Switch, queued, and forwarded to the target port as fast as possible. Since the sender was connected via Gigabit, the packets arrived roughly every ~2us. The time axis in the chart is based on the transmit timestamp, so each cycle shows up as a vertical column in the graphs. We also conducted the same test at 1KHz and found identical results.</p>
</div>
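<div class="paragraph">
<p>This queueing behavior can be approximated with a simple drain model: if all N responses arrive at (nearly) the same instant, the i-th packet leaves the Switch roughly i wire times later. This is an idealized sketch, not a simulation of the actual Switch.</p>
</div>

```java
// Idealized store-and-forward queue: N equal-size packets arrive at once
// and drain back-to-back, so the i-th packet waits i extra wire times.
public class BurstQueue {

    public static double[] departureMicros(int packets, double wireMicrosPerPacket) {
        double[] t = new double[packets];
        for (int i = 0; i < packets; i++) {
            t[i] = (i + 1) * wireMicrosPerPacket; // packet 0 leaves after one wire time
        }
        return t;
    }

    public static void main(String[] args) {
        // 40 responses draining at ~2 us each over the Gigabit link to the sender
        double[] t = departureMicros(40, 2.0);
        System.out.printf("first: %.0f us, last: %.0f us%n", t[0], t[39]); // first: 2 us, last: 80 us
        // The ~80 us spread matches the vertical columns in the RTT charts.
    }
}
```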
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-linux-40x-zoomed.png"><img src="https://ennerf.github.io/images/udp/rtt-linux-40x-zoomed.png" alt="rtt linux 40x zoomed.png" width="100%"></a>
</div>
<div class="title">Figure 13. Zoomed in RTT for 40 devices</div>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-linux-40x.png"><img src="https://ennerf.github.io/images/udp/rtt-linux-40x.png" alt="rtt linux 40x.png" width="100%"></a>
</div>
<div class="title">Figure 14. RTT for 40 devices (90 min)</div>
</div>
<div class="paragraph">
<p>However, the amount of latency and jitter turned out to be worse than we anticipated. We expected most columns to start at around 180us and end at around 280us. While this was sometimes the case, the majority of columns started above 300 us. After some initial research we suspected that this delay was mostly caused by the Linux <a href="https://en.wikipedia.org/wiki/New_API">NAPI</a> using polling mode rather than interrupts, and by a low-cost network interface paired with suboptimal device drivers. While we expected the OS and driver stack to introduce additional latency and jitter, we were surprised by the order of magnitude.</p>
</div>
<div class="paragraph">
<p>The installed network interface and driver are shown below.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-shell" data-lang="shell">$ lspci | grep Ethernet
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-shell" data-lang="shell">$ sudo dmesg | grep "Ethernet driver"
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Even consumer-grade Ethernet networks can exhibit very deterministic latency. In the more than 100 million packets that were sent for this blog post, we did not observe any packet loss or out-of-order delivery. Even when communicating with 40 smart devices, representing a total of 1,600 sensors, at a rate of 1KHz we found the network to be very reliable. While we still believe that large and dangerous industrial robots should be controlled using specialized industrial networking equipment, we feel that standard UDP is more than sufficient for most robotic applications.</p>
</div>
<div class="paragraph">
<p>We also found that most of the observed latency and jitter were caused by the underlying operating systems and their device drivers. To further illustrate this point we did additional comparisons of the baseline setup with the sender node running on different operating systems. The configurations were as follows:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>ChibiOS 2.6.8 with lwIP 1.4.1 on 168 MHz STM32F407</p>
</li>
<li>
<p>Windows 10 on Gigabyte Brix i7-4770R with Realtek NIC</p>
</li>
<li>
<p>Scientific Linux 6.6 with MRG Realtime on Gigabyte Brix i7-4770R with Realtek NIC</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The two charts below show the round-trip time for each system communicating with a single I/O Board over a single Switch. Note that Linux and Windows were connected to the Switch via Gigabit and should have received datagrams ~40us before the embedded device.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-linux-1x-comparison-10h.png"><img src="https://ennerf.github.io/images/udp/rtt-linux-1x-comparison-10h.png" alt="rtt linux 1x comparison 10h.png" width="100%"></a>
</div>
<div class="title">Figure 15. Baseline RTT comparing RTOS vs RT-Linux vs Windows (10h)</div>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="https://ennerf.github.io/images/udp/rtt-linux-1x-comparison-10m.png"><img src="https://ennerf.github.io/images/udp/rtt-linux-1x-comparison-10m.png" alt="rtt linux 1x comparison 10m.png" width="100%"></a>
</div>
<div class="title">Figure 16. Zoomed in baseline RTT comparing RTOS vs RT-Linux vs Windows (10min)</div>
</div>
<div class="paragraph">
<p>We realize that there are many more interesting questions that were beyond the scope of this work. We are currently considering the following networking-related topics for future blog posts:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Comparison of device drivers and network interfaces from various vendors</p>
</li>
<li>
<p>Performance impact of uncontrolled traffic (e.g. streaming video)</p>
</li>
<li>
<p>Redundant routes and sudden disconnects</p>
</li>
<li>
<p>Controlling robots over wireless networks</p>
</li>
<li>
<p>Clock drift and time synchronization using IEEE 1588v2</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If there are other topics that you think would be worth covering, please leave a note in the comment section. If you are working for a hardware vendor that specializes in low-latency networking equipment and would be willing to provide samples for evaluation, please contact us through our <a href="http://hebirobotics.com/contact/">website</a>.</p>
</div>
</div>
</div>]]></description><link>https://ennerf.github.io/2016/11/23/Analyzing-the-viability-of-Ethernet-and-UDP-for-robot-control.html</link><guid isPermaLink="true">https://ennerf.github.io/2016/11/23/Analyzing-the-viability-of-Ethernet-and-UDP-for-robot-control.html</guid><category><![CDATA[Latency]]></category><category><![CDATA[ Ethernet]]></category><category><![CDATA[ UDP]]></category><category><![CDATA[ real-time control]]></category><dc:creator><![CDATA[Florian Enner]]></dc:creator><pubDate>Wed, 23 Nov 2016 00:00:00 GMT</pubDate></item><item><title><![CDATA[A Practical Look at Latency in Robotics : The Importance of Metrics and Operating Systems]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>This is the first in a series of blog posts where I will try to share some of my own impressions and findings that have stemmed from several years of creating tools for robotics research.</p>
</div>
<div class="paragraph">
<p>Latency is an important practical concern in the robotics world that is often poorly understood. I feel that a better understanding of latency can help robotics researchers and engineers make design and architecture decisions that greatly streamline and accelerate the R&amp;D process. I&#8217;ve personally spent many hours looking for information on the latency characteristics of various robotic components, but have had difficulty finding anything that is clearly presented or backed by solid data. From what I&#8217;ve found, most benchmarks focus on the maximum throughput and either ignore the subject of latency or measure it incorrectly.</p>
</div>
<div class="paragraph">
<p>Because of this my first post is on the topic of latency and will cover two main topics:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>The details of how you measure latency matters.</p>
</li>
<li>
<p>The OS that you use affects your latency.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>My own background is in academic research. I&#8217;ve spent several years as a staff software engineer at the Robotics Institute at Carnegie Mellon University and I am a co-founder of HEBI Robotics, a startup developing modular robotic components. The teams I&#8217;ve been part of have worked on many different types of robots, including <a href="https://youtu.be/heOXsEnGb20">collaborative manipulators</a>, <a href="https://www.youtube.com/watch?v=zaPtxre4tFc">wheeled robots</a>, <a href="https://www.youtube.com/watch?v=7Mh3kqxle1c">walking robots</a> and <a href="https://www.youtube.com/watch?v=DUgt3NwzN-c">snake robots</a>.</p>
</div>
<div class="paragraph">
<p>Over time, we have come to believe that robotics research can be greatly accelerated by designing systems that relax hard real-time requirements on the software that is exposed to users. One approach is to implement low-level control (motor control and safety features) in dedicated components that are decoupled from high-level control (position, velocity, torque, etc.). In many cases, this can enable users to leverage common consumer hardware and software tools that can accelerate development.</p>
</div>
<div class="paragraph">
<p>For example, the Biorobotics lab at Carnegie Mellon has researchers who tend to be mechanical or electrical engineers rather than computer scientists or software engineers. As such, they tend to be less familiar with Linux and C/C++ and much more comfortable with Windows/macOS and scripting languages like MATLAB. After our lab started providing cross-platform support and bindings for MATLAB (in ~2011), we saw a significant increase in research output that roughly doubled the lab&#8217;s paper publications related to snake robots. In particular, the lab has been able to develop and demonstrate complex new behaviors that would have been difficult to prototype beforehand (see <a href="https://youtu.be/NJ1FIsjt0yE">compliant control</a> or <a href="https://youtu.be/0CNQMiQnesc">inside pipes</a>).</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_measuring_latency">Measuring Latency</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Robots are controlled in <em>real-time</em>, which means that a command gets executed within a <em>deadline</em> (fixed period of time). There are <em>hard real-time</em> systems that must never exceed their deadline, and <em>soft real-time</em> systems that are able to occasionally exceed their deadline. Missing deadlines when performing motor control of a robot can result in unwanted motions and 'jerky' behavior.</p>
</div>
<div class="paragraph">
<p>Although there is a lot of information on the theoretical definition of these terms, it can be challenging to determine the maximum deadline (point at which a system&#8217;s performance starts to degrade or become unsafe) for practical applications. This is especially true for research institutions that build novel mechanisms and target cutting-edge applications. Many research groups end up assuming that everything needs to be hard real-time with very stringent deadlines. While this approach provides solid performance guarantees, it can also create a lot of unnecessary development overhead.</p>
</div>
<div class="paragraph">
<p>Many benchmarks and tools make the assumption that latency follows a <a href="https://en.wikipedia.org/wiki/Gaussian_function">Gaussian distribution</a> and report only the mean and the standard deviation. Unfortunately, latency tends to be very <a href="https://en.wikipedia.org/wiki/Multimodal_distribution">multi-modal</a>, and the most important part of the distribution when it comes to determinism is the 'outliers'. Even if a system&#8217;s latency behaves as expected in 99% of cases, the leftover 1% can be worse than all of the other 99% of measurements combined. Looking at only the mean and standard deviation completely fails to capture these more systemic issues. For example, I&#8217;ve seen many data sets where the worst observed case was more than 1000 standard deviations away from the mean. Such stutters are usually the main problem when working on real robotic systems.</p>
</div>
<div class="paragraph">
<p>Because of this, a more appropriate way to look at latency is via histograms and percentile plots, e.g., <em>"99.9% of measurements were below X ms"</em>. There are several good resources about recording latency out there that I recommend checking out, such as <a href="https://youtu.be/lJ8ydIuPFeU">How NOT to Measure Latency</a> or <a href="http://psy-lob-saw.blogspot.com/2015/02/hdrhistogram-better-latency-capture.html">HdrHistogram: A better latency capture method</a>.</p>
</div>
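<div class="paragraph">
<p>To illustrate why percentiles capture outliers that mean and standard deviation hide, below is a toy nearest-rank percentile computation. It is only a stand-in for a proper tool; HdrHistogram (referenced above) records percentiles with bounded memory and far lower overhead.</p>
</div>

```java
import java.util.Arrays;

// Toy nearest-rank percentile: the smallest sample such that `percentile`
// percent of all samples are at or below it.
public class Percentiles {

    public static long valueAtPercentile(long[] samples, double percentile) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // 999 samples at 1 ms plus a single 1000 ms stutter: the mean (~2 ms)
        // and standard deviation hide the outlier, the tail percentiles do not.
        long[] samples = new long[1000];
        Arrays.fill(samples, 1L);
        samples[999] = 1000L;
        System.out.println(valueAtPercentile(samples, 50.0));  // 1
        System.out.println(valueAtPercentile(samples, 99.9));
        System.out.println(valueAtPercentile(samples, 100.0)); // 1000
    }
}
```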
</div>
</div>
<div class="sect1">
<h2 id="_operating_systems">Operating Systems</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The operating system is at the base of everything. No matter how performant the high-level software stack is, the system is fundamentally bound by the capabilities of the OS, its scheduler, and the overall load on the system. Before you start optimizing your own software, you should make sure that your goal is actually achievable on the underlying platform.</p>
</div>
<div class="paragraph">
<p>There are trade-offs between always responding in a timely manner and overall performance, battery life, as well as several other concerns. Because of this, the major consumer operating systems don&#8217;t guarantee to meet hard deadlines and can theoretically have arbitrarily long pauses.</p>
</div>
<div class="paragraph">
<p>However, since using operating systems that users are familiar with can significantly ease development, it is worth evaluating their actual performance and capabilities. Even though there may not be any theoretical guarantees, the practical differences are often not noticeable.</p>
</div>
<div class="paragraph">
<p>Developing hard real-time systems has a lot of pitfalls and can require a lot of development effort. Requiring researchers to write hard real-time compliant code is not something that I would recommend.</p>
</div>
<div class="sect2">
<h3 id="_benchmark_setup">Benchmark Setup</h3>
<div class="paragraph">
<p><a href="https://www.azul.com">Azul Systems</a> sells products targeted at latency sensitive applications and they&#8217;ve created a variety of useful tools to measure latency. <a href="https://github.com/giltene/jHiccup">jHiccup</a> is a tool that measures and records system level latency spikes, which they call 'hiccups'. It measures the time for <em>sleep(1ms)</em> and records the delta to the fastest previously recorded sample. For example, if the fastest sample was 1ms, but it took 3ms to wake up, it will record a 2ms hiccup. Hiccups can be caused by a large number of reasons, including scheduling, paging, indexing, and many more. By running it on an otherwise idle system, we can get an idea of the behavior of the underlying platform. It can be started with the following command:</p>
</div>
<div class="listingblock">
<div class="content">
<pre># record logs each second for 48 hours
intervalMs=1000
runtimeMs=172800000
java -javaagent:jHiccup.jar="-d 0 -i ${intervalMs}" -cp jHiccup.jar org.jhiccup.Idle -t ${runtimeMs}</pre>
</div>
</div>
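<div class="paragraph">
<p>The core measurement idea is small enough to sketch. Note that this is a simplification of what jHiccup actually does: jHiccup measures against the fastest previously observed sample rather than the nominal sleep time, and records into an HdrHistogram, among other refinements.</p>
</div>

```java
// Minimal "hiccup meter": sleep for a fixed interval and record how much
// longer than expected the wakeup actually took.
public class HiccupMeter {

    public static long measureHiccupNanos(long sleepMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsed = System.nanoTime() - start;
        // Anything beyond the nominal sleep time counts as a hiccup
        return Math.max(0L, elapsed - sleepMillis * 1_000_000L);
    }

    public static void main(String[] args) {
        long worst = 0L;
        for (int i = 0; i < 1000; i++) { // ~1 second of sampling
            worst = Math.max(worst, measureHiccupNanos(1));
        }
        System.out.printf("worst hiccup: %.3f ms%n", worst / 1e6);
    }
}
```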
<div class="paragraph">
<p>jHiccup uses <a href="https://github.com/HdrHistogram/HdrHistogram">HdrHistogram</a> to record samples and to generate the output log. There are a variety of tools and utilities for interacting and visualizing these logs. The graphs in this post were generated by my own <a href="https://github.com/ennerf/HdrHistogramVisualizer">HdrHistogramVisualizer</a>.</p>
</div>
<div class="paragraph">
<p>To run these tests, I setup two standard desktop computers, one for Mac tests and one for everything else.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Mac Mini 2014, i7-3720QM @ 2.6 GHz, 16 GB 1600 MHz DDR3</p>
</li>
<li>
<p>Gigabyte Brix BXi7-4770R, i7-4770R @ 3.2 GHz, 16 GB 1600 MHz DDR3</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Note that when doing latency tests on Windows it is important to be aware of the system timer. It has variable timer intervals that range from 0.5ms to 15.6ms. By calling <em>timeBeginPeriod</em> and <em>timeEndPeriod</em>, applications can notify the OS whenever they need a higher resolution. The timer interrupt is a global resource that gets set to the lowest interrupt interval requested by any application. For example, watching a video in Chrome requests a timer interrupt interval of 0.5ms. A lower period results in a more responsive system at the cost of overall throughput and battery life. <a href="https://vvvv.org/contribution/windows-system-timer-tool">System Timer Tool</a> is a little utility that lets you view the current state. jHiccup automatically requests a 1ms timer interval by calling Java&#8217;s <em>Thread.sleep()</em> with a value below 10ms.</p>
</div>
</div>
<div class="sect2">
<h3 id="_windows_mac_linux">Windows / Mac / Linux</h3>
<div class="paragraph">
<p>Let&#8217;s first look at the performance of consumer operating systems: Windows, Mac and Linux. Each test started from a clean install of the respective OS. The only two modifications to the stock installation were to disable sleep mode and to install JDK8 (update 101). I then started jHiccup, unplugged all external cables and let the computer sit 'idle' for &gt;24 hours. The actual OS versions were:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Windows 10 Enterprise, version 1511 (OS build: 10586.545)</p>
</li>
<li>
<p>OS X, version 10.9.5</p>
</li>
<li>
<p>Ubuntu 16.04 Desktop, kernel 4.4.0-31-generic</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Each image below contains two charts. The top section shows the worst hiccup that occurred within a given interval window, i.e., the first data point shows the worst hiccup within the first 3 minutes and the next data point shows the worst hiccup within the following 3 minutes. The bottom chart shows the percentiles of all measurements across the entire duration. Each 24 hour data set contains roughly 70-80 million samples.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="/images/os/osx-win-ubuntu_24h.png"><img src="https://ennerf.github.io/images/os/osx-win-ubuntu_24h.png" alt="Windows vs. Linux vs. Mac hiccups (24h)"></a>
</div>
<div class="title">Figure 1. Windows vs. Linux vs. Mac hiccups (24h)</div>
</div>
<div class="paragraph">
<p>These results show that Linux had fewer and lower outliers than Windows. Up to the 90th percentile all three systems respond relatively similarly, but there are significant differences at the higher percentiles. There also seems to have been a period of increased system activity on OSX after 7 hours. The chart below shows a zoomed-in view of a 10 minute time period starting at the 10 hour mark.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="/images/os/osx-win-ubuntu_10m.png"><img src="https://ennerf.github.io/images/os/osx-win-ubuntu_10m.png" alt="Windows vs. Linux vs. Mac hiccups (10 min)"></a>
</div>
<div class="title">Figure 2. Windows vs. Linux vs. Mac hiccups (10min)</div>
</div>
<div class="paragraph">
<p>Zoomed in, we can see that the Windows hiccups are actually very repeatable. 99.9% are below 2ms, but there are frequent spikes to relatively discrete values of up to 16ms. This also highlights the importance of looking at the details of the latency distribution. In other data sets that I&#8217;ve seen, it is rare for the worst case to be equal to the 99.99th percentile. It&#8217;s also interesting that the distribution for 10 minutes looks identical to the 24 hour chart. OSX shows similar behavior, but with lower spikes. Ubuntu 16.04 is comparatively quiet.</p>
</div>
<div class="paragraph">
<p>It&#8217;s debatable whether this makes any difference for robotic systems in practice. All of the systems I&#8217;ve worked with either had hard real-time requirements below 1ms, in which case none of these OSes would be sufficient, or they were soft real-time systems that could handle occasional hiccups of 25 or even 100 ms. I have yet to see one of our robotic systems perform perceivably worse on Windows than on Linux.</p>
</div>
</div>
<div class="sect2">
<h3 id="_real_time_linux">Real Time Linux</h3>
<div class="paragraph">
<p>Now that we have an understanding of how traditional systems perform without tuning, let&#8217;s take a look at the performance of Linux with a real-time kernel. The rt kernel (PREEMPT_RT patch) allows high priority tasks to preempt lower priority ones almost anywhere, which results in worse overall throughput, but more deterministic behavior with respect to latency.</p>
</div>
<div class="paragraph">
<p>I chose Scientific Linux 6 because of its support for <a href="https://access.redhat.com/products/red-hat-enterprise-mrg-realtime">Red Hat&#174; Enterprise MRG Realtime&#174;</a>. You can download the <a href="http://ftp.scientificlinux.org/linux/scientific/">ISO</a> and find instructions for installing MRG Realtime <a href="http://linux.web.cern.ch/linux/mrg/">here</a>. The version I tested was,</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Scientific Linux 6.6, kernel 3.10.0-327.rt56.194.el6rt.x86_64</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Note that there is a huge number of tuning options that may improve the performance of your application. There are various tuning guides that can provide more information, e.g., Red Hat&#8217;s <a href="http://linux.web.cern.ch/linux/mrg/2.3/Red_Hat_Enterprise_MRG-2-Realtime_Tuning_Guide-en-US.pdf">MRG Realtime Tuning Guide</a>. I&#8217;m not very familiar with tuning systems at this level, so I&#8217;ve only applied the following small list of changes.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><em>/boot/grub/menu.lst</em> &#8658; <em>transparent_hugepage=never</em></p>
</li>
<li>
<p><em>/etc/sysctl.conf</em> &#8658; <em>vm.swappiness=0</em></p>
</li>
<li>
<p><em>/etc/inittab</em> &#8658; <em>id:3:initdefault</em> (no GUI)</p>
</li>
<li>
<p><em>chkconfig --level 0123456 cpuspeed off</em></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The process priority was set to 98, which is the highest priority available for real-time threads. I&#8217;d advise consulting the documentation on
<a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/2/html/Realtime_Tuning_Guide/chap-Realtime-Specific_Tuning.html#Setting_scheduler_priorities">scheduler priorities</a> before deciding on priorities for tasks that actually use CPU time.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-shell" data-lang="shell"># find process id
pid=$(pgrep -f "[j]Hiccup.jar")

# show current priority
chrt -p $pid

# set priority
sudo chrt -p 98 $pid</code></pre>
</div>
</div>
<div class="paragraph">
<p>Below is a comparison of the two Linux variants.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="/images/os/ubuntu-scl_24h.png"><img src="https://ennerf.github.io/images/os/ubuntu-scl_24h.png" alt="Linux vs. RT Linux hiccups (24h)"></a>
</div>
<div class="title">Figure 3. Linux vs. RT Linux hiccups (24h)</div>
</div>
<div class="paragraph">
<p>Looking at the 24 hour chart (above) and the 10 minute chart (below), we can see that the worst case has gone down significantly. While Ubuntu 16.04 was barely visible when compared to Windows, it looks very noisy compared to the real-time variant. All measurements were within a 150us range, which is good enough for most applications.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="/images/os/ubuntu-scl_10m.png"><img src="https://ennerf.github.io/images/os/ubuntu-scl_10m.png" alt="Linux vs. RT Linux hiccups (10 min)"></a>
</div>
<div class="title">Figure 4. Linux vs. RT Linux hiccups (10 min)</div>
</div>
<div class="paragraph">
<p>I&#8217;ve also added the 24 hour chart for the real-time variant by itself to provide a better scale. Note that these values are getting close to the limit of what we can measure and record.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="/images/os/scl_24h.png"><img src="https://ennerf.github.io/images/os/scl_24h.png" alt="RT Linux hiccups (24h)"></a>
</div>
<div class="title">Figure 5. RT Linux hiccups (24h)</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_summary">Summary</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I&#8217;ve tried to provide a basic idea of the out-of-the-box performance of various off-the-shelf operating systems. In my experience the three major consumer OSes can be treated as roughly equal, i.e., either software will work well on all of them, or it won&#8217;t work correctly on any of them. If you do work on a problem that has hard deadlines, there are many different <a href="https://en.wikipedia.org/wiki/Comparison_of_real-time_operating_systems">RTOS</a> to choose from. Aside from the mentioned real-time Linux and the various embedded solutions, there are even real-time extensions for Windows, such as <a href="http://www.tenasys.com/overview-ifw">INtime</a> or <a href="http://kithara.com/en/products/realtime-suite">Kithara</a>.</p>
</div>
<div class="paragraph">
<p>We&#8217;ve had very good experiences with implementing the low-level control (PID loops, motor control, safety features, etc.) at the individual actuator level. That way all of the safety-critical and latency-sensitive pieces get handled by a dedicated RTOS and are independent of user code. The high-level controller (trajectories and multi-joint coordination) then only needs to update target setpoints (e.g. position/velocity/torque), which is far less sensitive to latency and doesn&#8217;t require hard real-time communications. This approach enables quick prototyping of high-level behaviors using 'non-deterministic' technologies, such as Windows, MATLAB and standard UDP messages.</p>
</div>
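<div class="paragraph">
<p>As a rough sketch of what such a high-level update can look like, the Java snippet below encodes a set-target message and sends it over UDP. The packet layout, address and port are entirely made up for illustration; they do not correspond to any actual actuator protocol.</p>
</div>

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

public class SetTargetSender {

    // Encode a set-target message. The layout (joint id followed by
    // position/velocity/torque targets) is hypothetical.
    static byte[] encodeTarget(int jointId, double pos, double vel, double torque) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 3 * 8);
        buf.putInt(jointId);
        buf.putDouble(pos);     // [rad]
        buf.putDouble(vel);     // [rad/s]
        buf.putDouble(torque);  // [Nm]
        return buf.array();
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = encodeTarget(1, 1.57, 0.0, 0.0);
        try (DatagramSocket socket = new DatagramSocket()) {
            // "10.10.10.2" and port 5000 stand in for an actuator's address.
            socket.send(new DatagramPacket(msg, msg.length,
                    InetAddress.getByName("10.10.10.2"), 5000));
        }
    }
}
```

<div class="paragraph">
<p>Because the actuator only receives setpoints, a dropped or delayed packet simply means it keeps tracking the previous target, which is what makes this split tolerant of non-deterministic transports.</p>
</div>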
<div class="paragraph">
<p>For example, the high-level control in <a href="https://youtu.be/zaPtxre4tFc">Teleop Taxi</a> was done over Wi-Fi from MATLAB running on Windows, while simultaneously streaming video from an Android phone in the back of the robot. By removing the requirement for a local control computer, it only took 20-30 lines of code (see  <a href="https://gist.github.com/ennerf/b349c56d320da1db89b298fd807f00e4">simplified</a>, <a href="https://gist.github.com/ennerf/7d59a9765da25ed7c02117da1805551c">full</a>) to run the entire demo. Actually using a local computer resulted in no perceivable benefit. While not every system can be controlled entirely through Wi-Fi, we&#8217;ve seen similar results even with more complex systems.</p>
</div>
<div class="sect2">
<h3 id="_latency_is_not_gaussian">Latency is not Gaussian</h3>
<div class="paragraph">
<p>Finally, I&#8217;d like to stress again that latency practically never follows a Gaussian distribution. For example, the maximum for OSX is more than 400 standard deviations away from the average. The table for these data sets is below.</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 23.0769%;">
<col style="width: 15.3846%;">
<col style="width: 15.3846%;">
<col style="width: 15.3846%;">
<col style="width: 15.3846%;">
<col style="width: 15.3847%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Samples</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Mean</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>StdDev</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Max</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>(max-mean) /stddev</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Windows 10</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">80,304,595</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.55 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.37</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">17.17 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">44.9</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OSX 10.9.5</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">65,282,969</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.32 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.03</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">12.65 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">411</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Ubuntu 16.04</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">78,039,162</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.10 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.01</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">3.03 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">293</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Scientific Linux 6.6-rt</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">79,753,643</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.08 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.01</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">0.15 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">7</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>The figure below compares the data&#8217;s actual distribution for Windows to a theoretical Gaussian distribution. Rather than a classic 'bell curve', it shows several spikes that are spread apart at regular intervals. The distance between these spikes is almost exactly one millisecond, which matches the Windows timer interrupt interval that was set while gathering the data. Interestingly, the spikes above 2ms all seem to occur with roughly the same likelihood.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<a class="image" href="/images/os/windows-gaussian_distribution_24h.png"><img src="https://ennerf.github.io/images/os/windows-gaussian_distribution_24h.png" alt="Actual vs Gaussian Distribution for Windows"></a>
</div>
<div class="title">Figure 6. Actual Distribution compared to Gaussian-fit (Windows)</div>
</div>
<div class="paragraph">
<p>Using only mean and standard deviation for any sort of latency comparison can produce deceptive results. Aside from giving little to no information about the higher percentiles, there are many cases where systems with seemingly 'better' values exhibit worse actual performance.</p>
</div>
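<div class="paragraph">
<p>The effect behind the <em>(max-mean)/stddev</em> column is easy to reproduce. The sketch below uses made-up samples that loosely mimic the OSX row: a single large outlier among otherwise quiet readings lands roughly &#8730;n standard deviations from the mean, far beyond anything a Gaussian model would predict.</p>
</div>

```java
public class SigmaDistance {

    // Returns {mean, stddev, max} using the population standard deviation.
    static double[] stats(double[] xs) {
        double sum = 0, max = Double.NEGATIVE_INFINITY;
        for (double x : xs) { sum += x; max = Math.max(max, x); }
        double mean = sum / xs.length;
        double var = 0;
        for (double x : xs) { var += (x - mean) * (x - mean); }
        return new double[]{mean, Math.sqrt(var / xs.length), max};
    }

    public static void main(String[] args) {
        // Made-up data: 10,000 quiet readings of 0.32 ms plus a single
        // 12.65 ms outlier, loosely mimicking the OSX row above.
        double[] xs = new double[10_000];
        java.util.Arrays.fill(xs, 0.32);
        xs[0] = 12.65;
        double[] s = stats(xs);
        System.out.printf("(max-mean)/stddev = %.0f%n", (s[2] - s[0]) / s[1]);
    }
}
```

<div class="paragraph">
<p>Under a Gaussian assumption, the maximum of 10,000 samples would sit around 4 standard deviations out; here it sits around 100, which is why mean and standard deviation alone say so little about latency tails.</p>
</div>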
</div>
</div>
</div>]]></description><link>https://ennerf.github.io/2016/09/20/A-Practical-Look-at-Latency-in-Robotics-The-Importance-of-Metrics-and-Operating-Systems.html</link><guid isPermaLink="true">https://ennerf.github.io/2016/09/20/A-Practical-Look-at-Latency-in-Robotics-The-Importance-of-Metrics-and-Operating-Systems.html</guid><category><![CDATA[Latency]]></category><category><![CDATA[ Operating System]]></category><category><![CDATA[ Windows]]></category><category><![CDATA[ OSX]]></category><category><![CDATA[ Ubuntu]]></category><category><![CDATA[ Scientific Linux]]></category><category><![CDATA[ Sleep]]></category><category><![CDATA[ Real-Time]]></category><category><![CDATA[ HdrHistogram]]></category><dc:creator><![CDATA[Florian Enner]]></dc:creator><pubDate>Tue, 20 Sep 2016 00:00:00 GMT</pubDate></item></channel></rss>