<h1>File system permissions and paths in iOS</h1>
<p class="small">2021-02-20 · //navoshta.com/ios-file-system</p>
<p>Although Juno makes coding on iPad a breeze, there are still some tricks you need to know — one of them is working with the file system and handling paths. For example, when your code is supposed to read a file’s contents or write data to a file, how do you specify the file’s location in iOS?</p>
<h1>Jupyter client for iPad</h1>
<p class="small">2018-02-10 · //navoshta.com/juno</p>
<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-none"></i> Contents</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#jupyter" id="markdown-toc-jupyter">Jupyter</a></li>
<li><a href="#backends" id="markdown-toc-backends">Backends</a></li>
<li><a href="#bundled-notebooks" id="markdown-toc-bundled-notebooks">Bundled Notebooks</a></li>
<li><a href="#interface" id="markdown-toc-interface">Interface</a></li>
</ul>
</nav>
</aside>
<p>I have been a huge fan of Jupyter for a while now, and most importantly of the flexibility it offers: I strongly believe that the fact that you only need a screen and a network connection to get access to pretty much unlimited computational resources has enormous potential.</p>
<p>That’s why I thought Jupyter was really missing a proper iPad client application with a native iOS interface that would let you connect to a remote backend and work with Jupyter on your iPad — and finally, after months of development and beta testing, my app <strong>Juno Connect</strong> has made it to the App Store!</p>
<p><strong>Juno Connect</strong> is a Jupyter Notebook client for iPad, which allows you to connect to an arbitrary remote Jupyter Notebook server and do pretty much everything you do in desktop Jupyter on your iPad. It supports a hardware keyboard, offers code completion driven by your server’s kernel, and has a beautiful touch-friendly interface that feels much more natural than trying to access Jupyter through your iPad’s Safari browser. In fact, some reviews suggest it’s easier to work with Jupyter in <strong>Juno Connect</strong> than on desktop! 😉</p>
<h1 id="jupyter">Jupyter</h1>
<p>I have covered Jupyter in my posts already: it’s an <a href="http://jupyter.org" target="_blank">interactive cloud computing environment</a> where you can combine code execution, Markdown, LaTeX, plots and rich media. It supports over 40 programming languages (including Python, R, Julia and Scala) and most big data and machine learning tools.</p>
<p>Now, the most beautiful part is that code execution is separated from the development environment, which means that whenever you hit “Run”, the hardware that actually executes your code and delivers the output can be anywhere reachable over a network interface. Essentially, this means that with Juno Connect you can use your iPad to run code on a powerful computing cluster on another continent, and still receive output and feedback (including code completion suggestions!) in real time. How awesome is that?</p>
<p><img src="/images/posts/juno/screenshot_h_01@2x.png" alt="image-center" class="align-center" /></p>
<p>I did realise, however, that Jupyter may not be the most user-friendly tool to work with, so I tried to make sure that Juno Connect provides the easiest entry point to using Jupyter with two things: backend integrations and bundled introductory notebooks.</p>
<h1 id="backends">Backends</h1>
<p>Jupyter can sometimes be tricky to set up for remote access. There are plenty of tutorials out there (including <a href="https://juno.sh/ssl-self-signed-cert/" target="_blank">mine</a> about configuring SSL), but some of them require additional knowledge of networking, command line interfaces and Unix systems. Luckily, there are cloud computing services that eliminate this by providing you with a remote Jupyter Notebook environment out of the box, such as <a href="https://notebooks.azure.com" target="_blank">Azure Notebooks</a> and <a href="https://cocalc.com/" target="_blank">CoCalc</a>. Both have free tiers, although CoCalc also offers paid plans with less restricted access and better hardware.</p>
<p>What you get is a virtual server running Jupyter Notebook that you can access from anywhere in a browser — or in Juno Connect as well! You can simply log in with your Microsoft or CoCalc account, access all your projects and libraries, and work with all your notebooks using Juno’s interface. It’s easiest to think of it as a special preconfigured server that simply provides a computational backend for Juno Connect.</p>
<p><img src="/images/posts/juno/screenshot_h_02@2x.png" alt="image-center" class="align-center" /></p>
<h1 id="bundled-notebooks">Bundled Notebooks</h1>
<p>Even setting up an account with a cloud computing service and trying to understand how Jupyter works can be a significant time investment for users not familiar with it. That’s why I have included a set of introductory notebooks that are available and runnable as soon as you download the app. They have plenty of sample code snippets and generated output (including stunning retina graphics), showing some of the amazing things you can do with Jupyter. These notebooks are launched on temporary servers individually for each user, so any changes you make will only appear for you, and will only persist until your server is restarted due to inactivity.</p>
<p>Under the hood, Juno Connect uses <a href="https://mybinder.org" target="_blank">Binder</a> to launch these notebooks. Binder is a service that turns any GitHub repo into a collection of interactive notebooks by launching a temporary server for it. It works amazingly well, and I am planning to introduce a deeper integration with it in Juno Connect, essentially allowing users to launch any GitHub repo as a server right in the app.</p>
<p><img src="/images/posts/juno/screenshot_h_06@2x.png" alt="image-center" class="align-center" /></p>
<h1 id="interface">Interface</h1>
<p>I have spent quite some time trying to make the user interface touch- and iPad-friendly. I believe users have certain expectations in terms of UI when working with an iPad app, and writing code is something that hadn’t been tackled too often in other apps up until this point. So this has been quite a challenge, but I’m pretty happy with how it turned out. It did take a couple of iterations (and a lot of feedback), but at least when it comes to notebook editing, the experience is much better now! What in my opinion makes Juno’s interface stand out is how it manages to declutter the navigation panel using context actions and menus.</p>
<figure class="third ">
<a href="//navoshta.com/images/posts/juno/screenshot_v_01.png">
<img src="//navoshta.com/images/posts/juno/screenshot_v_01.png" alt="" />
</a>
<a href="//navoshta.com/images/posts/juno/screenshot_v_02.png">
<img src="//navoshta.com/images/posts/juno/screenshot_v_02.png" alt="" />
</a>
<a href="//navoshta.com/images/posts/juno/screenshot_v_05.png">
<img src="//navoshta.com/images/posts/juno/screenshot_v_05.png" alt="" />
</a>
<a href="//navoshta.com/images/posts/juno/screenshot_v_06.png">
<img src="//navoshta.com/images/posts/juno/screenshot_v_06.png" alt="" />
</a>
<a href="//navoshta.com/images/posts/juno/screenshot_v_08.png">
<img src="//navoshta.com/images/posts/juno/screenshot_v_08.png" alt="" />
</a>
<a href="//navoshta.com/images/posts/juno/screenshot_v_09.png">
<img src="//navoshta.com/images/posts/juno/screenshot_v_09.png" alt="" />
</a>
</figure>
<p>I would like to take this opportunity to thank all the beta testers (more than 1200 of them!) who helped test the app and shared their feedback. Thank you once again, and I hope you will enjoy all the new things planned for <strong>Juno Connect</strong> in the coming year! Stay tuned. 😉</p>
<p align="center">
<a href="https://itunes.apple.com/app/juno-jupyter-notebook-client/id1315744137" target="_blank"><img src="/images/posts/juno/download_black.svg" style="height: 58px;" /></a>
</p>
<h1>Self-signed SSL certificate in Jupyter</h1>
<p class="small">2017-09-01 · //navoshta.com/jupyter-ssl-self-signed-cert</p>
<p>In order to use Jupyter Notebook on iPad, one needs to correctly configure SSL certificates. Since issuing a proper certificate from a trusted authority could be challenging in some cases, a self-signed certificate should suffice, provided it was signed by a CA that is trusted by the device. Follow these steps to get it working on your iPad!</p>
<h1>Visualizing lidar data</h1>
<p class="small">2017-05-26 · //navoshta.com/kitti-lidar</p>
<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-none"></i> Contents</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#dataset" id="markdown-toc-dataset">Dataset</a></li>
<li><a href="#dependencies" id="markdown-toc-dependencies">Dependencies</a></li>
<li><a href="#visualization" id="markdown-toc-visualization">Visualization</a> <ul>
<li><a href="#cameras" id="markdown-toc-cameras">Cameras</a></li>
<li><a href="#lidar" id="markdown-toc-lidar">Lidar</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
<p>Arguably the most essential piece of hardware for a self-driving car setup is a lidar. A <a href="https://en.wikipedia.org/wiki/Lidar" target="_blank">lidar</a> allows you to collect precise distances to nearby objects by continuously scanning the vehicle’s surroundings with a beam of laser light, and measuring how long it takes the reflected pulses to travel back to the sensor.</p>
<p>Although lidars used to be the most expensive components of self-driving cars, and could easily cost you as much as $75,000 just a couple of years ago, prices have plummeted recently and there are really good lidar sensors on the market in the sub-$8,000 range these days. And it just keeps getting better, as Velodyne has just <a href="http://www.businesswire.com/news/home/20170419005516/en/Velodyne-LiDAR-Announces-%E2%80%9CVelarray%E2%80%9D-LiDAR-Sensor">announced</a> a model range that is a whole order of magnitude cheaper, with a limited field-of-view and presumably costing just under $1,000.</p>
<h1 id="dataset">Dataset</h1>
<p>Luckily, you don’t have to spend that much money to get hold of data generated by a lidar. The <a href="http://www.cvlibs.net/datasets/kitti/">KITTI Vision Benchmark Suite</a> contains datasets collected with a car driving around rural areas of a city — a car equipped with a lidar and a bunch of cameras, of course. Some of those datasets are labeled, i.e. they also contain information about the objects around the car; we will visualize those as well. These datasets are <a href="http://www.cvlibs.net/datasets/kitti/raw_data.php">publicly available here</a>; if you would like to follow along, just go ahead and download one of them.</p>
<p>I will use the <code class="language-plaintext highlighter-rouge">2011_09_26_drive_0001</code> dataset and the corresponding tracklets, i.e. the labeled surrounding objects. It is one of the smallest datasets out there (0.4 GB), containing data for just 11 seconds of driving:</p>
<ul>
<li><strong>Length</strong>: 114 frames (00:11 minutes)</li>
<li><strong>Image resolution</strong>: <code class="language-plaintext highlighter-rouge">1392 x 512</code> pixels</li>
<li><strong>Labels</strong>: 12 Cars, 0 Vans, 0 Trucks, 0 Pedestrians, 0 Sitters, 2 Cyclists, 1 Tram, 0 Misc</li>
</ul>
<h1 id="dependencies">Dependencies</h1>
<p>A lidar operates by streaming a laser beam at high frequency, generating a 3D point cloud as an output in real time. We are going to use a couple of dependencies to work with the point cloud provided in the KITTI dataset: apart from the familiar toolset of <code class="language-plaintext highlighter-rouge">numpy</code> and <code class="language-plaintext highlighter-rouge">matplotlib</code> we will use <a href="https://github.com/utiasSTARS/pykitti"><code class="language-plaintext highlighter-rouge">pykitti</code></a>. In order to make the tracklet parsing math easier, we will use a couple of methods originally implemented by Christian Herdtweck that I have updated for Python 3; you can find them in <code class="language-plaintext highlighter-rouge">source/parseTrackletXML.py</code> in the project repo.</p>
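<p>As a starting point, here is a minimal sketch of loading the dataset with <code class="language-plaintext highlighter-rouge">pykitti</code>; the base directory is a placeholder for wherever you unpacked the KITTI archive.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pykitti

# Hypothetical base directory containing the unpacked 2011_09_26 archive
basedir = 'data'
dataset = pykitti.raw(basedir, '2011_09_26', '0001')

# Each velodyne frame is an Nx4 numpy array: x, y, z, reflectance
velo_frame = dataset.get_velo(0)
</code></pre></div></div>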
<h1 id="visualization">Visualization</h1>
<h2 id="cameras">Cameras</h2>
<p>In addition to the lidar 3D point cloud data, the KITTI dataset also contains video frames from a set of forward-facing cameras mounted on the vehicle. The regular camera data is not half as exciting as the lidar data, but it is still worth checking out.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/kitti-lidar/cameras.png" alt="image-center" class="align-center" />
Sample frames from cameras</p>
<p>Camera frames look pretty straightforward: you can see a tram track on the right with a lonely tram far ahead, and some parked cars on the left. Although those road features may seem obvious to you, a computer vision algorithm would struggle to differentiate them relying solely on visual data.</p>
<h2 id="lidar">Lidar</h2>
<p>The dataset in question contains 114 lidar point cloud frames over a duration of 11 seconds. This equates to approximately 10 frames per second, which is a very decent scanning rate, given that we get a 360° field-of-view with each frame containing approximately 120,000 points — a fair amount of data to stream in real time. To avoid cluttering the visualizations, we will randomly sample 20% of the points for each frame and discard the rest.</p>
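<p>The sampling itself is a one-liner with <code class="language-plaintext highlighter-rouge">numpy</code>; something along these lines, assuming <code class="language-plaintext highlighter-rouge">dataset</code> is the <code class="language-plaintext highlighter-rouge">pykitti</code> object from the snippet above.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

frame_index = 0
points = dataset.get_velo(frame_index)
# Keep a random 20% of the points and discard the rest
keep = np.random.choice(points.shape[0], points.shape[0] // 5, replace=False)
sampled = points[keep]
</code></pre></div></div>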
<p>We will additionally visualize <em>tracklets</em>, i.e. labeled objects like cars, trams, pedestrians and so on. With a bit of math we will grab information from the KITTI tracklets file and work out each object’s bounding box for each frame; feel free to check out the <a href="https://github.com/alexstaravoitau/KITTI-Dataset/blob/master/kitti-dataset.ipynb">notebook</a> for more details. There are only 3 types of objects in this particular 11-second clip, and we will mark them with bounding boxes as follows: cars in <strong>blue</strong>, trams in <strong>red</strong> and cyclists in <strong>green</strong>. Let’s first visualize a sample lidar frame on a 3D plot.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/kitti-lidar/lidar_frame.png" alt="image-center" class="align-center" />
Sample lidar frame</p>
<p>Looks pretty neat! You can see the car with the lidar in the center of a black circle, with laser beams coming out of it. You can even see silhouettes of the cars parked on the left side of the road and the tram tracks on the right! And of course there are bounding boxes for the tram and the cars; they are exactly where you would expect them based on the regular camera data. You might have also noticed that only the objects that are visible to the cameras are labeled.</p>
<p>Having this data as a point cloud is extremely useful, as it can be represented in various ways specific to particular applications. You could scale the data points over some particular axis, or simply discard one of the axes to create a plane projection of the point cloud. This is what this Velodyne frame looks like when projected on the <code class="language-plaintext highlighter-rouge">XZ</code>, <code class="language-plaintext highlighter-rouge">XY</code> and <code class="language-plaintext highlighter-rouge">YZ</code> planes respectively:</p>
<p style="text-align: center;" class="small"><img src="/images/posts/kitti-lidar/lidar_frame_projections.png" alt="image-center" class="align-center" />
Projections of a sample lidar frame</p>
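<p>Producing these projections is as simple as picking two of the three coordinate columns; a minimal sketch with <code class="language-plaintext highlighter-rouge">matplotlib</code>, assuming <code class="language-plaintext highlighter-rouge">sampled</code> is the subsampled point cloud from the earlier snippet.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import matplotlib.pyplot as plt

# Project the point cloud on the XZ, XY and YZ planes by dropping one axis at a time
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (i, j), title in zip(axes, ((0, 2), (0, 1), (1, 2)), ('XZ', 'XY', 'YZ')):
    ax.scatter(sampled[:, i], sampled[:, j], s=0.1, c='black')
    ax.set_title('{} projection'.format(title))
plt.show()
</code></pre></div></div>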
<p>Usually you can significantly improve your model’s performance by preprocessing the data. What you are trying to achieve is a reduction in the dimensionality of the input, hoping to extract useful features and remove those that would be redundant or would slow down and confuse the model. In this particular case discarding the <code class="language-plaintext highlighter-rouge">Z</code> coordinate seems like a promising path to explore, as it gives us pretty much a bird’s-eye view of the vehicle’s surroundings. With more sophisticated feature engineering, coupled with regular camera data as an additional input, you could achieve decent performance in detecting and classifying surrounding objects.</p>
<p>Finally, let’s plot all 114 sequential frames and combine them into a short video representing how the point cloud changes over time.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/kitti-lidar/pcl_data.gif" alt="image-center" class="align-center" />
Lidar data plotted over time</p>
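<p>For reference, this is roughly how such an animation could be put together with <code class="language-plaintext highlighter-rouge">matplotlib</code>; a sketch assuming the <code class="language-plaintext highlighter-rouge">dataset</code> object from before, with the frame rate and the ImageMagick writer being assumptions rather than the notebook’s exact setup.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

def draw_frame(i):
    # Re-draw the subsampled point cloud for frame i
    ax.clear()
    points = dataset.get_velo(i)
    keep = np.random.choice(points.shape[0], points.shape[0] // 5, replace=False)
    points = points[keep]
    ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=0.1, c='black')

animation = FuncAnimation(fig, draw_frame, frames=114)
animation.save('pcl_data.gif', writer='imagemagick', fps=10)
</code></pre></div></div>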
<p>This should give a much better idea of what lidar data looks like. You can clearly see silhouettes of trees and parked cars that our vehicle is passing by — now <em>that</em> would be much easier for an algorithm to interpret. And although lidar is usually used in conjunction with a bunch of other sensors and data sources, it plays a significant role in vehicle <a href="https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping">simultaneous localization and mapping</a>.</p>
<p>The full implementation is available in the <a href="https://github.com/alexstaravoitau/KITTI-Dataset" target="_blank">KITTI-Dataset</a> repository on GitHub.</p>
<h1>Detecting road features</h1>
<p class="small">2017-03-06 · //navoshta.com/detecting-road-features</p>
<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-none"></i> Contents</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#source-video" id="markdown-toc-source-video">Source video</a></li>
<li><a href="#lane-tracking" id="markdown-toc-lane-tracking">Lane Tracking</a> <ul>
<li><a href="#camera-calibration" id="markdown-toc-camera-calibration">Camera calibration</a></li>
<li><a href="#edge-detection" id="markdown-toc-edge-detection">Edge detection</a></li>
<li><a href="#perspective-transform" id="markdown-toc-perspective-transform">Perspective transform</a></li>
<li><a href="#detect-boundaries" id="markdown-toc-detect-boundaries">Detect boundaries</a></li>
<li><a href="#approximate-properties" id="markdown-toc-approximate-properties">Approximate properties</a></li>
<li><a href="#sequence-of-frames" id="markdown-toc-sequence-of-frames">Sequence of frames</a></li>
</ul>
</li>
<li><a href="#vehicle-tracking" id="markdown-toc-vehicle-tracking">Vehicle Tracking</a> <ul>
<li><a href="#feature-extraction" id="markdown-toc-feature-extraction">Feature extraction</a></li>
<li><a href="#training-a-classifier" id="markdown-toc-training-a-classifier">Training a classifier</a></li>
<li><a href="#frame-segmentation" id="markdown-toc-frame-segmentation">Frame segmentation</a></li>
<li><a href="#merging-multiple-detections" id="markdown-toc-merging-multiple-detections">Merging multiple detections</a></li>
<li><a href="#sequence-of-frames-1" id="markdown-toc-sequence-of-frames-1">Sequence of frames</a></li>
</ul>
</li>
<li><a href="#results" id="markdown-toc-results">Results</a></li>
</ul>
</nav>
</aside>
<p>We are going to try detecting and tracking some basic road features in a video stream from a front-facing camera on a vehicle. This is clearly a very naive approach that can hardly be applied in the field; however, it is a good demonstration of what we <em>can</em> detect using mainly computer vision techniques, i.e. fiddling with color spaces and various filters. We will cover tracking of the following features:</p>
<ul>
<li><strong>Lane boundaries.</strong> Understanding where the lane is could be useful in many applications, be it a self-driving car or some driving assistant software.</li>
<li><strong>Surrounding vehicles.</strong> Keeping track of other vehicles around you is just as important, say, if you were to implement a collision-avoidance algorithm.</li>
</ul>
<p>We will implement it in two major steps: first we will prepare a pipeline for lane tracking, and then learn how to detect surrounding vehicles.</p>
<p class="notice">Road features detection is one of the assignments in <a href="http://udacity.com/drive"><strong>Udacity Self-Driving Car Nanodegree</strong></a> program, however the concepts described here should be easy to follow even without that context.</p>
<h1 id="source-video">Source video</h1>
<p>I am going to use a short video clip shot from a vehicle’s front-facing camera while driving on a highway. It was shot in close to perfect conditions: sunny weather, not many vehicles around, road markings clearly visible, etc. — so computer vision techniques alone should be sufficient for a quick demonstration. You can check out the full <a href="https://github.com/alexstaravoitau/advanced-lane-finding/blob/master/data/video/project_video.mp4" target="_blank">50-second video here</a>.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/project_source_video_sample.gif" alt="image-center" class="align-center" />
Source video</p>
<h1 id="lane-tracking">Lane Tracking</h1>
<p>Let’s first prepare a processing pipeline to identify the lane boundaries in a video. The pipeline includes the following steps that we apply to each frame:</p>
<ul>
<li><strong>Camera calibration.</strong> To cater for inevitable camera distortions, we calculate camera calibration using a set of calibration chessboard images and apply a correction to each of the frames.</li>
<li><strong>Edge detection with gradient and color thresholds.</strong> We then use a bunch of metrics based on gradients and color information to highlight edges in the frame.</li>
<li><strong>Perspective transformation.</strong> To make lane boundary extraction easier we apply a perspective transformation, resulting in something similar to a bird’s eye view of the road ahead of the vehicle.</li>
<li><strong>Fitting boundary lines.</strong> We then scan the resulting frame for pixels that could belong to lane boundaries and try to fit lines to those pixels.</li>
<li><strong>Approximate road properties and vehicle position.</strong> We also provide a rough estimate of road curvature and the vehicle’s position within the lane using known road dimensions.</li>
</ul>
<h2 id="camera-calibration">Camera calibration</h2>
<p>We are going to use some heavy image warping at later stages, which would make any distortions introduced by the camera lens very apparent. So in order to cater for that, we will introduce a camera correction step based on a set of calibration images shot with the same camera. A very common technique is shooting a printed chessboard from various angles and calculating the distortions introduced by the camera based on the expected chessboard orientation in each photo.</p>
<p>We are going to use a number of OpenCV routines in order to apply correction for camera distortion. I first prepare a <code class="language-plaintext highlighter-rouge">pattern</code> variable holding <em>object points</em> in <code class="language-plaintext highlighter-rouge">(x, y, z)</code> coordinate space of the chessboard, which are essentially inner corners of the chessboard. Here <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are horizontal and vertical indices of the chessboard squares, and <code class="language-plaintext highlighter-rouge">z</code> is always <code class="language-plaintext highlighter-rouge">0</code> (as chessboard inner corners lie in the same plane). Those <em>object points</em> are going to be the same for each calibration image, as we expect the same chessboard in each.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pattern</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">pattern_size</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">pattern_size</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">3</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="n">pattern</span><span class="p">[:,</span> <span class="p">:</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mgrid</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">pattern_size</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">0</span><span class="p">:</span><span class="n">pattern_size</span><span class="p">[</span><span class="mi">1</span><span class="p">]].</span><span class="n">T</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<p>We then use <code class="language-plaintext highlighter-rouge">cv2.findChessboardCorners()</code> function to get coordinates of the corresponding corners in each calibration image.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pattern_points</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">image_points</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">found</span><span class="p">,</span> <span class="n">corners</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">findChessboardCorners</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="bp">None</span><span class="p">)</span>
<span class="k">if</span> <span class="n">found</span><span class="p">:</span>
<span class="n">pattern_points</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">pattern</span><span class="p">)</span>
<span class="n">image_points</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">corners</span><span class="p">)</span>
</code></pre></div></div>
<p>Once we have collected all the points from each image, we can compute the camera calibration matrix and distortion coefficients using the <code class="language-plaintext highlighter-rouge">cv2.calibrateCamera()</code> function.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">camera_matrix</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">dist_coefficients</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">calibrateCamera</span><span class="p">(</span>
<span class="n">pattern_points</span><span class="p">,</span> <span class="n">image_points</span><span class="p">,</span> <span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">None</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Now that we have camera calibration matrix and distortion coefficients we can use <code class="language-plaintext highlighter-rouge">cv2.undistort()</code> to apply camera distortion correction to any image.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">corrected_image</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">undistort</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">camera_matrix</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">dist_coefficients</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">camera_matrix</span><span class="p">)</span>
</code></pre></div></div>
<p>As some of the calibration images did not have the chessboard fully visible, we will use one of those for verifying the aforementioned calibration pipeline.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/calibration_1.png" alt="image-center" class="align-center" />
Original vs. calibrated images</p>
<p class="notice">For implementation details check <code class="language-plaintext highlighter-rouge">CameraCalibration</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/camera.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/camera.py</code></a>.</p>
<h2 id="edge-detection">Edge detection</h2>
<p>We use a set of gradient- and color-based thresholds to detect edges in the frame. For gradients we use the <a href="https://en.wikipedia.org/wiki/Sobel_operator" target="_blank">Sobel operator</a>, which essentially highlights rapid changes in color over either of the two axes by approximating derivatives with a simple convolution kernel. For color we simply convert the frame to the <a href="https://en.wikipedia.org/wiki/HSL_and_HSV" target="_blank"><strong>HLS</strong> color space</a> and apply a threshold on the S channel. The reason we use HLS is that its saturation channel proved to perform best in separating light pixels (road markings) from dark pixels (road).</p>
<ul>
<li><strong>Gradient absolute value</strong>. For the absolute gradient value we simply apply a threshold to the <code class="language-plaintext highlighter-rouge">cv2.Sobel()</code> output for each axis.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sobel</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">(</span><span class="n">cv2</span><span class="p">.</span><span class="n">Sobel</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">cv2</span><span class="p">.</span><span class="n">CV_64F</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ksize</span><span class="o">=</span><span class="mi">3</span><span class="p">))</span>
</code></pre></div></div>
<ul>
<li><strong>Gradient magnitude</strong>. Additionally, we include pixels within a threshold of the gradient magnitude.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sobel_x</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">Sobel</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">cv2</span><span class="p">.</span><span class="n">CV_64F</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ksize</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">sobel_y</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">Sobel</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">cv2</span><span class="p">.</span><span class="n">CV_64F</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">ksize</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">magnitude</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sobel_x</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">sobel_y</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<ul>
<li><strong>Gradient direction</strong>. We also include pixels that happen to be within a threshold of the gradient direction.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sobel_x</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">Sobel</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">cv2</span><span class="p">.</span><span class="n">CV_64F</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ksize</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">sobel_y</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">Sobel</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">cv2</span><span class="p">.</span><span class="n">CV_64F</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">ksize</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">direction</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arctan2</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">(</span><span class="n">sobel_y</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">(</span><span class="n">sobel_x</span><span class="p">))</span>
</code></pre></div></div>
<ul>
<li><strong>Color</strong>. Finally, we extract the S channel of the image’s HLS representation and apply a threshold to its absolute value.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hls</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">image</span><span class="p">),</span> <span class="n">cv2</span><span class="p">.</span><span class="n">COLOR_RGB2HLS</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">float</span><span class="p">)</span>
<span class="n">s_channel</span> <span class="o">=</span> <span class="n">hls</span><span class="p">[:,</span> <span class="p">:,</span> <span class="mi">2</span><span class="p">]</span>
</code></pre></div></div>
<p>We apply a combination of all of these filters as our edge detection pipeline.</p>
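<p>Put together, the pipeline boils down to OR-ing the individual binary masks; a minimal sketch, where the threshold values are illustrative rather than the exact ones used in the project, and <code class="language-plaintext highlighter-rouge">sobel</code> and <code class="language-plaintext highlighter-rouge">s_channel</code> are the arrays computed above.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Binary masks for each criterion (threshold values are examples)
gradient_mask = np.zeros_like(s_channel)
gradient_mask[(sobel >= 20) &amp; (sobel &lt;= 100)] = 1
color_mask = np.zeros_like(s_channel)
color_mask[(s_channel >= 170) &amp; (s_channel &lt;= 255)] = 1

# A pixel is an edge if either of the masks fires
edges = np.zeros_like(s_channel)
edges[(gradient_mask == 1) | (color_mask == 1)] = 1
</code></pre></div></div>
<p>Here is an example of its output, where pixels masked by color are blue, and pixels masked by gradient are green.</p>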
<p align="center">
<a href="/images/posts/detecting-road-features/edges.jpg"><img src="/images/posts/detecting-road-features/edges.jpg" /></a>
</p>
<p style="text-align: center;" class="small">Original vs. highlighted edges</p>
<p class="notice">For implementation details check functions in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/gradients.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/gradients.py</code></a>.</p>
<h2 id="perspective-transform">Perspective transform</h2>
<p>It would be much easier to detect lane boundaries if we could get hold of a bird’s eye view of the road, and we can get something fairly close to it by applying a perspective transform to the camera frames. For the sake of this demo project I manually pinpointed source and destination points in the camera frames, so the perspective transform simply maps the following coordinates.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Source</th>
<th style="text-align: center">Destination</th>
<th style="text-align: center">Position</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(564, 450)</code></td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(100, 0)</code></td>
<td style="text-align: center">Top left corner.</td>
</tr>
<tr>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(716, 450)</code></td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(1180, 0)</code></td>
<td style="text-align: center">Top right corner.</td>
</tr>
<tr>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(-100, 720)</code></td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(100, 720)</code></td>
<td style="text-align: center">Bottom left corner.</td>
</tr>
<tr>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(1380, 720)</code></td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">(1180, 720)</code></td>
<td style="text-align: center">Bottom right corner.</td>
</tr>
</tbody>
</table>
<p>The transformation matrix is computed with the <code class="language-plaintext highlighter-rouge">cv2.getPerspectiveTransform()</code> function and applied with <code class="language-plaintext highlighter-rouge">cv2.warpPerspective()</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="n">source</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">([[</span><span class="n">w</span> <span class="o">//</span> <span class="mi">2</span> <span class="o">-</span> <span class="mi">76</span><span class="p">,</span> <span class="n">h</span> <span class="o">*</span> <span class="p">.</span><span class="mi">625</span><span class="p">],</span> <span class="p">[</span><span class="n">w</span> <span class="o">//</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">76</span><span class="p">,</span> <span class="n">h</span> <span class="o">*</span> <span class="p">.</span><span class="mi">625</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">100</span><span class="p">,</span> <span class="n">h</span><span class="p">],</span> <span class="p">[</span><span class="n">w</span> <span class="o">+</span> <span class="mi">100</span><span class="p">,</span> <span class="n">h</span><span class="p">]])</span>
<span class="n">destination</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">([[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="n">w</span> <span class="o">-</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="n">h</span><span class="p">],</span> <span class="p">[</span><span class="n">w</span> <span class="o">-</span> <span class="mi">100</span><span class="p">,</span> <span class="n">h</span><span class="p">]])</span>
<span class="n">transform_matrix</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">getPerspectiveTransform</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">destination</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">warpPerspective</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">transform_matrix</span><span class="p">,</span> <span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">h</span><span class="p">))</span>
</code></pre></div></div>
<p>This is what it looks like for an arbitrary test image.</p>
<p align="center">
<a href="/images/posts/detecting-road-features/perspective.jpg"><img src="/images/posts/detecting-road-features/perspective.jpg" /></a>
</p>
<p style="text-align: center;" class="small">Original vs. bird’s eye view</p>
<p class="notice">For implementation details check functions in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/perspective.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/perspective.py</code></a>.</p>
<h2 id="detect-boundaries">Detect boundaries</h2>
<p>We are now going to scan the resulting frame from bottom to top, trying to isolate pixels that could represent lane boundaries. What we are trying to detect is two lines (each represented by a <code class="language-plaintext highlighter-rouge">Line</code> class) that make up the lane boundaries. For each of those lines we have a set of <em>windows</em> (represented by a <code class="language-plaintext highlighter-rouge">Window</code> class). We scan the frame with those windows, collecting non-zero pixels within the window bounds. Once we reach the top, we try to fit a second order polynomial to the collected points. These polynomial coefficients then represent a single lane boundary.</p>
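<p>Stripped of bookkeeping, the scan looks roughly like this; a simplified sketch rather than the exact <code class="language-plaintext highlighter-rouge">LaneTracker</code> implementation, assuming <code class="language-plaintext highlighter-rouge">frame</code> is the binary warped edge image, with the window count, margin and starting position picked as illustrative values.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

nonzero_y, nonzero_x = frame.nonzero()   # frame is the binary warped edge image
window_height = frame.shape[0] // 9
lane_x, lane_y = [], []
x_center = 300                           # e.g. a histogram peak in the bottom half

for i in range(9):
    y_top = frame.shape[0] - (i + 1) * window_height
    y_bottom = frame.shape[0] - i * window_height
    # Collect non-zero pixels that fall within the current window
    within = ((nonzero_y >= y_top) &amp; (nonzero_y &lt; y_bottom) &amp;
              (nonzero_x >= x_center - 100) &amp; (nonzero_x &lt; x_center + 100))
    lane_x.append(nonzero_x[within])
    lane_y.append(nonzero_y[within])
    if within.sum() > 50:                # re-center the next window on detected pixels
        x_center = int(nonzero_x[within].mean())

# Fit a second order polynomial to the collected points
coefficients = np.polyfit(np.concatenate(lane_y), np.concatenate(lane_x), 2)
</code></pre></div></div>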
<p>Here is a debug image representing the process. On the left is the <em>original</em> image after we apply camera calibration and perspective transform. On the right is the same image, but with edges highlighted in <strong><span style="color: green">green</span></strong> and <strong><span style="color: blue">blue</span></strong>, scanning window boundaries highlighted in <strong><span style="color: yellow">yellow</span></strong>, and a second order polynomial approximation of the collected points in <strong><span style="color: red">red</span></strong>.</p>
<p align="center">
<a href="/images/posts/detecting-road-features/detection.jpg"><img src="/images/posts/detecting-road-features/detection.jpg" /></a>
</p>
<p style="text-align: center;" class="small">Boundary detection pipeline</p>
<p class="notice">For implementation details check <code class="language-plaintext highlighter-rouge">LaneTracker</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/perspective.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/tracker.py</code></a>, <code class="language-plaintext highlighter-rouge">Window</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/perspective.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/window.py</code></a> and <code class="language-plaintext highlighter-rouge">Line</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/perspective.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/line.py</code></a>.</p>
<h2 id="approximate-properties">Approximate properties</h2>
<p>We can now approximate some of the road properties and the vehicle’s spatial position using known real-world dimensions. Here we assume that the visible vertical part of the bird’s eye view warped frame is <strong>27 meters</strong>, based on the known length of the dashed lines on American roads. We also assume that the lane width is around <strong>3.7 meters</strong>, again based on American regulations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ym_per_pix</span> <span class="o">=</span> <span class="mi">27</span> <span class="o">/</span> <span class="mi">720</span> <span class="c1"># meters per pixel in y dimension
</span><span class="n">xm_per_pix</span> <span class="o">=</span> <span class="mf">3.7</span> <span class="o">/</span> <span class="mi">700</span> <span class="c1"># meters per pixel in x dimension
</span></code></pre></div></div>
<h3 class="no_toc" id="road-curvature">Road curvature</h3>
<p>Previously we approximated each lane boundary as a second order polynomial curve, which can be represented with the following equation.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/poly_2.png" alt="image-center" class="align-center" height="80px" width="295px" />
Second order polynomial</p>
<p>As per <a href="http://www.intmath.com/applications-differentiation/8-radius-curvature.php" target="_blank">this tutorial</a>, we can get the radius of curvature in an arbitrary point using the following equation.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/curve_grad.png" alt="image-center" class="align-center" height="80px" width="295px" />
Radius equation</p>
<p>If we calculate actual derivatives of the second order polynomial, we get the following.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/curve_coef.png" alt="image-center" class="align-center" height="80px" width="295px" />
Radius equation with substituted derivatives</p>
<p>Therefore, given that the <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> variables contain coordinates of the points making up the curve, we can get the curvature radius as follows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Fit a new polynomial in real world coordinate space
</span><span class="n">poly_coef</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">polyfit</span><span class="p">(</span><span class="n">y</span> <span class="o">*</span> <span class="n">ym_per_pix</span><span class="p">,</span> <span class="n">x</span> <span class="o">*</span> <span class="n">xm_per_pix</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">radius</span> <span class="o">=</span> <span class="p">((</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">poly_coef</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="mi">720</span> <span class="o">*</span> <span class="n">ym_per_pix</span> <span class="o">+</span> <span class="n">poly_coef</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">**</span> <span class="mf">1.5</span><span class="p">)</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">poly_coef</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
</code></pre></div></div>
<h3 class="no_toc" id="vehicle-position">Vehicle position</h3>
<p>We can also approximate the vehicle’s position within the lane. This routine calculates an approximate distance to a curve at the bottom of the frame, given that <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> contain coordinates of the points making up the curve.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">_</span><span class="p">)</span> <span class="o">=</span> <span class="n">frame</span><span class="p">.</span><span class="n">shape</span>
<span class="n">distance</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">((</span><span class="n">w</span> <span class="o">//</span> <span class="mi">2</span> <span class="o">-</span> <span class="n">x</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">y</span><span class="p">)])</span> <span class="o">*</span> <span class="n">xm_per_pix</span><span class="p">)</span>
</code></pre></div></div>
<p class="notice">For implementation details check <code class="language-plaintext highlighter-rouge">Line</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/perspective.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/line.py</code></a>.</p>
<h2 id="sequence-of-frames">Sequence of frames</h2>
<p>We can now apply the whole pipeline to a sequence of frames. We will use an approximation of the lane boundaries detected over the last 5 frames in the video using a <code class="language-plaintext highlighter-rouge">deque</code> collection type, which makes sure we only store the last 5 boundary approximations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">deque</span>
<span class="n">coefficients</span> <span class="o">=</span> <span class="n">deque</span><span class="p">(</span><span class="n">maxlen</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>
<p>We then check if we detected enough points (<code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> arrays of coordinates) in the current frame to approximate a line, and append the polynomial coefficients to <code class="language-plaintext highlighter-rouge">coefficients</code>. The sanity check here is to ensure that the detected points span a large enough portion of the image height; otherwise we wouldn’t be able to get a reasonable line approximation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="o">></span> <span class="n">h</span> <span class="o">*</span> <span class="p">.</span><span class="mi">625</span><span class="p">:</span>
<span class="n">coefficients</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">polyfit</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>
<p>Whenever we want to draw a line, we get an average of the polynomial coefficients detected over the last 5 frames.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mean_coefficients</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">coefficients</span><span class="p">).</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<p>This approach proved itself to work reasonably well; you can check out the <a href="https://github.com/alexstaravoitau/advanced-lane-finding/blob/master/data/video/project_video_annotated_lane.mp4" target="_blank">full annotated video here</a>.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/project_video_sample.gif" alt="image-center" class="align-center" />
Sample of the annotated project video</p>
<p class="notice">For implementation details check <code class="language-plaintext highlighter-rouge">LaneTracker</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/lanetracker/tracker.py" target="_blank"><code class="language-plaintext highlighter-rouge">lanetracker/tracker.py</code></a>.</p>
<h1 id="vehicle-tracking">Vehicle Tracking</h1>
<p>We are going to use a bit of machine learning to detect vehicle presence in an image by training a classifier that labels an image as either containing or not containing a vehicle. We will train this classifier using a dataset provided by Udacity, which comes in two separate archives: <a href="https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/vehicles.zip" target="_blank">images containing cars</a> and <a href="https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/non-vehicles.zip" target="_blank">images not containing cars</a>. The dataset contains <strong>17,760</strong> color RGB images of <strong>64×64 px</strong> each, with <strong>8,792</strong> samples labeled as containing <strong>vehicles</strong> and <strong>8,968</strong> samples labeled as <strong>non-vehicles</strong>.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/cars.png" alt="image-center" class="align-center" />
Random sample labeled as containing cars</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/non-cars.png" alt="image-center" class="align-center" />
Random sample of non-cars</p>
<p>In order to prepare a processing pipeline to identify surrounding vehicles, we are going to break it down into the following steps:</p>
<ul>
<li><strong>Extract features and train a classifier.</strong> We need to identify features that would be useful for vehicle detection and prepare a feature extraction pipeline. We then use it to train a classifier to detect a car in an individual frame segment.</li>
<li><strong>Apply frame segmentation.</strong> We then segment the frame into <em>windows</em> of various sizes that we run through the aforementioned classifier.</li>
<li><strong>Merge individual segment detections.</strong> As there will inevitably be multiple detections, we merge them together using a heat map, which should also help reduce the number of false positives.</li>
</ul>
<h2 id="feature-extraction">Feature extraction</h2>
<p>After experimenting with various features I settled on a combination of <strong>HOG</strong> (<a href="https://en.wikipedia.org/wiki/Histogram_of_oriented_gradients" target="_blank">Histogram of Oriented Gradients</a>), <strong>spatial information</strong> and <strong>color channel histograms</strong>, all using the <a href="https://en.wikipedia.org/wiki/YCbCr" target="_blank"><strong>YCbCr</strong> color space</a>. Feature extraction is implemented as a context-preserving class (<code class="language-plaintext highlighter-rouge">FeatureExtractor</code>) to allow some pre-calculations for each frame. As some features take a lot of time to compute (looking at you, HOG), we only do that once for the entire image and then return regions of it.</p>
<h3 class="no_toc" id="histogram-of-oriented-gradients">Histogram of Oriented Gradients</h3>
<p>I had to run a bunch of experiments to come up with the final parameters, eventually settling on <strong>HOG</strong> with <strong>10 orientations</strong>, <strong>8 pixels per cell</strong> and <strong>2 cells per block</strong>. The experiments went as follows:</p>
<ol>
<li>Train and evaluate the classifier for a wide range of parameters and identify promising smaller ranges.</li>
<li>Train and evaluate the classifier on those smaller ranges of parameters multiple times for each experiment and assess average accuracy (roughly sketched below).</li>
</ol>
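<p>The second stage could be sketched roughly as follows. Note that <code class="language-plaintext highlighter-rouge">extract_features()</code> and <code class="language-plaintext highlighter-rouge">train_and_score()</code> are hypothetical helpers standing in for the dataset preparation and classifier training code from the project notebook.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Hypothetical helpers: extract_features() builds feature vectors and labels
# for the given HOG parameters, train_and_score() trains a fresh classifier
# and returns its validation accuracy.
for orientations in (9, 10, 11):
    accuracies = []
    for _ in range(5):
        X, y = extract_features(orientations=orientations,
                                pixels_per_cell=8, cells_per_block=2)
        accuracies.append(train_and_score(X, y))
    print(orientations, np.mean(accuracies))
</code></pre></div></div>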
<p>The winning combination turned out to be the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> orient px/cell clls/blck feat-s iter acc sec/test
10 8 2 5880 0 0.982 0.01408
10 8 2 5880 1 0.9854 0.01405
10 8 2 5880 2 0.9834 0.01415
10 8 2 5880 3 0.9825 0.01412
10 8 2 5880 4 0.9834 0.01413
Average accuracy = 0.98334
</code></pre></div></div>
<p>This is what the Histogram of Oriented Gradients looks like when applied to a random dataset sample.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/original.png" alt="image-center" class="align-center" />
Original (Y channel of YCbCr color space)</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/hog.png" alt="image-center" class="align-center" />
HOG (Histogram of Oriented Gradients)</p>
<p>The initial HOG calculation for the entire image is done using the <code class="language-plaintext highlighter-rouge">hog()</code> function from the <code class="language-plaintext highlighter-rouge">skimage.feature</code> module. We concatenate HOG features for all color channels.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span>
<span class="n">hog_features</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">channel</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">d</span><span class="p">):</span>
<span class="n">hog_features</span><span class="p">.</span><span class="n">append</span><span class="p">(</span>
<span class="n">hog</span><span class="p">(</span>
<span class="n">image</span><span class="p">[:,</span> <span class="p">:,</span> <span class="n">channel</span><span class="p">],</span>
<span class="n">orientations</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">pixels_per_cell</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">),</span>
<span class="n">cells_per_block</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span>
<span class="n">transform_sqrt</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">visualise</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">feature_vector</span><span class="o">=</span><span class="bp">False</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="n">hog_features</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">hog_features</span><span class="p">)</span>
</code></pre></div></div>
<p>This allows us to get features for an individual image window by calculating HOG array offsets, given that <code class="language-plaintext highlighter-rouge">x</code> is the window horizontal offset, <code class="language-plaintext highlighter-rouge">y</code> is the vertical offset and <code class="language-plaintext highlighter-rouge">k</code> is the size of the window (single value, side of a square region).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hog_k</span> <span class="o">=</span> <span class="p">(</span><span class="n">k</span> <span class="o">//</span> <span class="mi">8</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">hog_x</span> <span class="o">=</span> <span class="nb">max</span><span class="p">((</span><span class="n">x</span> <span class="o">//</span> <span class="mi">8</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">hog_x</span> <span class="o">=</span> <span class="n">hog_features</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">-</span> <span class="n">hog_k</span> <span class="k">if</span> <span class="n">hog_x</span> <span class="o">+</span> <span class="n">hog_k</span> <span class="o">></span> <span class="n">hog_features</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="k">else</span> <span class="n">hog_x</span>
<span class="n">hog_y</span> <span class="o">=</span> <span class="nb">max</span><span class="p">((</span><span class="n">y</span> <span class="o">//</span> <span class="mi">8</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">hog_y</span> <span class="o">=</span> <span class="n">hog_features</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">hog_k</span> <span class="k">if</span> <span class="n">hog_y</span> <span class="o">+</span> <span class="n">hog_k</span> <span class="o">></span> <span class="n">hog_features</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">else</span> <span class="n">hog_y</span>
<span class="n">region_hog</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ravel</span><span class="p">(</span><span class="n">hog_features</span><span class="p">[:,</span> <span class="n">hog_y</span><span class="p">:</span><span class="n">hog_y</span><span class="o">+</span><span class="n">hog_k</span><span class="p">,</span> <span class="n">hog_x</span><span class="p">:</span><span class="n">hog_x</span><span class="o">+</span><span class="n">hog_k</span><span class="p">,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="p">:])</span>
</code></pre></div></div>
<h3 class="no_toc" id="spatial-information">Spatial information</h3>
<p>For spatial information we simply resize the image to 16×16 and flatten it to a 1-D vector.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spatial</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">16</span><span class="p">)).</span><span class="n">ravel</span><span class="p">()</span>
</code></pre></div></div>
<h3 class="no_toc" id="color-channel-histogram">Color channel histogram</h3>
<p>We additionally use individual color channel histogram information, breaking each channel into <strong>16 bins</strong> within the <strong>(0, 256)</strong> range.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">color_hist</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">((</span>
<span class="n">np</span><span class="p">.</span><span class="n">histogram</span><span class="p">(</span><span class="n">image</span><span class="p">[:,</span> <span class="p">:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">bins</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="nb">range</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">256</span><span class="p">))[</span><span class="mi">0</span><span class="p">],</span>
<span class="n">np</span><span class="p">.</span><span class="n">histogram</span><span class="p">(</span><span class="n">image</span><span class="p">[:,</span> <span class="p">:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">bins</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="nb">range</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">256</span><span class="p">))[</span><span class="mi">0</span><span class="p">],</span>
<span class="n">np</span><span class="p">.</span><span class="n">histogram</span><span class="p">(</span><span class="n">image</span><span class="p">[:,</span> <span class="p">:,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">bins</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="nb">range</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">256</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span>
<span class="p">))</span>
</code></pre></div></div>
<h3 class="no_toc" id="featureextractor"><code class="language-plaintext highlighter-rouge">FeatureExtractor</code></h3>
<p>The way the <code class="language-plaintext highlighter-rouge">FeatureExtractor</code> class works is that you initialise it with a single frame and then request feature vectors for individual regions; this way the computationally expensive features are only calculated once. You then call the <code class="language-plaintext highlighter-rouge">feature_vector()</code> method to get a concatenated combination of the HOG, spatial and color histogram feature vectors.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">extractor</span> <span class="o">=</span> <span class="n">FeatureExtractor</span><span class="p">(</span><span class="n">frame</span><span class="p">)</span>
<span class="c1"># Feature vector for entire frame
</span><span class="n">feature_vector</span> <span class="o">=</span> <span class="n">extractor</span><span class="p">.</span><span class="n">feature_vector</span><span class="p">()</span>
<span class="c1"># Feature vector for a 64×64 frame region at (0, 0) point
</span><span class="n">feature_vector</span> <span class="o">=</span> <span class="n">extractor</span><span class="p">.</span><span class="n">feature_vector</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">64</span><span class="p">)</span>
</code></pre></div></div>
<p class="notice">For implementation details check <code class="language-plaintext highlighter-rouge">FeatureExtractor</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/vehicletracker/features.py" target="_blank"><code class="language-plaintext highlighter-rouge">vehicletracker/features.py</code></a>.</p>
<h2 id="training-a-classifier">Training a classifier</h2>
<p>I trained a Linear SVC (the <code class="language-plaintext highlighter-rouge">sklearn</code> implementation) using the feature extractor described above. Nothing fancy here: I used <code class="language-plaintext highlighter-rouge">sklearn</code>’s <code class="language-plaintext highlighter-rouge">train_test_split</code> to split the dataset into training and validation sets, and <code class="language-plaintext highlighter-rouge">sklearn</code>’s <code class="language-plaintext highlighter-rouge">StandardScaler</code> for feature scaling. I didn’t bother with a proper test set, assuming that classifier performance on the project video would be a good proxy for it.</p>
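<p>Roughly, the training code could look like the following minimal sketch, assuming <code class="language-plaintext highlighter-rouge">features</code> and <code class="language-plaintext highlighter-rouge">labels</code> have already been assembled from the dataset using the extractor described above.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# `features` is an (n_samples, n_features) array of extracted feature vectors,
# `labels` holds 1 for vehicle samples and 0 for non-vehicle samples.
scaler = StandardScaler().fit(features)
X_train, X_valid, y_train, y_valid = train_test_split(
    scaler.transform(features), labels, test_size=0.2
)

classifier = LinearSVC()
classifier.fit(X_train, y_train)
print('Validation accuracy:', classifier.score(X_valid, y_valid))
</code></pre></div></div>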
<p class="notice">For implementation details check <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/detecting-road-features.ipynb" target="_blank">`detecting-road-features.ipynb</a> notebook.</p>
<h2 id="frame-segmentation">Frame segmentation</h2>
<p>I use a sliding window approach with a couple of additional constraints. For instance, we can approximate the vehicle size we expect in different frame regions, which makes searching a bit easier.</p>
<p align="center">
<a href="/images/posts/detecting-road-features/windows.jpg"><img src="/images/posts/detecting-road-features/windows.jpg" /></a>
</p>
<p style="text-align: center;" class="small">Window size varies across scanning locations</p>
<p>Since frame segments must come in various sizes, and we eventually need to use 64×64 regions as classifier input, I decided to simply scale the frame to various sizes and then scan each scaled copy with a 64×64 window. This can be roughly encoded as follows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Scan with 64×64 window across 8 differently scaled images, ranging from 30% to 80% of the original frame size.
</span><span class="k">for</span> <span class="p">(</span><span class="n">scale</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(.</span><span class="mi">3</span><span class="p">,</span> <span class="p">.</span><span class="mi">8</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">logspace</span><span class="p">(.</span><span class="mi">6</span><span class="p">,</span> <span class="p">.</span><span class="mi">55</span><span class="p">,</span> <span class="mi">4</span><span class="p">)):</span>
<span class="c1"># Scale the original frame
</span> <span class="n">scaled</span> <span class="o">=</span> <span class="n">resize</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">scale</span><span class="p">,</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">scale</span><span class="p">,</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]))</span>
<span class="c1"># Prepare a feature extractor
</span> <span class="n">extractor</span> <span class="o">=</span> <span class="n">FeatureExtractor</span><span class="p">(</span><span class="n">scaled</span><span class="p">)</span>
<span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="o">=</span> <span class="n">scaled</span><span class="p">.</span><span class="n">shape</span>
<span class="n">s</span> <span class="o">=</span> <span class="mi">64</span> <span class="o">//</span> <span class="mi">3</span>
<span class="c1"># Target stride is no more than s (1/3 of the window size here),
</span> <span class="c1"># making sure windows are equally distributed along the frame width.
</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">w</span> <span class="o">-</span> <span class="n">k</span><span class="p">,</span> <span class="p">(</span><span class="n">w</span> <span class="o">+</span> <span class="n">s</span><span class="p">)</span> <span class="o">//</span> <span class="n">s</span><span class="p">):</span>
<span class="c1"># Extract features for current window.
</span> <span class="n">features</span> <span class="o">=</span> <span class="n">extractor</span><span class="p">.</span><span class="n">feature_vector</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">h</span><span class="o">*</span><span class="n">y</span><span class="p">,</span> <span class="mi">64</span><span class="p">)</span>
<span class="c1"># Run features through a scaler and classifier and add window coordinates
</span> <span class="c1"># to `detections` if classified as containing a vehicle
</span> <span class="p">...</span>
</code></pre></div></div>
<h2 id="merging-multiple-detections">Merging multiple detections</h2>
<p>As there are multiple detections on different scales and overlapping windows, we need to merge nearby detections. In order to do that we calculate a heatmap of intersecting regions that were classified as containing vehicles.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">heatmap</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
<span class="c1"># Add heat to each box in box list
</span><span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">detections</span><span class="p">:</span>
<span class="c1"># Assuming each set of coordinates takes the form (x1, y1, x2, y2)
</span> <span class="n">heatmap</span><span class="p">[</span><span class="n">c</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span><span class="n">c</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">c</span><span class="p">[</span><span class="mi">0</span><span class="p">]:</span><span class="n">c</span><span class="p">[</span><span class="mi">2</span><span class="p">]]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="c1"># Apply threshold to help remove false positives
</span><span class="n">heatmap</span><span class="p">[</span><span class="n">heatmap</span> <span class="o"><</span> <span class="n">threshold</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
</code></pre></div></div>
<p>Then we use the <code class="language-plaintext highlighter-rouge">label()</code> function from the <code class="language-plaintext highlighter-rouge">scipy.ndimage.measurements</code> module to detect individual groups of detections, and calculate a bounding rect for each of them.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">groups</span> <span class="o">=</span> <span class="n">label</span><span class="p">(</span><span class="n">heatmap</span><span class="p">)</span>
<span class="n">detections</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span>
<span class="c1"># Iterate through all labeled groups
</span><span class="k">for</span> <span class="n">group</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">groups</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
<span class="c1"># Find pixels belonging to the same group
</span> <span class="n">nonzero</span> <span class="o">=</span> <span class="p">(</span><span class="n">groups</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">group</span><span class="p">).</span><span class="n">nonzero</span><span class="p">()</span>
<span class="n">detections</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span>
<span class="n">detections</span><span class="p">,</span>
<span class="p">[[</span><span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">nonzero</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">nonzero</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">nonzero</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">nonzero</span><span class="p">[</span><span class="mi">0</span><span class="p">])]],</span>
<span class="n">axis</span><span class="o">=</span><span class="mi">0</span>
<span class="p">)</span>
</code></pre></div></div>
<p align="center">
<a href="/images/posts/detecting-road-features/detections.jpg"><img src="/images/posts/detecting-road-features/detections.jpg" /></a>
</p>
<p style="text-align: center;" class="small">Merging detections with a heat map</p>
<h2 id="sequence-of-frames-1">Sequence of frames</h2>
<p>Working with video allows us to use a couple of additional constraints, since we expect it to be a stream of consecutive frames. In order to eliminate false positives I, again, use the <code class="language-plaintext highlighter-rouge">deque</code> collection type to accumulate detections over the last <code class="language-plaintext highlighter-rouge">N</code> frames instead of classifying each frame individually. And before returning a final set of detected regions I run those accumulated detections through the heatmap merging process once again, but with a higher detection threshold.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">detections_history</span> <span class="o">=</span> <span class="n">deque</span><span class="p">(</span><span class="n">maxlen</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">frame</span><span class="p">):</span>
<span class="p">...</span>
<span class="c1"># Scan frame with windows through a classifier
</span> <span class="p">...</span>
<span class="c1"># Merge detections
</span> <span class="p">...</span>
<span class="c1"># Add merged detections to history
</span> <span class="n">detections_history</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">detections</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">heatmap_merge</span><span class="p">(</span><span class="n">detections</span><span class="p">,</span> <span class="n">threshold</span><span class="p">):</span>
<span class="c1"># Calculate heatmap for detections
</span> <span class="p">...</span>
<span class="c1"># Apply threshold
</span> <span class="p">...</span>
<span class="c1"># Merge detections with `label()
</span> <span class="p">...</span>
<span class="c1"># Calculate bounding rects
</span> <span class="p">...</span>
<span class="k">def</span> <span class="nf">detections</span><span class="p">():</span>
<span class="k">return</span> <span class="n">heatmap_merge</span><span class="p">(</span>
<span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">detections_history</span><span class="p">)),</span>
<span class="n">threshold</span><span class="o">=</span><span class="nb">min</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">detections_history</span><span class="p">),</span> <span class="mi">15</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
<p>This approach proved itself to work reasonably well on the source video; you can check out the <a href="https://github.com/alexstaravoitau/advanced-lane-finding/blob/master/data/video/project_video_annotated_vehicle.mp4" target="_blank">full annotated video here</a>. The current frame’s heat map is shown in the top right corner — you may notice quite a few false positives, but most of them are eliminated by merging detections over the last <code class="language-plaintext highlighter-rouge">N</code> consecutive frames.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/detecting-road-features/project_video_sample-2.gif" alt="image-center" class="align-center" />
Sample of the annotated project video</p>
<p class="notice">For implementation details check <code class="language-plaintext highlighter-rouge">VehicleTracker</code> class in <a href="https://github.com/alexstaravoitau/detecting-road-features/blob/master/source/vehicletracker/tracker.py" target="_blank"><code class="language-plaintext highlighter-rouge">vehicletracker/tracker.py</code></a>.</p>
<h1 id="results">Results</h1>
<p>This is clearly a very naive way of detecting and tracking road features, and wouldn’t be usable in a real-world application as-is, since it is likely to fail in too many scenarios:</p>
<ul>
<li>Going up or down the hill.</li>
<li>Changing weather conditions.</li>
<li>Worn out lane markings.</li>
<li>Obstruction by other vehicles or vehicles obstructing each other.</li>
<li>Vehicles and vehicle positions different from those the classifier was trained on.</li>
<li>…</li>
</ul>
<p>Not to mention it is painfully slow and would not run in real time without substantial optimisations. Nevertheless, this project is a good illustration of what can be done by simply inspecting pixel value gradients and color spaces. It shows that even with these limited tools we can extract a lot of useful information from an image, and that this information could potentially be used as feature input to more sophisticated algorithms.</p>
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>Alex StaravoitauThe goal of this project was to try and detect a set of road features in a forward facing vehicle camera data. This is a somewhat naive way as it is mainly using computer vision techniques (no relation to naive Bayesian!). Features we are going to detect and track are lane boundaries and surrounding vehicles.Meet Fenton (my data crunching machine)2017-02-25T00:00:00+00:002017-02-25T00:00:00+00:00//navoshta.com/meet-fenton<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-none"></i> Contents</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#hardware" id="markdown-toc-hardware">Hardware</a> <ul>
<li><a href="#video-card" id="markdown-toc-video-card">Video Card</a></li>
<li><a href="#motherboard" id="markdown-toc-motherboard">Motherboard</a></li>
<li><a href="#cpu" id="markdown-toc-cpu">CPU</a></li>
<li><a href="#ram" id="markdown-toc-ram">RAM</a></li>
<li><a href="#storage" id="markdown-toc-storage">Storage</a></li>
<li><a href="#power-supply" id="markdown-toc-power-supply">Power supply</a></li>
<li><a href="#case" id="markdown-toc-case">Case</a></li>
<li><a href="#putting-it-together" id="markdown-toc-putting-it-together">Putting it together</a></li>
</ul>
</li>
<li><a href="#software" id="markdown-toc-software">Software</a> <ul>
<li><a href="#operating-system" id="markdown-toc-operating-system">Operating System</a></li>
<li><a href="#ssh" id="markdown-toc-ssh">SSH</a></li>
<li><a href="#ssh-file-system" id="markdown-toc-ssh-file-system">SSH File system</a></li>
<li><a href="#jupyter-notebook" id="markdown-toc-jupyter-notebook">Jupyter Notebook</a></li>
<li><a href="#pycharm" id="markdown-toc-pycharm">PyCharm</a></li>
<li><a href="#monitoring" id="markdown-toc-monitoring">Monitoring</a></li>
</ul>
</li>
<li><a href="#pick-a-name" id="markdown-toc-pick-a-name">Pick a name</a></li>
</ul>
</nav>
</aside>
<p>As you might be aware, I have been experimenting with <a href="http://navoshta.com/aws-tensorflow/">AWS as a remote GPU-enabled machine</a> for a while, configuring Jupyter Notebook to use it as a backend. It seemed to work fine, although costs did build up over time, I always had to remember to shut the instance off, and there were a couple of other limitations. Long story short, around 3 months ago I decided to build my own machine learning rig.</p>
<p>My idea in a nutshell was to build a machine that would only act as a server, accessible to me from anywhere and always ready to unleash its computational powers on whichever task I’d be working on. Although this setup did take some time to assess, assemble and configure, it has been working flawlessly ever since, and I am very happy with it.</p>
<h1 id="hardware">Hardware</h1>
<p>Let’s start with hardware. This would include the server PC and some basic peripherals: I didn’t even bother to buy a monitor or a mouse, as I only intended to use this machine remotely from CLI. My main considerations were performance in machine learning tasks and extensibility in case I decided to upgrade at some point. This is the <a href="https://uk.pcpartpicker.com/list/tKjTzM">config I came up with</a>.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Type</th>
<th style="text-align: left">Item</th>
<th style="text-align: left">Price</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><strong>Video Card</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/63yxFT/evga-video-card-08gp46183">EVGA GeForce GTX 1080 8GB Superclocked Gaming ACX 3.0 Video Card</a></td>
<td style="text-align: left">£629.84</td>
</tr>
<tr>
<td style="text-align: left"><strong>Motherboard</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/LsX2FT/asus-motherboard-z170pro">Asus Z170-PRO ATX LGA1151 Motherboard</a></td>
<td style="text-align: left">£129.99</td>
</tr>
<tr>
<td style="text-align: left"><strong>CPU</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/rK4NnQ/intel-cpu-bx80662i56400">Intel Core i5-6400 2.7GHz Quad-Core Processor</a></td>
<td style="text-align: left">£161.99</td>
</tr>
<tr>
<td style="text-align: left"><strong>Memory</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/YPX2FT/corsair-vengeance-lpx-32gb-2-x-16gb-ddr4-3200-memory-cmk32gx4m2b3200c16w">Corsair Vengeance LPX 32GB (2 × 16GB) DDR4-3200 Memory</a></td>
<td style="text-align: left">£182.86</td>
</tr>
<tr>
<td style="text-align: left"><strong>Storage</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/RbvZxr/samsung-internal-hard-drive-mz75e1t0bam">Samsung 850 EVO-Series 1TB 2.5” Solid State Drive</a></td>
<td style="text-align: left">£295.98</td>
</tr>
<tr>
<td style="text-align: left"><strong>Power Supply</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/9q4NnQ/evga-power-supply-220g20650y1">EVGA SuperNOVA G2 650W 80+ Gold Certified Fully-Modular ATX Power Supply</a></td>
<td style="text-align: left">£89.99</td>
</tr>
<tr>
<td style="text-align: left"><strong>Case</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/Vpdqqs/nzxt-case-cas340ww1">NZXT S340 (White) ATX Mid Tower Case</a></td>
<td style="text-align: left">£59.98</td>
</tr>
<tr>
<td style="text-align: left"><strong>Keyboard</strong></td>
<td style="text-align: left"><a href="https://uk.pcpartpicker.com/product/2PnG3C/microsoft-keyboard-anb00006">Microsoft ANB-00006 Wired Slim Keyboard</a></td>
<td style="text-align: left">£11.63</td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: left"> </td>
<td style="text-align: left"><strong>£1562.26</strong></td>
</tr>
</tbody>
</table>
<figure class="third ">
<a href="//navoshta.com/images/posts/fenton/evga-geforce-gtx-1080.jpg" title="EVGA GeForce GTX 1080 8GB Superclocked">
<img src="//navoshta.com/images/posts/fenton/evga-geforce-gtx-1080.jpg" alt="EVGA GeForce GTX 1080 8GB Superclocked" />
</a>
<a href="//navoshta.com/images/posts/fenton/evga-geforce-gtx-1080_2.jpg" title="EVGA GeForce GTX 1080 8GB Superclocked">
<img src="//navoshta.com/images/posts/fenton/evga-geforce-gtx-1080_2.jpg" alt="EVGA GeForce GTX 1080 8GB Superclocked" />
</a>
<a href="//navoshta.com/images/posts/fenton/evga-geforce-gtx-1080_3.jpg" title="EVGA GeForce GTX 1080 8GB Superclocked">
<img src="//navoshta.com/images/posts/fenton/evga-geforce-gtx-1080_3.jpg" alt="EVGA GeForce GTX 1080 8GB Superclocked" />
</a>
<a href="//navoshta.com/images/posts/fenton/asus-z170.jpg" title="ASUS Z170 Pro">
<img src="//navoshta.com/images/posts/fenton/asus-z170.jpg" alt="ASUS Z170 Pro" />
</a>
<a href="//navoshta.com/images/posts/fenton/core-i5.jpg" title="Intel Core i5-6400">
<img src="//navoshta.com/images/posts/fenton/core-i5.jpg" alt="Intel Core i5-6400" />
</a>
<a href="//navoshta.com/images/posts/fenton/vengeance-ram.jpg" title="Corsair Vengeance LPX 32GB (2 × 16GB) DDR4-3200">
<img src="//navoshta.com/images/posts/fenton/vengeance-ram.jpg" alt="Corsair Vengeance LPX 32GB (2 × 16GB) DDR4-3200" />
</a>
<a href="//navoshta.com/images/posts/fenton/samsung-850-evo.jpg" title="Samsung 850 Evo 1 TB SSD">
<img src="//navoshta.com/images/posts/fenton/samsung-850-evo.jpg" alt="Samsung 850 Evo 1 TB SSD" />
</a>
<a href="//navoshta.com/images/posts/fenton/evga-650-g2.jpg" title="EVGA SuperNOVA G2 650W">
<img src="//navoshta.com/images/posts/fenton/evga-650-g2.jpg" alt="EVGA SuperNOVA G2 650W" />
</a>
<a href="//navoshta.com/images/posts/fenton/nzxt.jpg" title="NZXT S340 ATX Mid Tower Case">
<img src="//navoshta.com/images/posts/fenton/nzxt.jpg" alt="NZXT S340 ATX Mid Tower Case" />
</a>
</figure>
<p>Let’s break this list down and I will elaborate on some of the choices I made.</p>
<h2 id="video-card">Video Card</h2>
<p>This is the most crucial part. After serious consideration and some budget juggling I decided to invest in an <strong>EVGA GeForce GTX 1080 8GB</strong> card backed by the <strong>Nvidia GTX 1080</strong> GPU. It is really snappy (and expensive), and in <a href="http://navoshta.com/cpu-vs-gpu/">this particular case</a> training only takes 15 minutes to run — 3 times faster than on a <strong>g2.2xlarge</strong> AWS machine! If you still feel hesitant, think of it this way: the faster your model runs, the more experiments you can carry out over the same period of time.</p>
<h2 id="motherboard">Motherboard</h2>
<p><strong>ASUS Z170 Pro</strong> had some nice reviews and, most importantly, is capable of handling a maximum of two massive GPUs like the GTX 1080. Yes, the GTX 1080 is pretty large and is going to take 2 PCI slots on your motherboard — something to keep in mind if you plan to stack them in future. The Asus Z170 even supports SLI, although you wouldn’t need it if you are only using the GPUs for machine learning tasks. It supports a maximum of 64 GB of RAM, which should also be enough if I decide to upgrade.</p>
<h2 id="cpu">CPU</h2>
<p>This part was easy. I simply went with what was not too expensive, and didn’t pursue any outstanding computational power here — at the time this happened to be the <strong>Intel Core i5-6400</strong>. I was thinking of buying a neat and quiet Noctua cooler at first, but the stock one seems to do the job and is pretty quiet as well, so I never bothered to replace it.</p>
<h2 id="ram">RAM</h2>
<p>I went with <strong>32GB (2 × 16GB) DDR4-3200</strong>, although it actually works at a lower clock rate. The important part was to get 2 × 16 GB modules, so that they only occupy 2 out of 4 available motherboard slots. This way, whenever I realise I need more RAM, I can simply get 2 more memory modules and bump it up to 64 GB.</p>
<h2 id="storage">Storage</h2>
<p>I decided to go with a <strong>Samsung 1 TB SSD</strong> as a system drive, and that is where the OS would go. Currently, however, I use it for everything, and still have the option of adding an additional 4–6 TB HDD when I start working with fairly large datasets.</p>
<h2 id="power-supply">Power supply</h2>
<p>Since my machine was supposed to be a server, it would be plugged in all the time. The <strong>EVGA SuperNOVA G2 650W</strong> has an automatic eco mode for times when you don’t use all of the machine’s power, and is 80+ Gold Certified. Thinking about it now, it would make sense to go up to 850W for potential upgrades, but 650W is more than enough for now. I would also highly recommend fully-modular power supplies, as they are so much easier to install.</p>
<h2 id="case">Case</h2>
<p>The main consideration here was to have a case that would support a potential upgrade, i.e. one that could fit the motherboard I decided to go with. The <strong>NZXT S340 ATX Mid Tower Case</strong> also turned out to be a pretty good choice in terms of cable management and looks!</p>
<h2 id="putting-it-together">Putting it together</h2>
<figure class="third ">
<a href="//navoshta.com/images/posts/fenton/piled.jpg" title="Everything piled together">
<img src="//navoshta.com/images/posts/fenton/piled.jpg" alt="Everything piled together" />
</a>
<a href="//navoshta.com/images/posts/fenton/installed.jpg" title="Everything put together">
<img src="//navoshta.com/images/posts/fenton/installed.jpg" alt="Everything put together" />
</a>
<a href="//navoshta.com/images/posts/fenton/install-ubuntu-2.jpg" title="Installing Ubuntu">
<img src="//navoshta.com/images/posts/fenton/install-ubuntu-2.jpg" alt="Installing Ubuntu" />
</a>
</figure>
<p>It took me a couple of hours to put everything together, but in my defense I had never done anything like that before, so it would probably take you less time if you are familiar with the process. Overall it is a pretty straightforward job, and it seemed like it would take some effort to screw things up big time.</p>
<p>Now, what I like most about this setup is a room for extension. If at some point I decide that it is not enough for my needs, there are a bunch of things I can improve by simply plugging something in, rather than replacing:</p>
<ul>
<li>Install 32 GB more RAM, resulting in 64 GB altogether.</li>
<li>Install additional storage with a 4–6 TB HDD.</li>
<li>Install another GPU, resulting in 2 × GTX 1080 setup.</li>
</ul>
<h1 id="software">Software</h1>
<h2 id="operating-system">Operating System</h2>
<p>It was supposed to be a server and it had to support all the modern machine learning libraries and frameworks, so I decided to go with <strong>Ubuntu 16.04</strong> as an operating system. It has a nice CLI, and I am familiar with Unix systems as I have macOS installed on my personal computer. I then installed most of the required frameworks and libraries with <strong><a href="https://www.continuum.io">Anaconda</a></strong> (apart from CUDA dependencies and <strong><a href="https://www.tensorflow.org">TensorFlow</a></strong>), and it was time to make my server accessible.</p>
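<p>For reference, a typical Anaconda-based setup could look roughly like this; the package list below is illustrative rather than the exact set I installed.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Install common scientific Python packages via conda
conda install numpy pandas matplotlib scikit-learn scikit-image jupyter
# TensorFlow with GPU support is installed separately
pip install tensorflow-gpu
</code></pre></div></div>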
<h2 id="ssh">SSH</h2>
<p>The easiest way to get hold of your server from another machine is by configuring <strong><a href="https://en.wikipedia.org/wiki/Secure_Shell#Key_management">SSH access with a key</a></strong>. The process is fairly straightforward and is explained in great detail <a href="https://www.digitalocean.com/community/tutorials/how-to-use-ssh-to-connect-to-a-remote-server-in-ubuntu">here</a>, and if you are less familiar with communication commands in Linux, you may want to check out <a href="https://www.guru99.com/communication-in-linux.html">this course</a>. Basically, you want your server to allow SSH connections, authenticating users with a key pair. You generate this key pair on your primary machine (the one you connect from), keeping your <em>private</em> key private and transferring the corresponding <em>public</em> key to the server. You then tell the server that this is <em>your</em> public key, so whoever knocks with the corresponding private key must be you.</p>
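<p>In practice this boils down to a couple of commands on your local machine (a sketch, using the same example user and address as in the snippets that follow).</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Generate a key pair locally; the private key never leaves this machine
ssh-keygen -t rsa -b 4096
# Append the public key to the server's list of authorized keys
ssh-copy-id tom@10.23.45.67
</code></pre></div></div>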
<p>Now, all of this will work while you are on the same local network. If you want to make the server accessible to the outside world, you may need to request a static IP from your provider, or install some sort of a <a href="https://en.wikipedia.org/wiki/Dynamic_DNS">dynamic DNS</a> daemon on your server (there are a couple of free services that allow that). You may also want to check your router settings first, as some routers support dynamic DNS services out of the box. Once you get hold of your machine’s domain name or IP, you can open a random port for SSH access in your router settings (and one for the Jupyter Notebook to broadcast its frontend). This is basically all it takes to make your server accessible from anywhere in the world (and this is why it is essential to secure your server with a key).</p>
<p class="notice"><strong>Don’t forget to set SSH keys!</strong> Exposing your server to the outside world is dangerous, for internet is dark and full of terrors. You don’t want those wildlings to hack into your machine.</p>
<h2 id="ssh-file-system">SSH File system</h2>
<p>Although the command line may seem like a user-friendly interface to some, there is an alternative way of accessing your server’s file system called <strong>SSH File system</strong>. It allows you to mount a portion of the file system on your remote machine to a local folder. The coolest thing about it is that once it is mounted, you can use any software you like to work with these mounted folders, be it an IDE or your favourite GUI git client. Things will definitely seem slower, but overall it should work just as if you had all those remote files locally.</p>
<p>If your user on the server machine happens to be <code class="language-plaintext highlighter-rouge">tom</code> and server’s IP is <code class="language-plaintext highlighter-rouge">10.23.45.67</code>, this would mount your entire server home directory to <code class="language-plaintext highlighter-rouge">~/your/mount/folder/</code> on your local machine.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sshfs <span class="nt">-o</span> delay_connect,reconnect,ServerAliveInterval<span class="o">=</span>5,ServerAliveCountMax<span class="o">=</span>3,allow_other,defer_permissions,IdentityFile<span class="o">=</span>/local/path/to/private/key tom@10.23.45.67:/home/tom ~/your/mount/folder/
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">/local/path/to/private/key</code> is, well, your local path to the private key for SSH access. Keep an eye on all those settings, as they are supposed to make the remote partition more stable in terms of retaining the connection. Finally, this is how you unmount the server file system.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>umount tom@10.23.45.67:/ &> /dev/null
</code></pre></div></div>
<p class="notice"><strong>Disclaimer:</strong> Keep in mind that many operations may seem way slower in macOS <em>Finder</em> as opposed to ssh-ing into the machine and using CLI. For instance, if you want to unzip an archive with a lot of files (say, a dataset) which is physically stored on your server, you may be tempted to open enclosing folder in <em>Finder</em> and open with <em>Archive Utility</em>. However this would be painfully slow, and a much faster way to do that would be this (see code below).</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Way faster than double-click in Finder</span>
ssh tom@10.23.45.67
<span class="nb">sudo </span>apt-get <span class="nb">install </span>unzip
<span class="nb">cd</span> ~/path/to/archive/folder/
unzip archive.zip
</code></pre></div></div>
<h2 id="jupyter-notebook">Jupyter Notebook</h2>
<p>Jupyter already has this amazingly flexible concept of using web pages as a frontend, essentially allowing you to run its backend anywhere. Setup and configuration were covered <a href="http://navoshta.com/aws-tensorflow/">in this post</a>; however, you may want to take it one step further and make sure Jupyter keeps running even after you disconnect from your server. I use <strong><a href="https://www.iterm2.com">iTerm</a></strong> as a terminal in macOS, which supports <strong><a href="https://en.wikipedia.org/wiki/Tmux">tmux</a></strong> sessions out of the box, so connecting to a long-living SSH session is as simple as the following.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh <span class="nt">-t</span> tom@10.23.45.67 <span class="s1">'tmux -CC attach'</span>
</code></pre></div></div>
<p>This would present a window attached to a tmux session, where you can start Jupyter Notebook server.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jupyter notebook
</code></pre></div></div>
<p>You can now close the window — the Jupyter process will stay there, whether you are connected to the remote machine over SSH or not. And, of course, you can always get back to it by attaching to the same <em>tmux</em> session.</p>
<p class="notice"><strong>Don’t forget to set password!</strong> A wise thing to do would be configuring a password for Jupyter’s web interface access. Make sure to check out <a href="http://navoshta.com/aws-tensorflow/">my AWS post</a> where I describe it in more detail.</p>
<h2 id="pycharm">PyCharm</h2>
<p><strong><a href="https://www.jetbrains.com/pycharm/">PyCharm</a></strong> is my favourite Python IDE, <strong>PyCharm Community Edition</strong> is free but doesn’t support remote interpreters unfortunately, however <strong>PyCharm Professional</strong> does (and is not too expensive). You need to go through a cumbersome configuration of your project (which is described in depth <a href="https://medium.com/@erikhallstrm/work-remotely-with-pycharm-tensorflow-and-ssh-c60564be862d#.7sr7uresx">here</a>), but as a result you can work with your source code locally, and run it with a remote interpreter, leaving automatic syncing and deployment to PyCharm.</p>
<h2 id="monitoring">Monitoring</h2>
<p>Finally, I suggest installing a monitoring daemon on your remote machine, so that you can periodically check useful stats like CPU load, memory consumption, disk and network activity, etc. Ideally you want to monitor your GPU sensors as well, however I didn’t find any daemon-like monitoring software allowing that on Ubuntu — maybe you will have better luck with it.</p>
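<p>For ad-hoc checks you can always SSH in and poll the GPU sensors with Nvidia’s own CLI, for example:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Refresh GPU utilisation, memory and temperature readings every 5 seconds
watch -n 5 nvidia-smi
</code></pre></div></div>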
<p>What I decided to go with was <strong><a href="https://bjango.com/ios/istat/">iStat</a></strong>, which works with a wide range of sensors (the Nvidia GPU sensor is not on the list, unfortunately) and has a nice companion iOS app. This is what the training process looks like, for instance: the CPU is busy with some heavy on-the-go data augmentation, so you can see iStat’s CPU load graph exposing spikes for each training epoch.</p>
<figure class="half ">
<a href="//navoshta.com/images/posts/fenton/istat-1.jpg" title="iStat for iOS">
<img src="//navoshta.com/images/posts/fenton/istat-1.jpg" alt="iStat for iOS" />
</a>
<a href="//navoshta.com/images/posts/fenton/istat-2.jpg" title="iStat for iOS">
<img src="//navoshta.com/images/posts/fenton/istat-2.jpg" alt="iStat for iOS" />
</a>
</figure>
<h1 id="pick-a-name">Pick a name</h1>
<p>Arguably the most important step is picking your machine’s name. I named mine after <a href="https://www.youtube.com/watch?v=3GRSbr0EYYU">this famous dog</a>, probably because when making my first steps in data science, whenever my algorithm failed to learn I felt just as desperate and helpless as Fenton’s owner. Fortunately, this happens less and less often these days!</p>
<p align="center">
<img src="/images/posts/fenton/telegram_bot.jpg" alt="Telegram Bot" style="width: 375px;" />
</p>
<p style="text-align: center;" class="small">Fenton is a good <strike>bot</strike> boy, sending me messenger notifications when it finishes training</p>
<p>I also wrote a tiny shell script to make connecting to the remote machine easier. It allows me to SSH into it, mount its file system, or attach to a <em>tmux</em> session.</p>
<script src="https://gist.github.com/alexstaravoitau/e7860838e769dfed835418b38d8e069c.js"></script>
<p>Update the user/server/path settings, put this file in <code class="language-plaintext highlighter-rouge">/usr/local/bin</code> and make it executable.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Make fenton.sh executable</span>
<span class="nb">chmod</span> +x fenton.sh
</code></pre></div></div>
<p>You may also want to remove the file extension to do less typing in the CLI. Here is a list of available commands.</p>
<table>
<thead>
<tr>
<th>Command</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext highlighter-rouge">fenton</code></td>
<td>Connects via SSH.</td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">fenton -fs</code></td>
<td>Mounts the remote machine’s file system to <code class="language-plaintext highlighter-rouge">LOCAL_MOUNT_PATH</code>.</td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">fenton -jn</code></td>
<td>Attaches to a persistent <em>tmux</em> session, where I typically have my Jupyter Notebook running.</td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">fenton jesus christ</code></td>
<td>Couldn’t resist adding this one. Opens the Fenton video on YouTube.</td>
</tr>
</tbody>
</table>
<p>You are all set! Having your own dedicated machine allows you to do incredible things, like kicking off a background training job that is expected to run for hours or days, periodically checking on it. You could even receive notifications and updates on how the training is going using <a href="http://navoshta.com/cloud-log/">my cloud logger</a>! The main thing however is that you don’t need to worry anymore that your personal computer is not powerful enough for machine learning tasks, since there is a ton of computational power always accessible to you from anywhere in the world.</p>Alex StaravoitauThis is how I built and configured my dedicated data science machine that acts as a remote backend for Jupyter Notebook and PyCharm. It is backed by a powerful Nvidia GPU and is accessible from anywhere, so that when it comes to machine learning tasks I am no longer constrained by my personal computer hardware performance.End-to-end learning for self-driving cars2017-02-05T00:00:00+00:002017-02-05T00:00:00+00:00//navoshta.com/end-to-end-deep-learning<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-none"></i> Contents</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#dataset" id="markdown-toc-dataset">Dataset</a> <ul>
<li><a href="#data-collection" id="markdown-toc-data-collection">Data collection</a></li>
<li><a href="#balancing-dataset" id="markdown-toc-balancing-dataset">Balancing dataset</a></li>
<li><a href="#data-augmentation" id="markdown-toc-data-augmentation">Data augmentation</a></li>
</ul>
</li>
<li><a href="#model" id="markdown-toc-model">Model</a></li>
<li><a href="#results" id="markdown-toc-results">Results</a></li>
</ul>
</nav>
</aside>
<p>I’m assuming you already know a fair bit about neural networks and regularization, as I won’t go into too much detail about their background and how they work. I am using <strong>Keras</strong> with the TensorFlow backend as an ML framework, plus a couple of dependencies like <code class="language-plaintext highlighter-rouge">numpy</code>, <code class="language-plaintext highlighter-rouge">pandas</code> and <code class="language-plaintext highlighter-rouge">scikit-image</code>. You may want to check out the <a href="https://github.com/alexstaravoitau/behavioral-cloning" target="_blank">code of the final solution</a> I am describing in this tutorial; however, keep in mind that if you would like to follow along, you may well need a machine with a CUDA-capable GPU.</p>
<p class="notice">Training a model to drive a car in a simulator is one of the assignments in <a href="http://udacity.com/drive"><strong>Udacity Self-Driving Car Nanodegree</strong></a> program, however the concepts described here should be easy to follow even without that context.</p>
<h2 id="dataset">Dataset</h2>
<p>The provided driving simulator had two different tracks. One of them was used for collecting training data, and the other one — never seen by the model — served as a substitute for a test set.</p>
<h3 id="data-collection">Data collection</h3>
<p>The driving simulator would save frames from three front-facing “cameras” recording data from the car’s point of view, as well as various driving statistics like throttle, speed and steering angle. We are going to use the camera data as model input and expect the model to predict the steering angle in the <code class="language-plaintext highlighter-rouge">[-1, 1]</code> range.</p>
<p>I have collected a dataset containing approximately <strong>1 hour worth of driving data</strong> around one of the given tracks. It contains both driving in <em>“smooth”</em> mode (staying right in the middle of the road for the whole lap) and in <em>“recovery”</em> mode (letting the car drive off center and then interfering to steer it back to the middle).</p>
<h3 id="balancing-dataset">Balancing dataset</h3>
<p>Just as one would expect, the resulting dataset was extremely unbalanced, with a lot of examples where the steering angle is close to <code class="language-plaintext highlighter-rouge">0</code> (e.g. when the wheel is “at rest” and not steering while driving in a straight line). So I applied random sampling designed to keep the data as balanced across steering angles as possible. This process included splitting steering angles into <code class="language-plaintext highlighter-rouge">n</code> bins and using at most <code class="language-plaintext highlighter-rouge">200</code> frames for each bin:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">read_csv</span><span class="p">(</span><span class="s">'data/driving_log.csv'</span><span class="p">)</span>
<span class="n">balanced</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">()</span> <span class="c1"># Balanced dataset
</span><span class="n">bins</span> <span class="o">=</span> <span class="mi">1000</span> <span class="c1"># N of bins
</span><span class="n">bin_n</span> <span class="o">=</span> <span class="mi">200</span> <span class="c1"># N of examples to include in each bin (at most)
</span>
<span class="n">start</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">end</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="n">bins</span><span class="p">):</span>
<span class="n">df_range</span> <span class="o">=</span> <span class="n">df</span><span class="p">[(</span><span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">steering</span><span class="p">)</span> <span class="o">>=</span> <span class="n">start</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">absolute</span><span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">steering</span><span class="p">)</span> <span class="o"><</span> <span class="n">end</span><span class="p">)]</span>
<span class="n">range_n</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">bin_n</span><span class="p">,</span> <span class="n">df_range</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">balanced</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">([</span><span class="n">balanced</span><span class="p">,</span> <span class="n">df_range</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">range_n</span><span class="p">)])</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">end</span>
<span class="n">balanced</span><span class="p">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s">'data/driving_log_balanced.csv'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<p>The histogram of the resulting dataset looks fairly balanced across the most “popular” steering angles.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/end-to-end-deep-learning/training_dataset_hist.png" alt="image-center" class="align-center" />
Dataset histogram</p>
<p>Please mind that we are balancing the dataset across <em>absolute</em> values: by applying a horizontal flip during augmentation we end up using both positive and negative steering angles for each frame.</p>
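<p>If you would like to reproduce the histogram yourself, something along these lines should do — a sketch assuming <code class="language-plaintext highlighter-rouge">matplotlib</code> is available and the <code class="language-plaintext highlighter-rouge">balanced</code> DataFrame from the snippet above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import matplotlib.pyplot as plt

# Distribution of absolute steering angles after balancing
plt.hist(np.absolute(balanced.steering), bins=100)
plt.xlabel('Absolute steering angle')
plt.ylabel('Number of frames')
plt.show()
</code></pre></div></div>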
<h3 id="data-augmentation">Data augmentation</h3>
<p>After balancing ~1 hour worth of driving data we ended up with <strong>7,698 samples</strong>, which most likely wouldn’t be enough for the model to generalise well. However, as many have pointed out, there are a couple of augmentation tricks that should let you extend the dataset significantly:</p>
<ul>
<li><strong>Left and right cameras</strong>. Along with each sample we receive frames from 3 camera positions: left, center and right. Although we are only going to use the central camera while driving, we can still use the left and right camera data during training after applying a steering angle correction, increasing the number of examples by a factor of 3.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cameras</span> <span class="o">=</span> <span class="p">[</span><span class="s">'left'</span><span class="p">,</span> <span class="s">'center'</span><span class="p">,</span> <span class="s">'right'</span><span class="p">]</span>
<span class="n">steering_correction</span> <span class="o">=</span> <span class="p">[.</span><span class="mi">25</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="o">-</span><span class="p">.</span><span class="mi">25</span><span class="p">]</span>
<span class="n">camera</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">cameras</span><span class="p">))</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">mpimg</span><span class="p">.</span><span class="n">imread</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">cameras</span><span class="p">[</span><span class="n">camera</span><span class="p">]].</span><span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="n">angle</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">steering</span><span class="p">.</span><span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">steering_correction</span><span class="p">[</span><span class="n">camera</span><span class="p">]</span>
</code></pre></div></div>
<ul>
<li><strong>Horizontal flip</strong>. For every batch we flip half of the frames horizontally and change the sign of the steering angle, increasing the number of examples by yet another factor of 2.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">flip_indices</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="mi">2</span><span class="p">))</span>
<span class="n">x</span><span class="p">[</span><span class="n">flip_indices</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">flip_indices</span><span class="p">,</span> <span class="p">:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">y</span><span class="p">[</span><span class="n">flip_indices</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="n">y</span><span class="p">[</span><span class="n">flip_indices</span><span class="p">]</span>
</code></pre></div></div>
<ul>
<li><strong>Vertical shift</strong>. We cut out the insignificant top and bottom portions of the image during preprocessing, and choosing the amount to crop at random should increase the model’s ability to generalise.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">top</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(.</span><span class="mi">325</span><span class="p">,</span> <span class="p">.</span><span class="mi">425</span><span class="p">)</span> <span class="o">*</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">bottom</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(.</span><span class="mi">075</span><span class="p">,</span> <span class="p">.</span><span class="mi">175</span><span class="p">)</span> <span class="o">*</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">[</span><span class="n">top</span><span class="p">:</span><span class="o">-</span><span class="n">bottom</span><span class="p">,</span> <span class="p">:]</span>
</code></pre></div></div>
<ul>
<li><strong>Random shadow</strong>. We add a random vertical “shadow” by decreasing brightness of a frame slice, hoping to make the model invariant to actual shadows on the road.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">h</span><span class="p">,</span> <span class="n">w</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="p">[</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">k</span> <span class="o">=</span> <span class="n">h</span> <span class="o">/</span> <span class="p">(</span><span class="n">x2</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="o">-</span> <span class="n">k</span> <span class="o">*</span> <span class="n">x1</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">h</span><span class="p">):</span>
<span class="n">c</span> <span class="o">=</span> <span class="nb">int</span><span class="p">((</span><span class="n">i</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span> <span class="o">/</span> <span class="n">k</span><span class="p">)</span>
<span class="n">image</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:</span><span class="n">c</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="p">(</span><span class="n">image</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:</span><span class="n">c</span><span class="p">,</span> <span class="p">:]</span> <span class="o">*</span> <span class="p">.</span><span class="mi">5</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">int32</span><span class="p">)</span>
</code></pre></div></div>
<p>We then preprocess each frame by cropping the top and bottom of the image and resizing it to the shape our model expects (<code class="language-plaintext highlighter-rouge">32×128×3</code>, RGB pixel intensities of a 32×128 image). The resizing operation also takes care of scaling pixel values to <code class="language-plaintext highlighter-rouge">[0, 1]</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="n">skimage</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
</code></pre></div></div>
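<p>Putting the crop and resize together, per-frame preprocessing boils down to something like the sketch below. The <code class="language-plaintext highlighter-rouge">preprocess_frame</code> helper and its default crop fractions (midpoints of the random ranges above) are assumptions for illustration; the actual implementation may differ.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import skimage.transform

def preprocess_frame(image, top=.375, bottom=.125):
    # Crop out the sky above the road and the hood of the car
    top_px = int(top * image.shape[0])
    bottom_px = int(bottom * image.shape[0])
    image = image[top_px:-bottom_px, :]
    # Resize to the model input shape; this also scales pixel values to [0, 1]
    return skimage.transform.resize(image, (32, 128, 3))
</code></pre></div></div>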
<p>To make better sense of it, let’s consider an example of a <strong>single recorded sample</strong> that we turn into <strong>16 training samples</strong> by using frames from all three cameras and applying the aforementioned augmentation pipeline.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/end-to-end-deep-learning/frames_original.png" alt="image-center" class="align-center" />
Original frames</p>
<p style="text-align: center;" class="small"><img src="/images/posts/end-to-end-deep-learning/frames_augmented.png" alt="image-center" class="align-center" />
Augmented and preprocessed frames</p>
<p>The augmentation pipeline is applied in <a href="https://github.com/alexstaravoitau/behavioral-cloning/blob/master/data.py" target="_blank"><code class="language-plaintext highlighter-rouge">data.py</code></a> using a Keras generator, which lets us run it in real time on the CPU while the GPU is busy backpropagating!</p>
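<p>For a rough idea of what such a generator involves, here is a minimal sketch. It assumes the <code class="language-plaintext highlighter-rouge">cameras</code>, <code class="language-plaintext highlighter-rouge">steering_correction</code> and <code class="language-plaintext highlighter-rouge">preprocess_frame</code> definitions from above; the real version in <code class="language-plaintext highlighter-rouge">data.py</code> differs in details, such as the exact augmentation steps applied per frame.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import matplotlib.image as mpimg

def batch_generator(data, batch_size=128):
    # Loop over the dataset forever in random order, yielding one batch at a time
    while True:
        indices = np.random.permutation(data.shape[0])
        for start in range(0, data.shape[0], batch_size):
            batch_indices = indices[start:start + batch_size]
            x, y = [], []
            for i in batch_indices:
                # Pick a random camera and correct the steering angle accordingly
                camera = np.random.randint(len(cameras))
                image = mpimg.imread(data[cameras[camera]].values[i])
                angle = data.steering.values[i] + steering_correction[camera]
                x.append(preprocess_frame(image))
                y.append(angle)
            yield np.array(x), np.array(y)
</code></pre></div></div>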
<h2 id="model">Model</h2>
<p>I started with the model described in the <a href="https://arxiv.org/abs/1604.07316" target="_blank">Nvidia paper</a> and kept simplifying and optimising it while making sure it performed well on both tracks. It was clear we wouldn’t need a model that complicated, as the data we are working with is way simpler and much more constrained than what the Nvidia team had to deal with when running their model. Eventually I settled on a fairly simple architecture with <strong>3 convolutional layers and 3 fully connected layers</strong>.</p>
<figure>
<a href="/images/posts/end-to-end-deep-learning/model.png"><img src="/images/posts/end-to-end-deep-learning/model.png" /></a>
</figure>
<p style="text-align: center;" class="small">Model architecture</p>
<p>This model can be very briefly encoded with Keras.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">keras</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">core</span><span class="p">,</span> <span class="n">convolutional</span><span class="p">,</span> <span class="n">pooling</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">convolutional</span><span class="p">.</span><span class="n">Convolution2D</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">pooling</span><span class="p">.</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">convolutional</span><span class="p">.</span><span class="n">Convolution2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">pooling</span><span class="p">.</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">convolutional</span><span class="p">.</span><span class="n">Convolution2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">pooling</span><span class="p">.</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">core</span><span class="p">.</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">core</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">500</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">core</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">core</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">core</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>
<p>I added dropout on 2 out of 3 dense layers to prevent overfitting (not shown in the snippet above), and the model proved to generalise quite well. The model was trained using the <strong>Adam optimiser</strong> with a <strong>learning rate of <code class="language-plaintext highlighter-rouge">1e-04</code></strong> and <strong>mean squared error</strong> as the loss function. I used 20% of the training data for validation (which means that we only used <strong>6,158 out of 7,698 examples</strong> for training), and the model seems to perform quite well after training for <strong>~20 epochs</strong> — you can find the code related to training in <a href="https://github.com/alexstaravoitau/behavioral-cloning/blob/master/model.py" target="_blank"><code class="language-plaintext highlighter-rouge">model.py</code></a>.</p>
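<p>In Keras 1 terms, the training setup might boil down to something like this sketch. The <code class="language-plaintext highlighter-rouge">train_data</code>/<code class="language-plaintext highlighter-rouge">valid_data</code> split, the <code class="language-plaintext highlighter-rouge">batch_generator</code> from earlier and the sample counts (an 80%/20% split of 7,698) are assumptions made here for illustration; see <code class="language-plaintext highlighter-rouge">model.py</code> for the actual training code.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from keras.optimizers import Adam

# Mean squared error loss, Adam with a lowered learning rate
model.compile(optimizer=Adam(lr=1e-04), loss='mean_squared_error')
model.fit_generator(
    batch_generator(train_data),
    samples_per_epoch=6158,
    nb_epoch=20,
    validation_data=batch_generator(valid_data),
    nb_val_samples=1540
)
</code></pre></div></div>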
<h2 id="results">Results</h2>
<p>The car manages to drive just fine on both tracks pretty much endlessly. It rarely strays from the middle of the road; this is what driving looks like on track 2 (previously unseen).</p>
<p style="text-align: center;" class="small"><img src="/images/posts/end-to-end-deep-learning/track_2.gif" alt="image-center" class="align-center" />
Driving autonomously on a previously unseen track</p>
<p>You can check out a longer <a href="https://www.youtube.com/watch?v=J72Q9A0GeEo" target="_blank">highlights compilation video</a> of the car driving itself on both tracks.</p>
<p>Clearly this is a very basic example of end-to-end learning for self-driving cars; nevertheless, it should give a rough idea of what these models are capable of, even considering all the limitations of training and validating solely in a virtual driving simulator.</p>
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>Alex StaravoitauThe goal of this project was to train a end-to-end deep learning model that would let a car drive itself around the track in a driving simulator. The approach I took was based on a paper by Nvidia research team with a significantly simplified architecture that was optimised for this specific project.Traffic signs classification with a convolutional network2017-01-15T00:00:00+00:002017-01-15T00:00:00+00:00//navoshta.com/traffic-signs-classification<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-none"></i> Contents</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#dataset" id="markdown-toc-dataset">Dataset</a></li>
<li><a href="#preprocessing" id="markdown-toc-preprocessing">Preprocessing</a></li>
<li><a href="#augmentation" id="markdown-toc-augmentation">Augmentation</a> <ul>
<li><a href="#flipping" id="markdown-toc-flipping">Flipping</a></li>
<li><a href="#rotation-and-projection" id="markdown-toc-rotation-and-projection">Rotation and projection</a></li>
</ul>
</li>
<li><a href="#model" id="markdown-toc-model">Model</a> <ul>
<li><a href="#architecture" id="markdown-toc-architecture">Architecture</a></li>
<li><a href="#regularization" id="markdown-toc-regularization">Regularization</a></li>
<li><a href="#implementation" id="markdown-toc-implementation">Implementation</a></li>
</ul>
</li>
<li><a href="#training" id="markdown-toc-training">Training</a></li>
<li><a href="#visualization" id="markdown-toc-visualization">Visualization</a></li>
<li><a href="#results" id="markdown-toc-results">Results</a></li>
</ul>
</nav>
</aside>
<p>I’m assuming you already know a fair bit about neural networks and regularization, as I won’t go into too much detail about their background and how they work. I am using <strong>TensorFlow</strong> as the ML framework, along with a couple of dependencies like <code class="language-plaintext highlighter-rouge">numpy</code>, <code class="language-plaintext highlighter-rouge">matplotlib</code> and <code class="language-plaintext highlighter-rouge">scikit-image</code>. In case you are not familiar with TensorFlow, make sure to check out <a href="http://navoshta.com/facial-with-tensorflow/" target="_blank">my recent post</a> about its core concepts.</p>
<p>If you would like to follow along, you will also need a machine with a CUDA-capable GPU and all dependencies installed. Here is a <a href="https://github.com/alexstaravoitau/traffic-signs/blob/master/Traffic_Signs_Recognition.ipynb" target="_blank">Jupyter notebook with the final solution</a> I am describing in this tutorial; if you run through all of its cells, you should get the same results.</p>
<h2 id="dataset">Dataset</h2>
<p>The <a href="http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset" target="_blank">German Traffic Sign Dataset</a> consists of <strong>39,209 32×32 px color images</strong> that we are supposed to use for training, and <strong>12,630 images</strong> that we will use for testing. Each image is a photo of a traffic sign belonging to one of 43 classes, i.e. traffic sign types.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/HiojuukJimAAAAAElFTkSuQmCC.png" alt="image-center" class="align-center" />
Random dataset sample</p>
<p>Each image is a 32×32×3 array of pixel intensities, represented as <code class="language-plaintext highlighter-rouge">[0, 255]</code> integer values in RGB color space. The class of each image is encoded as an integer in the 0 to 42 range. Let’s check if the training dataset is balanced across classes.</p>
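<p>A quick way to eyeball the class distribution — a sketch assuming the labels are loaded into an integer <code class="language-plaintext highlighter-rouge">y_train</code> array:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import matplotlib.pyplot as plt

# Count how many training examples fall into each of the 43 classes
counts = np.bincount(y_train, minlength=43)
plt.bar(np.arange(43), counts)
plt.xlabel('Class')
plt.ylabel('Number of examples')
plt.show()
</code></pre></div></div>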
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/yGIoVOF9s+D6SauJlGkmSVCsv00iSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFqZjEiSpFr9f+oLc6HSvr24AAAAAElFTkSuQmCC.png" alt="image-center" class="align-center" />
Dataset classes distribution</p>
<p>Apparently the dataset is very unbalanced, and some classes are represented significantly better than others. Let’s now plot a bunch of random images for various classes to see what we are working with.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/wGGNjp6MlRqbwAAAABJRU5ErkJggg==.png" alt="image-center" class="align-center" />
Yield</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/CxM7UvcMAPAAAAAElFTkSuQmCC.png" alt="image-center" class="align-center" />
No entry</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/lPr0ICbgAAAABJRU5ErkJggg==.png" alt="image-center" class="align-center" />
General caution</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/wetDaG2jcBk+gAAAABJRU5ErkJggg==.png" alt="image-center" class="align-center" />
Roundabout mandatory</p>
<p>The images differ significantly in terms of contrast and brightness, so we will need to apply some kind of histogram equalization; this should noticeably improve feature extraction.</p>
<h2 id="preprocessing">Preprocessing</h2>
<p>The usual preprocessing in this case would include scaling pixel values to <code class="language-plaintext highlighter-rouge">[0, 1]</code> (as they are currently in the <code class="language-plaintext highlighter-rouge">[0, 255]</code> range), representing labels in a one-hot encoding, and shuffling. Looking at the images, histogram equalization may be helpful as well. We will apply <em>localized</em> histogram equalization, as it seems to improve feature extraction even further in our case.</p>
<p>I will only use a single channel in my model, i.e. grayscale images instead of color ones. As Pierre Sermanet and Yann LeCun mentioned in <a href="http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf" target="_blank">their paper</a>, using color channels didn’t seem to improve things a lot, so I will only take the <code class="language-plaintext highlighter-rouge">Y</code> channel of the <code class="language-plaintext highlighter-rouge">YCbCr</code> representation of an image.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">sklearn.utils</span> <span class="kn">import</span> <span class="n">shuffle</span>
<span class="kn">from</span> <span class="nn">skimage</span> <span class="kn">import</span> <span class="n">exposure</span>
<span class="k">def</span> <span class="nf">preprocess_dataset</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="bp">None</span><span class="p">):</span>
<span class="c1">#Convert to grayscale, e.g. single Y channel
</span> <span class="n">X</span> <span class="o">=</span> <span class="mf">0.299</span> <span class="o">*</span> <span class="n">X</span><span class="p">[:,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="mf">0.587</span> <span class="o">*</span> <span class="n">X</span><span class="p">[:,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mf">0.114</span> <span class="o">*</span> <span class="n">X</span><span class="p">[:,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="mi">2</span><span class="p">]</span>
<span class="c1">#Scale features to be in [0, 1]
</span> <span class="n">X</span> <span class="o">=</span> <span class="p">(</span><span class="n">X</span> <span class="o">/</span> <span class="mf">255.</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="c1"># Apply localized histogram localization
</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):</span>
<span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">exposure</span><span class="p">.</span><span class="n">equalize_adapthist</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="k">if</span> <span class="n">y</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="c1"># Convert to one-hot encoding. Convert back with `y = y.nonzero()[1]`
</span> <span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">43</span><span class="p">)[</span><span class="n">y</span><span class="p">]</span>
<span class="c1"># Shuffle the data
</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">shuffle</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="c1"># Add a single grayscale channel
</span> <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="p">,))</span>
<span class="k">return</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span>
</code></pre></div></div>
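<p>Applied to the raw data (variable names assumed), this might look like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Preprocess both splits; labels come back one-hot encoded
X_train, y_train = preprocess_dataset(X_train, y_train)
X_test, y_test = preprocess_dataset(X_test, y_test)
</code></pre></div></div>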
<p>This is what original and preprocessed images look like:</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/vDGPI83pXTxsYM+yVh7kid5kid5kid5kid5kn8z8jdH5T3JkzzJkzzJkzzJkzzJ71yejLUneZIneZIneZIneZLfY3ky1p7kSZ7kSZ7kSZ7kSX6P5clYe5IneZIneZIneZIn+T2WJ2PtSZ7kSZ7kSZ7kSZ7k91iejLUneZIneZIneZIneZLfY3ky1p7kSZ7kSZ7kSZ7kSX6P5f8DZc6ez8Sy66QAAAAASUVORK5CYII=.png" alt="image-center" class="align-center" />
Original</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/fH5+9Nur3T2bA57T7e90qHNf0r6UWfH3rOyxmHv6bZXOns2m73D4XB8iwYd01kLBAKBQCAQCPw3OL7SPBAIBAKBQCDw3RHOWiAQCAQCgcAJRjhrgUAgEAgEAicY4awFAoFAIBAInGCEsxYIBAKBQCBwghHOWiAQCAQCgcAJRjhrgUAgEAgEAicYfwF7KOG348bCvwAAAABJRU5ErkJggg==.png" alt="image-center" class="align-center" />
Preprocessed</p>
<h2 id="augmentation">Augmentation</h2>
<p>The amount of data we have is not sufficient for a model to generalise well. It is also fairly unbalanced, and some classes are represented to a significantly lower extent than others. But we will fix this with data augmentation!</p>
<h3 id="flipping">Flipping</h3>
<p>First, we are going to apply a couple of tricks to extend our data by <em>flipping</em>. You might have noticed that some traffic signs are invariant to horizontal and/or vertical flipping, which basically means that we can flip an image and it should still be classified as belonging to the same class.</p>
<figure class="align-center" style="width: 500px">
<img src="/images/posts/traffic-signs-classification/aug_flip_h.png" alt="" />
</figure>
<figure class="align-center" style="width: 500px">
<img src="/images/posts/traffic-signs-classification/aug_flip_v.png" alt="" />
</figure>
<p>Some signs can be flipped either way — like <strong>Priority Road</strong> or <strong>No Entry</strong> signs.</p>
<figure class="align-center" style="width: 500px">
<img src="/images/posts/traffic-signs-classification/aug_flip_hv.png" alt="" />
</figure>
<p>Other signs are <em>180° rotation invariant</em>, and to rotate them 180° we will simply first flip them horizontally, and then vertically.</p>
<figure class="align-center" style="width: 500px">
<img src="/images/posts/traffic-signs-classification/aug_flip_h+v.png" alt="" />
</figure>
<p>Finally, there are signs that can be flipped but should then be classified as a sign of some other class. This is still useful, as we can use the data of these classes to extend their counterparts.</p>
<figure class="align-center" style="width: 500px">
<img src="/images/posts/traffic-signs-classification/aug_flip_hx.png" alt="" />
Turn left / Turn right
</figure>
<p>We are going to use this during augmentation. Let’s prepare a sign-flipping routine.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">def</span> <span class="nf">flip_extend</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="c1"># Classes of signs that, when flipped horizontally, should still be classified as the same class
</span> <span class="n">self_flippable_horizontally</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">18</span><span class="p">,</span> <span class="mi">22</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">35</span><span class="p">])</span>
<span class="c1"># Classes of signs that, when flipped vertically, should still be classified as the same class
</span> <span class="n">self_flippable_vertically</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">17</span><span class="p">])</span>
<span class="c1"># Classes of signs that, when flipped horizontally and then vertically, should still be classified as the same class
</span> <span class="n">self_flippable_both</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">32</span><span class="p">,</span> <span class="mi">40</span><span class="p">])</span>
<span class="c1"># Classes of signs that, when flipped horizontally, would still be meaningful, but should be classified as some other class
</span> <span class="n">cross_flippable</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mi">19</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span>
<span class="p">[</span><span class="mi">33</span><span class="p">,</span> <span class="mi">34</span><span class="p">],</span>
<span class="p">[</span><span class="mi">36</span><span class="p">,</span> <span class="mi">37</span><span class="p">],</span>
<span class="p">[</span><span class="mi">38</span><span class="p">,</span> <span class="mi">39</span><span class="p">],</span>
<span class="p">[</span><span class="mi">20</span><span class="p">,</span> <span class="mi">19</span><span class="p">],</span>
<span class="p">[</span><span class="mi">34</span><span class="p">,</span> <span class="mi">33</span><span class="p">],</span>
<span class="p">[</span><span class="mi">37</span><span class="p">,</span> <span class="mi">36</span><span class="p">],</span>
<span class="p">[</span><span class="mi">39</span><span class="p">,</span> <span class="mi">38</span><span class="p">],</span>
<span class="p">])</span>
<span class="n">num_classes</span> <span class="o">=</span> <span class="mi">43</span>
<span class="n">X_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">3</span><span class="p">]],</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="n">y_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">([</span><span class="mi">0</span><span class="p">],</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">y</span><span class="p">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_classes</span><span class="p">):</span>
<span class="c1"># First copy existing data for this class
</span> <span class="n">X_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">X_extended</span><span class="p">,</span> <span class="n">X</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">c</span><span class="p">],</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># If we can flip images of this class horizontally and they would still belong to said class...
</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">self_flippable_horizontally</span><span class="p">:</span>
<span class="c1"># ...Copy their flipped versions into extended array.
</span> <span class="n">X_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">X_extended</span><span class="p">,</span> <span class="n">X</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">c</span><span class="p">][:,</span> <span class="p">:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:],</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># If we can flip images of this class horizontally and they would belong to other class...
</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">cross_flippable</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]:</span>
<span class="c1"># ...Copy flipped images of that other class to the extended array.
</span> <span class="n">flip_class</span> <span class="o">=</span> <span class="n">cross_flippable</span><span class="p">[</span><span class="n">cross_flippable</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">c</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>
<span class="n">X_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">X_extended</span><span class="p">,</span> <span class="n">X</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">flip_class</span><span class="p">][:,</span> <span class="p">:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:],</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># Fill labels for added images set to current class.
</span> <span class="n">y_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">y_extended</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">full</span><span class="p">((</span><span class="n">X_extended</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">y_extended</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">c</span><span class="p">,</span> <span class="n">dtype</span> <span class="o">=</span> <span class="nb">int</span><span class="p">))</span>
<span class="c1"># If we can flip images of this class vertically and they would still belong to said class...
</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">self_flippable_vertically</span><span class="p">:</span>
<span class="c1"># ...Copy their flipped versions into extended array.
</span> <span class="n">X_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">X_extended</span><span class="p">,</span> <span class="n">X_extended</span><span class="p">[</span><span class="n">y_extended</span> <span class="o">==</span> <span class="n">c</span><span class="p">][:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:,</span> <span class="p">:],</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># Fill labels for added images set to current class.
</span> <span class="n">y_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">y_extended</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">full</span><span class="p">((</span><span class="n">X_extended</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">y_extended</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">c</span><span class="p">,</span> <span class="n">dtype</span> <span class="o">=</span> <span class="nb">int</span><span class="p">))</span>
<span class="c1"># If we can flip images of this class horizontally AND vertically and they would still belong to said class...
</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">self_flippable_both</span><span class="p">:</span>
<span class="c1"># ...Copy their flipped versions into extended array.
</span> <span class="n">X_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">X_extended</span><span class="p">,</span> <span class="n">X_extended</span><span class="p">[</span><span class="n">y_extended</span> <span class="o">==</span> <span class="n">c</span><span class="p">][:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:],</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># Fill labels for added images set to current class.
</span> <span class="n">y_extended</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">y_extended</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">full</span><span class="p">((</span><span class="n">X_extended</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">y_extended</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">c</span><span class="p">,</span> <span class="n">dtype</span> <span class="o">=</span> <span class="nb">int</span><span class="p">))</span>
<span class="k">return</span> <span class="p">(</span><span class="n">X_extended</span><span class="p">,</span> <span class="n">y_extended</span><span class="p">)</span>
</code></pre></div></div>
<p>This simple trick lets us extend the original <strong>39,209</strong> training examples to <strong>63,538</strong>, nice! And it cost us nothing in terms of data collection or computational resources.</p>
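<p>Applying it is a one-liner (variable names assumed):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Extend the training set with flipped versions of eligible classes
X_train, y_train = flip_extend(X_train, y_train)
</code></pre></div></div>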
<h3 id="rotation-and-projection">Rotation and projection</h3>
<p>However, this is still not enough, and we need to augment even further. After experimenting with adding random <em>rotation</em>, <em>projection</em>, <em>blur</em>, <em>noise</em> and <em>gamma adjustment</em>, I settled on <em>rotation</em> and <em>projection</em> transformations in the pipeline. The projection transform also seems to take care of random shearing and scaling, as we randomly position image corners in a <code class="language-plaintext highlighter-rouge">[±delta, ±delta]</code> range.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">skimage.transform</span> <span class="kn">import</span> <span class="n">rotate</span>
<span class="kn">from</span> <span class="nn">skimage.transform</span> <span class="kn">import</span> <span class="n">warp</span>
<span class="kn">from</span> <span class="nn">skimage.transform</span> <span class="kn">import</span> <span class="n">ProjectiveTransform</span>
<span class="k">def</span> <span class="nf">rotate</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">intensity</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])):</span>
<span class="n">delta</span> <span class="o">=</span> <span class="mf">30.</span> <span class="o">*</span> <span class="n">intensity</span> <span class="c1"># scale using augmentation intensity
</span> <span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">rotate</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">delta</span><span class="p">,</span> <span class="n">delta</span><span class="p">),</span> <span class="n">mode</span> <span class="o">=</span> <span class="s">'edge'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">X</span>
<span class="k">def</span> <span class="nf">apply_projection_transform</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">intensity</span><span class="p">):</span>
<span class="n">image_size</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">d</span> <span class="o">=</span> <span class="n">image_size</span> <span class="o">*</span> <span class="mf">0.3</span> <span class="o">*</span> <span class="n">intensity</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])):</span>
<span class="n">tl_top</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Top left corner, top margin
</span> <span class="n">tl_left</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Top left corner, left margin
</span> <span class="n">bl_bottom</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Bottom left corner, bottom margin
</span> <span class="n">bl_left</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Bottom left corner, left margin
</span> <span class="n">tr_top</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Top right corner, top margin
</span> <span class="n">tr_right</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Top right corner, right margin
</span> <span class="n">br_bottom</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Bottom right corner, bottom margin
</span> <span class="n">br_right</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># Bottom right corner, right margin
</span>
<span class="n">transform</span> <span class="o">=</span> <span class="n">ProjectiveTransform</span><span class="p">()</span>
<span class="n">transform</span><span class="p">.</span><span class="n">estimate</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">((</span>
<span class="p">(</span><span class="n">tl_left</span><span class="p">,</span> <span class="n">tl_top</span><span class="p">),</span>
<span class="p">(</span><span class="n">bl_left</span><span class="p">,</span> <span class="n">image_size</span> <span class="o">-</span> <span class="n">bl_bottom</span><span class="p">),</span>
<span class="p">(</span><span class="n">image_size</span> <span class="o">-</span> <span class="n">br_right</span><span class="p">,</span> <span class="n">image_size</span> <span class="o">-</span> <span class="n">br_bottom</span><span class="p">),</span>
<span class="p">(</span><span class="n">image_size</span> <span class="o">-</span> <span class="n">tr_right</span><span class="p">,</span> <span class="n">tr_top</span><span class="p">)</span>
<span class="p">)),</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">((</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">image_size</span><span class="p">),</span>
<span class="p">(</span><span class="n">image_size</span><span class="p">,</span> <span class="n">image_size</span><span class="p">),</span>
<span class="p">(</span><span class="n">image_size</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">)))</span>
<span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">warp</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">transform</span><span class="p">,</span> <span class="n">output_shape</span><span class="o">=</span><span class="p">(</span><span class="n">image_size</span><span class="p">,</span> <span class="n">image_size</span><span class="p">),</span> <span class="n">order</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">mode</span> <span class="o">=</span> <span class="s">'edge'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">X</span>
</code></pre></div></div>
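<p>To illustrate how these helpers could be chained, here is a minimal sketch of a single augmentation call. Note that <code class="language-plaintext highlighter-rouge">apply_rotation_transform</code> is an assumed name for the rotation helper defined above, and <code class="language-plaintext highlighter-rouge">X_train</code> stands for the preprocessed training images.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def augment(X, intensity = 0.75):
    # Hypothetical wrapper chaining the augmentation steps defined above.
    X = apply_rotation_transform(X, intensity)
    X = apply_projection_transform(X, intensity)
    return X

# Augment a copy, so that the original examples are preserved.
X_augmented = augment(np.copy(X_train), intensity = 0.75)
</code></pre></div></div>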
<p>Please note that we use <code class="language-plaintext highlighter-rouge">edge</code> mode when applying these transformations, to ensure that we don’t end up with a black box around the warped image. Let’s check out what the images look like when we apply random augmentation with intensity = <code class="language-plaintext highlighter-rouge">0.75</code>.</p>
<table border="">
<tr>
<td align="center"><b>Original</b></td>
<td align="center"><b>Augmented (intensity = 0.75)</b></td>
</tr>
<tr>
<td><img src="/images/posts/traffic-signs-classification/aug_example_orig_1.png" alt="Original" /></td>
<td><img src="/images/posts/traffic-signs-classification/aug_example_aug_1.png" alt="Augmented" /></td>
</tr>
<tr>
<td><img src="/images/posts/traffic-signs-classification/aug_example_orig_2.png" alt="Original" /></td>
<td><img src="/images/posts/traffic-signs-classification/aug_example_aug_2.png" alt="Augmented" /></td>
</tr>
<tr>
<td><img src="/images/posts/traffic-signs-classification/aug_example_orig_3.png" alt="Original" /></td>
<td><img src="/images/posts/traffic-signs-classification/aug_example_aug_3.png" alt="Augmented" /></td>
</tr>
<tr>
<td><img src="/images/posts/traffic-signs-classification/aug_example_orig_4.png" alt="Original" /></td>
<td><img src="/images/posts/traffic-signs-classification/aug_example_aug_4.png" alt="Augmented" /></td>
</tr>
<tr>
<td><img src="/images/posts/traffic-signs-classification/aug_example_orig_5.png" alt="Original" /></td>
<td><img src="/images/posts/traffic-signs-classification/aug_example_aug_5.png" alt="Augmented" /></td>
</tr>
</table>
<h2 id="model">Model</h2>
<h3 id="architecture">Architecture</h3>
<p>I decided to use a deep neural network classifier as a model, which was inspired by <a href="http://navoshta.com/facial-with-tensorflow/" target="_blank">Daniel Nouri’s tutorial</a> and aforementioned <a href="http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf" target="_blank">Pierre Sermanet / Yann LeCun paper</a>. It is fairly simple and has 4 layers: <strong>3 convolutional layers</strong> for feature extraction and <strong>one fully connected layer</strong> as a classifier.</p>
<p align="center">
<a href="/images/posts/traffic-signs-classification/traffic-signs-architecture.png"><img src="/images/posts/traffic-signs-classification/traffic-signs-architecture.png" /></a>
</p>
<p style="text-align: center;" class="small">Model architecture</p>
<p>As opposed to usual strict feed-forward CNNs I use <strong>multi-scale features</strong>, which means that each convolutional layer’s output is not only forwarded into the subsequent layer, but is also branched off and fed into the classifier (i.e. the fully connected layer). Please mind that these branched-off outputs undergo additional max-pooling, so that all convolutions are proportionally subsampled before going into the classifier.</p>
<h3 id="regularization">Regularization</h3>
<p>I use the following regularization techniques to minimize overfitting to training data:</p>
<ul>
<li><strong>Dropout</strong>. Dropout is amazing and will drastically improve generalization of your model. Normally you may only want to apply dropout to fully connected layers, as shared weights in convolutional layers are good regularizers themselves. However, I did notice a slight improvement in performance when using a bit of dropout on convolutional layers as well, so I left it in, but kept it to a minimum:</li>
</ul>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Type Size keep_p Dropout
Layer 1 5x5 Conv 32 0.9 10% of neurons
Layer 2 5x5 Conv 64 0.8 20% of neurons
Layer 3 5x5 Conv 128 0.7 30% of neurons
Layer 4 FC 1024 0.5 50% of neurons
</code></pre></div></div>
<ul>
<li>
<p><strong>L2 Regularization</strong>. I ended up using <strong>lambda = 0.0001</strong>, which seemed to perform best. An important point here is that the L2 loss should only include the weights of the fully connected layers; it normally doesn’t include the bias terms, the intuition being that bias terms don’t contribute to overfitting, as they don’t add any new degrees of freedom to the model (see the sketch after this list).</p>
</li>
<li>
<p><strong>Early stopping</strong>. I use early stopping with a patience of <strong>100 epochs</strong> to capture the last best-performing weights and roll back when the model starts overfitting the training data. I use the validation set’s cross entropy loss as the early stopping metric; the intuition behind using it instead of accuracy is that if your model is <em>confident</em> about its predictions, it should generalize better.</p>
</li>
</ul>
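<p>As a sketch of how that L2 term could be wired into the total loss, assuming the fully connected weights live under the <code class="language-plaintext highlighter-rouge">fc4</code> and <code class="language-plaintext highlighter-rouge">out</code> variable scopes defined below, and <code class="language-plaintext highlighter-rouge">cross_entropy_loss</code> is the base loss tensor:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tensorflow as tf

l2_lambda = 0.0001

# Collect only the fully connected weights; biases are excluded on purpose.
fc_weights = [v for v in tf.trainable_variables()
              if 'weights' in v.name and ('fc4' in v.name or 'out' in v.name)]
l2_loss = l2_lambda * tf.add_n([tf.nn.l2_loss(w) for w in fc_weights])
loss = cross_entropy_loss + l2_loss
</code></pre></div></div>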
<h3 id="implementation">Implementation</h3>
<p>I find it helpful to define a structure holding the hyperparameters I will be experimenting with and fine-tuning. It makes the tuning process easier, and in some cases even lets you automate it.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span>
<span class="n">Parameters</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s">'Parameters'</span><span class="p">,</span> <span class="p">[</span>
<span class="c1"># Data parameters
</span> <span class="s">'num_classes'</span><span class="p">,</span> <span class="s">'image_size'</span><span class="p">,</span>
<span class="c1"># Training parameters
</span> <span class="s">'batch_size'</span><span class="p">,</span> <span class="s">'max_epochs'</span><span class="p">,</span> <span class="s">'log_epoch'</span><span class="p">,</span> <span class="s">'print_epoch'</span><span class="p">,</span>
<span class="c1"># Optimisations
</span> <span class="s">'learning_rate_decay'</span><span class="p">,</span> <span class="s">'learning_rate'</span><span class="p">,</span>
<span class="s">'l2_reg_enabled'</span><span class="p">,</span> <span class="s">'l2_lambda'</span><span class="p">,</span>
<span class="s">'early_stopping_enabled'</span><span class="p">,</span> <span class="s">'early_stopping_patience'</span><span class="p">,</span>
<span class="s">'resume_training'</span><span class="p">,</span>
<span class="c1"># Layers architecture
</span> <span class="s">'conv1_k'</span><span class="p">,</span> <span class="s">'conv1_d'</span><span class="p">,</span> <span class="s">'conv1_p'</span><span class="p">,</span>
<span class="s">'conv2_k'</span><span class="p">,</span> <span class="s">'conv2_d'</span><span class="p">,</span> <span class="s">'conv2_p'</span><span class="p">,</span>
<span class="s">'conv3_k'</span><span class="p">,</span> <span class="s">'conv3_d'</span><span class="p">,</span> <span class="s">'conv3_p'</span><span class="p">,</span>
<span class="s">'fc4_size'</span><span class="p">,</span> <span class="s">'fc4_p'</span>
<span class="p">])</span>
</code></pre></div></div>
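<p>A concrete configuration can then be declared in one place. For example (the batch size and epoch counts here are illustrative; the remaining values follow the figures quoted in this post):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>parameters = Parameters(
    # Data parameters
    num_classes = 43, image_size = 32,
    # Training parameters
    batch_size = 256, max_epochs = 1000, log_epoch = 1, print_epoch = 5,
    # Optimisations
    learning_rate_decay = False, learning_rate = 0.001,
    l2_reg_enabled = True, l2_lambda = 0.0001,
    early_stopping_enabled = True, early_stopping_patience = 100,
    resume_training = False,
    # Layers architecture (as in the dropout table above)
    conv1_k = 5, conv1_d = 32, conv1_p = 0.9,
    conv2_k = 5, conv2_d = 64, conv2_p = 0.8,
    conv3_k = 5, conv3_d = 128, conv3_p = 0.7,
    fc4_size = 1024, fc4_p = 0.5
)
</code></pre></div></div>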
<p>Let’s first declare a couple of helpful TensorFlow routines that implement individual types of layers.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="k">def</span> <span class="nf">fully_connected</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'weights'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="nb">input</span><span class="p">.</span><span class="n">get_shape</span><span class="p">()[</span><span class="mi">1</span><span class="p">],</span> <span class="n">size</span><span class="p">],</span>
<span class="n">initializer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">contrib</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">xavier_initializer</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">biases</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'biases'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="n">size</span><span class="p">],</span>
<span class="n">initializer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant_initializer</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">matmul</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">weights</span><span class="p">)</span> <span class="o">+</span> <span class="n">biases</span>
<span class="k">def</span> <span class="nf">fully_connected_relu</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">fully_connected</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">conv_relu</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">depth</span><span class="p">):</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'weights'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="n">kernel_size</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="nb">input</span><span class="p">.</span><span class="n">get_shape</span><span class="p">()[</span><span class="mi">3</span><span class="p">],</span> <span class="n">depth</span><span class="p">],</span>
<span class="n">initializer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">contrib</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">xavier_initializer</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">biases</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'biases'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="n">depth</span><span class="p">],</span>
<span class="n">initializer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant_initializer</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">conv</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">conv2d</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">weights</span><span class="p">,</span>
<span class="n">strides</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">padding</span> <span class="o">=</span> <span class="s">'SAME'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">conv</span> <span class="o">+</span> <span class="n">biases</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">pool</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">max_pool</span><span class="p">(</span>
<span class="nb">input</span><span class="p">,</span>
<span class="n">ksize</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">strides</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">padding</span> <span class="o">=</span> <span class="s">'SAME'</span>
<span class="p">)</span>
</code></pre></div></div>
<p>I am using the Xavier initializer, which automatically determines the scale of initialization based on the layers’ dimensions, hence there are fewer parameters we need to experiment with.</p>
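<p>For intuition, here is a rough sketch of the uniform Xavier variant for a fully connected layer; this is not how <code class="language-plaintext highlighter-rouge">tf.contrib.layers.xavier_initializer()</code> is implemented internally, just the underlying idea:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import tensorflow as tf

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: limit = sqrt(6 / (fan_in + fan_out)), chosen to keep
    # the variance of activations roughly constant from layer to layer.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return tf.random_uniform([fan_in, fan_out], minval = -limit, maxval = limit)
</code></pre></div></div>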
<p>We can now encode the model, making the most of variable scopes, which keeps the code easier to read and maintain. This method will perform a full model pass.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">model_pass</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">is_training</span><span class="p">):</span>
<span class="s">"""
Performs a full model pass.
Parameters
----------
input : Tensor
Batch of examples.
params : Parameters
Structure (`namedtuple`) containing model parameters.
is_training : Tensor of type tf.bool
Flag indicating if we are training or not (e.g. whether to use dropout).
Returns
-------
Tensor with predicted logits.
"""</span>
<span class="c1"># Convolutions
</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'conv1'</span><span class="p">):</span>
<span class="n">conv1</span> <span class="o">=</span> <span class="n">conv_relu</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">kernel_size</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv1_k</span><span class="p">,</span> <span class="n">depth</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv1_d</span><span class="p">)</span>
<span class="n">pool1</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">conv1</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">pool1</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">is_training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">pool1</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv1_p</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">pool1</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'conv2'</span><span class="p">):</span>
<span class="n">conv2</span> <span class="o">=</span> <span class="n">conv_relu</span><span class="p">(</span><span class="n">pool1</span><span class="p">,</span> <span class="n">kernel_size</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv2_k</span><span class="p">,</span> <span class="n">depth</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv2_d</span><span class="p">)</span>
<span class="n">pool2</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">conv2</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">pool2</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">is_training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">pool2</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv2_p</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">pool2</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'conv3'</span><span class="p">):</span>
<span class="n">conv3</span> <span class="o">=</span> <span class="n">conv_relu</span><span class="p">(</span><span class="n">pool2</span><span class="p">,</span> <span class="n">kernel_size</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv3_k</span><span class="p">,</span> <span class="n">depth</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv3_d</span><span class="p">)</span>
<span class="n">pool3</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">conv3</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">pool3</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">is_training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">pool3</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">conv3_p</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">pool3</span><span class="p">)</span>
<span class="c1"># Fully connected
</span>
<span class="c1"># 1st stage output
</span> <span class="n">pool1</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">pool1</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">4</span><span class="p">)</span>
<span class="n">shape</span> <span class="o">=</span> <span class="n">pool1</span><span class="p">.</span><span class="n">get_shape</span><span class="p">().</span><span class="n">as_list</span><span class="p">()</span>
<span class="n">pool1</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">pool1</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">3</span><span class="p">]])</span>
<span class="c1"># 2nd stage output
</span> <span class="n">pool2</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">pool2</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">shape</span> <span class="o">=</span> <span class="n">pool2</span><span class="p">.</span><span class="n">get_shape</span><span class="p">().</span><span class="n">as_list</span><span class="p">()</span>
<span class="n">pool2</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">pool2</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">3</span><span class="p">]])</span>
<span class="c1"># 3rd stage output
</span> <span class="n">shape</span> <span class="o">=</span> <span class="n">pool3</span><span class="p">.</span><span class="n">get_shape</span><span class="p">().</span><span class="n">as_list</span><span class="p">()</span>
<span class="n">pool3</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">pool3</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">3</span><span class="p">]])</span>
<span class="n">flattened</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="n">pool1</span><span class="p">,</span> <span class="n">pool2</span><span class="p">,</span> <span class="n">pool3</span><span class="p">])</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'fc4'</span><span class="p">):</span>
<span class="n">fc4</span> <span class="o">=</span> <span class="n">fully_connected_relu</span><span class="p">(</span><span class="n">flattened</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">fc4_size</span><span class="p">)</span>
<span class="n">fc4</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">is_training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">fc4</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">fc4_p</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">fc4</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'out'</span><span class="p">):</span>
<span class="n">logits</span> <span class="o">=</span> <span class="n">fully_connected</span><span class="p">(</span><span class="n">fc4</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="n">params</span><span class="p">.</span><span class="n">num_classes</span><span class="p">)</span>
<span class="k">return</span> <span class="n">logits</span>
</code></pre></div></div>
<p>Note that we collect all branched-off convolutional layers’ outputs, flatten and concatenate them before passing them over to the classifier.</p>
<p class="notice">If you have questions about TensorFlow implementation, make sure to check out <a href="http://navoshta.com/facial-with-tensorflow/" target="_blank">my TensorFlow post</a> about variable scopes, saving and restoring sessions, implementing dropout and other interesting things!</p>
<h2 id="training">Training</h2>
<p>I have generated two datasets for training my model using the augmentation pipeline I mentioned earlier:</p>
<ul>
<li><strong>Extended</strong> dataset. This dataset simply contains <strong>20x more data</strong> than the original one — i.e. for each training example we generate 19 additional examples by jittering the original image, with <strong>augmentation intensity = 0.75</strong>.</li>
<li><strong>Balanced</strong> dataset. This dataset is balanced across classes and has <strong>20,000 examples</strong> for each class. These 20,000 contain the original training dataset, as well as jittered images from the original training set (with <strong>augmentation intensity = 0.75</strong>), topping the number of examples for each class up to 20,000 images (a sketch of this balancing step follows below).</li>
</ul>
<p class="notice"><strong>Disclaimer:</strong> Training on <strong>extended</strong> dataset may not be the best idea, as some classes remain significantly less represented than the others there. Training a model with this dataset would make it biased towards predicting overrepresented classes. However, in our case we are trying to score highest accuracy on supplied test dataset, which (probably) follows the same classes distribution. So we are going to <em>cheat</em> a bit and use this extended dataset for pre-training — this has proven to make test set accuracy higher (although hardly makes a model perform better “in the field”!).</p>
<p>I then use 25% of these augmented datasets for validation while training in 2 stages:</p>
<ul>
<li><strong>Stage 1: Pre-training</strong>. On the first stage I pre-train the model using the <strong>extended</strong> training dataset with TensorFlow’s <code class="language-plaintext highlighter-rouge">AdamOptimizer</code> and the learning rate set to <strong>0.001</strong>. It normally stops improving after ~180 epochs, which takes ~3.5 hours on <a href="http://navoshta.com/meet-fenton/">my machine</a> equipped with an Nvidia GTX 1080 GPU.</li>
<li><strong>Stage 2: Fine-tuning</strong>. I then train the model using the <strong>balanced</strong> dataset with a decreased learning rate of <strong>0.0001</strong> (see the sketch below for how both stages can share the same graph).</li>
</ul>
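<p>One way for both stages to share the same graph and weights is to feed the learning rate at run time. A minimal sketch, assuming <code class="language-plaintext highlighter-rouge">loss</code> is the total model loss defined elsewhere:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tensorflow as tf

# The learning rate is a placeholder, fed alongside the training batches.
tf_learning_rate = tf.placeholder(tf.float32)
optimizer = tf.train.AdamOptimizer(learning_rate = tf_learning_rate).minimize(loss)

# Stage 1: pre-training on the extended dataset.
#   session.run(optimizer, feed_dict = {..., tf_learning_rate: 0.001})
# Stage 2: fine-tuning on the balanced dataset.
#   session.run(optimizer, feed_dict = {..., tf_learning_rate: 0.0001})
</code></pre></div></div>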
<p>These two training stages could easily get you past 99% accuracy on the test set. You can, however, improve model performance even further by re-generating <strong>balanced</strong> dataset with slightly decreased augmentation intensity and repeating 2nd fine-tuning stage a couple of times.</p>
<h2 id="visualization">Visualization</h2>
<p>As an illustration of what a trained neural network looks like, let’s plot the weights of the first convolutional layer. The first layer has dimensions of <code class="language-plaintext highlighter-rouge">5×5×1×32</code>, which means that it consists of <strong>32 5×5 filters</strong> — we can visualize them as 32 grayscale images of 5×5 px each.</p>
<table border="">
<caption><b>5×5 convolutional filters of the first layer</b></caption>
<tr>
<td><img src="/images/posts/traffic-signs-classification/conv1_weights_raw.png" alt="Raw" /></td>
<td><img src="/images/posts/traffic-signs-classification/conv1_weights_interpolated.png" alt="Interpolated" /></td>
</tr>
<tr>
<td align="center">Raw</td>
<td align="center">Interpolated</td>
</tr>
</table>
<p>We usually expect the first layer to contain filters that can detect very basic pixel patterns, like edges and lines. These basic filters are then used by subsequent layers as building blocks to construct detectors of more complicated patterns and figures.</p>
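<p>Here is a sketch of how these filters could be fetched from a trained <code class="language-plaintext highlighter-rouge">session</code> and plotted, assuming the <code class="language-plaintext highlighter-rouge">conv1</code> variable scope from the model definition above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import matplotlib.pyplot as plt
import tensorflow as tf

# Fetch the first layer's kernel tensor from the trained graph.
with tf.variable_scope('conv1', reuse = True):
    conv1_weights = tf.get_variable('weights')
kernels = session.run(conv1_weights)  # shape: (5, 5, 1, 32)

# Plot each of the 32 filters as a 5x5 grayscale image.
fig, axes = plt.subplots(4, 8, figsize = (8, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernels[:, :, 0, i], cmap = 'gray')
    ax.axis('off')
plt.show()
</code></pre></div></div>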
<h2 id="results">Results</h2>
<p>After a couple of fine-tuning training iterations this model scored <strong>99.33% accuracy on the test set</strong>, which is not too bad. As there was a total of 12,630 images that we used for testing, that means there are <strong>85 examples</strong> that the model could not classify correctly — let’s take a look at those bad boys!</p>
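<p>One way to pull those examples out is to compare predicted and ground-truth labels. A sketch, where <code class="language-plaintext highlighter-rouge">predictions</code> stands for the model’s logits over the test set:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

predicted_classes = np.argmax(predictions, axis = 1)
misclassified = np.where(predicted_classes != y_test)[0]
print('Misclassified examples: %d' % misclassified.shape[0])
</code></pre></div></div>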
<table border="">
<caption><b>Remaining 85 errors out of 12,630 samples of the test set</b></caption>
<tr>
<td><img src="/images/posts/traffic-signs-classification/8DKqcJ3Ir9U8IAAAAASUVORK5CYII=.png" alt="Original" /></td>
<td><img src="/images/posts/traffic-signs-classification/L+aiejvF2sYAAAAASUVORK5CYII=.png" alt="Preprocessed" /></td>
</tr>
<tr>
<td align="center">Original</td>
<td align="center">Preprocessed</td>
</tr>
</table>
<p>Signs in most of these images have artefacts like shadows or obstructing objects. There are, however, a couple of signs that were simply underrepresented in the training set — training solely on balanced datasets could potentially eliminate this issue, and using some sort of color information could definitely help as well.</p>
<p>Finally, this model provides mildly interesting predictions for types of signs it wasn’t trained for.</p>
<p style="text-align: center;" class="small"><img src="/images/posts/traffic-signs-classification/elderly_sign_prediction.png" alt="image-center" class="align-center" />
Predictions for a new type of sign</p>
<p>To clarify, this <strong>Elderly crossing</strong> sign was not among the 43 classes this model was trained on, yet what we see here is a reasonable assumption that it looks a lot like the <strong>Road narrows on the right</strong> sign. Ironically, the classifier’s second guess was that this <strong>Elderly crossing</strong> sign should be classified as <strong>Children crossing</strong>!</p>
<p>In conclusion, according to different sources human performance on a similar task varies from 98.3% to 98.8%, therefore this model seems to outperform an average human, which, I believe, is the ultimate goal of machine learning!</p>
<p><a class="github-button" href="https://github.com/alexstaravoitau" data-style="mega" data-count-href="/navoshta/followers" data-count-api="/users/navoshta#followers" data-count-aria-label="# followers on GitHub" aria-label="Follow @alexstaravoitau on GitHub">Follow @alexstaravoitau</a>
<a class="github-button" href="https://github.com/alexstaravoitau/traffic-signs" data-icon="octicon-star" data-style="mega" data-count-href="/navoshta/traffic-signs/stargazers" data-count-api="/repos/navoshta/traffic-signs#stargazers_count" data-count-aria-label="# stargazers on GitHub" aria-label="Star navoshta/traffic-signs on GitHub">Star</a>
<a class="github-button" href="https://github.com/alexstaravoitau/traffic-signs/fork" data-icon="octicon-repo-forked" data-style="mega" data-count-href="/navoshta/traffic-signs/network" data-count-api="/repos/navoshta/traffic-signs#forks_count" data-count-aria-label="# forks on GitHub" aria-label="Fork navoshta/traffic-signs on GitHub">Fork</a>
<a class="github-button" href="https://github.com/alexstaravoitau/traffic-signs/archive/master.zip" data-icon="octicon-cloud-download" data-style="mega" aria-label="Download navoshta/traffic-signs on GitHub">Download</a></p>
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>Alex StaravoitauThis is my attempt to tackle traffic signs classification problem with a convolutional neural network implemented in TensorFlow (reaching **99.33%** accuracy). The highlights of this solution would be data preprocessing, data augmentation, pre-training and skipping connections in the network.Detecting facial keypoints with TensorFlow2017-01-09T00:00:00+00:002017-01-09T00:00:00+00:00//navoshta.com/facial-with-tensorflow<p>This is a TensorFlow follow-along for an amazing <a href="http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/">Deep Learning tutorial</a> by Daniel Nouri. Daniel describes ways of approaching a computer vision problem of detecting facial keypoints in an image using various deep learning techniques, while these techniques gradually build upon each other, demonstrating advantages and limitations of each. <!--more--> I highly recommend going through the steps if you are interested in the topic and prefer learning by example.</p>
<p>However, Daniel uses Lasagne as a machine learning framework, and I’m currently learning to use TensorFlow, so I thought I would publish my follow-along tutorial where I’m utilising the very same approach, but using TensorFlow to build the models at each of the steps. Daniel is using a set of different models that tend to gradually get more complicated (and perform better), so I did the same and broke down the tutorial into three Jupyter notebooks:</p>
<ul>
<li><strong>First model: a single hidden layer.</strong> A very simple neural network.</li>
<li><strong>Second model: convolutions.</strong> Convolutional neural network with data augmentation, learning rate decay and dropout.</li>
<li><strong>Third model: training specialists.</strong> A pipeline of specialist CNNs with early stopping and supervised pre-training.</li>
</ul>
<p>Let’s take a look at them and check out the differences when it comes to TensorFlow. You can get the notebooks here:</p>
<p><a class="github-button" href="https://github.com/alexstaravoitau/kaggle-facial-keypoints-detection" data-icon="octicon-star" data-style="mega" data-count-href="/navoshta/kaggle-facial-keypoints-detection/stargazers" data-count-api="/repos/navoshta/kaggle-facial-keypoints-detection#stargazers_count" data-count-aria-label="# stargazers on GitHub" aria-label="Star navoshta/kaggle-facial-keypoints-detection on GitHub">Star</a>
<a class="github-button" href="https://github.com/alexstaravoitau/kaggle-facial-keypoints-detection/fork" data-icon="octicon-repo-forked" data-style="mega" data-count-href="/navoshta/kaggle-facial-keypoints-detection/network" data-count-api="/repos/navoshta/kaggle-facial-keypoints-detection#forks_count" data-count-aria-label="# forks on GitHub" aria-label="Fork navoshta/kaggle-facial-keypoints-detection on GitHub">Fork</a>
<a class="github-button" href="https://github.com/alexstaravoitau/kaggle-facial-keypoints-detection/archive/master.zip" data-icon="octicon-cloud-download" data-style="mega" aria-label="Download navoshta/kaggle-facial-keypoints-detection on GitHub">Download</a></p>
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>
<h2 id="first-model-a-single-hidden-layer">First model: a single hidden layer.</h2>
<p>This is a fairly simple model, so it was easy to recreate it in TensorFlow. If you are not familiar with the TensorFlow framework, here is how it works: you first build a computation graph, which means you specify all the variables you are planning to use, as well as all the relations across those variables. Then you evaluate specific variables from that graph that you are interested in, triggering computation of the path in the graph that leads to them. So in our case we will define a neural network structure and its loss, and will then train it by evaluating a TensorFlow loss optimiser, feeding it with batches of training data over and over again.</p>
<p>First, let’s introduce a couple of handy functions that will help us define the model architecture.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fully_connected</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'weights'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="nb">input</span><span class="p">.</span><span class="n">get_shape</span><span class="p">()[</span><span class="mi">1</span><span class="p">],</span> <span class="n">size</span><span class="p">],</span>
<span class="n">initializer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">contrib</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">xavier_initializer</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">biases</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'biases'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="n">size</span><span class="p">],</span>
<span class="n">initializer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">constant_initializer</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">matmul</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">weights</span><span class="p">)</span> <span class="o">+</span> <span class="n">biases</span>
</code></pre></div></div>
<p>This function performs a single fully connected neural network layer pass. You only need to provide the input and define the number of units; it will work out the rest and initialise its weights. It’s very handy, since now we can use the same function for defining as many fully connected layers as we like. Let’s define our model structure and use this function for defining the hidden and output layers:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">model_pass</span><span class="p">(</span><span class="nb">input</span><span class="p">):</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'hidden'</span><span class="p">):</span>
<span class="n">hidden</span> <span class="o">=</span> <span class="n">fully_connected</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">relu_hidden</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">hidden</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'out'</span><span class="p">):</span>
<span class="n">prediction</span> <span class="o">=</span> <span class="n">fully_connected</span><span class="p">(</span><span class="n">relu_hidden</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="n">num_keypoints</span><span class="p">)</span>
<span class="k">return</span> <span class="n">prediction</span>
</code></pre></div></div>
<p>This function performs a full model pass. It takes our array of features, passes it over to the hidden layer (containing 100 units), then feeds the hidden output to the output layer, which in its turn produces a vector of output values.</p>
<p>Please note that we used the <code class="language-plaintext highlighter-rouge">fully_connected()</code> function defined earlier for both layers, and thanks to TensorFlow’s concept of <code class="language-plaintext highlighter-rouge">variable_scope</code> we didn’t have to specify variables for the weights and biases of each. You can think of it this way: in this example we implicitly create variables with the following names:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">hidden/weights</code></li>
<li><code class="language-plaintext highlighter-rouge">hidden/bias</code></li>
<li><code class="language-plaintext highlighter-rouge">out/weights</code></li>
<li><code class="language-plaintext highlighter-rouge">out/bias</code></li>
</ul>
<p>You don’t have to use the full names of those variables each time; instead you simply specify a block with a <em>variable scope</em> — and whenever you try to get hold of a variable using <code class="language-plaintext highlighter-rouge">tf.get_variable()</code> within that block, the scope name is prepended to the variable’s name.</p>
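<p>A minimal illustration of this behaviour (the shape and initializer here are arbitrary):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tensorflow as tf

with tf.variable_scope('hidden'):
    weights = tf.get_variable('weights', shape = [100, 100],
                              initializer = tf.contrib.layers.xavier_initializer())
print(weights.name)  # prints "hidden/weights:0"
</code></pre></div></div>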
<p>Ok, now let’s define our training graph. First, just as we did for each of the layers, we will use a variable scope for the whole model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># This model has 1 fully connected layer, we train it using batches of 36 examples for 1000 epochs.
</span><span class="n">model_variable_scope</span> <span class="o">=</span> <span class="s">"1fc_b36_e1000"</span>
</code></pre></div></div>
<p>So our variables would now have names like <code class="language-plaintext highlighter-rouge">1fc_b36_e1000/hidden/weights</code>, <code class="language-plaintext highlighter-rouge">1fc_b36_e1000/hidden/biases</code> and so on.</p>
<p>Next, we initialise a graph.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">graph</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">Graph</span><span class="p">()</span>
<span class="k">with</span> <span class="n">graph</span><span class="p">.</span><span class="n">as_default</span><span class="p">():</span>
<span class="p">...</span>
</code></pre></div></div>
<p class="notice">Strictly speaking we didn’t have to do that, as there is always a default graph and we could just use it. But where is fun in that?</p>
<p>Whatever comes in <code class="language-plaintext highlighter-rouge">with graph.as_default():</code> block defines our graph: all of the graph variables and their relations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">graph</span><span class="p">.</span><span class="n">as_default</span><span class="p">():</span>
<span class="c1"># Input data. For the training data, we use a placeholder that will be fed at run time with a training minibatch.
</span> <span class="n">tf_x_batch</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">shape</span> <span class="o">=</span> <span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">image_size</span> <span class="o">*</span> <span class="n">image_size</span><span class="p">))</span>
<span class="n">tf_y_batch</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">shape</span> <span class="o">=</span> <span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">num_keypoints</span><span class="p">))</span>
<span class="c1"># Training computation.
</span> <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="n">model_variable_scope</span><span class="p">):</span>
<span class="n">predictions</span> <span class="o">=</span> <span class="n">model_pass</span><span class="p">(</span><span class="n">tf_x_batch</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">square</span><span class="p">(</span><span class="n">predictions</span> <span class="o">-</span> <span class="n">tf_y_batch</span><span class="p">))</span>
<span class="c1"># Optimizer.
</span> <span class="n">optimizer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">train</span><span class="p">.</span><span class="n">MomentumOptimizer</span><span class="p">(</span>
<span class="n">learning_rate</span> <span class="o">=</span> <span class="n">learning_rate</span><span class="p">,</span>
<span class="n">momentum</span> <span class="o">=</span> <span class="n">momentum</span><span class="p">,</span>
<span class="n">use_nesterov</span> <span class="o">=</span> <span class="bp">True</span>
<span class="p">).</span><span class="n">minimize</span><span class="p">(</span><span class="n">loss</span><span class="p">)</span>
</code></pre></div></div>
<p>Here we define a couple of <code class="language-plaintext highlighter-rouge">tf.placeholder</code>s — these are not variables per se, they are, well, just placeholders. They are not trainable, and we don’t need to initialise them during graph build time. Instead, we provide what’s going to be in them during run time, while evaluating portions of our graph. Here we will use them to feed model with training examples in batches, and those examples will, of course, change after every weights update. <strong>Note that we don’t explicitly specify batch size during graph build time, and instead use <code class="language-plaintext highlighter-rouge">None</code> as the first dimension of placeholders’ shapes.</strong></p>
<p>We then define computation of model predictions and loss, create an optimiser for our model and off we go!</p>
<p>Now we need to run that graph using <code class="language-plaintext highlighter-rouge">tf.Session</code> object. Every session has a graph, so we specify one when initialising our session. Also, before doing any computation you need to initialise all graph variables by running <code class="language-plaintext highlighter-rouge">tf.global_variables_initializer()</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">Session</span><span class="p">(</span><span class="n">graph</span> <span class="o">=</span> <span class="n">graph</span><span class="p">)</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
<span class="c1"># Initialise all variables in the graph
</span> <span class="n">session</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">global_variables_initializer</span><span class="p">())</span>
<span class="p">...</span>
</code></pre></div></div>
<p>Once we are in the scope of initialised session we can actually perform the training procedure:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_epochs</span><span class="p">):</span>
<span class="c1"># Train on whole randomised dataset in batches
</span> <span class="n">batch_iterator</span> <span class="o">=</span> <span class="n">BatchIterator</span><span class="p">(</span><span class="n">batch_size</span> <span class="o">=</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">shuffle</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">x_batch</span><span class="p">,</span> <span class="n">y_batch</span> <span class="ow">in</span> <span class="n">batch_iterator</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">):</span>
<span class="n">session</span><span class="p">.</span><span class="n">run</span><span class="p">([</span><span class="n">optimizer</span><span class="p">],</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">tf_x_batch</span> <span class="p">:</span> <span class="n">x_batch</span><span class="p">,</span>
<span class="n">tf_y_batch</span> <span class="p">:</span> <span class="n">y_batch</span>
<span class="p">}</span>
<span class="p">)</span>
</code></pre></div></div>
<p>What happens here is that we ask the session to evaluate <code class="language-plaintext highlighter-rouge">optimizer</code>, which will implicitly run a sub-graph containing every variable that <code class="language-plaintext highlighter-rouge">optimizer</code> uses. From definition you can see it uses <code class="language-plaintext highlighter-rouge">loss</code> (value it is optimising), which in its turn uses <code class="language-plaintext highlighter-rouge">predictions</code>, etc. We also provide values that should be put in our data feeding <code class="language-plaintext highlighter-rouge">tf.placeholder</code>s by providing <code class="language-plaintext highlighter-rouge">feed_dict</code> parameter. This means that by the time computation of the path leading to <code class="language-plaintext highlighter-rouge">optimizer</code> begins, <code class="language-plaintext highlighter-rouge">tf_x_batch</code> and <code class="language-plaintext highlighter-rouge">tf_y_batch</code> placeholders would be holding <code class="language-plaintext highlighter-rouge">x_batch</code> and <code class="language-plaintext highlighter-rouge">y_batch</code> values respectively.</p>
<p>When training finishes we need to run our trained model on the testing data. We do this within the scope of the same <code class="language-plaintext highlighter-rouge">session</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Evaluate on test dataset (also in batches).
</span><span class="n">batch_iterator</span> <span class="o">=</span> <span class="n">BatchIterator</span><span class="p">(</span><span class="n">batch_size</span> <span class="o">=</span> <span class="mi">128</span><span class="p">)</span>
<span class="n">predictions</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">x_batch</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">batch_iterator</span><span class="p">(</span><span class="n">x_test</span><span class="p">):</span>
<span class="p">[</span><span class="n">p_batch</span><span class="p">]</span> <span class="o">=</span> <span class="n">session</span><span class="p">.</span><span class="n">run</span><span class="p">([</span><span class="n">predictions</span><span class="p">],</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">tf_x_batch</span> <span class="p">:</span> <span class="n">x_batch</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="n">predictions</span><span class="p">.</span><span class="n">extend</span><span class="p">(</span><span class="n">p_batch</span><span class="p">)</span>
<span class="n">test_loss</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">square</span><span class="p">(</span><span class="n">predictions</span> <span class="o">-</span> <span class="n">y_test</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">" Test score: %.3f (loss = %.8f)"</span> <span class="o">%</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">test_loss</span><span class="p">)</span> <span class="o">*</span> <span class="mf">48.0</span><span class="p">,</span> <span class="n">test_loss</span><span class="p">))</span>
</code></pre></div></div>
<p>Since this operation would be performed on the GPU by default (if you’re running a GPU build of TensorFlow), you may bump into your GPU’s memory limitations, which is why I suggest batching your testing data as well.</p>
<p class="notice">I’m still using a tiny bit of Lasagne here, more specifically its <code class="language-plaintext highlighter-rouge">BatchIterator</code>. Further in tutorial Daniel uses this <code class="language-plaintext highlighter-rouge">BatchIterator</code> for data augmentation and it fits perfectly into the workflow. Also, as far as I’m aware TensorFlow lacks a similar plug-and-play component for iterating over data in batches, and one would have to define their own <code class="language-plaintext highlighter-rouge">tf.train.Example</code> type and setup a pipeline for <code class="language-plaintext highlighter-rouge">tf.TFRecordReader</code>, feeding it to the model with a <code class="language-plaintext highlighter-rouge">tf.train.QueueRunner</code>. Although this seems like a bucket of joy, I thought I would go with a plain vanilla <code class="language-plaintext highlighter-rouge">BatchIterator</code>, and concentrate on building a model instead. Data feeding in TensorFlow seems to be a broad topic, and would make a good article on its own!</p>
<p>As you can see, we only supply the <code class="language-plaintext highlighter-rouge">tf_x_batch</code> value in the <code class="language-plaintext highlighter-rouge">feed_dict</code>, since we are only evaluating the <code class="language-plaintext highlighter-rouge">predictions</code> variable here, and its path in the graph does not involve <code class="language-plaintext highlighter-rouge">tf_y_batch</code>: we are not calculating <code class="language-plaintext highlighter-rouge">loss</code> as part of this computation, after all.</p>
<p>One of the neat Lasagne features is keeping track of training history by logging validation and training losses. As far as I’m aware, TensorFlow doesn’t do that for you, so we will have to come up with some other solution.</p>
<p class="notice">One might be tempted to use <code class="language-plaintext highlighter-rouge">tf.train.SummaryWriter</code>s and visualise data using <code class="language-plaintext highlighter-rouge">TensorBoard</code>, and actually that’s exactly what I did at first. I even managed to plot training and validation losses on the same graph and overcome a couple of other issues, but in the end <code class="language-plaintext highlighter-rouge">tf.train.SummaryWriter</code> seemed to slow down training process quite a bit. I’m not sure if it was due to me not using it correctly, or it’s just the way it works, but I got much better results in terms of speed using simple arrays, saving them to disk and plotting losses with <code class="language-plaintext highlighter-rouge">matplotlib</code>.</p>
<p>First, let’s refactor the part where we evaluate the model on the testing dataset into a function. We’re going to use it quite a lot, as the plan is to periodically get predictions for the validation and training datasets during training and log their losses:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_predictions_in_batches</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">session</span><span class="p">):</span>
<span class="s">"""
Calculates predictions in batches of 128 examples at a time, using `session`'s calculation graph.
"""</span>
<span class="n">p</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">batch_iterator</span> <span class="o">=</span> <span class="n">BatchIterator</span><span class="p">(</span><span class="n">batch_size</span> <span class="o">=</span> <span class="mi">128</span><span class="p">)</span>
<span class="k">for</span> <span class="n">x_batch</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">batch_iterator</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="p">[</span><span class="n">p_batch</span><span class="p">]</span> <span class="o">=</span> <span class="n">session</span><span class="p">.</span><span class="n">run</span><span class="p">([</span><span class="n">predictions</span><span class="p">],</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">tf_x_batch</span> <span class="p">:</span> <span class="n">x_batch</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="n">p</span><span class="p">.</span><span class="n">extend</span><span class="p">(</span><span class="n">p_batch</span><span class="p">)</span>
<span class="k">return</span> <span class="n">p</span>
</code></pre></div></div>
<p>This function is simply a convenient way of getting predictions for a dataset using the weights the model has learned so far.</p>
<p>Now let’s add a couple of arrays: <code class="language-plaintext highlighter-rouge">train_loss_history</code> and <code class="language-plaintext highlighter-rouge">valid_loss_history</code> for keeping track of training and validation losses respectively. Let’s rewrite our training code as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calc_loss</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">labels</span><span class="p">):</span>
<span class="s">"""
Mean squared error for given predictions.
"""</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">square</span><span class="p">(</span><span class="n">predictions</span> <span class="o">-</span> <span class="n">labels</span><span class="p">))</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">Session</span><span class="p">(</span><span class="n">graph</span> <span class="o">=</span> <span class="n">graph</span><span class="p">)</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
<span class="n">tf</span><span class="p">.</span><span class="n">initialize_all_variables</span><span class="p">().</span><span class="n">run</span><span class="p">()</span>
<span class="n">train_loss_history</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_epochs</span><span class="p">)</span>
<span class="n">valid_loss_history</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_epochs</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"============ TRAINING ============="</span><span class="p">)</span>
<span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_epochs</span><span class="p">):</span>
<span class="c1"># Train on whole randomised dataset in batches
</span> <span class="n">batch_iterator</span> <span class="o">=</span> <span class="n">BatchIterator</span><span class="p">(</span><span class="n">batch_size</span> <span class="o">=</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">shuffle</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">x_batch</span><span class="p">,</span> <span class="n">y_batch</span> <span class="ow">in</span> <span class="n">batch_iterator</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">):</span>
<span class="n">session</span><span class="p">.</span><span class="n">run</span><span class="p">([</span><span class="n">optimizer</span><span class="p">],</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">tf_x_batch</span> <span class="p">:</span> <span class="n">x_batch</span><span class="p">,</span>
<span class="n">tf_y_batch</span> <span class="p">:</span> <span class="n">y_batch</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="c1"># Another epoch ended, let's log our losses.
</span> <span class="c1"># Get training data predictions and log training loss:
</span> <span class="n">train_loss</span> <span class="o">=</span> <span class="n">calc_loss</span><span class="p">(</span>
<span class="n">get_predictions_in_batches</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">session</span><span class="p">),</span>
<span class="n">y_train</span>
<span class="p">)</span>
<span class="n">train_loss_history</span><span class="p">[</span><span class="n">epoch</span><span class="p">]</span> <span class="o">=</span> <span class="n">train_loss</span>
<span class="c1"># Get validation data predictions and log validation loss:
</span> <span class="n">valid_loss</span> <span class="o">=</span> <span class="n">calc_loss</span><span class="p">(</span>
<span class="n">get_predictions_in_batches</span><span class="p">(</span><span class="n">x_valid</span><span class="p">,</span> <span class="n">session</span><span class="p">),</span>
<span class="n">y_valid</span>
<span class="p">)</span>
<span class="n">valid_loss_history</span><span class="p">[</span><span class="n">epoch</span><span class="p">]</span> <span class="o">=</span> <span class="n">valid_loss</span>
<span class="k">if</span> <span class="p">(</span><span class="n">epoch</span> <span class="o">%</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"--------- EPOCH %4d/%d ---------"</span> <span class="o">%</span> <span class="p">(</span><span class="n">epoch</span><span class="p">,</span> <span class="n">num_epochs</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">" Train loss: %.8f"</span> <span class="o">%</span> <span class="p">(</span><span class="n">train_loss</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Validation loss: %.8f"</span> <span class="o">%</span> <span class="p">(</span><span class="n">valid_loss</span><span class="p">))</span>
<span class="c1"># Evaluate on test dataset.
</span> <span class="n">test_loss</span> <span class="o">=</span> <span class="n">calc_loss</span><span class="p">(</span>
<span class="n">get_predictions_in_batches</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">session</span><span class="p">),</span>
<span class="n">y_test</span>
<span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"==================================="</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">" Test score: %.3f (loss = %.8f)"</span> <span class="o">%</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">test_loss</span><span class="p">)</span> <span class="o">*</span> <span class="mf">48.0</span><span class="p">,</span> <span class="n">test_loss</span><span class="p">))</span>
<span class="n">np</span><span class="p">.</span><span class="n">savez</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">()</span> <span class="o">+</span> <span class="s">"/train_history"</span><span class="p">,</span> <span class="n">train_loss_history</span> <span class="o">=</span> <span class="n">train_loss_history</span><span class="p">,</span> <span class="n">valid_loss_history</span> <span class="o">=</span> <span class="n">valid_loss_history</span><span class="p">)</span>
</code></pre></div></div>
<p>You can now load the training history from the file and use Daniel’s code to plot learning curves and see how your model is performing.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_history</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">()</span> <span class="o">+</span> <span class="s">"/train_history.npz"</span><span class="p">)</span>
<span class="n">train_loss</span> <span class="o">=</span> <span class="n">model_history</span><span class="p">[</span><span class="s">"train_loss_history"</span><span class="p">]</span>
<span class="n">valid_loss</span> <span class="o">=</span> <span class="n">model_history</span><span class="p">[</span><span class="s">"valid_loss_history"</span><span class="p">]</span>
<span class="n">x_axis</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">num_epochs</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_axis</span><span class="p">,</span> <span class="n">train_loss</span><span class="p">,</span> <span class="s">"b-"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"train"</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_axis</span><span class="p">,</span> <span class="n">valid_loss</span><span class="p">,</span> <span class="s">"g-"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"valid"</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">grid</span><span class="p">()</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">"epoch"</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">"loss"</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">ylim</span><span class="p">(</span><span class="mf">0.0005</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">xlim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_epochs</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">yscale</span><span class="p">(</span><span class="s">"log"</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p class="notice">You may want to only log losses every, say, 5 or 10 epochs, as evaluating on the whole training set does take a while. However, you may need validation loss later on in order to implement early stopping.</p>
<h2 id="second-model-convolutions">Second model: convolutions.</h2>
<p>In the second model we will add convolutions, which should improve model performance significantly. Let’s declare a couple of additional convenience functions:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">conv_relu</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">depth</span><span class="p">):</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'weights'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="n">kernel_size</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="nb">input</span><span class="p">.</span><span class="n">get_shape</span><span class="p">()[</span><span class="mi">3</span><span class="p">],</span> <span class="n">depth</span><span class="p">],</span>
<span class="n">initializer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">contrib</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">xavier_initializer</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">biases</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">get_variable</span><span class="p">(</span> <span class="s">'biases'</span><span class="p">,</span>
<span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="n">depth</span><span class="p">],</span>
<span class="n">initializer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">constant_initializer</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">conv</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">conv2d</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">weights</span><span class="p">,</span>
<span class="n">strides</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">padding</span><span class="o">=</span><span class="s">'SAME'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">conv</span> <span class="o">+</span> <span class="n">biases</span><span class="p">)</span>
</code></pre></div></div>
<p>This one performs a convolutional layer pass followed by a rectified linear unit (since those two are usually applied together). As you can see, we’re using <code class="language-plaintext highlighter-rouge">tf.get_variable()</code> again, so we can reuse this function for different layers by simply providing a variable scope. Let’s add a couple of other helper functions to make encoding our model architecture easier:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fully_connected_relu</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">fully_connected</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">pool</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">max_pool</span><span class="p">(</span>
<span class="nb">input</span><span class="p">,</span>
<span class="n">ksize</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">strides</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">padding</span><span class="o">=</span><span class="s">'SAME'</span>
<span class="p">)</span>
</code></pre></div></div>
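<p>Both of these lean on the <code class="language-plaintext highlighter-rouge">fully_connected</code> helper from the first notebook. In case you’re reading this section in isolation, a minimal sketch of such a helper, following the same <code class="language-plaintext highlighter-rouge">tf.get_variable()</code> pattern (and not necessarily matching the first notebook’s exact code), might look like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A sketch of a `fully_connected` helper in the same spirit.
def fully_connected(input, size):
    weights = tf.get_variable( 'weights',
        shape = [input.get_shape()[1], size],
        initializer = tf.contrib.layers.xavier_initializer()
    )
    biases = tf.get_variable( 'biases',
        shape = [size],
        initializer = tf.constant_initializer(0.0)
    )
    return tf.matmul(input, weights) + biases
</code></pre></div></div>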
<p>Ok, with these routines we can now encode our full model pass.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">model_pass</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">training</span><span class="p">):</span>
<span class="c1"># Convolutional layers
</span> <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'conv1'</span><span class="p">):</span>
<span class="n">conv1</span> <span class="o">=</span> <span class="n">conv_relu</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">kernel_size</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="n">depth</span> <span class="o">=</span> <span class="mi">32</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'pool1'</span><span class="p">):</span>
<span class="n">pool1</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">conv1</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># Apply dropout if needed
</span> <span class="n">pool1</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">pool1</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="mf">0.9</span> <span class="k">if</span> <span class="n">dropout</span> <span class="k">else</span> <span class="mf">1.0</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">pool1</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'conv2'</span><span class="p">):</span>
<span class="n">conv2</span> <span class="o">=</span> <span class="n">conv_relu</span><span class="p">(</span><span class="n">pool1</span><span class="p">,</span> <span class="n">kernel_size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="n">depth</span> <span class="o">=</span> <span class="mi">64</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'pool2'</span><span class="p">):</span>
<span class="n">pool2</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">conv2</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># Apply dropout if needed
</span> <span class="n">pool2</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">pool2</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="mf">0.8</span> <span class="k">if</span> <span class="n">dropout</span> <span class="k">else</span> <span class="mf">1.0</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">pool2</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'conv3'</span><span class="p">):</span>
<span class="n">conv3</span> <span class="o">=</span> <span class="n">conv_relu</span><span class="p">(</span><span class="n">pool2</span><span class="p">,</span> <span class="n">kernel_size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="n">depth</span> <span class="o">=</span> <span class="mi">128</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'pool3'</span><span class="p">):</span>
<span class="n">pool3</span> <span class="o">=</span> <span class="n">pool</span><span class="p">(</span><span class="n">conv3</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># Apply dropout if needed
</span> <span class="n">pool3</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">pool3</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="mf">0.7</span> <span class="k">if</span> <span class="n">dropout</span> <span class="k">else</span> <span class="mf">1.0</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">pool3</span><span class="p">)</span>
<span class="c1"># Flatten convolutional layers output
</span> <span class="n">shape</span> <span class="o">=</span> <span class="n">pool3</span><span class="p">.</span><span class="n">get_shape</span><span class="p">().</span><span class="n">as_list</span><span class="p">()</span>
<span class="n">flattened</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">pool3</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="n">shape</span><span class="p">[</span><span class="mi">3</span><span class="p">]])</span>
<span class="c1"># Fully connected layers
</span> <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'fc4'</span><span class="p">):</span>
<span class="n">fc4</span> <span class="o">=</span> <span class="n">fully_connected_relu</span><span class="p">(</span><span class="n">flattened</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">)</span>
<span class="c1"># Apply dropout if needed
</span> <span class="n">fc4</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">fc4</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="k">if</span> <span class="n">dropout</span> <span class="k">else</span> <span class="mf">1.0</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">fc4</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'fc5'</span><span class="p">):</span>
<span class="n">fc5</span> <span class="o">=</span> <span class="n">fully_connected_relu</span><span class="p">(</span><span class="n">fc4</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'out'</span><span class="p">):</span>
<span class="n">prediction</span> <span class="o">=</span> <span class="n">fully_connected</span><span class="p">(</span><span class="n">fc5</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="n">num_keypoints</span><span class="p">)</span>
<span class="k">return</span> <span class="n">prediction</span>
</code></pre></div></div>
<p>Please note those weird assignments:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fc4</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">fc4</span><span class="p">,</span> <span class="n">keep_prob</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="k">if</span> <span class="n">dropout</span> <span class="k">else</span> <span class="mf">1.0</span><span class="p">),</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">fc4</span><span class="p">)</span>
</code></pre></div></div>
<p>Let’s break it down a bit.</p>
<p>First, we calculate <code class="language-plaintext highlighter-rouge">0.5 if dropout else 1.0</code>, which means that we only apply dropout if the <code class="language-plaintext highlighter-rouge">dropout</code> flag is set to <code class="language-plaintext highlighter-rouge">True</code>. This is done so that later you can compare how the same model performs with and without dropout.</p>
<p>Furthermore, we only want to apply dropout while training, and not while evaluating our model; that’s why we put the assignment into a <code class="language-plaintext highlighter-rouge">tf.cond(training, lambda: ..., lambda: fc4)</code> block. It means that if <code class="language-plaintext highlighter-rouge">training</code> (a boolean <code class="language-plaintext highlighter-rouge">tf.placeholder</code> we feed into the graph) evaluates to <code class="language-plaintext highlighter-rouge">True</code>, we apply dropout, and simply pass <code class="language-plaintext highlighter-rouge">fc4</code> through unchanged otherwise.</p>
<p>Also note that we have to manually flatten convolutional layers’ output before passing it over to fully connected layers.</p>
<p>A couple of new things you may notice in the TensorFlow graph are the <code class="language-plaintext highlighter-rouge">is_training</code> flag, learning rate decay and momentum increase. The <code class="language-plaintext highlighter-rouge">is_training</code> flag is another TensorFlow placeholder we use to indicate whether we’re training or evaluating; in the latter case the model pipeline function won’t apply dropout. I implemented momentum increase in plain Python, by checking how far we have gone into the maximum number of epochs. As for learning rate decay, there is a nice TensorFlow function for exactly that: <code class="language-plaintext highlighter-rouge">tf.train.exponential_decay()</code>, which takes the number of decay steps and the decay rate.</p>
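<p>To give you an idea, wiring a decaying learning rate into the optimiser could look roughly like this (a sketch where the hyperparameter values and the choice of <code class="language-plaintext highlighter-rouge">MomentumOptimizer</code> are illustrative, not the ones used in the notebook):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative sketch: learning rate decay driven by a global step counter.
global_step = tf.Variable(0, trainable = False)
learning_rate = tf.train.exponential_decay(
    0.01,          # starting learning rate (illustrative value)
    global_step,   # current training step, incremented by the optimizer
    1000,          # decay steps (illustrative value)
    0.96           # decay rate (illustrative value)
)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum = 0.9).minimize(
    loss, global_step = global_step
)
</code></pre></div></div>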
<p>The rest should be familiar from the first notebook. Please mind that some optimisation options are defined as flags (for instance, <code class="language-plaintext highlighter-rouge">data_augmentation</code>, <code class="language-plaintext highlighter-rouge">learning_rate_decay</code>, etc.) which are encoded into the model name. This is done so that you can compare performance with different optimisation techniques applied. Just provide the name of a model as a parameter to the <code class="language-plaintext highlighter-rouge">plot_learning_curves()</code> method, and that model’s learning curves will be drawn on top of the current plot:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">new_model_epochs</span> <span class="o">=</span> <span class="n">plot_learning_curves</span><span class="p">()</span>
<span class="n">old_model_epochs</span> <span class="o">=</span> <span class="n">plot_learning_curves</span><span class="p">(</span><span class="s">"1fc_b36_e1000"</span><span class="p">,</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">ylim</span><span class="p">(</span><span class="mf">0.001</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">xlim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">max</span><span class="p">(</span><span class="n">new_model_epochs</span><span class="p">,</span> <span class="n">old_model_epochs</span><span class="p">))</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<h2 id="third-model-training-specialists">Third model: training specialists.</h2>
<p>The third notebook implements the most advanced model of this tutorial: training specialists for groups of facial keypoints. It also covers another great technique for battling overfitting: <em>early stopping</em>. The <code class="language-plaintext highlighter-rouge">EarlyStopping</code> class from Daniel’s tutorial requires one crucial modification when working with TensorFlow: in order to save and restore trained weights, we need a reference to the current TensorFlow session and a <code class="language-plaintext highlighter-rouge">tf.train.Saver</code> object. The TensorFlow <code class="language-plaintext highlighter-rouge">Saver</code> does exactly what you would expect it to: it lets you easily save and restore variables from your session’s graph, such as trained weights. You simply call <code class="language-plaintext highlighter-rouge">save()</code> to save the current weights to a checkpoint file, or <code class="language-plaintext highlighter-rouge">restore()</code> to load those weights back into your session’s graph.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">saver</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">checkpoint_path</span><span class="p">)</span>
<span class="c1"># Weights are now saved in a file located at `checkpoint_path`.
</span>
<span class="n">saver</span><span class="p">.</span><span class="n">restore</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">checkpoint_path</span><span class="p">)</span>
<span class="c1"># Saved weights are loaded into corresponding variables again.
</span></code></pre></div></div>
<p>As easy as that! One thing to consider is that when restoring a session, your graph (i.e. variables’ names and relations) is expected to be exactly the same as it was during saving, so that <code class="language-plaintext highlighter-rouge">saver</code> knows which weights to load where. The easiest way to ensure this is to save and restore the graph itself with the <code class="language-plaintext highlighter-rouge">tf.train.export_meta_graph</code> and <code class="language-plaintext highlighter-rouge">tf.train.import_meta_graph</code> functions.</p>
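<p>With saving and restoring in place, the early stopping modification becomes straightforward. Here is a rough sketch of the idea (not Daniel’s exact class; the names and the patience value are illustrative):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A rough sketch of early stopping backed by a TensorFlow Saver.
class EarlyStopping(object):
    def __init__(self, saver, session, checkpoint_path, patience = 100):
        self.saver = saver                      # saves/restores the best weights
        self.session = session                  # session owning the graph
        self.checkpoint_path = checkpoint_path  # where to keep the best weights
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0

    def __call__(self, valid_loss, epoch):
        if self.best_valid > valid_loss:
            # New best result: remember it and checkpoint the current weights.
            self.best_valid = valid_loss
            self.best_valid_epoch = epoch
            self.saver.save(self.session, self.checkpoint_path)
        elif epoch > self.best_valid_epoch + self.patience:
            # Ran out of patience: roll back to the best weights and stop.
            self.saver.restore(self.session, self.checkpoint_path)
            return True
        return False
</code></pre></div></div>
<p>Note that this, too, relies on the restored graph being identical to the one that was saved.</p>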
<p>However, what if your graph is <em>not</em> the same? Well, we run into exactly this problem when reusing a previously trained model as a specialist. The idea is that we initialise the weights of each specialist with values from a pre-trained model (the one we implemented in notebook #2, <em>3con_2fc_b36_e1000_aug_lrdec_mominc_dr</em>). Unfortunately, the graph for a single specialist is not going to be the same, due to a different shape of the <code class="language-plaintext highlighter-rouge">out</code> layer, i.e. the number of keypoints the model provides as output. We are also using a different variable scope. In order to fix that, we do the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spec_var_scope</span> <span class="o">=</span> <span class="s">"specialist_variable_scope"</span>
<span class="n">initialising_model</span> <span class="o">=</span> <span class="s">"3con_2fc_b36_e1000_aug_lrdec_mominc_dr"</span>
<span class="c1"># Exclude output layer weights from variables we will restore
</span><span class="n">variables_to_restore</span> <span class="o">=</span> <span class="p">[</span><span class="n">v</span> <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">tf</span><span class="p">.</span><span class="n">global_variables</span><span class="p">()</span> <span class="k">if</span> <span class="s">"/out/"</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">v</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">name</span><span class="p">]</span>
<span class="c1"># Replace variables scope with that of the current model
</span><span class="n">loader</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">train</span><span class="p">.</span><span class="n">Saver</span><span class="p">({</span><span class="n">v</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="n">spec_var_scope</span><span class="p">,</span> <span class="n">initialising_model</span><span class="p">):</span> <span class="n">v</span> <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">variables_to_restore</span><span class="p">})</span>
<span class="n">loader</span><span class="p">.</span><span class="n">restore</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="s">"/3con_2fc_b36_e1000_aug_lrdec_mominc_dr/model.ckpt"</span><span class="p">)</span>
</code></pre></div></div>
<p>By default <code class="language-plaintext highlighter-rouge">tf.train.Saver</code> will restore all variables you have in your graph. However, you can provide a list of variables to be restored; that’s how we are going to exclude the output layer weights from the list of values we are restoring. An important thing to remember is that a variable scope is essentially a namespace encoded in the variable’s name, with levels separated by slashes (<code class="language-plaintext highlighter-rouge">/</code>). That’s why we can simply filter out variables with <code class="language-plaintext highlighter-rouge">/out/</code> in their names from all variables in the graph:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">variables_to_restore</span> <span class="o">=</span> <span class="p">[</span><span class="n">v</span> <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">tf</span><span class="p">.</span><span class="n">all_variables</span><span class="p">()</span> <span class="k">if</span> <span class="s">"/out/"</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">v</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">name</span><span class="p">]</span>
</code></pre></div></div>
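<p>If you’re not sure what those names actually look like, a quick sanity check is to print them:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Quick sanity check: print every variable name in the graph.
for v in tf.global_variables():
    print(v.op.name)
</code></pre></div></div>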
<p>The next thing to do is updating the variable scope. This is done, again, through the variables’ names: we build a dictionary that maps each variable’s name in the checkpoint (obtained by replacing the current specialist scope with the old model name) to the corresponding variable in the current graph:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...{v.op.name.replace(spec_var_scope, initialising_model): v for v in variables_to_restore}
</code></pre></div></div>
<p>Let’s assume this is what your saved model looks like; say, its graph contains the following variables:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">3con_2fc_b36_e1000_aug_lrdec_mominc_dr/fc4/weights</code></li>
<li><code class="language-plaintext highlighter-rouge">3con_2fc_b36_e1000_aug_lrdec_mominc_dr/fc4/biases</code></li>
<li><code class="language-plaintext highlighter-rouge">3con_2fc_b36_e1000_aug_lrdec_mominc_dr/out/weights</code></li>
<li><code class="language-plaintext highlighter-rouge">3con_2fc_b36_e1000_aug_lrdec_mominc_dr/out/biases</code></li>
</ul>
<p>After we apply our transformations, the list of variables we restore is converted to:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">specialist_variable_scope/fc4/weights</code></li>
<li><code class="language-plaintext highlighter-rouge">specialist_variable_scope/fc4/biases</code></li>
</ul>
<p>And <code class="language-plaintext highlighter-rouge">.../out/weights</code> and <code class="language-plaintext highlighter-rouge">.../out/biases</code> are gone, since they had <code class="language-plaintext highlighter-rouge">/out/</code> in their names.</p>
<p>You can now plot learning curves for each of the specialists and, as Daniel suggests, explore ways of improving your model even further. As he points out, some specialists overfit more than others, so it might make sense to use different dropout values for each of them. One might also want to experiment with additional regularisation techniques, like L2 loss, and probably take some further steps with data augmentation.</p>
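<p>For instance, a simple L2 weight decay term could be added to the loss like this (a sketch with an illustrative coefficient, assuming the mean squared error loss we’ve been using throughout):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative sketch: L2 weight decay over all `weights` variables.
l2_coefficient = 1e-4
l2_penalty = tf.add_n([
    tf.nn.l2_loss(v) for v in tf.trainable_variables()
    if "/weights" in v.op.name
])
loss = tf.reduce_mean(tf.square(predictions - tf_y_batch)) + l2_coefficient * l2_penalty
</code></pre></div></div>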
<p>The full code for this tutorial is available on GitHub: <a href="https://github.com/alexstaravoitau/kaggle-facial-keypoints-detection">alexstaravoitau/kaggle-facial-keypoints-detection</a>.</p>
<script async="" defer="" src="https://buttons.github.io/buttons.js"></script>Alex StaravoitauThis is a TensorFlow follow-along for an amazing Deep Learning tutorial by Daniel Nouri. Daniel describes ways of approaching a computer vision problem of detecting facial keypoints in an image using various deep learning techniques, while these techniques gradually build upon each other, demonstrating advantages and limitations of each.Cloud logger2016-12-25T00:00:00+00:002016-12-25T00:00:00+00:00//navoshta.com/cloud-log<p>Most of the tasks in data science are long-running, and many folks (me included) execute those tasks on <a href="http://navoshta.com/meet-fenton/">remote machines</a>. And the crucial thing for those tasks is logging: you do need to know how training process was going and see the learning curves. It would also be convenient if you could access those logs from anywhere and be notified when the process had finished. So I built the <code class="language-plaintext highlighter-rouge">cloudlog</code>!<!--more--></p>
<h2 id="cloudlog"><code class="language-plaintext highlighter-rouge">cloudlog</code></h2>
<p><code class="language-plaintext highlighter-rouge">cloudlog</code> is a very simple Python logger that duplicates your console logs to a local file, saves a copy safely in the cloud, and can as well notify you via messenger bot. And it can do all those things with <code class="language-plaintext highlighter-rouge">pyplot</code> plots as well! For cloud service I went with <strong>Dropbox</strong>, as it’s easy to integrate and can be accessed from any device such as your phone. For messenger I chose <strong>Telegram</strong>, being a huge fan of the platform.</p>
<h3 id="how-to-use">How to use</h3>
<ul>
<li>Install the package:</li>
</ul>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>cloudlog
</code></pre></div></div>
<ul>
<li>Import the <code class="language-plaintext highlighter-rouge">CloudLog</code> class:</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">cloudlog</span> <span class="kn">import</span> <span class="n">CloudLog</span>
</code></pre></div></div>
<ul>
<li>Log text by simply calling a <code class="language-plaintext highlighter-rouge">CloudLog</code> instance:</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span> <span class="o">=</span> <span class="n">CloudLog</span><span class="p">(</span><span class="n">root_path</span><span class="o">=</span><span class="s">'~/logs'</span><span class="p">))</span>
<span class="n">log</span><span class="p">(</span><span class="s">'Some important stuff happening.'</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">'And again!'</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">'Luckily, it</span><span class="se">\'</span><span class="s">s all safe now in a local file.'</span><span class="p">)</span>
</code></pre></div></div>
<ul>
<li>Add <code class="language-plaintext highlighter-rouge">pyplot</code> plots as images in the same folder:</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span>
<span class="c1"># Draw a plot
</span><span class="n">x</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Amount of logs'</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Coolness of your app'</span><span class="p">)</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># Call it before calling `pyplot.show()`.
</span><span class="n">log</span><span class="p">.</span><span class="n">add_plot</span><span class="p">()</span>
<span class="n">pyplot</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<h3 id="dropbox">Dropbox</h3>
<p>In order to sync your logs and plots to Dropbox, do the following.</p>
<ul>
<li><a href="https://www.dropbox.com/developers/apps/create">Create a Dropbox app</a> with <code class="language-plaintext highlighter-rouge">App folder</code> access type.</li>
<li>Get your Dropbox access token and provide it in the initialiser.</li>
<li>Call <code class="language-plaintext highlighter-rouge">sync()</code> in order to dispatch the log file to your Dropbox app folder.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span> <span class="o">=</span> <span class="n">CloudLog</span><span class="p">(</span><span class="n">root_path</span><span class="o">=</span><span class="s">'~/logs'</span><span class="p">,</span> <span class="n">dropbox_token</span><span class="o">=</span><span class="s">'YOUR_DROPBOX_TOKEN_HERE'</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">'Some important stuff happening again.'</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">'Luckily, it</span><span class="se">\'</span><span class="s">s all safe now. In the cloud!'</span><span class="p">)</span>
<span class="n">log</span><span class="p">.</span><span class="n">sync</span><span class="p">()</span>
</code></pre></div></div>
<p>Plots are synced to the Dropbox folder by default.</p>
<h3 id="telegram">Telegram</h3>
<p>You can also get notifications in a Telegram chat, with logs and plots sent to you.</p>
<ul>
<li><a href="https://core.telegram.org/bots#creating-a-new-bot">Create a Telegram bot</a>.</li>
<li>Get your Telegram Bot API access token.</li>
<li><a href="http://stackoverflow.com/a/32777943/300131">Find out your Telegram chat or user ID</a>.</li>
<li>Provide both values in the initialiser.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span> <span class="o">=</span> <span class="n">CloudLog</span><span class="p">(</span><span class="n">root_path</span><span class="o">=</span><span class="s">'~/logs'</span><span class="p">,</span> <span class="n">telegram_token</span><span class="o">=</span><span class="s">'YOUR_TELEGRAM_TOKEN'</span><span class="p">,</span> <span class="n">telegram_chat_id</span><span class="o">=</span><span class="s">'CHAT_ID'</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">'Some important stuff once more.'</span><span class="p">)</span>
<span class="n">log</span><span class="p">(</span><span class="s">'Luckily, it</span><span class="se">\'</span><span class="s">s all safe now in a local file. AND you</span><span class="se">\'</span><span class="s">re notified — how cool is that?'</span><span class="p">)</span>
<span class="n">log</span><span class="p">.</span><span class="n">sync</span><span class="p">(</span><span class="n">notify</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">message</span><span class="o">=</span><span class="s">'I</span><span class="se">\'</span><span class="s">m pregnant.'</span><span class="p">)</span>
</code></pre></div></div>
<p>Specify the same <code class="language-plaintext highlighter-rouge">notify</code> flag for plots to have them sent to the Telegram chat as well:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>
<span class="n">log</span><span class="p">.</span><span class="n">add_plot</span><span class="p">(</span><span class="n">notify</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p>Since one may be tempted to dispatch a bunch of updates at the same time, you will not be notified about messages containing files, such as plots and logs; you are only notified about the <code class="language-plaintext highlighter-rouge">message</code> passed to the <code class="language-plaintext highlighter-rouge">sync()</code> method.</p>
<p>There you go! Your remote machine will now not only safely store your logs in the cloud, providing easy access from anywhere, but will also send you a notification with a full report.</p>
<table border="">
<tr>
<td><img src="/images/posts/cloudlog/cloudlog_screenshot_1.jpg" alt="Messenger notification." /></td>
<td><img src="/images/posts/cloudlog/cloudlog_screenshot_2.jpg" alt="Training report in your chat." /></td>
</tr>
</table>
<p>You could have guessed that <strong>Fenton</strong> is the name of <a href="http://navoshta.com/meet-fenton/">my remote machine</a>, of course!</p>
<p>The <code class="language-plaintext highlighter-rouge">cloudlog</code> source code is available on GitHub: <a href="https://github.com/alexstaravoitau/cloudlog">alexstaravoitau/cloudlog</a>.</p>