arm_compute v19.02

Change-Id: I853a3ecf38f206da13c1b03640c8adf73c20477c
diff --git a/documentation/index.xhtml b/documentation/index.xhtml
index 1cb890b..e1c9012 100644
--- a/documentation/index.xhtml
+++ b/documentation/index.xhtml
@@ -1,10 +1,11 @@
-<!-- HTML header for doxygen 1.8.9.1-->
+<!-- HTML header for doxygen 1.8.15-->
+<!-- Remember to use doxygen version 1.8.15+ -->
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
 <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
 <meta http-equiv="X-UA-Compatible" content="IE=9"/>
-<meta name="generator" content="Doxygen 1.8.13"/>
+<meta name="generator" content="Doxygen 1.8.15"/>
 <meta name="robots" content="NOINDEX, NOFOLLOW" /> <!-- Prevent indexing by search engines -->
 <title>Compute Library: Introduction</title>
 <link href="tabs.css" rel="stylesheet" type="text/css"/>
@@ -15,8 +16,9 @@
 <script type="text/javascript" src="navtreedata.js"></script>
 <script type="text/javascript" src="navtree.js"></script>
 <script type="text/javascript">
+/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
   $(document).ready(initResizable);
-</script>
+/* @license-end */</script>
 <link href="search/search.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="search/searchdata.js"></script>
 <script type="text/javascript" src="search/search.js"></script>
@@ -25,8 +27,9 @@
     extensions: ["tex2jax.js"],
     jax: ["input/TeX","output/HTML-CSS"],
 });
-</script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script>
+</script><script type="text/javascript" async="async" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script>
 <link href="doxygen.css" rel="stylesheet" type="text/css" />
+<link href="stylesheet.css" rel="stylesheet" type="text/css"/>
 </head>
 <body>
 <div id="top"><!-- do not remove this div, it is closed by doxygen! -->
@@ -34,9 +37,10 @@
 <table cellspacing="0" cellpadding="0">
  <tbody>
  <tr style="height: 56px;">
+  <img alt="Compute Library" src="https://raw.githubusercontent.com/ARM-software/ComputeLibrary/gh-pages/ACL_logo.png" style="max-width: 100%;margin-top: 15px;margin-left: 10px"/>
   <td style="padding-left: 0.5em;">
-   <div id="projectname">Compute Library
-   &#160;<span id="projectnumber">18.11</span>
+   <div id="projectname">
+   &#160;<span id="projectnumber">19.02</span>
    </div>
   </td>
  </tr>
@@ -44,18 +48,21 @@
 </table>
 </div>
 <!-- end header part -->
-<!-- Generated by Doxygen 1.8.13 -->
+<!-- Generated by Doxygen 1.8.15 -->
 <script type="text/javascript">
+/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
 var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
 </script>
 <script type="text/javascript" src="menudata.js"></script>
 <script type="text/javascript" src="menu.js"></script>
 <script type="text/javascript">
+/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
 $(function() {
   initMenu('',true,false,'search.php','Search');
   $(document).ready(function() { init_search(); });
 });
-</script>
+/* @license-end */</script>
 <div id="main-nav"></div>
 </div><!-- top -->
 <div id="side-nav" class="ui-resizable side-nav-resizable">
@@ -69,7 +76,9 @@
   </div>
 </div>
 <script type="text/javascript">
+/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
 $(document).ready(function(){initNavTree('index.xhtml','');});
+/* @license-end */
 </script>
 <div id="doc-content">
 <!-- window showing the filter options -->
@@ -86,7 +95,7 @@
 </iframe>
 </div>
 
-<div class="header">
+<div class="PageDoc"><div class="header">
   <div class="headertitle">
 <div class="title">Introduction </div>  </div>
 </div><!--header-->
@@ -153,7 +162,7 @@
 <h1><a class="anchor" id="S1_file_organisation"></a>
 File organisation</h1>
 <p>This archive contains:</p><ul>
-<li>The <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> header and source files</li>
+<li>The <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> header and source files</li>
 <li>The latest Khronos OpenCL 1.2 C headers from the <a href="https://www.khronos.org/registry/cl/">Khronos OpenCL registry</a></li>
 <li>The latest Khronos cl2.hpp from the <a href="https://www.khronos.org/registry/cl/">Khronos OpenCL registry</a> (API version 2.1 when this document was written)</li>
 <li>The latest Khronos OpenGL ES 3.1 C headers from the <a href="https://www.khronos.org/registry/gles/">Khronos OpenGL ES registry</a></li>
@@ -324,6 +333,105 @@
 </pre><dl class="section note"><dt>Note</dt><dd>We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes.</dd></dl>
 <h2><a class="anchor" id="S2_2_changelog"></a>
 Changelog</h2>
+<p>v19.02 Public major release</p><ul>
+<li>Various bug fixes.</li>
+<li>Various optimisations.</li>
+<li>New Neon kernels / functions:<ul>
+<li><a class="el" href="classarm__compute_1_1_n_e_tile_kernel.xhtml">NETileKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_tile.xhtml">NETile</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_fuse_batch_normalization_kernel.xhtml">NEFuseBatchNormalizationKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_fuse_batch_normalization.xhtml">NEFuseBatchNormalization</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_operation_kernel.xhtml">NEElementwiseOperationKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_max.xhtml">NEElementwiseMax</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_min.xhtml">NEElementwiseMin</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_squared_diff.xhtml">NEElementwiseSquaredDiff</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_select_kernel.xhtml">NESelectKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_select.xhtml">NESelect</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_split.xhtml">NESplit</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_slice.xhtml">NESlice</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_unstack.xhtml">NEUnstack</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_strided_slice_kernel.xhtml">NEStridedSliceKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_strided_slice.xhtml">NEStridedSlice</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_unary_kernel.xhtml">NEElementwiseUnaryKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_rsqrt_layer.xhtml">NERsqrtLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_exp_layer.xhtml">NEExpLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_reverse_kernel.xhtml">NEReverseKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_reverse.xhtml">NEReverse</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_arg_min_max_layer.xhtml">NEArgMinMaxLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_stack_layer_kernel.xhtml">NEStackLayerKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_stack_layer.xhtml">NEStackLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_range_kernel.xhtml">NERangeKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_range.xhtml">NERange</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_pad_layer.xhtml">NEPadLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_memset_kernel.xhtml">NEMemsetKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_gather_kernel.xhtml">NEGatherKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_gather.xhtml">NEGather</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_comparison.xhtml">NEElementwiseComparison</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_comparison_static.xhtml">NEElementwiseComparisonStatic</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_comparison_operation_kernel.xhtml">NEComparisonOperationKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_elementwise_division.xhtml">NEElementwiseDivision</a></li>
+</ul>
+</li>
+<li>New OpenCL kernels / functions:<ul>
+<li><a class="el" href="classarm__compute_1_1_c_l_select_kernel.xhtml">CLSelectKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_select.xhtml">CLSelect</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_tile_kernel.xhtml">CLTileKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_tile.xhtml">CLTile</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_comparison_kernel.xhtml">CLComparisonKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_comparison.xhtml">CLComparison</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_arg_min_max_layer.xhtml">CLArgMinMaxLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_elementwise_max.xhtml">CLElementwiseMax</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_elementwise_min.xhtml">CLElementwiseMin</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_elementwise_squared_diff.xhtml">CLElementwiseSquaredDiff</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_stack_layer_kernel.xhtml">CLStackLayerKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_stack_layer.xhtml">CLStackLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_reverse.xhtml">CLReverse</a> / <a class="el" href="classarm__compute_1_1_c_l_reverse_kernel.xhtml">CLReverseKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_rsqrt_layer.xhtml">CLRsqrtLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_exp_layer.xhtml">CLExpLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_element_wise_unary_layer_kernel.xhtml">CLElementWiseUnaryLayerKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_g_e_m_m_reshape_l_h_s_matrix_kernel.xhtml">CLGEMMReshapeLHSMatrixKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_g_e_m_m_reshape_r_h_s_matrix_kernel.xhtml">CLGEMMReshapeRHSMatrixKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_g_e_m_m_matrix_multiply_reshaped_kernel.xhtml">CLGEMMMatrixMultiplyReshapedKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_range_kernel.xhtml">CLRangeKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_range.xhtml">CLRange</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_unstack.xhtml">CLUnstack</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_gather_kernel.xhtml">CLGatherKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_gather.xhtml">CLGather</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_g_e_m_m_lowp_matrix_multiply_reshaped_kernel.xhtml">CLGEMMLowpMatrixMultiplyReshapedKernel</a></li>
+</ul>
+</li>
+<li>New CPP kernels / functions:<ul>
+<li><a class="el" href="classarm__compute_1_1_c_p_p_detection_output_layer.xhtml">CPPDetectionOutputLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_p_p_top_k_v.xhtml">CPPTopKV</a> / <a class="el" href="classarm__compute_1_1_c_p_p_top_k_v_kernel.xhtml">CPPTopKVKernel</a></li>
+</ul>
+</li>
+<li>Added new examples:<ul>
+<li><a class="el" href="graph__ssd__mobilenet_8cpp.xhtml">graph_ssd_mobilenet.cpp</a></li>
+<li><a class="el" href="graph__mobilenet__v2_8cpp.xhtml">graph_mobilenet_v2.cpp</a></li>
+<li><a class="el" href="graph__resnet12_8cpp.xhtml">graph_resnet12.cpp</a></li>
+<li><a class="el" href="graph__srcnn955_8cpp.xhtml">graph_srcnn955.cpp</a></li>
+<li><a class="el" href="graph__vgg__vdsr_8cpp.xhtml">graph_vgg_vdsr.cpp</a></li>
+<li><a class="el" href="graph__inception__resnet__v1_8cpp.xhtml">graph_inception_resnet_v1.cpp</a></li>
+</ul>
+</li>
+<li>Added 4D tensor support to:<ul>
+<li><a class="el" href="classarm__compute_1_1_n_e_softmax_layer.xhtml">NESoftmaxLayer</a></li>
+</ul>
+</li>
+<li>Fused activation in <a class="el" href="classarm__compute_1_1_c_l_winograd_convolution_layer.xhtml">CLWinogradConvolutionLayer</a></li>
+<li>Extended <a class="el" href="classarm__compute_1_1_n_e_permute.xhtml">NEPermute</a> to support more cases</li>
+<li>Added NEON/SVE GEMM Hybrid kernels</li>
+<li>Added u8 and s8 hybrid assembly kernels</li>
+<li>Introduced GEMM strategy name in NEGEMMAssemblyWrapper</li>
+<li>Improved <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml">CLTuner</a></li>
+<li>Fused the bias addition within <a class="el" href="classarm__compute_1_1_c_l_g_e_m_m.xhtml">CLGEMM</a></li>
+<li>Added support for QASYMM8 LOGISTIC activation in <a class="el" href="classarm__compute_1_1_n_e_activation_layer.xhtml">NEActivationLayer</a></li>
+<li>Added NHWC data layout support to:<ul>
+<li><a class="el" href="classarm__compute_1_1_n_e_scale.xhtml">NEScale</a> for F16</li>
+<li><a class="el" href="classarm__compute_1_1_c_l_normalization_layer.xhtml">CLNormalizationLayer</a> IN_MAP_2D for FP32/FP16</li>
+<li><a class="el" href="classarm__compute_1_1_n_e_l2_normalize_layer.xhtml">NEL2NormalizeLayer</a> for FP32/FP16</li>
+<li><a class="el" href="classarm__compute_1_1_n_e_normalization_layer.xhtml">NENormalizationLayer</a> IN_MAP_2D for FP32/FP16</li>
+<li><a class="el" href="classarm__compute_1_1_c_l_r_o_i_align_layer.xhtml">CLROIAlignLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_generate_proposals_layer.xhtml">CLGenerateProposalsLayer</a></li>
+</ul>
+</li>
+<li>Added QASYMM8 support to the following kernels:<ul>
+<li><a class="el" href="classarm__compute_1_1_n_e_arithmetic_addition_kernel.xhtml">NEArithmeticAdditionKernel</a></li>
+<li><a class="el" href="classarm__compute_1_1_n_e_scale.xhtml">NEScale</a></li>
+</ul>
+</li>
+<li>Added new tests and improved validation and benchmarking suites.</li>
+<li>Deprecated functions/interfaces<ul>
+<li>Usage of inner_border_right and inner_border_top has been deprecated in <a class="el" href="classarm__compute_1_1_c_l_deconvolution_layer.xhtml">CLDeconvolutionLayer</a> and <a class="el" href="classarm__compute_1_1_n_e_deconvolution_layer.xhtml">NEDeconvolutionLayer</a></li>
+</ul>
+</li>
+</ul>
 <p>v18.11 Public major release</p><ul>
 <li>Various bug fixes.</li>
 <li>Various optimisations.</li>
@@ -439,7 +547,7 @@
 <li>Various bug fixes.</li>
 <li>Various optimisations.</li>
 <li>Major redesign in the interface for the neon kernels implemented in assembly.</li>
-<li>Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_assembly_matrix_multiply_core.xhtml" title="Basic function to execute matrix multiply assembly kernels. ">arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore</a> / arm_compute::NEHGEMMAArch64FP16Kernel</li>
+<li>Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_assembly_matrix_multiply_core.xhtml" title="Basic function to execute matrix multiply assembly kernels.">arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore</a> / arm_compute::NEHGEMMAArch64FP16Kernel</li>
 <li>Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in neon functions.</li>
 <li>Minor changes to the <a class="el" href="classarm__compute_1_1_c_p_u_info.xhtml">CPUInfo</a> type to make it compatible with the new assembly gemm interface.</li>
 <li>Moved neon assembly kernels to the folder src/core/NEON/kernels/arm_gemm.</li>
@@ -565,7 +673,7 @@
 <li>Added some of the missing <a class="el" href="namespacearm__compute_1_1test_1_1validation.xhtml#ae02c6fc90d9c60c634bfa258049eb46b">validate()</a> methods</li>
 <li>Added <a class="el" href="classarm__compute_1_1_c_l_deconvolution_layer_upsample_kernel.xhtml">CLDeconvolutionLayerUpsampleKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_deconvolution_layer.xhtml">CLDeconvolutionLayer</a> <a class="el" href="classarm__compute_1_1_c_l_deconvolution_layer_upsample.xhtml">CLDeconvolutionLayerUpsample</a></li>
 <li>Added <a class="el" href="classarm__compute_1_1_c_l_permute_kernel.xhtml">CLPermuteKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_permute.xhtml">CLPermute</a></li>
-<li>Added method to clean the programs cache in the CL <a class="el" href="classarm__compute_1_1_kernel.xhtml" title="Kernel class. ">Kernel</a> library.</li>
+<li>Added method to clean the programs cache in the CL <a class="el" href="classarm__compute_1_1_kernel.xhtml" title="Kernel class.">Kernel</a> library.</li>
 <li>Added <a class="el" href="classarm__compute_1_1_g_c_arithmetic_addition_kernel.xhtml">GCArithmeticAdditionKernel</a> / <a class="el" href="classarm__compute_1_1_g_c_arithmetic_addition.xhtml">GCArithmeticAddition</a></li>
 <li>Added <a class="el" href="classarm__compute_1_1_g_c_depthwise_convolution_layer3x3_kernel.xhtml">GCDepthwiseConvolutionLayer3x3Kernel</a> / <a class="el" href="classarm__compute_1_1_g_c_depthwise_convolution_layer3x3.xhtml">GCDepthwiseConvolutionLayer3x3</a></li>
 <li>Added <a class="el" href="classarm__compute_1_1_g_c_normalize_planar_y_u_v_layer_kernel.xhtml">GCNormalizePlanarYUVLayerKernel</a> / <a class="el" href="classarm__compute_1_1_g_c_normalize_planar_y_u_v_layer.xhtml">GCNormalizePlanarYUVLayer</a></li>
@@ -627,7 +735,7 @@
 </ul>
 </li>
 <li>New NEON kernels / functions<ul>
-<li>arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_assembly_matrix_multiply_core.xhtml" title="Basic function to execute matrix multiply assembly kernels. ">arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore</a></li>
+<li>arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_assembly_matrix_multiply_core.xhtml" title="Basic function to execute matrix multiply assembly kernels.">arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore</a></li>
 <li>arm_compute::NEHGEMMAArch64FP16Kernel</li>
 <li><a class="el" href="classarm__compute_1_1_n_e_depthwise_convolution_layer3x3_kernel.xhtml">NEDepthwiseConvolutionLayer3x3Kernel</a> / <a class="el" href="classarm__compute_1_1_n_e_depthwise_im2_col_kernel.xhtml">NEDepthwiseIm2ColKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_matrix_vector_multiply_kernel.xhtml">NEGEMMMatrixVectorMultiplyKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_depthwise_vector_to_tensor_kernel.xhtml">NEDepthwiseVectorToTensorKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_depthwise_convolution_layer.xhtml">NEDepthwiseConvolutionLayer</a></li>
 <li><a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_offset_contribution_kernel.xhtml">NEGEMMLowpOffsetContributionKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_matrix_a_reduction_kernel.xhtml">NEGEMMLowpMatrixAReductionKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_matrix_b_reduction_kernel.xhtml">NEGEMMLowpMatrixBReductionKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_lowp_matrix_multiply_core.xhtml">NEGEMMLowpMatrixMultiplyCore</a></li>
@@ -657,7 +765,7 @@
 <li>Bug fixes:<ul>
 <li>Check the maximum local workgroup size supported by OpenCL devices</li>
 <li>Minor documentation updates (Fixed instructions to build the examples)</li>
-<li>Introduced a <a class="el" href="classarm__compute_1_1graph_1_1_graph_context.xhtml" title="Graph context. ">graph::GraphContext</a></li>
+<li>Introduced a <a class="el" href="classarm__compute_1_1graph_1_1_graph_context.xhtml" title="Graph context.">graph::GraphContext</a></li>
 <li>Added a few new Graph nodes, support for branches and grouping.</li>
 <li>Automatically enable cl_printf in debug builds</li>
 <li>Fixed bare metal builds for armv7a</li>
@@ -668,11 +776,11 @@
 </ul>
 <p>v17.09 Public major release</p><ul>
 <li>Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.</li>
-<li><a class="el" href="classarm__compute_1_1_memory.xhtml" title="CPU implementation of memory object. ">Memory</a> Manager (<a class="el" href="classarm__compute_1_1_blob_lifetime_manager.xhtml">BlobLifetimeManager</a>, <a class="el" href="classarm__compute_1_1_blob_memory_pool.xhtml">BlobMemoryPool</a>, <a class="el" href="classarm__compute_1_1_i_lifetime_manager.xhtml">ILifetimeManager</a>, <a class="el" href="classarm__compute_1_1_i_memory_group.xhtml">IMemoryGroup</a>, <a class="el" href="classarm__compute_1_1_i_memory_manager.xhtml">IMemoryManager</a>, <a class="el" href="classarm__compute_1_1_i_memory_pool.xhtml">IMemoryPool</a>, <a class="el" href="classarm__compute_1_1_i_pool_manager.xhtml">IPoolManager</a>, <a class="el" href="classarm__compute_1_1_memory_manager_on_demand.xhtml">MemoryManagerOnDemand</a>, <a class="el" href="classarm__compute_1_1_pool_manager.xhtml">PoolManager</a>)</li>
+<li><a class="el" href="classarm__compute_1_1_memory.xhtml" title="CPU implementation of memory object.">Memory</a> Manager (<a class="el" href="classarm__compute_1_1_blob_lifetime_manager.xhtml">BlobLifetimeManager</a>, <a class="el" href="classarm__compute_1_1_blob_memory_pool.xhtml">BlobMemoryPool</a>, <a class="el" href="classarm__compute_1_1_i_lifetime_manager.xhtml">ILifetimeManager</a>, <a class="el" href="classarm__compute_1_1_i_memory_group.xhtml">IMemoryGroup</a>, <a class="el" href="classarm__compute_1_1_i_memory_manager.xhtml">IMemoryManager</a>, <a class="el" href="classarm__compute_1_1_i_memory_pool.xhtml">IMemoryPool</a>, <a class="el" href="classarm__compute_1_1_i_pool_manager.xhtml">IPoolManager</a>, <a class="el" href="classarm__compute_1_1_memory_manager_on_demand.xhtml">MemoryManagerOnDemand</a>, <a class="el" href="classarm__compute_1_1_pool_manager.xhtml">PoolManager</a>)</li>
 <li>New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).</li>
 <li>Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both NEON and OpenCL.</li>
 <li>New NEON kernels / functions:<ul>
-<li><a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_assembly_base_kernel.xhtml" title="Base class for GEMM NEON kernels implemented in Assembly. ">arm_compute::NEGEMMAssemblyBaseKernel</a> arm_compute::NEGEMMAArch64Kernel</li>
+<li><a class="el" href="classarm__compute_1_1_n_e_g_e_m_m_assembly_base_kernel.xhtml" title="Base class for GEMM NEON kernels implemented in Assembly.">arm_compute::NEGEMMAssemblyBaseKernel</a> arm_compute::NEGEMMAArch64Kernel</li>
 <li><a class="el" href="classarm__compute_1_1_n_e_dequantization_layer_kernel.xhtml">NEDequantizationLayerKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_dequantization_layer.xhtml">NEDequantizationLayer</a></li>
 <li><a class="el" href="classarm__compute_1_1_n_e_floor_kernel.xhtml">NEFloorKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_floor.xhtml">NEFloor</a></li>
 <li><a class="el" href="classarm__compute_1_1_n_e_l2_normalize_layer_kernel.xhtml">NEL2NormalizeLayerKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_l2_normalize_layer.xhtml">NEL2NormalizeLayer</a></li>
@@ -683,7 +791,7 @@
 </ul>
 </li>
 <li>New OpenCL kernels / functions:<ul>
-<li><a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer3x3_n_c_h_w_kernel.xhtml">CLDepthwiseConvolutionLayer3x3NCHWKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer3x3_n_h_w_c_kernel.xhtml">CLDepthwiseConvolutionLayer3x3NHWCKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_im2_col_kernel.xhtml">CLDepthwiseIm2ColKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_vector_to_tensor_kernel.xhtml">CLDepthwiseVectorToTensorKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_weights_reshape_kernel.xhtml">CLDepthwiseWeightsReshapeKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer3x3.xhtml">CLDepthwiseConvolutionLayer3x3</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer.xhtml">CLDepthwiseConvolutionLayer</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_separable_convolution_layer.xhtml">CLDepthwiseSeparableConvolutionLayer</a></li>
+<li><a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer3x3_n_c_h_w_kernel.xhtml">CLDepthwiseConvolutionLayer3x3NCHWKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer3x3_n_h_w_c_kernel.xhtml">CLDepthwiseConvolutionLayer3x3NHWCKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_im2_col_kernel.xhtml">CLDepthwiseIm2ColKernel</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_vector_to_tensor_kernel.xhtml">CLDepthwiseVectorToTensorKernel</a> CLDepthwiseWeightsReshapeKernel / <a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer3x3.xhtml">CLDepthwiseConvolutionLayer3x3</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_convolution_layer.xhtml">CLDepthwiseConvolutionLayer</a> <a class="el" href="classarm__compute_1_1_c_l_depthwise_separable_convolution_layer.xhtml">CLDepthwiseSeparableConvolutionLayer</a></li>
 <li><a class="el" href="classarm__compute_1_1_c_l_dequantization_layer_kernel.xhtml">CLDequantizationLayerKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_dequantization_layer.xhtml">CLDequantizationLayer</a></li>
 <li><a class="el" href="classarm__compute_1_1_c_l_direct_convolution_layer_kernel.xhtml">CLDirectConvolutionLayerKernel</a> / <a class="el" href="classarm__compute_1_1_c_l_direct_convolution_layer.xhtml">CLDirectConvolutionLayer</a></li>
 <li><a class="el" href="classarm__compute_1_1_c_l_flatten_layer.xhtml">CLFlattenLayer</a></li>
@@ -757,7 +865,7 @@
 <li><a class="el" href="classarm__compute_1_1_n_e_non_maxima_suppression3x3_kernel.xhtml">NENonMaximaSuppression3x3Kernel</a></li>
 </ul>
 <p>v17.03.1 First Major public release of the sources</p><ul>
-<li>Renamed the library to <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a></li>
+<li>Renamed the library to <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a></li>
 <li>New CPP target introduced for C++ kernels shared between NEON and CL functions.</li>
 <li>New padding calculation interface introduced and ported most kernels / functions to use it.</li>
 <li>New OpenCL kernels / functions:<ul>
@@ -820,11 +928,11 @@
 </ul>
 </li>
 <li>New NEON kernels / functions:<ul>
-<li><a class="el" href="classarm__compute_1_1_h_o_g.xhtml" title="CPU implementation of HOG data-object. ">HOG</a> / SVM: <a class="el" href="classarm__compute_1_1_n_e_h_o_g_orientation_binning_kernel.xhtml">NEHOGOrientationBinningKernel</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_block_normalization_kernel.xhtml">NEHOGBlockNormalizationKernel</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_detector_kernel.xhtml">NEHOGDetectorKernel</a>, NEHOGNonMaximaSuppressionKernel / <a class="el" href="classarm__compute_1_1_n_e_h_o_g_descriptor.xhtml">NEHOGDescriptor</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_detector.xhtml">NEHOGDetector</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_gradient.xhtml">NEHOGGradient</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_multi_detection.xhtml">NEHOGMultiDetection</a></li>
+<li><a class="el" href="classarm__compute_1_1_h_o_g.xhtml" title="CPU implementation of HOG data-object.">HOG</a> / SVM: <a class="el" href="classarm__compute_1_1_n_e_h_o_g_orientation_binning_kernel.xhtml">NEHOGOrientationBinningKernel</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_block_normalization_kernel.xhtml">NEHOGBlockNormalizationKernel</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_detector_kernel.xhtml">NEHOGDetectorKernel</a>, NEHOGNonMaximaSuppressionKernel / <a class="el" href="classarm__compute_1_1_n_e_h_o_g_descriptor.xhtml">NEHOGDescriptor</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_detector.xhtml">NEHOGDetector</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_gradient.xhtml">NEHOGGradient</a>, <a class="el" href="classarm__compute_1_1_n_e_h_o_g_multi_detection.xhtml">NEHOGMultiDetection</a></li>
 <li><a class="el" href="classarm__compute_1_1_n_e_non_linear_filter_kernel.xhtml">NENonLinearFilterKernel</a> / <a class="el" href="classarm__compute_1_1_n_e_non_linear_filter.xhtml">NENonLinearFilter</a></li>
 </ul>
 </li>
-<li>Introduced a <a class="el" href="classarm__compute_1_1_c_l_scheduler.xhtml" title="Provides global access to a CL context and command queue. ">CLScheduler</a> to manage the default context and command queue used by the runtime library and create synchronisation events.</li>
+<li>Introduced a <a class="el" href="classarm__compute_1_1_c_l_scheduler.xhtml" title="Provides global access to a CL context and command queue.">CLScheduler</a> to manage the default context and command queue used by the runtime library and create synchronisation events.</li>
 <li>Switched all the kernels / functions to use tensors instead of images.</li>
 <li>Updated documentation to include instructions to build the library from sources.</li>
 </ul>
@@ -926,7 +1034,7 @@
 <p>There is also an 'embed_only' option which will generate all the .embed files for the OpenCL kernels and / or OpenGLES compute shaders. This might be useful if using a different build system to compile the library.</p>
 <p><b>Werror:</b> If you are compiling using the same toolchains as the ones used in this guide then there shouldn't be any warnings, and therefore you should be able to keep Werror=1. If the library fails to build with a different compiler version because of warnings interpreted as errors then, if you are sure the warnings are not important, you might want to try building with Werror=0 (but please do report the issue either on Github or by an email to <a href="#" onclick="location.href='mai'+'lto:'+'dev'+'el'+'ope'+'r@'+'arm'+'.c'+'om'; return false;">devel<span style="display: none;">.nosp@m.</span>oper<span style="display: none;">.nosp@m.</span>@arm.<span style="display: none;">.nosp@m.</span>com</a> so that the issue can be addressed).</p>
 <p><b>opencl</b> / <b>neon</b> / <b>gles_compute:</b> Choose which SIMD technology you want to target. (NEON for ARM Cortex-A CPUs or OpenCL / GLES_COMPUTE for ARM Mali GPUs)</p>
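 <p>For example, a minimal sketch of a build command targeting both NEON and OpenCL (the os and arch values shown are an assumption; adjust them to your platform):</p>
 <pre class="fragment">scons Werror=1 debug=0 neon=1 opencl=1 os=linux arch=armv8-a -j4
 </pre>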
-<p><b>embed_kernels:</b> For OpenCL / GLES_COMPUTE only: set embed_kernels=1 if you want the OpenCL / GLES_COMPUTE kernels to be built in the library's binaries instead of being read from separate ".cl" / ".cs" files. If embed_kernels is set to 0 then the application can set the path to the folder containing the OpenCL / GLES_COMPUTE kernel files by calling <a class="el" href="classarm__compute_1_1_c_l_kernel_library.xhtml#a9f976367edcd9ab787375373e050b94b" title="Initialises the kernel library. ">CLKernelLibrary::init()</a> / <a class="el" href="classarm__compute_1_1_g_c_kernel_library.xhtml#abe24625d55f2fb35da7e293e5e28d483" title="Initialises the kernel library. ">GCKernelLibrary::init()</a>. By default the path is set to "./cl_kernels" / "./cs_shaders".</p>
+<p><b>embed_kernels:</b> For OpenCL / GLES_COMPUTE only: set embed_kernels=1 if you want the OpenCL / GLES_COMPUTE kernels to be built into the library's binaries instead of being read from separate ".cl" / ".cs" files. If embed_kernels is set to 0 then the application can set the path to the folder containing the OpenCL / GLES_COMPUTE kernel files by calling <a class="el" href="classarm__compute_1_1_c_l_kernel_library.xhtml#a9f976367edcd9ab787375373e050b94b" title="Initialises the kernel library.">CLKernelLibrary::init()</a> / <a class="el" href="classarm__compute_1_1_g_c_kernel_library.xhtml#abe24625d55f2fb35da7e293e5e28d483" title="Initialises the kernel library.">GCKernelLibrary::init()</a>. By default the path is set to "./cl_kernels" / "./cs_shaders".</p>
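+<p>As an illustration, a minimal sketch (assuming the default OpenCL context and device from cl2.hpp) of initialising the kernel library with an external path when embed_kernels=0:</p>
+<div class="fragment"><div class="line"><span class="comment">// Hypothetical path to the folder holding the ".cl" files</span></div><div class="line">CLKernelLibrary::get().init("./cl_kernels/", cl::Context::getDefault(), cl::Device::getDefault());</div></div><!-- fragment -->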
 <p><b>set_soname:</b> Do you want to build the versioned version of the library?</p>
 <p>If enabled, the library will contain a SONAME and SHLIBVERSION, and some symlinks will automatically be created between the objects. Example: libarm_compute_core.so -&gt; libarm_compute_core.so.1.0.0 libarm_compute_core.so.1 -&gt; libarm_compute_core.so.1.0.0 libarm_compute_core.so.1.0.0</p>
 <dl class="section note"><dt>Note</dt><dd>This options is disabled by default as it requires SCons version 2.4 or above.</dd></dl>
@@ -940,7 +1048,7 @@
 <p><b>openmp</b> Build in the OpenMP scheduler for NEON.</p>
 <dl class="section note"><dt>Note</dt><dd>Only works when building with g++ not clang++</dd></dl>
 <p><b>cppthreads</b> Build in the C++11 scheduler for NEON.</p>
-<dl class="section see"><dt>See also</dt><dd><a class="el" href="classarm__compute_1_1_scheduler.xhtml#ad2fc671b2772dd9e28b81cf0e2514e85" title="Sets the user defined scheduler and makes it the active scheduler. ">Scheduler::set</a></dd></dl>
+<dl class="section see"><dt>See also</dt><dd><a class="el" href="classarm__compute_1_1_scheduler.xhtml#ad2fc671b2772dd9e28b81cf0e2514e85" title="Sets the user defined scheduler and makes it the active scheduler.">Scheduler::set</a></dd></dl>
 <h2><a class="anchor" id="S3_2_linux"></a>
 Building for Linux</h2>
 <h3><a class="anchor" id="S3_2_1_library"></a>
@@ -963,7 +1071,7 @@
 <h3><a class="anchor" id="S3_2_2_examples"></a>
 How to manually build the examples?</h3>
 <p>The examples get automatically built by scons as part of the build process of the library described above. This section just describes how you can build and link your own application against our library.</p>
-<dl class="section note"><dt>Note</dt><dd>The following command lines assume the <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> binaries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built library with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.</dd></dl>
+<dl class="section note"><dt>Note</dt><dd>The following command lines assume the <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> binaries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built library with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.</dd></dl>
 <p>To cross compile a NEON example for Linux 32bit: </p><pre class="fragment">arm-linux-gnueabihf-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++11 -mfpu=neon -L. -larm_compute -larm_compute_core -o neon_convolution
 </pre><p>To cross compile a NEON example for Linux 64bit: </p><pre class="fragment">aarch64-linux-gnu-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++11 -L. -larm_compute -larm_compute_core -o neon_convolution
 </pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)</p>
@@ -977,7 +1085,7 @@
 <p>i.e. to cross compile the "graph_lenet" example for Linux 32bit: </p><pre class="fragment">arm-linux-gnueabihf-g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++11 -mfpu=neon -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet
 </pre><p>i.e. to cross compile the "graph_lenet" example for Linux 64bit: </p><pre class="fragment">aarch64-linux-gnu-g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++11 -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet
 </pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)</p>
-<dl class="section note"><dt>Note</dt><dd>If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a>, arm_compute_core</dd></dl>
+<dl class="section note"><dt>Note</dt><dd>If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a>, arm_compute_core</dd></dl>
 <p>To compile natively (i.e. directly on an ARM device) for NEON for Linux 32bit: </p><pre class="fragment">g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++11 -mfpu=neon -larm_compute -larm_compute_core -o neon_convolution
 </pre><p>To compile natively (i.e. directly on an ARM device) for NEON for Linux 64bit: </p><pre class="fragment">g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++11 -larm_compute -larm_compute_core -o neon_convolution
 </pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option)</p>
@@ -987,7 +1095,7 @@
 <p>i.e. to natively compile the "graph_lenet" example for Linux 32bit: </p><pre class="fragment">g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++11 -mfpu=neon -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet
 </pre><p>i.e. to natively compile the "graph_lenet" example for Linux 64bit: </p><pre class="fragment">g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++11 -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet
 </pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option)</p>
-<dl class="section note"><dt>Note</dt><dd>If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a>, arm_compute_core</dd>
+<dl class="section note"><dt>Note</dt><dd>If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a>, arm_compute_core</dd>
 <dd>
 These two commands assume libarm_compute.so is available in your library path; if not, add the path to it using -L</dd></dl>
 <p>To run the built executable simply run: </p><pre class="fragment">LD_LIBRARY_PATH=build ./neon_convolution
@@ -1020,7 +1128,7 @@
 </pre><h3><a class="anchor" id="S3_3_2_examples"></a>
 How to manually build the examples?</h3>
 <p>The examples get automatically built by scons as part of the build process of the library described above. This section just describes how you can build and link your own application against our library.</p>
-<dl class="section note"><dt>Note</dt><dd>The following command lines assume the <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> binaries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built library with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.</dd></dl>
+<dl class="section note"><dt>Note</dt><dd>The following command lines assume the <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> binaries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built library with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.</dd></dl>
 <p>Once you've got your Android standalone toolchain built and added to your path you can do the following:</p>
 <p>To cross compile a NEON example: </p><pre class="fragment">#32 bit:
 arm-linux-androideabi-clang++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++11 -larm_compute-static -larm_compute_core-static -L. -o neon_convolution_arm -static-libstdc++ -pie
@@ -1038,7 +1146,7 @@
 arm-linux-androideabi-clang++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++11 -Wl,--whole-archive -larm_compute_graph-static -Wl,--no-whole-archive -larm_compute-static -larm_compute_core-static -L. -o graph_lenet_arm -static-libstdc++ -pie -DARM_COMPUTE_CL
 #64 bit:
 aarch64-linux-android-clang++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++11 -Wl,--whole-archive -larm_compute_graph-static -Wl,--no-whole-archive -larm_compute-static -larm_compute_core-static -L. -o graph_lenet_aarch64 -static-libstdc++ -pie -DARM_COMPUTE_CL
-</pre><dl class="section note"><dt>Note</dt><dd>Due to some issues in older versions of the Mali OpenCL DDK (&lt;= r13p0), we recommend to link <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> statically on Android. </dd>
+</pre><dl class="section note"><dt>Note</dt><dd>Due to some issues in older versions of the Mali OpenCL DDK (&lt;= r13p0), we recommend to link <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> statically on Android. </dd>
 <dd>
 When linked statically, the arm_compute_graph library currently needs the --whole-archive linker flag in order to work properly</dd></dl>
 <p>Then all you need to do is upload the executable and the shared library to the device using ADB: </p><pre class="fragment">adb push neon_convolution_arm /data/local/tmp/
@@ -1057,7 +1165,7 @@
 adb shell /data/local/tmp/gc_absdiff_aarch64
 </pre><dl class="section note"><dt>Note</dt><dd>Examples accept different types of arguments, to find out what they are run the example with <em>&ndash;help</em> as an argument. If no arguments are specified then random values will be used to execute the graph.</dd></dl>
 <p>For example: adb shell /data/local/tmp/graph_lenet --help</p>
-<p>In this case the first argument of LeNet (like all the graph examples) is the target (i.e 0 to run on NEON, 1 to run on OpenCL if available, 2 to run on OpenCL using the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a>), the second argument is the path to the folder containing the npy files for the weights and finally the third argument is the number of batches to run.</p>
+<p>In this case the first argument of LeNet (like all the graph examples) is the target (i.e. 0 to run on NEON, 1 to run on OpenCL if available, 2 to run on OpenCL using the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a>), the second argument is the path to the folder containing the npy files for the weights and finally the third argument is the number of batches to run.</p>
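+<p>For instance, following that description (the weights path shown is an example):</p>
+<pre class="fragment">#Run LeNet on OpenCL (target 1) with weights from /data/local/tmp/lenet and 10 batches
+adb shell /data/local/tmp/graph_lenet 1 /data/local/tmp/lenet 10
+</pre>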
 <h2><a class="anchor" id="S3_4_bare_metal"></a>
 Building for bare metal</h2>
 <p>For bare metal, the library was successfully built using Linaro's latest (gcc-linaro-6.3.1-2017.05) bare metal toolchains:</p><ul>
@@ -1083,9 +1191,9 @@
 <p>If the Windows Subsystem for Linux is not available, <a href="https://www.cygwin.com/">Cygwin</a> can be used to install and run <code>scons</code>. In addition to the default packages installed by Cygwin, <code>scons</code> has to be selected in the installer. (<code>git</code> might also be useful but is not strictly required if you already have got the source code of the library.) Linaro provides pre-built versions of <a href="http://releases.linaro.org/components/toolchain/binaries/">GCC cross-compilers</a> that can be used from the Cygwin terminal. When building for Android the compiler is included in the Android standalone toolchain. After everything has been set up in the Cygwin terminal the general guide on building the library can be followed.</p>
 <h2><a class="anchor" id="S3_6_cl_stub_library"></a>
 The OpenCL stub library</h2>
-<p>In the opencl-1.2-stubs folder you will find the sources to build a stub OpenCL library which then can be used to link your application or <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> against.</p>
+<p>In the opencl-1.2-stubs folder you will find the sources to build a stub OpenCL library which can then be used to link your application or <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> against.</p>
 <p>If you prefer, you can retrieve the OpenCL library from your device and link against that one, but often this library will have dependencies on a range of system libraries, forcing you to link your application against those too even though it is not using them.</p>
-<dl class="section warning"><dt>Warning</dt><dd>This OpenCL library provided is a stub and <em>not</em> a real implementation. You can use it to resolve OpenCL's symbols in <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> while building the example but you must make sure the real libOpenCL.so is in your PATH when running the example or it will not work.</dd></dl>
+<dl class="section warning"><dt>Warning</dt><dd>This OpenCL library provided is a stub and <em>not</em> a real implementation. You can use it to resolve OpenCL's symbols in <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> while building the example but you must make sure the real libOpenCL.so is in your PATH when running the example or it will not work.</dd></dl>
 <p>To cross-compile the stub OpenCL library simply run: </p><pre class="fragment">&lt;target-prefix&gt;-gcc -o libOpenCL.so -Iinclude opencl-1.2-stubs/opencl_stubs.c -fPIC -shared
 </pre><p>For example: </p><pre class="fragment">#Linux 32bit
 arm-linux-gnueabihf-gcc -o libOpenCL.so -Iinclude opencl-1.2-stubs/opencl_stubs.c -fPIC -shared
@@ -1097,7 +1205,7 @@
 aarch64-linux-android-clang -o libOpenCL.so -Iinclude -shared opencl-1.2-stubs/opencl_stubs.c -fPIC -shared
 </pre><h2><a class="anchor" id="S3_7_gles_stub_library"></a>
 The Linux OpenGLES and EGL stub libraries</h2>
-<p>In the opengles-3.1-stubs folder you will find the sources to build stub EGL and OpenGLES libraries which then can be used to link your Linux application of <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited. ">arm_compute</a> against.</p>
+<p>In the opengles-3.1-stubs folder you will find the sources to build stub EGL and OpenGLES libraries which can then be used to link your Linux application or <a class="el" href="namespacearm__compute.xhtml" title="Copyright (c) 2017-2018 ARM Limited.">arm_compute</a> against.</p>
 <dl class="section note"><dt>Note</dt><dd>The stub libraries are only needed on Linux. For Android, the NDK toolchains already provide the meta-EGL and meta-GLES libraries.</dd></dl>
 <p>To cross-compile the stub OpenGLES and EGL libraries simply run: </p><pre class="fragment">&lt;target-prefix&gt;-gcc -o libEGL.so -Iinclude/linux opengles-3.1-stubs/EGL.c -fPIC -shared
 &lt;target-prefix&gt;-gcc -o libGLESv2.so -Iinclude/linux opengles-3.1-stubs/GLESv2.c -fPIC -shared
@@ -1123,11 +1231,11 @@
 <p>SVM allocations are supported for all the underlying allocations in Compute Library. To enable this, OpenCL 2.0 or above is required.</p>
 <h2><a class="anchor" id="S3_9_cl_tuner"></a>
 OpenCL Tuner</h2>
-<p>The OpenCL tuner, a.k.a. <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a>, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels tuning the Local-Workgroup-Size (LWS). The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be either imported or exported from/to a file. The OpenCL tuner performs a brute-force approach: it runs the same OpenCL kernel for a range of local workgroup sizes and keep the local workgroup size of the fastest run to use in subsequent calls to the kernel. In order for the performance numbers to be meaningful you must disable the GPU power management and set it to a fixed frequency for the entire duration of the tuning phase.</p>
+<p>The OpenCL tuner, a.k.a. <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a>, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels by tuning the Local-Workgroup-Size (LWS). The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be either imported or exported from/to a file. The OpenCL tuner performs a brute-force approach: it runs the same OpenCL kernel for a range of local workgroup sizes and keeps the local workgroup size of the fastest run to use in subsequent calls to the kernel. In order for the performance numbers to be meaningful you must disable the GPU power management and set it to a fixed frequency for the entire duration of the tuning phase.</p>
 <p>If you wish to know more about LWS and its important role in improving the GPU cache utilization, we suggest having a look at the presentation "Even Faster CNNs: Exploring the New Class of Winograd Algorithms" available at the following link:</p>
 <p><a href="https://www.embedded-vision.com/platinum-members/arm/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-iodice">https://www.embedded-vision.com/platinum-members/arm/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-iodice</a></p>
-<p>Tuning a network from scratch can be long and affect considerably the execution time for the first run of your network. It is recommended for this reason to store the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a>'s result in a file to amortize this time when you either re-use the same network or the functions with the same configurations. The tuning is performed only once for each OpenCL kernel.</p>
-<p><a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a> looks for the optimal LWS for each unique OpenCL kernel configuration. Since a function (i.e. Convolution Layer, Pooling Layer, Fully Connected Layer ...) can be called multiple times but with different parameters, we associate an "id" (called "config_id") to each kernel to distinguish the unique configurations. </p><pre class="fragment">#Example: 2 unique Matrix Multiply configurations
+<p>Tuning a network from scratch can take a long time and considerably affect the execution time of the first run of your network. For this reason it is recommended to store the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a>'s results in a file to amortize this time when you either re-use the same network or the functions with the same configurations. The tuning is performed only once for each OpenCL kernel.</p>
+<p><a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a> looks for the optimal LWS for each unique OpenCL kernel configuration. Since a function (i.e. Convolution Layer, Pooling Layer, Fully Connected Layer ...) can be called multiple times but with different parameters, we associate an "id" (called "config_id") to each kernel to distinguish the unique configurations. </p><pre class="fragment">#Example: 2 unique Matrix Multiply configurations
 </pre> <div class="fragment"><div class="line">TensorShape a0 = TensorShape(32,32);</div><div class="line">TensorShape b0 = TensorShape(32,32);</div><div class="line">TensorShape c0 = TensorShape(32,32);</div><div class="line">TensorShape a1 = TensorShape(64,64);</div><div class="line">TensorShape b1 = TensorShape(64,64);</div><div class="line">TensorShape c1 = TensorShape(64,64);</div><div class="line"></div><div class="line">CLTensor a0_tensor;</div><div class="line">CLTensor b0_tensor;</div><div class="line">CLTensor c0_tensor;</div><div class="line">CLTensor a1_tensor;</div><div class="line">CLTensor b1_tensor;</div><div class="line">CLTensor c1_tensor;</div><div class="line"></div><div class="line">a0_tensor.allocator()-&gt;init(TensorInfo(a0, 1, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a44ad4ef5a76e6aa6fb3e3fa079a54fda">DataType::F32</a>));</div><div class="line">b0_tensor.allocator()-&gt;init(TensorInfo(b0, 1, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a44ad4ef5a76e6aa6fb3e3fa079a54fda">DataType::F32</a>));</div><div class="line">c0_tensor.allocator()-&gt;init(TensorInfo(c0, 1, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a44ad4ef5a76e6aa6fb3e3fa079a54fda">DataType::F32</a>));</div><div class="line">a1_tensor.allocator()-&gt;init(TensorInfo(a1, 1, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a44ad4ef5a76e6aa6fb3e3fa079a54fda">DataType::F32</a>));</div><div class="line">b1_tensor.allocator()-&gt;init(TensorInfo(b1, 1, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a44ad4ef5a76e6aa6fb3e3fa079a54fda">DataType::F32</a>));</div><div class="line">c1_tensor.allocator()-&gt;init(TensorInfo(c1, 1, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a44ad4ef5a76e6aa6fb3e3fa079a54fda">DataType::F32</a>));</div><div class="line"></div><div class="line">CLGEMM gemm0;</div><div class="line">CLGEMM gemm1;</div><div class="line"></div><div class="line"><span class="comment">// Configuration 0</span></div><div class="line">gemm0.configure(&amp;a0_tensor, &amp;b0_tensor, <span class="keyword">nullptr</span>, &amp;c0_tensor, 1.0f, 0.0f);</div><div class="line"></div><div class="line"><span class="comment">// Configuration 1</span></div><div class="line">gemm1.configure(&amp;a1_tensor, &amp;b1_tensor, <span class="keyword">nullptr</span>, &amp;c1_tensor, 1.0f, 0.0f);</div></div><!-- fragment --><h3><a class="anchor" id="S3_9_1_cl_tuner_how_to"></a>
 How to use it</h3>
 <p>All the graph examples in the ACL's folder "examples" and the arm_compute_benchmark accept an argument to enable the OpenCL tuner and an argument to export/import the LWS values to/from a file </p><pre class="fragment">#Enable CL tuner
@@ -1137,25 +1245,26 @@
 #Export/Import to/from a file
 ./graph_mobilenet --enable-tuner --target=CL --tuner-file=acl_tuner.csv
 ./arm_compute_benchmark --enable-tuner --tuner-file=acl_tuner.csv
-</pre><p>If you are importing the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a>'results from a file, the new tuned LWS values will be appended to it.</p>
+</pre><p>If you are importing the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a>'s results from a file, the new tuned LWS values will be appended to it.</p>
 <p>Whether you are benchmarking the graph examples or the test cases in the arm_compute_benchmark, remember to: </p><pre class="fragment">-# Disable the power management
 -# Keep the GPU frequency constant
 -# Run the network multiple times (e.g. 10).
-</pre><p>If you are not using the graph API or the benchmark infrastructure you will need to manually pass a <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a> object to <a class="el" href="classarm__compute_1_1_c_l_scheduler.xhtml" title="Provides global access to a CL context and command queue. ">CLScheduler</a> before configuring any function.</p>
-<div class="fragment"><div class="line">CLTuner tuner;</div><div class="line"></div><div class="line"><span class="comment">// Setup Scheduler</span></div><div class="line"><a class="code" href="classarm__compute_1_1_c_l_scheduler.xhtml#a9b58d0eb9a2af8e6d7908695e1557d6c">CLScheduler::get</a>().<a class="code" href="classarm__compute_1_1_c_l_scheduler.xhtml#a46ecf9ef0fe80ba2ed35acfc29856b7d">default_init</a>(&amp;tuner);</div></div><!-- fragment --><p>After the first run, the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface. ">CLTuner</a>'s results can be exported to a file using the method "save_to_file()".</p><ul>
+</pre><p>If you are not using the graph API or the benchmark infrastructure, you will need to manually pass a <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a> object to <a class="el" href="classarm__compute_1_1_c_l_scheduler.xhtml" title="Provides global access to a CL context and command queue.">CLScheduler</a> before configuring any function.</p>
+<div class="fragment"><div class="line">CLTuner tuner;</div><div class="line"></div><div class="line"><span class="comment">// Setup Scheduler</span></div><div class="line"><a class="code" href="classarm__compute_1_1_c_l_scheduler.xhtml#a9b58d0eb9a2af8e6d7908695e1557d6c">CLScheduler::get</a>().<a class="code" href="classarm__compute_1_1_c_l_scheduler.xhtml#a46ecf9ef0fe80ba2ed35acfc29856b7d">default_init</a>(&amp;tuner);</div></div><!-- fragment --><p>After the first run, the <a class="el" href="classarm__compute_1_1_c_l_tuner.xhtml" title="Basic implementation of the OpenCL tuner interface.">CLTuner</a>'s results can be exported to a file using the method "save_to_file()".</p><ul>
 <li>tuner.save_to_file("results.csv");</li>
 </ul>
 <p>This file can also be imported using the method "load_from_file("results.csv")".</p><ul>
 <li>tuner.load_from_file("results.csv"); </li>
 </ul>
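 <p>Putting it all together, a minimal sketch of the complete tuning workflow (the file name is an example):</p>
 <div class="fragment"><div class="line">CLTuner tuner;</div><div class="line">tuner.load_from_file("results.csv");  <span class="comment">// import previously tuned LWS values (the file must exist)</span></div><div class="line">CLScheduler::get().default_init(&amp;tuner);</div><div class="line"></div><div class="line"><span class="comment">// ... configure and run the network ...</span></div><div class="line"></div><div class="line">tuner.save_to_file("results.csv");    <span class="comment">// persist the tuned LWS values for the next run</span></div></div><!-- fragment -->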
-</div></div><!-- contents -->
+</div></div><!-- PageDoc -->
+</div><!-- contents -->
 </div><!-- doc-content -->
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Nov 22 2018 11:57:52 for Compute Library by
+    <li class="footer">Generated on Thu Feb 28 2019 12:25:07 for Compute Library by
     <a href="http://www.doxygen.org/index.html">
-    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
+    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.15 </li>
   </ul>
 </div>
 </body>