Clay Murphy | 7797df8 | 2015-09-22 18:12:56 -0700 | [diff] [blame] | 1 | page.title=Debugging ART Garbage Collection |
| 2 | @jd:body |
| 3 | |
| 4 | <!-- |
| 5 | Copyright 2015 The Android Open Source Project |
| 6 | |
| 7 | Licensed under the Apache License, Version 2.0 (the "License"); |
| 8 | you may not use this file except in compliance with the License. |
| 9 | You may obtain a copy of the License at |
| 10 | |
| 11 | http://www.apache.org/licenses/LICENSE-2.0 |
| 12 | |
| 13 | Unless required by applicable law or agreed to in writing, software |
| 14 | distributed under the License is distributed on an "AS IS" BASIS, |
| 15 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 16 | See the License for the specific language governing permissions and |
| 17 | limitations under the License. |
| 18 | --> |
| 19 | |
| 20 | |
| 21 | <div id="qv-wrapper"> |
| 22 | <div id="qv"> |
| 23 | <h2 id="Contents">In this document</h2> |
| 24 | <ol id="auto-toc"> |
| 25 | </ol> |
| 26 | </div> |
| 27 | </div> |
| 28 | |
| 29 | <p>This document describes how to debug Android Runtime (ART) Garbage Collection |
| 30 | (GC) correctness and performance issues. It explains how to use GC verification |
| 31 | options, identify solutions for GC verification failures, and measure and |
| 32 | address GC performance problems.</p> |
| 33 | |
| 34 | <p>See <a href="index.html">ART and Dalvik</a>, the <a |
| 35 | href="dex-format.html">Dalvik Executable format</a>, and the remaining pages |
| 36 | within this <a href="index.html">ART and Dalvik</a> section to work with |
| 37 | ART. See <a |
| 38 | href="http://developer.android.com/guide/practices/verifying-apps-art.html">Verifying |
| 39 | App Behavior on the Android Runtime (ART)</a> for additional help verifying app |
| 40 | behavior.</p> |
| 41 | |
| 42 | <h2 id=art_gc_overview>ART GC overview</h2> |
| 43 | |
| 44 | <p>ART, introduced as a developer option in Android 4.4, is the default Android |
| 45 | runtime for Android 5.0 and beyond. The Dalvik runtime is no longer maintained |
| 46 | or available and its byte-code format is now used by ART. Please note this |
| 47 | section merely summarizes ART’s GC. For additional information, see the <a |
| 48 | href="https://www.google.com/events/io/io14videos/b750c8da-aebe-e311-b297-00155d5066d7">Android |
| 49 | runtime</a> presentation conducted at Google I/O 2014. </p> |
| 50 | |
| 51 | <p>ART has a few different GC plans that consist of running different garbage |
| 52 | collectors. The default plan is the CMS (concurrent mark sweep) plan, which |
| 53 | uses mostly sticky CMS and partial CMS. Sticky CMS is ART’s non-moving |
| 54 | generational garbage collector. It scans only the portion of the heap that was |
| 55 | modified since the last GC and can reclaim only the objects allocated since the |
| 56 | last GC. In addition to the CMS plan, ART performs heap compaction when an app |
| 57 | changes process state to a jank-imperceptible process state (e.g. background or |
| 58 | cached).</p> |
| 59 | |
| 60 | <p>Aside from the new garbage collectors, ART also introduces a new bitmap-based |
| 61 | memory allocator called RosAlloc (runs of slots allocator). This new allocator |
| 62 | has sharded locking and outperforms DlMalloc by adding thread local buffers for |
| 63 | small allocation sizes.</p> |
| 64 | |
| 65 | <p>Compared to Dalvik, the ART CMS garbage collection plan has a number of |
| 66 | improvements:</p> |
| 67 | |
| 68 | <ul> |
| 69 | <li>The number of pauses is reduced from two to one compared to Dalvik. |
| 70 | Dalvik’s first pause, which did mostly root marking, is done concurrently in |
| 71 | ART by getting the threads to mark their own roots, then resume running right away. |
| 72 | <li>Similarly to Dalvik, the ART GC also has a pause before the sweeping process. |
| 73 | The key difference in this area is that some of the Dalvik phases during this |
| 74 | pause are done concurrently in ART. These phases include |
| 75 | <code>java.lang.ref.Reference</code> processing, system weak sweeping (e.g. jni |
| 76 | weak globals, etc.), re-marking non-thread roots, and card pre-cleaning. The |
| 77 | phases that are still done paused in ART are scanning the dirty cards and |
| 78 | re-marking the thread roots, which helps reduce the pause time. |
| 79 | <li>The last area where the ART GC improves over Dalvik is with increased GC |
| 80 | throughput enabled by the sticky CMS collector. Unlike normal generational GC, |
| 81 | sticky CMS is non-moving. Instead of having a dedicated region for young |
| 82 | objects, young objects are kept in an allocation stack, which is basically an |
| 83 | array of <code>java.lang.Object</code>. This avoids moving objects required to |
| 84 | maintain low pauses but has the disadvantage of having longer collections for |
| 85 | heaps with complex object graphs. |
| 86 | </ul> |
| 87 | |
| 88 | <p>The other main other area where the ART GC is different than Dalvik is the |
| 89 | introduction of moving garbage collectors. The goal of moving GCs is to |
| 90 | reduce memory usage of backgrounded apps through heap compaction. Currently, |
| 91 | the event that triggers heap compaction is ActivityManager process-state |
| 92 | changes. When an app goes to background, it notifies ART the process state is |
| 93 | no longer jank “perceptible.” This enables ART do things that cause long |
| 94 | application thread pauses, such as compaction and monitor deflation. The two |
| 95 | current moving GCs that are in use are homogeneous space compaction and |
| 96 | semi-space compaction.</p> |
| 97 | |
| 98 | <ul> |
| 99 | <li>Semi-space compaction moves objects between two tightly packed bump pointer |
| 100 | spaces. This moving GC occurs on low-memory devices since it saves slightly |
| 101 | more memory than homogeneous space compaction. The additional savings come |
| 102 | mostly from having tightly packed objects, which avoid RosAlloc / DlMalloc |
| 103 | allocator accounting overhead. Since CMS is still used in the foreground and it |
| 104 | can’t collect from a bump pointer space, semi space requires another transition |
| 105 | when the app is foregrounded. This is not ideal since it can cause a noticeable pause. |
| 106 | <li>Homogenous space compaction works by copying from one RosAlloc space to another |
| 107 | RosAlloc space. This helps reduce memory usage by reducing heap fragmentation. |
| 108 | This is currently the default compaction mode for non-low-memory devices. The |
| 109 | main advantage that homogeneous space compaction has over semi-space compaction |
| 110 | is not needing a heap transition when the app goes back to foreground. |
| 111 | </ul> |
| 112 | |
| 113 | <h2 id=gc_verification_and_performance_options>GC verification and performance options</h2> |
| 114 | |
| 115 | <p>It is possible to change the GC type if you are an OEM. The process for doing |
| 116 | this involves modifying system properties through adb. Keep in mind that these |
| 117 | are only modifiable on non-user or rooted builds.</p> |
| 118 | |
| 119 | <h3 id=changing_the_gc_type>Changing the GC type</h3> |
| 120 | |
| 121 | <p>There are ways to change the GC plans that ART uses. The main way to change the |
| 122 | foreground GC plan is by changing the <code>dalvik.vm.gctype</code> property or |
| 123 | passing in an <code>-Xgc:</code> option. It is possible to pass in multiple GC |
| 124 | options separated by commas.</p> |
| 125 | |
| 126 | <p>In order to derive the entire list of available <code>-Xgc</code> settings, |
| 127 | it is possible to type <code>adb shell dalvikvm -help</code> to print the |
| 128 | various runtime command-line options.</p> |
| 129 | |
| 130 | <p>Here is one example that changes the GC to semi space and turns on pre-GC heap |
| 131 | verification: |
| 132 | <code>adb shell setprop dalvik.vm.gctype SS,preverify</code></p> |
| 133 | |
| 134 | <ul> |
| 135 | <li><code>CMS</code>, which is also the default value, specifies the |
| 136 | concurrent mark sweep GC plan. This plan consists of running sticky |
| 137 | generational CMS, partial CMS, and full CMS. The allocator for this plan is |
| 138 | RosAlloc for movable objects and DlMalloc for non-movable objects. |
| 139 | <li><code>SS</code> specifies the semi space GC plan. This plan has two semi |
| 140 | spaces for movable objects and a DlMalloc space for non-movable objects. The |
| 141 | movable object allocator defaults to a shared bump pointer allocator which uses |
| 142 | atomic operations. However, if the <code>-XX:UseTLAB</code> flag is also passed |
| 143 | in, the allocator uses thread local bump pointer allocation. |
| 144 | <li><code>GSS</code> specifies the generational semi space plan. This plan is |
| 145 | very similar to the semi-space plan with the exception that older-lived objects |
| 146 | are promoted into a large RosAlloc space. This has the advantage of needing to |
| 147 | copy significantly fewer objects for typical use cases. |
| 148 | </ul> |
| 149 | |
| 150 | <h3 id=verifying_the_heap>Verifying the heap</h3> |
| 151 | |
| 152 | <p>Heap verification is probably the most useful GC option for debugging |
| 153 | GC-related errors or heap corruption. Enabling heap verification causes the GC |
| 154 | to check the correctness of the heap at a few points during the garbage |
| 155 | collection process. Heap verification shares the same options as the ones that |
| 156 | change the GC type. If enabled, heap verification verifies the roots and |
| 157 | ensures that reachable objects reference only other reachable objects. GC |
| 158 | verification is enabled by passing in the following -<code>Xgc</code> values:</p> |
| 159 | |
| 160 | <ul> |
| 161 | <li>If enabled, <code>[no]preverify</code> performs heap verification before starting the GC. |
| 162 | <li>If enabled, <code>[no]presweepingverify</code> performs heap verification |
| 163 | before starting the garbage collector sweeping |
| 164 | process. |
| 165 | <li>If enabled, <code>[no]postverify</code> performs heap verification after |
| 166 | the GC has finished sweeping. |
| 167 | <li><code>[no]preverify_rosalloc</code>, |
| 168 | <code>[no]postsweepingverify_rosalloc</code>, |
| 169 | <code>[no]postverify_rosalloc</code> are also additional GC options that verify |
| 170 | only the state of RosAlloc’s internal accounting. The main things verified are |
| 171 | that magic values match expected constants, and free blocks of memory are all |
| 172 | registered in the <code>free_page_runs_</code> map. |
| 173 | </ul> |
| 174 | |
| 175 | <h3 id=using_the_tlab_allocator_option>Using the TLAB allocator option</h3> |
| 176 | |
| 177 | <p>Currently, the only option that changes the allocator used without affecting |
| 178 | the active GC type is the TLAB option. This option is not available through |
| 179 | system properties but can be enabled by passing in -<code>XX:UseTLAB</code> to |
| 180 | <code>dalvikvm</code>. This option enables faster allocation by having a |
| 181 | shorter allocation code path. Since this option requires using either the SS or |
| 182 | GSS GC types, which have rather long pauses, it is not enabled by default.</p> |
| 183 | |
| 184 | <h2 id=performance>Performance</h2> |
| 185 | |
| 186 | <p>There are two main tools that can be used to measure GC performance. GC timing |
| 187 | dumps and systrace. The most visual way to measure GC performance problems |
| 188 | would be to use systrace to determine which GCs are causing long pauses or |
| 189 | preempting application threads. Although the ART GC is relatively efficient, |
| 190 | there are still a few ways to get performance problems through excessive |
| 191 | allocations or bad mutator behavior.</p> |
| 192 | |
| 193 | <h3 id=ergonomics>Ergonomics</h3> |
| 194 | |
| 195 | <p>Compared to Dalvik, ART has a few key differences regarding GC ergonomics. One |
| 196 | of the major improvements compared to Dalvik is no longer having GC for alloc |
| 197 | in cases where we start the concurrent GC later than needed. However, there is |
| 198 | one downside to this, not blocking on the GC can cause the heap to grow more |
| 199 | than Dalvik in some circumstances. Fortunately, ART has heap compaction, which |
| 200 | mitigates this issue by defragmenting the heap when the process changes to a |
| 201 | background process state.</p> |
| 202 | |
| 203 | <p>The CMS GC ergonomics have two types of GCs that are regularly run. Ideally, |
| 204 | the GC ergonomics will run the generational sticky CMS more often than the |
| 205 | partial CMS. The GC does sticky CMS until the throughput (calculated by bytes |
| 206 | freed / second of GC duration) of the last GC is less than the mean throughput |
| 207 | of partial CMS. When this occurs, the ergonomics plan the next concurrent GC to |
| 208 | be a partial CMS instead of sticky CMS. After the partial CMS completes, the |
| 209 | ergonomics changes the next GC back to sticky CMS. One key factor that makes |
| 210 | the ergonomics work is that sticky CMS doesn’t adjust the heap footprint limit |
| 211 | after it completes. This causes sticky CMS to happen more and more often until |
| 212 | the throughput is lower than partial CMS, which ends up growing the heap.</p> |
| 213 | |
| 214 | <h3 id=using_sigquit_to_obtain_gc_performance_info>Using SIGQUIT to obtain GC performance info</h3> |
| 215 | |
| 216 | <p>It is possible to get GC performance timings for apps by sending SIGQUIT to |
| 217 | already running apps or passing in -<code>XX:DumpGCPerformanceOnShutdown</code> |
| 218 | to <code>dalvikvm</code> when starting a command line program. When an app gets |
| 219 | the ANR request signal (SIGQUIT) it dumps information related to its locks, |
| 220 | thread stacks, and GC performance.</p> |
| 221 | |
| 222 | <p>The way to get GC timing dumps is to use:<br> |
| 223 | <code>$ adb shell kill -S QUIT <pid></code></p> |
| 224 | |
| 225 | <p>This creates a <code>traces.txt</code> file in <code>/data/anr/</code>. This |
| 226 | file contains some ANR dumps as well as GC timings. You can locate the |
| 227 | GC timings by searching for: “Dumping cumulative Gc timings” These timings will |
| 228 | show a few things that may be of interest. It will show the histogram info for |
| 229 | each GC type’s phases and pauses. The pauses are usually more important to look |
| 230 | at. For example:</p> |
| 231 | |
| 232 | <pre> |
| 233 | sticky concurrent mark sweep paused: Sum: 5.491ms 99% C.I. 1.464ms-2.133ms Avg: 1.830ms Max: 2.133ms |
| 234 | </pre> |
| 235 | |
| 236 | <p><code>This</code> shows that the average pause was 1.83ms. This should be low enough to not |
| 237 | cause missed frames in most applications and shouldn’t be a concern.</p> |
| 238 | |
| 239 | <p>Another area of interest is time to suspend. What time to suspend measures is |
| 240 | how long it takes a thread to reach a suspend point after the GC requests that |
| 241 | it suspends. This time is included in the GC pauses, so it is useful to |
| 242 | determine if long pauses are caused by the GC being slow or the thread |
| 243 | suspending slowly. Here is an example of what a normal time to suspend |
| 244 | resembles on a Nexus 5:</p> |
| 245 | |
| 246 | <pre> |
| 247 | suspend all histogram: Sum: 1.513ms 99% C.I. 3us-546.560us Avg: 47.281us Max: 601us |
| 248 | </pre> |
| 249 | |
| 250 | <p>There are also a few other areas of interest, such as total time spent, GC |
| 251 | throughput, etc. Examples:</p> |
| 252 | |
| 253 | <pre> |
| 254 | Total time spent in GC: 502.251ms |
| 255 | Mean GC size throughput: 92MB/s |
| 256 | Mean GC object throughput: 1.54702e+06 objects/s |
| 257 | </pre> |
| 258 | |
| 259 | <p>Here is an example of how to dump the GC timings of an already running app: |
| 260 | |
| 261 | <pre> |
| 262 | $ adb shell kill -s QUIT <pid> |
| 263 | $ adb pull /data/anr/traces.txt |
| 264 | </pre> |
| 265 | |
| 266 | <p>At this point the GC timings are inside of traces.txt. Here is example output |
| 267 | from Google maps:</p> |
| 268 | |
| 269 | <pre> |
| 270 | Start Dumping histograms for 34 iterations for sticky concurrent mark sweep |
| 271 | ScanGrayAllocSpaceObjects: Sum: 196.174ms 99% C.I. 0.011ms-11.615ms Avg: 1.442ms Max: 14.091ms |
| 272 | FreeList: Sum: 140.457ms 99% C.I. 6us-1676.749us Avg: 128.505us Max: 9886us |
| 273 | MarkRootsCheckpoint: Sum: 110.687ms 99% C.I. 0.056ms-9.515ms Avg: 1.627ms Max: 10.280ms |
| 274 | SweepArray: Sum: 78.727ms 99% C.I. 0.121ms-11.780ms Avg: 2.315ms Max: 12.744ms |
| 275 | ProcessMarkStack: Sum: 77.825ms 99% C.I. 1.323us-9120us Avg: 576.481us Max: 10185us |
| 276 | (Paused)ScanGrayObjects: Sum: 32.538ms 99% C.I. 286us-3235.500us Avg: 986us Max: 3434us |
| 277 | AllocSpaceClearCards: Sum: 30.592ms 99% C.I. 10us-2249.999us Avg: 224.941us Max: 4765us |
| 278 | MarkConcurrentRoots: Sum: 30.245ms 99% C.I. 3us-3017.999us Avg: 444.779us Max: 3774us |
| 279 | ReMarkRoots: Sum: 13.144ms 99% C.I. 66us-712us Avg: 386.588us Max: 712us |
| 280 | ScanGrayImageSpaceObjects: Sum: 13.075ms 99% C.I. 29us-2538.999us Avg: 192.279us Max: 3080us |
| 281 | MarkingPhase: Sum: 9.743ms 99% C.I. 170us-518us Avg: 286.558us Max: 518us |
| 282 | SweepSystemWeaks: Sum: 8.046ms 99% C.I. 28us-479us Avg: 236.647us Max: 479us |
| 283 | MarkNonThreadRoots: Sum: 5.215ms 99% C.I. 31us-698.999us Avg: 76.691us Max: 703us |
| 284 | ImageModUnionClearCards: Sum: 2.708ms 99% C.I. 26us-92us Avg: 39.823us Max: 92us |
| 285 | ScanGrayZygoteSpaceObjects: Sum: 2.488ms 99% C.I. 19us-250.499us Avg: 37.696us Max: 295us |
| 286 | ResetStack: Sum: 2.226ms 99% C.I. 24us-449us Avg: 65.470us Max: 452us |
| 287 | ZygoteModUnionClearCards: Sum: 2.124ms 99% C.I. 18us-233.999us Avg: 32.181us Max: 291us |
| 288 | FinishPhase: Sum: 1.881ms 99% C.I. 31us-431.999us Avg: 55.323us Max: 466us |
| 289 | RevokeAllThreadLocalAllocationStacks: Sum: 1.749ms 99% C.I. 8us-349us Avg: 51.441us Max: 377us |
| 290 | EnqueueFinalizerReferences: Sum: 1.513ms 99% C.I. 3us-201us Avg: 44.500us Max: 201us |
| 291 | ProcessReferences: Sum: 438us 99% C.I. 3us-212us Avg: 12.882us Max: 212us |
| 292 | ProcessCards: Sum: 381us 99% C.I. 4us-17us Avg: 5.602us Max: 17us |
| 293 | PreCleanCards: Sum: 363us 99% C.I. 8us-17us Avg: 10.676us Max: 17us |
| 294 | ReclaimPhase: Sum: 357us 99% C.I. 7us-91.500us Avg: 10.500us Max: 93us |
| 295 | (Paused)PausePhase: Sum: 312us 99% C.I. 7us-15us Avg: 9.176us Max: 15us |
| 296 | SwapBitmaps: Sum: 166us 99% C.I. 4us-8us Avg: 4.882us Max: 8us |
| 297 | (Paused)ScanGrayAllocSpaceObjects: Sum: 126us 99% C.I. 14us-112us Avg: 63us Max: 112us |
| 298 | MarkRoots: Sum: 121us 99% C.I. 2us-7us Avg: 3.558us Max: 7us |
| 299 | (Paused)ScanGrayImageSpaceObjects: Sum: 68us 99% C.I. 68us-68us Avg: 68us Max: 68us |
| 300 | BindBitmaps: Sum: 50us 99% C.I. 1us-3us Avg: 1.470us Max: 3us |
| 301 | UnBindBitmaps: Sum: 49us 99% C.I. 1us-3us Avg: 1.441us Max: 3us |
| 302 | SwapStacks: Sum: 47us 99% C.I. 1us-3us Avg: 1.382us Max: 3us |
| 303 | RecordFree: Sum: 42us 99% C.I. 1us-3us Avg: 1.235us Max: 3us |
| 304 | ForwardSoftReferences: Sum: 37us 99% C.I. 1us-2us Avg: 1.121us Max: 2us |
| 305 | InitializePhase: Sum: 36us 99% C.I. 1us-2us Avg: 1.058us Max: 2us |
| 306 | FindDefaultSpaceBitmap: Sum: 32us 99% C.I. 250ns-1000ns Avg: 941ns Max: 1000ns |
| 307 | (Paused)ProcessMarkStack: Sum: 5us 99% C.I. 250ns-3000ns Avg: 147ns Max: 3000ns |
| 308 | PreSweepingGcVerification: Sum: 0 99% C.I. 0ns-0ns Avg: 0ns Max: 0ns |
| 309 | Done Dumping histograms |
| 310 | sticky concurrent mark sweep paused: Sum: 63.268ms 99% C.I. 0.308ms-8.405ms |
| 311 | Avg: 1.860ms Max: 8.883ms |
| 312 | sticky concurrent mark sweep total time: 763.787ms mean time: 22.464ms |
| 313 | sticky concurrent mark sweep freed: 1072342 objects with total size 75MB |
| 314 | sticky concurrent mark sweep throughput: 1.40543e+06/s / 98MB/s |
| 315 | Total time spent in GC: 4.805s |
| 316 | Mean GC size throughput: 18MB/s |
| 317 | Mean GC object throughput: 330899 objects/s |
| 318 | Total number of allocations 2015049 |
| 319 | Total bytes allocated 177MB |
| 320 | Free memory 4MB |
| 321 | Free memory until GC 4MB |
| 322 | Free memory until OOME 425MB |
| 323 | Total memory 90MB |
| 324 | Max memory 512MB |
| 325 | Zygote space size 4MB |
| 326 | Total mutator paused time: 229.566ms |
| 327 | Total time waiting for GC to complete: 187.655us |
| 328 | </pre> |
| 329 | |
| 330 | <h2 id=tools_for_analyzing_gc_correctness_problems>Tools for analyzing GC correctness problems</h2> |
| 331 | |
| 332 | <p>There are various things that can cause crashes inside of ART. Crashes that |
| 333 | occur reading or writing to object fields may indicate heap corruption. If the |
| 334 | GC crashes when it is running, it could also point to heap corruption. There |
| 335 | are various things that can cause heap corruption, the most common cause is |
| 336 | probably incorrect app code. Fortunately, there are tools to debug GC and |
| 337 | heap-related crashes. These include the heap verification options specified |
| 338 | above, valgrind, and CheckJNI.</p> |
| 339 | |
| 340 | <h3 id=checkjni>CheckJNI</h3> |
| 341 | |
| 342 | <p>Another way to verify app behavior is to use CheckJNI. CheckJNI is a mode that |
| 343 | adds additional JNI checks; these aren’t enabled by default due to performance |
| 344 | reasons. The checks will catch a few errors that could cause heap corruption |
| 345 | such as using invalid/stale local and global references. Here is how to enable |
| 346 | CheckJNI:</p> |
| 347 | |
| 348 | <pre> |
| 349 | $ adb shell setprop dalvik.vm.checkjni true |
| 350 | </pre> |
| 351 | |
| 352 | <p>Forcecopy mode is another part of CheckJNI that is very useful for detecting |
| 353 | writes past the end of array regions. When enabled, forcecopy causes the array |
| 354 | access JNI functions to always return copies with red zones. A <em>red |
| 355 | zone</em> is a region at the end/start of the returned pointer that has a |
| 356 | special value, which is verified when the array is released. If the values in |
| 357 | the red zone don’t match what is expected, this usually means a buffer overrun |
| 358 | or underrun occurred. This would cause CheckJNI to abort. Here is how to enable |
| 359 | forcecopy mode:</p> |
| 360 | |
| 361 | <pre> |
| 362 | $ adb shell setprop dalvik.vm.jniopts forcecopy |
| 363 | </pre> |
| 364 | |
| 365 | <p>One example of an error that CheckJNI should catch is writing past the end of |
| 366 | an array obtained from <code>GetPrimitiveArrayCritical</code>. This operation |
| 367 | would very likely corrupt the Java heap. If the write is |
| 368 | within the CheckJNI red zone area, then CheckJNI would catch the issue when the |
| 369 | corresponding <code>ReleasePrimitiveArrayCritical</code> is called. Otherwise, |
| 370 | the write will end up corrupting some random object in |
| 371 | the Java heap and possibly causing a future GC crash. If the corrupted memory |
| 372 | is a reference field, then the GC may catch the error and print a “<em>Tried to |
| 373 | mark <ptr> not contained by any spaces</em>” error.</p> |
| 374 | |
| 375 | <p>This error occurs when the GC attempts to mark an object for which it can’t |
| 376 | find a space. After this check fails, the GC traverses the roots and tries to |
| 377 | see if the invalid object is a root. From here, there are two options: The |
| 378 | object is a root or a non-root object.</p> |
| 379 | |
| 380 | <h3 id=valgrind>Valgrind</h3> |
| 381 | |
| 382 | <p>The ART heap supports optional valgrind instrumentation, which provides a |
| 383 | way to detect reads and writes to and from an invalid heap address. ART detects |
| 384 | when the app is running under valgrind and inserts red zones before and after |
| 385 | each object allocation. If there are any reads or writes to these red zones, |
| 386 | valgrind prints an error. One example of when this could happen is if you read |
| 387 | or write past the end of an array’s elements while using direct array access |
| 388 | through JNI. Since the AOT compilers use implicit null checks, it is |
| 389 | recommended to use eng builds for running valgrind. Another thing to note is |
| 390 | that valgrind is orders of magnitude slower than normal execution.</p> |
| 391 | |
| 392 | <p>Here is an example use:</p> |
| 393 | |
| 394 | <pre> |
| 395 | # build and install |
| 396 | $ mmm external/valgrind |
| 397 | $ adb remount && adb sync |
| 398 | # disable selinux |
| 399 | $ adb shell setenforce 0 |
| 400 | $ adb shell setprop wrap.com.android.calculator2 |
| 401 | "TMPDIR=/data/data/com.android.calculator2 logwrapper valgrind" |
| 402 | # push symbols |
| 403 | $ adb shell mkdir /data/local/symbols |
| 404 | $ adb push $OUT/symbols /data/local/symbols |
| 405 | $ adb logcat |
| 406 | </pre> |
| 407 | |
| 408 | |
| 409 | <h3 id=invalid_root_example>Invalid root example</h3> |
| 410 | |
| 411 | <p>In the case where the object is actually an invalid root, it will print some |
| 412 | useful information: |
| 413 | <code>art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:383] Tried to mark 0x2 |
| 414 | not contained by any spaces</code></p> |
| 415 | |
| 416 | <pre> |
| 417 | art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:384] Attempting see if |
| 418 | it's a bad root |
| 419 | art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:485] Found invalid |
| 420 | root: 0x2 |
| 421 | art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:486] |
| 422 | Type=RootJavaFrame thread_id=1 location=Visiting method 'java.lang.Object |
| 423 | com.google.gwt.corp.collections.JavaReadableJsArray.get(int)' at dex PC 0x0002 |
| 424 | (native PC 0xf19609d9) vreg=1 |
| 425 | </pre> |
| 426 | |
| 427 | <p>In this case, <code>vreg 1</code> inside of |
| 428 | <code>com.google.gwt.corp.collections.JavaReadableJsArray.get</code> is |
| 429 | supposed to contain a heap reference but actually contains an invalid pointer |
| 430 | of address <code>0x2</code>. This is clearly an invalid root. The next step to |
| 431 | debug this issue would be to use <code>oatdump</code> on the oat file and look |
| 432 | at the method that has the invalid root. In this case, the error turned out to |
| 433 | be a compiler bug in the x86 backend. Here is the changelist that fixed it: <a |
| 434 | href="https://android-review.googlesource.com/#/c/133932/">https://android-review.googlesource.com/#/c/133932/</a></p> |
| 435 | |
| 436 | <h3 id=corrupted_object_example>Corrupted object example</h3> |
| 437 | |
| 438 | <p>In the case where the object isn’t a root, output similar to the following |
| 439 | prints:</p> |
| 440 | |
| 441 | <pre> |
| 442 | 01-15 12:38:00.196 1217 1238 E art : Attempting see if it's a bad root |
| 443 | 01-15 12:38:00.196 1217 1238 F art : |
| 444 | art/runtime/gc/collector/mark_sweep.cc:381] Can't mark invalid object |
| 445 | </pre> |
| 446 | |
| 447 | <p>When heap corruption isn’t an invalid root, it is unfortunately hard to debug. |
| 448 | This error message indicates that there was at least one object in the heap |
| 449 | that was pointing to the invalid object.</p> |