| page.title=Designing for Performance |
| @jd:body |
| |
| <p>An Android application should be fast. Well, it's probably more accurate to |
| say that it should be <em>efficient</em>. That is, it should execute as |
| efficiently as possible in the mobile device environment, with its limited |
| computing power and data storage, smaller screen, and constrained battery life. |
| |
| <p>As you develop your application, keep in mind that, while the application may |
| perform well enough in your emulator, running on your dual-core development |
| computer, it will not perform that well when run a mobile device — even |
| the most powerful mobile device can't match the capabilities of a typical |
| desktop system. For that reason, you should strive to write efficient code, to |
| ensure the best possible performance on a variety of mobile devices.</p> |
| |
| <p>Generally speaking, writing fast or efficient code means keeping memory |
| allocations to a minimum, writing tight code, and avoiding certain language and |
| programming idioms that can subtly cripple performance. In object-oriented |
| terms, most of this work takes place at the <em>method</em> level, on the order of |
| actual lines of code, loops, and so on.</p> |
| |
| <p>This document covers these topics: </p> |
| <ul> |
| <li><a href="#intro">Introduction</a></li> |
| <li><a href="#object_creation">Avoid Creating Objects</a></li> |
| <li><a href="#native_methods">Use Native Methods</a></li> |
| <li><a href="#prefer_virtual">Prefer Virtual Over Interface</a></li> |
| <li><a href="#prefer_static">Prefer Static Over Virtual</a></li> |
| <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li> |
| <li><a href="#cache_fields">Cache Field Lookups</a></li> |
| <li><a href="#use_final">Declare Constants Final</a></li> |
| <li><a href="#foreach">Use Enhanced For Loop Syntax With Caution</a></li> |
| <li><a href="#avoid_enums">Avoid Enums</a></li> |
| <li><a href="#package_inner">Use Package Scope with Inner Classes</a></li> |
| <li><a href="#avoidfloat">Avoid Float</a> </li> |
| <li><a href="#samples">Some Sample Performance Numbers</a> </li> |
| <li><a href="#closing_notes">Closing Notes</a></li> |
| </ul> |
| |
| <a name="intro" id="intro"></a> |
| <h2>Introduction</h2> |
| |
| <p>There are two basic rules for resource-constrained systems:</p> |
| |
| <ul> |
| <li>Don't do work that you don't need to do.</li> |
| <li>Don't allocate memory if you can avoid it.</li> |
| </ul> |
| |
| <p>All the tips below follow from these two basic tenets.</p> |
| |
| <p>Some would argue that much of the advice on this page amounts to "premature |
| optimization." While it's true that micro-optimizations sometimes make it |
| harder to develop efficient data structures and algorithms, on embedded |
| devices like handsets you often simply have no choice. For instance, if you |
| bring your assumptions about VM performance on desktop machines to Android, |
| you're quite likely to write code that exhausts system memory. This will bring |
| your application to a crawl — let alone what it will do to other programs |
| running on the system!</p> |
| |
| <p>That's why these guidelines are important. Android's success depends on |
| the user experience that your applications provide, and that user experience |
| depends in part on whether your code is responsive and snappy, or slow and |
| aggravating. Since all our applications will run on the same devices, we're |
| all in this together, in a way. Think of this document as like the rules of |
| the road you had to learn when you got your driver's license: things run |
| smoothly when everybody follows them, but when you don't, you get your car |
| smashed up.</p> |
| |
| <p>Before we get down to brass tacks, a brief observation: nearly all issues |
| described below are valid whether or not the VM features a JIT compiler. If I |
| have two methods that accomplish the same thing, and the interpreted execution |
| of foo() is faster than bar(), then the compiled version of foo() will |
| probably be as fast or faster than compiled bar(). It is unwise to rely on a |
| compiler to "save" you and make your code fast enough.</p> |
| |
| <a name="object_creation"></a> |
| <h2>Avoid Creating Objects</h2> |
| |
| <p>Object creation is never free. A generational GC with per-thread allocation |
| pools for temporary objects can make allocation cheaper, but allocating memory |
| is always more expensive than not allocating memory.</p> |
| |
| <p>If you allocate objects in a user interface loop, you will force a periodic |
| garbage collection, creating little "hiccups" in the user experience.</p> |
| |
| <p>Thus, you should avoid creating object instances you don't need to. Some |
| examples of things that can help:</p> |
| |
| <ul> |
| <li>When extracting strings from a set of input data, try |
| to return a substring of the original data, instead of creating a copy. |
| You will create a new String object, but it will share the char[] |
| with the data.</li> |
| <li>If you have a method returning a string, and you know that its result |
| will always be appended to a StringBuffer anyway, change your signature |
| and implementation so that the function does the append directly, |
| instead of creating a short-lived temporary object.</li> |
| </ul> |
| |
| <p>A somewhat more radical idea is to slice up multidimensional arrays into parallel |
| single one-dimension arrays:</p> |
| |
| <ul> |
| <li>An array of ints is a much better than an array of Integers, |
| but this also generalizes to the fact that two parallel arrays of ints |
| are also a <strong>lot</strong> more efficient than an array of (int,int) |
| objects. The same goes for any combination of primitive types.</li> |
| <li>If you need to implement a container that stores tuples of (Foo,Bar) |
| objects, try to remember that two parallel Foo[] and Bar[] arrays are |
| generally much better than a single array of custom (Foo,Bar) objects. |
| (The exception to this, of course, is when you're designing an API for |
| other code to access; in those cases, it's usually better to trade |
| correct API design for a small hit in speed. But in your own internal |
| code, you should try and be as efficient as possible.)</li> |
| </ul> |
| |
| <p>Generally speaking, avoid creating short-term temporary objects if you |
| can. Fewer objects created mean less-frequent garbage collection, which has |
| a direct impact on user experience.</p> |
| |
| <a name="native_methods" id="native_methods"></a> |
| <h2>Use Native Methods</h2> |
| |
| <p>When processing strings, don't hesitate to use specialty methods like |
| String.indexOf(), String.lastIndexOf(), and their cousins. These are typically |
| implemented in C/C++ code that easily runs 10-100x faster than doing the same |
| thing in a Java loop.</p> |
| |
| <p>The flip side of that advice is that punching through to a native |
| method is more expensive than calling an interpreted method. Don't use native |
| methods for trivial computation, if you can avoid it.</p> |
| |
| <a name="prefer_virtual" id="prefer_virtual"></a> |
| <h2>Prefer Virtual Over Interface</h2> |
| |
| <p>Suppose you have a HashMap object. You can declare it as a HashMap or as |
| a generic Map:</p> |
| |
| <pre>Map myMap1 = new HashMap(); |
| HashMap myMap2 = new HashMap();</pre> |
| |
| <p>Which is better?</p> |
| |
| <p>Conventional wisdom says that you should prefer Map, because it |
| allows you to change the underlying implementation to anything that |
| implements the Map interface. Conventional wisdom is correct for |
| conventional programming, but isn't so great for embedded systems. Calling |
| through an interface reference can take 2x longer than a virtual |
| method call through a concrete reference.</p> |
| |
| <p>If you have chosen a HashMap because it fits what you're doing, there |
| is little value in calling it a Map. Given the availability of |
| IDEs that refactor your code for you, there's not much value in calling |
| it a Map even if you're not sure where the code is headed. (Again, though, |
| public APIs are an exception: a good API usually trumps small performance |
| concerns.)</p> |
| |
| <a name="prefer_static" id="prefer_static"></a> |
| <h2>Prefer Static Over Virtual</h2> |
| |
| <p>If you don't need to access an object's fields, make your method static. It can |
| be called faster, because it doesn't require a virtual method table |
| indirection. It's also good practice, because you can tell from the method |
| signature that calling the method can't alter the object's state.</p> |
| |
| <a name="internal_get_set" id="internal_get_set"></a> |
| <h2>Avoid Internal Getters/Setters</h2> |
| |
| <p>In native languages like C++ it's common practice to use getters (e.g. |
| <code>i = getCount()</code>) instead of accessing the field directly (<code>i |
| = mCount</code>). This is an excellent habit for C++, because the compiler can |
| usually inline the access, and if you need to restrict or debug field access |
| you can add the code at any time.</p> |
| |
| <p>On Android, this is a bad idea. Virtual method calls are expensive, |
| much more so than instance field lookups. It's reasonable to follow |
| common object-oriented programming practices and have getters and setters |
| in the public interface, but within a class you should always access |
| fields directly.</p> |
| |
| <a name="cache_fields" id="cache_fields"></a> |
| <h2>Cache Field Lookups</h2> |
| |
| <p>Accessing object fields is much slower than accessing local variables. |
| Instead of writing:</p> |
| <pre>for (int i = 0; i < this.mCount; i++) |
| dumpItem(this.mItems[i]);</pre> |
| |
| <p>You should write:</p> |
| <pre> int count = this.mCount; |
| Item[] items = this.mItems; |
| |
| for (int i = 0; i < count; i++) |
| dumpItems(items[i]); |
| </pre> |
| |
| <p>(We're using an explicit "this" to make it clear that these are |
| member variables.)</p> |
| |
| <p>A similar guideline is never call a method in the second clause of a "for" |
| statement. For example, the following code will execute the getCount() method |
| once per iteration, which is a huge waste when you could have simply cached |
| the value as an int:</p> |
| |
| <pre>for (int i = 0; i < this.getCount(); i++) |
| dumpItems(this.getItem(i)); |
| </pre> |
| |
| <p>It's also usually a good idea to create a local variable if you're going to be |
| accessing an instance field more than once. For example:</p> |
| |
| <pre> |
| protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) { |
| if (isHorizontalScrollBarEnabled()) { |
| int size = <strong>mScrollBar</strong>.getSize(<em>false</em>); |
| if (size <= 0) { |
| size = mScrollBarSize; |
| } |
| <strong>mScrollBar</strong>.setBounds(0, <em>height</em> - size, width, height); |
| <strong>mScrollBar</strong>.setParams( |
| computeHorizontalScrollRange(), |
| computeHorizontalScrollOffset(), |
| computeHorizontalScrollExtent(), <em>false</em>); |
| <strong>mScrollBar</strong>.draw(canvas); |
| } |
| }</pre> |
| |
| <p>That's four separate lookups of the member field <code>mScrollBar</code>. |
| By caching mScrollBar in a local stack variable, the four member field lookups |
| become four stack variable references, which are much more efficient.</p> |
| |
| <p>Incidentally, method arguments have the same performance characteristics |
| as local variables.</p> |
| |
| <a name="use_final" id="use_final"></a> |
| <h2>Declare Constants Final</h2> |
| |
| <p>Consider the following declaration at the top of a class:</p> |
| |
| <pre>static int intVal = 42; |
| static String strVal = "Hello, world!";</pre> |
| |
| <p>The compiler generates a class initializer method, called |
| <code><clinit></code>, that is executed when the class is first used. |
| The method stores the value 42 into <code>intVal</code>, and extracts a |
| reference from the classfile string constant table for <code>strVal</code>. |
| When these values are referenced later on, they are accessed with field |
| lookups.</p> |
| |
| <p>We can improve matters with the "final" keyword:</p> |
| |
| <pre>static final int intVal = 42; |
| static final String strVal = "Hello, world!";</pre> |
| |
| <p>The class no longer requires a <code><clinit></code> method, |
| because the constants go into classfile static field initializers, which are |
| handled directly by the VM. Code accessing <code>intVal</code> will use |
| the integer value 42 directly, and accesses to <code>strVal</code> will |
| use a relatively inexpensive "string constant" instruction instead of a |
| field lookup.</p> |
| |
| <p>Declaring a method or class "final" does not confer any immediate |
| performance benefits, but it does allow certain optimizations. For example, if |
| the compiler knows that a "getter" method can't be overridden by a sub-class, |
| it can inline the method call.</p> |
| |
| <p>You can also declare local variables final. However, this has no definitive |
| performance benefits. For local variables, only use "final" if it makes the |
| code clearer (or you have to, e.g. for use in an anonymous inner class).</p> |
| |
| <a name="foreach" id="foreach"></a> |
| <h2>Use Enhanced For Loop Syntax With Caution</h2> |
| |
| <p>The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface. |
| With these objects, an iterator is allocated to make interface calls |
| to hasNext() and next(). With an ArrayList, you're better off walking through |
| it directly, but for other collections the enhanced for loop syntax will be equivalent |
| to explicit iterator usage.</p> |
| |
| <p>Nevertheless, the following code shows an acceptable use of the enhanced for loop:</p> |
| |
| <pre>public class Foo { |
| int mSplat; |
| static Foo mArray[] = new Foo[27]; |
| |
| public static void zero() { |
| int sum = 0; |
| for (int i = 0; i < mArray.length; i++) { |
| sum += mArray[i].mSplat; |
| } |
| } |
| |
| public static void one() { |
| int sum = 0; |
| Foo[] localArray = mArray; |
| int len = localArray.length; |
| |
| for (int i = 0; i < len; i++) { |
| sum += localArray[i].mSplat; |
| } |
| } |
| |
| public static void two() { |
| int sum = 0; |
| for (Foo a: mArray) { |
| sum += a.mSplat; |
| } |
| } |
| }</pre> |
| |
| <p><strong>zero()</strong> retrieves the static field twice and gets the array |
| length once for every iteration through the loop.</p> |
| |
| <p><strong>one()</strong> pulls everything out into local variables, avoiding |
| the lookups.</p> |
| |
| <p><strong>two()</strong> uses the enhanced for loop syntax introduced in version 1.5 of |
| the Java programming language. The code generated by the compiler takes care |
| of copying the array reference and the array length to local variables, making |
| it a good choice for walking through all elements of an array. It does |
| generate an extra local load/store in the main loop (apparently preserving |
| "a"), making it a teensy bit slower and 4 bytes longer than one().</p> |
| |
| <p>To summarize all that a bit more clearly: enhanced for loop syntax performs well |
| with arrays, but be cautious when using it with Iterable objects since there is |
| additional object creation.</p> |
| |
| <a name="avoid_enums" id="avoid_enums"></a> |
| <h2>Avoid Enums</h2> |
| |
| <p>Enums are very convenient, but unfortunately can be painful when size |
| and speed matter. For example, this:</p> |
| |
| <pre>public class Foo { |
| public enum Shrubbery { GROUND, CRAWLING, HANGING } |
| }</pre> |
| |
| <p>turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the |
| class initializer invokes the <init> method on objects representing each |
| of the enumerated values. Each object gets its own static field, and the full |
| set is stored in an array (a static field called "$VALUES"). That's a lot of |
| code and data, just for three integers.</p> |
| |
| <p>This:</p> |
| |
| <pre>Shrubbery shrub = Shrubbery.GROUND;</pre> |
| |
| <p>causes a static field lookup. If "GROUND" were a static final int, |
| the compiler would treat it as a known constant and inline it.</p> |
| |
| <p>The flip side, of course, is that with enums you get nicer APIs and |
| some compile-time value checking. So, the usual trade-off applies: you should |
| by all means use enums for public APIs, but try to avoid them when performance |
| matters.</p> |
| |
| <p>In some circumstances it can be helpful to get enum integer values |
| through the <code>ordinal()</code> method. For example, replace:</p> |
| |
| <pre>for (int n = 0; n < list.size(); n++) { |
| if (list.items[n].e == MyEnum.VAL_X) |
| // do stuff 1 |
| else if (list.items[n].e == MyEnum.VAL_Y) |
| // do stuff 2 |
| }</pre> |
| |
| <p>with:</p> |
| |
| <pre> int valX = MyEnum.VAL_X.ordinal(); |
| int valY = MyEnum.VAL_Y.ordinal(); |
| int count = list.size(); |
| MyItem items = list.items(); |
| |
| for (int n = 0; n < count; n++) |
| { |
| int valItem = items[n].e.ordinal(); |
| |
| if (valItem == valX) |
| // do stuff 1 |
| else if (valItem == valY) |
| // do stuff 2 |
| }</pre> |
| |
| <p>In some cases, this will be faster, though this is not guaranteed.</p> |
| |
| <a name="package_inner" id="package_inner"></a> |
| <h2>Use Package Scope with Inner Classes</h2> |
| |
| <p>Consider the following class definition:</p> |
| |
| <pre>public class Foo { |
| private int mValue; |
| |
| public void run() { |
| Inner in = new Inner(); |
| mValue = 27; |
| in.stuff(); |
| } |
| |
| private void doStuff(int value) { |
| System.out.println("Value is " + value); |
| } |
| |
| private class Inner { |
| void stuff() { |
| Foo.this.doStuff(Foo.this.mValue); |
| } |
| } |
| }</pre> |
| |
| <p>The key things to note here are that we define an inner class (Foo$Inner) |
| that directly accesses a private method and a private instance field |
| in the outer class. This is legal, and the code prints "Value is 27" as |
| expected.</p> |
| |
| <p>The problem is that Foo$Inner is technically (behind the scenes) a totally |
| separate class, which makes direct access to Foo's private |
| members illegal. To bridge that gap, the compiler generates a |
| couple of synthetic methods:</p> |
| |
| <pre>/*package*/ static int Foo.access$100(Foo foo) { |
| return foo.mValue; |
| } |
| /*package*/ static void Foo.access$200(Foo foo, int value) { |
| foo.doStuff(value); |
| }</pre> |
| |
| <p>The inner-class code calls these static methods whenever it needs to |
| access the "mValue" field or invoke the "doStuff" method in the outer |
| class. What this means is that the code above really boils down to a case |
| where you're accessing member fields through accessor methods instead of |
| directly. Earlier we talked about how accessors are slower than direct field |
| accesses, so this is an example of a certain language idiom resulting in an |
| "invisible" performance hit.</p> |
| |
| <p>We can avoid this problem by declaring fields and methods accessed |
| by inner classes to have package scope, rather than private scope. |
| This runs faster and removes the overhead of the generated methods. |
| (Unfortunately it also means the fields could be accessed directly by other |
| classes in the same package, which runs counter to the standard OO |
| practice of making all fields private. Once again, if you're |
| designing a public API you might want to carefully consider using this |
| optimization.)</p> |
| |
| <a name="avoidfloat" id="avoidfloat"></a> |
| <h2>Avoid Float</h2> |
| |
| <p>Before the release of the Pentium CPU, it was common for game authors to do |
| as much as possible with integer math. With the Pentium, the floating point |
| math co-processor became a built-in feature, and by interleaving integer and |
| floating-point operations your game would actually go faster than it would |
| with purely integer math. The common practice on desktop systems is to use |
| floating point freely.</p> |
| |
| <p>Unfortunately, embedded processors frequently do not have hardware floating |
| point support, so all operations on "float" and "double" are performed in |
| software. Some basic floating point operations can take on the order of a |
| millisecond to complete.</p> |
| |
| <p>Also, even for integers, some chips have hardware multiply but lack |
| hardware divide. In such cases, integer division and modulus operations are |
| performed in software — something to think about if you're designing a |
| hash table or doing lots of math.</p> |
| |
| <a name="samples" id="samples"></a> |
| <h2>Some Sample Performance Numbers</h2> |
| |
| <p>To illustrate some of our ideas, here is a table listing the approximate |
| run times for a few basic actions. Note that these values should NOT be taken |
| as absolute numbers: they are a combination of CPU and wall clock time, and |
| will change as improvements are made to the system. However, it is worth |
| noting how these values apply relative to each other — for example, |
| adding a member variable currently takes about four times as long as adding a |
| local variable.</p> |
| |
| <table> |
| <tr> |
| <th>Action</th> |
| <th>Time</th> |
| </tr> |
| <tr> |
| <td>Add a local variable </td> |
| <td>1</td> |
| </tr> |
| <tr class="alt"> |
| <td>Add a member variable </td> |
| <td>4</td> |
| </tr> |
| <tr> |
| <td>Call String.length()</td> |
| <td>5</td> |
| </tr> |
| <tr class="alt"> |
| <td>Call empty static native method</td> |
| <td>5</td> |
| </tr> |
| <tr> |
| <td>Call empty static method </td> |
| <td>12</td> |
| </tr> |
| <tr class="alt"> |
| <td>Call empty virtual method </td> |
| <td>12.5</td> |
| </tr> |
| <tr> |
| <td>Call empty interface method </td> |
| <td>15</td> |
| </tr> |
| <tr class="alt"> |
| <td>Call Iterator:next() on a HashMap </td> |
| <td>165</td> |
| </tr> |
| <tr> |
| <td>Call put() on a HashMap</td> |
| <td>600</td> |
| </tr> |
| <tr class="alt"> |
| <td>Inflate 1 View from XML </td> |
| <td>22,000</td> |
| </tr> |
| <tr> |
| <td>Inflate 1 LinearLayout containing 1 TextView </td> |
| <td>25,000</td> |
| </tr> |
| <tr class="alt"> |
| <td>Inflate 1 LinearLayout containing 6 View objects </td> |
| <td>100,000</td> |
| </tr> |
| <tr> |
| <td>Inflate 1 LinearLayout containing 6 TextView objects </td> |
| <td>135,000</td> |
| </tr> |
| <tr class="alt"> |
| <td>Launch an empty activity </td> |
| <td>3,000,000</td> |
| </tr> |
| </table> |
| |
| <a name="closing_notes" id="closing_notes"></a> |
| <h2>Closing Notes</h2> |
| |
| <p>The best way to write good, efficient code for embedded systems is to |
| understand what the code you write really does. If you really want to allocate |
| an iterator, by all means use enhanced for loop syntax on a List; just make it a |
| deliberate choice, not an inadvertent side effect.</p> |
| |
| <p>Forewarned is forearmed! Know what you're getting into! Insert your |
| favorite maxim here, but always think carefully about what your code is doing, |
| and be on the lookout for ways to speed it up.</p> |