Improve hprof performance: dominators and retained sizes.

For the 25MB benchmark hprof, dominator computation now takes 0.7s instead of
5.1s. Combined, parsing the hprof and computing retained sizes:
- was 6s, with 268MB in the heap
- is now 1.8s, with 133MB in the heap.

Changes included in this CL:
- eliminate Long boxing for retained sizes.
- compute dominators on top of the object model itself.
- prune the affected nodes more aggressively in each iteration: once a node's
 immediate dominator is SENTINEL_ROOT, no further updates are possible.
- eliminate the ClassObj lookup for Instance#getSize().
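
The pruning rule in the list above can be illustrated with a minimal sketch. This
is not the code in this CL: it is a generic iterative dominator computation over
int node ids, and DominatorSketch, ROOT and the preds array are hypothetical names
chosen for the example. It assumes every node is reachable from ROOT and that
nodes are numbered in reverse postorder of a DFS from ROOT, so a node's immediate
dominator always has a smaller id. A node's estimate only ever moves towards the
root, so once it equals ROOT the node is dropped from later rounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DominatorSketch {
    // Hypothetical stand-in for SENTINEL_ROOT; node 0 is the root.
    static final int ROOT = 0;

    // preds[v] lists the predecessors of v. Assumption: ids follow a reverse
    // postorder from ROOT and every node is reachable from ROOT.
    static int[] computeDominators(int[][] preds) {
        int n = preds.length;
        int[] idom = new int[n];
        Arrays.fill(idom, -1);          // -1 means "no estimate yet"
        idom[ROOT] = ROOT;

        // Nodes whose immediate dominator may still change.
        List<Integer> active = new ArrayList<>();
        for (int v = 1; v < n; v++) {
            active.add(v);
        }

        boolean changed = true;
        while (changed) {
            changed = false;
            List<Integer> next = new ArrayList<>();
            for (int v : active) {
                int candidate = -1;
                for (int p : preds[v]) {
                    if (idom[p] == -1) {
                        continue;       // predecessor has no estimate yet
                    }
                    candidate = (candidate == -1) ? p : intersect(idom, candidate, p);
                }
                if (candidate != idom[v]) {
                    idom[v] = candidate;
                    changed = true;
                }
                // Pruning: an estimate equal to ROOT cannot move any higher,
                // so the node is final and is skipped in later rounds.
                if (idom[v] != ROOT) {
                    next.add(v);
                }
            }
            active = next;
        }
        return idom;
    }

    // Walks both nodes up the current dominator tree until they meet.
    static int intersect(int[] idom, int a, int b) {
        while (a != b) {
            while (a > b) a = idom[a];
            while (b > a) b = idom[b];
        }
        return a;
    }

    public static void main(String[] args) {
        // ROOT -> 1, 1 <-> 2, 2 <-> 3, ROOT -> 3: a short doubly-linked list
        // that is referenced from the root at both ends.
        int[][] preds = {{}, {0, 2}, {1, 3}, {2, 0}};
        System.out.println(Arrays.toString(computeDominators(preds)));
    }
}
```

In this example node 2 is first assigned node 1 as its immediate dominator and
is only corrected to ROOT in a second round, while nodes 1 and 3 are pruned
after the first round.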

The added test showcases the worst-case behavior of our quadratic algorithm: the
doubly-linked list results in multiple rounds of refining the dominator tree. This
CL therefore sets the stage for the next optimization: making
Dominators#computeDominators incremental, so that it never takes an unbounded
amount of time.

Change-Id: I008aaf4723dcb1f1f238a8380b541aa96762170f
8 files changed