commit | e1b3c47620254a7ea7bb299e34564cc5f8724230 | |
---|---|---|
author | Lalit Maganti <lalitm@google.com> | Tue Mar 12 23:07:42 2019 +0000 |
committer | Lalit Maganti <lalitm@google.com> | Tue Mar 12 23:07:42 2019 +0000 |
tree | 359d119bbb1ddd8eae093eec9be0a1f1dffb3e58 | |
parent | 48092a436b70f70b0ee1f3da28e65e0c8cb5dd36 | |
trace_processor: improve query performance involving filter operations

This change makes three improvements to the performance of queries which are very frequently performed by the UI:

1) Change from always writing into the bitvector when filtering in all-rows mode to writing only when the predicate returns true. This is important because indexing into the bitvector costs more than a branch. It matters especially because the first constraint to be filtered is likely to shrink the dataset a lot (think of a constraint on cpu or on utid).

2) Change the predicate from a std::function to a functor. The virtual dispatch of std::function and the possibility of memory allocation (though unlikely) were causing large slowdowns. As this predicate is called in hot loops, we want this code to be inlined as much as possible - this is less than the cost of the switch.

3) Force the row predicate lambdas to always be inlined. This is really important, as not inlining doubles the cost of these functions, and lambdas are generally expected to be inlined. Because the filter switch has more than 6 branches (the magic number under which inlining seems to happen on Clang), this function was not getting inlined.

The net result of these changes is the following perf numbers on the trace from b/124495829:

Query 1:
select ts, lead(ts) over (partition by ref_type order by ts) as ts_end, value from counters where name = 'SwapCached' and ref = 0
Old code: 107.088 ms
New code: 57.127 ms (1.87x speedup)

Query 2:
create view mem_rss as select *, lead(ts) over (order by ts) - ts as dur from counters where name="mem.rss.file" and ref=10 and ref_type="upid"
create virtual table span_49 USING span_join(mem_rss, window)
select ts, dur, value from span_49
Old code: 114.136 ms
New code: 67.882 ms (1.68x speedup)

Change-Id: Ic81b3f711a2f9b28c9fb35c0cf0c624bae98da92
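The three changes described in the commit message can be illustrated with a minimal C++ sketch. This is not the trace_processor code: `FilterAllRowsOld`, `FilterAllRowsNew`, `EqualsPredicate` and `SKETCH_ALWAYS_INLINE` are hypothetical names, `std::vector<bool>` stands in for the real bitvector, and the sketch assumes the bitvector starts out cleared so that rows which do not match need no write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical macro: asks GCC/Clang to inline regardless of their heuristics.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE
#endif

// Before (sketch): a type-erased std::function predicate, and one bitvector
// write per row whether or not the row matches.
inline void FilterAllRowsOld(const std::vector<int64_t>& column,
                             const std::function<bool(int64_t)>& pred,
                             std::vector<bool>* rows) {
  assert(rows->size() == column.size());
  for (size_t i = 0; i < column.size(); ++i)
    (*rows)[i] = pred(column[i]);
}

// A functor whose call operator is forced inline. It stands in for the
// row predicate lambdas mentioned in the commit message.
struct EqualsPredicate {
  int64_t target;
  SKETCH_ALWAYS_INLINE bool operator()(int64_t value) const {
    return value == target;
  }
};

// After (sketch): the predicate type is a template parameter, so the compiler
// sees the concrete call and can inline it into the loop. The bitvector is
// written only for matching rows; `rows` is assumed to be all-false on entry.
template <typename Predicate>
void FilterAllRowsNew(const std::vector<int64_t>& column,
                      Predicate pred,
                      std::vector<bool>* rows) {
  assert(rows->size() == column.size());
  for (size_t i = 0; i < column.size(); ++i) {
    if (pred(column[i]))
      (*rows)[i] = true;
  }
}
```

Both versions mark the same rows when the bitvector starts cleared. The difference is only in how much work is done per row, which is what the timings in the commit message measure.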
Perfetto is an open-source project for performance instrumentation and tracing of Linux/Android/Chrome platforms and user-space apps.
See www.perfetto.dev for docs.