Mike Dodd | 8cfa702 | 2010-11-17 11:12:26 -0800 | 1 | This is an (incomplete) list of some of the stuff we want to look at doing. |
| 2 | |
| 3 | If you're interested in hacking on any of these, please contact the list first |
| 4 | for some pointers and/or read HACKING and doc/CodingStyle. |
| 5 | |
| 6 | 1.0 release |
| 7 | ----------- |
| 8 | |
| 9 | (this is a minimal selection of stuff I think we need) |
| 10 | |
| 11 | o default to a vmlinux location: need agreement from kernel developers |
| 12 | o default to --separate=library (with anon, =none does not make much sense) |
| 13 | o prettify image name for .jo files and allow lib-image: to specify it |
| 14 | o gisle's fixes |
| 15 | o opreport tgid:<tgid> doesn't work even if .jo files with that pid exist |
| 16 | o Fix: |
| 17 | |
| 18 | warning: [vdso] (tgid:9236 range:0x7fff98ffd000-0x7fff98fff000) could not be found. |
| 19 | warning: /no-vmlinux could not be found. |
| 20 | warning: /usr/lib64/libpanel-applet-2.so.0.2.27.#prelink#.sXCUK1 (deleted) could not be found. |
| 21 | |
| 22 | o amd64 32 bit build needs a sys32_lookup_dcookie() translator in the |
| 23 | kernel |
| 24 | o decide on -m tgid semantics for anon regions |
| 25 | o if ev67 is not fixed, back it out |
| 26 | o lapic: the module should say "didn't find apic" if needed; the FAQ and doc should |
| 27 | say a bit about the lapic kernel option on x86 and recent kernels |
| 28 | o see the big comment in db_insert.c, it's possible to allow unlimited |
| 29 | amount of samples with a very minor change in libdb. |
| 30 | o if oprofile doesn't recognize the processor selected by the kernel, |
| 31 | opcontrol could set up the module in timer mode (remove/reload probably), and |
| 32 | warn the user that oprofile must be upgraded to get all the features of the |
| 33 | hardware. |
| 34 | |
| 35 | Later |
| 36 | ----- |
| 37 | |
| 38 | o remove 2.95/2.2 support so we can use boost multi index container in |
| 39 | symbol/sample container |
| 40 | o consider if we can improve anon mapping growing support |
| 41 | |
| 42 | <movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so /bin/bash | grep vfprintf |
| 43 | <movement> 14 0.1301 6 0.0102 /lib/tls/libc-2.3.2.so vfprintf |
| 44 | <movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so /usr/bin/vim | grep vfprintf |
| 45 | <movement> 176 2.0927 349 1.2552 /lib/tls/libc-2.3.2.so vfprintf |
| 46 | <movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so { image:/bin/bash } { image:/usr/bin/vim } | grep vfprintf |
| 47 | <movement> 176 10.9657 +++ 349 7.8888 +++ vfprintf |
| 48 | <movement> 14 --- --- 6 --- --- vfprintf |
| 49 | <movement> it seems them as two separate symbols |
| 50 | <movement> but can we remove the app_name from rough_less and still be able to walk the two lists? |
| 51 | <movement> even if we could, it would still go wrong when we're profiling multiple apps |
| 52 | |
| 53 | o Java stuff?? |
| 54 | o with opreport -c I can get "warning: /no-vmlinux could not be found.". |
| 55 | Should be smarter ? |
| 56 | o opreport -c gives weird output for an image with no symbols: |
| 57 | |
| 58 | samples % symbol name |
| 59 | 15965 100.000 (no symbols) |
| 60 | 253 100.000 (no symbols) |
| 61 | 15965 98.4400 (no symbols) |
| 62 | 253 1.5600 (no symbols) [self] |
| 63 | |
| 64 | o consider tagging opreport -c entries with a number like gprof |
| 65 | o --details for opreport -c, or diff?? |
| 66 | o should [self] entries be omitted if 0 ?? |
| 67 | o stress test opreport -c: compile a Big Application w/o frame pointer and see |
| 68 | how the driver and opreport -c react. |
| 69 | o oparchive could fix up {kern} paths with -p (what about diff between |
| 70 | archive and current though?) |
| 71 | o can say more in opcontrol --status |
| 72 | o consider a sort option for diff % |
| 73 | o opannotate is silent about symbols missing debug info |
| 74 | o oprofiled.log now contains various statistics about lost samples etc. from |
| 75 | the driver. Post profile tools must parse that and warn when needed; the warning |
| 76 | must include a proposed workaround. Users need this: if nothing seems wrong, |
| 77 | people are unlikely to look in oprofiled.log (I ran oprofile on 2.6.1 for |
| 78 | 2 weeks before noticing that at 30000 I lost a lot of samples; the profile seemed |
| 79 | ok due to the randomization of lost samples). As developers we need that too; |
| 80 | currently we have no clear idea of the behavior on different archs, NUMA etc. |
| 81 | Not perfect, because if the profiler is running, oprofiled.log will show |
| 82 | those warnings only after the first alarm signal; I think we must dump the |
| 83 | statistics information after each opcontrol --dump to avoid that. |
| 84 | o odb_insert() can fail on ftruncate or mremap() in db_manage.c but we don't |
| 85 | try to recover gracefully. |
| 86 | o output column shortname headers for opreport -l |
| 87 | o is relative_to_absolute_path guaranteeing a trailing '/' documented ? |
| 88 | o move oprofiled.log to OP_SAMPLE_DIR/current ? |
| 89 | o pp tools must handle samples count overflow (marked as (unsigned)-1) |
| 90 | o the way we show kernel modules in 2.5 is not very obvious - "/oprofile" |
| 91 | o oparchive would be more useful with a --root= option to allow profiling |
| 92 | on a small box, nfs mount / on another box and transfer sample files and |
| 93 | binaries to a bigger box for analysis. There is also a problem in oparchive: |
| 94 | you can use session: to get the right path to sample files, but the oprofiled.log |
| 95 | and abi file paths are hardcoded to /var/lib/oprofile. |
| 96 | o callgraph patch: better way to skip ignored backtrace ? |
| 97 | o lib-image: and image: behavior depends on --separate=: with --separate=library, |
| 98 | opreport "lib-image:*libc*" --merge=lib works but not |
| 99 | opreport "image:*libc*" --merge=lib, whilst the behavior is reversed with |
| 100 | --separate=none. Should we care ? |
| 101 | o dependencies between profile_container.h symbol_container.h and |
| 102 | sample_container.h become more and more ugly, I needed to include them |
| 103 | in a specific order in some source (still true??) |
| 104 | o add event aliases for common things like icache misses; we must start to |
| 105 | think about metrics, including simple ones like an event alias mapped to two or more |
| 106 | events and interpreted specially by user space tools, e.g. using the ratio |
| 107 | of samples; trickier will be to select an event used as call count (no |
| 108 | cg on it) and used to emulate the call count field in gprof. I think this is |
| 109 | an after-1.0 thing, but event aliases must be specified in a way that allows such |
| 110 | extensions |
| 111 | o do we need an opreport like opreport -c (showing caller/callee at binary |
| 112 | boundary not symbols) ? |
| 113 | o we should notice an opcontrol config change (--separate etc.) and |
| 114 | auto-restart the daemon if necessary (Run) |
| 115 | o we can add lots more unit tests yet |
| 116 | o Itanium event constraints are not implemented |
| 117 | o GUI still has a physical-counter interface, should have a general one |
| 118 | like opcontrol --event |
| 119 | o I think we should have the ability to have *fixed* width headers, e.g. : |
| 120 | |
| 121 | vma samples cum. samples % cum. % symbol name image name app name |
| 122 | 0804c350 64582 64582 35.0757 35.0757 odb_insert /usr/loc...in/oprofiled /usr/local/oprofile-pp/bin/oprofiled |
| 123 | |
| 124 | Note the ellipsis |
| 125 | o should we make the sighup handler re-read counter config and re-start profiling too ? |
| 126 | o improve --smart-demangle |
| 127 | o allow users to add their own patterns in user.pat, document it. |
| 128 | o hard code ${typename} regular definition to remove all current limitations (difficult, perhaps after 1.0 ?). |
| 129 | o oprof_start dialog size is too small initially |
| 130 | o i18n. We need a good formatter, and also remember format_percent() |
| 131 | o opannotate --source --output-dir=~moz/op/ /usr/bin/oprofiled |
| 132 | will fail because the ~ is not expanded (no space around it) (popt bug I say) |
| 133 | o cpu names instead of numbers in 2.4 module/ ? |
| 134 | o remove 1 and 2 magic numbers for oprof_ready |
| 135 | o adapt Anton's patch for handling non-symbolled libraries ? (nowadays C++ |
| 136 | anon namespace symbols are static, 3.4 iirc, so with recent distros we are |
| 137 | more likely to get problems with a "fallback to dynamic symbols" approach) |
| 138 | o use standard C integer type <stdint.h> int32_t int16_t etc. |
| 139 | o event multiplexing for real |
| 140 | o randomizing of reset value |
| 141 | o XML output |
| 142 | o profile the NMI handler code |
| 143 | o opannotate : I added this to the doc about the difference between the nr of samples |
| 144 | credited to a source function and the total number of samples for this function: |
| 145 | "The missing samples are not lost, they will be credited to another source |
| 146 | location where the inlined function is defined. The inlined function will |
| 147 | be credited from multiple call site and merged in one place in the |
| 148 | annotated source file so there is no way to see from what call site are |
| 149 | coming the samples for an inlined function." |
| 150 | I think we can work around this: output multiple instances of the inlined |
| 151 | function, like: |
| 152 | inline foo() { foo: total 1500 30.00 ... |
| 153 | ... annotated source from all call site |
| 154 | inline foo() { foo (call site bar()): total 500 10.00 |
| 155 | .. annotated source from call site bar() etc. |
| 156 | what about templates..., can we/must we do something like that for |
| 157 | template <class T> eat_cpu(): do a similar thing, merging and annotating |
| 158 | all instantiations then annotating each distinct instantiation? This will |
| 159 | break our "keep the source line number in annotated source file identical to |
| 160 | the original source" |
| 161 | o events/mips/34k/events: some events do not make sense, they get identical |
| 162 | event number, um and counter nr so they overlap; currently commented out |
| 163 | o can we find a more efficient implementation for sparse_array ? |
| 164 | o libpp/profile.cpp:is_spu_sample_file() can be simplified by using |
| 165 | read_header() |
| 166 | o while fixing #1819350 I needed to make extra_images per profile session |
| 167 | rather than a global var so I think we need to revisit find_image_path(), |
| 168 | extra_found_images, --image-path (-p). |
| 169 | Currently we can't do something like: |
| 170 | opreport { archive:tmp1 search_path=/lib/modules/2.6.20 } { archive:tmp2 search_path=/.../2.6.20.9 } |
| 171 | because search_path is specified through -p which is not a part of the |
| 172 | profile spec. Fixing #1819350 covered all cases except this one but w/o any |
| 173 | user visible change. Another way would be to save the -p option used with |
| 174 | oparchive in a file at the toplevel of the archive, use it with all tools |
| 175 | when an archive: is specified on the command line and deprecate the use of |
| 176 | -p in such cases. |
| 177 | o consider making extra_images a ref counted object, it's copied by value |
| 178 | a few times but can contain a lot of strings. There is also some ugly public |
| 179 | member extra_images to fix. |
| 180 | o daemon bss size can be improved, grep for MAX_PATH to see where dynamic |
| 181 | allocation can be used, try $ nm oprofiled --size-sort too. |
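
The fixed-width header item above ("Note the ellipsis") could be prototyped
with a small helper like the one below. This is only a sketch: ellipsize() is
a hypothetical name, nothing in oprofile provides it, and splitting the kept
characters evenly between head and tail is an assumption (the sample line in
that item keeps a shorter head).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of oprofile: shorten `s` to at most `width`
// characters by replacing its middle with "...".
std::string ellipsize(const std::string& s, std::size_t width)
{
    const std::string dots = "...";
    if (s.size() <= width)
        return s;                          // already fits, leave untouched
    if (width <= dots.size())
        return s.substr(0, width);         // too narrow for an ellipsis, just cut
    const std::size_t keep = width - dots.size();
    const std::size_t head = keep / 2;     // characters kept from the start
    const std::size_t tail = keep - head;  // characters kept from the end
    return s.substr(0, head) + dots + s.substr(s.size() - tail);
}
```

With a width of 23, "/usr/local/oprofile-pp/bin/oprofiled" comes out as
"/usr/local.../oprofiled"; names that already fit are returned unchanged.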
| 182 | |
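For the sample count overflow item above (counts marked as (unsigned)-1), one
possible way for the pp tools to merge counts without wrapping is a saturating
add. A sketch only: add_counts() and is_overflowed() are made-up names, and
treating (unsigned)-1 as a sticky "overflowed" marker is an assumption about
the semantics we want.

```cpp
#include <limits>

// Assumption: a count equal to (unsigned)-1 means "this count overflowed".
const unsigned overflowed = std::numeric_limits<unsigned>::max();

// Hypothetical helper: true if the count carries the overflow marker.
bool is_overflowed(unsigned count)
{
    return count == overflowed;
}

// Hypothetical helper: add two sample counts, saturating at the overflow
// marker instead of silently wrapping around.
unsigned add_counts(unsigned a, unsigned b)
{
    if (is_overflowed(a) || is_overflowed(b))
        return overflowed;                 // overflow is sticky
    if (a >= overflowed - b)
        return overflowed;                 // the sum would reach or pass the marker
    return a + b;
}
```

A report could then print something like "overflow" instead of a bogus number
whenever is_overflowed() is true for a merged count.
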
| 183 | Documentation |
| 184 | ------------- |
| 185 | |
| 186 | o the docs should mention the default event for each arch somewhere |
| 187 | o more discussion of problematic code needs to go in the "interpreting" section. |
| 188 | o document gcc 2.95 and linenr info problems especially for inline functions |
| 189 | o finish the internals manual |
| 190 | |
| 191 | JIT support |
| 192 | ----------- |
| 193 | |
| 194 | o We need a more dynamic structure to handle entries_address_ascending and |
| 195 | entries_symbols_ascending; currently many scaling problems occur because they |
| 196 | are arrays. This was perfect to get a first implementation focusing on |
| 197 | handling overlap and all, but the need to qsort/copy arrays at each iteration |
| 198 | is a performance killer. Some sort of AVL tree will do the job. |
| 199 | o Related to the previous, it's possible to do all processing in opjitconv.c |
| 200 | in a single left to right walk of the jitentry list. |
| 201 | o see the FIXME at parse_dump.c:parse_code_unload() |
| 202 | o Increment JITHEADER_VERSION in jitdump.h to be sure that the new code only |
| 203 | accepts dump files created by the new code. |
| 204 | o opjitconv.c:replacement_name() should be clever enough to avoid name |
| 205 | collisions so we can remove the recursive call to disambiguate_symbol_names(); |
| 206 | this needs a hash table or some sort of associative array to check quickly if a |
| 207 | name exists. We will need some sort of AVL tree anyway, so it's probably better |
| 208 | not to implement a hash table only for this purpose. |
| 209 | o op_write_native_code() must accept one more parameter, the real code size, |
| 210 | which can be zero or equal to code_size; this will allow creating an ELF |
| 211 | file w/o any code contents, only a symbol table and .text sections w/o |
| 212 | contents (yes, the ELF format allows that). For dynamic binary translation it'll |
| 213 | avoid dumping tons of code for little use; opannotate --assembly will not |
| 214 | work on such an ELF file but it can be a real win. It'll need a real_size field |
| 215 | added to jitrecord0, and some trickery when building the ELF file, |
| 216 | taking care of the case where we mix zero code size with non-zero code size. |
| 217 | Perhaps we can use it too for Java, filtering native methods etc. Currently |
| 218 | we allow a simplified form of this feature by allowing to disable/enable |
| 219 | code dumping, but at the whole dump level, not on a symbol basis; quite |
| 220 | possibly sufficient. [mpj: We're backing away from the idea of dumping |
| 221 | JIT records without code. Since the BFD asymbol type does not include symbol size, |
| 222 | the op_bfd technique for determining symbol size relies on knowing the true |
| 223 | file size; and if code is not included in the .jo file, we don't have the true size.] |
| 224 | o The pipe used for triggering JIT dump conversion should be used for normal |
| 225 | dumping too. |
| 226 | o See FIXME in agents/jvmti/libjvmti_oprofile.c: |
| 227 | If getting line number info were made configurable through the command line, |
| 228 | what should the default be, on or off? |
| 229 | o See FIXME in opjitconv/debug_line.c |
| 230 | o The way to use the pipe should be made more secure to avoid denial of service |
| 231 | attacks. We have to think about it. |
| 232 | o Callgraph does not work properly for the .jo files the JIT support creates. |
| 233 | See Chapter 4, section 2.3.2 "Callgraph and JIT support". Try to figure |
| 234 | out a way to correlate an anonymous sample callgraph entry with |
| 235 | the .jo file that may exist for the anonymous code. |
| 236 | o see mail from Gisle Dankel: |
| 237 | "JIT_SUPPORT: Adding support for file-backed non-ELF JIT code" |
| 238 | -> should be changed (if useful) before next release |
| 239 | o See FIXME in op_header.cpp: |
| 240 | The check for header.mtime of JIT sample files is not correct because currently |
| 241 | this mtime value is set to zero due to missing cookie setting for JIT sample files. |
| 242 | Some additional check/setting to header.mtime should be made for JIT sample files. |
| 243 | o Mono JIT support: |
| 244 | |
| 245 | 2007-11-08: with callgraph massi got |
| 246 | <massi> oparchive error: parse_filename() invalid filename: /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/{root}/home/massi/mono/amd64/bin/mono/{dep}/{anon:anon}/32432.0x40a26000.0x40a36000/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/home/massi/mono/amd64/bin/mono/{dep}/{anon:anon}/32432.0x40a26000.0x40a36000/CPU_CLK_UNHALTED.100000.0.all.all.all/{cg}/{root}/usr/oprofile/bin/oprofiled/CPU_CLK_ |
| 247 | |
| 248 | Massi added Mono JIT support. Code on the stack is never unloaded and there is |
| 249 | no byte code; code is always compiled to native machine code. This means that |
| 250 | for mono at least we can do callgraph if we can fix this samples filename |
| 251 | problem. |
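
For the replacement_name() item above, the "check quickly if a name exists"
idea could look roughly like this. It is a sketch in C++ for brevity
(opjitconv itself is C and would need its own container): unique_name() and
the "~N" suffix scheme are invented for illustration, and std::set stands in
for the tree mentioned in that item since it is typically a balanced tree.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: return `name` if it is not in `used` yet, otherwise the
// first "name~N" (N = 1, 2, ...) that is free. The returned name is recorded
// in `used`, so later calls never hand out the same name twice.
std::string unique_name(std::set<std::string>& used, const std::string& name)
{
    if (used.insert(name).second)
        return name;                       // first use, no collision
    for (unsigned n = 1; ; ++n) {
        std::ostringstream candidate;
        candidate << name << '~' << n;
        if (used.insert(candidate.str()).second)
            return candidate.str();        // free slot found and recorded
    }
}
```

Because every returned name is inserted before it is handed out, a single pass
over the symbols is enough and no recursive disambiguation step is needed in
this sketch.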
| 252 | |
| 253 | General checks to make |
| 254 | ---------------------- |
| 255 | |
| 256 | o rgrep FIXME |
| 257 | o valgrind (--show-reachable=yes --leak-check=yes) |
| 258 | o audit to track unnecessary include <> |
| 259 | o gcc 3.0/3.x compile |
| 260 | o Qt2/3 check, no Qt check |
| 261 | o verify builds (modversions, kernel versions, athlon etc.). I have the |
| 262 | necessary stuff to check kernel versions/configurations on PIII core (Phil) |
| 263 | o use nm and a little script to track unused functions |
| 264 | o test it to hell and back |
| 265 | o compile all C++ programs with STL_port and test them (gcc 3.4 contains a |
| 266 | debug mode too but std::string iterators are not checked) |
| 267 | o There are probably places in the post profile tools where looking at errno will give better error messages. |
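
As a reminder of what the errno item above means in practice, here is a
minimal sketch. describe_errno() is a hypothetical helper, not an existing
oprofile function; the only library call it relies on is std::strerror().

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: build "what: reason" from a saved errno value.
// Callers should copy errno right after the failing call, before any other
// library call has a chance to overwrite it.
std::string describe_errno(const std::string& what, int saved_errno)
{
    return what + ": " + std::strerror(saved_errno);
}
```

A tool that fails to open a sample file could then report
describe_errno("open " + filename, saved) instead of a bare "could not be
found" (filename and saved being the caller's own variables).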
| 268 | |