-*-org-*-
* TODO
** Keep exit code of traced process
   See https://bugzilla.redhat.com/show_bug.cgi?id=105371 for details.

** Automatic prototype discovery:
*** Use debuginfo if available
    Alternatively, use debuginfo to generate a config file.
*** Mangled identifiers contain partial prototypes themselves
    They don't contain return type info, which can change the
    parameter passing convention.  We could use it and hope for the
    best.  Also they don't include the potentially present hidden this
    pointer.
** Automatically update list of syscalls?
** More operating systems (Solaris?)
** Get rid of EVENT_ARCH_SYSCALL and EVENT_ARCH_SYSRET
** Implement displaced tracing
   A technique used in GDB (and in uprobes, I believe), whereby the
   instruction under breakpoint is moved somewhere else, and followed
   by a jump back to original place.  When the breakpoint hits, the IP
   is moved to the displaced instruction, and the process is
   continued.  We avoid all the fuss with singlestepping and
   reenablement.
** Create different ltrace processes to trace different children
** Config file syntax
*** mark some symbols as exported
    For PLT hits, only exported prototypes would be considered.  For
    symtab entry point hits, all would be.

*** named arguments
    This would be useful for replacing the arg1, arg2 etc.

*** parameter pack improvements
    The above format tweaks require that packs that expand to no types
    at all be supported.  If this works, then it should be relatively
    painless to implement conditionals:

    | void ptrace(REQ=enum(PTRACE_TRACEME=0,...),
    |             if[REQ==0](pack(),pack(pid_t, void*, void *)))

    This is of course dangerously close to a programming language, and
    I think ltrace should be careful to stay as simple as possible.
    (We can hook into Lua, or TinyScheme, or some such if we want more
    general scripting capabilities.  Implementing something ad-hoc is
    undesirable.)  But the above can be nicely expressed by pattern
    matching:

    | void ptrace(REQ=enum[int](...)):
    |   [REQ==0] => ()
    |   [REQ==1 or REQ==2] => (pid_t, void*)
    |   [true] => (pid_t, void*, void*);

    Or:

    | int open(string, FLAGS=flags[int](O_RDONLY=00,...,O_CREAT=0100,...)):
    |   [(FLAGS & 0100) != 0] => (flags[int](S_IRWXU,...))

    This would still require pretty complete expression evaluation.
    _Including_ pointer dereferences and such.  And e.g. in accept, we
    need subtraction:

    | int accept(int, +struct(short, +array(hex(char), X-2))*, (X=uint)*);

    Perhaps we should hook to something after all.

*** system call error returns

    This is closely related to the above.  Take the following syscall
    prototype:

    | long read(int,+string0,ulong);

    string0 means the same as string(array(char, zero(retval))*).  If
    read returns a negative value, however, that value signifies an
    errno.  zero takes it at face value and becomes suspicious:

    | read@SYS(3 <no return ...>
    | error: maximum array length seems negative
    | , "\n\003\224\003\n", 4096) = -11

    Ideally we would do what strace does, e.g.:

    | read@SYS(3, 0x12345678, 4096) = -EAGAIN
*** errno tracking
    Some calls result in setting errno.  Somehow mark those, and on
    failure, show errno.  System calls return errno as a negative
    value (see the previous point).

*** second conversions?
    This definitely calls for some general scripting.  The goal is to
    have seconds in adjtimex calls show as e.g. 10s, 1m15s or some
    such.
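
The target rendering can be sketched in C as follows.  This is only an
illustration of the output format: format_seconds is a hypothetical
name, and dropping zero components (so that 3600 shows as 1h) is an
assumption, since the note above only gives 10s and 1m15s as examples.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render SECS as hours, minutes and seconds,
   e.g. "10s" or "1m15s".  Components that are zero are omitted,
   except that a zero interval is shown as "0s".  */
static void
format_seconds(long secs, char *buf, size_t len)
{
	/* Negate in unsigned arithmetic so that LONG_MIN is safe.  */
	unsigned long rest = secs < 0 ? 0UL - (unsigned long)secs
				      : (unsigned long)secs;
	unsigned long h = rest / 3600;
	unsigned long m = rest % 3600 / 60;
	unsigned long s = rest % 60;
	char hs[32] = "", ms[32] = "", ss[32] = "";

	if (h != 0)
		snprintf(hs, sizeof hs, "%luh", h);
	if (m != 0)
		snprintf(ms, sizeof ms, "%lum", m);
	if (s != 0 || rest == 0)
		snprintf(ss, sizeof ss, "%lus", s);

	snprintf(buf, len, "%s%s%s%s", secs < 0 ? "-" : "", hs, ms, ss);
}
```

A lens built on such a function would still need a way to be attached
to individual struct fields from the config file.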

*** format should take arguments like string does
    Format should take value argument describing the value that should
    be analyzed.  The following overwriting rules would then apply:

    | format       | format(array(char, zero)*) |
    | format(LENS) | X=LENS, format[X]          |

    The latter expanded form would be canonical.

    This depends on named arguments and parameter pack improvements
    (we need to be able to construct parameter packs that expand to
    nothing).

*** More fine-tuned control of right arguments
    Combination of named arguments and some extensions could take care
    of that:

    | void func(X=hide(int*), long*, +pack(X)); |

    This would show long* as input argument (i.e. the function could
    mangle it), and later show the pre-fetched X.  The "pack" syntax is
    utterly undeveloped as of now.  The general idea is to produce
    arguments that expand to some mix of types and values.  But maybe
    all we need is something like

    | void func(out int*, long*); |

    ltrace would know that out/inout/in arguments are given in the
    right order, but left pass should display in and inout arguments
    only, and right pass then out and inout.  + would be
    backward-compatible syntactic sugar, expanded like so:

    | void func(int*, int*, +long*, long*);              |
    | void func(in int*, in int*, out long*, out long*); |

    This is useful in particular for:

    | ulong mbsrtowcs(+wstring3_t, string*, ulong, addr); |
    | ulong wcsrtombs(+string3, wstring_t*, ulong, addr); |

    Where we would like to render arg2 on the way in, and arg1 on the
    way out.

    But sometimes we may want to see a different type on the way in and
    on the way out.  E.g. in asprintf, what's interesting on the way in
    is the address, but on the way out we want to see buffer contents.
    Does something like the following make sense?

    | void func(X=void*, long*, out string(X)); |
** Support for functions that never return
   This would be useful for __cxa_throw, presumably also for longjmp
   (do we handle that at all?) and perhaps a handful of others.

** Support flag fields
   enum-like syntax, except disjunction of several values is assumed.
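
A rough sketch of how such a value could be rendered, assuming a table
of (value, name) pairs as an enum would have.  struct flag_name,
append_str and format_flags are hypothetical names; showing leftover
bits as a hex remainder is a design assumption, not something decided
above.

```c
#include <stddef.h>
#include <stdio.h>

struct flag_name {
	unsigned long value;
	const char *name;
};

/* Append SEP and S to BUF at offset USED; returns the new offset,
   clamped to LEN if the output was truncated.  */
static size_t
append_str(char *buf, size_t len, size_t used, const char *sep,
	   const char *s)
{
	if (used < len) {
		int n = snprintf(buf + used, len - used, "%s%s", sep, s);
		if (n > 0)
			used += (size_t)n;
	}
	return used < len ? used : len;
}

/* Hypothetical helper: render VALUE as a disjunction of the names in
   TABLE.  Bits not covered by any name are shown in hex; a value of
   zero is shown by the zero-valued name if there is one.  */
static void
format_flags(unsigned long value, const struct flag_name *table, size_t n,
	     char *buf, size_t len)
{
	unsigned long rest = value;
	const char *zero_name = "0";
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (size_t i = 0; i < n; ++i) {
		if (table[i].value == 0) {
			zero_name = table[i].name;
		} else if ((rest & table[i].value) == table[i].value) {
			used = append_str(buf, len, used, used ? "|" : "",
					  table[i].name);
			rest &= ~table[i].value;
		}
	}

	if (rest != 0) {
		char hex[32];
		snprintf(hex, sizeof hex, "%#lx", rest);
		used = append_str(buf, len, used, used ? "|" : "", hex);
	}

	if (used == 0)
		append_str(buf, len, used, "", zero_name);
}
```

With a table holding O_RDONLY=00, O_WRONLY=01 and O_CREAT=0100, the
value 0101 would be shown as O_WRONLY|O_CREAT and 0 as O_RDONLY.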
** Support long long
   We currently can't define time_t on 32bit machines.  That means we
   can't describe a range of time-related functions.

** Support signed char, unsigned char, char
   Also, don't format it as character by default, string lens can do
   it.  Perhaps introduce byte and ubyte and leave 'char' as alias of
   one of those with string lens applied by default.

** Support fixed-width types
   Really we should keep everything as {u,}int{8,16,32,64} internally,
   and have long, short and others be translated to one of those
   according to architecture rules.  Maybe this could be achieved by a
   per-arch config file with typedefs such as:

   | typedef ulong = uint64_t; |

** Support for ARM/AARCH64 types
   - ARM and AARCH64 both support half-precision floating point
     - there are two different half-precision formats, IEEE 754-2008
       and "alternative".  Both have 10 bits of mantissa and 5 bits of
       exponent, and differ only in how exponent==0x1F is handled.  In
       IEEE format, we get NaNs and infinities; in alternative
       format, this encodes the normalized value
       (-1)^S × 2¹⁶ × (1.mant)
     - the Floating-Point Control Register, FPCR, selects the
       half-precision format where applicable (FPCR.AHP bit)
   - AARCH64 supports fixed-point interpretation of {,double}words
     - e.g. fixed(int, X) (int interpreted as a number with X
       binary digits of fraction)
   - AARCH64 supports 128-bit quad words in SIMD
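
The two half-precision interpretations and the fixed(int, X) idea can
be sketched as below.  This is an illustration under assumptions, not
ltrace code: half_to_double, fixed_to_double and scale2 are
hypothetical names, and the ahp parameter stands in for the FPCR.AHP
bit, which a tracer would have to read from the traced process.

```c
#include <math.h>	/* Only for the INFINITY and NAN macros.  */
#include <stdint.h>

/* Multiply X by 2^E.  Exact for the small exponents used here.  */
static double
scale2(double x, int e)
{
	for (; e > 0; --e)
		x *= 2.0;
	for (; e < 0; ++e)
		x /= 2.0;
	return x;
}

/* Decode a 16-bit half-precision value: 1 sign bit, 5 exponent bits,
   10 mantissa bits.  Non-zero AHP selects the "alternative" format,
   where exponent 0x1f is an ordinary normalized exponent.  */
static double
half_to_double(uint16_t bits, int ahp)
{
	int sign = bits >> 15;
	int exp = (bits >> 10) & 0x1f;
	int mant = bits & 0x3ff;
	double val;

	if (exp == 0)
		/* Zero or subnormal: 0.mant × 2^-14.  */
		val = scale2(mant, -24);
	else if (exp == 0x1f && !ahp)
		/* IEEE format: infinity or NaN.  */
		val = mant == 0 ? INFINITY : NAN;
	else
		/* Normalized: 1.mant × 2^(exp - 15).  For exp == 0x1f
		   in the alternative format this is 1.mant × 2^16.  */
		val = scale2(1024 + mant, exp - 15 - 10);

	return sign ? -val : val;
}

/* fixed(int, X): RAW interpreted as a number with FRAC_BITS binary
   digits of fraction.  */
static double
fixed_to_double(int32_t raw, unsigned frac_bits)
{
	return scale2((double)raw, -(int)frac_bits);
}
```

The bit pattern 0x7c00 decodes to infinity in the IEEE format but to
65536 in the alternative one, which is why a lens for this type needs
to know the FPCR.AHP setting.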

** Some more functions in vect might be made to take const*
   Or even marked __attribute__((pure)).

** pretty printer support
   GDB supports Python pretty printers.  We might want to hook this in
   and use it to format certain types.

** support new Linux kernel features
   - PTRACE_SEIZE
   - /proc/PID/map_files/* (but only root seems to be able to read
     this as of now)

* BUGS
** After a clone(), syscalls may be seen as sysrets in s390 (see trace.c:syscall_p())