=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program (by
compiling it with ``-fsanitize=dataflow``) will not alter its behavior.
To use DataFlowSanitizer, the program calls API functions to apply labels
(tags) to data, which causes that data to be tracked, and to check the
labels of specific data items. DataFlowSanitizer manages the propagation
of labels through the program according to its data flow.

The APIs are declared in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

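The propagation model described above can be pictured as a parallel "shadow" array holding one label per byte of application data. The sketch below is a toy model in plain C++ that does not use DataFlowSanitizer at all: the names ``ToyShadowBuffer``, ``set_label``, ``read_label`` and ``copy`` are hypothetical, and labels are assumed to be bit flags whose union is a bitwise OR, a simplification rather than DataFlowSanitizer's real label encoding.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical toy label type: each distinct label is one bit, so the
  // union of two labels is their bitwise OR. Not DFSan's real encoding.
  using ToyLabel = std::uint32_t;

  // A byte buffer paired with a parallel "shadow" array holding one label
  // per application byte, loosely mimicking the idea of shadow memory.
  class ToyShadowBuffer {
   public:
    explicit ToyShadowBuffer(std::size_t size)
        : data_(size, 0), shadow_(size, 0) {}

    // Give every byte in [offset, offset + size) the label `label`,
    // replacing whatever label those bytes had before.
    void set_label(ToyLabel label, std::size_t offset, std::size_t size) {
      assert(offset + size <= shadow_.size());
      for (std::size_t i = offset; i < offset + size; ++i) shadow_[i] = label;
    }

    // The label of a multi-byte value is the union of its bytes' labels.
    ToyLabel read_label(std::size_t offset, std::size_t size) const {
      assert(offset + size <= shadow_.size());
      ToyLabel result = 0;
      for (std::size_t i = offset; i < offset + size; ++i) result |= shadow_[i];
      return result;
    }

    // Copying data also copies its labels, so labels follow the data flow
    // instead of staying attached to one location. Ranges must not overlap.
    void copy(std::size_t dest, std::size_t src, std::size_t size) {
      assert(dest + size <= data_.size() && src + size <= data_.size());
      for (std::size_t i = 0; i < size; ++i) {
        data_[dest + i] = data_[src + i];
        shadow_[dest + i] = shadow_[src + i];
      }
    }

   private:
    std::vector<unsigned char> data_;
    std::vector<ToyLabel> shadow_;
  };

For example, after labelling bytes 0 to 3 with one flag and bytes 4 to 7 with another, ``read_label(2, 4)`` returns the union of both flags, and ``copy`` carries labels along with the bytes it moves.
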
ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values. The ABI list file also controls
how labels are propagated in the former case. DataFlowSanitizer comes with a
default ABI list which is intended eventually to cover the glibc library on
Linux. Users may still need to extend the ABI list in cases where a particular
library or function cannot be instrumented (e.g. because it is implemented in
assembly or in another language which DataFlowSanitizer does not support), or
where a function is called from a library or function which cannot be
instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The instrumentation pass treats every function in the ``uninstrumented``
category in the ABI list file as conforming to the native ABI. Unless the ABI
list also assigns such a function an additional category, a call to it will
produce a warning message, because the labelling behavior of the function is
unknown. The other supported categories are ``discard``, ``functional`` and
``custom``.

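The warning rule above can be sketched as a toy reader for ABI list lines of the form ``fun:NAME=CATEGORY``. This is an illustration only: ``ToyAbiList`` and its methods are hypothetical, it matches exact function names, and it ignores features of the real special case list format such as wildcard patterns.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <set>
  #include <sstream>
  #include <string>

  // Hypothetical toy reader for "fun:NAME=CATEGORY" lines. Exact names
  // only; the real special case list format is richer than this.
  class ToyAbiList {
   public:
    explicit ToyAbiList(const std::string &text) {
      const std::string prefix = "fun:";
      std::istringstream in(text);
      std::string line;
      while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // blank or comment
        auto eq = line.find('=');
        if (line.compare(0, prefix.size(), prefix) != 0 ||
            eq == std::string::npos)
          continue;  // not a "fun:NAME=CATEGORY" entry
        std::string name = line.substr(prefix.size(), eq - prefix.size());
        categories_[name].insert(line.substr(eq + 1));
      }
    }

    // Functions in the "uninstrumented" category use the native ABI.
    bool uses_native_abi(const std::string &fn) const {
      auto it = categories_.find(fn);
      return it != categories_.end() && it->second.count("uninstrumented") != 0;
    }

    // "uninstrumented" with no other category: labelling behavior is
    // unknown, so a call to this function would produce a warning.
    bool would_warn(const std::string &fn) const {
      auto it = categories_.find(fn);
      return it != categories_.end() &&
             it->second.count("uninstrumented") != 0 && it->second.size() == 1;
    }

   private:
    std::map<std::string, std::set<std::string>> categories_;
  };

With the entries ``fun:main=uninstrumented`` and ``fun:main=discard``, ``would_warn("main")`` is false, whereas a function listed only as ``uninstrumented`` would make it return true.
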
* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory). Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function. This function may wrap
  the original function or provide its own implementation. This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior. The signature
  of ``__dfsw_F`` is that of ``F`` with one argument of type ``dfsan_label``
  per original argument, holding that argument's label, appended to the
  argument list. If ``F`` has a non-void return type, a final argument of type
  ``dfsan_label *`` is appended, through which the custom function can store
  the label of the return value. For example:

  .. code-block:: c++

    void f(int x);
    void __dfsw_f(int x, dfsan_label x_label);

    void *memcpy(void *dest, const void *src, size_t n);
    void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                        dfsan_label dest_label, dfsan_label src_label,
                        dfsan_label n_label, dfsan_label *ret_label);

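As a sketch of what a ``custom`` wrapper does, the toy below follows the ``__dfsw_F`` signature convention for a hypothetical uninstrumented function ``toy_memcpy``. The toy shadow memory (``toy_shadow``, ``toy_get_byte_label``, ``toy_set_byte_label``) and the label type ``toy_label`` are assumptions standing in for DataFlowSanitizer's runtime, so the code compiles without the sanitizer.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <unordered_map>

  // Hypothetical stand-ins for the runtime: a toy, not the real DFSan API.
  using toy_label = std::uint16_t;
  // Toy shadow memory: the label of each labelled byte, keyed by address.
  std::unordered_map<const unsigned char *, toy_label> toy_shadow;

  toy_label toy_get_byte_label(const void *p) {
    auto it = toy_shadow.find(static_cast<const unsigned char *>(p));
    return it == toy_shadow.end() ? 0 : it->second;
  }

  void toy_set_byte_label(void *p, toy_label l) {
    toy_shadow[static_cast<const unsigned char *>(p)] = l;
  }

  // Stands in for an uninstrumented library function: it copies bytes
  // but knows nothing about labels.
  void *toy_memcpy(void *dest, const void *src, std::size_t n) {
    return std::memcpy(dest, src, n);
  }

  // Custom wrapper following the convention above: the original
  // parameters, one label per parameter, then ret_label for the result.
  void *__dfsw_toy_memcpy(void *dest, const void *src, std::size_t n,
                          toy_label dest_label, toy_label src_label,
                          toy_label n_label, toy_label *ret_label) {
    (void)src_label;  // labels of the pointer and size values themselves
    (void)n_label;    // are not needed to copy the bytes' labels
    auto *d = static_cast<unsigned char *>(dest);
    auto *s = static_cast<const unsigned char *>(src);
    // Each destination byte takes the label of its source byte.
    for (std::size_t i = 0; i < n; ++i)
      toy_set_byte_label(d + i, toy_get_byte_label(s + i));
    // memcpy returns dest, so the return value carries dest's label.
    *ret_label = dest_label;
    return toy_memcpy(dest, src, n);
  }

Because ``memcpy`` returns ``dest``, the sketch gives the return value the label of the ``dest`` argument; the labels of ``src`` and ``n`` describe the pointer and size values rather than the copied bytes, so this toy does not use them.
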
If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI. Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

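The ``discard`` and ``functional`` entries above differ in the label given to a function's return value. The toy below sketches that rule; ``ToyCategory``, ``toy_return_label``, ``ToyValue`` and ``toy_call_tolower`` are hypothetical names, and labels are again assumed to be bit flags combined with bitwise OR. It does not model the shadow memory side of these categories.

.. code-block:: c++

  #include <cassert>
  #include <cctype>
  #include <cstdint>
  #include <initializer_list>

  // Toy bit-flag labels; union is bitwise OR (a simplification).
  using ToyLabel = std::uint32_t;

  // Hypothetical model of the two categories' return-value rule.
  enum class ToyCategory { Discard, Functional };

  ToyLabel toy_return_label(ToyCategory cat,
                            std::initializer_list<ToyLabel> arg_labels) {
    if (cat == ToyCategory::Discard) return 0;  // return value unlabelled
    ToyLabel u = 0;
    for (ToyLabel l : arg_labels) u |= l;  // union of argument labels
    return u;
  }

  // A value paired with its label, to show tolower (listed as
  // "functional" above) passing its argument's label to its result.
  struct ToyValue {
    int value;
    ToyLabel label;
  };

  ToyValue toy_call_tolower(ToyValue c) {
    return {std::tolower(c.value),
            toy_return_label(ToyCategory::Functional, {c.label})};
  }

Under this rule the result of ``tolower`` keeps the label of its argument, while a ``discard`` function such as ``malloc`` returns an unlabelled value whatever the labels of its arguments.
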
Example
=======

The following program demonstrates label propagation by checking that the
label of each sum contains the labels of the values it was computed from,
and no others.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

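The assertions above succeed even though ``i + j`` carries a single label, so a union label needs some way to remember the labels it was formed from. The toy below sketches one way to do that, under stated assumptions: ``ToyLabelTable`` is hypothetical, each union entry records its two operand labels, and ``has_label`` searches those operands recursively. See the design document for how DataFlowSanitizer actually represents labels.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <utility>
  #include <vector>

  using ToyLabel = std::uint16_t;  // 0 means "no label" in this toy

  // Hypothetical table entry: a base label has a description; a union
  // label records the two labels it combines.
  struct ToyLabelInfo {
    ToyLabel l1 = 0, l2 = 0;  // union operands (0 for base labels)
    std::string desc;
  };

  class ToyLabelTable {
   public:
    ToyLabelTable() : info_(1) {}  // slot 0 reserved for "no label"

    ToyLabel create_label(std::string desc) {
      ToyLabelInfo info;
      info.desc = std::move(desc);
      info_.push_back(std::move(info));
      return static_cast<ToyLabel>(info_.size() - 1);
    }

    // Union of two labels; trivial cases reuse an existing label.
    ToyLabel union_labels(ToyLabel a, ToyLabel b) {
      if (a == 0 || a == b) return b;
      if (b == 0) return a;
      ToyLabelInfo info;
      info.l1 = a;
      info.l2 = b;
      info_.push_back(std::move(info));
      return static_cast<ToyLabel>(info_.size() - 1);
    }

    // True if `elem` is `label` itself or reachable through union operands.
    bool has_label(ToyLabel label, ToyLabel elem) const {
      assert(label < info_.size());
      if (label == elem) return true;
      const ToyLabelInfo &info = info_[label];
      if (info.l1 == 0) return false;  // base label or "no label"
      return has_label(info.l1, elem) || has_label(info.l2, elem);
    }

   private:
    std::vector<ToyLabelInfo> info_;
  };

Here a union of the labels for ``i`` and ``j`` plays the role of ``ij_label`` in the program above. This toy never deduplicates union entries and does not bound the table size; both are simplifications.
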
Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.