Reid Spencer | 5f016e2 | 2007-07-11 17:01:13 +0000 | 1 | //===---------------------------------------------------------------------===// |
| 2 | // Random Notes |
| 3 | //===---------------------------------------------------------------------===// |
| 4 | |
| 5 | C90/C99/C++ Comparisons: |
| 6 | http://david.tribble.com/text/cdiffs.htm |
| 7 | |
| 8 | //===---------------------------------------------------------------------===// |
| 9 | Extensions: |
| 10 | |
| 11 | * "#define_target X Y" |
| 12 | This preprocessor directive works exactly the same way as #define, but it |
| 13 | notes that 'X' is a target-specific preprocessor macro. When used, a |
| 14 | diagnostic is emitted indicating that the translation unit is non-portable. |
| 15 | |
| 16 | If a target-define is #undef'd before use, no diagnostic is emitted. If 'X' |
| 17 | were previously a normal #define macro, the macro is tainted. If 'X' is |
| 18 | subsequently #defined as a non-target-specific define, the taint bit is |
| 19 | cleared. |
| 20 | |
| 21 | * "#define_other_target X" |
| 22 | The preprocessor directive takes a single identifier argument. It notes |
| 23 | that this identifier is a target-specific #define for some target other than |
| 24 | the current one. Use of this identifier will result in a diagnostic. |
| 25 | |
| 26 | If 'X' is later #undef'd or #define'd, the taint bit is cleared. If 'X' is |
| 27 | already defined, X is marked as a target-specific define. |
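The taint rules for the two directives above can be modeled with a small table. This is only a sketch of the bookkeeping, not how clang implements it: MacroTable and every member name are hypothetical, and "tainted" here simply means that a use of the identifier should produce the non-portability diagnostic.

```cpp
#include <map>
#include <string>

// Hypothetical model of the taint bookkeeping described above.
// MacroTable and its members are illustrative names, not clang APIs.
class MacroTable {
public:
  // Plain "#define X Y": (re)defines X and clears any taint.
  void define(const std::string &name, const std::string &body) {
    set(name, body, /*defined=*/true, /*tainted=*/false);
  }

  // "#define_target X Y": defines X and taints it, even if X was
  // previously a normal macro.
  void defineTarget(const std::string &name, const std::string &body) {
    set(name, body, /*defined=*/true, /*tainted=*/true);
  }

  // "#define_other_target X": taints X; an existing definition is kept.
  void defineOtherTarget(const std::string &name) {
    macros_[name].tainted = true;
  }

  // "#undef X": forgets X entirely, including its taint bit.
  void undef(const std::string &name) { macros_.erase(name); }

  bool isDefined(const std::string &name) const {
    auto it = macros_.find(name);
    return it != macros_.end() && it->second.defined;
  }

  // True if a use of `name` should trigger the non-portability diagnostic.
  bool useIsNonPortable(const std::string &name) const {
    auto it = macros_.find(name);
    return it != macros_.end() && it->second.tainted;
  }

private:
  struct Macro {
    bool defined = false;
    bool tainted = false;
    std::string body;
  };

  void set(const std::string &name, const std::string &body, bool defined,
           bool tainted) {
    Macro &m = macros_[name];
    m.defined = defined;
    m.tainted = tainted;
    m.body = body;
  }

  std::map<std::string, Macro> macros_;
};
```

Because #undef erases the entry, a target-define that is #undef'd before use never reaches useIsNonPortable with its taint bit set, which matches the "no diagnostic is emitted" rule.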
| 28 | |
| 29 | //===---------------------------------------------------------------------===// |
| 30 | |
| 31 | To time GCC preprocessing speed without output, use: |
| 32 | "time gcc -MM file" |
| 33 | This is similar to -Eonly. |
| 34 | |
| 35 | |
| 36 | //===---------------------------------------------------------------------===// |
| 37 | |
| 38 | C++ Template Instantiation benchmark: |
| 39 | http://users.rcn.com/abrahams/instantiation_speed/index.html |
| 40 | |
| 41 | //===---------------------------------------------------------------------===// |
| 42 | |
| 43 | TODO: File Manager Speedup: |
| 44 | |
| 45 | We currently do a lot of stat'ing for files that don't exist, particularly |
| 46 | when lots of -I paths exist (e.g. see the <iostream> example, check for |
| 47 | failures in stat in FileManager::getFile). It would be far better to make |
| 48 | the following changes: |
| 49 | 1. FileEntry contains a sys::Path instead of a std::string for Name. |
| 50 | 2. sys::Path contains timestamp and size, lazily computed. Eliminate from |
| 51 | FileEntry. |
| 52 | 3. File UIDs are created on request, not when files are opened. |
| 53 | These changes make it possible to efficiently have FileEntry objects for |
| 54 | files that exist on the file system, but have not been used yet. |
| 55 | |
| 56 | Once this is done: |
| 57 | 1. DirectoryEntry gets a boolean value "has read entries". When false, not |
| 58 | all entries in the directory are in the file mgr; when true, they are. |
| 59 | 2. Instead of stat'ing the file in FileManager::getFile, check to see if |
| 60 | the dir has been read. If so, fail immediately; if not, read the dir, |
| 61 | then retry. |
| 62 | 3. Reading the dir uses the getdirentries syscall, creating a FileEntry |
| 63 | for all files found. |
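The "has read entries" idea can be sketched as follows. This is not FileManager code: DirCache, DirReader and Listing are hypothetical names, and the directory reader is injected as a callback so the sketch stays independent of getdirentries or any particular OS interface.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback that lists the entry names of one directory. A real
// implementation would wrap getdirentries/readdir; here it is an assumption.
using DirReader = std::function<std::vector<std::string>(const std::string &)>;

class DirCache {
public:
  explicit DirCache(DirReader reader) : reader_(std::move(reader)) {}

  // Returns true if `name` exists in `dir`. The directory is read at most
  // once; later lookups, including failed ones, never touch the file system.
  bool contains(const std::string &dir, const std::string &name) {
    Listing &listing = dirs_[dir];
    if (!listing.hasReadEntries) {
      for (const std::string &entry : reader_(dir))
        listing.names.insert(entry);
      listing.hasReadEntries = true;
    }
    return listing.names.count(name) != 0;
  }

private:
  struct Listing {
    bool hasReadEntries = false;   // the proposed DirectoryEntry flag
    std::set<std::string> names;   // every entry found in the directory
  };

  DirReader reader_;
  std::map<std::string, Listing> dirs_;
};
```

With many -I paths, each directory costs one read up front, and every subsequent miss in that directory is a set lookup instead of a failed stat.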
| 64 | |
| 65 | //===---------------------------------------------------------------------===// |
| 66 | |
| 67 | TODO: Fast #Import: |
| 68 | |
| 69 | * Get frameworks that don't use #import to do so, e.g. |
| 70 | DirectoryService, AudioToolbox, CoreFoundation, etc. Why aren't they using #import? |
| 71 | Because they work in C mode? C has #import. |
| 72 | * Have the lexer return a token for #import instead of handling it itself. |
| 73 | - Create a new preprocessor object with no external state (no -D/U options |
| 74 | from the command line, etc). Alternatively, keep track of exactly which |
| 75 | external state is used by a #import: declare it somehow. |
| 76 | * When reading a #import'd file, keep track of whether we have seen any |
| 77 | "configuration" macros (and which ones). Various cases: |
| 78 | - Uses of target args (__POWERPC__, __i386): Header has to be parsed |
| 79 | multiple times, per-target. What about #ifndef checks? How do we know? |
| 80 | - "Configuration" preprocessor macros not defined: POWERPC, etc. What about |
| 81 | things like __STDC__ etc? What is and what isn't allowed. |
| 82 | * Special handling for "umbrella" headers, which just contain #import stmts: |
| 83 | - Cocoa.h/AppKit.h - Contain pointers to digests instead of entire digests |
| 84 | themselves? Foundation.h isn't pure umbrella! |
| 85 | * Frameworks digests: |
| 86 | - Can put "digest" of a framework-worth of headers into the framework |
| 87 | itself. To open AppKit, just mmap |
| 88 | /System/Library/Frameworks/AppKit.framework/"digest", which provides a |
| 89 | symbol table in a well defined format. Lazily unstream stuff that is |
| 90 | needed. Contains declarations, macros, and debug information. |
| 91 | - System frameworks ship with digests. How do we handle configuration |
| 92 | information? How do we handle stuff like: |
| 93 | #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_2 |
| 94 | which guards a bunch of decls? Should there be a couple of default |
| 95 | configs, then have the UI fall back to building/caching its own? |
| 96 | - GUI automatically builds digests when UI is idle, both of system |
| 97 | frameworks if they aren't available in the right config, and of app |
| 98 | frameworks. |
| 99 | - GUI builds dependence graph of frameworks/digests based on #imports. If a |
| 100 | digest is out of date, dependent digests are automatically invalidated. |
| 101 | |
| 102 | * New constraints on #import for objc-v3: |
| 103 | - #imported file must not define non-inline function bodies. |
| 104 | - Alternatively, they can, and these bodies get compiled/linked *once* |
| 105 | per app into a dylib. What about building user dylibs? |
| 106 | - Restrictions on ObjC grammar: can't #import the body of a for stmt or fn. |
| 107 | - Compiler must detect and reject these cases. |
| 108 | - #defines defined within a #import have two behaviors: |
| 109 | - By default, they escape the header. These macros *cannot* be #undef'd |
| 110 | by other code: this is enforced by the front-end. |
| 111 | - Optionally, user can specify what macros escape (whitelist) or can use |
| 112 | #undef. |
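Returning to the digest dependence graph from the list above, the invalidation rule (a stale digest invalidates every digest that depends on it) is a reachability walk over reverse edges. A minimal sketch, assuming digests are identified by name; DigestGraph and its members are hypothetical, not an existing API.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependence graph between digests, keyed by digest name.
class DigestGraph {
public:
  // Records that `digest` was built from headers that #import `dependency`.
  void addImport(const std::string &digest, const std::string &dependency) {
    dependents_[dependency].push_back(digest);
  }

  // Marks `digest` stale, together with everything that transitively
  // depends on it.
  void invalidate(const std::string &digest) {
    std::vector<std::string> worklist{digest};
    while (!worklist.empty()) {
      std::string current = worklist.back();
      worklist.pop_back();
      if (!stale_.insert(current).second)
        continue;  // already invalidated; also guards against cycles
      auto it = dependents_.find(current);
      if (it == dependents_.end())
        continue;
      for (const std::string &dependent : it->second)
        worklist.push_back(dependent);
    }
  }

  bool isStale(const std::string &digest) const {
    return stale_.count(digest) != 0;
  }

private:
  std::map<std::string, std::vector<std::string>> dependents_;
  std::set<std::string> stale_;
};
```

Invalidating Foundation in a graph where AppKit imports Foundation and Cocoa imports AppKit marks all three stale, while anything Foundation itself imports is left alone.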
| 113 | |
| 114 | //===---------------------------------------------------------------------===// |
| 115 | |
| 116 | TODO: New language feature: Configuration queries: |
| 117 | - Instead of #ifdef __POWERPC__, use "if (strcmp(`cpu`, __POWERPC__))", or |
| 118 | some other, better, syntax. |
| 119 | - Use it to increase the number of "architecture-clean" #import'd files, |
| 120 | allowing a single index to be used for all fat slices. |
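The proposed syntax does not exist, but the intent can be approximated in today's C++. In this sketch (all names are hypothetical) a single function is the only place that consults target macros, and client code asks it with an ordinary `if`, so both branches are parsed and type-checked on every target.

```cpp
// Hypothetical approximation of a "configuration query"; Cpu, targetCpu and
// cpuName are illustrative names and not part of any proposal.
enum class Cpu { PowerPC, X86, Other };

// The only function that looks at target-specific macros.
constexpr Cpu targetCpu() {
#if defined(__POWERPC__) || defined(__ppc__) || defined(__ppc64__)
  return Cpu::PowerPC;
#elif defined(__i386__) || defined(__x86_64__)
  return Cpu::X86;
#else
  return Cpu::Other;
#endif
}

// Client code contains no #ifdef: the same tokens are seen on every target,
// which is what would make such a header "architecture-clean".
inline const char *cpuName() {
  if (targetCpu() == Cpu::PowerPC)
    return "powerpc";
  if (targetCpu() == Cpu::X86)
    return "x86";
  return "other";
}
```

A real language feature would go further by removing the remaining #if from targetCpu as well, so that one index could serve all fat slices.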
| 121 | |
| 122 | //===---------------------------------------------------------------------===// |
| 123 | |
| 124 | The 'portability' model in clang is sufficient to catch translation units (or |
| 125 | their parts) that are not portable, but it doesn't help if the system headers |
| 126 | are non-portable and not fixed. An alternative model that would be easy to use |
| 127 | is a 'tainting' scheme. Consider: |
| 128 | |
| 129 | int32_t |
| 130 | OSHostByteOrder(void) { |
| 131 | #if defined(__LITTLE_ENDIAN__) |
| 132 | return OSLittleEndian; |
| 133 | #elif defined(__BIG_ENDIAN__) |
| 134 | return OSBigEndian; |
| 135 | #else |
| 136 | return OSUnknownByteOrder; |
| 137 | #endif |
| 138 | } |
| 139 | |
| 140 | It would be trivial to mark 'OSHostByteOrder' as being non-portable (tainted) |
| 141 | instead of marking the entire translation unit. Then, if OSHostByteOrder is |
| 142 | never called/used by the current translation unit, the t-u wouldn't be marked |
| 143 | non-portable. However, there is no good way to handle stuff like: |
| 144 | |
| 145 | extern int X, Y; |
| 146 | |
| 147 | #ifndef __POWERPC__ |
| 148 | #define X Y |
| 149 | #endif |
| 150 | |
| 151 | int bar() { return X; } |
| 152 | |
| 153 | When compiling for powerpc, the #define is skipped, so the compiler doesn't know that bar |
| 154 | uses a #define that is set on some other target. In practice, limited cases |
| 155 | could be handled by scanning the skipped region of a #if, but the fully general |
| 156 | case cannot be implemented efficiently. In this case, for example, the #define |
| 157 | in the protected region could be turned into either a #define_target or |
| 158 | #define_other_target as appropriate. The harder case is code like this (from |
| 159 | OSByteOrder.h): |
| 160 | |
| 161 | #if (defined(__ppc__) || defined(__ppc64__)) |
| 162 | #include <libkern/ppc/OSByteOrder.h> |
| 163 | #elif (defined(__i386__) || defined(__x86_64__)) |
| 164 | #include <libkern/i386/OSByteOrder.h> |
| 165 | #else |
| 166 | #include <libkern/machine/OSByteOrder.h> |
| 167 | #endif |
| 168 | |
| 169 | The realistic way to fix this is by having an initial #ifdef __llvm__ that |
| 170 | defines its contents in terms of the llvm bswap intrinsics. Other things should |
| 171 | be handled on a case-by-case basis. |
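A sketch of what such a target-independent definition could look like. The function names swapInt32 and swapInt64 are hypothetical stand-ins, not the real OSByteOrder.h API, and the guard also accepts __GNUC__ as an assumption, since GCC provides the same __builtin_bswap32/__builtin_bswap64 builtins.

```cpp
#include <cstdint>

#if defined(__llvm__) || defined(__GNUC__)
// One definition for every architecture: the compiler lowers the builtin to
// the best byte-swap sequence for the target.
static inline std::uint32_t swapInt32(std::uint32_t v) {
  return __builtin_bswap32(v);
}
static inline std::uint64_t swapInt64(std::uint64_t v) {
  return __builtin_bswap64(v);
}
#else
// Portable fallback for compilers without the builtins.
static inline std::uint32_t swapInt32(std::uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}
static inline std::uint64_t swapInt64(std::uint64_t v) {
  // Swap each 32-bit half, then exchange the halves.
  return (static_cast<std::uint64_t>(swapInt32(static_cast<std::uint32_t>(v)))
          << 32) |
         swapInt32(static_cast<std::uint32_t>(v >> 32));
}
#endif
```

Neither branch mentions __ppc__ or __i386__, so a header written this way would not need per-architecture includes for the swap routines.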
| 172 | |
| 173 | |
| 174 | We probably have to do something smarter like this in the future. The C++ header |
| 175 | <limits> contains a lot of code like this: |
| 176 | |
| 177 | static const int digits10 = __LDBL_DIG__; |
| 178 | static const int min_exponent = __LDBL_MIN_EXP__; |
| 179 | static const int min_exponent10 = __LDBL_MIN_10_EXP__; |
| 180 | static const float_denorm_style has_denorm |
| 181 | = bool(__LDBL_DENORM_MIN__) ? denorm_present : denorm_absent; |
| 182 | |
| 183 | ... since this isn't being used in an #ifdef, it should be easy enough to taint |
| 184 | the decl for these static members. |
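The decl-level tainting discussed in this section can be sketched as two sets: decls whose definitions touched target-specific macros, and tainted decls that the translation unit actually uses. TaintTracker and its members are hypothetical names for illustration, not clang classes.

```cpp
#include <set>
#include <string>

// Hypothetical per-translation-unit tracker for decl-level tainting.
class TaintTracker {
public:
  // Registers a decl; `usedTargetMacro` says whether its definition expanded
  // or tested a target-specific macro (e.g. __LITTLE_ENDIAN__, __LDBL_DIG__).
  void addDecl(const std::string &name, bool usedTargetMacro) {
    if (usedTargetMacro)
      tainted_.insert(name);
  }

  // Records a call to or reference of `name` from the translation unit.
  void noteUse(const std::string &name) {
    if (tainted_.count(name) != 0)
      nonPortableUses_.insert(name);
  }

  // The t-u stays portable until some tainted decl is actually used.
  bool translationUnitIsPortable() const { return nonPortableUses_.empty(); }

  const std::set<std::string> &nonPortableUses() const {
    return nonPortableUses_;
  }

private:
  std::set<std::string> tainted_;
  std::set<std::string> nonPortableUses_;
};
```

Merely including a header that declares OSHostByteOrder leaves the translation unit portable in this model; only a recorded use of the tainted decl flips the result.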
| 185 | |
| 186 | |
| 187 | /usr/include/sys/cdefs.h contains stuff like this: |
| 188 | |
| 189 | #if defined(__ppc__) |
| 190 | # if defined(__LDBL_MANT_DIG__) && defined(__DBL_MANT_DIG__) && \ |
| 191 | __LDBL_MANT_DIG__ > __DBL_MANT_DIG__ |
| 192 | # if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__-0 < 1040 |
| 193 | # define __DARWIN_LDBL_COMPAT(x) __asm("_" __STRING(x) "$LDBLStub") |
| 194 | # else |
| 195 | # define __DARWIN_LDBL_COMPAT(x) __asm("_" __STRING(x) "$LDBL128") |
| 196 | # endif |
| 197 | # define __DARWIN_LDBL_COMPAT2(x) __asm("_" __STRING(x) "$LDBL128") |
| 198 | # define __DARWIN_LONG_DOUBLE_IS_DOUBLE 0 |
| 199 | # else |
| 200 | # define __DARWIN_LDBL_COMPAT(x) /* nothing */ |
| 201 | # define __DARWIN_LDBL_COMPAT2(x) /* nothing */ |
| 202 | # define __DARWIN_LONG_DOUBLE_IS_DOUBLE 1 |
| 203 | # endif |
| 204 | #elif defined(__i386__) || defined(__ppc64__) || defined(__x86_64__) |
| 205 | # define __DARWIN_LDBL_COMPAT(x) /* nothing */ |
| 206 | # define __DARWIN_LDBL_COMPAT2(x) /* nothing */ |
| 207 | # define __DARWIN_LONG_DOUBLE_IS_DOUBLE 0 |
| 208 | #else |
| 209 | # error Unknown architecture |
| 210 | #endif |
| 211 | |
| 212 | An ideal way to solve this issue is to mark __DARWIN_LDBL_COMPAT / |
| 213 | __DARWIN_LDBL_COMPAT2 / __DARWIN_LONG_DOUBLE_IS_DOUBLE as being non-portable |
| 214 | because they depend on non-portable macros. In practice though, this may end |
| 215 | up being a serious problem: every use of printf will mark the translation unit |
| 216 | non-portable if targeting ppc32 and something else. |
| 217 | |
| 218 | //===---------------------------------------------------------------------===// |
Chris Lattner | 2686234 | 2007-07-11 17:31:59 +0000 | 219 | |