| //===---------------------------------------------------------------------===// |
| // Random Notes |
| //===---------------------------------------------------------------------===// |
| |
| C90/C99/C++ Comparisons: |
| http://david.tribble.com/text/cdiffs.htm |
| |
| //===---------------------------------------------------------------------===// |
| Extensions: |
| |
| * "#define_target X Y" |
| This preprocessor directive works exactly the same way as #define, but it |
| notes that 'X' is a target-specific preprocessor directive. When used, a |
| diagnostic is emitted indicating that the translation unit is non-portable. |
| |
| If a target-define is #undef'd before use, no diagnostic is emitted. If 'X' |
| were previously a normal #define macro, the macro is tainted. If 'X' is |
| subsequently #defined as a non-target-specific define, the taint bit is |
| cleared. |
| |
| * "#define_other_target X" |
| The preprocessor directive takes a single identifier argument. It notes |
| that this identifier is a target-specific #define for some target other than |
| the current one. Use of this identifier will result in a diagnostic. |
| |
| If 'X' is later #undef'd or #define'd, the taint bit is cleared. If 'X' is |
| already defined, X is marked as a target-specific define. |
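
The taint rules for the two directives above can be modeled as a small table keyed by identifier. This is a minimal sketch of one reading of those rules, not the preprocessor's implementation: `MacroTable`, `MacroInfo`, and every member name are hypothetical, and replacement lists are omitted because only the taint bit matters here.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-identifier state: only the bits the rules above mention.
struct MacroInfo {
  bool defined = false;  // identifier currently has a macro definition
  bool tainted = false;  // identifier is target-specific (this or another target)
};

class MacroTable {
  std::unordered_map<std::string, MacroInfo> macros_;

public:
  // Plain #define: clears any taint left by an earlier target-specific define.
  void define(const std::string &name) {
    MacroInfo &m = macros_[name];
    m.defined = true;
    m.tainted = false;
  }

  // #define_target X Y: defines X and taints it, even if X was a normal macro.
  void defineTarget(const std::string &name) {
    MacroInfo &m = macros_[name];
    m.defined = true;
    m.tainted = true;
  }

  // #define_other_target X: taints X without giving it a definition here.
  void defineOtherTarget(const std::string &name) {
    macros_[name].tainted = true;
  }

  // #undef: drops the definition and the taint bit.
  void undef(const std::string &name) { macros_.erase(name); }

  // True if a use of the identifier should produce the non-portability
  // diagnostic.
  bool useIsNonPortable(const std::string &name) const {
    auto it = macros_.find(name);
    return it != macros_.end() && it->second.tainted;
  }
};
```

In this model a diagnostic is tied to use, so a target-define that is #undef'd before any use never reports anything.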
| |
| //===---------------------------------------------------------------------===// |
| |
| When we go to reimplement <tgmath.h>, we should do it more intelligently than |
| the GCC-supplied header. EDG has an interesting __generic builtin that provides |
| overloading for C: |
| http://www.edg.com/docs/edg_cpp.pdf |
| |
| For example, they have: |
| #define sin(x) __generic(x,,, sin, sinf, sinl, csin, csinf,csinl)(x) |
| |
| It's unclear to me why you couldn't just have a builtin like: |
| __builtin_overload(1, arg1, impl1, impl2, impl3) |
| __builtin_overload(2, arg1, arg2, impl1, impl2, impl3) |
| __builtin_overload(3, arg1, arg2, arg3, impl1, impl2, impl3) |
| |
| Where the compiler would just pick the right "impl" based on the arguments |
| provided. One nasty detail is that some arithmetic promotions must be done for |
| use by the tgmath.h stuff, but it would be nice to be able to handle vectors |
| etc as well without huge globs of macros. With the above scheme, you could |
| use: |
| |
| #define sin(x) __builtin_overload(1, x, sin, sinf, sinl, csin, csinf,csinl)(x) |
| |
| and not need to keep track of which argument to "__generic" corresponds to which |
| type, etc. |
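
The "pick the right impl based on the argument" rule can be approximated with C++17 templates. This is only a sketch of the selection idea, not the proposed builtin: `select_overload`, `impl_d`, `impl_f`, `impl_l`, and `PICK` are hypothetical names, matching is on the exact parameter type, and the arithmetic promotions mentioned above are not modeled.

```cpp
#include <type_traits>

// Hypothetical stand-in for __builtin_overload(1, arg, impls...): returns the
// first function pointer whose single parameter type is exactly Arg.
template <typename Arg, typename R, typename P, typename... Rest>
constexpr auto select_overload(R (*first)(P), Rest... rest) {
  if constexpr (std::is_same_v<Arg, P>) {
    return first;
  } else {
    static_assert(sizeof...(Rest) > 0,
                  "no implementation matches the argument type");
    return select_overload<Arg>(rest...);
  }
}

// Placeholder implementations; each returns a tag naming its parameter type.
inline char impl_d(double) { return 'd'; }
inline char impl_f(float) { return 'f'; }
inline char impl_l(long double) { return 'l'; }

// The macro only names the argument and the candidate list; it does not need
// to know which position corresponds to which type.
#define PICK(x) \
  select_overload<std::decay_t<decltype(x)>>(impl_d, impl_f, impl_l)(x)
```

With these placeholders, `PICK(1.0f)` calls `impl_f` and `PICK(1.0L)` calls `impl_l`, regardless of the order in which the implementations are listed.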
| |
| //===---------------------------------------------------------------------===// |
| |
| To time GCC preprocessing speed without output, use: |
| "time gcc -MM file" |
| This is similar to -Eonly. |
| |
| |
| //===---------------------------------------------------------------------===// |
| |
| C++ Template Instantiation benchmark: |
| http://users.rcn.com/abrahams/instantiation_speed/index.html |
| |
| //===---------------------------------------------------------------------===// |
| |
| TODO: File Manager Speedup: |
| |
| We currently do a lot of stat'ing for files that don't exist, particularly |
| when lots of -I paths exist (e.g. see the <iostream> example, check for |
| failures in stat in FileManager::getFile). It would be far better to make |
| the following changes: |
| 1. FileEntry contains a sys::Path instead of a std::string for Name. |
| 2. sys::Path contains timestamp and size, lazily computed. Eliminate from |
| FileEntry. |
| 3. File UIDs are created on request, not when files are opened. |
| These changes make it possible to efficiently have FileEntry objects for |
| files that exist on the file system, but have not been used yet. |
| |
| Once this is done: |
| 1. DirectoryEntry gets a boolean value "has read entries". When false, not |
| all entries in the directory are in the file mgr, when true, they are. |
| 2. Instead of stat'ing the file in FileManager::getFile, check to see if |
| the dir has been read. If so, fail immediately, if not, read the dir, |
| then retry. |
| 3. Reading the dir uses the getdirentries syscall, creating a FileEntry |
| for all files found. |
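
The "has read entries" scheme can be sketched as follows. This is an illustration under stated assumptions, not FileManager code: `DirCache` and `FileLookup` are hypothetical names, POSIX opendir/readdir stands in for the getdirentries syscall, and every directory entry (including "." and "..") is recorded by name only.

```cpp
#include <dirent.h>

#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for DirectoryEntry with the "has read entries" flag.
struct DirCache {
  bool hasReadEntries = false;
  std::unordered_set<std::string> names;  // entry names found in the directory
};

class FileLookup {
  std::unordered_map<std::string, DirCache> dirs_;

public:
  unsigned dirReads = 0;  // number of directory scans, kept for illustration

  // Answers "does this file exist?" without stat'ing the file itself.
  bool exists(const std::string &path) {
    std::string::size_type slash = path.rfind('/');
    std::string dirName =
        slash == std::string::npos ? std::string(".") : path.substr(0, slash);
    std::string fileName =
        slash == std::string::npos ? path : path.substr(slash + 1);
    if (dirName.empty())
      dirName = "/";

    DirCache &dir = dirs_[dirName];
    if (!dir.hasReadEntries) {
      // First query for this directory: read all entries once.
      if (DIR *d = opendir(dirName.c_str())) {
        while (dirent *e = readdir(d))
          dir.names.insert(e->d_name);
        closedir(d);
      }
      dir.hasReadEntries = true;
      ++dirReads;
    }
    // Directory already read: a miss fails immediately, with no syscall.
    return dir.names.count(fileName) != 0;
  }
};
```

With many -I paths, each search directory is scanned at most once, and every later miss in that directory is answered from the cached name set.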
| |
| //===---------------------------------------------------------------------===// |
| |
| TODO: Fast #Import: |
| |
| * Get frameworks that don't use #import to do so, e.g. |
| DirectoryService, AudioToolbox, CoreFoundation, etc. Why aren't they using |
| #import? Because they work in C mode? C has #import. |
| * Have the lexer return a token for #import instead of handling it itself. |
| - Create a new preprocessor object with no external state (no -D/U options |
| from the command line, etc). Alternatively, keep track of exactly which |
| external state is used by a #import: declare it somehow. |
| * When reading a #import file, keep track of whether we have seen any |
| "configuration" macros (and/or which ones). Various cases: |
| - Uses of target args (__POWERPC__, __i386): Header has to be parsed |
| multiple times, per-target. What about #ifndef checks? How do we know? |
| - "Configuration" preprocessor macros not defined: POWERPC, etc. What about |
| things like __STDC__ etc? What is and what isn't allowed. |
| * Special handling for "umbrella" headers, which just contain #import stmts: |
| - Cocoa.h/AppKit.h - Contain pointers to digests instead of entire digests |
| themselves? Foundation.h isn't pure umbrella! |
| * Frameworks digests: |
| - Can put "digest" of a framework-worth of headers into the framework |
| itself. To open AppKit, just mmap |
| /System/Library/Frameworks/AppKit.framework/"digest", which provides a |
| symbol table in a well defined format. Lazily unstream stuff that is |
| needed. Contains declarations, macros, and debug information. |
| - System frameworks ship with digests. How do we handle configuration |
| information? How do we handle stuff like: |
| #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_2 |
| which guards a bunch of decls? Should there be a couple of default |
| configs, then have the UI fall back to building/caching its own? |
| - GUI automatically builds digests when UI is idle, both of system |
| frameworks if they aren't available in the right config, and of app |
| frameworks. |
| - GUI builds dependence graph of frameworks/digests based on #imports. If a |
| digest is out of date, dependent digests are automatically invalidated. |
| |
| * New constraints on #import for objc-v3: |
| - #imported file must not define non-inline function bodies. |
| - Alternatively, they can, and these bodies get compiled/linked *once* |
| per app into a dylib. What about building user dylibs? |
| - Restrictions on ObjC grammar: can't #import the body of a for stmt or fn. |
| - Compiler must detect and reject these cases. |
| - #defines defined within a #import have two behaviors: |
| - By default, they escape the header. These macros *cannot* be #undef'd |
| by other code: this is enforced by the front-end. |
| - Optionally, user can specify what macros escape (whitelist) or can use |
| #undef. |
| |
| //===---------------------------------------------------------------------===// |
| |
| TODO: New language feature: Configuration queries: |
| - Instead of #ifdef __POWERPC__, use "if (strcmp(`cpu`, __POWERPC__))", or |
| some other, better, syntax. |
| - Use it to increase the number of "architecture-clean" #import'd files, |
| allowing a single index to be used for all fat slices. |
| |
| //===---------------------------------------------------------------------===// |
| // Specifying targets: -triple and -arch |
| //===---------------------------------------------------------------------===// |
| |
| Clang supports "-triple" and "-arch" options. At most one -triple option may |
| be specified, while multiple -arch options can be specified. Both are optional. |
| |
| The "selection of target" behavior is defined as follows: |
| |
| (1) If the user does not specify -triple: |
| |
| (a) If no -arch options are specified, the target triple used is the host |
| triple (in llvm/Config/config.h). |
| |
| (b) If one or more -arch's are specified (and no -triple), then there is |
| one triple for each -arch, where the specified arch is substituted |
| for the arch in the host triple. Example: |
| |
| host triple = i686-apple-darwin9 |
| command: clang -arch ppc -arch ppc64 ... |
| triples used: ppc-apple-darwin9 ppc64-apple-darwin9 |
| |
| (2) The user does specify a -triple (only one allowed): |
| |
| (a) If no -arch options are specified, the triple specified by -triple |
| is used. E.g. clang -triple i686-apple-darwin9 |
| |
| (b) If one or more -arch options are specified, then the triple specified |
| by -triple is used as the primary target, and the arch's specified |
| by -arch are used to create secondary targets. For example: |
| |
| clang -triple i686-apple-darwin9 -arch ppc -arch ppc64 |
| |
| has the following targets: |
| |
| i686-apple-darwin9 (primary target) |
| ppc-apple-darwin9 (secondary target) |
| ppc64-apple-darwin9 (secondary target) |
| |
| The secondary targets are used in the 'portability' model (see below). |
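
The selection rules above can be written as one small function. This is a sketch, not the driver's code: `withArch` and `selectTargets` are hypothetical names, an empty `userTriple` means no -triple was given, and in case (2b) the secondary targets are assumed to substitute the arch into the -triple value (the example above is consistent with that, since host and -triple share the same vendor and OS).

```cpp
#include <string>
#include <vector>

// Replace the arch component (the text before the first '-') of a triple.
std::string withArch(const std::string &triple, const std::string &arch) {
  std::string::size_type dash = triple.find('-');
  return dash == std::string::npos ? arch : arch + triple.substr(dash);
}

// Returns the list of target triples. When a -triple was given, the first
// element is the primary target and the rest are secondary targets.
std::vector<std::string> selectTargets(const std::string &hostTriple,
                                       const std::string &userTriple,
                                       const std::vector<std::string> &archs) {
  const std::string &base = userTriple.empty() ? hostTriple : userTriple;
  std::vector<std::string> targets;

  // Cases (1a), (2a), (2b): the base triple itself is a target. In case (1b),
  // -arch without -triple, only the substituted triples are used.
  if (!userTriple.empty() || archs.empty())
    targets.push_back(base);

  // Cases (1b), (2b): one triple per -arch, substituted into the base triple.
  for (const std::string &arch : archs)
    targets.push_back(withArch(base, arch));

  return targets;
}
```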
| |
| //===---------------------------------------------------------------------===// |
| |
| The 'portability' model in clang is sufficient to catch translation units (or |
| their parts) that are not portable, but it doesn't help if the system headers |
| are non-portable and not fixed. An alternative model that would be easy to use |
| is a 'tainting' scheme. Consider: |
| |
| int32_t |
| OSHostByteOrder(void) { |
| #if defined(__LITTLE_ENDIAN__) |
| return OSLittleEndian; |
| #elif defined(__BIG_ENDIAN__) |
| return OSBigEndian; |
| #else |
| return OSUnknownByteOrder; |
| #endif |
| } |
| |
| It would be trivial to mark 'OSHostByteOrder' as being non-portable (tainted) |
| instead of marking the entire translation unit. Then, if OSHostByteOrder is |
| never called/used by the current translation unit, the t-u wouldn't be marked |
| non-portable. However, there is no good way to handle stuff like: |
| |
| extern int X, Y; |
| |
| #ifndef __POWERPC__ |
| #define X Y |
| #endif |
| |
| int bar() { return X; } |
| |
| When compiling for powerpc, the #define is skipped, so it doesn't know that bar |
| uses a #define that is set on some other target. In practice, limited cases |
| could be handled by scanning the skipped region of a #if, but the fully general |
| case cannot be implemented efficiently. In this case, for example, the #define |
| in the protected region could be turned into either a #define_target or |
| #define_other_target as appropriate. The harder case is code like this (from |
| OSByteOrder.h): |
| |
| #if (defined(__ppc__) || defined(__ppc64__)) |
| #include <libkern/ppc/OSByteOrder.h> |
| #elif (defined(__i386__) || defined(__x86_64__)) |
| #include <libkern/i386/OSByteOrder.h> |
| #else |
| #include <libkern/machine/OSByteOrder.h> |
| #endif |
| |
| The realistic way to fix this is by having an initial #ifdef __llvm__ that |
| defines its contents in terms of the llvm bswap intrinsics. Other things should |
| be handled on a case-by-case basis. |
| |
| |
| We probably have to do something smarter like this in the future. The C++ header |
| <limits> contains a lot of code like this: |
| |
| static const int digits10 = __LDBL_DIG__; |
| static const int min_exponent = __LDBL_MIN_EXP__; |
| static const int min_exponent10 = __LDBL_MIN_10_EXP__; |
| static const float_denorm_style has_denorm |
| = bool(__LDBL_DENORM_MIN__) ? denorm_present : denorm_absent; |
| |
| ... since this isn't being used in an #ifdef, it should be easy enough to taint |
| the decls for these static members. |
| |
| |
| /usr/include/sys/cdefs.h contains stuff like this: |
| |
| #if defined(__ppc__) |
| # if defined(__LDBL_MANT_DIG__) && defined(__DBL_MANT_DIG__) && \ |
| __LDBL_MANT_DIG__ > __DBL_MANT_DIG__ |
| # if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__-0 < 1040 |
| # define __DARWIN_LDBL_COMPAT(x) __asm("_" __STRING(x) "$LDBLStub") |
| # else |
| # define __DARWIN_LDBL_COMPAT(x) __asm("_" __STRING(x) "$LDBL128") |
| # endif |
| # define __DARWIN_LDBL_COMPAT2(x) __asm("_" __STRING(x) "$LDBL128") |
| # define __DARWIN_LONG_DOUBLE_IS_DOUBLE 0 |
| # else |
| # define __DARWIN_LDBL_COMPAT(x) /* nothing */ |
| # define __DARWIN_LDBL_COMPAT2(x) /* nothing */ |
| # define __DARWIN_LONG_DOUBLE_IS_DOUBLE 1 |
| # endif |
| #elif defined(__i386__) || defined(__ppc64__) || defined(__x86_64__) |
| # define __DARWIN_LDBL_COMPAT(x) /* nothing */ |
| # define __DARWIN_LDBL_COMPAT2(x) /* nothing */ |
| # define __DARWIN_LONG_DOUBLE_IS_DOUBLE 0 |
| #else |
| # error Unknown architecture |
| #endif |
| |
| An ideal way to solve this issue is to mark __DARWIN_LDBL_COMPAT / |
| __DARWIN_LDBL_COMPAT2 / __DARWIN_LONG_DOUBLE_IS_DOUBLE as being non-portable |
| because they depend on non-portable macros. In practice though, this may end |
| up being a serious problem: every use of printf will mark the translation unit |
| non-portable if targeting ppc32 and something else. |
| |
| //===---------------------------------------------------------------------===// |
| |