Stage two of getting CFE top correct.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@39734 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/NOTES.txt b/NOTES.txt
new file mode 100644
index 0000000..da84211
--- /dev/null
+++ b/NOTES.txt
@@ -0,0 +1,218 @@
+//===---------------------------------------------------------------------===//
+// Random Notes
+//===---------------------------------------------------------------------===//
+
+C90/C99/C++ Comparisons:
+http://david.tribble.com/text/cdiffs.htm
+
+//===---------------------------------------------------------------------===//
+Extensions:
+
+ * "#define_target X Y"
+   This preprocessor directive works exactly the same way as #define, but it
+   notes that 'X' is a target-specific preprocessor directive.  When used, a
+   diagnostic is emitted indicating that the translation unit is non-portable.
+   
+   If a target-define is #undef'd before use, no diagnostic is emitted.  If 'X'
+   were previously a normal #define macro, the macro is tainted.  If 'X' is
+   subsequently #defined as a non-target-specific define, the taint bit is
+   cleared.
+   
+ * "#define_other_target X"
+    This preprocessor directive takes a single identifier argument.  It notes
+    that this identifier is a target-specific #define for some target other than
+    the current one.  Use of this identifier will result in a diagnostic.
+    
+    If 'X' is later #undef'd or #define'd, the taint bit is cleared.  If 'X' is
+    already defined, X is marked as a target-specific define. 
+
+//===---------------------------------------------------------------------===//
+
+To time GCC preprocessing speed without output, use:
+   "time gcc -MM file"
+This is similar to -Eonly.
+
+
+//===---------------------------------------------------------------------===//
+
+  C++ Template Instantiation benchmark:
+     http://users.rcn.com/abrahams/instantiation_speed/index.html
+
+//===---------------------------------------------------------------------===//
+
+TODO: File Manager Speedup:
+
+ We currently do a lot of stat'ing for files that don't exist, particularly
+ when lots of -I paths exist (e.g. see the <iostream> example, check for
+ failures in stat in FileManager::getFile).  It would be far better to make
+ the following changes:
+   1. FileEntry contains a sys::Path instead of a std::string for Name.
+   2. sys::Path contains timestamp and size, lazily computed.  Eliminate from
+      FileEntry.
+   3. File UIDs are created on request, not when files are opened.
+ These changes make it possible to efficiently have FileEntry objects for
+ files that exist on the file system, but have not been used yet.
+ 
+ Once this is done:
+   1. DirectoryEntry gets a boolean value "has read entries".  When false, not
+      all entries in the directory are in the file mgr; when true, they are.
+   2. Instead of stat'ing the file in FileManager::getFile, check to see if 
+      the dir has been read.  If so, fail immediately; if not, read the dir,
+      then retry.
+   3. Reading the dir uses the getdirentries syscall, creating a FileEntry
+      for all files found.
+
+//===---------------------------------------------------------------------===//
+
+TODO: Fast #Import:
+
+ * Get frameworks that don't use #import to do so, e.g. DirectoryService,
+   AudioToolbox, CoreFoundation, etc.  Why aren't they using #import?
+   Because they work in C mode?  C has #import.
+ * Have the lexer return a token for #import instead of handling it itself.
+   - Create a new preprocessor object with no external state (no -D/U options
+     from the command line, etc).  Alternatively, keep track of exactly which
+     external state is used by a #import: declare it somehow.
+ * When reading a #import file, keep track of whether we have seen any
+   (and/or which) "configuration" macros.  Various cases:
+   - Uses of target args (__POWERPC__, __i386): Header has to be parsed 
+     multiple times, per-target.  What about #ifndef checks?  How do we know?
+   - "Configuration" preprocessor macros not defined: POWERPC, etc.  What about
+     things like __STDC__ etc?  What is and what isn't allowed?
+ * Special handling for "umbrella" headers, which just contain #import stmts:
+   - Cocoa.h/AppKit.h - Contain pointers to digests instead of entire digests
+     themselves?  Foundation.h isn't pure umbrella!
+ * Framework digests:
+   - Can put "digest" of a framework-worth of headers into the framework
+     itself.  To open AppKit, just mmap
+     /System/Library/Frameworks/AppKit.framework/"digest", which provides a
+     symbol table in a well defined format.  Lazily unstream stuff that is
+     needed.  Contains declarations, macros, and debug information.
+   - System frameworks ship with digests.  How do we handle configuration
+     information?  How do we handle stuff like:
+       #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_2
+     which guards a bunch of decls?  Should there be a couple of default
+     configs, then have the UI fall back to building/caching its own?
+   - GUI automatically builds digests when UI is idle, both of system
+     frameworks if they aren't available in the right config, and of app
+     frameworks.
+   - GUI builds dependence graph of frameworks/digests based on #imports.  If a
+     digest is out of date, dependent digests are automatically invalidated.
+
+ * New constraints on #import for objc-v3:
+   - #imported file must not define non-inline function bodies.
+     - Alternatively, they can, and these bodies get compiled/linked *once*
+       per app into a dylib.  What about building user dylibs?
+   - Restrictions on ObjC grammar: can't #import the body of a for stmt or fn.
+   - Compiler must detect and reject these cases.
+   - #defines defined within a #import have two behaviors:
+     - By default, they escape the header.  These macros *cannot* be #undef'd
+       by other code: this is enforced by the front-end.
+     - Optionally, user can specify what macros escape (whitelist) or can use
+       #undef.
+
+//===---------------------------------------------------------------------===//
+
+TODO: New language feature: Configuration queries:
+  - Instead of #ifdef __POWERPC__, use "if (strcmp(`cpu`, __POWERPC__))", or
+    some other, better, syntax.
+  - Use it to increase the number of "architecture-clean" #import'd files,
+    allowing a single index to be used for all fat slices.
+
+//===---------------------------------------------------------------------===//
+
+The 'portability' model in clang is sufficient to catch translation units (or
+their parts) that are not portable, but it doesn't help if the system headers
+are non-portable and not fixed.  An alternative model that would be easy to use
+is a 'tainting' scheme.  Consider:
+
+int32_t
+OSHostByteOrder(void) {
+#if defined(__LITTLE_ENDIAN__)
+    return OSLittleEndian;
+#elif defined(__BIG_ENDIAN__)
+    return OSBigEndian;
+#else
+    return OSUnknownByteOrder;
+#endif
+}
+
+It would be trivial to mark 'OSHostByteOrder' as being non-portable (tainted)
+instead of marking the entire translation unit.  Then, if OSHostByteOrder is
+never called/used by the current translation unit, the t-u wouldn't be marked
+non-portable.  However, there is no good way to handle stuff like:
+
+extern int X, Y;
+
+#ifndef __POWERPC__
+#define X Y
+#endif
+
+int bar() { return X; }
+
+When compiling for powerpc, the #define is skipped, so the front-end doesn't
+know that bar uses a #define that is set on some other target.  In practice,
+limited cases could be handled by scanning the skipped region of a #if, but the
+fully general case cannot be implemented efficiently.  In this case, for
+example, the #define in the protected region could be turned into either a
+#define_target or #define_other_target as appropriate.  The harder case is code
+like this (from OSByteOrder.h):
+
+  #if (defined(__ppc__) || defined(__ppc64__))
+  #include <libkern/ppc/OSByteOrder.h>
+  #elif (defined(__i386__) || defined(__x86_64__))
+  #include <libkern/i386/OSByteOrder.h>
+  #else
+  #include <libkern/machine/OSByteOrder.h>
+  #endif
+
+The realistic way to fix this is by having an initial #ifdef __llvm__ that
+defines its contents in terms of the llvm bswap intrinsics.  Other things should
+be handled on a case-by-case basis.
+
+
+We probably have to do something smarter like this in the future. The C++ header
+<limits> contains a lot of code like this:
+
+   static const int digits10 = __LDBL_DIG__;
+   static const int min_exponent = __LDBL_MIN_EXP__;
+   static const int min_exponent10 = __LDBL_MIN_10_EXP__;
+   static const float_denorm_style has_denorm
+     = bool(__LDBL_DENORM_MIN__) ? denorm_present : denorm_absent;
+
+ ... since these macros aren't used in an #ifdef, it should be easy enough to
+taint the decls for these static members.
+
+
+/usr/include/sys/cdefs.h contains stuff like this:
+
+#if defined(__ppc__)
+#  if defined(__LDBL_MANT_DIG__) && defined(__DBL_MANT_DIG__) && \
+	__LDBL_MANT_DIG__ > __DBL_MANT_DIG__
+#    if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__-0 < 1040
+#      define	__DARWIN_LDBL_COMPAT(x)	__asm("_" __STRING(x) "$LDBLStub")
+#    else
+#      define	__DARWIN_LDBL_COMPAT(x)	__asm("_" __STRING(x) "$LDBL128")
+#    endif
+#    define	__DARWIN_LDBL_COMPAT2(x) __asm("_" __STRING(x) "$LDBL128")
+#    define	__DARWIN_LONG_DOUBLE_IS_DOUBLE	0
+#  else
+#   define	__DARWIN_LDBL_COMPAT(x) /* nothing */
+#   define	__DARWIN_LDBL_COMPAT2(x) /* nothing */
+#   define	__DARWIN_LONG_DOUBLE_IS_DOUBLE	1
+#  endif
+#elif defined(__i386__) || defined(__ppc64__) || defined(__x86_64__)
+#  define	__DARWIN_LDBL_COMPAT(x)	/* nothing */
+#  define	__DARWIN_LDBL_COMPAT2(x) /* nothing */
+#  define	__DARWIN_LONG_DOUBLE_IS_DOUBLE	0
+#else
+#  error Unknown architecture
+#endif
+
+An ideal way to solve this issue is to mark __DARWIN_LDBL_COMPAT / 
+__DARWIN_LDBL_COMPAT2 / __DARWIN_LONG_DOUBLE_IS_DOUBLE as being non-portable
+because they depend on non-portable macros.  In practice though, this may end
+up being a serious problem: every use of printf will mark the translation unit
+non-portable if targeting ppc32 and something else.
+
+//===---------------------------------------------------------------------===//