[PDB] Begin adding documentation for the PDB file format.
Differential Revision: https://reviews.llvm.org/D26374
llvm-svn: 286491
diff --git a/llvm/docs/PDB/index.rst b/llvm/docs/PDB/index.rst
new file mode 100644
index 0000000..2c1c3e3
--- /dev/null
+++ b/llvm/docs/PDB/index.rst
@@ -0,0 +1,160 @@
+=====================================
+The PDB File Format
+=====================================
+
+.. contents::
+ :local:
+
+.. _pdb_intro:
+
+Introduction
+============
+
+PDB (Program Database) is a file format invented by Microsoft for storing
+debug information that can be consumed by debuggers and other tools. Because
+Windows provides officially supported APIs for querying debug information from
+PDBs, without requiring the user to understand the internals of the file format,
+a large ecosystem of Windows tools has been built to consume this format. For
+Clang to generate programs that interoperate with these tools, we need to be
+able to generate PDB files ourselves.
+
+At the same time, LLVM has a long history of being able to cross-compile from
+any platform to any platform, and we want the same to be true here. Because
+the officially supported APIs exist only on Windows, we need to understand the
+PDB file format at the byte level so that we can generate PDB files entirely on
+our own.
+
+This manual describes what we know about the PDB file format today: the layout
+of the file, the various streams contained within it, the format of individual
+records within those streams, and more.
+
+We would like to extend our heartfelt gratitude to Microsoft, without whom we
+would not be where we are today. Much of the knowledge contained within this
+manual was learned through reading code published by Microsoft on their `GitHub
+repo <https://github.com/Microsoft/microsoft-pdb>`__.
+
+.. _pdb_layout:
+
+File Layout
+===========
+
+.. toctree::
+ :hidden:
+
+ MsfFile
+ PdbStream
+ TpiStream
+ DbiStream
+ ModiStream
+ PublicStream
+ GlobalStream
+ HashStream
+
+.. _msf:
+
+The MSF Container
+-----------------
+A PDB file is a special case of an MSF (Multi-Stream Format) file. An MSF file
+is a miniature "file system within a file": it contains multiple streams (akin
+to files) that can hold arbitrary data. These streams are divided into blocks,
+which are not necessarily laid out contiguously within the file (that is, a
+stream may be fragmented). The MSF also contains a stream directory (analogous
+to the MFT of an NTFS volume) that describes how the streams are laid out within
+the MSF.
+
+For more information about the MSF container format, stream directory, and
+block layout, see :doc:`MsfFile`.
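A stream's blocks need not be contiguous, so reading a stream means translating each offset within the stream into an offset within the file. That translation uses the list of blocks that the stream directory records for the stream. A minimal C++ sketch of it follows. The `StreamLayout` struct and the `streamOffsetToFileOffset` and `readStream` helpers are hypothetical. The sketch also assumes two things: the whole file is already in memory, and the directory has already been decoded into one `StreamLayout` per stream. The actual directory encoding is described on the MsfFile page.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, already-decoded stream-directory entry: the stream's size in
// bytes plus the indices of the file blocks that hold its data, in order.
struct StreamLayout {
  uint32_t Size;
  std::vector<uint32_t> Blocks;
};

// Maps a byte offset within a stream to a byte offset within the MSF file.
// File block K starts at file offset K * BlockSize.
uint64_t streamOffsetToFileOffset(const StreamLayout &Stream,
                                  uint32_t BlockSize, uint32_t Offset) {
  assert(BlockSize != 0 && "block size must be non-zero");
  if (Offset >= Stream.Size)
    throw std::out_of_range("offset past end of stream");
  uint32_t BlockIndexInStream = Offset / BlockSize; // which of the stream's blocks
  uint32_t OffsetInBlock = Offset % BlockSize;       // position inside that block
  uint64_t FileBlock = Stream.Blocks.at(BlockIndexInStream);
  return FileBlock * BlockSize + OffsetInBlock;
}

// Reassembles a (possibly fragmented) stream into one contiguous buffer,
// copying one byte at a time for clarity.
std::vector<uint8_t> readStream(const std::vector<uint8_t> &File,
                                const StreamLayout &Stream,
                                uint32_t BlockSize) {
  std::vector<uint8_t> Data;
  Data.reserve(Stream.Size);
  for (uint32_t I = 0; I < Stream.Size; ++I)
    Data.push_back(File.at(streamOffsetToFileOffset(Stream, BlockSize, I)));
  return Data;
}
```

Consider a toy block size of 4 bytes and a layout of `{6, {2, 0}}`. Stream bytes 0-3 come from file block 2 (file offsets 8-11), and stream bytes 4-5 come from file block 0 (file offsets 0-1). The stream's data therefore appears in the file out of order. A real reader would copy whole block-sized chunks instead of single bytes.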
+
+.. _streams:
+
+Streams
+-------
+The PDB format contains a number of streams that describe the types, symbols,
+source files, and compilands (e.g. object files) of a program. Other streams
+contain hash tables that debuggers and other tools use for fast lookup of
+records and types by name, and still others record information about how the
+program was compiled, such as the specific toolchain used. The streams contained
+in a PDB file are summarized below:
+
++--------------------+------------------------------+-------------------------------------------+
+| Name               | Stream Index                 | Contents                                  |
++====================+==============================+===========================================+
+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
++--------------------+------------------------------+-------------------------------------------+
+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
+|                    |                              | - Fields to match EXE to this PDB         |
+|                    |                              | - Map of named streams to stream indices  |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
+|                    |                              | - Index of TPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
+|                    |                              | - Indices of individual module streams    |
+|                    |                              | - Indices of public / global streams      |
+|                    |                              | - Section Contribution Information        |
+|                    |                              | - Source File Information                 |
+|                    |                              | - FPO / PGO Data                          |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
+|                    |                              | - Index of IPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
+|                    |   Named Stream map           |   string de-duplication                   |
++--------------------+------------------------------+-------------------------------------------+
+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
+|                    | - One for each compiland     | - Line Number Information                 |
++--------------------+------------------------------+-------------------------------------------+
+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
+|                    |                              | - Index of Public Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
+|                    |                              | - Index of Global Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
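The table shows two ways a reader can locate a stream directly. A stream either sits at a fixed index (streams 0 through 4), or its name is looked up in the PDB Stream's named stream map. The remaining streams are found through indices stored inside the DBI, TPI, or IPI streams. The C++17 sketch below illustrates the first two cases. The `FixedStream` enum, the `NamedStreamMap` alias, and both `resolveStream` overloads are hypothetical. The named stream map is modeled as a plain `std::map` rather than its actual on-disk encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Streams that always live at the same index (see the table above).
enum class FixedStream : uint32_t {
  OldDirectory = 0,
  Pdb = 1,
  Tpi = 2,
  Dbi = 3,
  Ipi = 4,
};

// Hypothetical decoded form of the PDB Stream's named stream map:
// stream name -> stream index.
using NamedStreamMap = std::map<std::string, uint32_t>;

// A fixed stream's index is simply its enumerator value.
constexpr uint32_t resolveStream(FixedStream S) {
  return static_cast<uint32_t>(S);
}

// Resolves a named stream such as "/names". Returns std::nullopt when the
// map has no entry for that name.
std::optional<uint32_t> resolveStream(const NamedStreamMap &Names,
                                      const std::string &Name) {
  assert(!Name.empty() && "stream names are non-empty");
  auto It = Names.find(Name);
  if (It == Names.end())
    return std::nullopt;
  return It->second;
}
```

Take a map such as `{{"/names", 9}, {"/LinkInfo", 5}}`, whose indices are arbitrary values chosen for illustration. Looking up `"/names"` yields 9, while looking up `"/src/headerblock"` yields `std::nullopt`. Returning an optional rather than a sentinel index keeps "absent" distinct from every valid stream index.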
+
+More information about the structure of each of these can be found on the
+following pages:
+
+:doc:`PdbStream`
+ Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
+
+:doc:`TpiStream`
+ Information about the TPI stream and the CodeView records contained within.
+
+:doc:`DbiStream`
+ Information about the DBI stream and relevant substreams including the Module Substreams,
+ source file information, and CodeView symbol records contained within.
+
+:doc:`ModiStream`
+ Information about the Module Information Stream, of which there is one for each
+ compilation unit, and the format of the symbols contained within it.
+
+:doc:`PublicStream`
+ Information about the Public Symbol Stream.
+
+:doc:`GlobalStream`
+ Information about the Global Symbol Stream.
+
+:doc:`HashStream`
+ Information about the Hash Table stream, and how it can be used to quickly look up records
+ by name.
+
+CodeView
+========
+CodeView is a third format involved. While MSF defines the structure of the
+overall file, and PDB defines the set of streams that appear within the MSF file
+and the format of those streams, CodeView defines the format of
+**symbol and type records** that appear within specific streams (for example,
+the type records in the TPI and IPI streams and the symbol records in each
+module info stream).
+Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
+more information about the CodeView format.
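In CodeView, each symbol and type record begins with a two-byte record length and a two-byte record kind. The length counts the bytes that follow the length field itself. This prefix lets a reader step over records whose kinds it does not understand. The C++ sketch below walks such a sequence. The `Record` struct and the `readU16LE` and `splitRecords` helpers are hypothetical. The sketch assumes little-endian data and a buffer that holds nothing but back-to-back records, with any stream header or signature already skipped. It does not decode record payloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical split-out record: its kind plus the bytes that follow the kind.
struct Record {
  uint16_t Kind;
  std::vector<uint8_t> Payload;
};

// Reads a little-endian 16-bit value at Offset.
static uint16_t readU16LE(const std::vector<uint8_t> &Buf, size_t Offset) {
  return static_cast<uint16_t>(Buf.at(Offset) | (Buf.at(Offset + 1) << 8));
}

// Splits a buffer holding only back-to-back CodeView records. Each record is
// a uint16 length, a uint16 kind, then (length - 2) payload bytes; the length
// counts everything after the length field.
std::vector<Record> splitRecords(const std::vector<uint8_t> &Buf) {
  std::vector<Record> Records;
  size_t Offset = 0;
  while (Offset < Buf.size()) {
    if (Buf.size() - Offset < 4)
      throw std::runtime_error("truncated record prefix");
    uint16_t Len = readU16LE(Buf, Offset);
    // The length must cover at least the kind field and stay inside Buf.
    if (Len < 2 || Buf.size() - Offset - 2 < Len)
      throw std::runtime_error("invalid record length");
    uint16_t Kind = readU16LE(Buf, Offset + 2);
    auto Begin = Buf.begin() + static_cast<std::ptrdiff_t>(Offset + 4);
    auto End = Buf.begin() + static_cast<std::ptrdiff_t>(Offset + 2 + Len);
    Records.push_back({Kind, std::vector<uint8_t>(Begin, End)});
    Offset += 2 + Len; // skip the length field plus the bytes it counts
  }
  assert(Offset == Buf.size());
  return Records;
}
```

For instance, the bytes `04 00 34 12 AA BB 02 00 06 00` split into two records. The first has kind `0x1234` and payload `AA BB`. The second has kind `0x0006` and an empty payload.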