[PDB] Begin adding documentation for the PDB file format.

Differential Revision: https://reviews.llvm.org/D26374

llvm-svn: 286491
diff --git a/llvm/docs/PDB/index.rst b/llvm/docs/PDB/index.rst
new file mode 100644
index 0000000..2c1c3e3
--- /dev/null
+++ b/llvm/docs/PDB/index.rst
@@ -0,0 +1,160 @@
+=====================================

+The PDB File Format

+=====================================

+

+.. contents::

+   :local:

+

+.. _pdb_intro:

+

+Introduction

+============

+

+PDB (Program Database) is a file format invented by Microsoft that contains

+debug information that can be consumed by debuggers and other tools.  Since

+officially supported APIs exist on Windows for querying debug information from

+PDBs even without the user understanding the internals of the file format, a

+large ecosystem of tools has been built for Windows to consume this format.  In

+order for Clang to be able to generate programs that can interoperate with these

+tools, it is necessary for us to generate PDB files ourselves.

+

+At the same time, LLVM has a long history of being able to cross-compile from

+any platform to any platform, and we wish for the same to be true here.  So it

+is necessary for us to understand the PDB file format at the byte level so that

+we can generate PDB files entirely on our own.

+

+This manual describes what we know about the PDB file format today: the layout

+of the file, the various streams contained within, the format of individual

+records within, and more.

+

+We would like to extend our heartfelt gratitude to Microsoft, without whom we

+would not be where we are today.  Much of the knowledge contained within this

+manual was learned through reading code published by Microsoft on their `GitHub

+repo <https://github.com/Microsoft/microsoft-pdb>`__.

+

+.. _pdb_layout:

+

+File Layout

+===========

+

+.. toctree::

+   :hidden:

+   

+   MsfFile

+   PdbStream

+   TpiStream

+   DbiStream

+   ModiStream

+   PublicStream

+   GlobalStream

+   HashStream

+

+.. _msf:

+

+The MSF Container

+-----------------

+A PDB file is really just a special case of an MSF (Multi-Stream Format) file.

+An MSF file is actually a miniature "file system within a file".  It contains

+multiple streams (aka files) which can represent arbitrary data.  Each stream

+is divided into blocks, and those blocks are not necessarily laid out

+contiguously within the file (i.e. a stream may be fragmented).  Additionally,

+the MSF contains a stream directory (analogous to the MFT, or Master File Table,

+of NTFS) which describes how the streams (files) are laid out within the MSF.

+

+For more information about the MSF container format, stream directory, and

+block layout, see :doc:`MsfFile`.

+
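The "file system within a file" idea can be illustrated with a toy model. This is a minimal sketch, not the on-disk MSF layout: the ``StreamLayout`` type, the ``read_stream`` helper, and the 4-byte block size are all hypothetical and chosen only to show how a fragmented stream is reassembled from its blocks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamLayout:
    """Hypothetical directory entry describing where one stream lives."""
    length: int        # stream size in bytes
    blocks: List[int]  # block indices in stream order (may be non-contiguous)


def read_stream(file_data: bytes, block_size: int, layout: StreamLayout) -> bytes:
    """Reassemble a stream by concatenating its blocks in directory order."""
    out = bytearray()
    for index in layout.blocks:
        start = index * block_size
        out += file_data[start:start + block_size]
    # The final block may be only partly used, so trim to the stream length.
    return bytes(out[:layout.length])


# Toy "file" made of four 4-byte blocks. The stream occupies block 3, then
# block 1, so it is fragmented and stored out of order.
BLOCK_SIZE = 4
toy_file = b"...." + b"o!.." + b"...." + b"Hell"
hello = read_stream(toy_file, BLOCK_SIZE, StreamLayout(length=6, blocks=[3, 1]))
```

Here the directory entry plays the role of the stream directory: without it, nothing in the raw bytes says which blocks belong to the stream or in what order.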

+.. _streams:

+

+Streams

+-------

+The PDB format contains a number of streams which describe the types, symbols,

+source files, and compilands (e.g. object files) of a program.  Additional

+streams contain hash tables that debuggers and other tools use for fast lookup

+of records and types by name, as well as other information about how the

+program was compiled, such as the specific toolchain used.  A summary of the

+streams contained in a PDB file is as follows:

+

++--------------------+------------------------------+-------------------------------------------+

+| Name               | Stream Index                 | Contents                                  |

++====================+==============================+===========================================+

+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |

++--------------------+------------------------------+-------------------------------------------+

+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |

+|                    |                              | - Fields to match EXE to this PDB         |

+|                    |                              | - Map of named streams to stream indices  |

++--------------------+------------------------------+-------------------------------------------+

+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |

+|                    |                              | - Index of TPI Hash Stream                |

++--------------------+------------------------------+-------------------------------------------+

+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |

+|                    |                              | - Indices of individual module streams    |

+|                    |                              | - Indices of public / global streams      |

+|                    |                              | - Section Contribution Information        |

+|                    |                              | - Source File Information                 |

+|                    |                              | - FPO / PGO Data                          |

++--------------------+------------------------------+-------------------------------------------+

+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |

+|                    |                              | - Index of IPI Hash Stream                |

++--------------------+------------------------------+-------------------------------------------+

+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |

+|                    |   Named Stream map           |                                           |

++--------------------+------------------------------+-------------------------------------------+

+| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |

+|                    |   Named Stream map           |                                           |

++--------------------+------------------------------+-------------------------------------------+

+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |

+|                    |   Named Stream map           |   string de-duplication                   |

++--------------------+------------------------------+-------------------------------------------+

+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |

+|                    | - One for each compiland     | - Line Number Information                 |

++--------------------+------------------------------+-------------------------------------------+

+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |

+|                    |                              | - Index of Public Hash Stream             |

++--------------------+------------------------------+-------------------------------------------+

+| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |

+|                    |                              | - Index of Global Hash Stream             |

++--------------------+------------------------------+-------------------------------------------+

+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |

+|                    |                              |   by name                                 |

++--------------------+------------------------------+-------------------------------------------+

+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |

+|                    |                              |   by name                                 |

++--------------------+------------------------------+-------------------------------------------+

+
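The table distinguishes streams that live at a fixed index from streams that are found through the PDB Stream's named stream map. The sketch below models that distinction only. The ``FixedStream`` enum mirrors the fixed indices listed in the table; the ``resolve_stream`` helper and the indices in ``example_named_streams`` are hypothetical, since real named stream indices vary from file to file.

```python
from enum import IntEnum
from typing import Dict


class FixedStream(IntEnum):
    """Fixed stream indices, as listed in the summary table."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def resolve_stream(name: str, named_streams: Dict[str, int]) -> int:
    """Return the index of a fixed stream, else look it up by name."""
    try:
        return int(FixedStream[name])
    except KeyError:
        # Not a fixed stream: fall back to the (hypothetical) named stream map.
        return named_streams[name]


# Made-up contents for a named stream map; these indices are illustrative only.
example_named_streams = {"/names": 12, "/LinkInfo": 9}
tpi_index = resolve_stream("TPI", example_named_streams)
names_index = resolve_stream("/names", example_named_streams)
```

Streams described as "Contained in DBI Stream" or "Contained in TPI Stream" follow the same pattern, with the index stored in the owning stream rather than in the named stream map.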

+More information about the structure of each of these streams can be found on

+the following pages:

+   

+:doc:`PdbStream`

+   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

+

+:doc:`TpiStream`

+   Information about the TPI stream and the CodeView records contained within.

+

+:doc:`DbiStream`

+   Information about the DBI stream and relevant substreams including the Module Substreams,

+   source file information, and CodeView symbol records contained within.

+

+:doc:`ModiStream`

+   Information about the Module Information Stream, of which there is one for each

+   compilation unit, and the format of the symbols contained within.

+

+:doc:`PublicStream`

+   Information about the Public Symbol Stream.

+

+:doc:`GlobalStream`

+   Information about the Global Symbol Stream.

+

+:doc:`HashStream`

+   Information about the Hash Table stream, and how it can be used to quickly look up records

+   by name.

+

+CodeView

+========

+CodeView is another format which comes into the picture.  While MSF defines

+the structure of the overall file, and PDB defines the set of streams that

+appear within the MSF file and the format of those streams, CodeView defines

+the format of **symbol and type records** that appear within specific streams.

+Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for

+more information about the CodeView format.
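To make the layering concrete: once a stream's bytes have been extracted from the MSF container, the CodeView layer is what gives meaning to the records inside it. The sketch below assumes the length-prefixed record framing that CodeView is generally documented as using (a little-endian 16-bit length counting the bytes after the length field, followed by a 16-bit record kind). That framing is not stated on this page, and the ``iter_records`` helper, the kind values, and the payloads are all made up for illustration.

```python
import struct
from typing import Iterator, Tuple


def iter_records(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, payload) for each length-prefixed record in data."""
    offset = 0
    while offset + 4 <= len(data):
        # Assumed prefix: u16 length (bytes after this field), then u16 kind.
        length, kind = struct.unpack_from("<HH", data, offset)
        if length < 2 or offset + 2 + length > len(data):
            raise ValueError("truncated or malformed record")
        payload = data[offset + 4:offset + 2 + length]
        yield kind, payload
        offset += 2 + length


# Two invented records: one with a 2-byte payload, one with no payload.
sample = struct.pack("<HH", 4, 0x1111) + b"ab" + struct.pack("<HH", 2, 0x2222)
records = list(iter_records(sample))
```

The walker never needs to know which stream the bytes came from, which is the point of the separation: MSF locates the bytes, PDB says which stream holds which kind of data, and CodeView defines the records themselves.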