| \section{\module{cgi} --- | 
 |          Common Gateway Interface support.} | 
 | \declaremodule{standard}{cgi} | 
 |  | 
 | \modulesynopsis{Common Gateway Interface support, used to interpret | 
 | forms in server-side scripts.} | 
 |  | 
 | \indexii{WWW}{server} | 
 | \indexii{CGI}{protocol} | 
 | \indexii{HTTP}{protocol} | 
 | \indexii{MIME}{headers} | 
 | \index{URL} | 
 |  | 
 |  | 
 | Support module for CGI (Common Gateway Interface) scripts.% | 
 | \index{Common Gateway Interface} | 
 |  | 
 | This module defines a number of utilities for use by CGI scripts | 
 | written in Python. | 
 |  | 
 | \subsection{Introduction} | 
 | \nodename{cgi-intro} | 
 |  | 
 | A CGI script is invoked by an HTTP server, usually to process user | 
 | input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element. | 
 |  | 
 | Most often, CGI scripts live in the server's special \file{cgi-bin} | 
 | directory.  The HTTP server places all sorts of information about the | 
 | request (such as the client's hostname, the requested URL, the query | 
 | string, and lots of other goodies) in the script's shell environment, | 
 | executes the script, and sends the script's output back to the client. | 
 |  | 
 | The script's input is connected to the client too, and sometimes the | 
 | form data is read this way; at other times the form data is passed via | 
 | the ``query string'' part of the URL.  This module is intended | 
 | to take care of the different cases and provide a simpler interface to | 
 | the Python script.  It also provides a number of utilities that help | 
 | in debugging scripts, and the latest addition is support for file | 
 | uploads from a form (if your browser supports it --- Grail 0.3 and | 
 | Netscape 2.0 do). | 
 |  | 
 | The output of a CGI script should consist of two sections, separated | 
 | by a blank line.  The first section contains a number of headers, | 
 | telling the client what kind of data is following.  Python code to | 
 | generate a minimal header section looks like this: | 
 |  | 
 | \begin{verbatim} | 
 | print "Content-Type: text/html"     # HTML is following | 
 | print                               # blank line, end of headers | 
 | \end{verbatim} | 
 |  | 
 | The second section is usually HTML, which allows the client software | 
 | to display nicely formatted text with header, in-line images, etc. | 
 | Here's Python code that prints a simple piece of HTML: | 
 |  | 
 | \begin{verbatim} | 
 | print "<TITLE>CGI script output</TITLE>" | 
 | print "<H1>This is my first CGI script</H1>" | 
 | print "Hello, world!" | 
 | \end{verbatim} | 
 |  | 
 | \subsection{Using the cgi module} | 
 | \nodename{Using the cgi module} | 
 |  | 
 | Begin by writing \samp{import cgi}.  Do not use \samp{from cgi import | 
 | *} --- the module defines all sorts of names for its own use or for | 
 | backward compatibility that you don't want in your namespace. | 
 |  | 
 | It's best to use the \class{FieldStorage} class.  The other classes | 
 | defined in this module are provided mostly for backward compatibility. | 
 | Instantiate it exactly once, without arguments.  This reads the form | 
 | contents from standard input or the environment (depending on the | 
 | value of various environment variables set according to the CGI | 
 | standard).  Since it may consume standard input, it should be | 
 | instantiated only once. | 
 |  | 
 | The \class{FieldStorage} instance can be indexed like a Python | 
 | dictionary, and also supports the standard dictionary methods | 
 | \function{has_key()} and \function{keys()}. | 
 | Form fields containing empty strings are ignored | 
 | and do not appear in the dictionary; to keep such values, provide | 
 | the optional \samp{keep_blank_values} argument when creating the | 
 | \class {FieldStorage} instance. | 
 |  | 
 | For instance, the following code (which assumes that the  | 
 | \code{Content-Type} header and blank line have already been printed) | 
 | checks that the fields \code{name} and \code{addr} are both set to a | 
 | non-empty string: | 
 |  | 
 | \begin{verbatim} | 
 | form = cgi.FieldStorage() | 
 | form_ok = 0 | 
 | if form.has_key("name") and form.has_key("addr"): | 
 |     form_ok = 1 | 
 | if not form_ok: | 
 |     print "<H1>Error</H1>" | 
 |     print "Please fill in the name and addr fields." | 
 |     return | 
 | print "<p>name:", form["name"].value | 
 | print "<p>addr:", form["addr"].value | 
 | ...further form processing here... | 
 | \end{verbatim} | 
 |  | 
 | Here the fields, accessed through \samp{form[\var{key}]}, are | 
 | themselves instances of \class{FieldStorage} (or | 
 | \class{MiniFieldStorage}, depending on the form encoding). | 
 | The \member{value} attribute of the instance yields the string value | 
 | of the field.  The \function{getvalue()} method returns this string value | 
 | directly; it also accepts an optional second argument as a default to | 
 | return if the requested key is not present. | 
 |  | 
 | If the submitted form data contains more than one field with the same | 
 | name, the object retrieved by \samp{form[\var{key}]} is not a | 
 | \class{FieldStorage} or \class{MiniFieldStorage} | 
 | instance but a list of such instances.  Similarly, in this situation, | 
 | \samp{form.getvalue(\var{key})} would return a list of strings. | 
 | If you expect this possibility | 
 | (i.e., when your HTML form contains multiple fields with the same | 
 | name), use the \function{type()} function to determine whether you | 
 | have a single instance or a list of instances.  For example, here's | 
 | code that concatenates any number of username fields, separated by | 
 | commas: | 
 |  | 
 | \begin{verbatim} | 
 | value = form.getvalue("username", "") | 
 | if type(value) is type([]): | 
 |     # Multiple username fields specified | 
 |     usernames = ",".join(value) | 
 | else: | 
 |     # Single or no username field specified | 
 |     usernames = value | 
 | \end{verbatim} | 
 |  | 
 | If a field represents an uploaded file, accessing the value via the | 
 | \member{value} attribute or the \function{getvalue()} method reads the | 
 | entire file in memory as a string.  This may not be what you want. | 
 | You can test for an uploaded file by testing either the \member{filename} | 
 | attribute or the \member{file} attribute.  You can then read the data at | 
 | leisure from the \member{file} attribute: | 
 |  | 
 | \begin{verbatim} | 
 | fileitem = form["userfile"] | 
 | if fileitem.file: | 
 |     # It's an uploaded file; count lines | 
 |     linecount = 0 | 
 |     while 1: | 
 |         line = fileitem.file.readline() | 
 |         if not line: break | 
 |         linecount = linecount + 1 | 
 | \end{verbatim} | 
 |  | 
 | The file upload draft standard entertains the possibility of uploading | 
 | multiple files from one field (using a recursive | 
 | \mimetype{multipart/*} encoding).  When this occurs, the item will be | 
 | a dictionary-like \class{FieldStorage} item.  This can be determined | 
 | by testing its \member{type} attribute, which should be | 
 | \mimetype{multipart/form-data} (or perhaps another MIME type matching | 
 | \mimetype{multipart/*}).  In this case, it can be iterated over | 
 | recursively just like the top-level form object. | 
 |  | 
 | When a form is submitted in the ``old'' format (as the query string or | 
 | as a single data part of type | 
 | \mimetype{application/x-www-form-urlencoded}), the items will actually | 
 | be instances of the class \class{MiniFieldStorage}.  In this case, the | 
 | \member{list}, \member{file}, and \member{filename} attributes are | 
 | always \code{None}. | 
 |  | 
 |  | 
 | \subsection{Old classes} | 
 |  | 
 | These classes, present in earlier versions of the \module{cgi} module, | 
 | are still supported for backward compatibility.  New applications | 
 | should use the \class{FieldStorage} class. | 
 |  | 
 | \class{SvFormContentDict} stores single value form content as | 
 | dictionary; it assumes each field name occurs in the form only once. | 
 |  | 
 | \class{FormContentDict} stores multiple value form content as a | 
 | dictionary (the form items are lists of values).  Useful if your form | 
 | contains multiple fields with the same name. | 
 |  | 
 | Other classes (\class{FormContent}, \class{InterpFormContentDict}) are | 
 | present for backwards compatibility with really old applications only. | 
 | If you still use these and would be inconvenienced when they | 
 | disappeared from a next version of this module, drop me a note. | 
 |  | 
 |  | 
 | \subsection{Functions} | 
 | \nodename{Functions in cgi module} | 
 |  | 
 | These are useful if you want more control, or if you want to employ | 
 | some of the algorithms implemented in this module in other | 
 | circumstances. | 
 |  | 
 | \begin{funcdesc}{parse}{fp} | 
 | Parse a query in the environment or from a file (default | 
 | \code{sys.stdin}). | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values, strict_parsing}} | 
 | Parse a query string given as a string argument (data of type  | 
 | \mimetype{application/x-www-form-urlencoded}).  Data are | 
 | returned as a dictionary.  The dictionary keys are the unique query | 
 | variable names and the values are lists of values for each name. | 
 |  | 
 | The optional argument \var{keep_blank_values} is | 
 | a flag indicating whether blank values in | 
 | URL encoded queries should be treated as blank strings.   | 
 | A true value indicates that blanks should be retained as  | 
 | blank strings.  The default false value indicates that | 
 | blank values are to be ignored and treated as if they were | 
 | not included. | 
 |  | 
 | The optional argument \var{strict_parsing} is a flag indicating what | 
 | to do with parsing errors.  If false (the default), errors | 
 | are silently ignored.  If true, errors raise a ValueError | 
 | exception. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values, strict_parsing}} | 
 | Parse a query string given as a string argument (data of type  | 
 | \mimetype{application/x-www-form-urlencoded}).  Data are | 
 | returned as a list of name, value pairs. | 
 |  | 
 | The optional argument \var{keep_blank_values} is | 
 | a flag indicating whether blank values in | 
 | URL encoded queries should be treated as blank strings.   | 
 | A true value indicates that blanks should be retained as  | 
 | blank strings.  The default false value indicates that | 
 | blank values are to be ignored and treated as if they were | 
 | not included. | 
 |  | 
 | The optional argument \var{strict_parsing} is a flag indicating what | 
 | to do with parsing errors.  If false (the default), errors | 
 | are silently ignored.  If true, errors raise a ValueError | 
 | exception. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{parse_multipart}{fp, pdict} | 
 | Parse input of type \mimetype{multipart/form-data} (for  | 
 | file uploads).  Arguments are \var{fp} for the input file and | 
 | \var{pdict} for a dictionary containing other parameters in | 
 | the \code{Content-Type} header. | 
 |  | 
 | Returns a dictionary just like \function{parse_qs()} keys are the | 
 | field names, each value is a list of values for that field.  This is | 
 | easy to use but not much good if you are expecting megabytes to be | 
 | uploaded --- in that case, use the \class{FieldStorage} class instead | 
 | which is much more flexible. | 
 |  | 
 | Note that this does not parse nested multipart parts --- use | 
 | \class{FieldStorage} for that. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{parse_header}{string} | 
 | Parse a MIME header (such as \code{Content-Type}) into a main | 
 | value and a dictionary of parameters. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{test}{} | 
 | Robust test CGI script, usable as main program. | 
 | Writes minimal HTTP headers and formats all information provided to | 
 | the script in HTML form. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{print_environ}{} | 
 | Format the shell environment in HTML. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{print_form}{form} | 
 | Format a form in HTML. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{print_directory}{} | 
 | Format the current directory in HTML. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{print_environ_usage}{} | 
 | Print a list of useful (used by CGI) environment variables in | 
 | HTML. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{escape}{s\optional{, quote}} | 
 | Convert the characters | 
 | \character{\&}, \character{<} and \character{>} in string \var{s} to | 
 | HTML-safe sequences.  Use this if you need to display text that might | 
 | contain such characters in HTML.  If the optional flag \var{quote} is | 
 | true, the double quote character (\character{"}) is also translated; | 
 | this helps for inclusion in an HTML attribute value, e.g. in \code{<A | 
 | HREF="...">}. | 
 | \end{funcdesc} | 
 |  | 
 |  | 
 | \subsection{Caring about security} | 
 |  | 
 | There's one important rule: if you invoke an external program (e.g. | 
 | via the \function{os.system()} or \function{os.popen()} functions), | 
 | make very sure you don't pass arbitrary strings received from the | 
 | client to the shell.  This is a well-known security hole whereby | 
 | clever hackers anywhere on the web can exploit a gullible CGI script | 
 | to invoke arbitrary shell commands.  Even parts of the URL or field | 
 | names cannot be trusted, since the request doesn't have to come from | 
 | your form! | 
 |  | 
 | To be on the safe side, if you must pass a string gotten from a form | 
 | to a shell command, you should make sure the string contains only | 
 | alphanumeric characters, dashes, underscores, and periods. | 
 |  | 
 |  | 
 | \subsection{Installing your CGI script on a Unix system} | 
 |  | 
 | Read the documentation for your HTTP server and check with your local | 
 | system administrator to find the directory where CGI scripts should be | 
 | installed; usually this is in a directory \file{cgi-bin} in the server tree. | 
 |  | 
 | Make sure that your script is readable and executable by ``others''; the | 
 | \UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755 | 
 | \var{filename}}).  Make sure that the first line of the script contains | 
 | \code{\#!} starting in column 1 followed by the pathname of the Python | 
 | interpreter, for instance: | 
 |  | 
 | \begin{verbatim} | 
 | #!/usr/local/bin/python | 
 | \end{verbatim} | 
 |  | 
 | Make sure the Python interpreter exists and is executable by ``others''. | 
 |  | 
 | Make sure that any files your script needs to read or write are | 
 | readable or writable, respectively, by ``others'' --- their mode | 
 | should be \code{0644} for readable and \code{0666} for writable.  This | 
 | is because, for security reasons, the HTTP server executes your script | 
 | as user ``nobody'', without any special privileges.  It can only read | 
 | (write, execute) files that everybody can read (write, execute).  The | 
 | current directory at execution time is also different (it is usually | 
 | the server's cgi-bin directory) and the set of environment variables | 
 | is also different from what you get at login.  In particular, don't | 
 | count on the shell's search path for executables (\envvar{PATH}) or | 
 | the Python module search path (\envvar{PYTHONPATH}) to be set to | 
 | anything interesting. | 
 |  | 
 | If you need to load modules from a directory which is not on Python's | 
 | default module search path, you can change the path in your script, | 
 | before importing other modules, e.g.: | 
 |  | 
 | \begin{verbatim} | 
 | import sys | 
 | sys.path.insert(0, "/usr/home/joe/lib/python") | 
 | sys.path.insert(0, "/usr/local/lib/python") | 
 | \end{verbatim} | 
 |  | 
 | (This way, the directory inserted last will be searched first!) | 
 |  | 
 | Instructions for non-\UNIX{} systems will vary; check your HTTP server's | 
 | documentation (it will usually have a section on CGI scripts). | 
 |  | 
 |  | 
 | \subsection{Testing your CGI script} | 
 |  | 
 | Unfortunately, a CGI script will generally not run when you try it | 
 | from the command line, and a script that works perfectly from the | 
 | command line may fail mysteriously when run from the server.  There's | 
 | one reason why you should still test your script from the command | 
 | line: if it contains a syntax error, the Python interpreter won't | 
 | execute it at all, and the HTTP server will most likely send a cryptic | 
 | error to the client. | 
 |  | 
 | Assuming your script has no syntax errors, yet it does not work, you | 
 | have no choice but to read the next section. | 
 |  | 
 |  | 
 | \subsection{Debugging CGI scripts} | 
 |  | 
 | First of all, check for trivial installation errors --- reading the | 
 | section above on installing your CGI script carefully can save you a | 
 | lot of time.  If you wonder whether you have understood the | 
 | installation procedure correctly, try installing a copy of this module | 
 | file (\file{cgi.py}) as a CGI script.  When invoked as a script, the file | 
 | will dump its environment and the contents of the form in HTML form. | 
 | Give it the right mode etc, and send it a request.  If it's installed | 
 | in the standard \file{cgi-bin} directory, it should be possible to send it a | 
 | request by entering a URL into your browser of the form: | 
 |  | 
 | \begin{verbatim} | 
 | http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home | 
 | \end{verbatim} | 
 |  | 
 | If this gives an error of type 404, the server cannot find the script | 
 | -- perhaps you need to install it in a different directory.  If it | 
 | gives another error (e.g.  500), there's an installation problem that | 
 | you should fix before trying to go any further.  If you get a nicely | 
 | formatted listing of the environment and form content (in this | 
 | example, the fields should be listed as ``addr'' with value ``At Home'' | 
 | and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been | 
 | installed correctly.  If you follow the same procedure for your own | 
 | script, you should now be able to debug it. | 
 |  | 
 | The next step could be to call the \module{cgi} module's | 
 | \function{test()} function from your script: replace its main code | 
 | with the single statement | 
 |  | 
 | \begin{verbatim} | 
 | cgi.test() | 
 | \end{verbatim} | 
 |  | 
 | This should produce the same results as those gotten from installing | 
 | the \file{cgi.py} file itself. | 
 |  | 
 | When an ordinary Python script raises an unhandled exception | 
 | (e.g. because of a typo in a module name, a file that can't be opened, | 
 | etc.), the Python interpreter prints a nice traceback and exits. | 
 | While the Python interpreter will still do this when your CGI script | 
 | raises an exception, most likely the traceback will end up in one of | 
 | the HTTP server's log file, or be discarded altogether. | 
 |  | 
 | Fortunately, once you have managed to get your script to execute | 
 | \emph{some} code, it is easy to catch exceptions and cause a traceback | 
 | to be printed.  The \function{test()} function below in this module is | 
 | an example.  Here are the rules: | 
 |  | 
 | \begin{enumerate} | 
 | \item Import the traceback module before entering the \keyword{try} | 
 |    ... \keyword{except} statement | 
 |  | 
 | \item Assign \code{sys.stderr} to be \code{sys.stdout} | 
 |  | 
 | \item Make sure you finish printing the headers and the blank line | 
 |    early | 
 |  | 
 | \item Wrap all remaining code in a \keyword{try} ... \keyword{except} | 
 |    statement | 
 |  | 
 | \item In the except clause, call \function{traceback.print_exc()} | 
 | \end{enumerate} | 
 |  | 
 | For example: | 
 |  | 
 | \begin{verbatim} | 
 | import sys | 
 | import traceback | 
 | print "Content-Type: text/html" | 
 | print | 
 | sys.stderr = sys.stdout | 
 | try: | 
 |     ...your code here... | 
 | except: | 
 |     print "\n\n<PRE>" | 
 |     traceback.print_exc() | 
 | \end{verbatim} | 
 |  | 
 | Notes: The assignment to \code{sys.stderr} is needed because the | 
 | traceback prints to \code{sys.stderr}. | 
 | The \code{print "{\e}n{\e}n<PRE>"} statement is necessary to | 
 | disable the word wrapping in HTML. | 
 |  | 
 | If you suspect that there may be a problem in importing the traceback | 
 | module, you can use an even more robust approach (which only uses | 
 | built-in modules): | 
 |  | 
 | \begin{verbatim} | 
 | import sys | 
 | sys.stderr = sys.stdout | 
 | print "Content-Type: text/plain" | 
 | print | 
 | ...your code here... | 
 | \end{verbatim} | 
 |  | 
 | This relies on the Python interpreter to print the traceback.  The | 
 | content type of the output is set to plain text, which disables all | 
 | HTML processing.  If your script works, the raw HTML will be displayed | 
 | by your client.  If it raises an exception, most likely after the | 
 | first two lines have been printed, a traceback will be displayed. | 
 | Because no HTML interpretation is going on, the traceback will | 
 | readable. | 
 |  | 
 |  | 
 | \subsection{Common problems and solutions} | 
 |  | 
 | \begin{itemize} | 
 | \item Most HTTP servers buffer the output from CGI scripts until the | 
 | script is completed.  This means that it is not possible to display a | 
 | progress report on the client's display while the script is running. | 
 |  | 
 | \item Check the installation instructions above. | 
 |  | 
 | \item Check the HTTP server's log files.  (\samp{tail -f logfile} in a | 
 | separate window may be useful!) | 
 |  | 
 | \item Always check a script for syntax errors first, by doing something | 
 | like \samp{python script.py}. | 
 |  | 
 | \item When using any of the debugging techniques, don't forget to add | 
 | \samp{import sys} to the top of the script. | 
 |  | 
 | \item When invoking external programs, make sure they can be found. | 
 | Usually, this means using absolute path names --- \envvar{PATH} is | 
 | usually not set to a very useful value in a CGI script. | 
 |  | 
 | \item When reading or writing external files, make sure they can be read | 
 | or written by every user on the system. | 
 |  | 
 | \item Don't try to give a CGI script a set-uid mode.  This doesn't work on | 
 | most systems, and is a security liability as well. | 
 | \end{itemize} | 
 |  |