\section{Standard Module \sectcode{cgi}}
\label{module-cgi}
\stmodindex{cgi}
\indexii{WWW}{server}
\indexii{CGI}{protocol}
\indexii{HTTP}{protocol}
\indexii{MIME}{headers}
\index{URL}

\setindexsubitem{(in module cgi)}

Support module for CGI (Common Gateway Interface) scripts.

This module defines a number of utilities for use by CGI scripts
written in Python.

\subsection{Introduction}
\nodename{Introduction to the CGI module}

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.

Most often, CGI scripts live in the server's special \file{cgi-bin}
directory. The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the ``query string'' part of the URL. This module (\file{cgi.py}) is intended
to take care of the different cases and provide a simpler interface to
the Python script. It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it -- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line. The first section contains a number of headers,
telling the client what kind of data is following. Python code to
generate a minimal header section looks like this:

\begin{verbatim}
print "Content-type: text/html"     # HTML is following
print                               # blank line, end of headers
\end{verbatim}
%
The second section is usually HTML, which allows the client software
to display nicely formatted text with headers, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

\begin{verbatim}
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
\end{verbatim}
%
(It may not be fully legal HTML according to the letter of the
standard, but any browser will understand it.)
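
As a minimal sketch of the two-section layout described above, the
hypothetical helper \code{build_response()} (not part of the \code{cgi}
module) joins a header section, the separating blank line, and a body
into one string. It is written so that it also runs under current
Python versions; the default content type is an assumption:

\begin{verbatim}
def build_response(body, content_type="text/html"):
    # First section: the headers, one per line.
    headers = "Content-type: " + content_type + "\n"
    # An empty line ends the headers; the body follows it.
    return headers + "\n" + body

page = build_response("<H1>This is my first CGI script</H1>\nHello, world!\n")
\end{verbatim}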

\subsection{Using the cgi module}
\nodename{Using the cgi module}

Begin by writing \code{import cgi}. Don't use \code{from cgi import *} -- the
module defines all sorts of names for its own use or for backward
compatibility that you don't want in your namespace.

It's best to use the \code{FieldStorage} class. The other classes defined in this
module are provided mostly for backward compatibility. Instantiate
\code{FieldStorage} exactly once, without arguments. This reads the form contents from
standard input or the environment (depending on the value of various
environment variables set according to the CGI standard). Since it may
consume standard input, it should be instantiated only once.

The \code{FieldStorage} instance can be accessed as if it were a Python
dictionary. For instance, the following code (which assumes that the
\code{Content-type} header and blank line have already been printed) checks that
the fields \code{name} and \code{addr} are both set to a non-empty string:

\begin{verbatim}
form = cgi.FieldStorage()
form_ok = 0
if form.has_key("name") and form.has_key("addr"):
    if form["name"].value != "" and form["addr"].value != "":
        form_ok = 1
if not form_ok:
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    return      # assumes this code is inside a function
...further form processing here...
\end{verbatim}
%
Here the fields, accessed through \code{form[key]}, are themselves instances
of \code{FieldStorage} (or \code{MiniFieldStorage}, depending on the form encoding).

If the submitted form data contains more than one field with the same
name, the object retrieved by \code{form[key]} is not a \code{(Mini)FieldStorage}
instance but a list of such instances. If you expect this possibility
(i.e., when your HTML form contains multiple fields with the same
name), use the \code{type()} function to determine whether you have a single
instance or a list of instances. For example, here's code that
concatenates any number of username fields, separated by commas:

\begin{verbatim}
username = form["username"]
if type(username) is type([]):
    # Multiple username fields specified
    usernames = ""
    for item in username:
        if usernames:
            # Next item -- insert comma
            usernames = usernames + "," + item.value
        else:
            # First item -- don't insert comma
            usernames = item.value
else:
    # Single username field specified
    usernames = username.value
\end{verbatim}
%
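The type test can also be hidden in a small helper. The sketch below is
written for current Python and uses a hypothetical stand-in class,
\code{FakeField}, in place of real \code{(Mini)FieldStorage} instances so
that it runs without a server; \code{as_list()} and \code{joined_values()}
are illustrative names, not part of the \code{cgi} module:

\begin{verbatim}
class FakeField:
    # Hypothetical stand-in for a field object; only .value is modelled.
    def __init__(self, value):
        self.value = value

def as_list(item):
    # form[key] is either a single field object or a list of them.
    if type(item) is type([]):
        return item
    return [item]

def joined_values(item, sep=","):
    # Collect the value of every field and join them with the separator.
    values = [field.value for field in as_list(item)]
    return sep.join(values)

single = joined_values(FakeField("joe"))
several = joined_values([FakeField("joe"), FakeField("ann")])
\end{verbatim}
%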
If a field represents an uploaded file, the \code{value} attribute reads the
entire file into memory as a string. This may not be what you want. You can
test for an uploaded file by testing either the \code{filename} attribute or the
\code{file} attribute. You can then read the data at leisure from the \code{file}
attribute:

\begin{verbatim}
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
\end{verbatim}
%
The file upload draft standard entertains the possibility of uploading
multiple files from one field (using a recursive \code{multipart/*}
encoding). When this occurs, the item will be a dictionary-like
\code{FieldStorage} item. This can be determined by testing its \code{type}
attribute, which should have the value \code{multipart/form-data} (or
perhaps another string beginning with \code{multipart/}). In this case, it
can be iterated over recursively just like the top-level form object.

When a form is submitted in the ``old'' format (as the query string or as a
single data part of type \code{application/x-www-form-urlencoded}), the items
will actually be instances of the class \code{MiniFieldStorage}. In this case,
the \code{list}, \code{file} and \code{filename} attributes are always \code{None}.


\subsection{Old classes}

These classes, present in earlier versions of the \code{cgi} module, are still
supported for backward compatibility. New applications should use the
\code{FieldStorage} class.

\code{SvFormContentDict}:
single-value form content as a dictionary; assumes each
field name occurs in the form only once.

\code{FormContentDict}:
multiple-value form content as a dictionary (the form
items are lists of values). Useful if your form contains multiple
fields with the same name.

Other classes (\code{FormContent}, \code{InterpFormContentDict}) are present for
backward compatibility with really old applications only. If you still
use these and would be inconvenienced if they disappeared from a future
version of this module, drop me a note.


\subsection{Functions}
\nodename{Functions in cgi module}

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

\begin{funcdesc}{parse}{fp}
Parse a query in the environment or from a file (default \code{sys.stdin}).
\end{funcdesc}

\begin{funcdesc}{parse_qs}{qs}
Parse a query string given as a string argument (data of type
\code{application/x-www-form-urlencoded}).
\end{funcdesc}
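
To illustrate the shape of the result, the sketch below uses the
\code{parse_qs()} function from the \code{urllib.parse} module of Python 3
rather than the one in this module; it is assumed here to behave like
the function described above for simple query strings. Each key maps to
a list of values, even when the field occurs only once:

\begin{verbatim}
from urllib.parse import parse_qs

# "+" stands for a space in urlencoded data.
fields = parse_qs("name=Joe+Blow&addr=At+Home")

# A field that occurs twice yields a two-element list.
repeated = parse_qs("username=joe&username=ann")
\end{verbatim}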

\begin{funcdesc}{parse_multipart}{fp\, pdict}
Parse input of type \code{multipart/form-data} (for
file uploads). Arguments are \code{fp} for the input file and
\code{pdict} for the dictionary containing other parameters of the
\code{content-type} header.

Returns a dictionary just like \code{parse_qs()}:
keys are the field names, each
value is a list of values for that field. This is easy to use but not
much good if you are expecting megabytes to be uploaded -- in that case,
use the \code{FieldStorage} class instead, which is much more flexible. Note
that \code{content-type} is the raw, unparsed contents of the \code{content-type}
header.

Note that this does not parse nested multipart parts -- use \code{FieldStorage} for
that.
\end{funcdesc}

\begin{funcdesc}{parse_header}{string}
Parse a header like \code{Content-type} into a main
content-type and a dictionary of parameters.
\end{funcdesc}

\begin{funcdesc}{test}{}
Robust test CGI script, usable as main program.
Writes minimal HTTP headers and formats all information provided to
the script in HTML form.
\end{funcdesc}

\begin{funcdesc}{print_environ}{}
Format the shell environment in HTML.
\end{funcdesc}

\begin{funcdesc}{print_form}{form}
Format a form in HTML.
\end{funcdesc}

\begin{funcdesc}{print_directory}{}
Format the current directory in HTML.
\end{funcdesc}

\begin{funcdesc}{print_environ_usage}{}
Print a list of useful (used by CGI) environment variables in
HTML.
\end{funcdesc}

\begin{funcdesc}{escape}{s\optional{\, quote}}
Convert the characters
``\code{\&}'', ``\code{<}'' and ``\code{>}'' in string \var{s} to HTML-safe
sequences. Use this if you need to display text that might contain
such characters in HTML. If the optional flag \var{quote} is true,
the double quote character (\code{"}) is also translated; this helps
for inclusion in an HTML attribute value, e.g. in ``\code{<A HREF="...">}''.
\end{funcdesc}
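
The translation performed by \code{escape()} can be approximated in a few
lines. This is a sketch under the assumption that the characters map to
the usual HTML entities; \code{escape_sketch()} is a hypothetical name and
not the module's own implementation:

\begin{verbatim}
def escape_sketch(s, quote=False):
    # Replace "&" first so the entities produced below are not
    # escaped a second time.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        # Only needed when the text goes inside an attribute value.
        s = s.replace('"', "&quot;")
    return s

text = escape_sketch("a < b & c")
attr = escape_sketch('<A HREF="x">', True)
\end{verbatim}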


\subsection{Caring about security}

There's one important rule: if you invoke an external program (e.g.
via the \code{os.system()} or \code{os.popen()} functions), make very sure you don't
pass arbitrary strings received from the client to the shell. This is
a well-known security hole whereby clever hackers anywhere on the web
can exploit a gullible CGI script to invoke arbitrary shell commands.
Even parts of the URL or field names cannot be trusted, since the
request doesn't have to come from your form!

To be on the safe side, if you must pass a string gotten from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
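
One way to apply this rule is a whitelist test made before any shell
command is built. The following is a minimal sketch;
\code{is_shell_safe()} is a hypothetical helper, and it assumes that
``alphanumeric'' means ASCII letters and digits:

\begin{verbatim}
import re

# Letters, digits, dash, underscore and period only.  \Z anchors at
# the very end of the string, so a trailing newline is rejected too.
_SAFE = re.compile(r"[A-Za-z0-9_.-]+\Z")

def is_shell_safe(s):
    # True only if every character of s is on the whitelist.
    return _SAFE.match(s) is not None

ok = is_shell_safe("report-1998_v2.txt")
bad = is_shell_safe("x; rm -rf /")
\end{verbatim}

A string that fails the test should be rejected outright rather than
repaired.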


\subsection{Installing your CGI script on a Unix system}

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is a directory \file{cgi-bin} in the server tree.

Make sure that your script is readable and executable by ``others''; the
\UNIX{} file mode should be 755 (use \code{chmod 755 filename}). Make sure
that the first line of the script contains \code{\#!} starting in column 1
followed by the pathname of the Python interpreter, for instance:

\begin{verbatim}
#!/usr/local/bin/python
\end{verbatim}
%
Make sure the Python interpreter exists and is executable by ``others''.

Make sure that any files your script needs to read or write are
readable or writable, respectively, by ``others'' -- their mode should
be 644 for readable and 666 for writable. This is because, for
security reasons, the HTTP server executes your script as user
``nobody'', without any special privileges. It can only read (write,
execute) files that everybody can read (write, execute). The current
directory at execution time is also different (it is usually the
server's \file{cgi-bin} directory) and the set of environment variables is
also different from what you get at login. In particular, don't count
on the shell's search path for executables (\code{\$PATH}) or the Python
module search path (\code{\$PYTHONPATH}) to be set to anything interesting.
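
The permission bits for ``others'' can be checked from Python with the
constants in the standard \code{stat} module. In this sketch the helper
names are hypothetical, and the mode is passed in as an integer such as
the \code{st_mode} value returned by \code{os.stat()}:

\begin{verbatim}
import stat

def others_can_read(mode):
    # S_IROTH is the "read by others" permission bit.
    return (mode & stat.S_IROTH) != 0

def others_can_write(mode):
    # S_IWOTH is the "write by others" permission bit.
    return (mode & stat.S_IWOTH) != 0

def others_can_execute(mode):
    # S_IXOTH is the "execute by others" permission bit.
    return (mode & stat.S_IXOTH) != 0

script_ok = others_can_read(0o755) and others_can_execute(0o755)
data_ok = others_can_write(0o666)
\end{verbatim}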

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

\begin{verbatim}
import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")
\end{verbatim}
%
(This way, the directory inserted last will be searched first!)
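
The effect of the two \code{insert()} calls can be checked on an ordinary
list; \code{search_path} below is a stand-in for \code{sys.path} with one
assumed initial entry:

\begin{verbatim}
# Stand-in for sys.path; the initial entry is an assumption.
search_path = ["/usr/lib/python"]
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")

# The directory inserted last is now at the front of the list.
first = search_path[0]
\end{verbatim}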

Instructions for non-\UNIX{} systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


\subsection{Testing your CGI script}

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server. There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section:


\subsection{Debugging CGI scripts}

First of all, check for trivial installation errors -- reading the
section above on installing your CGI script carefully can save you a
lot of time. If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (\file{cgi.py}) as a CGI script. When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request. If it's installed
in the standard \file{cgi-bin} directory, it should be possible to send it a
request by entering a URL into your browser of the form:

\begin{verbatim}
http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
\end{verbatim}
%
If this gives an error of type 404, the server cannot find the script
-- perhaps you need to install it in a different directory. If it
gives another error (e.g. 500), there's an installation problem that
you should fix before trying to go any further. If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as ``addr'' with value ``At Home''
and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
installed correctly. If you follow the same procedure for your own
script, you should now be able to debug it.

The next step could be to call the \code{cgi} module's \code{test()}
function from your script: replace its main code with the single
statement

\begin{verbatim}
cgi.test()
\end{verbatim}
%
This should produce the same results as those gotten from installing
the \file{cgi.py} file itself.

When an ordinary Python script raises an unhandled exception
(e.g. because of a typo in a module name, a file that can't be opened,
etc.), the Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script
raises an exception, most likely the traceback will end up in one of
the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
\emph{some} code, it is easy to catch exceptions and cause a traceback to
be printed. The \code{test()} function in this module is an example.
Here are the rules:

\begin{enumerate}
  \item Import the traceback module (before entering the
  try-except!)

  \item Make sure you finish printing the headers and the blank
  line early

  \item Assign \code{sys.stderr} to \code{sys.stdout}

  \item Wrap all remaining code in a try-except statement

  \item In the except clause, call \code{traceback.print_exc()}
\end{enumerate}

For example:

\begin{verbatim}
import sys
import traceback
print "Content-type: text/html"
print
sys.stderr = sys.stdout
try:
    ...your code here...
except:
    print "\n\n<PRE>"
    traceback.print_exc()
\end{verbatim}
%
Notes: The assignment to \code{sys.stderr} is needed because the traceback
prints to \code{sys.stderr}.
The \code{print "{\e}n{\e}n<PRE>"} statement is necessary to
disable the word wrapping in HTML.

If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):

\begin{verbatim}
import sys
sys.stderr = sys.stdout
print "Content-type: text/plain"
print
...your code here...
\end{verbatim}
%
This relies on the Python interpreter to print the traceback. The
content type of the output is set to plain text, which disables all
HTML processing. If your script works, the raw HTML will be displayed
by your client. If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.


\subsection{Common problems and solutions}

\begin{itemize}
\item Most HTTP servers buffer the output from CGI scripts until the
script is completed. This means that it is not possible to display a
progress report on the client's display while the script is running.

\item Check the installation instructions above.

\item Check the HTTP server's log files. (\code{tail -f logfile} in a separate
window may be useful!)

\item Always check a script for syntax errors first, by doing something
like \code{python script.py}.

\item When using any of the debugging techniques, don't forget to add
\code{import sys} to the top of the script.

\item When invoking external programs, make sure they can be found.
Usually, this means using absolute path names -- \code{\$PATH} is usually not
set to a very useful value in a CGI script.

\item When reading or writing external files, make sure they can be read
or written by every user on the system.

\item Don't try to give a CGI script a set-uid mode. This doesn't work on
most systems, and is a security liability as well.
\end{itemize}
