:mod:`cgi` --- Common Gateway Interface support.
================================================

.. module:: cgi
   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.


.. index::
   pair: WWW; server
   pair: CGI; protocol
   pair: HTTP; protocol
   pair: MIME; headers
   single: URL
   single: Common Gateway Interface

Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in
Python.


Introduction
------------

.. _cgi-intro:

A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.

Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
The HTTP server places information about the request (such as the client's
hostname, the requested URL, and the query string) in the script's shell
environment, executes the script, and sends the script's output back to the
client.

The script's standard input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via the
"query string" part of the URL. This module is intended to take care of the
different cases and provide a simpler interface to the Python script. It also
provides a number of utilities that help in debugging scripts, and the latest
addition is support for file uploads from a form (if your browser supports it).

The output of a CGI script should consist of two sections, separated by a blank
line. The first section contains a number of headers, telling the client what
kind of data is following. Python code to generate a minimal header section
looks like this::

   print "Content-Type: text/html"     # HTML is following
   print                               # blank line, end of headers

The second section is usually HTML, which allows the client software to display
nicely formatted text with headers, in-line images, etc. Here's Python code that
prints a simple piece of HTML::

   print "<TITLE>CGI script output</TITLE>"
   print "<H1>This is my first CGI script</H1>"
   print "Hello, world!"

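The two-section layout described above can also be assembled as one string
before it is written out. The following is only a minimal sketch and not part
of the :mod:`cgi` module: the helper name ``build_response`` is hypothetical,
and ``sys.stdout.write`` is used instead of ``print`` so the sketch does not
depend on whether ``print`` is a statement or a function.

```python
import sys


def build_response(body_lines, content_type="text/html"):
    """Return a complete CGI response: headers, a blank line, then the body."""
    header_section = "Content-Type: " + content_type + "\n"
    # The empty line after the headers marks the end of the header section.
    return header_section + "\n" + "\n".join(body_lines) + "\n"


response = build_response([
    "<TITLE>CGI script output</TITLE>",
    "<H1>This is my first CGI script</H1>",
    "Hello, world!",
])
sys.stdout.write(response)
```

Keeping the headers and the body in separate pieces until the last moment makes
it harder to forget the blank line that separates them.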

.. _using-the-cgi-module:

Using the cgi module
--------------------

Begin by writing ``import cgi``. Do not use ``from cgi import *``: the
module defines all sorts of names for its own use or for backward compatibility
that you don't want in your namespace.

When you write a new script, consider adding the line::

   import cgitb; cgitb.enable()

This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur. If you'd rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with a line like this::

   import cgitb; cgitb.enable(display=0, logdir="/tmp")

It's very helpful to use this feature during script development. The reports
produced by :mod:`cgitb` provide information that can save you a lot of time in
tracking down bugs. You can always remove the ``cgitb`` line later when you
have tested your script and are confident that it works correctly.

To get at submitted form data, it's best to use the :class:`FieldStorage` class.
The other classes defined in this module are provided mostly for backward
compatibility. Instantiate :class:`FieldStorage` exactly once, without
arguments. This reads the form contents from standard input or the environment
(depending on the value of various environment variables set according to the
CGI standard). Since it may consume standard input, it should be instantiated
only once.

The :class:`FieldStorage` instance can be indexed like a Python dictionary, and
also supports the standard dictionary methods :meth:`has_key` and :meth:`keys`.
The built-in :func:`len` is also supported. Form fields containing empty
strings are ignored and do not appear in the dictionary; to keep such values,
provide a true value for the optional *keep_blank_values* keyword parameter when
creating the :class:`FieldStorage` instance.

For instance, the following code (which assumes that the
:mailheader:`Content-Type` header and blank line have already been printed)
checks that the fields ``name`` and ``addr`` are both set to a non-empty
string::

   form = cgi.FieldStorage()
   if not (form.has_key("name") and form.has_key("addr")):
       print "<H1>Error</H1>"
       print "Please fill in the name and addr fields."
       raise SystemExit    # stop here; 'return' would only work inside a function
   print "<p>name:", form["name"].value
   print "<p>addr:", form["addr"].value
   # ...further form processing here...

Here the fields, accessed through ``form[key]``, are themselves instances of
:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
encoding). The :attr:`value` attribute of the instance yields the string value
of the field. The :meth:`getvalue` method returns this string value directly;
it also accepts an optional second argument as a default to return if the
requested key is not present.

If the submitted form data contains more than one field with the same name, the
object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
:class:`MiniFieldStorage` instance but a list of such instances. Similarly, in
this situation, ``form.getvalue(key)`` would return a list of strings. If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the :meth:`getlist` method, which always returns a list of
values (so that you do not need to special-case the single item case). For
example, this code concatenates any number of username fields, separated by
commas::

   value = form.getlist("username")
   usernames = ",".join(value)

If a field represents an uploaded file, accessing the value via the
:attr:`value` attribute or the :meth:`getvalue` method reads the entire file
into memory as a string. This may not be what you want. You can test for an
uploaded file by testing either the :attr:`filename` attribute or the
:attr:`file` attribute. You can then read the data at leisure from the
:attr:`file` attribute::

   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count lines
       linecount = 0
       while 1:
           line = fileitem.file.readline()
           if not line: break
           linecount = linecount + 1

The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive :mimetype:`multipart/\*` encoding).
When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
This can be determined by testing its :attr:`type` attribute, which should be
:mimetype:`multipart/form-data` (or perhaps another MIME type matching
:mimetype:`multipart/\*`). In this case, it can be iterated over recursively
just like the top-level form object.

When a form is submitted in the "old" format (as the query string or as a single
data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
actually be instances of the class :class:`MiniFieldStorage`. In this case, the
:attr:`list`, :attr:`file`, and :attr:`filename` attributes are always ``None``.


Higher Level Interface
----------------------

.. versionadded:: 2.2

The previous section explains how to read CGI form data using the
:class:`FieldStorage` class. This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way. The interface doesn't make the techniques described in previous
sections obsolete; they are still useful to process file uploads efficiently,
for example.

.. % XXX: Is this true ?

The interface consists of two simple methods. Using the methods you can process
form data in a generic way, without the need to worry whether only one or more
values were posted under one name.

In the previous section, you learned to write the following code any time you
expected a user to post more than one value under one name::

   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       pass
   else:
       # The user is requesting only one item.
       pass

This situation is common, for example, when a form contains a group of multiple
checkboxes with the same name::

   <input type="checkbox" name="item" value="1" />
   <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular
name in a form and then you expect and need only one value associated with this
name. So you write a script containing for example this code::

   user = form.getvalue("user").upper()

The problem with the code is that you should never expect that a client will
provide valid input to your scripts. For example, if a curious user appends
another ``user=foo`` pair to the query string, then the script would crash,
because in this situation the ``getvalue("user")`` method call returns a list
instead of a string. Calling the :meth:`upper` method on a list is not valid
(since lists do not have a method of this name) and results in an
:exc:`AttributeError` exception.

Therefore, the appropriate way to read form data values was to always use the
code which checks whether the obtained value is a single value or a list of
values. That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods :meth:`getfirst` and
:meth:`getlist` provided by this higher level interface.


.. method:: FieldStorage.getfirst(name[, default])

   This method always returns only one value associated with form field *name*.
   If more than one value was posted under that name, the method returns only
   the first one. Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_ If no such
   form field or value exists then the method returns the value specified by the
   optional parameter *default*. This parameter defaults to ``None`` if not
   specified.


.. method:: FieldStorage.getlist(name)

   This method always returns a list of values associated with form field *name*.
   The method returns an empty list if no such form field or value exists for
   *name*. It returns a list consisting of one item if only one such value exists.

Using these methods you can write nice compact code::

   import cgi
   form = cgi.FieldStorage()
   user = form.getfirst("user", "").upper()    # This way it's safe.
   for item in form.getlist("item"):
       do_something(item)

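The contract of the two methods can be imitated on a plain dictionary of lists,
such as the one returned by :func:`parse_qs`. The sketch below is an
illustration only, not the module's implementation: ``get_first`` and
``get_list`` are hypothetical helper names, and the ``urllib.parse`` import is
an assumption about where :func:`parse_qs` lives on newer interpreters.

```python
try:
    from urllib.parse import parse_qs   # assumed location on Python 3
except ImportError:
    from cgi import parse_qs            # the function documented here


def get_list(fields, name):
    """Always return a list, empty when the field is missing."""
    return list(fields.get(name, []))


def get_first(fields, name, default=None):
    """Return the first value for name, or default when there is none."""
    values = fields.get(name)
    if not values:
        return default
    return values[0]


# A query string in which a curious user repeated the "user" field.
fields = parse_qs("user=joe&user=foo&item=1&item=2")

user = get_first(fields, "user", "").upper()   # a string, never a list
items = get_list(fields, "item")               # both posted values
missing = get_list(fields, "nothing")          # empty list, no exception
```

Because ``get_first`` never returns a list, the ``.upper()`` call stays safe
even when a field is repeated.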

Old classes
-----------

These classes, present in earlier versions of the :mod:`cgi` module, are still
supported for backward compatibility. New applications should use the
:class:`FieldStorage` class.

:class:`SvFormContentDict` stores single value form content as a dictionary; it
assumes each field name occurs in the form only once.

:class:`FormContentDict` stores multiple value form content as a dictionary (the
form items are lists of values). Useful if your form contains multiple fields
with the same name.

Other classes (:class:`FormContent`, :class:`InterpFormContentDict`) are present
for backwards compatibility with really old applications only. If you still use
these and would be inconvenienced if they disappeared from a future version of
this module, drop me a note.


.. _functions-in-cgi-module:

Functions
---------

These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp[, keep_blank_values[, strict_parsing]])

   Parse a query in the environment or from a file (the file defaults to
   ``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
   passed to :func:`parse_qs` unchanged.


.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]])

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in URL encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.urlencode` function to convert such dictionaries into
   query strings.

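The effect of *keep_blank_values* on :func:`parse_qs` is easiest to see side by
side. This is a small sketch; the fallback import from ``urllib.parse`` is an
assumption for interpreters where the function is no longer exposed by
:mod:`cgi`, and the query string is made up for the example.

```python
try:
    from urllib.parse import parse_qs   # assumed location on Python 3
except ImportError:
    from cgi import parse_qs            # the function documented here

query = "name=Joe+Blow&addr=At+Home&note="

# By default the empty "note" field is dropped from the result.
default = parse_qs(query)

# With a true keep_blank_values, "note" is kept as an empty string.
with_blanks = parse_qs(query, keep_blank_values=True)

print(sorted(default.items()))
print(sorted(with_blanks.items()))
```

Every value is a list, even for names that occur only once.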

.. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]])

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in URL encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.urlencode` function to convert such lists of pairs into
   query strings.

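Unlike the dictionary returned by :func:`parse_qs`, the list returned by
:func:`parse_qsl` keeps the pairs in the order they appear in the query string,
so it can be turned back into a query string with ``urlencode``. A minimal
sketch follows; the ``urllib.parse`` imports are an assumption for newer
interpreters, and the query string is made up.

```python
try:
    from urllib.parse import parse_qsl, urlencode   # assumed Python 3 location
except ImportError:
    from cgi import parse_qsl                       # the function documented here
    from urllib import urlencode

# Repeated names stay as separate pairs, in their original order.
pairs = parse_qsl("item=1&user=joe&item=2")

# Feeding the pairs to urlencode rebuilds an equivalent query string.
rebuilt = urlencode(pairs)

print(pairs)
print(rebuilt)
```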

.. function:: parse_multipart(fp, pdict)

   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file and *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header.

   Returns a dictionary just like :func:`parse_qs`: keys are the field names, each
   value is a list of values for that field. This is easy to use but not much good
   if you are expecting megabytes to be uploaded; in that case, use the
   :class:`FieldStorage` class instead, which is much more flexible.

   Note that this does not parse nested multipart parts; use
   :class:`FieldStorage` for that.


.. function:: parse_header(string)

   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
   dictionary of parameters.


.. function:: test()

   Robust test CGI script, usable as main program. Writes minimal HTTP headers and
   formats all information provided to the script in HTML form.


.. function:: print_environ()

   Format the shell environment in HTML.


.. function:: print_form(form)

   Format a form in HTML.


.. function:: print_directory()

   Format the current directory in HTML.


.. function:: print_environ_usage()

   Print a list of useful (used by CGI) environment variables in HTML.


.. function:: escape(s[, quote])

   Convert the characters ``'&'``, ``'<'`` and ``'>'`` in string *s* to HTML-safe
   sequences. Use this if you need to display text that might contain such
   characters in HTML. If the optional flag *quote* is true, the quotation mark
   character (``'"'``) is also translated; this helps for inclusion in an HTML
   attribute value, as in ``<A HREF="...">``. If the value to be quoted might
   include single- or double-quote characters, or both, consider using the
   :func:`quoteattr` function in the :mod:`xml.sax.saxutils` module instead.

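The difference the *quote* flag of :func:`escape` makes can be shown on a short
string. This is a sketch under one assumption: on interpreters where
:mod:`cgi` no longer provides ``escape``, it falls back to ``html.escape``,
which behaves the same way for the characters used here (it additionally
translates single quotes when *quote* is true).

```python
try:
    from html import escape    # assumed replacement on Python 3
except ImportError:
    from cgi import escape     # the function documented here

text = 'Tom & Jerry say "1 < 2"'

# Without the quote flag only &, < and > are translated.
as_body = escape(text, False)

# With a true quote flag the double quotes are translated as well,
# which is what an HTML attribute value needs.
as_attr = escape(text, True)

print(as_body)
print(as_attr)
```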

.. _cgi-security:

Caring about security
---------------------

.. index:: pair: CGI; security

There's one important rule: if you invoke an external program (via the
:func:`os.system` or :func:`os.popen` functions, or others with similar
functionality), make very sure you don't pass arbitrary strings received from
the client to the shell. This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands. Even parts of the URL or field names cannot be
trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string obtained from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.

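The character rule above is straightforward to enforce with a regular
expression. The following is a minimal sketch, not a complete defence: the
function name ``is_shell_safe`` is hypothetical, and it checks the character
set only, so a value such as ``..`` still passes and may need separate
handling when it is used as a file name.

```python
import re

# Letters, digits, periods, underscores and dashes only; \Z (rather than $)
# rejects a value that ends with a newline.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9._-]+\Z")


def is_shell_safe(value):
    """Return True if value contains only the characters allowed above."""
    return bool(_SAFE_VALUE.match(value))


print(is_shell_safe("report-2007.txt"))
print(is_shell_safe("x; rm -rf /"))
```

A script would refuse the request when ``is_shell_safe`` returns false instead
of trying to repair the value.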

Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory :file:`cgi-bin` in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file
mode should be ``0755`` octal (use ``chmod 0755 filename``). Make sure that the
first line of the script contains ``#!`` starting in column 1 followed by the
pathname of the Python interpreter, for instance::

   #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or
writable, respectively, by "others": their mode should be ``0644`` for
readable and ``0666`` for writable. This is because, for security reasons, the
HTTP server executes your script as user "nobody", without any special
privileges. It can only read (write, execute) files that everybody can read
(write, execute). The current directory at execution time is also different (it
is usually the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in. In particular, don't count
on the shell's search path for executables (:envvar:`PATH`) or the Python module
search path (:envvar:`PYTHONPATH`) to be set to anything interesting.

If you need to load modules from a directory which is not on Python's default
module search path, you can change the path in your script, before importing
other modules. For example::

   import sys
   sys.path.insert(0, "/usr/home/joe/lib/python")
   sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).

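The permission requirements in this section can be checked from Python before
a script is installed. This is a rough sketch for Unix systems: the function
name ``world_can_read_and_execute`` is hypothetical, and it inspects only the
permission bits for "others", not the directories leading to the file.

```python
import os
import stat
import sys


def world_can_read_and_execute(path):
    """Return True if "others" may both read and execute the file at path."""
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH   # the r-x bits for "others", as in 0755
    return mode & needed == needed


# The interpreter named on the script's first line must pass this check too.
print(world_can_read_and_execute(sys.executable))
```

The same function can be pointed at the script file itself after running
``chmod 0755`` on it.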

Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server. There's one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won't execute it at all, and the HTTP server will most likely
send a cryptic error to the client.

Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.


Debugging CGI scripts
---------------------

.. index:: pair: CGI; debugging

First of all, check for trivial installation errors; reading the section
above on installing your CGI script carefully can save you a lot of time. If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script. When
invoked as a script, the file will dump its environment and the contents of the
form in HTML form. Give it the right mode etc., and send it a request. If it's
installed in the standard :file:`cgi-bin` directory, it should be possible to
send it a request by entering a URL into your browser of the form::

   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script; perhaps
you need to install it in a different directory. If it gives another error,
there's an installation problem that you should fix before trying to go any
further. If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as "addr" with value "At
Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been
installed correctly. If you follow the same procedure for your own script, you
should now be able to debug it.

The next step could be to call the :mod:`cgi` module's :func:`test` function
from your script: replace its main code with the single statement ::

   cgi.test()

This should produce the same results as those obtained from installing the
:file:`cgi.py` file itself.

When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
Python interpreter prints a nice traceback and exits. While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server's log files, or be
discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code,
you can easily send tracebacks to the Web browser using the :mod:`cgitb` module.
If you haven't done so already, just add the line::

   import cgitb; cgitb.enable()

to the top of your script. Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.

If you suspect that there may be a problem in importing the :mod:`cgitb` module,
you can use an even more robust approach (which only uses built-in modules)::

   import sys
   sys.stderr = sys.stdout
   print "Content-Type: text/plain"
   print
   # ...your code here...

This relies on the Python interpreter to print the traceback. The content type
of the output is set to plain text, which disables all HTML processing. If your
script works, the raw HTML will be displayed by your client. If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed. Because no HTML interpretation is going on, the traceback
will be readable.

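A variant of the plain-text technique is to catch the exception yourself and
write the traceback to the response. The sketch below is one possible way to
do that and is not something the :mod:`cgi` module provides: the names
``run_with_plain_traceback`` and ``broken_main`` are hypothetical, and it
relies on the standard :mod:`traceback` module rather than on the interpreter's
own exit path.

```python
import sys
import traceback


def run_with_plain_traceback(main, out=None):
    """Emit plain-text headers, run main(), and report any exception in the body."""
    if out is None:
        out = sys.stdout
    # Plain text keeps the browser from interpreting the traceback as HTML.
    out.write("Content-Type: text/plain\n\n")
    try:
        main()
    except Exception:
        # Send the traceback to the client instead of the server's log.
        out.write(traceback.format_exc())


def broken_main():
    """Stand-in for the real script body; it fails on purpose."""
    raise ValueError("something went wrong")


run_with_plain_traceback(broken_main)
```

As with ``cgitb.enable()``, this is a development aid; the wrapper would
normally be removed once the script works.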

Common problems and solutions
-----------------------------

* Most HTTP servers buffer the output from CGI scripts until the script is
  completed. This means that it is not possible to display a progress report on
  the client's display while the script is running.

* Check the installation instructions above.

* Check the HTTP server's log files. (``tail -f logfile`` in a separate window
  may be useful!)

* Always check a script for syntax errors first, by doing something like
  ``python script.py``.

* If your script does not have any syntax errors, try adding ``import cgitb;
  cgitb.enable()`` to the top of the script.

* When invoking external programs, make sure they can be found. Usually, this
  means using absolute path names, since :envvar:`PATH` is usually not set to a
  very useful value in a CGI script.

* When reading or writing external files, make sure they can be read or written
  by the userid under which your CGI script will be running: this is typically the
  userid under which the web server is running, or some explicitly specified
  userid for a web server's ``suexec`` feature.

* Don't try to give a CGI script a set-uid mode. This doesn't work on most
  systems, and is a security liability as well.

.. rubric:: Footnotes

.. [#] Note that some recent versions of the HTML specification do state what order the
   field values should be supplied in, but knowing whether a request was
   received from a conforming browser, or even from a browser at all, is tedious
   and error-prone.
