:mod:`cgi` --- Common Gateway Interface support
===============================================

.. module:: cgi
   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.


.. index::
   pair: WWW; server
   pair: CGI; protocol
   pair: HTTP; protocol
   pair: MIME; headers
   single: URL
   single: Common Gateway Interface

**Source code:** :source:`Lib/cgi.py`

--------------

Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in
Python.


Introduction
------------

.. _cgi-intro:

A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.

Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
The HTTP server places all sorts of information about the request (such as the
client's hostname, the requested URL, the query string, and lots of other
goodies) in the script's shell environment, executes the script, and sends the
script's output back to the client.

The script's input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the "query string"
part of the URL. This module is intended to take care of the different cases
and provide a simpler interface to the Python script. It also provides a number
of utilities that help in debugging scripts, and the latest addition is support
for file uploads from a form (if your browser supports it).

The output of a CGI script should consist of two sections, separated by a blank
line. The first section contains a number of headers, telling the client what
kind of data is following. Python code to generate a minimal header section
looks like this::

   print("Content-Type: text/html")    # HTML is following
   print()                             # blank line, end of headers

The second section is usually HTML, which allows the client software to display
nicely formatted text with header, in-line images, etc. Here's Python code that
prints a simple piece of HTML::

   print("<TITLE>CGI script output</TITLE>")
   print("<H1>This is my first CGI script</H1>")
   print("Hello, world!")

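The two sections can also be assembled in one place, which makes the blank
separator line harder to forget. The following is a minimal sketch; the helper
name ``build_response`` and its parameters are hypothetical and are not part of
the :mod:`cgi` module.

```python
def build_response(title, body_html):
    """Return a complete CGI response: headers, a blank line, then HTML."""
    # First section: the headers (only Content-Type in this sketch).
    headers = ["Content-Type: text/html"]
    # Second section: the HTML document itself.
    document = [
        "<TITLE>{}</TITLE>".format(title),
        "<H1>{}</H1>".format(title),
        body_html,
    ]
    # A single empty line separates the header section from the body.
    return "\n".join(headers) + "\n\n" + "\n".join(document) + "\n"


if __name__ == "__main__":
    print(build_response("CGI script output", "Hello, world!"), end="")
```

The sketch inserts *title* and *body_html* verbatim, so it assumes both are
trusted text; anything that originates from the client should be escaped first.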

.. _using-the-cgi-module:

Using the cgi module
--------------------

Begin by writing ``import cgi``.

When you write a new script, consider adding these lines::

   import cgitb
   cgitb.enable()

This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur. If you'd rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with code like this::

   import cgitb
   cgitb.enable(display=0, logdir="/path/to/logdir")

It's very helpful to use this feature during script development. The reports
produced by :mod:`cgitb` provide information that can save you a lot of time in
tracking down bugs. You can always remove the ``cgitb`` line later when you
have tested your script and are confident that it works correctly.

To get at submitted form data, use the :class:`FieldStorage` class. If the form
contains non-ASCII characters, use the *encoding* keyword parameter set to the
value of the encoding defined for the document (it is usually given in the
META tag in the HEAD section of the HTML document or by the
:mailheader:`Content-Type` header). :class:`FieldStorage` reads the form
contents from the standard input or the environment (depending on the value of
various environment variables set according to the CGI standard). Since it may
consume standard input, it should be instantiated only once.

The :class:`FieldStorage` instance can be indexed like a Python dictionary.
It allows membership testing with the :keyword:`in` operator, and also supports
the standard dictionary method :meth:`keys` and the built-in function
:func:`len`. Form fields containing empty strings are ignored and do not appear
in the dictionary; to keep such values, provide a true value for the optional
*keep_blank_values* keyword parameter when creating the :class:`FieldStorage`
instance.

For instance, the following code (which assumes that the
:mailheader:`Content-Type` header and blank line have already been printed)
checks that the fields ``name`` and ``addr`` are both set to a non-empty
string::

   form = cgi.FieldStorage()
   if "name" not in form or "addr" not in form:
       print("<H1>Error</H1>")
       print("Please fill in the name and addr fields.")
   else:
       print("<p>name:", form["name"].value)
       print("<p>addr:", form["addr"].value)
       # ...further form processing here...

Here the fields, accessed through ``form[key]``, are themselves instances of
:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
encoding). The :attr:`value` attribute of the instance yields the string value
of the field. The :meth:`getvalue` method returns this string value directly;
it also accepts an optional second argument as a default to return if the
requested key is not present.

If the submitted form data contains more than one field with the same name, the
object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
:class:`MiniFieldStorage` instance but a list of such instances. Similarly, in
this situation, ``form.getvalue(key)`` would return a list of strings. If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the :meth:`getlist` method, which always returns a list of
values (so that you do not need to special-case the single item case). For
example, this code concatenates any number of username fields, separated by
commas::

   value = form.getlist("username")
   usernames = ",".join(value)

If a field represents an uploaded file, accessing the value via the
:attr:`value` attribute or the :meth:`getvalue` method reads the entire file
into memory as bytes. This may not be what you want. You can test for an
uploaded file by testing either the :attr:`filename` attribute or the
:attr:`!file` attribute. You can then read the data at leisure from the
:attr:`!file` attribute (the :meth:`read` and :meth:`readline` methods will
return bytes)::

   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count lines
       linecount = 0
       while True:
           line = fileitem.file.readline()
           if not line: break
           linecount = linecount + 1

If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button) the :attr:`done` attribute of the object for the
field will be set to the value -1.

The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive :mimetype:`multipart/\*` encoding).
When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
This can be determined by testing its :attr:`!type` attribute, which should be
:mimetype:`multipart/form-data` (or perhaps another MIME type matching
:mimetype:`multipart/\*`). In this case, it can be iterated over recursively
just like the top-level form object.

When a form is submitted in the "old" format (as the query string or as a single
data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
actually be instances of the class :class:`MiniFieldStorage`. In this case, the
:attr:`!list`, :attr:`!file`, and :attr:`filename` attributes are always ``None``.

A form submitted via POST that also has a query string will contain both
:class:`FieldStorage` and :class:`MiniFieldStorage` items.

Higher Level Interface
----------------------

The previous section explains how to read CGI form data using the
:class:`FieldStorage` class. This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way. The interface doesn't make the techniques described in previous
sections obsolete --- they are still useful to process file uploads efficiently,
for example.

.. XXX: Is this true ?

The interface consists of two simple methods. Using the methods you can process
form data in a generic way, without the need to worry whether only one or more
values were posted under one name.

In the previous section, you learned to write the following code anytime you
expected a user to post more than one value under one name::

   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       pass
   else:
       # The user is requesting only one item.
       pass

This situation is common, for example, when a form contains a group of multiple
checkboxes with the same name::

   <input type="checkbox" name="item" value="1" />
   <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular
name in a form and then you expect and need only one value associated with this
name. So you write a script containing, for example, this code::

   user = form.getvalue("user").upper()

The problem with the code is that you should never expect that a client will
provide valid input to your scripts. For example, if a curious user appends
another ``user=foo`` pair to the query string, then the script would crash,
because in this situation the ``getvalue("user")`` method call returns a list
instead of a string. Calling the :meth:`~str.upper` method on a list is not valid
(since lists do not have a method of this name) and results in an
:exc:`AttributeError` exception.

Therefore, the appropriate way to read form data values was to always use the
code which checks whether the obtained value is a single value or a list of
values. That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods :meth:`getfirst` and
:meth:`getlist` provided by this higher level interface.


.. method:: FieldStorage.getfirst(name, default=None)

   This method always returns only one value associated with form field *name*.
   If more than one value was posted under that name, the method returns only
   the first one. Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_ If no such
   form field or value exists then the method returns the value specified by the
   optional parameter *default*. This parameter defaults to ``None`` if not
   specified.


.. method:: FieldStorage.getlist(name)

   This method always returns a list of values associated with form field *name*.
   The method returns an empty list if no such form field or value exists for
   *name*. It returns a list consisting of one item if only one such value exists.

Using these methods you can write nice compact code::

   import cgi
   form = cgi.FieldStorage()
   user = form.getfirst("user", "").upper()    # This way it's safe.
   for item in form.getlist("item"):
       do_something(item)

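The contract of these two methods can be tried out without a web server. The
sketch below mimics it on the dictionary returned by
:func:`urllib.parse.parse_qs`; ``get_first`` and ``get_list`` are hypothetical
stand-ins written for this illustration, not the :class:`FieldStorage`
implementation.

```python
from urllib.parse import parse_qs


def get_first(fields, name, default=None):
    """Return the first value posted under *name*, or *default* if absent."""
    values = fields.get(name)
    return values[0] if values else default


def get_list(fields, name):
    """Return every value posted under *name* as a list (empty if absent)."""
    return list(fields.get(name, []))


# A query string in which a curious client sent ``user`` twice.
fields = parse_qs("user=alice&user=bob&item=1&item=2")

# Always a single string, so calling .upper() cannot fail on a list.
user = get_first(fields, "user", "").upper()

# Always a list, whether zero, one or many values were sent.
items = get_list(fields, "item")
```

Because both helpers have a fixed return type, the calling code needs no
``isinstance`` check for the single-value and multi-value cases.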

.. _functions-in-cgi-module:

Functions
---------

These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)

   Parse a query in the environment or from a file (the file defaults to
   ``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
   passed to :func:`urllib.parse.parse_qs` unchanged.

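Since *keep_blank_values* and *strict_parsing* are handed on to
:func:`urllib.parse.parse_qs`, their effect can be observed by calling that
function directly. The query strings below are made up for this sketch.

```python
from urllib.parse import parse_qs

query = "name=Joe+Blow&addr=&tag=a&tag=b"

# By default, fields whose value is an empty string are dropped.
default = parse_qs(query)

# With keep_blank_values=True the empty "addr" field is kept.
kept = parse_qs(query, keep_blank_values=True)

# With strict_parsing=True a malformed field raises ValueError
# instead of being silently ignored.
strict_error = None
try:
    parse_qs("novalue", strict_parsing=True)
except ValueError as exc:
    strict_error = exc

print(default)
print(kept)
print(type(strict_error).__name__)
```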

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False)

   This function is deprecated in this module. Use :func:`urllib.parse.parse_qs`
   instead. It is maintained here only for backward compatibility.

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False)

   This function is deprecated in this module. Use :func:`urllib.parse.parse_qsl`
   instead. It is maintained here only for backward compatibility.

.. function:: parse_multipart(fp, pdict)

   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file and *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header.

   Returns a dictionary just like :func:`urllib.parse.parse_qs`: keys are the
   field names, each value is a list of values for that field. This is easy to
   use but not much good if you are expecting megabytes to be uploaded --- in
   that case, use the :class:`FieldStorage` class instead, which is much more
   flexible.

   Note that this does not parse nested multipart parts --- use
   :class:`FieldStorage` for that.


.. function:: parse_header(string)

   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
   dictionary of parameters.

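To see what "a main value and a dictionary of parameters" means for a concrete
header, the same kind of split can be reproduced with the standard
:mod:`email` package. This sketch does not call :func:`parse_header`;
``split_content_type`` is a hypothetical helper written for illustration.

```python
from email.message import EmailMessage


def split_content_type(value):
    """Split a Content-Type value into (main value, dict of parameters)."""
    msg = EmailMessage()
    msg["Content-Type"] = value
    # get_content_type() yields the main value; .params holds the rest.
    return msg.get_content_type(), dict(msg["Content-Type"].params)


main, params = split_content_type('text/html; charset="utf-8"')
print(main)
print(params)
```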

.. function:: test()

   Robust test CGI script, usable as main program. Writes minimal HTTP headers and
   formats all information provided to the script in HTML form.


.. function:: print_environ()

   Format the shell environment in HTML.


.. function:: print_form(form)

   Format a form in HTML.


.. function:: print_directory()

   Format the current directory in HTML.


.. function:: print_environ_usage()

   Print a list of useful (used by CGI) environment variables in HTML.


.. function:: escape(s, quote=False)

   Convert the characters ``'&'``, ``'<'`` and ``'>'`` in string *s* to HTML-safe
   sequences. Use this if you need to display text that might contain such
   characters in HTML. If the optional flag *quote* is true, the quotation mark
   character (``"``) is also translated; this helps for inclusion in an HTML
   attribute value delimited by double quotes, as in ``<a href="...">``. Note
   that single quotes are never translated.

   .. deprecated:: 3.2
      This function is unsafe because *quote* is false by default, and therefore
      deprecated. Use :func:`html.escape` instead.

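The recommended replacement, :func:`html.escape`, translates quotation marks
unless told otherwise. A short sketch of the difference, using a made-up input
string:

```python
import html

raw = '<b>Tom & "Jerry"</b>'

# html.escape() translates quotation marks by default (quote=True).
escaped_all = html.escape(raw)

# Passing quote=False leaves quotation marks alone, like the old default.
escaped_min = html.escape(raw, quote=False)

print(escaped_all)   # &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
print(escaped_min)   # &lt;b&gt;Tom &amp; "Jerry"&lt;/b&gt;
```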

.. _cgi-security:

Caring about security
---------------------

.. index:: pair: CGI; security

There's one important rule: if you invoke an external program (via the
:func:`os.system` or :func:`os.popen` functions, or others with similar
functionality), make very sure you don't pass arbitrary strings received from
the client to the shell. This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands. Even parts of the URL or field names cannot be
trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string gotten from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.

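One way to enforce the character rule above is a whitelist check with a
regular expression. This is a minimal sketch; ``is_shell_safe`` is a
hypothetical helper name, and restricting "alphanumeric" to ASCII letters and
digits is an assumption made here.

```python
import re

# ASCII letters and digits, plus dash, underscore and period; nothing else.
_SAFE_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def is_shell_safe(value):
    """Return True if *value* is non-empty and made only of allowed characters."""
    return _SAFE_CHARS.fullmatch(value) is not None


for candidate in ("report-2024_v1.txt", "x; rm -rf /", "a b", ""):
    print(repr(candidate), is_shell_safe(candidate))
```

Note that this whitelist still admits a value that starts with a dash, which
some commands interpret as an option.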

Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory :file:`cgi-bin` in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file
mode should be ``0o755`` octal (use ``chmod 0755 filename``). Make sure that the
first line of the script contains ``#!`` starting in column 1 followed by the
pathname of the Python interpreter, for instance::

   #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or
writable, respectively, by "others" --- their mode should be ``0o644`` for
readable and ``0o666`` for writable. This is because, for security reasons, the
HTTP server executes your script as user "nobody", without any special
privileges. It can only read (write, execute) files that everybody can read
(write, execute). The current directory at execution time is also different (it
is usually the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in. In particular, don't count
on the shell's search path for executables (:envvar:`PATH`) or the Python module
search path (:envvar:`PYTHONPATH`) to be set to anything interesting.

If you need to load modules from a directory which is not on Python's default
module search path, you can change the path in your script, before importing
other modules. For example::

   import sys
   sys.path.insert(0, "/usr/home/joe/lib/python")
   sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).

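On a Unix system, the permission requirements for "others" described in this
section can be checked from Python before a script is deployed. The following
is a rough sketch; ``others_can`` is a hypothetical helper, and it inspects
only the mode bits of the file itself, not those of its parent directories.

```python
import os
import stat


def others_can(path, read=False, write=False, execute=False):
    """Check the "others" permission bits of *path* (Unix file modes)."""
    mode = os.stat(path).st_mode
    wanted = 0
    if read:
        wanted |= stat.S_IROTH
    if write:
        wanted |= stat.S_IWOTH
    if execute:
        wanted |= stat.S_IXOTH
    # Every requested bit must be set in the file's mode.
    return (mode & wanted) == wanted
```

For a script with mode ``0o755``, ``others_can(path, read=True, execute=True)``
is true; for a data file with mode ``0o644``, ``others_can(path, write=True)``
is false.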

Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server. There's one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won't execute it at all, and the HTTP server will most likely
send a cryptic error to the client.

Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.


Debugging CGI scripts
---------------------

.. index:: pair: CGI; debugging

First of all, check for trivial installation errors --- reading the section
above on installing your CGI script carefully can save you a lot of time. If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script. When
invoked as a script, the file will dump its environment and the contents of the
form in HTML form. Give it the right mode etc, and send it a request. If it's
installed in the standard :file:`cgi-bin` directory, it should be possible to
send it a request by entering a URL into your browser of the form::

   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script -- perhaps
you need to install it in a different directory. If it gives another error,
there's an installation problem that you should fix before trying to go any
further. If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as "addr" with value "At
Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been
installed correctly. If you follow the same procedure for your own script, you
should now be able to debug it.

The next step could be to call the :mod:`cgi` module's :func:`test` function
from your script: replace its main code with the single statement ::

   cgi.test()

This should produce the same results as those gotten from installing the
:file:`cgi.py` file itself.

When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
Python interpreter prints a nice traceback and exits. While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server's log files, or be
discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code,
you can easily send tracebacks to the Web browser using the :mod:`cgitb` module.
If you haven't done so already, just add the lines::

   import cgitb
   cgitb.enable()

to the top of your script. Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.

If you suspect that there may be a problem in importing the :mod:`cgitb` module,
you can use an even more robust approach (which only uses built-in modules)::

   import sys
   sys.stderr = sys.stdout
   print("Content-Type: text/plain")
   print()
   # ...your code here...

This relies on the Python interpreter to print the traceback. The content type
of the output is set to plain text, which disables all HTML processing. If your
script works, the raw HTML will be displayed by your client. If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed. Because no HTML interpretation is going on, the traceback
will be readable.

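A variation on the plain-text approach is to catch the exception yourself and
write the traceback explicitly, which also works when the output goes somewhere
other than ``sys.stdout``. This is only a sketch: it assumes the standard
:mod:`traceback` module can be imported, and ``run_and_report`` and ``broken``
are hypothetical names used for the demonstration.

```python
import io
import sys
import traceback


def run_and_report(main, out=None):
    """Call *main*; on failure write the traceback to *out* as plain text."""
    out = sys.stdout if out is None else out
    # Plain text keeps the client from interpreting the report as HTML.
    out.write("Content-Type: text/plain\n\n")
    try:
        main()
    except Exception:
        # format_exc() returns the traceback text for the current exception.
        out.write(traceback.format_exc())


def broken():
    raise ValueError("demo failure")   # stand-in for a buggy script body


buffer = io.StringIO()
run_and_report(broken, buffer)
print(buffer.getvalue())
```

Writing into a ``StringIO`` buffer, as done here, makes the behavior easy to
check from the command line before the script is installed on a server.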

Common problems and solutions
-----------------------------

* Most HTTP servers buffer the output from CGI scripts until the script is
  completed. This means that it is not possible to display a progress report on
  the client's display while the script is running.

* Check the installation instructions above.

* Check the HTTP server's log files. (``tail -f logfile`` in a separate window
  may be useful!)

* Always check a script for syntax errors first, by doing something like
  ``python script.py``.

* If your script does not have any syntax errors, try adding ``import cgitb;
  cgitb.enable()`` to the top of the script.

* When invoking external programs, make sure they can be found. Usually, this
  means using absolute path names --- :envvar:`PATH` is usually not set to a very
  useful value in a CGI script.

* When reading or writing external files, make sure they can be read or written
  by the userid under which your CGI script will be running: this is typically the
  userid under which the web server is running, or some explicitly specified
  userid for a web server's ``suexec`` feature.

* Don't try to give a CGI script a set-uid mode. This doesn't work on most
  systems, and is a security liability as well.

.. rubric:: Footnotes

.. [#] Note that some recent versions of the HTML specification do state what
   order the field values should be supplied in, but knowing whether a request
   was received from a conforming browser, or even from a browser at all, is
   tedious and error-prone.