#!/usr/local/bin/python

"""Support module for CGI (Common Gateway Interface) scripts.

This module defines a number of utilities for use by CGI scripts
written in Python.


Introduction
------------

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML <FORM> or <ISINDEX> element.

Most often, CGI scripts live in the server's special cgi-bin
directory. The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.
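
As a rough sketch of what "in the script's shell environment" means,
the request details can be read like any other environment variables.
The example is written in present-day Python 3 syntax (an assumption;
the rest of this file uses the older print statement), and the helper
name describe_request is hypothetical:

```python
import os

def describe_request(env=os.environ):
    # Collect a few standard CGI variables, with fallbacks for the
    # case where the script is run outside an HTTP server.
    return {
        "method": env.get("REQUEST_METHOD", "GET"),
        "query": env.get("QUERY_STRING", ""),
        "client": env.get("REMOTE_HOST", "unknown"),
    }
```

Passing a plain dictionary instead of os.environ makes it easy to try
this out from the command line.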

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the "query string" part of the URL. This module (cgi.py) is intended
to take care of the different cases and provide a simpler interface to
the Python script. It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it -- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line. The first section contains a number of headers,
telling the client what kind of data is following. Python code to
generate a minimal header section looks like this:

    print "Content-type: text/html"     # HTML is following
    print                               # blank line, end of headers

The second section is usually HTML, which allows the client software
to display nicely formatted text with header, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

    print "<TITLE>CGI script output</TITLE>"
    print "<H1>This is my first CGI script</H1>"
    print "Hello, world!"

(It may not be fully legal HTML according to the letter of the
standard, but any browser will understand it.)


Using the cgi module
--------------------

Begin by writing "import cgi". Don't use "from cgi import *" -- the
module defines all sorts of names for its own use that you don't want
in your namespace.

If you have a standard form, it's best to use the SvFormContentDict
class. Instantiate the SvFormContentDict class exactly once: it
consumes any input on standard input, which can't be wound back (it's
a network connection, not a disk file).

The SvFormContentDict instance can be accessed as if it were a Python
dictionary. For instance, the following code checks that the fields
"name" and "addr" are both set to a non-empty string:

    form = cgi.SvFormContentDict()
    form_ok = 0
    if form.has_key("name") and form.has_key("addr"):
        if form["name"] != "" and form["addr"] != "":
            form_ok = 1
    if not form_ok:
        print "<H1>Error</H1>"
        print "Please fill in the name and addr fields."
        return
    ...actual form processing here...

If you have an input item of type "file" in your form and the client
supports file uploads, the value for that field, if present in the
form, is not a string but a tuple of (filename, content-type, data).

A more flexible alternative to [Sv]FormContentDict is the class
FieldStorage. See that class's doc string.


Overview of classes
-------------------

FieldStorage: new more flexible class; described above.

SvFormContentDict: single value form content as dictionary; described
above.

FormContentDict: multiple value form content as dictionary (the form
items are lists of values). Useful if your form contains multiple
fields with the same name.

Other classes (FormContent, InterpFormContentDict) are present for
backwards compatibility only.
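
To illustrate what "lists of values" means for repeated fields, here
is a small sketch. It does not use this module's classes; it assumes a
present-day Python 3 interpreter and its urllib.parse.parse_qs
function, which also returns one list per field name:

```python
from urllib.parse import parse_qs

# A field that appears twice keeps both values, in order.
form = parse_qs("color=red&color=blue&size=L")
colors = form["color"]
size = form["size"]
```

Even a field that occurs only once (size here) comes back as a
one-element list, so code using the multiple value form should always
index or loop over the value.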


Overview of functions
---------------------

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

parse(): parse a form into a Python dictionary.

parse_qs(qs): parse a query string.

parse_multipart(...): parse input of type multipart/form-data (for
file uploads).

parse_header(string): parse a header like Content-type into a main
value and a dictionary of parameters.

test(): complete test program.

print_environ(): format the shell environment in HTML.

print_form(form): format a form in HTML.

print_environ_usage(): print a list of useful environment variables in
HTML.

escape(): convert the characters "&", "<" and ">" to HTML-safe
sequences. Use this if you need to display text that might contain
such characters in HTML. To translate URLs for inclusion in the HREF
attribute of an <A> tag, use urllib.quote().
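
The difference between escaping text for HTML and quoting a URL can
be sketched as follows. This uses html.escape and urllib.parse.quote
from a present-day Python 3 standard library as stand-ins (an
assumption; they are not the escape() and urllib.quote() of this
file, though they play the same roles):

```python
from html import escape
from urllib.parse import quote

# Text destined for the page body: "&", "<" and ">" are replaced.
text = escape("AT&T <rocks>", quote=False)

# A path destined for an HREF attribute: the space is percent-encoded.
href = quote("/cgi-bin/my script.py")
```

Here text becomes "AT&amp;T &lt;rocks&gt;" and href becomes
"/cgi-bin/my%20script.py".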


Caring about security
---------------------

There's one important rule: if you invoke an external program (e.g.
via the os.system() or os.popen() functions), make very sure you don't
pass arbitrary strings received from the client to the shell. This is
a well-known security hole whereby clever hackers anywhere on the web
can exploit a gullible CGI script to invoke arbitrary shell commands.
Even parts of the URL or field names cannot be trusted, since the
request doesn't have to come from your form!

To be on the safe side, if you must pass a string gotten from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
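
One way to apply this rule is a whitelist check before the string
gets anywhere near a shell. The sketch below is in present-day
Python 3 syntax; the function name is_shell_safe is hypothetical, and
"alphanumeric" is taken to mean ASCII letters and digits only:

```python
import re

# Letters, digits, period, underscore and dash; nothing else.
SAFE_PATTERN = re.compile("[A-Za-z0-9._-]+")

def is_shell_safe(value):
    # fullmatch() insists that the whole string matches, so a safe
    # prefix followed by shell metacharacters is still rejected.
    return SAFE_PATTERN.fullmatch(value) is not None
```

A script would refuse the request when is_shell_safe() returns false,
rather than trying to repair the string.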


Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory cgi-bin in the server tree.

Make sure that your script is readable and executable by "others"; the
Unix file mode should be 755 (use "chmod 755 filename"). Make sure
that the first line of the script contains "#!" starting in column 1
followed by the pathname of the Python interpreter, for instance:

    #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are
readable or writable, respectively, by "others" -- their mode should
be 644 for readable and 666 for writable. This is because, for
security reasons, the HTTP server executes your script as user
"nobody", without any special privileges. It can only read (write,
execute) files that everybody can read (write, execute). The current
directory at execution time is also different (it is usually the
server's cgi-bin directory) and the set of environment variables is
also different from what you get at login. In particular, don't count
on the shell's search path for executables ($PATH) or the Python
module search path ($PYTHONPATH) to be set to anything interesting.

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

    import sys
    sys.path.insert(0, "/usr/home/joe/lib/python")
    sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)
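
If the ordering seems surprising, it can be checked on a copy of the
path without disturbing the real one. A minimal sketch in present-day
Python 3 syntax (the name search_path is just for illustration):

```python
import sys

search_path = list(sys.path)    # work on a copy, not the real path
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")

# Each insert(0, ...) pushes earlier entries one place back.
first, second = search_path[0], search_path[1]
```

After these two calls, first is "/usr/local/lib/python" and second is
"/usr/home/joe/lib/python".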

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server. There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section:


Debugging CGI scripts
---------------------

First of all, check for trivial installation errors -- reading the
section above on installing your CGI script carefully can save you a
lot of time. If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (cgi.py) as a CGI script. When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request. If it's installed
in the standard cgi-bin directory, it should be possible to send it a
request by entering a URL into your browser of the form:

    http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script
-- perhaps you need to install it in a different directory. If it
gives another error (e.g. 500), there's an installation problem that
you should fix before trying to go any further. If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as "addr" with value "At Home"
and "name" with value "Joe Blow"), the cgi.py script has been
installed correctly. If you follow the same procedure for your own
script, you should now be able to debug it.

The next step could be to call the cgi module's test() function from
your script: replace its main code with the single statement

    cgi.test()

This should produce the same results as those gotten from installing
the cgi.py file itself.

When an ordinary Python script raises an unhandled exception
(e.g. because of a typo in a module name, a file that can't be opened,
etc.), the Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script
raises an exception, most likely the traceback will end up in one of
the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
*some* code, it is easy to catch exceptions and cause a traceback to
be printed. The test() function below in this module is an example.
Here are the rules:

    1. Import the traceback module (before entering the
       try-except!)

    2. Make sure you finish printing the headers and the blank
       line early

    3. Set sys.stderr to sys.stdout

    4. Wrap all remaining code in a try-except statement

    5. In the except clause, call traceback.print_exc()

For example:

    import sys
    import traceback
    print "Content-type: text/html"
    print
    sys.stderr = sys.stdout
    try:
        ...your code here...
    except:
        print "\n\n<PRE>"
        traceback.print_exc()

Notes: The assignment to sys.stderr is needed because the traceback
prints to sys.stderr. The print "\n\n<PRE>" statement is necessary to
disable the word wrapping in HTML.
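
The same recipe can be sketched in present-day Python 3 syntax, where
traceback.print_exc() accepts a file argument and so no assignment to
sys.stderr is needed. This is only an illustration under that
assumption; the names run_with_traceback and broken_handler are
hypothetical, and the output goes to an in-memory buffer so the
result can be inspected:

```python
import io
import sys
import traceback

def run_with_traceback(handler, out=sys.stdout):
    # Headers and the blank line go out first, so that any
    # traceback lands in the body of the response.
    print("Content-type: text/html", file=out)
    print(file=out)
    try:
        handler(out)
    except Exception:
        # <PRE> keeps the browser from re-wrapping the traceback.
        print("<PRE>", file=out)
        traceback.print_exc(file=out)

def broken_handler(out):
    # Stands in for a script body that fails.
    raise ValueError("demo failure")

buffer = io.StringIO()
run_with_traceback(broken_handler, buffer)
page = buffer.getvalue()
```

The page then starts with the header section and ends with the
traceback, including the line "ValueError: demo failure".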

If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):

    import sys
    sys.stderr = sys.stdout
    print "Content-type: text/plain"
    print
    ...your code here...

This relies on the Python interpreter to print the traceback. The
content type of the output is set to plain text, which disables all
HTML processing. If your script works, the raw HTML will be displayed
by your client. If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.

Good luck!


Common problems and solutions
-----------------------------

- Most HTTP servers buffer the output from CGI scripts until the
script is completed. This means that it is not possible to display a
progress report on the client's display while the script is running.

- Check the installation instructions above.

- Check the HTTP server's log files. ("tail -f logfile" in a separate
window may be useful!)

- Always check a script for syntax errors first, by doing something
like "python script.py".

- When using any of the debugging techniques, don't forget to add
"import sys" to the top of the script.

- When invoking external programs, make sure they can be found.
Usually, this means using absolute path names -- $PATH is usually not
set to a very useful value in a CGI script.

- When reading or writing external files, make sure they can be read
or written by every user on the system.

- Don't try to give a CGI script a set-uid mode. This doesn't work on
most systems, and is a security liability as well.


History
-------

Michael McLay started this module. Steve Majewski changed the
interface to SvFormContentDict and FormContentDict. The multipart
parsing was inspired by code submitted by Andreas Paepcke. Guido van
Rossum rewrote, reformatted and documented the module and is currently
responsible for its maintenance.

"""


# Imports
# =======

import string
import regsub
import sys
import os
import urllib


# A shorthand for os.environ
environ = os.environ


# Parsing functions
# =================

def parse(fp=None):
    """Parse a query in the environment or from a file (default stdin)"""
    if not fp:
        fp = sys.stdin
    if not environ.has_key('REQUEST_METHOD'):
        environ['REQUEST_METHOD'] = 'GET'       # For testing stand-alone
    if environ['REQUEST_METHOD'] == 'POST':
        ctype, pdict = parse_header(environ['CONTENT_TYPE'])
        if ctype == 'multipart/form-data':
            return parse_multipart(fp, ctype, pdict)
        elif ctype == 'application/x-www-form-urlencoded':
            clength = string.atoi(environ['CONTENT_LENGTH'])
            qs = fp.read(clength)
        else:
            qs = ''                     # Bad content-type
        environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
    elif environ.has_key('QUERY_STRING'):
        qs = environ['QUERY_STRING']
    else:
        if sys.argv[1:]:
            qs = sys.argv[1]
        else:
            qs = ""
        environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
    return parse_qs(qs)


def parse_qs(qs):
    """Parse a query given as a string argument"""
    name_value_pairs = string.splitfields(qs, '&')
    dict = {}
    for name_value in name_value_pairs:
        nv = string.splitfields(name_value, '=')
        if len(nv) != 2:
            continue
        name = nv[0]
        value = urllib.unquote(regsub.gsub('+', ' ', nv[1]))
        if len(value):
            if dict.has_key(name):
                dict[name].append(value)
            else:
                dict[name] = [value]
    return dict


def parse_multipart(fp, ctype, pdict):
    """Parse multipart input.

    Arguments:
    fp   : input file
    ctype: content-type
    pdict: dictionary containing other parameters of content-type header

    Returns a dictionary just like parse_qs() (keys are the field
    names, each value is a list of values for that field) except that
    if the value was an uploaded file, it is a tuple of the form
    (filename, content-type, data). Note that content-type is the
    raw, unparsed contents of the content-type header.

    XXX Should we parse further when the content-type is
    multipart/*?

    """
    import mimetools
    if pdict.has_key('boundary'):
        boundary = pdict['boundary']
    else:
        boundary = ""
    nextpart = "--" + boundary
    lastpart = "--" + boundary + "--"
    partdict = {}
    terminator = ""

    while terminator != lastpart:
        bytes = -1
        data = None
        if terminator:
            # At start of next part. Read headers first.
            headers = mimetools.Message(fp)
            clength = headers.getheader('content-length')
            if clength:
                try:
                    bytes = string.atoi(clength)
                except string.atoi_error:
                    pass
            if bytes > 0:
                data = fp.read(bytes)
            else:
                data = ""
        # Read lines until end of part.
        lines = []
        while 1:
            line = fp.readline()
            if not line:
                terminator = lastpart   # End outer loop
                break
            if line[:2] == "--":
                terminator = string.strip(line)
                if terminator in (nextpart, lastpart):
                    break
            if line[-2:] == '\r\n':
                line = line[:-2]
            elif line[-1:] == '\n':
                line = line[:-1]
            lines.append(line)
        # Done with part.
        if data is None:
            continue
        if bytes < 0:
            data = string.joinfields(lines, "\n")
        line = headers['content-disposition']
        if not line:
            continue
        key, params = parse_header(line)
        if key != 'form-data':
            continue
        if params.has_key('name'):
            name = params['name']
        else:
            continue
        if params.has_key('filename'):
            data = (params['filename'],
                    headers.getheader('content-type'), data)
        if partdict.has_key(name):
            partdict[name].append(data)
        else:
            partdict[name] = [data]

    return partdict


def parse_header(line):
    """Parse a Content-type like header.

    Return the main content-type and a dictionary of options.

    """
    plist = map(string.strip, string.splitfields(line, ';'))
    key = string.lower(plist[0])
    del plist[0]
    pdict = {}
    for p in plist:
        i = string.find(p, '=')
        if i >= 0:
            name = string.lower(string.strip(p[:i]))
            value = string.strip(p[i+1:])
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            pdict[name] = value
    return key, pdict


# Classes for field storage
# =========================

class MiniFieldStorage:

    """Internal: dummy FieldStorage, used with query string format."""

    # Dummy attributes
    filename = None
    list = None
    type = None
    type_options = {}
    disposition = None
    disposition_options = {}
    headers = {}

    def __init__(self, name, value):
        """Constructor from field name and value."""
        from StringIO import StringIO
        self.name = name
        self.value = value
        self.file = StringIO(value)

    def __repr__(self):
        """Return printable representation."""
        return "MiniFieldStorage(%s, %s)" % (`self.name`, `self.value`)


class FieldStorage:

    """Store a sequence of fields, reading multipart/form-data.

    This class provides naming, typing, files stored on disk, and
    more. At the top level, it is accessible like a dictionary, whose
    keys are the field names. (Note: None can occur as a field name.)
    The items are either a Python list (if there are multiple values) or
    another FieldStorage or MiniFieldStorage object. If it's a single
    object, it has the following attributes:

    name: the field name, if specified; otherwise None

    filename: the filename, if specified; otherwise None; this is the
        client side filename, *not* the file name on which it is
        stored (that's a temporary you don't deal with)

    value: the value as a *string*; for file uploads, this
        transparently reads the file every time you request the value

    file: the file(-like) object from which you can read the data;
        None if the data is stored as a simple string

    type: the content-type, or None if not specified

    type_options: dictionary of options specified on the content-type
        line

    disposition: content-disposition, or None if not specified

    disposition_options: dictionary of corresponding options

    headers: a dictionary(-like) object (sometimes rfc822.Message or a
        subclass thereof) containing *all* headers

    The class is subclassable, mostly for the purpose of overriding
    the make_file() method, which is called internally to come up with
    a file open for reading and writing. This makes it possible to
    override the default choice of storing all files in a temporary
    directory and unlinking them as soon as they have been opened.

    """

    def __init__(self, fp=None, headers=None, outerboundary=""):
        """Constructor. Read multipart/* until last part.

        Arguments, all optional:

        fp              : file pointer; default: sys.stdin

        headers         : header dictionary-like object; default:
            taken from environ as per CGI spec

        outerboundary   : optional terminating multipart boundary
            (for internal use only)

        """
        method = None
        if environ.has_key('REQUEST_METHOD'):
            method = string.upper(environ['REQUEST_METHOD'])
        if not fp and method == 'GET':
            qs = None
            if environ.has_key('QUERY_STRING'):
                qs = environ['QUERY_STRING']
            from StringIO import StringIO
            fp = StringIO(qs or "")
            if headers is None:
                headers = {'content-type':
                           "application/x-www-form-urlencoded"}
        if headers is None:
            headers = {}
            if environ.has_key('CONTENT_TYPE'):
                headers['content-type'] = environ['CONTENT_TYPE']
            if environ.has_key('CONTENT_LENGTH'):
                headers['content-length'] = environ['CONTENT_LENGTH']
        self.fp = fp or sys.stdin
        self.headers = headers
        self.outerboundary = outerboundary

        # Process content-disposition header
        cdisp, pdict = "", {}
        if self.headers.has_key('content-disposition'):
            cdisp, pdict = parse_header(self.headers['content-disposition'])
        self.disposition = cdisp
        self.disposition_options = pdict
        self.name = None
        if pdict.has_key('name'):
            self.name = pdict['name']
        self.filename = None
        if pdict.has_key('filename'):
            self.filename = pdict['filename']

        # Process content-type header
        ctype, pdict = "text/plain", {}
        if self.headers.has_key('content-type'):
            ctype, pdict = parse_header(self.headers['content-type'])
        self.type = ctype
        self.type_options = pdict
        self.innerboundary = ""
        if pdict.has_key('boundary'):
            self.innerboundary = pdict['boundary']
        clen = -1
        if self.headers.has_key('content-length'):
            try:
                clen = string.atoi(self.headers['content-length'])
            except:
                pass
        self.length = clen

        self.list = self.file = None
        self.done = 0
        self.lines = []
        if ctype == 'application/x-www-form-urlencoded':
            self.read_urlencoded()
        elif ctype[:10] == 'multipart/':
            self.read_multi()
        else:
            self.read_single()

    def __repr__(self):
        """Return a printable representation."""
        return "FieldStorage(%s, %s, %s)" % (
            `self.name`, `self.filename`, `self.value`)

    def __getattr__(self, name):
        if name != 'value':
            raise AttributeError, name
        if self.file:
            self.file.seek(0)
            value = self.file.read()
            self.file.seek(0)
        elif self.list is not None:
            value = self.list
        else:
            value = None
        return value

    def __getitem__(self, key):
        """Dictionary style indexing."""
        if self.list is None:
            raise TypeError, "not indexable"
        found = []
        for item in self.list:
            if item.name == key: found.append(item)
        if not found:
            raise KeyError, key
        return found

    def keys(self):
        """Dictionary style keys() method."""
        if self.list is None:
            raise TypeError, "not indexable"
        keys = []
        for item in self.list:
            if item.name not in keys: keys.append(item.name)
        return keys

    def read_urlencoded(self):
        """Internal: read data in query string format."""
        qs = self.fp.read(self.length)
        dict = parse_qs(qs)
        self.list = []
        for key, valuelist in dict.items():
            for value in valuelist:
                self.list.append(MiniFieldStorage(key, value))
        self.skip_lines()

    def read_multi(self):
        """Internal: read a part that is itself multipart."""
        import rfc822
        self.list = []
        part = self.__class__(self.fp, {}, self.innerboundary)
        # Throw first part away
        while not part.done:
            headers = rfc822.Message(self.fp)
            part = self.__class__(self.fp, headers, self.innerboundary)
            self.list.append(part)
        self.skip_lines()

    def read_single(self):
        """Internal: read an atomic part."""
        if self.length >= 0:
            self.read_binary()
            self.skip_lines()
        else:
            self.read_lines()
        self.file.seek(0)
726
727 bufsize = 8*1024 # I/O buffering size for copy to file
728
729 def read_binary(self):
730 """Internal: read binary data."""
731 self.file = self.make_file('b')
732 todo = self.length
733 if todo >= 0:
734 while todo > 0:
735 data = self.fp.read(min(todo, self.bufsize))
736 if not data:
737 self.done = -1
738 break
739 self.file.write(data)
740 todo = todo - len(data)
741
    def read_lines(self):
        """Internal: read lines until EOF or outerboundary."""
        self.file = self.make_file('')
        if self.outerboundary:
            self.read_lines_to_outerboundary()
        else:
            self.read_lines_to_eof()

    def read_lines_to_eof(self):
        """Internal: read lines until EOF."""
        while 1:
            line = self.fp.readline()
            if not line:
                self.done = -1
                break
            self.lines.append(line)
            if line[-2:] == '\r\n':
                line = line[:-2] + '\n'
            self.file.write(line)

    def read_lines_to_outerboundary(self):
        """Internal: read lines until outerboundary."""
        next = "--" + self.outerboundary
        last = next + "--"
        delim = ""
        while 1:
            line = self.fp.readline()
            if not line:
                self.done = -1
                break
            self.lines.append(line)
            if line[:2] == "--":
                strippedline = string.strip(line)
                if strippedline == next:
                    break
                if strippedline == last:
                    self.done = 1
                    break
            if line[-2:] == "\r\n":
                line = line[:-2]
            elif line[-1] == "\n":
                line = line[:-1]
            self.file.write(delim + line)
            delim = "\n"

    def skip_lines(self):
        """Internal: skip lines until outer boundary if defined."""
        if not self.outerboundary or self.done:
            return
        next = "--" + self.outerboundary
        last = next + "--"
        while 1:
            line = self.fp.readline()
            if not line:
                self.done = -1
                break
            self.lines.append(line)
            if line[:2] == "--":
                strippedline = string.strip(line)
                if strippedline == next:
                    break
                if strippedline == last:
                    self.done = 1
                    break

    def make_file(self, binary):
        """Overridable: return a readable & writable file.

        The file will be used as follows:
        - data is written to it
        - seek(0)
        - data is read from it

        The 'binary' argument is 'b' if the file should be created in
        binary mode (on non-Unix systems), '' otherwise.

        The intention is that you can override this method to
        selectively create a real (temporary) file or use a memory
        file dependent on the perceived size of the file or the
        presence of a filename, etc.

        """

        # Prefer ArrayIO over StringIO, if it's available
        try:
            from ArrayIO import ArrayIO
            ioclass = ArrayIO
        except ImportError:
            from StringIO import StringIO
            ioclass = StringIO
        return ioclass()


# Main classes
# ============

class FormContentDict:
    """Basic (multiple values per field) form content as dictionary.

    form = FormContentDict()

    form[key] -> [value, value, ...]
    form.has_key(key) -> Boolean
    form.keys() -> [key, key, ...]
    form.values() -> [[val, val, ...], [val, val, ...], ...]
    form.items() -> [(key, [val, val, ...]), (key, [val, val, ...]), ...]
    form.dict == {key: [val, val, ...], ...}

    """
    def __init__( self ):
        self.dict = parse()
        self.query_string = environ['QUERY_STRING']
    def __getitem__(self,key):
        return self.dict[key]
    def keys(self):
        return self.dict.keys()
    def has_key(self, key):
        return self.dict.has_key(key)
    def values(self):
        return self.dict.values()
    def items(self):
        return self.dict.items()
    def __len__( self ):
        return len(self.dict)


class SvFormContentDict(FormContentDict):
    """Strict single-value expecting form content as dictionary.

    If you only expect a single value for each field, then form[key]
    will return that single value.  It will raise an IndexError if
    that expectation is not true.  If you expect a field to have
    possibly multiple values, then you can use form.getlist(key) to
    get all of the values.  values() and items() are a compromise:
    they return single strings where there is a single value, and
    lists of strings otherwise.

    """
    def __getitem__(self, key):
        if len(self.dict[key]) > 1:
            raise IndexError, 'expecting a single value'
        return self.dict[key][0]
    def getlist(self, key):
        return self.dict[key]
    def values(self):
        lis = []
        for each in self.dict.values():
            if len( each ) == 1 :
                lis.append(each[0])
            else: lis.append(each)
        return lis
    def items(self):
        lis = []
        for key,value in self.dict.items():
            if len(value) == 1 :
                lis.append((key, value[0]))
            else: lis.append((key, value))
        return lis


class InterpFormContentDict(SvFormContentDict):
    """This class is present for backwards compatibility only."""
    def __getitem__( self, key ):
        v = SvFormContentDict.__getitem__( self, key )
        if v[0] in string.digits+'+-.' :
            try: return string.atoi( v )
            except ValueError:
                try: return string.atof( v )
                except ValueError: pass
        return string.strip(v)
    def values( self ):
        lis = []
        for key in self.keys():
            try:
                lis.append( self[key] )
            except IndexError:
                lis.append( self.dict[key] )
        return lis
    def items( self ):
        lis = []
        for key in self.keys():
            try:
                lis.append( (key, self[key]) )
            except IndexError:
                lis.append( (key, self.dict[key]) )
        return lis


class FormContent(FormContentDict):
    """This class is present for backwards compatibility only."""
    def values(self,key):
        if self.dict.has_key(key):return self.dict[key]
        else: return None
    def indexed_value(self,key, location):
        if self.dict.has_key(key):
            if len (self.dict[key]) > location:
                return self.dict[key][location]
            else: return None
        else: return None
    def value(self,key):
        if self.dict.has_key(key):return self.dict[key][0]
        else: return None
    def length(self,key):
        return len (self.dict[key])
    def stripped(self,key):
        if self.dict.has_key(key):return string.strip(self.dict[key][0])
        else: return None
    def pars(self):
        return self.dict


# Test/debug code
# ===============

def test():
    """Robust test CGI script, usable as main program.

    Write minimal HTTP headers and dump all information provided to
    the script in HTML form.

    """
    import traceback
    print "Content-type: text/html"
    print
    sys.stderr = sys.stdout
    try:
        print_form(FieldStorage())
        print_environ()
        print_directory()
        print_environ_usage()
    except:
        print "\n\n<PRE>" # Turn off HTML word wrap
        traceback.print_exc()

def print_environ():
    """Dump the shell environment as HTML."""
    keys = environ.keys()
    keys.sort()
    print
    print "<H3>Shell environment:</H3>"
    print "<DL>"
    for key in keys:
        print "<DT>", escape(key), "<DD>", escape(environ[key])
    print "</DL>"
    print

def print_form(form):
    """Dump the contents of a form as HTML."""
    keys = form.keys()
    keys.sort()
    print
    print "<H3>Form contents:</H3>"
    print "<DL>"
    for key in keys:
        print "<DT>" + escape(key) + ":",
        value = form[key]
        print "<i>" + escape(`type(value)`) + "</i>"
        print "<DD>" + escape(`value`)
    print "</DL>"
    print

def print_directory():
    """Dump the current directory as HTML."""
    print
    print "<H3>Current Working Directory:</H3>"
    try:
        pwd = os.getcwd()
    except os.error, msg:
        print "os.error:", escape(str(msg))
    else:
        print escape(pwd)
    print

def print_environ_usage():
    """Dump a list of environment variables used by CGI as HTML."""
    print """
<H3>These environment variables could have been set:</H3>
<UL>
<LI>AUTH_TYPE
<LI>CONTENT_LENGTH
<LI>CONTENT_TYPE
<LI>DATE_GMT
<LI>DATE_LOCAL
<LI>DOCUMENT_NAME
<LI>DOCUMENT_ROOT
<LI>DOCUMENT_URI
<LI>GATEWAY_INTERFACE
<LI>LAST_MODIFIED
<LI>PATH
<LI>PATH_INFO
<LI>PATH_TRANSLATED
<LI>QUERY_STRING
<LI>REMOTE_ADDR
<LI>REMOTE_HOST
<LI>REMOTE_IDENT
<LI>REMOTE_USER
<LI>REQUEST_METHOD
<LI>SCRIPT_NAME
<LI>SERVER_NAME
<LI>SERVER_PORT
<LI>SERVER_PROTOCOL
<LI>SERVER_ROOT
<LI>SERVER_SOFTWARE
</UL>
In addition, HTTP headers sent by the client may be passed in the
environment as well.  Here are some common variable names:
<UL>
<LI>HTTP_ACCEPT
<LI>HTTP_CONNECTION
<LI>HTTP_HOST
<LI>HTTP_PRAGMA
<LI>HTTP_REFERER
<LI>HTTP_USER_AGENT
</UL>
"""


# Utilities
# =========

def escape(s):
    """Replace special characters '&', '<' and '>' by SGML entities."""
    s = regsub.gsub("&", "&amp;", s)    # Must be done first!
    s = regsub.gsub("<", "&lt;", s)
    s = regsub.gsub(">", "&gt;", s)
    return s


# Invoke mainline
# ===============

# Call test() when this file is run as a script (not imported as a module)
if __name__ == '__main__':
    test()