Web application with CGI Python

Added 26 Feb 2023, 5:28 p.m. edited 18 Jun 2023, 1:12 a.m.

There are any number of ways of using Python to create a web application, WSGI, ASGI or even AWS Chalice can be used, but if you already have web infrastructure in place and are confident you understand the various security implications involving a full stack, then the "old fashioned" solution of CGI shouldn't be overlooked if only because of its simplicity and flexibility.

Configuration from an Apache point of view is fairly trivial, I prefer to keep configurations for different projects separate from the main configuration and include it in.

ScriptAliasMatch ^/pycgi.+ /home/chris/development/pycgi/main.py
ScriptAlias /pycgi /home/chris/development/pycgi/main.py
<Directory "/home/chris/development/pycgi">
    AllowOverride 
    Options +ExecCGI
    AddHandler cgi-script .py
    Require all granted
</Directory>

This is basically a catch all not just for the main directory we're using for the project but also any hierarchy of subdirectories within the directory. As I'm implementing my own routing system, all URLs are handled by a single file

As this is a CGI application its important to ensure that the first line of the "main" script contains a correct "shebang"

#!/usr/bin/env python

Getting this correct is important as a CGI script can have any specified interpreter, and while we can rely on env being in a specific place on many Unix like systems, if you're using Windows for development (unlucky!) then you'll likely have to explicitly specify the python interpreter. An error here will often produce an error (in the Apache error logs) stating that the file its found doesn't exist, its actually meaning that the interpreter can't be found but its not the clearest error message.

    route = str(os.environ.get('REQUEST_URI')).replace(settings.APP_ROOT,'')
    route = route.split('?')[0] # strip url params from route
    
    if (len(route) and route[-1] == '/'): route = route[:-1] # trailing slash stripped
    if (len(route)): route = route[1:]  # leading slash stripped
    if (route == ''): route = 'index'
    route = route.replace('/','.')

    user = None
    if route not in settings.NOLOG:
        user = util.check_login()

    try:
        page = importlib.import_module("pages." + route)
    except ModuleNotFoundError:
        # see if its a static file
        if os.path.exists(route):
            with open(route, 'rb') as fp:
                data = fp.read()
            content = 'text/plain'
            typ = route[-4:]
            if typ == ".png":
                content = "image/png"
            if typ == ".ico":
                content = "image/x-icon"
            if typ == ".css":
                content = "text/css"
            print ("Content-Type: " + content + "\n", flush = True)
            sys.stdout.buffer.write(data)
            exit()
        else:
            print ("Status: 404 Not Found")
            print ("Warning: Route not found - "+route)
            print ("Content-type: text/html; charset=utf-8\n")
            print ("<h2>404 Not Found</h2>")
            print ("<p>Warning: Route not found - "+route+"</p>")
            print ("<p>"+os.getcwd()+"</p>")
        exit()
    
    # index function in routed source returns page content as an XML dom
    dom = page.index(user)
    print ("Content-type: text/html; charset=utf-8\n")
    print(dom.toprettyxml(encoding='utf-8').decode('utf-8'))

This is the bulk of the main script it first of all converts the full path into a module import, all the separate scripts for each page are in a pages directory, so a path of /pycgi/user/maintenance will import from pages/user/maintenance.py (pages.user.maintenance). Each page has an index function which acts as the landing function.

If a script cannot be matched to the URL, then the route is checked to see if a static file can match, which if it exists it's served raw, useful for ancillary files like CSS files etc.

Its worth noting the differences from python frameworks (even micro frameworks), your code is often passed an object that represents at the least the HTTP request that was made. With a CGI script things are a little different, the OS environment is used to communicate important pieces of information about the current request. This means if for some reason an exception cannot be caught or you're just seeing no output, that the script can be run from the terminal, sometimes showing you a little more useful of an error message.

$ REQUEST_SCHEME="http" REQUEST_METHOD=GET SERVER_PROTOCOL=HTTP/1.1 HTTP_HOST=localhost REQUEST_URI="/pycgi" python main.py

Set-Cookie: PYCGI=UUID=dada63d6-ba56-4208-bf70-729a3c500012; Path=/
Content-type: text/html; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
  PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'
  'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
     <link href="style.css" rel="stylesheet" type="text/css"/>
        <title>Index</title>
    </head>
    <body>
     <p>This the index</p>
   </body>
</html>

There are plenty of other issues to discuss like getting the body of a POST request, dealing with user sessions, but getting some kind of basic routing implemented makes for a good start, I'll deal with other topics in up coming posts

Enjoy!