Beginning CGI Programming with Perl

The CGI Programming Paradigm

Probably the two most common questions about CGI programming are, “What is CGI programming?” and “Why is CGI programming different from other programming?” The first question is the harder one to answer; the full answer is really the sum of all the pages in these tutorials. There is a short answer, though: CGI programming is writing applications that act as interface or gateway programs between the client browser, Web server, and a traditional programming application.

The second question, “Why is CGI programming different from other programming?” requires a longer answer. The answer really needs to be broken up into three parts. Each part describes a different section of the CGI program’s environment, and it is the environment that the CGI program operates under that makes it so different from other programming paradigms. First, a CGI program must be especially concerned about security. Next, the CGI programmer must understand how data is passed to other programs and how it is returned. And finally, the CGI programmer must learn how to develop software in an environment where his program has no built-in mechanisms to enable it to remember what it did last.

CGI Programs and Security

Why does your CGI program have to be so concerned about security? Unfortunately, your main concern is hackers. Your CGI programs operate in a very insecure environment. By their nature, your programs must be usable by anyone in the world. Also by their nature, they can be executed at any time of the day. And they can be run over and over again by people looking for security holes in your code. Because the Net is a place where anyone and everyone has the freedom to search, play, and explore to his heart’s content, your programs are bound to be tested eventually by someone with at least an overabundance of curiosity. This means that you must spend extra time thinking about how your program could be broken by a hacker. In addition, because many applications are written in an interpreted language like Perl, your program source code is easier to access. If a hacker can get at your source code, your code is at much greater risk.

The Basic Data-Passing Methods of CGI

The way data is sent back and forth across the Internet is one of the most distinctive aspects of CGI programming. Gathering data and decoding data are the subjects of Tutorials 4 and 5, respectively, but a brief introduction is warranted. Your CGI program cannot be designed without first understanding how data is built using the HTML hypertext link or the HTML Form fields. Both mechanisms create a unique environment in which data is encoded and passed based on both user input and statically defined data structures. When you design your CGI program, you first must design the user input format. This format is fixed in two data-passing mechanisms: the Get and Post methods. Both methods ride on the HTTP request: Get carries the data in the query string of the URL, and Post carries it in the body of the request. As you design your CGI program, you must be aware of the limitations of both these methods.
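The difference between the two methods can be sketched in a few lines of Perl. This is only an illustration, assuming a server that follows the usual CGI conventions (REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH in the environment, with Post data on standard input). The subroutine name read_raw_input is made up for this example, and it takes the environment and input handle as arguments only so that it can be tried from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form data.
# $env stands in for %ENV and $in for STDIN.
sub read_raw_input {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} // 'GET');

    if ($method eq 'POST') {
        # Post: the data is in the request body, CONTENT_LENGTH bytes long.
        my $length = $env->{CONTENT_LENGTH} // 0;
        my $data   = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }

    # Get: the data arrives in the query string of the URL.
    return $env->{QUERY_STRING} // '';
}

print "Content-type: text/plain\n\n";
print "Raw data: ", read_raw_input(\%ENV, \*STDIN), "\n";
```

The data returned here is still encoded; nothing in this sketch decodes it or checks it for safety.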

In addition, your CGI programs must be able to deal with the multiple input engines on the Internet, which have an impact on the format of the data your CGI program can return. Your CGI program can be called from all types of browsers, from the text-only Lynx program, to the HTML 1.0-capable browsers, to browsers like Netscape that include data (such as the cookie) that isn’t even included in the HTTP specification. It is up to you to design your CGI program to deal with this multiplicity of client/browsers! Each will be sending different information to your CGI program, describing itself and its capabilities in the HTTP request headers discussed in Tutorial 2.
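As a rough illustration of reacting to the calling browser, the sketch below looks at the User-Agent request header, which CGI servers conventionally pass along as HTTP_USER_AGENT. The subroutine name browser_class and its three categories are invented for this example, and matching on a leading "Lynx" is a simplification rather than a reliable detection scheme.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort a User-Agent string into a coarse category.
sub browser_class {
    my ($agent) = @_;
    return 'unknown' unless defined $agent && length $agent;
    return 'text'    if $agent =~ /^Lynx/i;   # text-only browser
    return 'graphical';
}

my $class = browser_class($ENV{HTTP_USER_AGENT});

print "Content-type: text/html\n\n";
if ($class eq 'text') {
    print "<p>Welcome. This is the text-only page.</p>\n";
}
else {
    print "<p>Welcome.</p>\n";
}
```

Because the header is supplied by the client, it is a hint about the browser and not something the program can rely on.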

After you have the data from these myriad sources, your CGI program must be able to figure out what to do with it. The data passed to your CGI program is encoded so that it will not conflict with the existing MIME protocols of the Internet. You will learn about decoding data in Tutorial 5. After your CGI program decodes the data, it must decide how to return information to the calling program. Because not all browsers are created equal, your CGI program might want to return different information based on the browser software calling it. You will learn how to do this in the last part of Tutorial 2.
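To make the decoding step concrete, here is a minimal sketch of turning an encoded string such as name=Ann+Lee&city=St.%20Louis into name/value pairs. It assumes the usual form encoding (pairs joined by &, spaces sent as +, other special characters sent as % followed by two hex digits). The subroutine name parse_form_data is made up for this example, and when a field name repeats, this version simply keeps the last value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decode URL-encoded form data into a hash.
sub parse_form_data {
    my ($raw) = @_;
    my %form;

    for my $pair (split /&/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;

        for ($name, $value) {
            tr/+/ /;                               # + stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is a hex-coded byte
        }
        $form{$name} = $value;
    }
    return %form;
}

my %form = parse_form_data($ENV{QUERY_STRING} // '');

print "Content-type: text/plain\n\n";
print "$_ = $form{$_}\n" for sort keys %form;
```

With the example string above, the hash holds "Ann Lee" under name and "St. Louis" under city.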

CGI’s Stateless Environment

The implementation of the HTTP stateless protocol has a profound effect on how you design your CGI programs. Each new action is performed without any knowledge of previous actions, and multiple copies of your CGI program can execute at the same time. This has a dramatic effect on how your program accesses files and data. Database programming alone can be complicated, but if you add parallel processing on top of it, you have an even more complicated problem.

Traditional programming paradigms use sequential logic to solve problems. The data you set up 100 lines of code ago is expected to be available when you need it to pass to a subroutine or write to a file. Usually, when you run one program in a traditional environment, it gets to run to completion without fear of another copy of itself modifying the same data.

Neither of these conditions is true for your CGI programs. If you are building a multipaged site where the information on one page can affect the actions of another page, you have a complication for which you must design. Unless you take special steps, what happened on Web page 12 is not available the next time Web page 12 or any other page in your site is accessed. Each new Web page access creates a brand new call to your CGI program. This means that your CGI program has to take special measures to keep track of what happened the last time. One common means is for your CGI program to save information from the last event into a file. That method still has limitations, however, because your program can be executed simultaneously by several clients. You need to know which client is calling you.

To get around these special problems, the HTML form-input type of Hidden was created. The Hidden Input type enables your program to return data in the called Web page that isn’t displayed to the Web client. When the client calls the next Web page on your site, the Hidden Input type is returned as data to your CGI program. This way, your CGI program has a chance to remember what happened last time.
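A hidden field is just one more line of HTML that your program writes into the form it returns. The sketch below shows the idea; the field name last_page, the script name next.cgi, and the helper subroutines are all invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape characters that would otherwise break out of an HTML attribute.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build one hidden input element.
sub hidden_field {
    my ($name, $value) = @_;
    return sprintf qq{<input type="hidden" name="%s" value="%s">},
        escape_html($name), escape_html($value);
}

print "Content-type: text/html\n\n";
print qq{<form method="post" action="next.cgi">\n};   # next.cgi is a made-up name
print hidden_field('last_page', '12'), "\n";
print qq{<input type="submit" value="Continue">\n};
print "</form>\n";
```

When the form is submitted, last_page=12 comes back to the program along with the rest of the form data.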

This approach has at least one major problem. Hidden data is visible as soon as your Web client uses the View Source button on his browser. This means that he can change the data returned to your CGI program.

To complicate things even further, because your CGI program can be called from multiple browsers simultaneously, your program can be modifying a file at the same time another copy of the same program is modifying the same file. Unless you take special precautions to deal with this situation, some of your data is going to get lost. In the case in which two programs have the same file open, the program that closes the file last wins! The data saved by the earlier program is lost, overwritten by the changes made by the program that closed the file last. How do you solve this problem? You have to design a special database handle that locks the file for writing whenever any code in your CGI program has the file out for updating.
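One simple way to get this kind of protection in Perl is the built-in flock function, sketched below. This is an advisory lock, so it only helps if every program that touches the file asks for the lock too, and it assumes a file system where flock works. The subroutine name append_record and the log file name are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);   # provides the LOCK_EX constant
use File::Spec;

# Hypothetical helper: append one line while holding an exclusive lock.
sub append_record {
    my ($path, $record) = @_;

    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";   # waits for other holders
    print {$fh} "$record\n"   or die "Cannot write to $path: $!";
    close($fh)                or die "Cannot close $path: $!";  # closing releases the lock
    return 1;
}

# Made-up log file in the system's temporary directory.
my $log = File::Spec->catfile(File::Spec->tmpdir, 'cgi_visits.log');
append_record($log, 'visit at ' . time);
```

A second copy of the program that calls append_record at the same moment waits at the flock line until the first copy closes the file, so neither record is lost.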

These are just the most obvious problems. It is your job as a CGI programmer to think about these potential problems and to come up with effective solutions.

One solution to the problem that hidden data is visible using the View Source button is the experimental HTTP header called a cookie. This cookie acts something like a hidden field, but it travels in the HTTP headers rather than in the page itself, so it does not appear when the user views the page source. It is stored by the browser, however, so a determined user can still inspect or change it; your CGI program should check cookie data as carefully as any other input. Even so, this gives you a second and less exposed means of keeping track of what is happening at your Web site. The HTTP cookie is discussed in Tutorials 6 and 7.
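In outline, a cookie is sent with a Set-Cookie response header and comes back in a Cookie request header, which CGI servers conventionally expose as HTTP_COOKIE. The sketch below counts visits that way. The cookie name visits and the subroutine name parse_cookies are invented for this example, and the value is checked before use because it comes from the client.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a Cookie header into name/value pairs.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    return %cookies unless defined $header;

    for my $pair (split /;\s*/, $header) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $cookies{$name} = defined $value ? $value : '';
    }
    return %cookies;
}

my %cookies = parse_cookies($ENV{HTTP_COOKIE});

# Accept the old count only if it is made up entirely of digits.
my $previous = ($cookies{visits} // '') =~ /^(\d+)$/ ? $1 : 0;
my $visits   = $previous + 1;

print "Set-Cookie: visits=$visits; path=/\n";   # headers come before the blank line
print "Content-type: text/html\n\n";
print "<p>Visit number $visits.</p>\n";
```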

Preventing the Most Common CGI Bugs

I suspect that you would prefer to just get your first CGI program working. If you can prevent the common CGI errors described in this section, you will be well on your way to getting your first CGI program working. What happens when you try to run your first CGI program and you see a Server Error (500) message such as the one shown in Figure 1.3?

Figure 1.3: The Server Error message.

It seems like such an ominous error message. Drop everything and write your System Administrator a message describing exactly what you did to break the server. And what about the Forbidden (403) error message in Figure 1.1? Is the System Administrator going to cut off your programming privileges? DOES ANYONE KNOW? Can you just not tell anyone and it will go AWAY??!! Well, yes and no.

First of all, I suspect that you realize that all these error messages are generated automatically by your Web server, so nobody “knows” and, in most cases, nobody cares, but the error doesn’t go away. Your Web server records every error it sees in an error log file. This file is a marvelous source for figuring out what went wrong with your program. The error log file your server uses is probably in the server root document tree described earlier.

Usually, you will have read-only privileges for the files on the server root. This means that you can read what’s in the error log files, but you can’t change it. The error log files also are used by your System Administrator to watch for potential security risks on her server because each access to the system is logged into these files.

Tell the Server Your File Is Executable

There is one way to keep your programs from showing up in the error log files: Never make any mistakes! Because I’ve never managed to follow that advice, I’ve followed the more practical advice of always (well, okay, almost always) executing my CGI programs from the command line before trying to test them from my Web browser. Just enter the filename of your program at the prompt. If everything is okay, your CGI program executes as expected and you should see the HTML your CGI program generated on-screen.
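For a first command-line test, a very small program is enough. The sketch below is one possibility, assuming Perl lives at /usr/local/bin/perl on your server (the path may differ on yours); the subroutine name hello_page and the page text are invented for this example.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the complete response, header first.
sub hello_page {
    return "Content-type: text/html\n\n"   # the blank line ends the headers
         . "<html><head><title>Hello</title></head>\n"
         . "<body><h1>Hello from CGI</h1></body></html>\n";
}

print hello_page();
```

Run from the prompt, this should show the Content-type line, a blank line, and then the HTML, which is the same text the server would pass back to the browser.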

Tip

If you have an error, Perl usually is very good about helping you find what is wrong. Perl tells you the line where the error is located and suggests what it thinks the problem might be. I suggest fixing one or two errors at a time and then retrying your program from the command line. Quite often, one error contributes to and creates lots of other errors. That’s why I suggest that you fix just a couple of bugs at a time.

One of the first things you are likely to forget is to tell the system which language interpreter should run your script. Setting the file extension to .pl doesn’t do it. What tells the system how to run your CGI program is the first line of the Perl script. The first line should look something like this:

#! /usr/local/bin/perl

The line must align flush with the left margin, and the path to the Perl interpreter must be correct. If you don’t know where Perl is on your server, the following exercise will help you figure it out.