
Beginning CGI Programming with Perl

Exercise 2.1. Reading and decoding the User-Agent field

The CGI program that determines which browser is calling your Web page has two basic steps. First, it must figure out which browser is accessing it. Then, it must return the correct Location header based on what it found in step 1.

Because Netscape is the browser that went off on its own and implemented all those cool extensions that are so much fun to use, let’s deal only with the Netscape browser. If Netscape were the only browser that could handle tables, this program would be complete. In practice, your code should deal with all the browsers that can and can’t handle the HTML+ extensions.

The format of HTTP_USER_AGENT is illustrated by how these two popular browsers define their User-Agent request header:

  • Mozilla/1.1N (Windows; I; 16bit)
  • AIR_Mosaic (16bit)/v1.00.198.07

You can find out what types of browsers are looking at your Web page by looking in the server log files. These log files are discussed in further detail in Tutorial 10, “Keeping Track of Your Web Page Visitors.”

The easiest thing to do is to split HTTP_USER_AGENT into fields and then compare them against browsers you know will work for your enhanced Web page. Listing 2.3 contains the Perl code to do this. As with all the code in these tutorials, I step through the new and relevant Perl code. You are not expected to know Perl. However, I hope you will feel comfortable enough with Perl by the time you complete these tutorials to write CGI programs of your own.


Listing 2.3. Perl code to return a Web page based on a browser.

01: #!/usr/local/bin/perl
02:
03: @user_agent = split(/\//,$ENV{'HTTP_USER_AGENT'});
04:
05: if ($user_agent[0] eq "Mozilla"){
06:     @version = split(/ /,$user_agent[1]);
07:     $version_number = substr($version[0], 0, 3);
08:     if ($version_number >= 1.1){
09:         print "Location: http://www.county-line-bbq.com/clbbq-plus.html\n\n";
10:     }
11:     else{
12:         print "Location: http://www.county-line-bbq.com/clbbq-minus.html\n\n";
13:     }
14: }
15: else{
16:     print "Location: http://www.county-line-bbq.com/clbbq-minus.html\n\n";
17: }
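To try the decision logic in Listing 2.3 without a Web server, you can wrap it in a subroutine that takes the User-Agent string as an argument and returns the URL to redirect to. The sketch below is an illustration, not the book’s listing: the subroutine name choose_location is hypothetical, it adds use strict and use warnings, it assumes the example site is www.county-line-bbq.com, and it treats a missing or unrecognized User-Agent as a browser without table support.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the page to redirect to for a given
# User-Agent string, using the same checks as Listing 2.3.
sub choose_location {
    my ($agent) = @_;
    my $plus  = 'http://www.county-line-bbq.com/clbbq-plus.html';
    my $minus = 'http://www.county-line-bbq.com/clbbq-minus.html';

    # Assumption: a missing or empty User-Agent gets the plain page.
    return $minus unless defined $agent && length $agent;

    # Line 3: split the header wherever a forward slash appears.
    my @user_agent = split(/\//, $agent);

    # Line 5: only Mozilla, with something after the slash, goes further.
    return $minus unless @user_agent >= 2 && $user_agent[0] eq 'Mozilla';

    # Lines 6-7: first space-separated word, then its first three characters.
    my @version = split(/ /, $user_agent[1]);
    my $version_number = substr($version[0] // '', 0, 3);

    # Addition: skip the numeric test if the result isn't a plain number.
    return $minus unless $version_number =~ /^\d+(?:\.\d*)?$/;

    # Line 8: version 1.1 and later can handle tables.
    return $version_number >= 1.1 ? $plus : $minus;
}

# As a CGI program, send the redirect for whichever browser called us.
print 'Location: ', choose_location($ENV{'HTTP_USER_AGENT'}), "\n\n";
```

Returning the URL instead of printing it keeps the browser check separate from the header output, so you can call choose_location directly with sample strings such as the two User-Agent values shown earlier.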

It takes several steps to get the data in the HTTP_USER_AGENT environment variable into a format your CGI program can use. First, you need to separate out the browser type. This is the part of the HTTP_USER_AGENT field before the first forward slash (/).

Line 3 uses the split function to separate the HTTP_USER_AGENT variable into parts wherever it finds a forward slash (/). The split function in Perl is really powerful, and because each portion of line 3 is important and possibly new to you, definitions of each element of line 3 follow:

  • @user_agent defines a new array variable.
  • = assigns the list of fields that split returns on the right side to the variable on the left side. Split returns the pieces of the string between the matches, not the matches themselves. In this case, the left-hand side is an array, so each field becomes a new element in the array.
  • /\// is the pattern to look for and perform the splits on. Unfortunately, this pattern is hard to read, and as a human, I find it a bit confusing too. A pattern is written as /pattern/. In this case, the pattern is \/. The \ is called an escape character. It tells Perl not to interpret the next character as a special character. So the real pattern to match on is the / character. If you didn’t add the escape character (\) in the pattern, Perl would see three forward slashes, as you see in this Perl fragment:
    split(///,$ENV{'HTTP_USER_AGENT'})
    Looking at it this way, maybe you can see why Perl would get confused. Perl expects the pattern to split on to sit between the first two forward slashes (//), so it reads an empty pattern followed by a stray slash and just gives up with a syntax error. So help out your Perl interpreter. When your search pattern contains the forward slash (/) that delimits it, or a character that has a special meaning in a pattern, such as a period (.), asterisk (*), or question mark (?), put the escape character (\) before that character so that Perl matches it literally. You and your Perl interpreter will be much happier.
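Perl also lets you pick a different delimiter for a pattern by writing it with the m operator, which removes the need to escape the slash. The listing doesn’t use this form; the sketch below splits the Netscape example string both ways to show that the results are the same.

```perl
use strict;
use warnings;

my $agent = 'Mozilla/1.1N (Windows; I; 16bit)';

# Escaped slash inside the usual /.../ delimiters, as in line 3.
my @escaped = split(/\//, $agent);

# The same pattern written with m{...} delimiters, so no escape is needed.
my @braced = split(m{/}, $agent);

# Both lines print: Mozilla|1.1N (Windows; I; 16bit)
print join('|', @escaped), "\n";
print join('|', @braced), "\n";
```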

After line 3 runs, the first element of the @user_agent array, $user_agent[0], holds Mozilla or AIR_Mosaic (16bit) for the two example browsers.
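To see what line 3 produces for each of the two example User-Agent strings, the short sketch below splits them and prints both fields in brackets. The hash name %browser_of is only for illustration.

```perl
use strict;
use warnings;

# The two example User-Agent strings from this exercise.
my @agents = (
    'Mozilla/1.1N (Windows; I; 16bit)',
    'AIR_Mosaic (16bit)/v1.00.198.07',
);

my %browser_of;    # maps each agent string to its first field
for my $agent (@agents) {
    my @user_agent = split(/\//, $agent);
    $browser_of{$agent} = $user_agent[0];
    print "[$user_agent[0]] [$user_agent[1]]\n";
}
```

This prints [Mozilla] [1.1N (Windows; I; 16bit)] on the first line and [AIR_Mosaic (16bit)] [v1.00.198.07] on the second.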

So now you have the name of the browser in the first element of the @user_agent array. The next thing to do is find out which browser is calling you.

Line 5,

if ($user_agent[0] eq "Mozilla"){

compares the first element of the array @user_agent with the string Mozilla. If they match, you take the if path. If they don’t, you take the else path. The CGI program uses the comparison operator eq because it is comparing strings instead of numbers. In Perl, strings are compared with eq and numbers are compared with ==.
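The difference between eq and == is easiest to see with two strings that are different text but the same number. This small sketch compares "1.0" and "1" both ways.

```perl
use strict;
use warnings;

# String comparison: the characters must match exactly.
my $same_string = ('1.0' eq '1') ? 1 : 0;    # 0: "1.0" and "1" differ as text

# Numeric comparison: both sides are converted to numbers first.
my $same_number = ('1.0' == '1') ? 1 : 0;    # 1: both are the number 1

print "eq: $same_string, ==: $same_number\n";  # prints "eq: 0, ==: 1"
```

Using == on browser names would be a mistake: a string such as Mozilla that doesn’t start with a number is treated as 0, so any two browser names would compare as equal, and use warnings would report that the arguments aren’t numeric.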

The next thing to do is to figure out what version of the browser is accessing your Web page. Even Netscape couldn’t read HTML tables before version 1.1. So you need to look at the rest of the data in the @user_agent array and separate that out to get the version number.

Line 6,

@version = split(/ /,$user_agent[1]);

examines the second field returned from the last split command and splits it based on any spaces it finds.
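Here is a sketch of what line 6 does to the second field of the Netscape example.

```perl
use strict;
use warnings;

# Second field produced by line 3 for the Netscape example.
my $rest = '1.1N (Windows; I; 16bit)';

# Line 6: split on each single space.
my @version = split(/ /, $rest);

print scalar(@version), " fields: ", join(' | ', @version), "\n";
# prints "4 fields: 1.1N | (Windows; | I; | 16bit)"
```

If the header might contain runs of spaces, split(' ', $rest), with a single-space string instead of a pattern, splits on any run of whitespace and ignores leading whitespace.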

So now the first field in the @version array, $version[0], should contain the Mozilla version number 1.1N. The next step is to turn this into a number so that you can decide whether it is version 1.1 or greater.

The version returned from the split function includes a letter: the N. This means that the program can’t simply compare it against a number. If you leave the N in the version, the code must check for every version of Netscape, because a string comparison is an exact match, unlike numbers, which you can compare against a range. A string comparison would require the code to check for versions 1.1N, 1.0N, 1.0B, and so on.

If you turn the version into a number, the code can look for all versions that are earlier than version 1.1. Version 1.1 of Netscape is the first version number that handles tables.
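To make the string-versus-number argument concrete, the sketch below classifies a few version strings two ways: by looking them up in a list of exact old versions, and by a numeric test on the first three characters. The list of old versions and the 0.9B and 2.0 entries are hypothetical examples, not data from the book.

```perl
use strict;
use warnings;

# Exact string matching needs every old version spelled out.
# (Hypothetical list; real browsers used many more version strings.)
my %old_versions = map { $_ => 1 } qw(1.0N 1.0B);

# A numeric test covers the whole range below 1.1 at once.
sub is_old_numeric {
    my ($number) = @_;
    return $number < 1.1;
}

for my $v (qw(1.0N 1.0B 0.9B 1.1N 2.0)) {
    my $by_string = $old_versions{$v} ? 'old' : 'new';
    my $by_number = is_old_numeric(substr($v, 0, 3)) ? 'old' : 'new';
    print "$v: string test says $by_string, numeric test says $by_number\n";
}
```

Only the numeric test classifies 0.9B as old; the exact-match list would have needed another entry.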

Examine line 7:

$version_number = substr($version[0], 0, 3);
  • The substr function here takes the first three characters of $version[0]. It starts at position 0 (the first character) and returns three characters, so 1.1N becomes 1.1.
  • The substr command in Perl can be used to do much more complex things than this, but there just isn’t enough book here to go through the really complex functions in detail. In this case, I want to get the first three characters from my string, and this works just fine.
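Taking exactly three characters works for versions such as 1.1N, but it assumes a one-digit major version and a one-digit minor version. As an alternative that is not the book’s approach, a regular expression can capture however many digits start the string. The helper name leading_version is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading number in a version string,
# or undef if the string does not start with a digit.
sub leading_version {
    my ($text) = @_;
    return $text =~ /^(\d+(?:\.\d+)?)/ ? $1 : undef;
}

for my $v ('1.1N', '1.22', '10.0') {
    my $three = substr($v, 0, 3);       # what line 7 does
    my $regex = leading_version($v);    # regular-expression alternative
    print "$v: substr gives $three, regex gives $regex\n";
}
```

For the 1.1 cutoff used here, these differences happen not to change the answer, but they would matter if you compared against a finer cutoff such as 1.21, where substr would turn 1.22 into 1.2.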

Now the CGI program can check for old Mozilla version numbers.

Line 8,

if ($version_number >= 1.1){

shows that any Mozilla version that is equal to or greater than 1.1 will pass this test. Notice that this is a numeric test against something extracted from a string: Perl automatically converts the string 1.1 to the number 1.1 when you use it with a numeric operator such as >=. That’s what makes Perl so popular. It does the right thing, even for me.
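Perl also converts a string with trailing letters, using its leading numeric part, but it complains when warnings are on. The sketch below, which assumes use warnings is turned on, counts the warnings to show why trimming the version in line 7 keeps the comparison quiet.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings_seen;
$SIG{__WARN__} = sub { push @warnings_seen, $_[0] };

my $trimmed = '1.1';     # what line 7 produces
my $raw     = '1.1N';    # the untrimmed version from line 6

# Both strings are converted to numbers for >=; Perl uses the leading
# numeric part of "1.1N", so both comparisons pass.
my $trimmed_result = ($trimmed >= 1.1) ? 'passes' : 'fails';
my $raw_result     = ($raw >= 1.1)     ? 'passes' : 'fails';

# Only the untrimmed string triggers an "isn't numeric" warning.
print "1.1 $trimmed_result, 1.1N $raw_result, warnings: ",
    scalar(@warnings_seen), "\n";
```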

That completes step 1: finding out what type of browser is calling your Web page. Now all the code has to do is tell the browser which Web page you really want it to access.

This part is amazingly straightforward! Just print the Location response header with the URI of the correct Web page.

Lines 9-16 print the correct headers. Line 9,

print "Location: http://www.county-line-bbq.com/clbbq-plus.html\n\n";

redirects the client to the HTML+ enhanced page.

Line 12,

print "Location: http://www.county-line-bbq.com/clbbq-minus.html\n\n";

redirects the client to the HTML 1.0 page.

Before the response headers are sent to the browser, the server steps in and generates any additional required response headers.

The program told the server that it wanted the browser to go to a different location. The server parsed the program’s output and added the required response headers for me. In particular, every HTTP response message must begin with a status line. In this case, that means a status line giving the client a redirection response such as this:

HTTP/1.0 302 Moved Temporarily

Then the Location header is included in the response headers, and the client goes to the correct location.

Now your browser will retrieve the correct Web page for its capabilities. I will continue to refer to the HTTP headers throughout these tutorials. This is just one simple example of how you can use these headers to make your Web pages more effective for your clients. In Tutorial 7, where you put everything together, you will see HTTP headers as part of a complete online catalog application.