
Beginning CGI Programming with Perl

Using the HTTP Headers

HTTP headers are the language your browser and server use to talk to each other. Think of each of the HTTP headers as a single message. In the client and server sense, first there are a bunch of questions (which are the request headers) and then the answers to those questions (which are the response headers).

To use the operator analogy again, think of the request headers, which come from the client, as you asking to speak to Mr. Thae. The response headers can be the operator, responding with “Mr. Thae is in Room 904, I’m connecting you now.” From there, if you have a good operator, the operator stays on the line and gives you the status of your connection request.

Status Codes in Response Headers

When the operator responded with “Mr. Thae is in Room 904,” the caller got a status response header. The first HTTP response header sent in response to any HTTP request is a status line. The status line is made up of the HTTP version, a three-digit status code, and a short description of that status.

The status codes in the response header tell the client how well your request for a URI went. The status codes are discussed throughout these tutorials; they are included in Appendix C, “Status Codes and Reason Phrases.”

Here’s an overview of status codes so that you can recognize them throughout the remainder of the book:

  • Informational status codes are in the 100s. They are reserved for experimental purposes and only provide information. If, instead of connecting you to Mr. Thae’s room, the operator had responded with “Mr. Thae is in Room 904, would you like me to connect you?” this would be considered an informational message.
  • Success status codes are in the 200s. Consider if the operator first had called Mr. Thae, confirming that he was in the room and willing to talk to you. A status code of 200 (OK) would correspond to the operator saying, “Mr. Thae is on the line now.”
  • Redirection status codes are in the 300s. The operator could have said “Mr. Thae is in a meeting in Room 908.” This corresponds to a status code of 302, which states that the URI temporarily moved.
  • Client error codes are in the 400s. They are the most useful and the most complex of the status codes. Client error codes can even be used to demand payment before answering the phone. Maybe Mr. Thae operates a 900 number. If the operator responded with “Mr. Thae is not at this number,” this would correspond to a 404, Not Found, status code.
  • Server error codes are in the 500s. If your operator had apoplexy because you wanted to talk to Mr. Thae and said, “Who do you think you are, asking me to let you talk to MR. Thae?!” this would correspond to a status code of 503, Service Unavailable.

In summary, 100s are informational, 200s indicate success, 300s are redirection codes, 400s are client error codes, and 500s are server error status codes. Refer to Appendix C for a complete definition of the status codes.

There are two basic types of headers: request and response headers. The client makes the request of the server, and the server builds the response headers. The most common request header is the Get method request header.

The Method Request Header

The client sends to the server several request headers defining for the server what the client wants, how the client can accept data, how to handle the incoming request, and any data that needs to be sent with the request.

The first request header for every client/server communication is the method request header. This request header tells the server what other types of request headers to expect and how the server is expected to respond. Two types of method headers exist: the simple method request and the full method request.

The simple method request header is used only to support browsers that accept only the HTTP/0.9 protocol. Because HTTP/0.9 is no longer the standard and the full method request header duplicates the definition of the simple method request header, only a brief description of the simple method request header is included here.

The simple method request header is made up of two parts separated by a space: the request method, followed by the URI requested:

Request_Method URI \n

The most common request methods are Get, Post, and Head. The HTTP specification also allows for the Put, Delete, Link, and Unlink methods, along with an undefined extension method. Because you will mainly be dealing with the Get and Post methods, this tutorial concentrates on those.

Each of these request methods identifies a URI to the server. The difference between Get and Post lies in how data is transferred to the server. The Head request method changes what is returned to the client: the server sends back only the response headers for the requested URI, without the URI’s contents.
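In a CGI program, the Get/Post difference shows up in where the form data arrives. Under the CGI convention, the server puts the method in the REQUEST_METHOD environment variable; Get data arrives in QUERY_STRING, while Post data arrives on standard input, with its size in CONTENT_LENGTH. The following is a minimal sketch under those assumptions; the subroutine name read_form_data is hypothetical, and decoding the name=value pairs is left out.

```perl
use strict;
use warnings;

# Return the raw form data sent with a CGI request (hypothetical helper).
# $env is a hash reference such as \%ENV; $in is the input filehandle.
sub read_form_data {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || 'GET');

    if ($method eq 'POST') {
        # Post: the data is on standard input; CONTENT_LENGTH gives its size
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        # read() may return fewer bytes than requested, so keep reading
        while (length($data) < $length) {
            my $got = read($in, $data, $length - length($data), length($data));
            last unless $got;    # end of file or read error
        }
        return $data;
    }

    # Get (and Head): the data is the part of the URI after the "?"
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

# In a real CGI script:
#   my $data = read_form_data(\%ENV, \*STDIN);
```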

The next section covers the full method request line. This is the request header that includes the type of access (Get, Post, Head, and so on) that the client is requesting. Of all the request headers, this is the one that really makes things work. This is the request header that tells the server which Web page you want returned to the browser. Without this header, no data can be transferred to the calling client.

The Full Method Request Header

The full method request header is the first request header sent with any client request. The full method request line is made up of three parts separated by spaces: the method type, the URI requested, and the HTTP version number.

Here’s the syntax of the full method request header illustrated logically and by a syntactically correct example:

Request_Method URI HTTP_Protocol_Version \n

GET http://www.accn.com/index.html HTTP/1.0

Explanations for each part of the full method request header follow:

  • Request_Method can be any of the following method types: Get, Post, Head, Put, Delete, Link, or Unlink.
  • URI is the address of the file, program, or directory you are trying to access.
  • HTTP_Protocol_Version is the version number of the HTTP protocol that the client/browser can handle.
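Because the three parts are separated by single spaces, a program can split a full method request line directly. This is a minimal sketch with a hypothetical subroutine name; it treats any line that lacks a third "HTTP/n.n" field (such as a simple method request) as not being a full request.

```perl
use strict;
use warnings;

# Split a full method request line into method, URI, and HTTP version.
# Returns an empty list unless there are exactly three space-separated
# fields and the last one looks like "HTTP/n.n".
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;              # drop the trailing CRLF (or bare LF)
    my @fields = split / /, $line;
    return unless @fields == 3 && $fields[2] =~ m{\AHTTP/\d+\.\d+\z};
    return @fields;
}

my ($method, $uri, $version) =
    parse_request_line("GET http://www.accn.com/index.html HTTP/1.0\r\n");
print "method=$method uri=$uri version=$version\n";
```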

The Get HTTP Header

The Get method is the default method for following links and passing data on the Internet. When you click on a link, your browser sends a Get method request header. When you click the Submit button on a form whose METHOD attribute is not specified, the Get method request header is used to call the CGI program that handles the form data. Tutorial 4, “Using Forms to Gather and Send Data,” covers forms and this method of sending data in detail.

When you click on a URI, it usually is of the form

http://www.somewhere.com/filename.html

A Get method request header is generated along with any other request headers the browser might want to send. The server locates the URI and returns it to the browser, unless an If-Modified-Since request header was sent along with the other request headers.

When the If-Modified-Since header is included in the request headers, the server checks the modification date of the requested URI and returns a new copy only if it has been modified after the date specified.
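The server’s If-Modified-Since check can be sketched in Perl. This is a minimal sketch, assuming the date arrives in the preferred RFC 1123 format (the older date formats are not handled); parse_http_date and should_send are hypothetical names. When should_send returns false, the server can answer with status 304, Not Modified, instead of sending the file again.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse a date such as "Mon, 04 Sep 1995 17:42:40 GMT" into seconds
# since the epoch. Returns undef for any other format in this sketch.
sub parse_http_date {
    my ($date) = @_;
    my ($d, $mon, $y, $h, $m, $s) =
        $date =~ /\A\w{3}, (\d\d) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT\z/
        or return;
    return unless exists $MONTH{$mon};
    return timegm($s, $m, $h, $d, $MONTH{$mon}, $y);
}

# True if the file should be sent: there is no usable If-Modified-Since
# date, or the file was modified after that date.
sub should_send {
    my ($file_mtime, $if_modified_since) = @_;
    return 1 unless defined $if_modified_since;
    my $since = parse_http_date($if_modified_since);
    return 1 unless defined $since;          # unparsable date: send a full copy
    return $file_mtime > $since ? 1 : 0;     # 0 means "304 Not Modified" is enough
}
```

In a server, $file_mtime would come from the file itself, for example (stat $path)[9].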

When you click on a URI and that URI is a request for another Web page, you send a Get method request header and lots of other headers to your server.

The Requested URI

The second field of the full method request header is the requested URI. The URI tells the server what file or service is requested.

Normally, the full method request header is for a file on the server. When this is the case, the absolute path of the file/URI is included in the method request header. An example Get method request header is GET / HTTP/1.0.

Tip

Notice that no HTML file is identified in this Get method request. When the URI names only a directory, the server returns a default page, which on many servers is index.html (the name is set in the server’s configuration). If you’re lazy like me and don’t want to type a Web page URI for the home page, make your home page index.html, and your Web server automatically returns that page.

The requested URI is an absolute pathname measured from the server root, not from the root of the server’s file system. That phrase has always confused me, so I’m going to explain it here so that I can always remember what an absolute pathname from the server root is. Take a look at a Get method request for /~yawp/test/env.html as an example:

  • The absolute pathname is the directory and filename of the URI, beginning at the / directory. For this example, I show the absolute pathname to my personal directory ~yawp with a subdirectory of test and a filename of env.html.
  • This / directory is defined by your Server Administrator as the starting location for all Web pages or URIs on the server. This also is called the server root.
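  • In my case, the Server Administrator has defined a public-web directory in every user’s home directory, and a URI beginning with /~yawp is mapped to the public-web directory in user yawp’s home directory. So the actual path to the env.html file is ~yawp/public-web/test/env.html.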

On my commercial server, the server root looks like

www.practical-inet.com

but the real path is

/usr/local/business/http/practical-inet.com
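The translation from a requested URI to a real file path can be sketched as follows. This sketch assumes the document root shown above, a public-web directory for each user, and user home directories under /home; home_dir and uri_to_path are hypothetical helpers. A real server also performs checks this sketch skips, such as refusing paths that contain "..".

```perl
use strict;
use warnings;

# Hypothetical settings taken from the examples above.
my $DOCUMENT_ROOT = '/usr/local/business/http/practical-inet.com';
my $USER_WEB_DIR  = 'public-web';

# Hypothetical home-directory lookup; a real server would consult the
# password file (for example with getpwnam) instead of assuming /home.
sub home_dir {
    my ($user) = @_;
    return "/home/$user";
}

# Translate the URI path from a request line into a file path.
sub uri_to_path {
    my ($uri) = @_;
    if ($uri =~ m{\A/~([^/]+)(/.*)?\z}) {          # the /~user/... form
        my ($user, $rest) = ($1, defined $2 ? $2 : '/');
        return home_dir($user) . "/$USER_WEB_DIR$rest";
    }
    return $DOCUMENT_ROOT . $uri;                   # everything else
}

print uri_to_path('/~yawp/test/env.html'), "\n";
print uri_to_path('/index.html'), "\n";
```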

The Proxy Get Method Request Header

If the request is sent to a proxy server, the client should send an absolute URI. An absolute URI includes the scheme (http:), the domain name, and the full pathname to the requested URI. The domain name in this example is www.w3.org:

GET http://www.w3.org/hypertext/WWW/TheProject.html HTTP/1.0

The HTTP Version

The last field in the full method request header is the HTTP version. Currently, the only valid value is HTTP/1.0, followed by a CRLF. If the request is for an HTTP/0.9 server, a simple method request header should be used. If you’re interested in keeping up with the latest HTTP protocol, you can find a hypertext version of the HTTP specification draft at

http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html

Table 2.1 summarizes the request/response headers used by the server and client to communicate with each other. They are defined completely in the HTTP specification. I have included some of the more obscure ones. I will discuss several of the more common headers in more detail.

The most important thing to remember is that the request/response headers are the means by which your client (the browser) and the server tell each other what is needed and what is available.

Table 2.1. HTTP request/response headers.

Request/Response Header Function
Accept Tells the server what type of data the browser can accept. Examples include text, audio, images, and so on.
Accept-Charset Tells the server what character sets the browser prefers. The default is US-ASCII.
Accept-Encoding Tells the server what type of data encoding the browser can accept. Examples are compress and gzip.
Accept-Language Tells the server what natural language the browser prefers. The default is English.
Allow Tells the browser what request methods are allowed by the server. Examples are Get, Head, and Post.
Authorization Used by the browser to authenticate itself with the server. It usually is sent in response to a 401 or 411 code.
Content-Encoding Identifies the type of encoding applied to the transferred data. Examples are x-compress and x-gzip.
Content-Language Identifies the natural language of the data transferred.
Content-Length Identifies the size of the data transfer in decimal bytes.
Content-Transfer-Encoding Identifies the encoding of the message for Internet transfer. The default is binary.
Content-Type Identifies the type of data being transferred. An example is Content-Type: text/html \n.
Date Identifies the GMT date/time at which the data transfer was initiated.
Expires Identifies the date/time at which the data should be considered stale. This header often is used by caching clients.
Forwarded Used by proxy servers to indicate the intermediate steps between the browser and server.
From Contains the Internet e-mail address of the client. This header is no longer in common use.
If-Modified-Since Makes the request method a conditional request. A copy of the requested URI is returned only if it was modified after the time specified.
Last-Modified Identifies the date/time when the URI was last modified.
Link Describes a relationship between two URIs.
Location Defines the location of a URI. Typically, this header is used to redirect the client to a new URI.
MIME-Version Indicates what version of the MIME protocol was used to construct the transferred message.
Orig-URI Used by the client to specify to the server the original URI of the requested URI.
Pragma Specifies special directives that should be applied to all intermediaries along the request/response chain. This header usually provides directives to proxy servers or caching clients.
Public Lists the set of non-standard methods supported by the server.
Referer Identifies to the server the address (URI) of the link that was used to send the method request header to the server.
Retry-After Identifies to the client a length of time to wait before trying the requested URI again.
Server Identifies the server software used by the server.
Title Identifies the title of the URI.
URI-Header Specifies a uniform resource identifier.
User-Agent Identifies the type of browser making the request.
WWW-Authenticate Required when status response headers of Unauthorized (401) or Authorization refused (411) appear. This header is used to begin a challenge/response sequence with the client.
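Every header in Table 2.1 travels as a line of the form Name: value, and the block of headers ends at the first blank line. The following minimal sketch reads such a block into a hash; read_headers is a hypothetical name, it expects the request or status line to have been read already, and it does not handle continuation lines or repeated headers.

```perl
use strict;
use warnings;

# Read "Name: value" lines until the blank line that ends the headers.
# Names are folded to lowercase because HTTP header names are
# case-insensitive. Call this after the first (request or status) line.
sub read_headers {
    my ($fh) = @_;
    my %headers;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';                  # blank line: end of headers
        my ($name, $value) = $line =~ /\A([^:]+):\s*(.*)\z/ or next;
        $headers{lc $name} = $value;
    }
    return \%headers;
}
```

Whatever remains on the filehandle after the blank line is the body, in the format named by Content-Type.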

The Accept Request Header

After the initial method request header, one of the more common and useful request headers is the Accept request header. This header tells the server what type of response the client can handle.

The Accept request header has this format:

Accept: media-type; quality

Table 2.2 lists the basic media types, which are of MIME format. A complete list of MIME types is included in Appendix A, “MIME Types and File Extensions.”

Table 2.2. Basic media types.

MIME Type Definition
Application Identifies data that must be handled by a particular application rather than displayed directly. Commonly includes octet-stream and postscript.
Audio Specifies the type of audio that can be handled by the browser. Commonly includes basic, x-aiff, and x-wav.
Image Specifies the type of image that can be handled by the browser. Commonly includes gif and jpeg.
Text Specifies the type of text that can be handled by the browser. Commonly includes html, plain, richtext, and x-setext.
Video Specifies the type of video that can be handled by the browser. Commonly includes mpeg and quicktime.

Media Type

The first field of the Accept request header is the type of media that this browser can handle. That field is followed by a semicolon and then the quality factor. The quality factor tells the server how much loss of quality the client will accept for that media type; when the server has more than one version of the requested URI, it can use the quality factor to choose a smaller, lower-quality version. Adjusting the quality factor can speed up downloads; in most cases, the quality of the sound, image, or video is greater than the quality required for viewing or listening from your computer, as illustrated here:

Accept: audio/*; q=0.5

This means that I can accept any type of audio and that a version at half the full quality is acceptable. Accepting degraded audio can mean less data transfer. You can use this to speed up audio transfers, for example, when you are receiving only voice and don’t care about full-quality sound.

The * wildcard can stand for the subtype (audio/*) or for both the type and the subtype (*/*). The default for the Accept media type is */*. Because the Accept header should be used only for restricting the types of media the client can receive, Accept */* is redundant, not required, and not recommended.

The common media types are text, image, and audio. Some of the text types are html, plain, x-dvi, and x-c. The standard text media types used on the Net are html and plain. For image, jpeg and gif are the two standards right now. Because of its smaller data size, jpeg is becoming the new preferred image format.

Quality

If you are not concerned about losing some detail, you can use the Quality field to speed up the downloading of files. The image format jpeg is an example in which a degradation in data, by removing detail, produces an image that is almost as good as the original and much smaller in data size. Because a large portion of the Net is connected by limited speed connections (modems and such), you should always consider data transfer when developing your Web page.

The default quality factor is 1, which translates to 100 percent. The format is q=factor. The factor can be any number from 1 to 0 and usually is expressed in tenths. An example is q=0.8.
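A server that has several versions of a URI needs the Accept header as a list of media types with their quality factors. This minimal sketch splits the header value and sorts the types best first; parse_accept is a hypothetical name, a missing q counts as 1, and parameters other than q are ignored.

```perl
use strict;
use warnings;

# Split an Accept header value into [media type, quality factor] pairs,
# best quality first. A missing q counts as the default of 1.
sub parse_accept {
    my ($value) = @_;
    $value =~ s/\A\s+|\s+\z//g;
    my @types;
    for my $item (split /\s*,\s*/, $value) {
        my ($type, @params) = split /\s*;\s*/, $item;
        next unless defined $type && length $type;
        my $q = 1;
        for my $p (@params) {
            $q = $1 if $p =~ /\Aq\s*=\s*([01](?:\.\d+)?)\z/;
        }
        push @types, [$type, $q];
    }
    return sort { $b->[1] <=> $a->[1] } @types;
}

for my $t (parse_accept('audio/*; q=0.5, audio/basic')) {
    print "$t->[0] q=$t->[1]\n";
}
```

Here audio/basic (q=1) sorts ahead of audio/* (q=0.5), so a server holding both would prefer the full-quality basic audio.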

The Get method request header and Accept request header are the most common request headers. Your browser may send more information to the server, but these two define to the server what the request is and the fundamentals of how to respond to your request.

The HTTP Response Header

After the server receives the request headers, it begins to generate the correct response. The server starts by looking up the URI in the Get method request header and then generates the response headers. The Get method request tells the server what URI is desired. The other request headers tell the server how to send the data back to the client. The Accept request header with its Quality field, for example, tells the server which media types, and what quality of data, the client will accept.

So, in short, the response headers are the server’s response to the client’s URI request. This is the operator’s chance to tell you to take a flying leap or to politely satisfy your every request.

In this case, assume that you have a polite operator and a valid request. In Tutorial 7, “Building an Online Catalog,” you will deal with some of the more persnickety operators, the kind who want to know your username, password, and other stuff like that.

After the server receives a request, it must choose a valid response. It starts with a response status line. This line gives the protocol version, followed by a status code and a status description. The format of a response status line follows:

PROTOCOL/Version_Number Status_Code Status_Description

The only valid protocol right now is HTTP, and version 1.0 is the standard at the moment. Notice how I add all those qualifiers; the Net moves so fast that fixed rules are sure to be overrun by some wild-and-crazy, new idea. Of course, that’s what makes the Net so neat.

Figure 2.2 shows the response headers generated when the server receives a Get method request header.

Figure 2.2 : The server response headers to a Get method request header.

Now take a moment to go through the response headers shown in Figure 2.2. These are the basic ones that will be returned from almost any request header.

The Status response line follows:

HTTP/1.0 200 OK

Nothing to write home about in this response header. Nice, simple, and straightforward. The HTTP version number is 1.0. The status is 200. The status description is OK. This means that your server found your requested URI and is going to return it to the browser.
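A program that receives a response, such as a CGI script that fetches pages from other servers, can pick the status line apart and use the first digit of the code to find its class, following the summary of status codes earlier in this tutorial. This is a minimal sketch; parse_status_line is a hypothetical name.

```perl
use strict;
use warnings;

# Status classes keyed by the first digit of the status code.
my %CLASS = (1 => 'informational', 2 => 'success', 3 => 'redirection',
             4 => 'client error',  5 => 'server error');

# Split "PROTOCOL/Version_Number Status_Code Status_Description" into
# its parts and add the class name. Returns an empty list on no match.
sub parse_status_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my ($version, $code, $description) =
        $line =~ m{\A(HTTP/\d+\.\d+) (\d{3}) ?(.*)\z} or return;
    return ($version, $code, $description, $CLASS{substr($code, 0, 1)});
}

my @status = parse_status_line("HTTP/1.0 200 OK\r\n");
print join(' | ', @status), "\n";
```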

The Date Response Header

The next line is the Date response header:

Date: Mon, 02 Oct 1995 11:11:32 GMT

This is the time at which the server generated the response to the request header. The date must be in Greenwich Mean Time (GMT). The date can be in one of three formats (see Table 2.3).

Table 2.3. Greenwich Mean Time (GMT) format.

Example Description
Wed, 06 Nov 1996 06:15:10 GMT Originally defined by RFC 822 and updated by RFC 1123, this is the preferred format Internet standard.
Wednesday, 06-Nov-96 06:15:10 GMT Defined by RFC 850 and made obsolete by RFC 1036, this format is in common use but is based on an obsolete format and lacks a four-digit year.
Wed Nov  6 06:15:10 1996 This is the ANSI standard date format produced by C’s asctime() function.

Only one Date response header is allowed per message, and because it is important for evaluating cached responses, the server always should include a Date response header. Cached responses are beyond the scope of these tutorials, but, in short, they can be part of a request/response chain used to speed up URI transfers.
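A Perl program can produce the preferred date format itself. This minimal sketch, with the hypothetical name http_date, formats a time with gmtime so the result is in GMT no matter what time zone the server runs in; it avoids strftime so the day and month names do not depend on the locale.

```perl
use strict;
use warnings;

my @DAY   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTH = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format seconds since the epoch as, for example,
# "Wed, 06 Nov 1996 06:15:10 GMT".
sub http_date {
    my ($time) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($time);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MONTH[$mon], $year + 1900, $hour, $min, $sec);
}

print 'Date: ', http_date(time), "\n";
```

The same helper can format a Last-Modified header from a file’s modification time, for example http_date((stat $path)[9]).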

The Server Response Header

The Server response header field contains information about the server software used to create the response:

Server: Apache/0.8.13

If your CGI program has problems on a particular site, this header identifies the server software under which it is failing.

The Content-Type Response Header

The Content-Type header field tells your browser what type of media is appended after the last response header:

Content-type: text/html

Media types are defined in Appendix A, “MIME Types and File Extensions.”

The Content-Length Response Header

The Content-Length header field gives the size of the appended media as a decimal number of 8-bit bytes (which the HTTP specification calls octets):

Content-length: 1529

This header often is used by the server to determine the amount of data sent by the client when posting form data.
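Because Content-Length counts octets rather than characters, a Perl program should measure the encoded bytes of its output. This is a minimal sketch, assuming the page is sent as UTF-8; body_with_length is a hypothetical helper. For pure ASCII pages the character count and the octet count are the same.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a character string as UTF-8 and return the octets together
# with the value to send in the Content-Length header.
sub body_with_length {
    my ($html) = @_;
    my $octets = encode('UTF-8', $html);
    return ($octets, length $octets);
}

my ($body, $size) = body_with_length("<p>caf\x{e9}</p>\n");
print "Content-length: $size\n";
```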

The Last-Modified Response Header

Because the requested URI is a file of type text/html, the Last-Modified field gives the time the file was last modified. This field is used for caching information:

Last-Modified: Mon, 04 Sep 1995 17:42:40 GMT

If the client sent an If-Modified-Since request header, the server compares that date with the Last-Modified time to determine whether the data should be transferred at all.

The Enclosed URI

The last line of the response headers is blank, and, after that, the requested URI is shipped to the client. This is the blank line in Figure 2.2 just before the opening <html> tag.

This is one of the most common reasons for response headers not working. Don’t make this CGI newbie mistake. All your HTTP response and request header chains must end with a blank line.

In a program that writes HTTP headers, the last header you print must be followed by a blank line:

print "Last-modified: $last_modified_variable\n\n";

Notice in this example that two newlines (\n) are printed. One newline ends every HTTP header line, and the second newline produces the blank line that tells the server or client that the HTTP headers are finished. Everything after that first blank line is supposed to be in the format defined by the Content-Type header.

You now know what request and response headers are and that the browser and the server use them to transfer data back and forth. So what can you do with that knowledge?
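Put together, a CGI program prints its headers, one blank line, and then the page. When a CGI program does not write the status line itself, the server adds the status line for it, so the program supplies only its own headers. This is a minimal sketch; cgi_response is a hypothetical helper, and it uses length() because the sample body is plain ASCII.

```perl
use strict;
use warnings;

# Build a complete CGI response: headers, the blank line, then the body.
# length() counts characters, which equals bytes for ASCII text.
sub cgi_response {
    my ($content_type, $body) = @_;
    my $headers = "Content-type: $content_type\n"
                . 'Content-length: ' . length($body) . "\n";
    return $headers . "\n" . $body;     # the lone "\n" is the blank line
}

print cgi_response('text/html', "<html><body>Hello</body></html>\n");
```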

Certainly there are all types of choices, but here is a real-world example that you just might have to deal with.

Changing the Returned Web Page Based on the User-Agent Header

One of the things I do to make a living is build Web pages. One of the most frustrating experiences I have is building a great-looking Web page that uses all the great features of HTML+ and then hearing from my customer that his Web page looks awful. What happened? Well, the most common problem is that my client does not have the latest and greatest Netscape version. The browser he is using just doesn’t deal with the latest HTML enhancements.

That’s the pits. My view of the page is great. He thinks it stinks. I’ll never convince him that what is out there looks good. And to him, it certainly doesn’t. Have you ever seen table data when your browser doesn’t support tables? UGLY!!

So what do I do about it? Well, I don’t experience that frustration anymore. I build two Web pages: one for browsers that handle the latest HTML enhancements and one for browsers that don’t.

This means more work for me but a more versatile page for my clients. It’s not too difficult a task to take advantage of the incoming request headers and then send back a Location response header that redirects the client to the correct page for his browser. Just to show what a difference this can make, the next two figures show an HTML+ page with table data. Figure 2.3 shows the data when it is understood by the browser. Figure 2.4 shows the same page when the browser doesn’t handle tables. Notice that the table data of County Line locations shown in Figure 2.3 is a jumbled list at the bottom of the Web page in Figure 2.4. And finally, Figure 2.5 shows that page rebuilt without tables.
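The redirection described above can be sketched as a small CGI program. Under the CGI convention, the browser’s User-Agent header arrives in the HTTP_USER_AGENT environment variable, and a program that prints a Location header with a full URL causes the server to redirect the browser there. The page URLs below are hypothetical, and the browser test is an assumed heuristic: it treats Netscape (which identifies itself as Mozilla) version 1.1 and later as able to handle tables and sends everything else to the list version.

```perl
use strict;
use warnings;

# Hypothetical URLs for the two versions of the page.
my $TABLES_PAGE = 'http://www.example.com/countyline-tables.html';
my $LISTS_PAGE  = 'http://www.example.com/countyline-lists.html';

# Assumed heuristic: "Mozilla/1.1" or later gets the table version.
sub choose_page {
    my ($user_agent) = @_;
    $user_agent = '' unless defined $user_agent;
    if ($user_agent =~ m{\AMozilla/(\d+)\.(\d+)}) {
        return $TABLES_PAGE if $1 > 1 || ($1 == 1 && $2 >= 1);
    }
    return $LISTS_PAGE;
}

# The Location header plus the blank line is the whole CGI response.
sub redirect_response {
    my ($env) = @_;
    return 'Location: ' . choose_page($env->{HTTP_USER_AGENT}) . "\n\n";
}

print redirect_response(\%ENV);
```

Keeping the browser test in one subroutine makes it easy to adjust as new browser versions appear.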

Figure 2.3 : A working HTML+ page for County Line Barbecue.

Figure 2.4 : A broken HTML+ page for County Line Barbecue.

Figure 2.5 : An HTML 1.0 page for County Line Barbecue.

If you’re curious, you can see the difference between HTML+ tables and HTML 1.0 in Figures 2.3 and 2.5. Listing 2.1 is the HTML fragment for Figure 2.3. Listing 2.2 is the same data reformatted for HTML 1.0, as shown in Figure 2.5. My main complaint with list-data formatting is that I can’t get enough data on a computer screen. There is just too much wasted space in the HTML 1.0 version. There are other options, but none of them presents the data as neatly formatted as the HTML+ tables.
Listing 2.1. An HTML+ fragment using tables to present County Line locations.

<h1><a name="loc">The County Line Locations</a></h1>
<center>
<table border=10 cellpadding=10 width="100%">
<tr>
<th align=center>New Mexico
<th align=center>Austin, Texas
<th align=center>Texas
<th align=center>Louisiana
<tr>
<td align=left><a href="New-Mexico-albq-e.html">Albuquerque East</a>
<td align=left><a href="Austin-hill.html">On the Hill</a>
<td align=left><a href="Texas-corpus.html">Corpus Christi</a>
<td align=left><a href="Louisiana-new-orleans.html">New Orleans</a>
<tr>
<td align=left><a href="New-Mexico-albq-n.html">Albuquerque North</a>
<td align=left><a href="Austin-lake.html">On the Lake</a>
<td align=left><a href="Texas-dallas.html">Dallas</a>
<td align=left><a href="Louisiana-new-orleans-dtwn.html">New Orleans Dwtn</a>
<tr>
<td align=left><a href="New-Mexico-sante-fe.html">Santa Fe</a>
<td align=left><a href="Austin-sixth.html">On Sixth Street</a>
<td align=left><a href="Texas-houston.html">Houston</a>
<td align=left><a href="Louisiana-baton-rouge.html">Baton Rouge</a>
</table>
</center>

Once you see how easy it is to direct the browser to the correct Web page, you’ll agree that this is a reasonable solution, even if it does require extra work. In addition, it isn’t too difficult to create a second Web page for the HTML 1.0 browsers. The HTML 1.0 fragment in Listing 2.2 shows the changes required to reformat the Web page to HTML 1.0 lists.


Listing 2.2. An HTML 1.0 fragment using lists to present County Line locations.

<h1><a name="loc">The County Line Locations</a></h1>
<h3>Austin, Texas</h3>
<ul>
<li><a href="Austin-hill.html">On the Hill</a>
<li><a href="Austin-lake.html">On the Lake</a>
<li><a href="Austin-sixth.html">On Sixth Street</a>
</ul>

<h3>Texas</h3>
<ul>
<li><a href="Texas-corpus.html">Corpus Christi</a>
<li><a href="Texas-dallas.html">Dallas</a>
<li><a href="Texas-houston.html">Houston</a>
</ul>

<h3>New Mexico</h3>
<ul>
<li><a href="New-Mexico-albq-e.html">Albuquerque East</a>
<li><a href="New-Mexico-albq-n.html">Albuquerque North</a>
<li><a href="New-Mexico-sante-fe.html">Santa Fe</a>
</ul>

<h3>Louisiana</h3>
<ul>
<li><a href="Louisiana-new-orleans.html">New Orleans</a>
<li><a href="Louisiana-new-orleans-dtwn.html">New Orleans Dwtn</a>
<li><a href="Louisiana-baton-rouge.html">Baton Rouge</a>
</ul>