Beginning CGI Programming with Perl

The Directories on Your Server

The first thing you need to learn is how to get around on your server. If you have a personal account with an Internet service provider, your personal directory should be based on your username. In my case, I have a personal account with an Internet service provider and a business account from which I manage multiple business Web pages. Your personal account probably is similar to mine; I can build Web pages for Internet access under a specific directory called public-web. The name isn’t really important; what matters is the concept of having a directory where specific operations are allowed.

Usually, you will find that your server is divided into two directory trees. A directory tree consists of a directory and the subdirectories below the main directory. Most UNIX Web servers separate their users from the system administrative files by creating separate directory trees called the server root and the document root.

The Server Root

The server root contains all the files for which the Webmaster or System Administrator is responsible. You probably will not be able to change these files, but there are several of them you will want to be aware of, because they provide valuable information about where your programs can run and what your CGI programs are allowed to do. Below the server root are two subdirectories that you should know about. On the NCSA server, those directories usually are called the log directory and the conf directory. If you are not working on an NCSA server, you will find that the CERN and other servers have a similar directory structure with slightly different names.

The Log Directory

The log directory is where all the log files are kept. Within the log directory are your error log files. Error log files record each request to your CGI programs, SSI commands, and HTML files that generates some type of error. When you are having problems getting something to work, the error log file is an excellent place to start your debugging. Usually, the file name begins with err. On my server, the error log file is called error.log. Another log file you can make good use of is the access.log file. This file records each file that was accessed by a user, and it often is used to derive access counts for your Web page. Building counters is discussed in Tutorial 10, “Keeping Track of Your Web Page Visitors.” Also in your log directory is a list of the different types of browsers accessing your Web site. On NCSA-style servers, this list normally is kept in the agent log (often named agent.log or agent_log); the referer log (referer.log on my server) records the page that linked each visitor to your site. You can use the browser information to direct a specific browser to Web pages written just for browsers that can or can’t handle special HTML extensions. Redirecting a browser based on the browser type is discussed in Tutorial 2. In addition to the log files are the configuration files below the conf directory.
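As a rough illustration of deriving an access count, the sketch below counts requests for one page in a small sample log. The sample entries, host names, and page names are hypothetical, and the line layout assumes the Common Log Format; check a few lines of your own access log before relying on the pattern.

```shell
#!/bin/sh
# Hypothetical sample entries, assuming Common Log Format. A real access
# log lives in the server's log directory and is much larger.
log=$(mktemp)
cat > "$log" <<'EOF'
host1.example.com - - [01/Jan/1996:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 2048
host2.example.com - - [01/Jan/1996:10:01:12 -0500] "GET /pics/logo.gif HTTP/1.0" 200 512
host3.example.com - - [01/Jan/1996:10:02:30 -0500] "GET /index.html HTTP/1.0" 200 2048
EOF

# Count the lines that record a GET request for one page.
hits=$(grep -c '"GET /index.html ' "$log")
echo "index.html was requested $hits times"

rm -f "$log"
```

To try this against a real log, replace the temporary file with the path to your server's access log, if you are allowed to read it.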

The conf Directory

The conf directory contains, in addition to other files, the access.conf and srm.conf files. Understanding these files helps you understand the limitations (or lack of limitations) placed on your CGI programs. Both these files are covered in more detail in Tutorial 12, “Guarding Your Server Against Unwanted Guests.” This introduction is only intended to familiarize you with their purposes and general layouts.

The access.conf file is used to define per-directory access control for the entire document root. Any changes to this file require the server to be restarted in order for the changes to take effect. Each of the file’s command sets is contained within a

<DIRECTORY directory_path> ... </DIRECTORY>

command. Each

<DIRECTORY directory_path> ... </DIRECTORY>

command affects all the files and subdirectories for a single directory tree, defined by the directory_path. Remember that a directory tree is just a starting path to a directory and all the directories below that directory.

The srm.conf file controls the server after it has started up. Inside this file, you will find the path to the document root and an alias command telling the server where to hunt for CGI scripts. The srm.conf file also is used to enable SSI commands and to tell the server about new file extensions that aren’t part of the basic MIME types. One file type that you should be particularly interested in is the server-parsed HTML type (text/x-server-parsed-html on NCSA servers), which tells the server which files to look in for the SSI commands.

This brief introduction to your configuration files should just whet your appetite for the many things you can learn by understanding how your server configuration files work.

The Document Root

You normally will be working in a directory tree called the document root. The document root is the area where you put your HTML files for access by your Web clients. This probably will be some subdirectory of your user account. On my server, the document root for each user account is public-web. Users who want to create public Web pages must place those Web pages in the public-web subdirectory below their home directory. You can create as many subdirectories below the public-web directory as you want. Any subdirectory below the public-web directory is part of the document root tree.

How do you find out what the document root is? It is easy, even if you aren’t a privileged user. Just install the HTML Print Environment Variables program or the Mail Environment Variables program (described in Tutorial 6), and you will see right away what the document root directories are on your server. To find out what the server root is, you need to contact your Webmaster or System Administrator.
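The idea behind those programs can be sketched in a few lines. This is not the Tutorial 6 program, only a minimal stand-in written as a shell CGI script; it assumes your server passes a DOCUMENT_ROOT environment variable to CGI programs, which NCSA-style servers do, and the function name is made up for illustration.

```shell
#!/bin/sh
# Minimal CGI sketch: report the document root the server passes in.
# print_document_root is a hypothetical name used only for this example.
print_document_root() {
    # A CGI response starts with a header line, then a blank line.
    echo "Content-type: text/plain"
    echo
    # Fall back to a note when the server did not set the variable.
    echo "DOCUMENT_ROOT = ${DOCUMENT_ROOT:-not set}"
}

print_document_root
```

Run from the command line, the script reports "not set" unless you define the variable yourself; run by the server as a CGI program, it shows the value the server supplies.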

File Privileges, Permissions, and Protection

After you figure out where to put your HTML, SSI commands, and CGI files, the next thing you need to learn is how to enable them so that they can be used by the WWW server.

When you create a file, the file is given a default protection mask set up by one of your login files. This normally is done by a command called umask. Before you learn how to use the umask command, you should learn a bit about file-protection masks.

File protections also are referred to as file permissions. The file permissions tell the server who has access to your file and whether the file is a simple text file or an executable program. There are three main types of files: directories, text files, and executable files. Because you will be using Perl as your scripting language, your executable CGI programs will be both text and executable files. Directory files are special files that tell the system where a group of files is located; they carry the Execute permission so that the server can reach the files they contain.

Each of these file types has three permissions: Read, Write, and Execute. The Read permission allows the file to be opened for reading, but it cannot be modified. The Write permission allows the file to be modified but not opened for reading. The Execute permission is used both to allow program execution and to allow access to a directory. If anyone (including you) is going to be able to move to a directory or open the files in it, the Execute permission on the directory file must be set; getting a listing of the directory requires the Read permission as well. The Execute permission also must be set for any program you want the server to run for you. Regardless of the file extension or the contents of a file, if the Execute permission is not set, the server will not try to run or execute the file when the file is called.

This is probably one of the most common reasons for CGI programs not working the first time. If you are using an interpreted language like Perl, you never run a compile and link command, so the system doesn’t automatically change the file permissions to Execute. If you write a perfectly good Perl program and then try to run it from the command line, you might get an error message like Permission denied. If you test out your CGI program from your Web browser, however, you are likely to get an error like the one shown in Figure 1.1: an Internet file error with a status code of 403. This error code seems kind of ominous the first time you see it, and it really doesn’t help you very much in figuring out what the problem is.
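The missing Execute permission is easy to reproduce in a scratch directory. The sketch below uses a tiny shell script in place of a Perl program so that it runs anywhere; the file name hello.cgi is hypothetical, and the example assumes your temporary directory allows programs to be run from it.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
script="$dir/hello.cgi"          # hypothetical file name

cat > "$script" <<'EOF'
#!/bin/sh
echo "Content-type: text/plain"
echo
echo "Hello"
EOF

chmod 644 "$script"              # readable, but no Execute permission
if [ -x "$script" ]; then before=yes; else before=no; fi

chmod 755 "$script"              # add Execute for owner, group, and world
if [ -x "$script" ]; then after=yes; else after=no; fi

echo "executable before chmod 755: $before, after: $after"

# With the Execute permission set, the script now runs.
last_line=$("$script" | tail -n 1)
echo "script output: $last_line"

rm -rf "$dir"
```

Trying to run the script while its mode is still 644 is what produces the Permission denied message at the command line.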

Figure 1.1 : The Forbidden error message.

Remember that there are three types of file permissions: Read, Write, and Execute. Each of these file permissions is applied at three separate access levels. These access levels define who can see your files based on their username and groupname.

When you create a file, it is created with your username and your groupname as the owner and groupname of the file. The file’s Read, Write, and Execute permissions are set for the owner, the group, and other (sometimes referred to as world). This is very important because your Web page is likely to be accessed by anybody in the world. Usually, your Web server runs as user nobody. This means that when your CGI program is executed or your Web page is opened for reading, the process doing the work has a username and groupname different from yours; in effect, someone else is accessing your files. You must set your file-access permissions to allow your Web server access to your files. This usually means setting the Read and Execute privileges for the world or other group. Figure 1.2 shows a listing of the files in one of my business directories. You can see that most of the files have rw privileges for the owner and Read privileges only for everyone else. Notice that the owner is yawp (that’s my personal user name) and the group is bizaccnt. You can see that directories start with a d, as in the drwxr-xr-x permissions set. The d is set automatically when you use the mkdir command.
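If you want to see a listing like this for yourself, the following sketch builds one file and one directory in a scratch area and prints the permissions column of ls -l. The names page.html and images are hypothetical; the owner and group shown on your system will be your own.

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/page.html"           # hypothetical file name
chmod 644 "$dir/page.html"
mkdir "$dir/images"              # hypothetical directory name
chmod 755 "$dir/images"

# The first 10 characters of an ls -l line hold the type flag followed
# by the owner, group, and world permission triplets.
file_mode=$(ls -ld "$dir/page.html" | cut -c1-10)
dir_mode=$(ls -ld "$dir/images" | cut -c1-10)
echo "$file_mode  page.html"
echo "$dir_mode  images"

# Split the file's mode string into its three access levels.
echo "owner: $(echo "$file_mode" | cut -c2-4)"
echo "group: $(echo "$file_mode" | cut -c5-7)"
echo "world: $(echo "$file_mode" | cut -c8-10)"

rm -rf "$dir"
```

The world triplet is the one that matters most for a Web page, because the server usually is neither the owner nor a member of your group.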

Figure 1.2 : A directory listing showing file permissions.

In order for your Web page to be opened by anyone on the Net, it must be readable by anyone in the world. In order for your CGI program to be run by anyone on the Net, it must be executable by your Internet server. Therefore, you must set the permissions so that the server can read or execute your files, which usually means making your CGI programs world executable. You set your file permissions by using a command called chmod (change file mode). The chmod command accepts two parameters. The first parameter is the permissions mask. The second parameter is the file for which you want to change permissions. Only the owner of a file can change the file’s permissions mask.

The permissions mask is a three-digit number; each digit of the number defines the permission for a different user of the file. The first digit defines the permissions for the owner. The second digit defines the permissions for the group. The third digit defines the permissions for everyone else, usually referred to as the world or other, as in other groups. Each digit works the same for each group of users: the owner, group, and world. What you set for one digit has no effect on the other two digits.

Each digit is made up of the three Read, Write, and Execute permissions. The Read permission value is 4, the Write permission value is 2, and the Execute permission is 1. You add these three numbers together to get the permissions for a file. If you want a file to be only readable and not writable or executable, set its permission to 4. This works the same for Write and Execute. Executable-only files have a permission of 1. If you want a file to have Read and Write permissions, add the Read and Write values together (4+2) and you get 6, the permissions setting for Read and Write. If you want the file to be Read, Write, and Execute, use the value 7, which is derived from adding the three permissions (4+2+1). Do this for each of the three permission groups and you get a valid chmod mask.
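The addition can be written out as a small script. The two helper functions below, perm_digit and perm_mask, are hypothetical names made up for this illustration; they are not standard commands.

```shell
#!/bin/sh
# perm_digit (hypothetical helper) turns a triplet such as "rw-" into a
# chmod digit by adding 4 for Read, 2 for Write, and 1 for Execute.
perm_digit() {
    digit=0
    case "$1" in r??) digit=$((digit + 4)) ;; esac
    case "$1" in ?w?) digit=$((digit + 2)) ;; esac
    case "$1" in ??x) digit=$((digit + 1)) ;; esac
    echo "$digit"
}

# perm_mask (hypothetical helper) joins the owner, group, and world
# digits into a three-digit mask.
perm_mask() {
    echo "$(perm_digit "$1")$(perm_digit "$2")$(perm_digit "$3")"
}

perm_mask rwx r-x r-x    # prints 755
perm_mask rw- r-- r--    # prints 644
```

Each triplet is converted on its own, which mirrors the point that one digit has no effect on the other two.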

Suppose that you want your file to have Read, Write, and Execute permissions (4+2+1) for yourself; Read and Execute (4+1) for your group; and Execute only (1) for everyone else. You would set the file permissions to 751 by using this command:

chmod 751 filename
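To confirm what a mask of 751 does, you can apply it to a scratch file and read the result back with ls -l. The file name myprog.cgi below is hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)
target="$dir/myprog.cgi"         # hypothetical file name
touch "$target"

chmod 751 "$target"

# Keep only the permissions column from the ls -l output.
mode=$(ls -ld "$target" | cut -c1-10)
echo "$mode"                     # prints -rwxr-x--x

rm -rf "$dir"
```

Reading the output left to right after the leading dash: rwx for the owner (7), r-x for the group (5), and --x for everyone else (1).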

Table 1.1 shows several examples of setting file permissions.

Table 1.1. Sample file permissions and their meanings.

Command               Meaning
chmod 777 filename    The file is available for Read, Write, and Execute for the owner, group, and world.
chmod 755 filename    The file is available for Read, Write, and Execute for the owner; and Read and Execute only for the group and world.
chmod 644 filename    The file is available for Read and Write for the owner, and Read only for the group and world.
chmod 666 filename    The file is available for Read and Write for the owner, group, and world. I wonder if the 666 number is just a coincidence. Anybody can create havoc with your files with this wide-open permissions mask.

Tip

If you want the world to be able to use files in a directory, but only if they know exactly what files they want, you can set the directory permission to Execute only. This means that intruders cannot do wild-card directory listings to see what type of files you have in a directory. But if someone knows what type of file he wants, he still can access that file by requesting it with a fully qualified name (no wildcards allowed).

When you started this section, you were introduced to a command called umask, which sets the default file-creation permissions. You can have umask set the default permissions for your files by adding the umask command to your .login file. The umask command works inversely to the chmod command: the permissions named in its mask are removed from each new file when the file is created. (The name is short for user mask.) With a umask of 0, nothing is removed, so all your files are created so that the owner, group, and world can read and write them, and all your directories can be read from, written to, and searched by everyone. A very common umask is 022. This umask removes the Write privilege for group and other users from all the files you create. Every file can be read and all directories are executable by anyone. Only you can change the contents of files or write new files to your directories, however.
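The effect of a umask is easiest to see by creating files under two different settings. The sketch below works in a scratch directory; the file and directory names are hypothetical, and the 077 setting is included only as a contrast to 022.

```shell
#!/bin/sh
dir=$(mktemp -d)

umask 022                        # remove Write for group and other
touch "$dir/notes.txt"
mkdir "$dir/docs"
mode_022_file=$(ls -ld "$dir/notes.txt" | cut -c1-10)
mode_022_dir=$(ls -ld "$dir/docs" | cut -c1-10)

umask 077                        # remove everything for group and other
touch "$dir/private.txt"
mode_077_file=$(ls -ld "$dir/private.txt" | cut -c1-10)

echo "umask 022 file:      $mode_022_file"
echo "umask 022 directory: $mode_022_dir"
echo "umask 077 file:      $mode_077_file"

rm -rf "$dir"
```

A umask set inside a script lasts only for that script; putting the command in your login file is what makes it the default for every session.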