Contents:
Typical Server Behavior
Configuring the Server
The Web server is the software responsible for accepting browser requests, retrieving the specified file (or executing the specified CGI script), and returning its contents (or the script's results). Most Web servers on the Internet today run on UNIX machines, although the percentage of servers on other platforms (such as Windows 95, Windows NT, and the Macintosh) is steadily increasing.
Web servers are often called httpd, using a UNIX convention in which daemons are named with the name of the service followed by the letter "d". (A UNIX daemon is a process that sits idle waiting for other programs to make requests.)
On UNIX, there are four major flavors of Web server:
We cover all four of these servers, as well as:
The UNIX servers are configured using configuration files containing directives that control everything from basic settings, such as the server name and directory paths, to file access and authorization. The Netscape family of servers is configured via HTML forms but maintains its data in configuration files. The WebSite server does not use configuration files but is instead configured via a series of dialog boxes and wizards.
Web servers first retrieve the request using Berkeley sockets, a mechanism for communicating over a network. The Web server listens for requests on a particular port on the server machine, generally port 80. By default, Web browsers use port 80 for their requests.
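As a rough illustration of the listening step, the following Python sketch opens a TCP socket, waits for one connection, and reads the first line of the request. The function names open_listener and read_request_line are hypothetical, and the default port of 8080 is an assumption chosen because binding to port 80 usually requires administrator privileges.

```python
import socket

def open_listener(host="127.0.0.1", port=8080):
    """Create a TCP socket that listens for incoming connections.

    The port is an assumption for this sketch; a production Web server
    would normally listen on port 80.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow the address to be reused quickly after the program exits
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    return listener

def read_request_line(listener):
    """Accept one connection and return the first line of its request."""
    conn, _addr = listener.accept()
    with conn:
        # One recv is enough for a short request in this sketch;
        # a real server would keep reading until the headers are complete.
        data = conn.recv(4096)
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling read_request_line(open_listener()) and then requesting http://127.0.0.1:8080/staff/matthew.html from a browser should return a request line such as "GET /staff/matthew.html HTTP/1.1". The sketch sends no response, so the browser will report an empty reply.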
Once the server receives the request, it locates the document being requested. It looks for the file under the document root directory. For example, if the document root is /usr/local/httpd/htdocs, and the client requests the document /staff/matthew.html, then the server retrieves /usr/local/httpd/htdocs/staff/matthew.html.
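The mapping from a requested path to a file under the document root can be sketched in a few lines of Python. The function name resolve_path is hypothetical. Normalizing the path so that ".." segments cannot climb above the document root is an assumption about how a careful server would behave, not something the text above specifies.

```python
import posixpath

def resolve_path(document_root, url_path):
    """Map a requested URL path onto a file under the document root."""
    # Normalize the path (collapsing "." and ".." segments), then drop the
    # leading slashes so the join below always stays under the root.
    relative = posixpath.normpath("/" + url_path).lstrip("/")
    return posixpath.join(document_root, relative)

print(resolve_path("/usr/local/httpd/htdocs", "/staff/matthew.html"))
# prints /usr/local/httpd/htdocs/staff/matthew.html
```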
If the URL doesn't specify a file but just a directory, the server returns the directory index file, generally called index.html or welcome.html.
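A minimal sketch of the directory index lookup, assuming the server tries a fixed list of candidate names in order. The function name find_index and the order of the names are illustrative assumptions; real servers let the administrator configure the index file name.

```python
import os

# Candidate index file names, tried in order (an assumption for this sketch)
INDEX_NAMES = ("index.html", "welcome.html")

def find_index(directory, index_names=INDEX_NAMES):
    """Return the path of the first index file found in the directory,
    or None if the directory contains none of the candidates."""
    for name in index_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

What a server does when no index file exists (for example, generating a directory listing or returning an error) varies by server and is left out here.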
The server sends the contents of the file back to the client, along with some HTTP response headers (see Chapter 19, HTTP Headers). Among the data in the response headers is the media type (also known as a content type or MIME type), i.e., the format that the file is in. How the server determines the format varies from server to server, but usually it is inferred from the suffix of the document--e.g., .html is taken to be an HTML document, .pdf is assumed to be an Adobe Acrobat document, etc. See Chapter 20, Media Types and Subtypes, for a listing of common media types and the associated suffixes, if any.
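The suffix-based lookup can be sketched as a small table keyed by file extension. The table below holds only the two suffixes mentioned above. The function name media_type_for is hypothetical, and falling back to application/octet-stream for unknown suffixes is an assumption; each server chooses its own default.

```python
import posixpath

# Suffix-to-media-type table; a real server reads a much longer list
# from its configuration.
MEDIA_TYPES = {
    ".html": "text/html",
    ".pdf": "application/pdf",
}

# Fallback for unrecognized suffixes (an assumption for this sketch)
DEFAULT_TYPE = "application/octet-stream"

def media_type_for(path):
    """Guess the media type of a document from its file suffix."""
    suffix = posixpath.splitext(path)[1].lower()
    return MEDIA_TYPES.get(suffix, DEFAULT_TYPE)

print(media_type_for("/staff/matthew.html"))
# prints text/html
```

The result would be sent to the client in the Content-Type response header.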