Programming Perl, Second Edition

Previous Chapter 6
Social Engineering
Next
 

6.2 Cooperating with Other Processes

Processes have almost as many ways of communicating as people do. But the difficulties of interprocess communication (IPC) should not be underestimated. It doesn't do you any good to listen for verbal cues when your friend is using only body language. Likewise, two processes can communicate only when they agree on the method of communication, and on the conventions built on top of that method. These layered conventions often gain the weight of "methodhood" themselves, so you'll sometimes hear people talking about stacks of communication methods. We can't hope to cover all the methods used in the world today, but we'll discuss some of the methods most commonly used in Perl.

The IPC facilities of Perl range from the very simple to the very complex. Which facility you want to use depends on the complexity of the information to be communicated. The simplest kind of information is, in a sense, no information at all, but just the awareness that a particular event has happened at a particular point in time. In Perl, these events are communicated via a signal mechanism modeled on the UNIX signal system.

At the other extreme, the socket facilities of Perl allow you to communicate with any other process on the Internet using any mutually supported protocol you like. Naturally, this freedom comes at a price: you have to go through a number of steps to set up the connections and make sure you're talking the same language as the process on the other end, which may in turn require you to adhere to any number of other strange customs, depending on the cultural conventions at work. To be protocoligorically correct, you might even be required to speak a language like HTML, or Java, or Perl. Horrors.

Sandwiched in between are some facilities intended primarily for communicating between processes on the same machine. These include pipes, FIFOs, and the various System V IPC calls.

Signals

Perl uses a simple signal handling model: the %SIG hash contains references (either symbolic or hard) to user-defined signal handlers. When an event transpires, the handler corresponding to that event is called with one argument containing the name of the signal that triggered it. In order to send a signal to another process, you use the kill function. If that process has installed a signal handler, it can execute code when you send the signal, but there's no way to get a return value (other than knowing that the signal was successfully sent).

We've classified this facility as a form of IPC, but in fact, signals can come from various sources, not just other processes. A signal might come from another process, or from your own process, or it might be generated when the user at the keyboard types a particular sequence like CTRL-C or CTRL-Z, or it might be manufactured by the kernel when special events transpire, such as when a child process is exiting, or when your process is running out of stack space, or hitting a file size limit.[3] But your own process can't easily distinguish among these cases. A signal is like a package that arrives mysteriously on your doorstep with no return address. You'd best open it carefully.

[3] Nevertheless, these are all examples of cooperation with something, even if it's not another process. Certainly, you tend to get more accomplished when you cooperate with your operating system.

For example, to unpack an interrupt signal, set up a handler like this:

sub catch_zap {
    my $signame = shift;
    $shucks++;
    die "Somebody sent me a SIG$signame!";
} 
$SIG{INT} = 'catch_zap';  # could fail outside of package main
$SIG{INT} = \&catch_zap;  # best strategy

Notice how all we do in the signal handler is set a global variable and then raise an exception with die. We try to avoid anything more complicated than that, because on most systems the C library is not re-entrant. Signals are delivered asynchronously, so calling any print functions (or even anything that needs to malloc (3) more memory) could in theory trigger a memory fault and subsequent core dump if you were already in a related C library routine when the signal was delivered. (Even the die routine is a bit unsafe unless the process is executing within an eval, which suppresses the I/O from die, which keeps it from calling the C library. Probably.)

The operating system thinks of signals as numbers rather than names. To find the names of the signals, you can use the kill -l command on your system (if you're running UNIX). Or you can retrieve them from Perl's Config module; the following snippet sets up two arrays: a @signame array indexed by number to get the signal name, and a %signo hash indexed by name to get the signal number:

use Config;
defined $Config{sig_name} or die "No sigs?";
$i = 0;     # Config prepends fake 0 signal called "ZERO".
foreach $name (split(' ', $Config{sig_name})) {
    $signo{$name} = $i;
    $signame[$i] = $name;
    $i++;
}

So to check whether signal 17 and SIGALRM are the same, you could do this:

print "signal #17 = $signame[17]\n";
if ($signo{ALRM}) { 
    print "SIGALRM is $signo{ALRM}\n";
}

You may also choose to assign either of the strings 'IGNORE' or 'DEFAULT' as the handler, in which case Perl will try to discard the signal or do the default thing. Some signals can be neither trapped nor ignored, such as the KILL and STOP signals. You can temporarily ignore other signals by using a local signal handler assignment, which goes out of effect once your block is exited. (Remember, though, that local values are inherited by functions called from within that block.)

sub precious {
    local $SIG{INT} = 'IGNORE';
    &more_functions;
} 
sub more_functions {
    # interrupts still ignored, for now...
}

Sending a signal to a negative process ID means that you send the signal to the entire UNIX process-group. This code sends a hang-up signal to all processes in the current process group except for the current process itself:

{
    local $SIG{HUP} = 'IGNORE';
    kill HUP => -$$;   # snazzy form of: kill('HUP', -$$)
}

Another interesting signal to send is signal number 0. This doesn't actually affect the other process, but instead checks whether it's alive or has changed its UID. That is, it checks whether it's legal to send a signal, without actually sending one.

unless (kill 0 => $kid_pid) {
    warn "something wicked happened to $kid_pid";
}

Another cute trick is to employ anonymous functions for simple signal handlers:

$SIG{INT} = sub { die "\nOutta here!\n" };

Because it's a subroutine without a name, this approach can be problematic for complicated handlers that need to reinstall themselves. That's because Perl's signal mechanism was historically based on the signal (3) function from the C library. On some systems, this function was broken; that is, it behaved in the unreliable System V way rather than the reliable BSD (and POSIX) fashion. This meant that you had to reinstall the signal handler each time it got called.[4] You also had to manually restart interrupted system calls. Careful programmers tend to write self-referential handlers that reinstall themselves:

[4] If you were lucky. The old signal behavior had a race condition whereby you couldn't guarantee that you could reset your handler in time before the next signal came in, which is why it was changed.

sub REAPER { 
    $waitedpid = wait;
    $SIG{CHLD} = \&REAPER;  # loathe sysV
}
$SIG{CHLD} = \&REAPER;
# now do something that forks...

or, somewhat more elaborately:[5]

[5] Although it seems unlikely that you would have POSIX WNOHANG waitpid behavior while lacking proper POSIX signals.

use POSIX "sys_wait_h";
sub REAPER { 
    $SIG{CHLD} = \&REAPER;  # loathe sysV, dream of real POSIX
    my $child;
    while ($child = waitpid(-1, WNOHANG)) {
        $Kid_Status{$child} = $?;
    } 
}
$SIG{CHLD} = \&REAPER;
# do something that forks...

And if you're writing code to behave the same way everywhere, even on rather old systems, it all gets more complex yet. Loops with blocking system calls (like <FILE> or accept) need additional logic to handle system calls that return failure for silly reasons, such as when your SIGCHLD handler triggers and you reap a moribund child process.

Fortunately, you shouldn't have to do that much any more. That's because whenever possible, Perl now uses the reliable sigaction (2) function from POSIX. If you know you're running on a system that supports sigaction (2), you won't have to reinstall your handlers, and a lot of other things will work out better, too. For example, "slow" system calls (ones that can block, like read, <STDIN>, wait, and accept) will restart automatically now if they get interrupted by a signal. This is generally construed to be a feature.

You check whether you have the more rigorous POSIX-style signal behavior by accessing the Config module, described in Chapter 7, The Standard Perl Library.

use Config;
print "Hurray!\n" if $Config{d_sigaction};

This will tell you whether you have reliable system calls that don't need to be reinstalled, but it won't tell you whether they're restartable. Perl doesn't provide that information in its Config module, but you could check out your system's C signal.h include file directly:

egrep 'S[AV]_(RESTART|INTERRUPT)' /usr/include/*/signal.h

On some older SysV systems, a simple but nonportable hack for avoiding zombies was to set $SIG{CHLD} to 'IGNORE'. This approach does not work on systems with sigaction (2). Instead, the best way to avoid zombies on POSIX systems is to use the REAPER() function above.

You can also use signals to impose time limits on long-running operations. If you're on a UNIX system (or any other system that supports the ALRM signal), you can ask the kernel to send your process an ALRM at some point in the future:

eval { 
    local $SIG{ALRM} = sub { die "alarm clock restart" };
    alarm 10;       # schedule alarm in 10 seconds 
    flock(FH, 2);   # a "write" lock that may block
    alarm 0;        # cancel the alarm
};
if ($@ and $@ !~ /alarm clock restart/) { die }

eval and die provide a convenient mechanism for aborting the flock if it hangs.

For more complex signal handling, see the POSIX module in Chapter 7, The Standard Perl Library. This module provides an object-oriented approach to signals that gives you complete access to low-level system behavior.

Pipes

A pipe is a unidirectional I/O channel that can transfer a stream of bytes from one process to another. They come in both named and nameless varieties. You may be more familiar with nameless pipes, so we'll talk about those first.

Anonymous pipes

Perl's open function opens a pipe instead of a file when you append or prepend a pipe symbol to the second argument to open. This turns the rest of the argument into a command, which will be interpreted as a process (or set of processes) to pipe a stream of data either into or out of. Here's how to start up a child process that you intend to write to:

open SPOOLER, "| cat -v | lpr -h 2>/dev/null"
                or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";

This example is actually starting up two processes, the first of which (running cat) we print to directly. The second process (running lpr) then receives the output of the first process. In shell programming this is often called a pipeline. A pipeline can have as many processes in a row as you like.

And here's how to start up a child process that you intend to read from:

open STATUS, "netstat -an 2>&1 |"
                or die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
} 
close STATUS or die "bad netstat: $! $?";

You can open a pipeline for input just as you can for output, but we don't show it in this example.

You might have noticed that you can use backticks to accomplish the same effect as opening a pipe for reading:

print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstat" if $?;

While this is true, it's often more efficient to process the file one line or record at a time, because then Perl doesn't have to read the whole thing into memory at once. It also gives you finer control of the whole operation, letting you kill off the child process early if you like.

Be careful to check the return values of both open and close. (If you're writing to a pipe, you should also be prepared to handle the PIPE signal, which is sent to you if the process on the other end dies before you're done sending to it.) The reason you need to check both the open and the close has to do with an idiosyncrasy of UNIX in how piped commands are started up. When you do the open, your process forks a child process that is in charge of executing the command you gave it. The fork (2) system call, if successful, returns immediately within the parent process, and the parent script leaves the open function successfully, even though the child process may not have even run yet. By the time the child process actually tries to run the command, it's already a separately scheduled process. So if it fails to execute the command, it has no easy way to communicate the fact back to the open statement, which may have already exited successfully in the parent. The way the disaster is finally communicated back to the parent is the same way that any other disaster in the child process is communicated back: namely, the exit status of the child process is harvested by the parent process when it eventually does a wait (2) system call. But this happens in the close function, not the open function. And that's why you have to check the return value of your close function. Whew.

Talking to yourself

Another approach to IPC is to make your program talk to itself, in a manner of speaking. Actually, your process talks to a forked copy of itself. It works much like the piped open we talked about in the last section, except that the child process continues executing your script instead of trying to execute some other command.

To represent this to the open function, you use a pseudo-command consisting of a minus. So the second argument to open looks like either "-|" or "|-", depending on whether you want to pipe from yourself or to yourself. The open function returns the child's process ID in the parent process, but 0 in the child process. Another asymmetry is that the filehandle is used only in the parent process. The child's end of the pipe is hooked to either STDIN or STDOUT as appropriate. That is, if you open a pipe to minus, you can write to the filehandle you opened and your kid will find it in his STDIN. If you open a pipe from minus, you can read from the filehandle you opened whatever your kid writes to her STDOUT.

This is useful for safely opening a file when running under an assumed UID or GID, for example:

use English;
my $sleep_count = 0;
do { 
    $pid = open(KID_TO_WRITE, "|-");
    unless (defined $pid) {
        warn "cannot fork: $!";
        die "bailing out" if $sleep_count++ > 6;
        sleep 10;
    } 
} until defined $pid;
if ($pid) {  # parent
    print KID_TO_WRITE @some_data;
    close(KID_TO_WRITE) or warn "kid exited $?";
}
else {       # child
    ($EUID, $EGID) = ($UID, $GID); # suid progs only
    open (FILE, "> /safe/file") 
                    or die "can't open /safe/file: $!";
    while (<STDIN>) {
        print FILE; # child's STDIN is parent's KID
    } 
    exit;  # don't forget this
}

Another common use for this construct is to bypass the shell when you want to open a pipe from a command. You might want to do this for security reasons, because you don't want the shell interpreting any metacharacters in the filenames you're trying to pass to the command. We give an example of this later in the chapter--see "Cleaning Up Your Path".

Note that these operations are full UNIX forks, which means they may not be correctly implemented on alien systems. Additionally, these are not true multi-threading. If you'd like to learn more about threading, see CPAN.

Bidirectional communication

While pipes work reasonably well for unidirectional communication, what about bidirectional communication? The obvious thing you'd like to do doesn't actually work:

open(PROG_FOR_READING_AND_WRITING, "| some program |")  # WRONG!

and if you forget to use the -w switch, then you'll miss out entirely on the diagnostic message:

Can't do bidirectional pipe at myprog line 3.

The open function won't allow this because it's rather error prone unless you know what you're doing, and can easily result in deadlock, which we'll explain later. But if you really want to do it, you can use the standard IPC::Open2 library module to attach two pipes to a subprocess's STDIN and STDOUT. There's also an IPC::Open3 module for tridirectional I/O (allowing you to catch your child's STDERR), but this requires an awkward select loop and doesn't allow you to use normal Perl input operations.

If you look at the source, you'll see that Open2 uses low-level primitives like pipe and exec to create all the connections. While it might have been slightly more efficient to use socketpair, it would have been even less portable. As it is, the Open2 and Open3 modules are unlikely to work anywhere except on a UNIX system, or some other system purporting to be POSIX compliant.

Here's an example using IPC::Open2::open2():

use FileHandle;
use IPC::Open2;
$pid = open2( \*Reader, \*Writer, "cat -u -n" );
Writer->autoflush();     # This is default, actually.
print Writer "stuff\n";
$got = <Reader>;

The problem with this in general is that UNIX buffering is really going to ruin your day. Even though your Writer filehandle is autoflushed, and the process on the other end will get your data in a timely manner, you can't usually do anything to force it to actually give it back to you in a similarly quick fashion. In this particular case we can, since (on some systems) the cat program has a -u option to make it do unbuffered output. But very few UNIX commands are designed to operate well over pipes, so this seldom works unless you yourself wrote the program on the other end of the double-ended pipe.

A partial solution to this is to use the Comm.pl library (not a standard module--see CPAN). It uses pseudo-ttys to make your program behave more reasonably, at least on those machines that force standard output to do line-buffering:

require 'Comm.pl';
$ph = open_proc('cat -n');
for (1..10) {
    print $ph "a line\n";
    print "got back ", scalar <$ph>;
}

This way you don't have to have control over the source code of the program you're using.

Named pipes

A named pipe (often called a FIFO) is an old UNIX mechanism for setting up pipes between unrelated processes. The names in question exist in the filesystem, which is just a funny way to say that you can put a special file in the filesystem that has another process behind it instead of a disk.

To create a named pipe, use the UNIX command mknod (1) or, on some systems, mkfifo (1). These commands may not be in your normal execution path.

# system() return value is backwards, so "and" not "or"
#
$ENV{PATH} .= ":/etc:/usr/etc";
if  (      system('mknod',  $path, 'p') 
       and system('mkfifo', $path) )
{
    die "mk{nod,fifo} $path failed";
}

A FIFO is convenient when you want to connect a process to an unrelated one. When you open a FIFO, the program will block until there's something on the other end.

For example, let's say you'd like to have your .signature file be a named pipe that has a Perl program on the other end. Now every time any program (like a mailer, newsreader, finger program, and so on) tries to read from that file, the reading program will block and your program will supply the new signature. We'll use the pipe-checking file test, -p, to find out whether anyone (or anything) has accidentally removed our FIFO.

chdir; # go home
$FIFO = '.signature';
$ENV{PATH} .= ":/etc:/usr/games";

while (1) {
    unless (-p $FIFO) {
        unlink $FIFO;
        system('mknod', $FIFO, 'p') 
            && die "can't mknod $FIFO: $!";
    }

    # next line blocks until there's a reader
    open (FIFO, "> $FIFO") or die "can't write $FIFO: $!";
    print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
    close FIFO;
    sleep 1;    # to avoid dup sigs
}

If that last comment seems opaque to you, consider how often the fortune program changes its current fortune.

Note that a FIFO in an NFS partition won't transfer data across your network.

System V IPC

Although System V IPC is pretty ancient, it still has some valid uses. But you can't use System V shared memory (or the more modern mmap (2) system call, for that matter) to share a variable among several processes. That's because Perl would reallocate your string when you weren't wanting it to. Instead, Perl uses a read/write notion.

Here's a small example showing shared memory usage:

$IPC_PRIVATE = 0;
$IPC_RMID = 0;
$size = 2000;
$key = shmget($IPC_PRIVATE, $size , 0777 );
die unless defined $key;

$message = "Message #1";
shmwrite($key, $message, 0, 60 ) or die "shmwrite: $!";
shmread($key,$buff,0,60) or die "shmread: $!";

print $buff,"\n";

print "deleting $key\n";
shmctl($key ,$IPC_RMID, 0) or die "shmctl: $!";

Here's an example of a semaphore:

$IPC_KEY = 1234;
$IPC_RMID = 0;
$IPC_CREATE = 0001000;
$key = semget($IPC_KEY, $nsems, 0666 | $IPC_CREATE );
die if !defined($key);
print "$key\n";

Put this code in a separate file so that more than one process can require and run it. Call the file take:

# create a semaphore

$IPC_KEY = 1234;
$key = semget($IPC_KEY, 0, 0 );
die if !defined($key);

$semnum = 0;
$semflag = 0;

# 'take' semaphore
# wait for semaphore to be zero
$semop = 0;
$opstring1 = pack("sss", $semnum, $semop, $semflag);

# Increment the semaphore count
$semop = 1;
$opstring2 = pack("sss", $semnum, $semop,  $semflag);
$opstring = $opstring1 . $opstring2;

semop($key,$opstring) or die "semop: $!";

Put this code in a separate file to be run in more that one process. Call this file give:

# 'give' the semaphore
# run this in the original process and you will see
# that the second process continues

$IPC_KEY = 1234;
$key = semget($IPC_KEY, 0, 0);
die if !defined($key);

$semnum = 0;
$semflag = 0;

# Decrement the semaphore count
$semop = -1;
$opstring = pack("sss", $semnum, $semop, $semflag);

semop($key,$opstring) or die "semop: $!";

The code above is rather low-level and clunky. A better approach would be to use the IPC::SysV module in CPAN.

Sockets

While sockets were invented under UNIX, nowadays you can find them on many other operating systems (though sometimes as an unbundled product). If you don't have sockets on your machine, you're going to have difficulty cooperating with processes on the Internet. With sockets, you can do both virtual circuits (that is, TCP streams) and datagrams (that is, UDP packets). You may be able to do even more, depending on your system.

The Perl function calls for dealing with sockets have the same names as the corresponding system calls in C, but their arguments tend to differ for two reasons: first, Perl filehandles work differently from C file descriptors, and second, Perl already knows the length of its strings, so you don't need to pass that information. See Chapter 3, Functions for details on each call.

Most of these routines quietly but politely return the undefined value when they fail, instead of causing your program to die right then and there due to an uncaught exception. (Actually, some of the new Socket module conversion functions call croak() on bad arguments.) It is therefore essential that you check the return values of these functions. Always begin your socket programs this way for optimal success (and don't forget to add -T taint checking switch to the shebang line for servers):

#!/usr/bin/perl -w
require 5.002;
use strict;
use sigtrap;
use Socket;

All the socket routines create system-specific portability problems. As noted elsewhere, Perl is at the mercy of your C libraries for much of its system behavior. It's probably safest to assume broken System V semantics for signals and to stick with simple TCP and UDP socket operations; for example, don't try to pass open file descriptors over a local UDP datagram socket if you want your code to stand a chance of being portable. (Yes, you can really do that on some machines--see BSD in the Glossary.)

One of the major problems with ancient socket code in Perl was that it tended to use hard-coded values for some of the constants, which severely hurt portability. If you ever see code that does anything like explicitly setting $AF_INET = 2, you know you're in for big trouble. An immeasurably superior approach is to use the Socket module, which more reliably grants access to the various constants and functions you'll need.

Below we will present several sample clients and servers without a great deal of explanation, since it would mostly duplicate the descriptions we've already provided in Chapter 3, Functions. Besides those descriptions, you should also check out CPAN. Section 5 of the CPAN modules file is devoted to "Networking, Device Control (modems), and Interprocess Communication", and refers you to numerous unbundled modules having to do with networking, Chat and Expect operations, CGI programming, DCE, FTP, IPC, NNTP, Proxy, Ptty, RPC, SNMP, SMTP, Telnet, Threads, and ToolTalk--just to name a few.

Internet TCP clients and servers

Use Internet-domain sockets when you want to do client-server communication between different machines.

Here's a sample TCP client using Internet-domain sockets:

#!/usr/bin/perl -w
require 5.002;
use strict;
use Socket;
my ($remote,$port, $iaddr, $paddr, $proto, $line);

$remote  = shift || 'localhost';
$port    = shift || 2345;  # random port
if ($port =~ /\D/) { $port = getservbyname($port, 'tcp') }
die "No port" unless $port;
$iaddr   = inet_aton($remote)              or die "no host: $remote";
$paddr   = sockaddr_in($port, $iaddr);

$proto   = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect(SOCK, $paddr)                      or die "connect: $!";
while ($line = <SOCK>) {
    print $line;
}

close (SOCK)                               or die "close: $!";
exit;

And here's a corresponding server to go along with it. The client didn't need to bind an address, but the server does. However, we'll specify the address as INADDR_ANY so that the kernel can choose the appropriate interface on multi-homed hosts. If you want to sit on a particular interface (like the external side of a gateway or firewall machine), you should fill this in with your real address instead.

#!/usr/bin/perl -Tw
require 5.002;
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;

sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

my $port = shift || 2345;
my $proto = getprotobyname('tcp');
socket(Server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
                                             or die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
listen(Server,SOMAXCONN)                     or die "listen: $!";

logmsg "server started on port $port";

my $paddr;

$SIG{CHLD} = \&REAPER;

for ( ; $paddr = accept(Client,Server); close Client) {
    my($port,$iaddr) = sockaddr_in($paddr);
    my $name = gethostbyaddr($iaddr,AF_INET);

    logmsg "connection from $name [", 
            inet_ntoa($iaddr), "] at port $port";

    print Client "Hello there, $name, it's now ", 
                    scalar localtime, "\n";
}

And here's a multi-threaded version. It's multi-threaded in the sense that, like most typical servers, it spawns (forks) a slave server to handle the client request so that the master server can quickly go back to service the next client.

#!/usr/bin/perl -Tw
require 5.002;
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;
use FileHandle;
sub spawn;  # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" } 
my $port = shift || 2345;
my $proto = getprotobyname('tcp');
socket(Server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
                                             or die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
listen(Server,SOMAXCONN)                     or die "listen: $!";
logmsg "server started on port $port";
my $waitedpid = 0;
my $paddr;

sub REAPER { 
    $waitedpid = wait;
    $SIG{CHLD} = \&REAPER;  # if you don't have sigaction(2)
    logmsg "reaped $waitedpid" . ($? ? " with exit $?" : "");
}
$SIG{CHLD} = \&REAPER;
for ( ; $paddr = accept(Client,Server); close Client) {
    my($port,$iaddr) = sockaddr_in($paddr);
    my $name = gethostbyaddr($iaddr,AF_INET);
    logmsg "connection from $name [", 
            inet_ntoa($iaddr), "] at port $port";
    spawn sub { 
        print "Hello there, $name, it's now ", scalar localtime, "\n";
        exec '/usr/games/fortune' 
            or confess "can't exec fortune: $!";
    };
} 
sub spawn {
    my $coderef = shift;
    unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') { 
        confess "usage: spawn CODEREF";
    }
    my $pid;
    if (!defined($pid = fork)) {
        logmsg "cannot fork: $!";
        return;
    } elsif ($pid) {
        logmsg "begat $pid";
        return; # i'm the parent
    }
    # else i'm the child -- go spawn
    open(STDIN,  "<&Client")    or die "can't dup client to stdin";
    open(STDOUT, ">&Client")    or die "can't dup client to stdout";
    STDOUT->autoflush();
    exit &$coderef();
}

As mentioned, this server takes the trouble to clone off a child version via fork for each incoming request. That way it can handle many requests at once, as long as you can create more processes. (You might want to limit this.) Even if you don't fork, the listen will allow up to SOMAXCONN (usually five or more) pending connections. Each connection uses up some resources, although not as much as a process. Forking servers also have to be particularly careful about cleaning up their dead children (called zombies in UNIX), because otherwise they'd quickly fill up your process table. The REAPER code above will take care of that for you.

If you're running on a system without restartable system calls (or if you want to be really careful in case you might someday run on such a system), you'll have to write a more elaborate for loop. That's because the act of collecting the zombie child process may cause the accept to fail and return the undefined value, making your loop fail prematurely. Here's a work-around:

for ( $waitedpid = 0; 
      ($paddr = accept(Client,Server)) || $waitedpid; 
      $waitedpid = 0, close Client) 
{
    next if $waitedpid and not $paddr;   # or check $! == EINTR
    # the rest is the same...

We suggest that you use the -T switch to enable taint checking (see "Cooperating with Strangers" and "Cooperating with Other Languages" later in this chapter) even if you aren't running setuid or setgid. This is always a good idea for servers and other programs (like CGI scripts) that run on behalf of someone else, because it lessens the chances that people from the outside will be able to compromise your system.

Let's look at another TCP client. This one connects to the TCP "time" service on a number of different machines and shows how far their clocks differ from the system on which the client is being run:

#!/usr/bin/perl -w
require 5.002;
use strict;
use Socket;

my $SECS_of_70_YEARS = 2208988800;
sub ctime { scalar localtime(shift) }

my $iaddr = gethostbyname('localhost'); 
my $proto = getprotobyname('tcp');   
my $port = getservbyname('time', 'tcp');  
my $paddr = sockaddr_in(0, $iaddr);
my($host);

$| = 1;
printf "%-24s %8s %s\n",  "localhost", 0, ctime(time());

foreach $host (@ARGV) {
    printf "%-24s ", $host;
    my $hisiaddr = inet_aton($host)     or die "unknown host";
    my $hispaddr = sockaddr_in($port, $hisiaddr);
    socket(SOCKET, PF_INET, SOCK_STREAM, $proto)
                                        or die "socket: $!";
    connect(SOCKET, $hispaddr)          or die "bind: $!";
    my $rtime = '    ';
    read(SOCKET, $rtime, 4);
    close(SOCKET);
    my $histime = unpack("N", $rtime) - $SECS_of_70_YEARS ;
    printf "%8d %s\n", $histime - time, ctime($histime);
}

UNIX-domain clients and servers

That's all fine for Internet-domain clients and servers, but what about local communications? While you can just pretend that your local machine is remote, sometimes you don't want to. UNIX-domain sockets are local to the current host, and are often used internally to implement pipes. They tend to be a little more efficient than Internet-domain sockets. Unlike Internet-domain sockets, UNIX domain sockets can show up in the file system with an ls (1) listing.

$ ls -l /dev/log
srw-rw-rw-  1 root            0 Oct 31 07:23 /dev/log

You can test for these with Perl's -S file test:

unless ( -S '/dev/log' ) {
    die "something's wicked with the print system";
}

Here's a sample UNIX-domain client:

#!/usr/bin/perl -w
require 5.002;
use Socket;
use strict;
my ($rendezvous, $line);
$rendezvous = shift || '/tmp/catsock';
socket(SOCK, PF_UNIX, SOCK_STREAM, 0)     or die "socket: $!";
connect(SOCK, sockaddr_un($rendezvous))   or die "connect: $!";
while ($line = <SOCK>) {
    print $line;
} 
exit;

And here's a corresponding server.

#!/usr/bin/perl -Tw
require 5.002;
use strict;
use Socket;
use Carp;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
my $NAME = '/tmp/catsock';
my $uaddr = sockaddr_un($NAME);
my $proto = getprotobyname('tcp');
socket(Server,PF_UNIX,SOCK_STREAM,0)      or die "socket: $!";
unlink($NAME);
bind  (Server, $uaddr)                    or die "bind: $!";
listen(Server,SOMAXCONN)                  or die "listen: $!";
logmsg "server started on $NAME";
$SIG{CHLD} = \&REAPER;
for ( ; $paddr = accept(Client,Server); close Client) {
    logmsg "connection on $NAME";
    spawn sub { 
        print "Hello there, it's now ", scalar localtime, "\n";
        exec '/usr/games/fortune';
        die "can't exec fortune: $!";
    };
}

As you see, it's remarkably similar to the Internet-domain TCP server, so much so, in fact, that we've omitted several duplicate functions--spawn(), logmsg(), ctime(), and REAPER()--which are exactly the same as in the other server.

So why would you ever want to use a UNIX domain socket instead of a FIFO? Because a FIFO doesn't give you sessions. You can't tell one process's data from another's. With socket programming, you get a separate session for each client--that's why accept takes two arguments.

For example, let's say that you have a long-running database server daemon that you want folks from the World Wide Web to be able to access, but only if they go through a CGI interface. You'd have a small, simple CGI program that does whatever checks and logging you feel like, and then acts as a UNIX-domain client and proxies the request to your private server.

UDP: message passing

Another kind of client-server setup is one that uses not connections, but messages, or datagrams. UDP communications involve much lower overhead but also provide less reliability, since there are no promises that messages will arrive at all, let alone in order and unmangled. Still, UDP offers some advantages over TCP, including being able to broadcast or multicast to a whole bunch of destination hosts at once (usually on your local subnet). If you find yourself overly concerned about reliability and start building checks into your message system, then you probably should just use TCP to start with.

Here's a UDP program similar to the sample Internet TCP client given above. However, instead of checking one host at a time, the UDP version will check many of them asynchronously by simulating a multicast and then using select to do a timed-out wait for I/O. To do something similar with TCP, you'd have to use a different socket handle for each host.

#!/usr/bin/perl -w
use strict;
require 5.002;
use Socket;
use Sys::Hostname;

my ( $count, $hisiaddr, $hispaddr, $histime, 
     $host, $iaddr, $paddr, $port, $proto, 
     $rin, $rout, $rtime, $SECS_of_70_YEARS);

$SECS_of_70_YEARS      = 2208988800;

$iaddr = gethostbyname(hostname());
$proto = getprotobyname('udp');
$port = getservbyname('time', 'udp');
$paddr = sockaddr_in(0, $iaddr); # 0 means let kernel pick

socket(SOCKET, PF_INET, SOCK_DGRAM, $proto)   or die "socket: $!";
bind(SOCKET, $paddr)                          or die "bind: $!";

$| = 1;
printf "%-12s %8s %s\n",  "localhost", 0, scalar localtime time;
$count = 0;
for $host (@ARGV) {
    $count++;
    $hisiaddr = inet_aton($host)              or die "unknown host";
    $hispaddr = sockaddr_in($port, $hisiaddr);
    defined(send(SOCKET, 0, 0, $hispaddr))    or die "send $host: $!";
}

$rin = "";
vec($rin, fileno(SOCKET), 1) = 1;

# timeout after 10.0 seconds
while ($count && select($rout = $rin, undef, undef, 10.0)) {
    $rtime = "";
    ($hispaddr = recv(SOCKET, $rtime, 4, 0))  or die "recv: $!";
    ($port, $hisiaddr) = sockaddr_in($hispaddr);
    $host = gethostbyaddr($hisiaddr, AF_INET);
    $histime = unpack("N", $rtime) - $SECS_of_70_YEARS ;
    printf "%-12s ", $host;
    printf "%8d %s\n", $histime - time, scalar localtime($histime);
    $count--;
}


Previous Home Next
Cooperating with Command Interpreters Book Index Cooperating with Strangers

HTML: The Definitive Guide CGI Programming JavaScript: The Definitive Guide Programming Perl WebMaster in a Nutshell