cURL
Haxx ad
libcurl

curl's project page on SourceForge.net

Sponsors:
Haxx

cURL > Mailing List > Monthly Index > Single Mail

curl-tracker mailing list Archives

[ curl-Bugs-1403932 ] Race in test suite kills many processes

From: SourceForge.net <noreply_at_sourceforge.net>
Date: Fri, 13 Jan 2006 02:37:42 -0800

Bugs item #1403932, was opened at 2006-01-12 14:45
Message generated for change (Comment added) made by adsbenham
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1403932&group_id=976

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: compile or build problem
Group: crash
Status: Open
Resolution: None
Priority: 6
Submitted By: Andrew Benham (adsbenham)
Assigned to: Daniel Stenberg (bagder)
Summary: Race in test suite kills many processes

Initial Comment:
There's a nasty race condition in runtests.pl
For some reason, our Solaris 10 x86 build machine gets
hit by the race every time: instead of the test HTTP
server process being killed the code actually kills
the test HTTP server, runtests.pl, and our build system.

In the source for curl 7.15.1, the startnew() function
in runtests.pl reads:

    ....
    my $count=5;
    while($count--) {
    if(-f $pidfile) {
            open(PID, "<$pidfile");
            $pid2 = 0 + <PID>;
            close(PID);
            if(kill(0, $pid2)) {
                # make sure this pid is alive, as
otherwise it is just likely
                # to be the _previous_ pidfile or similar!
                last;
            }
        }
        sleep(1);
    }

    return ($child, $pid2);

The intention being to return the PID of the server
just started.
Later the stopserver() function does a 'kill (9, pid)'
to stop the server.

However there's a race in startnew(). What if $pidfile
exists but hasn't been written to yet ? server/sws
writes the pidfile, with
     fopen()
     fprintf()
     fclose()
If the scheduler runs between the fopen() and the
fprintf(), $pidfile exists but is an empty file.
The startnew() code reads from the empty file, adds 0
to undef (giving 0), and returns this as the pid.

Later, in stopserver():

sub stopserver {
    my ($pid) = @_;

    if($pid <= 0) {
        return; # this is not a good pid
    }

    if($pid =~ / /) {
        # if it contains space, it might be more than
one pid
        my @pids = split(" ", $pid);
        for (@pids) {
            kill (9, $_); # die!
        }
    }

    my $res = kill (9, $pid); # die!

    if($verbose) {
        logmsg "RUN: Test server pid $pid signalled to
die\n";
    }
}

Now, if $pid is "1234 0" for example, we'll do a valid
kill(9, 1234), and then a'kill (9,0)' - which (on
Unix systems) sends SIGKILL to all processes in the
current process group! It doesn't kill the server, it
kills the runtests.pl process, whatever process called
that, whatever called that, etc ......

This is bad.

The fix is to startnew(), lines marked '*':

    ....
    my $count=5;
    while($count--) {
        if(-f $pidfile) {
            open(PID, "<$pidfile");
* $pid2 = <PID>;
            close(PID);
* if($pid2 && kill(0, $pid2)) {
                # make sure this pid is alive, as
otherwise it is just likely
                # to be the _previous_ pidfile or similar!
                last;
            }
        }
        sleep(1);
    }
    ....

So we only accept the contents of $pidfile if it is
true (i.e. not blank, and not zero). This gets around
the race, returns the true pid of the server process,
and we don't end up killing the process group.

----------------------------------------------------------------------

>Comment By: Andrew Benham (adsbenham)
Date: 2006-01-13 10:37

Message:
Logged In: YES
user_id=1029472

Well, my patch also retries the reading in case it found no
pid in the existing file.
"if ($pid2 && kill(0, $pid2))"
won't call the 'kill' if $pid2 is false (i.e. is undef, the
empty string, or zero) because of the short-circuit
operation of '&&'. So the code will drop out of the
"if ($pidfile)" block, and hit the sleep(1) before retrying
the "while ($count--)" block.

Your patch means there have to be two "sleep(1)" statements,
one in the "while ($count--)" block; and the other inside
the "if(-f $pidfile) {" block - which, in my opinion, is
slightly clumsy.

Granted your patch makes it easy to attach a comment, so I
will offer (the underscores added because something in the
HTTP POST is stripping leading white space):

if($pid2 && kill(0, $pid2)) {
____# If $pid2 is valid, then
____# make sure this pid is alive, as otherwise it is just
likely
____# to be the _previous_ pidfile or similar!
____last;
}

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-01-12 22:30

Message:
Logged In: YES
user_id=1110

Woa. Very nice catch. Thanks for the explanation and fix. I
could even think of a (in my opinion) slightly nicer patch
that retries the reading in case it found no pid in the
existing file. Don't you agree?

diff -u -r1.198 runtests.pl
--- runtests.pl 8 Dec 2005 11:29:47 -0000 1.198
+++ runtests.pl 12 Jan 2006 22:28:55 -0000
@@ -249,6 +249,14 @@
             open(PID, "<$pidfile");
             $pid2 = 0 + <PID>;
             close(PID);
+
+ # if the file was created but there's no pid in
there (yet) we
+ # loop and hope for it to be there when we try
again!
+ if(!$pid2) {
+ sleep(1);
+ next;
+ }
+
             if(kill(0, $pid2)) {
                 # make sure this pid is alive, as otherwise
it is just likely
                 # to be the _previous_ pidfile or similar!

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1403932&group_id=976
Received on 2006-01-13

These mail archives are generated by hypermail.

donate! Page updated November 12, 2010.
web site info

File upload with ASP.NET