2004-01-20 18:36:51

by Manfred Spraul

Subject: Re: Fw: Re: Busy-wait delay in qmail 1.03 after upgrading to Linux 2.6

Haakon Riiser wrote:

>What Qmail did was basically to use a named pipe as a trigger,
>where one program select()s on the FIFO file descriptor, waiting
>for another program to write() the FIFO. Once select() returns,
>the listener close()s the FIFO (the data was not important,
>it was only used as a signal), does some work, then re-open()s
>the FIFO file, and ends up in the same select() waiting for the
>whole thing to happen again.
>
What drains the fifo?
As far as I can see the fifo is filled by the write syscalls, and
drained by chance if both the reader and the writer have closed their
handles.
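
For reference, a minimal sketch of the trigger pattern under discussion. This is not qmail's code: the helper names wait_for_trigger() and pull_trigger() and the timeout parameter are hypothetical, chosen only for illustration. The point it shows is that nobody ever read()s the byte, so it stays queued for as long as any handle on the FIFO remains open.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Listener side (hypothetical helper): open the FIFO, select() on it,
 * then close it without reading. Returns 1 if the descriptor became
 * readable, 0 on timeout, -1 on error. */
int wait_for_trigger(const char *path, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int fd, n;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);

    /* The data is only a signal, so it is never read. It is discarded
     * only once every reader and writer has closed the FIFO. */
    close(fd);
    return n < 0 ? -1 : (n > 0);
}

/* Writer side (hypothetical helper): write one byte and close.
 * Returns 0 on success, -1 on failure. open() fails with ENXIO
 * if no process has the FIFO open for reading. */
int pull_trigger(const char *path)
{
    ssize_t w;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    w = write(fd, "", 1);   /* a single NUL byte */
    close(fd);
    return w == 1 ? 0 : -1;
}
```

If the listener re-opens the FIFO while some other handle is still open, the old byte is still there and select() returns at once, which is the busy loop in question.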

> for (;;) {
>     while ((fd = open("test.fifo", O_WRONLY | O_NONBLOCK)) < 0)
>         ;
>     gettimeofday(&tv1, NULL);
>     if (write(fd, &fd, 1) == 1) {
>
xxx now a thread switch

>         gettimeofday(&tv2, NULL);
>         fprintf(stderr, "dt = %f ms\n",
>                 (tv2.tv_sec - tv1.tv_sec) * 1000.0 +
>                 (tv2.tv_usec - tv1.tv_usec) / 1000.0);
>     }
>     if (close(fd) < 0) {
>         perror("close");
>
>
If a thread switch happens at the indicated line, then the reader will
loop until its timeslice expires - one full timeslice delay between
the two gettimeofday() calls.

Running the reader with nice -20 resulted in delays of 200-1000 ms for
each write call, nice 20 resulted in no slow calls. In both cases 100%
cpu load.
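
To repeat this kind of test without the nice(1) command, the reader could set its own nice value at startup. A minimal sketch, assuming a helper named set_own_nice() (hypothetical, not part of the test program). On Linux the usable range is -20 to 19, and negative values normally require root.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: set the calling process's nice value and return
 * the value the kernel reports afterwards. Returns -100 on error, which
 * cannot be a valid nice value. */
int set_own_nice(int value)
{
    int result;

    /* Clamp to the range Linux accepts. */
    if (value < -20)
        value = -20;
    if (value > 19)
        value = 19;

    if (setpriority(PRIO_PROCESS, 0, value) < 0)
        return -100;    /* e.g. EACCES when lowering nice without privilege */

    /* getpriority() can legitimately return -1, so errno must be checked. */
    errno = 0;
    result = getpriority(PRIO_PROCESS, 0);
    if (result == -1 && errno != 0)
        return -100;

    assert(result >= -20 && result <= 19);
    return result;
}
```

Calling set_own_nice(19) in the reader before its select() loop would correspond roughly to the low-priority run above.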

--
Manfred


2004-01-20 19:23:20

by Haakon Riiser

Subject: Re: Busy-wait delay in qmail 1.03 after upgrading to Linux 2.6

[Manfred Spraul]

> What drains the fifo?
> As far as I can see the fifo is filled by the write syscalls, and
> drained by chance if both the reader and the writer have closed their
> handles.

That's correct, and that was my intention, since this is apparently
how it works in Qmail. Every time the listener's select() returns,
the FIFO is immediately close()d, and the first thing the writer
does after writing its single trigger byte is also to close() its
end of the FIFO.

>> for (;;) {
>>     while ((fd = open("test.fifo", O_WRONLY | O_NONBLOCK)) < 0)
>>         ;
>>     gettimeofday(&tv1, NULL);
>>     if (write(fd, &fd, 1) == 1) {
>>
> xxx now a thread switch
>
>>         gettimeofday(&tv2, NULL);
>>         fprintf(stderr, "dt = %f ms\n",
>>                 (tv2.tv_sec - tv1.tv_sec) * 1000.0 +
>>                 (tv2.tv_usec - tv1.tv_usec) / 1000.0);
>>     }
>>     if (close(fd) < 0) {
>>         perror("close");
>>
>>
> If a thread switch happens at the indicated line, then the reader will
> loop until its timeslice expires - one full timeslice delay between
> the two gettimeofday() calls.

Exactly. But on 2.6, the delay between the two gettimeofday()
calls is sometimes up to 300 ms, which is 300 timeslices in
2.6, right? I have never observed more than _one_ timeslice
delay in 2.4.
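
The dt figures here come from the timeval arithmetic in the quoted test loop. Pulled out as a standalone helper it might look like the sketch below; the name elapsed_ms() is hypothetical, the formula is the one from the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: milliseconds elapsed between two gettimeofday()
 * samples, using the same formula as the test loop. */
double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    /* The tv_usec difference may be negative; the tv_sec term makes up for it. */
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}
```

For example, samples of 1.500000 s and 2.250000 s give 1000.0 - 250.0 = 750.0 ms.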

> Running the reader with nice -20 resulted in delays of 200-1000 ms for
> each write call, nice 20 resulted in no slow calls. In both cases 100%
> cpu load.

But when the listener and the writer have the same nice value,
how is it possible to have a delay of 300 ms? Both the writer
and the listener are ready to run, so wouldn't a 300 ms delay
mean that the listener was given the CPU 300 times in a row?

--
Haakon

2004-01-20 19:46:13

by Mike Fedyk

Subject: Re: Busy-wait delay in qmail 1.03 after upgrading to Linux 2.6

On Tue, Jan 20, 2004 at 08:22:16PM +0100, Haakon Riiser wrote:
> [Manfred Spraul]
> > If a thread switch happens at the indicated line, then the reader will
> > loop until its timeslice expires - one full timeslice delay between
> > the two gettimeofday() calls.
>
> Exactly. But on 2.6, the delay between the two gettimeofday()
> calls is sometimes up to 300 ms, which is 300 timeslices in
> 2.6, right? I have never observed more than _one_ timeslice
> delay in 2.4.
>
> > Running the reader with nice -20 resulted in delays of 200-1000 ms for
> > each write call, nice 20 resulted in no slow calls. In both cases 100%
> > cpu load.
>
> But when the listener and the writer have the same nice value,
> how is it possible to have a delay of 300 ms? Both the writer
> and the listener are ready to run, so wouldn't a 300 ms delay
> mean that the listener was given the CPU 300 times in a row?

The scheduler can do this for you with its priority modification heuristics.

Try running a test with Nick's scheduler, and see how much your timings
change.

There is also a scheduling patch in -mm that's not in 2.6.1 and that
might affect you.