2007-09-11 12:42:55

by Andy Whitcroft

[permalink] [raw]
Subject: 2.6.23-rc6: hanging ext3 dbench tests

I have a couple of failed test runs against 2.6.23-rc6 where the
job timed out while running dbench over ext3. Both were on powerpc,
though on significantly different hardware setups. A failed
run like this implies that the machine was still responsive to
other processes but dbench was making no progress. There were
no console diagnostics during the failure.

beavis was lost during a plain ext3 dbench run, having just
successfully completed a full ext2 run. elm3b19 was lost during an
ext3 "data=writeback" dbench run, having already completed plain
ext2 and ext3 runs.

A quick poke at the dbench logs on the second machine shows this
for the working ext3 dbench run:

4 clients started
4 35288 814.49 MB/sec
0 62477 822.99 MB/sec
Throughput 822.954 MB/sec 4 procs

Whereas the hanging run shows the following continuing until the
machine is reset, which confirms that the machine as a whole was
still with us:

4 clients started
4 36479 824.92 MB/sec
1 46857 519.98 MB/sec
1 46857 346.65 MB/sec
1 46857 259.99 MB/sec
1 46857 207.99 MB/sec
1 46857 173.32 MB/sec
1 46857 148.56 MB/sec
1 46857 129.99 MB/sec
1 46857 115.55 MB/sec
1 46857 103.99 MB/sec
1 46857 94.54 MB/sec
1 46857 86.66 MB/sec
1 46857 80.00 MB/sec
[...]

The first machine is very similar:

4 clients started
4 18468 445.29 MB/sec
4 41945 469.36 MB/sec
1 46857 346.68 MB/sec
1 46857 260.00 MB/sec
1 46857 208.00 MB/sec
[...]

Not sure if there is any significance to the 46857, though it feels
like we may be at the end of the run when it fails.

I will try to reproduce this on one of the machines and see if I
can get any further information.

-apw


2007-09-11 16:55:46

by Jan Kara

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

> I have a couple of failed test runs against 2.6.23-rc6 where the
> job timed out while running dbench over ext3. Both were on powerpc,
> though on significantly different hardware setups. A failed
> run like this implies that the machine was still responsive to
> other processes but dbench was making no progress. There were
> no console diagnostics during the failure.
>
> beavis was lost during a plain ext3 dbench run, having just
> successfully completed a full ext2 run. elm3b19 was lost during an
> ext3 "data=writeback" dbench run, having already completed plain
> ext2 and ext3 runs.
OK, thanks for the report.

> A quick poke at the dbench logs on the second machine shows this
> for the working ext3 dbench run:
>
> 4 clients started
> 4 35288 814.49 MB/sec
> 0 62477 822.99 MB/sec
> Throughput 822.954 MB/sec 4 procs
>
> Whereas the hanging run shows the following continuing until the
> machine is reset, which confirms that the machine as a whole was
> still with us:
>
> 4 clients started
> 4 36479 824.92 MB/sec
> 1 46857 519.98 MB/sec
> 1 46857 346.65 MB/sec
> 1 46857 259.99 MB/sec
> 1 46857 207.99 MB/sec
> 1 46857 173.32 MB/sec
> 1 46857 148.56 MB/sec
> 1 46857 129.99 MB/sec
> 1 46857 115.55 MB/sec
> 1 46857 103.99 MB/sec
> 1 46857 94.54 MB/sec
> 1 46857 86.66 MB/sec
> 1 46857 80.00 MB/sec
> [...]
So the process doing IO is hung. Could you dump the stacks of all the
tasks (Alt-Sysrq-t) and send the output? We've probably deadlocked
somewhere...

> The first machine is very similar:
>
> 4 clients started
> 4 18468 445.29 MB/sec
> 4 41945 469.36 MB/sec
> 1 46857 346.68 MB/sec
> 1 46857 260.00 MB/sec
> 1 46857 208.00 MB/sec
> [...]
>
> Not sure if there is any significance to the 46857, though it feels
> like we may be at the end of the run when it fails.
>
> I will try to reproduce this on one of the machines and see if I
> can get any further information.

Honza

2007-09-11 16:59:53

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests



On Tue, 11 Sep 2007, Andy Whitcroft wrote:
>
> I have a couple of failed test runs against 2.6.23-rc6 where the
> job timed out while running dbench over ext3. Both were on powerpc,
> though on significantly different hardware setups. A failed
> run like this implies that the machine was still responsive to
> other processes but dbench was making no progress. There were
> no console diagnostics during the failure.

Since the machine seems to be otherwise alive, can you do a sysrq-W (which
is most easily done by just doing a

echo w > /proc/sysrq-trigger

so you don't actually need any console access or anything like that).

That should give you all the blocked process traces, and if it's a
deadlock on some semaphore or other, it should all stand out quite nicely.

In fact, things like the above are probably worth scripting for any
automated testing - if you auto-fail after some time, please make the
failure case do that sysrq-W by default.
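
As a rough, untested sketch of what such a hook could look like (the
wrapper itself, the 4-hour deadline and the helper names here are
invented for illustration and are not part of any existing harness),
a watchdog could run the test command and fire the sysrq-W dump itself
when the job overruns:

/*
 * Hypothetical watchdog sketch: run a test command and, if it has not
 * finished within a deadline, write 'w' to /proc/sysrq-trigger (needs
 * root) so the blocked-task traces land in the kernel log before the
 * job is killed.
 */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define TEST_TIMEOUT_SECS (4 * 60 * 60)	/* made-up per-job deadline */

static void sysrq_dump_blocked(void)
{
	FILE *f = fopen("/proc/sysrq-trigger", "w");

	if (!f) {
		perror("/proc/sysrq-trigger");
		return;
	}
	fputc('w', f);	/* 'w' == dump tasks in uninterruptible sleep */
	fclose(f);
}

int main(int argc, char **argv)
{
	unsigned int waited = 0;
	int status;
	pid_t pid;

	if (argc < 2) {
		fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
		return 1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		execvp(argv[1], argv + 1);
		perror("execvp");
		_exit(127);
	}

	/* Poll for completion; on timeout, grab the sysrq-W dump first. */
	while (waitpid(pid, &status, WNOHANG) == 0) {
		if (waited++ >= TEST_TIMEOUT_SECS) {
			fprintf(stderr, "job timed out, dumping blocked tasks\n");
			sysrq_dump_blocked();
			kill(pid, SIGKILL);
			waitpid(pid, NULL, 0);
			return 1;
		}
		sleep(1);
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}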

(The other sysrq things can be useful too - "T" shows the same as "W",
except for _all_ tasks, which is often so verbose that it hides the
problem, but is sometimes the right thing to do).

Linus

2007-09-11 17:31:33

by Andy Whitcroft

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

Annoyingly this seems to be intermittent, and I have not managed to get
a machine into this state again yet. Will keep trying.

-apw

2007-09-14 09:49:37

by Andy Whitcroft

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

On Tue, Sep 11, 2007 at 06:30:49PM +0100, Andy Whitcroft wrote:
> Annoyingly this seems to be intermittent, and I have not managed to get
> a machine into this state again yet. Will keep trying.

Ok, I have been completely unsuccessful in reproducing this, despite
having two distinct machines showing this behaviour. I have neither
been able to reproduce it on those machines with 2.6.23-rc6, nor has
any of the testing of the -git releases which followed thrown this
error. I have also run about 10 repeats of the jobs which failed, and
none of those has thrown the same error.

It is pretty clear from the dbench output that the problem is/was real
and that it is not some artifact of the test harness. I am at a loss
as to how to get this to trigger again.

I guess I will keep monitoring the ongoing tests for new instances. I
will also look at getting the sysrq-* stuff triggered automatically on
job timeout, as that seems like a sane plan in all cases.

Frustrated.

-apw

2007-09-19 18:16:20

by Andy Whitcroft

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

On Fri, Sep 14, 2007 at 10:49:05AM +0100, Andy Whitcroft wrote:
> On Tue, Sep 11, 2007 at 06:30:49PM +0100, Andy Whitcroft wrote:
> > Annoyingly this seems to be intermittent, and I have not managed to get
> > a machine into this state again yet. Will keep trying.
>
> Ok, I have been completely unsuccessful in reproducing this, despite
> having two distinct machines showing this behaviour. I have neither
> been able to reproduce it on those machines with 2.6.23-rc6, nor has
> any of the testing of the -git releases which followed thrown this
> error. I have also run about 10 repeats of the jobs which failed, and
> none of those has thrown the same error.
>
> It is pretty clear from the dbench output that the problem is/was real
> and that it is not some artifact of the test harness. I am at a loss
> as to how to get this to trigger again.
>
> I guess I will keep monitoring the ongoing tests for new instances. I
> will also look at getting the sysrq-* stuff triggered automatically on
> job timeout, as that seems like a sane plan in all cases.
>
> Frustrated.

I have since had a single occurrence of a hang, on 2.6.23-rc6-mm1. As the
base is different I cannot say for sure that it is the same problem. In
this new event we had a mkfs hung in a 'D' wait:

=======================
mkfs.ext2 D c10220f4 0 6233 6222
c344fc80 00000082 00000286 c10220f4 c344fc90 002ed099 c2963340 c2b9f640
c142bce0 c2b9f640 c344fc90 002ed099 c344fcfc c344fcc0 c1219563 c1109bf2
c344fcc4 c186e4d4 c186e4d4 002ed099 c1022612 c2b9f640 c186e000 c104000c
Call Trace:
[<c10220f4>] lock_timer_base+0x19/0x35
[<c1219563>] schedule_timeout+0x70/0x8d
[<c1109bf2>] prop_fraction_single+0x37/0x5d
[<c1022612>] process_timeout+0x0/0x5
[<c104000c>] task_dirty_limit+0x3a/0xb5
[<c12194da>] io_schedule_timeout+0x1e/0x28
[<c10454b4>] congestion_wait+0x62/0x7a
[<c102b021>] autoremove_wake_function+0x0/0x33
[<c10402af>] get_dirty_limits+0x16a/0x172
[<c102b021>] autoremove_wake_function+0x0/0x33
[<c104040b>] balance_dirty_pages+0x154/0x1be
[<c103bda3>] generic_perform_write+0x168/0x18a
[<c103be38>] generic_file_buffered_write+0x73/0x107
[<c103c346>] __generic_file_aio_write_nolock+0x47a/0x4a5
[<c11b0fef>] do_sock_write+0x92/0x99
[<c11b1048>] sock_aio_write+0x52/0x5e
[<c103c3b9>] generic_file_aio_write_nolock+0x48/0x9b
[<c105d2d6>] do_sync_write+0xbf/0xfc
[<c102b021>] autoremove_wake_function+0x0/0x33
[<c1010311>] do_page_fault+0x2cc/0x739
[<c105d3a0>] vfs_write+0x8d/0x108
[<c105d4c3>] sys_write+0x41/0x67
[<c100260a>] syscall_call+0x7/0xb
=======================

-apw

2007-09-24 17:59:50

by Badari Pulavarty

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

Hi Andy,

I managed to reproduce the dbench problem (not sure if it is the same
thing or not, but the symptoms are the same). My problem has nothing to
do with ext3; I can reproduce it on ext2 and jfs as well.

What's happening on my machine is this:

dbench forks off 4 children and sends them a signal to start the work.
3 out of the 4 children get the signal and do the work. One child
never gets the signal, so it waits forever in pause(), and the parent
then waits a long time before killing it.

BTW, I was trying to find out when this problem started showing up.
So far I have managed to track it to 2.6.23-rc4 (2.6.23-rc3 doesn't seem
to have this problem). I am going to bisect and find out which
patch caused this.

I am using dbench-2.0, which consistently reproduces the problem on
my x86-64 box. Did you find anything new with your setup?

Thanks,
Badari



2007-09-24 19:14:45

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

On Mon, 24 Sep 2007 11:01:58 -0700
Badari Pulavarty <[email protected]> wrote:

> Hi Andy,
>
> I managed to reproduce the dbench problem (not sure if it is the same
> thing or not, but the symptoms are the same). My problem has nothing to
> do with ext3; I can reproduce it on ext2 and jfs as well.
>
> What's happening on my machine is this:
>
> dbench forks off 4 children and sends them a signal to start the work.
> 3 out of the 4 children get the signal and do the work. One child
> never gets the signal, so it waits forever in pause(), and the parent
> then waits a long time before killing it.
>
> BTW, I was trying to find out when this problem started showing up.
> So far I have managed to track it to 2.6.23-rc4 (2.6.23-rc3 doesn't seem
> to have this problem). I am going to bisect and find out which
> patch caused this.
>
> I am using dbench-2.0, which consistently reproduces the problem on
> my x86-64 box. Did you find anything new with your setup?
>

Thanks, Badari. That sounds like a bit of a showstopper for 2.6.23.
Michal, do you have this regression in the dirt file?

2007-09-24 20:06:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests



On Mon, 24 Sep 2007, Badari Pulavarty wrote:
>
> What's happening on my machine is this:
>
> dbench forks off 4 children and sends them a signal to start the work.
> 3 out of the 4 children get the signal and do the work. One child
> never gets the signal, so it waits forever in pause(), and the parent
> then waits a long time before killing it.

Since this *seems* to have nothing to do with the filesystem, and since it
*seems* to have been introduced between -rc3 and -rc4, I did

gitk v2.6.23-rc3..v2.6.23-rc4 -- kernel/

to see what has changed. One of the commits was signal-related, and that
one doesn't look like it could possibly matter.

The rest were scheduler-related, which doesn't surprise me. In fact, even
before I looked, my reaction to your bug report was "That sounds like an
application race condition".

Applications shouldn't use "pause()" for waiting for a signal. It's a
fundamentally racy interface - the signal could have happened just
*before* calling pause. So it's almost always a bug to use pause(), and
any users should be fixed to use "sigsuspend()" instead, which can
atomically (and correctly) pause for a signal while the process has masked
it outside of the system call.
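
As a minimal, self-contained sketch of that pattern (my illustration
only, not the dbench code itself; the SIGCONT choice mirrors dbench's
start signal, while the pipe handshake and the handler are invented for
the example):

/*
 * Block-then-sigsuspend illustration.  Because SIGCONT is blocked
 * *before* the child announces it is ready, a start signal sent by the
 * parent is either left pending or consumed atomically inside
 * sigsuspend(); it can no longer fall into the gap that pause() leaves
 * open.
 */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t started;

static void start_handler(int sig)
{
	(void)sig;
	started = 1;
}

int main(void)
{
	struct sigaction sa;
	sigset_t blocked, old;
	int ready[2];	/* plays the role of dbench's children[i].status */
	char c;
	pid_t pid;

	sa.sa_handler = start_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGCONT, &sa, NULL);

	if (pipe(ready) < 0) {
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid == 0) {
		/* child: block the start signal before announcing readiness */
		sigemptyset(&blocked);
		sigaddset(&blocked, SIGCONT);
		sigprocmask(SIG_BLOCK, &blocked, &old);

		write(ready[1], "r", 1);	/* "I am ready" */
		sigsuspend(&old);		/* atomically unblock and wait */

		printf("child %d saw the start signal (started=%d)\n",
		       (int)getpid(), (int)started);
		_exit(0);
	}

	/* parent: wait for the readiness marker, then fire the start signal */
	read(ready[0], &c, 1);
	kill(pid, SIGCONT);
	waitpid(pid, NULL, 0);
	return 0;
}

The design point is the ordering: sigprocmask() returns the old mask
precisely so that sigsuspend(&old) can restore it and wait in a single
atomic step.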

Now, I took a look at the dbench sources, and I have to say that the race
looks *very* unlikely: there's quite a small window in which it does

children[i].status = getpid();
** race window here **
pause();

and it would require *just* the right timing so that the parent doesn't
end up doing the "sleep(1)" (which would make the window even less likely
to be hit), but there does seem to be a race condition there. And it
*could* be that you just happen to hit it on your hw setup.

So before you do anything else, does this patch (TOTALLY UNTESTED! DONE
ENTIRELY LOOKING AT THE SOURCE! IT MAY RAPE ALL YOUR PETS, AND CALL YOU
BAD NAMES!) make any difference?

(patch against unmodified dbench-2.0)

Linus

---
diff --git a/dbench.c b/dbench.c
index ccf5624..4be5712 100644
--- a/dbench.c
+++ b/dbench.c
@@ -91,10 +91,15 @@ static double create_procs(int nprocs, void (*fn)(struct child_struct * ))

 	for (i=0;i<nprocs;i++) {
 		if (fork() == 0) {
+			sigset_t old, blocked;
+
+			sigemptyset(&blocked);
+			sigaddset(&blocked, SIGCONT);
+			sigprocmask(SIG_BLOCK, &blocked, &old);
 			setbuffer(stdout, NULL, 0);
 			nb_setup(&children[i]);
 			children[i].status = getpid();
-			pause();
+			sigsuspend(&old);
 			fn(&children[i]);
 			_exit(0);
 		}

2007-09-24 21:18:23

by Badari Pulavarty

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

On Mon, 2007-09-24 at 13:04 -0700, Linus Torvalds wrote:
>
> On Mon, 24 Sep 2007, Badari Pulavarty wrote:
> >
> > What's happening on my machine is this:
> >
> > dbench forks off 4 children and sends them a signal to start the work.
> > 3 out of the 4 children get the signal and do the work. One child
> > never gets the signal, so it waits forever in pause(), and the parent
> > then waits a long time before killing it.
>
> Since this *seems* to have nothing to do with the filesystem, and since it
> *seems* to have been introduced between -rc3 and -rc4, I did
>
> gitk v2.6.23-rc3..v2.6.23-rc4 -- kernel/

I was wrong. I managed to reproduce it on 2.6.23-rc3, but it took a long
time. However, I never reproduced it on 2.6.22, even after running the
test for a day.

>
> to see what has changed. One of the commits was signal-related, and that
> one doesn't look like it could possibly matter.
>
> The rest were scheduler-related, which doesn't surprise me. In fact, even
> before I looked, my reaction to your bug report was "That sounds like an
> application race condition".
>
> Applications shouldn't use "pause()" for waiting for a signal. It's a
> fundamentally racy interface - the signal could have happened just
> *before* calling pause. So it's almost always a bug to use pause(), and
> any users should be fixed to use "sigsuspend()" instead, which can
> atomically (and correctly) pause for a signal while the process has masked
> it outside of the system call.
>
> Now, I took a look at the dbench sources, and I have to say that the race
> looks *very* unlikely: there's quite a small window in which it does
>
> children[i].status = getpid();
> ** race window here **
> pause();
>
> and it would require *just* the right timing so that the parent doesn't
> end up doing the "sleep(1)" (which would make the window even less likely
> to be hit), but there does seem to be a race condition there. And it
> *could* be that you just happen to hit it on your hw setup.
>
> So before you do anything else, does this patch (TOTALLY UNTESTED! DONE
> ENTIRELY LOOKING AT THE SOURCE! IT MAY RAPE ALL YOUR PETS, AND CALL YOU
> BAD NAMES!) make any difference?
>
> (patch against unmodified dbench-2.0)

I am testing the updated version of dbench now. Normally it takes
30 minutes to an hour to reproduce the problem (when I run "dbench 4"
in an endless loop). I will post the results soon.

Thanks,
Badari

2007-09-25 00:26:10

by Badari Pulavarty

[permalink] [raw]
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

On Mon, 2007-09-24 at 13:04 -0700, Linus Torvalds wrote:
>
> On Mon, 24 Sep 2007, Badari Pulavarty wrote:
> >
> > What's happening on my machine is this:
> >
> > dbench forks off 4 children and sends them a signal to start the work.
> > 3 out of the 4 children get the signal and do the work. One child
> > never gets the signal, so it waits forever in pause(), and the parent
> > then waits a long time before killing it.
>
> Since this *seems* to have nothing to do with the filesystem, and since it
> *seems* to have been introduced between -rc3 and -rc4, I did
>
> gitk v2.6.23-rc3..v2.6.23-rc4 -- kernel/
>
> to see what has changed. One of the commits was signal-related, and that
> one doesn't look like it could possibly matter.
>
> The rest were scheduler-related, which doesn't surprise me. In fact, even
> before I looked, my reaction to your bug report was "That sounds like an
> application race condition".
>
> Applications shouldn't use "pause()" for waiting for a signal. It's a
> fundamentally racy interface - the signal could have happened just
> *before* calling pause. So it's almost always a bug to use pause(), and
> any users should be fixed to use "sigsuspend()" instead, which can
> atomically (and correctly) pause for a signal while the process has masked
> it outside of the system call.
>
> Now, I took a look at the dbench sources, and I have to say that the race
> looks *very* unlikely: there's quite a small window in which it does
>
> children[i].status = getpid();
> ** race window here **
> pause();
>
> and it would require *just* the right timing so that the parent doesn't
> end up doing the "sleep(1)" (which would make the window even less likely
> to be hit), but there does seem to be a race condition there. And it
> *could* be that you just happen to hit it on your hw setup.
>
> So before you do anything else, does this patch (TOTALLY UNTESTED! DONE
> ENTIRELY LOOKING AT THE SOURCE! IT MAY RAPE ALL YOUR PETS, AND CALL YOU
> BAD NAMES!) make any difference?
>
> (patch against unmodified dbench-2.0)
>
> Linus
>
> ---
> diff --git a/dbench.c b/dbench.c
> index ccf5624..4be5712 100644
> --- a/dbench.c
> +++ b/dbench.c
> @@ -91,10 +91,15 @@ static double create_procs(int nprocs, void (*fn)(struct child_struct * ))
>
>  	for (i=0;i<nprocs;i++) {
>  		if (fork() == 0) {
> +			sigset_t old, blocked;
> +
> +			sigemptyset(&blocked);
> +			sigaddset(&blocked, SIGCONT);
> +			sigprocmask(SIG_BLOCK, &blocked, &old);
>  			setbuffer(stdout, NULL, 0);
>  			nb_setup(&children[i]);
>  			children[i].status = getpid();
> -			pause();
> +			sigsuspend(&old);
>  			fn(&children[i]);
>  			_exit(0);
>  		}

With the modified dbench I have not been able to reproduce the problem
so far. I will let it run through the night (just to be sure).

For now, we can treat it as a tool/App issue :)

Thanks,
Badari