2007-09-13 14:48:41

by Frantisek Rysanek

[permalink] [raw]
Subject: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

Dear everyone,

apologies in advance for a silly question...

I'm using a homebrew stripped-down mini-distro based on Fedora 5,
with various newer kernels, on a live CD, to test hardware with.
The live CD is composed by means of scripted binary copy of the key
necessary components (libc, init, bash, /dev/, /etc/, you know the
rest...) - it's almost like rolling your own MS-DOS boot floppy.
A minimum system is about 4-10 MB, a neat firewall takes up
about 22 MB.

Recently I've stubled over what seems like a lasting bug
in the Linux kernel. Excuse me for that accusation, which is
admittedly based on rather vague data, dated versions
of the user-space software (libc, bash...), and a homebrew
hackey distro.

First impression:
looped execution of Bonnie++2 makes bash go berserk.
There are two possible flawed behaviors:

1) the bash process that's waiting for Bonnie++2 to return,
starts looping inside the last waitpid() call I believe,
eating 100% CPU.
At least that's what 'top' + 'strace -p <bash PID>' would
suggest. The top and strace have to be running beforehand,
as the same happens to the bash process on any other virtual
console, if you try to run any further command. (The further
command doesn't seem to get executed anymore.)

2) the bash processes don't start eating 100% CPU, but any further
command that you try to execute returns immediately with
a segfault.

I boot the CD with just bare bash on all 6 virtual consoles.
I mount a previously created EXT3 FS (several hundred GB
to over 1 TB) on a mountpoint, `cd` into the mountpoint
on one or two consoles, and run

while true; do bonnie++2 -u root -s 4096; done

Then I run 'iostat 2', 'top' and 'strace -p <bash PID>' on the
remaining consoles. I try running some other command now and then, to
make the paging and block IO subsystems load some more blocks from
the CD.

I believe the `top` output suggests that the Bonnie processes don't
eat all that much RAM, but the kernel-space buffering eats almost all
of it. Only about 50 Megs remain truly "free", most of the RAM gets
"cached". The system stabilizes at this balance, and a few minutes
later it hangs in the aforementioned way.

This happens without a swap. If I mkswap+swapon some free hard drive,
the symptoms seem somewhat more difficult to reproduce, but do occur
after a somewhat longer period of time.

The symptoms are fairly easily reproduced on 2.6.16.18 through
2.6.16.48, as well as 2.6.18.8. On 2.6.22.6 it seems to take a bit
more time to reproduce the problem.

I've reproduced the problem on three different dual Xeon
boxes, all of them SuperMicro of different sizes/generations,
all of them upgraded to the latest BIOS (now showing no more
IRQ routing mischiefs).
The hardware setups are along the lines of
- Intel 7501 chipset, dual Xeon Northwood, 1 GB RAM,
Adaptec 79xx HBA, external RAID (~80 MBps),
internal Adaptec 2120 RAID (~50 MBps)
- Intel 7520 chipset, dual Xeon Irwindale, 2 GB RAM,
several internal U320 SCSI drives via Adaptec 79xx HBA,
an external RAID (~80 MBps) via LSI 20320 HBA (Fusion MPT)
- Intel 7520 chipset, dual Xeon Nocona, 1 GB RAM,
internal LSI MegaRAID SATA150-6 with 6 disk drives.

I've never seen this before I started using bonnie++2 as a load
generator :-) Both my hardware systems and my Linux CD are otherwise
perfectly stable, under sequential IO, cpuburn, older versions of
Bonnie on Linux 2.4 / FreeBSD etc.
I know what it looks like when there's a hardware problem and I know
how to prove/deny a hardware problem by selective A/B-style hardware
replacements, I'm fairly good at shielding away hardware unstability.

Should I start from compiling a fresh libc + bash + whatever else?
Any ideas are welcome :-)

Frank Rysanek


2007-09-13 16:12:45

by Nick Piggin

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

On Friday 14 September 2007 00:46, Frantisek Rysanek wrote:
> Dear everyone,
>
> apologies in advance for a silly question...
>
> I'm using a homebrew stripped-down mini-distro based on Fedora 5,
> with various newer kernels, on a live CD, to test hardware with.
> The live CD is composed by means of scripted binary copy of the key
> necessary components (libc, init, bash, /dev/, /etc/, you know the
> rest...) - it's almost like rolling your own MS-DOS boot floppy.
> A minimum system is about 4-10 MB, a neat firewall takes up
> about 22 MB.
>
> Recently I've stubled over what seems like a lasting bug
> in the Linux kernel. Excuse me for that accusation, which is
> admittedly based on rather vague data, dated versions
> of the user-space software (libc, bash...), and a homebrew
> hackey distro.
>
> First impression:
> looped execution of Bonnie++2 makes bash go berserk.
> There are two possible flawed behaviors:
>
> 1) the bash process that's waiting for Bonnie++2 to return,
> starts looping inside the last waitpid() call I believe,
> eating 100% CPU.
> At least that's what 'top' + 'strace -p <bash PID>' would
> suggest. The top and strace have to be running beforehand,
> as the same happens to the bash process on any other virtual
> console, if you try to run any further command. (The further
> command doesn't seem to get executed anymore.)
>
> 2) the bash processes don't start eating 100% CPU, but any further
> command that you try to execute returns immediately with
> a segfault.
>
> I boot the CD with just bare bash on all 6 virtual consoles.
> I mount a previously created EXT3 FS (several hundred GB
> to over 1 TB) on a mountpoint, `cd` into the mountpoint
> on one or two consoles, and run
>
> while true; do bonnie++2 -u root -s 4096; done
>
> Then I run 'iostat 2', 'top' and 'strace -p <bash PID>' on the
> remaining consoles. I try running some other command now and then, to
> make the paging and block IO subsystems load some more blocks from
> the CD.
>
> I believe the `top` output suggests that the Bonnie processes don't
> eat all that much RAM, but the kernel-space buffering eats almost all
> of it. Only about 50 Megs remain truly "free", most of the RAM gets
> "cached". The system stabilizes at this balance, and a few minutes
> later it hangs in the aforementioned way.
>
> This happens without a swap. If I mkswap+swapon some free hard drive,
> the symptoms seem somewhat more difficult to reproduce, but do occur
> after a somewhat longer period of time.
>
> The symptoms are fairly easily reproduced on 2.6.16.18 through
> 2.6.16.48, as well as 2.6.18.8. On 2.6.22.6 it seems to take a bit
> more time to reproduce the problem.
>
> I've reproduced the problem on three different dual Xeon
> boxes, all of them SuperMicro of different sizes/generations,
> all of them upgraded to the latest BIOS (now showing no more
> IRQ routing mischiefs).
> The hardware setups are along the lines of
> - Intel 7501 chipset, dual Xeon Northwood, 1 GB RAM,
> Adaptec 79xx HBA, external RAID (~80 MBps),
> internal Adaptec 2120 RAID (~50 MBps)
> - Intel 7520 chipset, dual Xeon Irwindale, 2 GB RAM,
> several internal U320 SCSI drives via Adaptec 79xx HBA,
> an external RAID (~80 MBps) via LSI 20320 HBA (Fusion MPT)
> - Intel 7520 chipset, dual Xeon Nocona, 1 GB RAM,
> internal LSI MegaRAID SATA150-6 with 6 disk drives.
>
> I've never seen this before I started using bonnie++2 as a load
> generator :-) Both my hardware systems and my Linux CD are otherwise
> perfectly stable, under sequential IO, cpuburn, older versions of
> Bonnie on Linux 2.4 / FreeBSD etc.
> I know what it looks like when there's a hardware problem and I know
> how to prove/deny a hardware problem by selective A/B-style hardware
> replacements, I'm fairly good at shielding away hardware unstability.
>
> Should I start from compiling a fresh libc + bash + whatever else?
> Any ideas are welcome :-)

Can you see if it is looping in userspace or kernel? Can you kill -9
the process?

Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
still has the same problem), then can you get the output of sysrq+T
and three sysrq+P calls, please? (this might help work out where in
kernel it is spinning).

2007-09-14 14:56:06

by Frantisek Rysanek

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

Dear Mr. Piggin,

thanks for your response in the first place :-)

On 13 Sep 2007 at 2:30, Nick Piggin wrote:
>
> Can you see if it is looping in userspace or kernel? Can you kill -9
> the process?
>
I can't run any command. Any command hangs or coredumps.

> Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
> still has the same problem), then can you get the output of sysrq+T
> and three sysrq+P calls, please? (this might help work out where in
> kernel it is spinning).
>
I've compiled 2.6.23-rc6, enabled serial console and captured
the output of sysrq+P (on the affected virtual VGA console)
and sysrq+T.

http://www.fccps.cz/download/adv/frr/bonnie/2.6.23-rc6.txt

The interesting bit of information, related to the erratic "bash"
processes, is always a single line, such as:

bash R running 0 2358 1

I've also taken a photo of `top` running
on another virtual console. I can't get any data out of the
affected box, as I can't run any shell commands...

http://www.fccps.cz/download/adv/frr/bonnie/top.jpg

Note that there are rather few processes running in the user space.
Can't say if that makes any difference from a full-blown distro.

Maybe I could set up the bootable CD for download somewhere
(gzipped ISO of maybe 50 Megs).

In this scenario, Linux 2.6.16.18 once reported a soft lockup.
http://www.fccps.cz/download/adv/frr/bonnie/soft-lockup1.txt
Never again.

I also managed to catch the misbehavior in strace once, didn't
get a capture, but essentially it was stuck at a single open
syscall, I believe it was "waitpid(1, " . (Never managed that again,
always got segfaults instead of the loopy bash when trying to watch
bash by strace -p).

Exactly where does the context switch from user to kernel take place?
I know that I can call ioctl() from user space, and I can write
ioctl() handlers in kernel space as part of device drivers (the
handlers take place entirely in kernel space). The waitpid()
thing is a syscall, being entered only once from user space
- and the bash process seems to keep looping inside it.
Does the single "running" line in Alt+SysRq+T mean that the
process is looping in user space?
Take a look at the CPU consumption % numbers though...

Note that there's no OOM killer. (Seen that one before, under
different circumstances - when OCFS2 didn't like machines
with less than 1 GB RAM.)

My impression is that the erratic behavior could be a secondary
symptom of a kernel-space memory leak taking place somewhere else
than in the loopy code itself. Can't say if the leak takes place in
memory management or EXT3 for instance...

Frank Rysanek

2007-09-19 09:06:20

by Frantisek Rysanek

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

I've been trying in vain to post a response to this thread...
Content is text/plain, encoding is ISO 8859-1.
Yet it doesn't land in the mailing list.
I've included about three URL's to captured data.
Is that a problem? Any further hints on what to avoid,
and how else to pass big logfiles / binary screenshots?

Frank Rysanek

2007-09-19 15:38:32

by Randy Dunlap

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

On Wed, 19 Sep 2007 10:50:53 +0200 Frantisek Rysanek wrote:

> I've been trying in vain to post a response to this thread...
> Content is text/plain, encoding is ISO 8859-1.
> Yet it doesn't land in the mailing list.
> I've included about three URL's to captured data.
> Is that a problem? Any further hints on what to avoid,
> and how else to pass big logfiles / binary screenshots?


Check the taboo expressions here:
http://vger.kernel.org/majordomo-info.html#taboo

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-09-19 16:28:51

by Frantisek Rysanek

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

Dear Mr. Piggin,

thanks for your response in the first place :-)

On 13 Sep 2007 at 2:30, Nick Piggin wrote:
>
> Can you see if it is looping in userspace or kernel? Can you kill -9
> the process?
>
This is interesting. I can't run any classic system command. Any
command hangs or coredumps. Any command except kill :-) Perhaps
"kill" is an internal bash command, so that it needn't fork+exec
(clone) to execute?

Anyway if I kill -9 the loopy bash process, the loopy console
respawns, I get several segfaults from udevd and dircolors (called
from .bashrc), and the new bash process on that console is no longer
loopy. But I continue to get segfaults from any commands that I try
to run...

> Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
> still has the same problem), then can you get the output of sysrq+T
> and three sysrq+P calls, please? (this might help work out where in
> kernel it is spinning).
>
I've compiled 2.6.23-rc6, enabled serial console and captured
the output of sysrq+P (on the affected virtual VGA console)
and sysrq+T.

http://www.fccps.cz/download/adv/frr/bonnie/2.6.23-rc6.txt

The interesting bit of information, related to the erratic "bash"
processes, is always a single line, such as:

bash R running 0 2358 1

I've also taken a photo of `top` running
on another virtual console. I can't get any data out of the
affected box, as I can't run any shell commands...

http://www.fccps.cz/download/adv/frr/bonnie/top.jpg

Note that there are rather few processes running in the user space.
Can't say if that makes any difference from a full-blown distro.

Maybe I could set up the bootable CD for download somewhere
(gzipped ISO of maybe 50 Megs).

In this scenario, Linux 2.6.16.18 once reported a soft lockup.
http://www.fccps.cz/download/adv/frr/bonnie/soft-lockup1.txt
Never again.

I also managed to catch the misbehavior in strace once, didn't
get a capture, but essentially it was stuck at a single open
syscall, I believe it was "waitpid(1, " . (Never managed that again,
always got segfaults instead of the loopy bash when trying to watch
bash by strace -p).

Exactly where does the context switch from user to kernel take place?
I know that I can call ioctl() from user space, and I can write
ioctl() handlers in kernel space as part of device drivers (the
handlers take place entirely in kernel space). The waitpid()
thing is a syscall, being entered only once from user space
- and the bash process seems to keep looping inside it.
Does the single "running" line in Alt+SysRq+T mean that the
process is looping in user space?
Take a look at the CPU consumption % numbers though...

Note that there's no OOM killer. (Seen that one before, under
different circumstances - when OCFS2 didn't like machines
with less than 1 GB RAM.)

My impression is that the erratic behavior could be a secondary
symptom of a kernel-space memory leak taking place somewhere else
than in the loopy code itself. Can't say if the leak takes place in
memory management or EXT3 for instance...

Or maybe my problem lives in pure user space after all?

Frank Rysanek

2007-09-19 17:35:51

by Frantisek Rysanek

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

On 19 Sep 2007 at 17:34, Randy Dunlap wrote:
> On Wed, 19 Sep 2007 10:50:53 +0200 Frantisek Rysanek wrote:
> > I've been trying in vain to post a response to this thread...
> Check the taboo expressions here:
> http://vger.kernel.org/majordomo-info.html#taboo
>
Thanks for your kind response :-)
I've checked the regexp's in the TABOO list visually and couldn't
find any obvious violation, certainly not on part of my URL's.
Maybe there's some other word in my text.

Is there an easy way for me to scan a sample message (as a local flat
file on my system) using that reference TABOO list, in an automated
way? Such as feed the TABOO list to a scripted instance of
sed/awk/perl? I'm only moderately familiar with perl-compatible
regexp syntax, some of the lines seem odd to me...
I'm reluctant to download Majordomo and look for some references in
its documentation / source code, to look for the actual place where
this TABOO list gets applied, to recover a stand-alone usage
example...

Any ideas are welcome :-)

Frank Rysanek

2007-09-19 22:54:01

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

On 09/14/2007 11:00 AM, Frantisek Rysanek wrote:
> Dear Mr. Piggin,
>
> thanks for your response in the first place :-)
>
> On 13 Sep 2007 at 2:30, Nick Piggin wrote:
>> Can you see if it is looping in userspace or kernel? Can you kill -9
>> the process?
>>
> I can't run any command. Any command hangs or coredumps.
>
>> Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
>> still has the same problem), then can you get the output of sysrq+T
>> and three sysrq+P calls, please? (this might help work out where in
>> kernel it is spinning).
>>
> I've compiled 2.6.23-rc6, enabled serial console and captured
> the output of sysrq+P (on the affected virtual VGA console)
> and sysrq+T.
>
> http://www.fccps.cz/download/adv/frr/bonnie/2.6.23-rc6.txt
>
> The interesting bit of information, related to the erratic "bash"
> processes, is always a single line, such as:
>
> bash R running 0 2358 1
>
> I've also taken a photo of `top` running
> on another virtual console. I can't get any data out of the
> affected box, as I can't run any shell commands...
>
> http://www.fccps.cz/download/adv/frr/bonnie/top.jpg
>

Looks like process 2382 is stuck with the kernel trying to send
SIGBUS to that bash process, here:

int
force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
unsigned long int flags;
int ret, blocked, ignored;
struct k_sigaction *action;

====> spin_lock_irqsave(&t->sighand->siglock, flags);

2007-09-20 07:26:45

by Frantisek Rysanek

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

On 20 Sep 2007 at 0:53, Chuck Ebbert wrote:
> Looks like process 2382 is stuck with the kernel trying to send
> SIGBUS to that bash process, here:
>
Wow, thanks for that hint, certainly I wouldn't be able to trace the
problem as far as this :-)

Does this mean that perhaps bash or glibc (my outdated versions)
inappropriately mask SIGBUS, and it's therefore a bug in my user
space ?

Frank Rysanek

2008-03-06 20:54:20

by Frantisek Rysanek

[permalink] [raw]
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

On 13 Sep 2007 at 16:46, Frantisek Rysanek wrote:
[...cut...]
> 1) the bash process that's waiting for Bonnie++2 to return,
> starts looping inside the last waitpid() call I believe,
> eating 100% CPU.
[...cut...]

Apologies for following up on this thread after half a year... :-)

Things haven't changed in 2.6.24.2.
Bonnie++2 still causes the same sort of havoc.
I have a further bit or two to report:

I've read in some LKML postings that the interaction between block
device buffering, FS operations and memory management (allocation)
are rather "stochastic" in some areas. That's what turned my
attention to the tweakable parameters of the VM / memory management.

I've tried
echo "2" >/proc/sys/vm/overcommit_memory
echo "0" >/proc/sys/vm/overcommit_ratio
echo "200000" >/proc/sys/vm/min_free_kbytes
(on a machine with 1 GB of RAM and very few user-space processes
running).
This had little or no effect on the misbehavior invoked by Bonnie++2.

I've been running Bonnie against EXT3 for my disk stress testing
purposes.
If I switch to ReiserFS 3, the problem goes away.

I really got the idea to try Reiser 3 by reading the Bonnie++2 docs,
which say that the last tests in the batch, consisting e.g. of
creating and deleting a massive number of directories, are the sort
of a benchmark where ReiserFS shows its beauty.
I've also noticed that the misbehavior often occurs right after the
point where Bonnie++2 reports "start'em" (or what's the precise
wording is).

Anyway I'll post more if/when I discover something.
Thanks for your attention, and for tolerating my vague postings in
this list :-)

Frank Rysanek