2001-04-28 21:22:00

by Ion Badulescu

Subject: 2.2.19 locks up on SMP

Hi Alan,

Over the last week I've tried to upgrade a 4-CPU Xeon box to 2.2.19, but
it keeps locking up whenever the disks are stressed a bit, e.g. when
updatedb is running. I get the following messages on the console:

wait_on_bh, CPU 1:
irq: 1 [1 0]
bh: 1 [1 0]
<[8010af71]>

over and over again, until somebody pushes the reset button. 8010af71 is
somewhere in the middle of synchronize_bh().

The hardware configuration is: 4 Xeon/500MHz, 1GB RAM, 3 SCSI disks
attached to a Symbios controller, 2 eepro100 interfaces. The kernel is
compiled with support for SMP and 2GB of RAM (hence the kernel addresses
starting with 8 instead of c). It was compiled from a pristine source
tree; no patches were applied.

I've had further problems with 2.2.19 on another SMP box, which was also
locking up under stress. I'm not sure if it had the same messages on the
console, since it's headless, but it was running the same 2.2.19 kernel as
the previous one and was locking up in a very similar fashion. The
hardware in that box is 2 P-III/750MHz, 512MB RAM, 1 IDE disk on a PIIX
controller, and an unused aic7xxx SCSI controller with no SCSI devices
attached to it.

Both boxes are rock-solid when running 2.2.18-SMP.

Any ideas? Has anybody else reported this with 2.2.19?

Thanks,
Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.


2001-04-28 22:26:14

by bert hubert

Subject: Re: 2.2.19 locks up on SMP

On Sat, Apr 28, 2001 at 02:21:29PM -0700, Ion Badulescu wrote:
> Hi Alan,
>
> Over the last week I've tried to upgrade a 4-CPU Xeon box to 2.2.19, but
> it keeps locking up whenever the disks are stressed a bit, e.g. when
> updatedb is running. I get the following messages on the console:
>
> wait_on_bh, CPU 1:
> irq: 1 [1 0]
> bh: 1 [1 0]
> <[8010af71]>

Obvious question is, which compiler.

--
http://www.PowerDNS.com Versatile DNS Services
Trilab The Technology People
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

2001-04-28 23:10:20

by idalton

Subject: Re: 2.2.19 locks up on SMP

On Sun, Apr 29, 2001 at 01:16:04AM +0200, bert hubert wrote:
> On Sat, Apr 28, 2001 at 02:21:29PM -0700, Ion Badulescu wrote:
> > Hi Alan,
> >
> > Over the last week I've tried to upgrade a 4-CPU Xeon box to 2.2.19, but
> > it keeps locking up whenever the disks are stressed a bit, e.g. when
> > updatedb is running. I get the following messages on the console:
> >
> > wait_on_bh, CPU 1:
> > irq: 1 [1 0]
> > bh: 1 [1 0]
> > <[8010af71]>
>
> Obvious question is, which compiler.

I hadn't seen any lockups, but (on a dual Pentium MMX 200) it started crawling
right after the NIC module (tulip) was loaded. System load decided to
skyrocket.

Yadda... 2.2.19 with devfs patch.
bicycle:~# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.3/specs
gcc version 2.95.3 20010315 (Debian release)

Might be the same problem.

-- Ferret

2001-04-29 00:00:09

by Tim Moore

Subject: Re: 2.2.19 locks up on SMP

> > Obvious question is, which compiler.
>
> I hadn't seen any locks, but (on a dual Pmmx 200) it started crawling
> right after the NIC module (tulip) was loaded. System load decided to
> skyrocket.
>
> Yadda... 2.2.19 with devfs patch.
> bicycle:~# gcc -v
> Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.3/specs
> gcc version 2.95.3 20010315 (Debian release)
>
> Might be the same problem.

Twin Abit BP6s, 2.2.19 + 9-Apr IDE patch, no problems.

egcs-2.91.66

tulip.c:v0.91g-ppc 7/16/99 [email protected]
eth0: Lite-On 82c168 PNIC rev 32 at 0xc800, 00:A0:CC:57:89:93, IRQ 16.

--
| 650.390.9613 | [email protected]

2001-04-30 09:22:21

by Ion Badulescu

Subject: Re: 2.2.19 locks up on SMP

On Sun, 29 Apr 2001 01:16:04 +0200, bert hubert <[email protected]> wrote:
> On Sat, Apr 28, 2001 at 02:21:29PM -0700, Ion Badulescu wrote:
>> Hi Alan,
>>
>> Over the last week I've tried to upgrade a 4-CPU Xeon box to 2.2.19, but
>> it keeps locking up whenever the disks are stressed a bit, e.g. when
>> updatedb is running. I get the following messages on the console:
>>
>> wait_on_bh, CPU 1:
>> irq: 1 [1 0]
>> bh: 1 [1 0]
>> <[8010af71]>
>
> Obvious question is, which compiler.

These are RH 6.2 systems; the compiler is egcs-1.1.2. So that's not it.

I'd be willing to do the binary search through the 2.2.19pre series,
but I'd rather avoid it if it's a known bug. It's pretty painful, both
for myself and for the real users of this box, to go through
10-20 cycles of reboot-crash-fsck_3_large_disks...
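For readers following along: a binary search over an ordered list of prepatches needs only about log2(N) verdicts for N candidates, rather than one per release. Below is a minimal sketch, assuming a hypothetical `is_stable(version)` predicate (boot that kernel and run the stress load), that the first entry is known good and the last known bad, and that breakage is monotonic. A flaky reproducer violates that last assumption, so each "stable" verdict may itself need several stress runs, which is where the cycles add up.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_stable: Callable[[str], bool]) -> str:
    """Return the first version for which is_stable() is False.

    Assumptions (hypothetical helper, not an existing tool):
      * versions are in release order,
      * versions[0] is known good and versions[-1] is known bad,
      * once a release is bad, every later release is bad too.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_stable(versions[mid]):
            lo = mid  # culprit is after mid
        else:
            hi = mid  # mid is bad; culprit is mid or earlier
    return versions[hi]
```

In practice the list would run from the last good kernel (2.2.18) through each 2.2.19pre to 2.2.19, and `is_stable` would be a person at the console rather than a function.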

Thanks,
Ion

2001-04-30 17:55:37

by Alan

Subject: Re: 2.2.19 locks up on SMP

> I had more problems with 2.2.19 and another SMP box, which was also
> locking up under stress. I'm not sure if it had the same messages on the
> console, since it's headless, but it was running the same 2.2.19 kernel as
> the previous one and was locking up in a very similar fashion. The
> hardware in that box is 2 P-III/750MHz, 512MB RAM, 1 IDE disk on a PIIX
> controller, and an unused aic7xxx SCSI controller with no SCSI devices
> attached to it.
>
> Both boxes are rock-solid when running 2.2.18-SMP.
>
> Any ideas? Has anybody else reported this with 2.2.19?

A couple. It looks like the VM changes may have upset something (based on
reports saying it began at that point). Can you see if the 2.2.19pre kernels
are stable?

2001-04-30 18:18:38

by Andrea Arcangeli

Subject: Re: 2.2.19 locks up on SMP

On Mon, Apr 30, 2001 at 06:55:54PM +0100, Alan Cox wrote:
> A couple. It looks like the VM changes may have upset something (based on
> reports saying it began at that point). Can you see if 2.2.19pre stuff is
> stable ?

I also have reports, but related to the network driver updates. So I
suggest trying again with 2.2.19 but with the drivers/net/* of 2.2.18.

Andrea

2001-04-30 18:29:38

by Andrea Arcangeli

Subject: Re: 2.2.19 locks up on SMP

On Mon, Apr 30, 2001 at 08:15:47PM +0200, Andrea Arcangeli wrote:
> suggest trying again with 2.2.19 but with the drivers/net/* of 2.2.18.

Even better, try vanilla 2.2.19aa2, and if it crashes too, try 2.2.19aa2
plus the drivers/net/* of 2.2.18.

Andrea

2001-04-30 19:54:03

by Alan

Subject: Re: 2.2.19 locks up on SMP

> On Mon, Apr 30, 2001 at 06:55:54PM +0100, Alan Cox wrote:
> > A couple. It looks like the VM changes may have upset something (based on
> > reports saying it began at that point). Can you see if 2.2.19pre stuff is
> > stable ?
>
> I also have reports, but related to the network driver updates. So I
> suggest trying again with 2.2.19 but with the drivers/net/* of 2.2.18.

That's probably a better starting point. It's easier to back out than the VM
changes, and it would also explain the reports I saw.

2001-04-30 19:59:55

by Ion Badulescu

Subject: Re: 2.2.19 locks up on SMP

On Mon, 30 Apr 2001, Alan Cox wrote:

> > I also have reports, but related to the network driver updates. So I
> > suggest trying again with 2.2.19 but with the drivers/net/* of 2.2.18.
>
> That's probably a better starting point. It's easier to back out than the VM
> changes, and it would also explain the reports I saw.

Except that the only driver I'm using is eepro100, and the only change to
that driver was the patch I submitted myself and which is also in 2.4.

Also, another data point: those two SMP boxes have been running 2.2.18 +
Andrea's VM-global patch since January, without a hitch.

Ok, so onto the binary search through the 2.2.19pre series...

Ion

2001-04-30 20:29:22

by Mohammad A. Haque

Subject: Re: 2.2.19 locks up on SMP

On Mon, 30 Apr 2001, Ion Badulescu wrote:

> Except that the only driver I'm using is eepro100, and the only change to
> that driver was the patch I submitted myself and which is also in 2.4.
>
> Also, another data point: those two SMP boxes have been running 2.2.18 +
> Andrea's VM-global patch since January, without a hitch.
>
> Ok, so onto the binary search through the 2.2.19pre series...

Just to give another data point...

2.2.19 + LVM patches - dual P3 550
1 GB RAM
eepro100
ncr53c8xx scsi
mylex accelRAID 1100 RAID controller

We've transferred around 1 GB of stuff over the network and about 200 GB
between two RAID arrays without problems in a little under 3 days.

We've only scratched into swap; free shows 128K in use.

--

=====================================================================
Mohammad A. Haque http://www.haque.net/
[email protected]

"Alcohol and calculus don't mix. Project Lead
Don't drink and derive." --Unknown http://wm.themes.org/
[email protected]
=====================================================================

2001-05-01 01:09:54

by Ion Badulescu

Subject: Re: 2.2.19 locks up on SMP

On Mon, 30 Apr 2001, Mohammad A. Haque wrote:

> Just to give another data point...
>
> 2.2.19 + LVM patches - dual P3 550
> 1 GB RAM
> eepro100
> ncr53c8xx scsi
> mylex accelRAID 1100 RAID controller
>
> We've transferred around 1 GB of stuff over the network and about 200 GB
> between two raids w/o problems in a little under 3 days.
>
> We've only scratched into swap; free shows 128K in use.

Ok. Have you tried running a large bonnie (1GB) while at the same time
pummeling the network? That's how I trigger it, quite reliably.

Ion


2001-07-03 10:10:33

by Scott Nursten

Subject: Re: 2.2.19 locks up on SMP

Hi there,

Was there ever any resolution to this thread? I'm running a bunch of Compaq DL-360s, which seem to work fine on the 2.2.19pre series. As soon as I go to 2.2.19, networking stops working. Machines are spec'd as follows:

2 x P3-933
1.4GB RAM
Compaq RLO card
Compaq Smart2 Array Controller
2 x EtherExpress Pro onboard
2 x EtherExpress Pro PCI (the dual port server adapter from Intel)

Also: whenever I run `ifconfig device down`, the machine locks up completely.

Willing to give any information necessary in exchange for working kernel :) Any takers? Tell me what you guys need.

Rgds,

--
Scott Nursten - Systems Administrator
Streets Online Ltd.

Direct: +44 (0) 1293 744 122
Business: +44 (0) 1293 402 040
Fax: +44 (0) 1293 402 050
Email: [email protected]

-----------------------------------------------------------------------
"Unix is user friendly. It's just selective when choosing friends."
-----------------------------------------------------------------------

2001-07-03 10:47:01

by Scott Nursten

Subject: Re: 2.2.19 locks up on SMP

Hey guys,

Just to confirm - I've compiled 2.2.19 w/out SMP and it works sweet.

Rgds,

Scott

Scott Nursten wrote:
>
> Hi there,
>
> Was there ever any resolution to this thread? I'm running a bunch of Compaq DL-360's which seem to work fine on the 2.2.19pre series. As soon as I go to 2.2.19, networking doesn't work. Machines are spec'd as follows:
>
> 2 x P3-933
> 1.4GB RAM
> Compaq RLO card
> Compaq Smart2 Array Controller
> 2 x EtherExpress Pro onboard
> 2 x EtherExpress Pro PCI (the dual port server adapter from Intel)
>
> Caveat: whenever I run `ifconfig device down` the machine locks up completely.
>
> Willing to give any information necessary in exchange for working kernel :) Any takers? Tell me what you guys need.
>
> Rgds,