2004-04-17 08:53:06

by Marc Giger

Subject: Linux on UltraSparcII E450

Hi All,

Last week I had the honor of installing Linux on an E450 with 2 CPUs. All
went fine at first. Long compile sessions were no problem for the
machine. Later we installed 16 additional SCSI disks and built
4 software RAID5 arrays of 4 disks each.
After some time during the sync processes, the machine stops responding.
Simply dead. The same thing happens after every boot while the sync
process is running.

My question now is: is it a hardware or a kernel problem? I know it isn't
a simple question to answer with the information given.
Is it possible that the 4 parallel sync processes are too much for the
SCSI (standard LSI) controllers?
I assume that the kernel RAID5 code is stable on sparc?!

Thank you

Regards

Marc


2004-04-17 10:07:19

by Willy Tarreau

Subject: Re: Linux on UltraSparcII E450

Hmmm, I believe you forgot to tell us which kernel version you used, and
how you configured it :-)

Willy


2004-04-17 12:16:08

by Marc Giger

Subject: Re: Linux on UltraSparcII E450

On Sat, 17 Apr 2004 12:06:30 +0200
Willy Tarreau wrote:

> Hmmm, I believe you forgot to tell which kernel version you used, and
> how you configured it :-)
>
> Willy

Oh f**k:-) Sorry for that.

It is 2.4.26.

Sorry, I can't attach the .config because I'm not near the machine...

RAID1 + RAID5 code built into the kernel.
SMP, but no preempt.
ext3 on all disks.
Most other code is configured as modules.

Hopefully I haven't forgotten anything this time.

Thank you!

Regards

Marc


2004-04-17 13:46:43

by Ben Collins

Subject: Re: Linux on UltraSparcII E450

On Sat, Apr 17, 2004 at 10:53:03AM +0200, Marc Giger wrote:
> Hi All,
>
> Last week I had the honor of installing Linux on an E450 with 2 CPUs. All
> went fine at first. Long compile sessions were no problem for the
> machine. Later we installed 16 additional SCSI disks and built
> 4 software RAID5 arrays of 4 disks each.
> After some time during the sync processes, the machine stops responding.
> Simply dead. The same thing happens after every boot while the sync
> process is running.
>
> My question now is: is it a hardware or a kernel problem? I know it isn't
> a simple question to answer with the information given.
> Is it possible that the 4 parallel sync processes are too much for the
> SCSI (standard LSI) controllers?
> I assume that the kernel RAID5 code is stable on sparc?!

Try enabling some debugging options, such as spinlock debugging, and see
if that spits out anything interesting.
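
As a rough illustration of that suggestion, the sketch below checks the
text of a kernel .config for a couple of debugging options. The option
names (CONFIG_DEBUG_SPINLOCK, CONFIG_MAGIC_SYSRQ) and the helper name
debug_option_status are assumptions made for this example; which options
actually exist depends on the kernel version and architecture, so check
the "Kernel hacking" menu of your own tree.

```python
import re

# Assumed option names, for illustration only; verify them against the
# "Kernel hacking" menu of the kernel tree you are building.
DEBUG_OPTIONS = ("CONFIG_DEBUG_SPINLOCK", "CONFIG_MAGIC_SYSRQ")


def debug_option_status(config_text, options=DEBUG_OPTIONS):
    """Return {option: bool} telling which options are set to 'y'.

    config_text is the content of a kernel .config file. Lines such as
    '# CONFIG_FOO is not set' are comments and count as disabled.
    """
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=y\s*$", line)
        if match:
            enabled.add(match.group(1))
    return {opt: opt in enabled for opt in options}
```

It could be called as debug_option_status(open(".config").read()) from
the top of the kernel source tree; any option reported as False would
need to be enabled and the kernel rebuilt before retrying the sync.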

--
Debian - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/