2001-02-01 06:49:53

by Michal Jaegermann

Subject: 2.4.1 not fully sane on Alpha - file systems

I just tried to boot 2.4.1 kernel on Alpha UP1100. This machine
happens to have two SCSI disks on sym53c875 controller and two IDE
drives hooked to a builtin "Acer Laboratories Inc. [ALi] M5229 IDE".
It boots and at first even makes a pretty good impression
of being healthy. But an attempt to compile something makes the
whole setup start behaving weirdly, with the compiler obviously unable
to find either itself or the right sources, and the whole thing ends in
a silent lockup.

On the second boot I tried to copy kernel sources from a SCSI to an
IDE drive. This time I got something in my logs and the same stuff
was printed on my screen before everything locked up really tight
again (no sysrq). Here it is:

kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=198500353, limit=5779456
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=4294934529, limit=5779456
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=198500353, limit=5779456
kernel: EXT2-fs error (device sd(8,5)): ext2_readdir: bad entry in
directory #250255: directory entry across blocks - offset=0,
inode=198505472, rec_len=32768, name_len=255

(and the machine dies at this point).

There is nothing wrong with this device or the file system on it.
Copying the same way, or compiling the same sources, when booted
with 2.2.18 does not present a whiff of trouble, and e2fsck, luckily
enough, finds my file systems still in place. One should be grateful
for small favours.

Has anybody seen something similar?

Michal
[email protected]

p.s. I find a bit humorous the fact that the code required to
recognize that one has _some_ partition table (I happen to have two
kinds at the moment) is billed in a config file as ADVANCED.
It did the job anyway. :-)


2001-02-01 15:46:40

by John Jasen

Subject: Re: 2.4.1 not fully sane on Alpha - file systems

On Wed, 31 Jan 2001, Michal Jaegermann wrote:

> I just tried to boot 2.4.1 kernel on Alpha UP1100. This machine
> happens to have two SCSI disks on sym53c875 controller and two IDE
> drives hooked to a builtin "Acer Laboratories Inc. [ALi] M5229 IDE".

ALI M1535D pci-ide bridge, isn't it? That's what the specs on
API's webpage seem to indicate.

> It boots and at first even makes a pretty good impression
> of being healthy. But an attempt to compile something makes the
> whole setup start behaving weirdly, with the compiler obviously unable
> to find either itself or the right sources, and the whole thing ends in
> a silent lockup.

Try this for fun: dd if=/dev/hda of=/dev/null bs=4096, and see if it
cronks out.

In my case, any serious I/O on the IDE drives quickly results in pretty
technicolor on the VGA screen, and then a hard lockup.

Furthermore, after power-reset, 2.4.x, x=0 or 1, cannot successfully fsck
the drives. It hangs after about the 2nd-3rd partition, again in a hard
lockup.

I have to boot into 2.2.x to fsck the drives, make changes, and reboot to
hang the system.

My WAG is that there are problems in the ALI driver.

> On the second boot I tried to copy kernel sources from a SCSI to an
> IDE drive. This time I got something in my logs and the same stuff
> was printed on my screen before everything locked up really tight
> again (no sysrq). Here it is:
>
> kernel: attempt to access beyond end of device
> kernel: 08:05: rw=0, want=198500353, limit=5779456
> kernel: attempt to access beyond end of device
> kernel: 08:05: rw=0, want=4294934529, limit=5779456
> kernel: attempt to access beyond end of device
> kernel: 08:05: rw=0, want=198500353, limit=5779456
> kernel: EXT2-fs error (device sd(8,5)): ext2_readdir: bad entry in
> directory #250255: directory entry across blocks - offset=0,
> inode=198505472, rec_len=32768, name_len=255
>
> (and the machine dies at this point).

The AIC7xxx controller here just recently started spewing errors very
similar to this, amongst a host of others, as I was trying to get the
UP1100 to use a generic IDE interface rather than the ALI 15x3.


2001-02-01 16:24:18

by Michal Jaegermann

Subject: Re: 2.4.1 not fully sane on Alpha - file systems

On Thu, Feb 01, 2001 at 10:46:12AM -0500, John Jasen wrote:
> On Wed, 31 Jan 2001, Michal Jaegermann wrote:
>
> > I just tried to boot 2.4.1 kernel on Alpha UP1100. This machine
> > happens to have two SCSI disks on sym53c875 controller and two IDE
> > drives hooked to a builtin "Acer Laboratories Inc. [ALi] M5229 IDE".
>
> ALI M1535D pci-ide bridge, isn't it? That's what the specs on
> API's webpage seem to indicate.

'lspci' claims that this is:

"07.0 Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV]"

>
> Try this for fun: dd if=/dev/hda of=/dev/null bs=4096, and see if it
> cronks out.

Probably.

> In my case, any serious I/O on the IDE drives quickly results in pretty
> technicolor on the VGA screen, and then a hard lockup.

No, no technicolor or other sound effects. The whole thing just
locks up with a power switch as the only option.

> Furthermore, after power-reset, 2.4.x, x=0 or 1, cannot successfully fsck
> the drives. It hangs after about the 2nd-3rd partition, again in a hard
> lockup.

My box is much healthier than that. Regardless of whether I booted into
a file system on a SCSI drive or on an IDE drive (I happen to have both
options, although I prefer IDE: I have something there which I can lose
without any real pain :-) I can still fsck the drives healthy after a
crash, but I did NOT risk fsck under 2.4.1. Things look way too screwy
for this.

>
> My WAG is that there are problems in the ALI driver.

Possibly, but I crashed the whole thing without mounting anything from
the IDE drives at all. They are still there but unused. I simply managed
to get something in the logs for the case described. Note that the errors
I quoted are from device 08:05, i.e. a SCSI disk (/dev/sda5 to be
more precise). When my compiler went bonkers and started to read
what was clearly random stuff instead of sources, the whole action was
happening on a SCSI drive.

Michal
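
For readers decoding the log lines quoted above, here is a minimal sketch
of how "08:05" maps to /dev/sda5 and of what the odd want= values look like
as 32-bit numbers. The function names are illustrative only. The sketch
assumes the kernel prints the device as hexadecimal major:minor and that
SCSI disks on major 8 use 16 minors per disk. The second want= value equals
2**32 - 32767, which is what a small negative number looks like when printed
unsigned; that is an observation about the arithmetic, not a diagnosis.

```python
def decode_dev(dev):
    """Split a 'MAJ:MIN' pair (hex, as printed in the log) into integers."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    return major, minor


def scsi_disk_name(major, minor):
    """Map a minor on SCSI disk major 8 to an sdXN name (16 minors per disk)."""
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    # Partition 0 means the whole disk, so no number is appended.
    return name + (str(part) if part else "")


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


if __name__ == "__main__":
    major, minor = decode_dev("08:05")
    print(scsi_disk_name(major, minor))  # sda5
    limit = 5779456
    for want in (198500353, 4294934529):
        # Show each requested block, its signed 32-bit view, and whether
        # it lies beyond the reported limit.
        print(want, as_signed32(want), want > limit)
```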

2001-02-01 20:39:09

by John Jasen

Subject: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems


The system in question is an API UP1100 based system, running 4 Maxtor
40gb IDE drives off the ALI M15x3 chipset.

This applies to kernel 2.4.0 and 2.4.1.

The drives are identified as follows from hdparm:

Model=Maxtor 54098H8, FwRev=DAC10SC0, SerialNo=K80F1ZFC

It also has an Adaptec 29160 SCSI card, running a solid state disk and an
AIT tape library.

Upon placing any heavy I/O load on any of the disks (dd if=/dev/*d*
of=/dev/null) the screen flashes a few times, and then the system locks
hard -- no sysrq, no control-alt-del, no pings, no nothing.

It will also hang and lock hard on fscking corrupted filesystems under
2.4.0 and 2.4.1.

Interestingly enough, I tried 'dd if=/dev/zero of=/tmp/dd.img bs=4096
count=10000' and it also locked hard, after printing messages to the
effect of:

EXT2-fs error: (device info) allocating block in system zone -- block
(block numbers).

stock RH 2.2.16-3 works peachy.

I've tried various options with compiling in and out the ALI chipset, PCI
DMA, drive DMA, and IRQ sharing, but without all four of those enabled,
the system freezes at identifying the IDE device partitions, like so:

hda: lost interrupt
lost interrupt
lost interrupt

I've heard one other report of similar problems on the linux-kernel
mailing list, and at least one other on the axp-list.

--
--
-- John E. Jasen ([email protected])
-- In theory, theory and practise are the same. In practise, they aren't.


2001-02-01 22:19:23

by Andre Hedrick

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems


Sorry, but the ALI code was written based upon ix86 :-(
Where were you guys during 2.3.X development?

Andre Hedrick
Linux ATA Development

2001-02-02 07:40:03

by Michal Jaegermann

Subject: Re: 2.4.1 not fully sane on Alpha - file systems

To follow up on my own message about lockups on the UP1100: this time
I tried to boot 2.4.1-ac1. The results are much the same, but this time
an attempt to copy kernel sources from a partition on a SCSI drive
to another one on an IDE drive produced a different message. I include
it below.

When trying to reboot immediately with this kernel, the machine locks
up in the middle of fsck. Luckily 2.2.18 does not have problems with
that, or with other disk operations for that matter.

Here is what I collected in the logs this time before the machine went "poof".


kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753664
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753664
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753665, count = 1
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753683
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753686, count = 5
kernel: EXT2-fs error (device ide0(3,1)): ext2_add_entry: bad entry in directory #379108: inode out of bounds - offset=0, inode=4049125, rec_len=12, name_len=1
last message repeated 10 times
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753686
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753687
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753688, count = 7
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753688
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753689, count = 7
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753700
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753702, count = 6
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753702
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753705, count = 5
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753734
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753739, count = 3
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753739
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753746, count = 1
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753746
kernel: EXT2-fs error (device ide0(3,1)): ext2_free_blocks: Freeing blocks in system zones - Block = 753747, count = 7
kernel: EXT2-fs error (device ide0(3,1)): ext2_new_block: Allocating block in system zone - block = 753747

BTW - on the target disk there is no trace that anybody attempted to
copy anything.

Michal
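
To put the block numbers in the log above into context, here is a small
sketch that maps an ext2 block number to its block group. The block size of
this file system is not stated anywhere in the thread, so the defaults below
(4 KiB blocks, hence 32768 blocks per group and a first data block of 0) are
an assumption, and block_group is a hypothetical helper name. Under that
assumption the reported blocks 753664 to 753747 would fall at offsets 0 to 83
from the start of group 23, which is where a group's bitmaps and inode table
normally sit.

```python
def block_group(block, blocks_per_group=32768, first_data_block=0):
    """Return (group number, offset within group) for an ext2 block.

    Defaults assume 4 KiB blocks; for 1 KiB blocks the usual values
    would be blocks_per_group=8192 and first_data_block=1.
    """
    return divmod(block - first_data_block, blocks_per_group)


if __name__ == "__main__":
    # First and last block numbers that appear in the quoted log.
    for block in (753664, 753747):
        group, offset = block_group(block)
        print(f"block {block}: group {group}, offset {offset}")
```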

2001-02-02 16:30:55

by John Jasen

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems

On Thu, 1 Feb 2001, Andre Hedrick wrote:

> Sorry, but the ALI code was written based upon ix86 :-(
> Where were you guys during 2.3.X development?

We had lots of problems with the few 2.3.x kernels we downloaded; and R&D
effort was needed elsewhere.

Would it help if a UP1100 was somehow made available for
testing/development?

--
-- John E. Jasen ([email protected])
-- In theory, theory and practise are the same. In practise, they aren't.

2001-02-15 17:50:47

by John Jasen

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems


Well, the situation is improving, I suppose ...

Under kernel 2.4.0 and 2.4.1, a dd of about 10000 4k blocks would cause
the system to go technicolor and lock up.

Now, under 2.4.1-ac13, at about 11000 blocks, it goes technicolor, but
doesn't lock up until somewhere between 13000 and 20000.

*wry shrug*
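
For anyone wanting to pin down the block count at which a machine gives up,
here is a rough sketch of a sequential reader that behaves like
'dd if=... of=/dev/null bs=4096' but reports progress as it goes, so the last
line on the console shows roughly how far it got. The name read_blocks and
the reporting interval are made up for illustration; nothing like this was
used in the thread.

```python
import sys


def read_blocks(path, block_size=4096, report_every=1000, out=sys.stderr):
    """Read path sequentially in block_size chunks and discard the data.

    Returns the number of reads that returned data (a short final read
    counts as one block). Progress goes to `out`, flushed immediately,
    so it is visible even if the machine locks up right afterwards.
    """
    count = 0
    with open(path, "rb", buffering=0) as src:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            count += 1
            if count % report_every == 0:
                print(f"{count} blocks read", file=out, flush=True)
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    total = read_blocks(sys.argv[1])
    print(f"finished: {total} blocks")
```

Run it against a device node (for example /dev/hda, as root) or any large
file; reading is non-destructive, but on the affected kernels it may well
hang the box.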



2001-02-15 19:49:32

by Michal Jaegermann

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems

On Thu, Feb 15, 2001 at 12:49:29PM -0500, John Jasen wrote:
>
> Well, the situation is improving, I suppose ...
>
> Under kernel 2.4.0 and 2.4.1, a dd of about 10000 4k blocks would cause
> the system to go technicolor and lock up.

On the UP1100 which I have here this somehow looks a bit different _after_
I put the latest SRM on it and used this "magic incantation" from
Hyung Min SEO ('d -l 801fe0000ac d' at the SRM prompt to modify firmware).
I copied directory trees with some 150 MB of data in them from disk to
disk with no ill effects.

OTOH things are still wobbly. This shows up in that running
e2fsck on a dirty file system while booting 2.4.1 is likely
to come up with all kinds of errors in the file system requiring manual
intervention. If one breaks off this process and repeats the exercise
on the same file system, but this time booting 2.2.18, then things
check out without any incident. Once clean, the file systems can be
used with 2.4.1 again and no problems are reported.

I really do not see any kernel problems with 2.2 series kernels and IDE
patches.

> Now, under 2.4.1-ac13, at about 11000 blocks, it goes technicolor, but
> doesn't lock up until somewhere between 13000 and 20000.

I got various lockups but no "technicolor" on any occasion. Recently
I even got a picture with X and a G450 Matrox card, although one has to
be very careful not to look at it at a wrong angle or the power button
will be the only way out.

Michal

2001-02-15 20:15:39

by John Jasen

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems

On Thu, 15 Feb 2001, Michal Jaegermann wrote:

> > Well, the situation is improving, I suppose ...
> >
> > Under kernel 2.4.0 and 2.4.1, a dd of about 10000 4k blocks would cause
> > the system to go technicolor and lock up.
>
> On the UP1100 which I have here this somehow looks a bit different _after_
> I put the latest SRM on it and used this "magic incantation" from
> Hyung Min SEO ('d -l 801fe0000ac d' at the SRM prompt to modify firmware).
> I copied directory trees with some 150 MB of data in them from disk to
> disk with no ill effects.

I retried the mysticism and incantations (d -l 801fe0000ac d) at the srm
prompt, and the machine locked on fsck, under kernel 2.4.1-ac13.

I don't care about X on this system, all that much, honestly.

-- John

2001-02-15 20:48:32

by Michal Jaegermann

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems

On Thu, Feb 15, 2001 at 03:15:01PM -0500, John Jasen wrote:
>
> I retried the mysticism and incantations (d -l 801fe0000ac d) at the srm
> prompt, and the machine locked on fsck, under kernel 2.4.1-ac13.

Like I wrote - I did not get as far as lockups on fsck, but things were
weird, and if I had pushed long enough maybe I would have. I still had
some use for my file systems so I did not try hard enough. Maybe we need
black hens on top of the magic quoted above?

> I don't care about X on this system, all that much, honestly.

The "technicolor" thingy seems to be a side effect of your particular
graphics card, no?

Michal

2001-02-15 20:59:53

by John Jasen

Subject: Re: 2.4.x/alpha/ALI chipset/IDE problems summary Re: 2.4.1 not fully sane on Alpha - file systems

On Thu, 15 Feb 2001, Michal Jaegermann wrote:

> Like I wrote - I did not get as far as lockups on fsck, but things were
> weird, and if I had pushed long enough maybe I would have. I still had
> some use for my file systems so I did not try hard enough. Maybe we need
> black hens on top of the magic quoted above?

You bring the black hens, I've got the goats, red silk ribbon, and candles
...

> > I don't care about X on this system, all that much, honestly.
>
> The "technicolor" thingy seems to be a side effect of your particular
> graphics card, no?

I gotta think that something Very Bad (tm) is happening at kernel level,
like something getting overrun in the IDE subsystem, and overwriting into
other areas of memory.

*shrug*

-- John