2000-12-28 01:35:38

by Mike A. Harris

Subject: 2.2.18 dies on my 486..

I just upgraded my 486 firewall's kernel to pure 2.2.18 from
2.2.17, with no other changes, and now it dies with all sorts
of hard disk failures.

I get:

hdb: lost interrupt

And stuff about DRQ lost...

Totally frozen box after that.



----------------------------------------------------------------------
Mike A. Harris - Linux advocate - Open source advocate
This message is copyright 2000, all rights reserved.
Views expressed are my own, not necessarily shared by my employer.
----------------------------------------------------------------------


If you're interested in computer security, and want to stay on top of the
latest security exploits, and other information, visit:

http://www.securityfocus.com


2000-12-28 01:44:29

by Andreas Dilger

Subject: Re: 2.2.18 dies on my 486..

Mike Harris writes:
> I just upgraded my 486 firewall's kernel to pure 2.2.18 from
> 2.2.17, with no other changes, and now it dies with all sorts
> of hard disk failures.
>
> I get:
>
> hdb: lost interrupt
>
> And stuff about DRQ lost...

Is it possible you compiled the kernel with gcc 2.95.2? I've been having
a similar problem, but I'm having trouble tracking it down. Because I
normally use a very heavily modified 2.2.18 kernel, I'm trying to isolate
just where the problem is - I have no problems with a stock 2.2.18 kernel.
If I compile with gcc 2.7.2.3 it works fine.
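
A quick way to rule the compiler in or out is to ask the running kernel
what it was built with. The sketch below assumes /proc/version carries a
"(gcc version ...)" field, which is the usual layout on 2.2 kernels;
kernel_compiler is a helper name made up for this example, not an
existing tool.

```shell
# Print the compiler version string recorded in a /proc/version style file.
# Assumes the text contains "(gcc version <string> ..."; prints nothing
# when that field is absent.
kernel_compiler() {
    sed -n 's/.*(gcc version \([^ ]*\).*/\1/p' "${1:-/proc/version}"
}

# Usage on the affected box (reads /proc/version by default):
#   kernel_compiler
```

A kernel built with egcs would be expected to report an egcs version
string here rather than 2.95.2.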

Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2000-12-28 02:45:44

by Mike A. Harris

Subject: Re: 2.2.18 dies on my 486..

On Wed, 27 Dec 2000, Andreas Dilger wrote:

>Mike Harris writes:
>> I just upgraded my 486 firewall's kernel to pure 2.2.18 from
>> 2.2.17, with no other changes, and now it dies with all sorts
>> of hard disk failures.
>>
>> I get:
>>
>> hdb: lost interrupt
>>
>> And stuff about DRQ lost...
>
>Is it possible you compiled the kernel with gcc 2.95.2? I've been having
>a similar problem, but I'm having trouble tracking it down.

Absolutely not possible. ;o) Compiled with kgcc on Red Hat 7
(egcs 2.91.66). I've been building kernels with egcs since Red
Hat 5.0 was released, no problems.

I've never used gcc 2.95.x, so I can't comment on it.

It seems my hard disk may be failing...


>Because I normally use a very heavily modified 2.2.18 kernel,
>I'm trying to isolate just where the problem is - I have no
>problems with a stock 2.2.18 kernel. If I compile with gcc
>2.7.2.3 it works fine.

Hmm.. must be a different problem than I'm having. I've tracked
my problem down to disk accesses to hdb. hda/hdc work fine, as
does the machine sitting idling doing its job. If I do a copy
from hdb to hdc it explodes. Very odd.. ;o(


----------------------------------------------------------------------
Mike A. Harris - Linux advocate - Free Software advocate
This message is copyright 2000, all rights reserved.
Views expressed are my own, not necessarily shared by my employer.
----------------------------------------------------------------------


[Quote: Linus Torvalds - Aug 27, 2000 - linux-kernel mailing list]
"And I'm right. I'm always right, but in this case I'm just a bit more
right than I usually am." -- Linus Torvalds

2000-12-28 03:04:49

by Alan

Subject: Re: 2.2.18 dies on my 486..

> I just upgraded my 486 firewall's kernel to pure 2.2.18 from
> 2.2.17, with no other changes, and now it dies with all sorts
> of hard disk failures.
>
> I get:
>
> hdb: lost interrupt
> And stuff about DRQ lost...

What hardware config, what hdparm tuning options ?

2000-12-29 00:39:52

by Mike A. Harris

Subject: Re: 2.2.18 dies on my 486..

On Thu, 28 Dec 2000, Alan Cox wrote:

>> I just upgraded my 486 firewall's kernel to pure 2.2.18 from
>> 2.2.17, with no other changes, and now it dies with all sorts
>> of hard disk failures.
>>
>> I get:
>>
>> hdb: lost interrupt
>> And stuff about DRQ lost...
>
>What hardware config, what hdparm tuning options ?

AMD 486-DX2/66, 12MB RAM, ALi 14xx chipset. Using 2.2.18 stock
and also 2.2.18+IDE.

hdparm settings:

pts/3 root@gw:~# hdparm -iv /dev/hd[abc]

/dev/hda:
multcount = 16 (on)
I/O support = 1 (32-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 929/16/48, sectors = 713472, start = 0

Model=DSAA-3360, FwRev=25505120, SerialNo=PABP2020102
Config={ SoftSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=929/16/48, TrkSize=59400, SectSize=550, ECCbytes=16
BuffType=3(DualPortCache), BuffSize=96kB, MaxMultSect=16, MultSect=16
DblWordIO=no, OldPIO=2, DMA=yes, OldDMA=2
CurCHS=929/16/48, CurSects=-486539254, LBA=yes, LBAsects=713472
tDMA={min:240,rec:240}, DMA modes: sword0 sword1 sword2 mword0 mword1
IORDY=yes, tPIO={min:240,w/IORDY:240}, PIO modes:


/dev/hdb:
multcount = 8 (on)
I/O support = 1 (32-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 827/32/63, sectors = 1667232, start = 0

Model=Maxtor 7850 AR, FwRev=UA7X6059, SerialNo=P60133LS
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>5Mbs FmtGapReq }
RawCHS=1654/16/63, TrkSize=0, SectSize=0, ECCbytes=11
BuffType=3(DualPortCache), BuffSize=64kB, MaxMultSect=8, MultSect=8
DblWordIO=yes, OldPIO=2, DMA=yes, OldDMA=1
CurCHS=1654/16/63, CurSects=1889533977, LBA=yes, LBAsects=1667232
tDMA={min:150,rec:150}, DMA modes: sword0 sword1 *sword2 *mword0
IORDY=on/off, tPIO={min:240,w/IORDY:180}, PIO modes: mode3

/dev/hdc:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 524/255/63, sectors = 8418816, start = 0

Model=QUANTUM FIREBALL SE4.3A, FwRev=API.0A00, SerialNo=334734916263
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=14848/9/63, TrkSize=32256, SectSize=512, ECCbytes=4
BuffType=3(DualPortCache), BuffSize=80kB, MaxMultSect=16, MultSect=off
DblWordIO=no, OldPIO=2, DMA=yes, OldDMA=2
CurCHS=14848/9/63, CurSects=1979711616, LBA=yes, LBAsects=8418816
tDMA={min:120,rec:120}, DMA modes: sword0 sword1 sword2 mword0 mword1 *mword2
IORDY=on/off, tPIO={min:120,w/IORDY:120}, PIO modes: mode3 mode4
UDMA modes: mode0 mode1 mode2
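
The settings that differ between the drives are easy to miss in a listing
this long: hda and hdb run with multcount and 32-bit I/O enabled, while
hdc is at its defaults. A small filter can reduce `hdparm -v` output to
those two fields. This is only a sketch and assumes the output layout
shown in this message (other hdparm releases may label the fields
differently); summarize_hdparm is a made-up name.

```shell
# Reduce "hdparm -v" style output (read from stdin) to one line per drive,
# showing only the multcount and I/O support values.
summarize_hdparm() {
    awk '
        # Device header, e.g. "/dev/hda:" -> remember the name without the colon
        /^\/dev\// { sub(/:$/, "", $1); dev = $1 }
        # "multcount = 16 (on)" -> third field is the value
        $1 == "multcount" { mult[dev] = $3 }
        # "I/O support = 1 (32-bit)" -> fourth field is the value
        $1 == "I/O" && $2 == "support" {
            printf "%s multcount=%s io_support=%s\n", dev, mult[dev], $4
        }
    '
}

# Usage (needs root and the real drives):
#   hdparm -v /dev/hd[abc] | summarize_hdparm
```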


No messages in syslog, but it died numerous times with "hdb: lost
interrupt" and a DRQ error of some kind. It seems to work fine if
I access any one drive, but if I copy from hdb -> hdc the machine
dies within seconds.

.config attached

I am thinking possible hardware failure, but I haven't spent time
yet trying to narrow it down.

No special lilo options or any tweaking going on on this machine
other than hdparm..



----------------------------------------------------------------------
Mike A. Harris - Linux advocate - Free Software advocate
This message is copyright 2000, all rights reserved.
Views expressed are my own, not necessarily shared by my employer.
----------------------------------------------------------------------

Are you an open source developer? Need web space? Your own project mailing
lists? Bug tracking software? CVS Repository? Build environments?
Head over to http://sourceforge.net for all of that, and more, for free!


Attachments:
486-2.2.18-1gw (9.57 kB)

2000-12-29 09:15:42

by Vid Strpic

Subject: Re: 2.2.18 dies on my 486..

On Thu, Dec 28, 2000 at 07:09:34PM -0500, Mike A. Harris wrote:
> On Thu, 28 Dec 2000, Alan Cox wrote:
> >What hardware config, what hdparm tuning options ?
> AMD 486-DX2/66, 12MB RAM, ALi 14xx chipset. Using 2.2.18 stock
> and also 2.2.18+IDE.
>
> hdparm settings:
>
> /dev/hdb:
> multcount = 8 (on)
> I/O support = 1 (32-bit)
> unmaskirq = 0 (off)
> using_dma = 0 (off)
> keepsettings = 0 (off)
> nowerr = 0 (off)
> readonly = 0 (off)
> readahead = 8 (on)
> geometry = 827/32/63, sectors = 1667232, start = 0
>
> Model=Maxtor 7850 AR, FwRev=UA7X6059, SerialNo=P60133LS
> Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>5Mbs FmtGapReq }
> RawCHS=1654/16/63, TrkSize=0, SectSize=0, ECCbytes=11
> BuffType=3(DualPortCache), BuffSize=64kB, MaxMultSect=8, MultSect=8
> DblWordIO=yes, OldPIO=2, DMA=yes, OldDMA=1
> CurCHS=1654/16/63, CurSects=1889533977, LBA=yes, LBAsects=1667232
> tDMA={min:150,rec:150}, DMA modes: sword0 sword1 *sword2 *mword0
> IORDY=on/off, tPIO={min:240,w/IORDY:180}, PIO modes: mode3
>
> I am thinking possible hardware failure, but I haven't spent time
> yet trying to narrow it down.

I think it's probably a hardware issue, yes. I've seen several Maxtors
doing just this kind of stuff before ... I have one @work which gives
these kinds of errors if I put the box sideways - if it stays upright, no
problems :)

> No special lilo options or any tweaking going on on this machine
> other than hdparm..

Well, have you tried setting 32-bit I/O support to '0'? Just for hdb
first; if that doesn't help, for hda and hdc as well.
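
For reference, 32-bit I/O support is the setting controlled by hdparm's
-c flag, so the suggestion above could be tried roughly as follows. This
is a sketch, not a tested fix: set_io16 and DRY_RUN are names invented
for this example, and by default the commands are only printed so that
nothing changes until you choose to run them for real.

```shell
# Turn off 32-bit I/O (hdparm -c0) for the listed drives.
# With DRY_RUN=1 (the default here) the commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}

set_io16() {
    for dev in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "hdparm -c0 $dev"
        else
            hdparm -c0 "$dev"    # needs root
        fi
    done
}

# Start with hdb only; add /dev/hda and /dev/hdc if that does not help.
set_io16 /dev/hdb
```

Setting DRY_RUN=0 before running it as root applies the change. The
setting does not survive a reboot unless it is reapplied at boot time.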

It happened to me on one machine (also a 486, with just one Quantum),
but only sometimes, inexplicably. On other occasions all is well.

--
)) Vid Strpic, IRC:Martin, [email protected], /bin/zsh.
(( (I don't speak for my employer, just for myself.)
C|~~| UNIX fundamentalist - and an average chauvinistic male.
`--'
C>N>K Never anger a dragon, for you are crunchy and good with ketchup.