2005-02-04 09:05:11

by Jerome Lacoste

Subject: Huge unreliability - does Linux have something to do with it?

[Sorry for the sensational title]

I have had this laptop for three years. It ran Linux (Debian unstable)
from the start and its hardware has been very unreliable: I changed
hard disks twice and the motherboard thrice. My DVD drive started
failing some days ago (this one is 'original', 3 years old). But I
don't mind as I am not under warranty anymore... This morning the
machine booted with fsck errors on my hard disk. I am not sure if I
did the right thing, but I told it to clear the inodes, and I ended up
losing some programs(*) (du, dircolors, etc.). The day is starting well,
isn't it? Sounds like I will have to switch disks again...

I halted the machine correctly last night. I never dropped the
box in 3 years. Am I just being unlucky? Or could the fact that I am
using Linux on the box affect the reliability in some way on that
particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
computers and never had a single problem with them.

How can the file system (ext3) be messed up the way it was this
morning after I stopped the machine correctly yesterday?
Could a hardware failure look like bad sectors to fsck?

Attached is the output of smartctl -a /dev/hda, in case it helps.

Jerome

(*) I accept tips on discovering and maybe recovering which files have
been taken out of my system...
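One way to answer the footnote on a Debian system, sketched here in Python as an illustration rather than anything proposed in the thread: dpkg records the files each installed package owns in `/var/lib/dpkg/info/<package>.list`, so any listed path that no longer exists is a candidate casualty of the fsck run. The function name `missing_package_files` and its `root` parameter are hypothetical, and diverted files or conffiles deleted on purpose will show up as false positives.

```python
import os


def missing_package_files(info_dir="/var/lib/dpkg/info", root="/"):
    """Return (package, path) pairs for files dpkg lists but that are gone.

    Assumes one absolute path per line in each <package>.list file.
    """
    missing = []
    for name in sorted(os.listdir(info_dir)):
        if not name.endswith(".list"):
            continue  # skip .md5sums, .postinst and other metadata files
        package = name[: -len(".list")]  # may include ":arch" on multiarch systems
        with open(os.path.join(info_dir, name), encoding="utf-8",
                  errors="replace") as listing:
            for line in listing:
                path = line.rstrip("\n")
                if not path.startswith("/") or path == "/.":
                    continue  # skip blank lines and the "/." root entry
                # Resolve relative to `root` so a mounted copy of the disk
                # can be checked from another machine.
                candidate = os.path.join(root, path.lstrip("/"))
                # lexists(): a dangling symlink still counts as present.
                if not os.path.lexists(candidate):
                    missing.append((package, path))
    return missing
```

Run it as root on the damaged machine and print each pair; the affected packages can then be reinstalled, e.g. `apt-get install --reinstall coreutils`, which owns both `du` and `dircolors`. The `debsums` package can additionally report installed files whose contents no longer match the packaged checksums.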


Attachments:
(No filename) (1.28 kB)
smartctl.log (10.75 kB)

2005-02-04 09:24:39

by Julien Banchet

Subject: Re: Huge unreliability - does Linux have something to do with it?

On Friday 4 February 2005 at 10:03 +0100, jerome lacoste wrote:
> [Sorry for the sensational title]
>
> I have had this laptop for three years. It ran Linux (Debian unstable)
> from the start and its hardware has been very unreliable: I changed
> hard disks twice and the motherboard thrice. My DVD drive started
> failing some days ago (this one is 'original', 3 years old). But I
> don't mind as I am not under warranty anymore... This morning the
> machine booted with fsck errors on my hard disk. I am not sure if I
> did the right thing, but I told it to clear the inodes, and I ended up
> losing some programs(*) (du, dircolors, etc.). The day is starting well,
> isn't it? Sounds like I will have to switch disks again...
>
> I halted the machine correctly last night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some way on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had a single problem with them.
>
> How can the file system (ext3) be messed up the way it was this
> morning after I stopped the machine correctly yesterday?
> Could a hardware failure look like bad sectors to fsck?
>
> Attached is the output of smartctl -a /dev/hda, in case it helps.
>
> Jerome
>
> (*) I accept tips on discovering and maybe recovering which files have
> been taken out of my system...

I honestly believe you're simply out of luck. Not that 3 years is
a lot for a laptop; it's simply the "shit happens" thing.

Even though the distro you run is tagged "unstable", I'd rather run a
battery of stress tools on your computer before posting here, 'cus
it's maybe a bit beyond the scope of LKML (I bet you tried more than one
kernel version in 3 years, and problems never remain for long; I
also hope you tried fresh installs).

I don't think that an Inspiron 8100 carries anything exotic, so ...
well... Go for a memtest86, then try disk stress tools (my memory went
blank right now, ask Google ;-) )


JB,
PS: Er, Jérome... BTS in industrial computing at L'Isle-sur-la-Sorgue?

--
Julien Banchet <[email protected]>

2005-02-04 10:32:07

by James Nelson

Subject: Re: Huge unreliability - does Linux have something to do with it?

jerome lacoste wrote:
> [Sorry for the sensational title]
>
> I have had this laptop for three years. It ran Linux (Debian unstable)
> from the start and its hardware has been very unreliable: I changed
> hard disks twice and the motherboard thrice. My DVD drive started
> failing some days ago (this one is 'original', 3 years old). But I
> don't mind as I am not under warranty anymore... This morning the
> machine booted with fsck errors on my hard disk. I am not sure if I
> did the right thing, but I told it to clear the inodes, and I ended up
> losing some programs(*) (du, dircolors, etc.). The day is starting well,
> isn't it? Sounds like I will have to switch disks again...
>
> I halted the machine correctly last night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some way on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had a single problem with them.
>
> How can the file system (ext3) be messed up the way it was this
> morning after I stopped the machine correctly yesterday?
> Could a hardware failure look like bad sectors to fsck?
>

It can. I had a drive crash on my server a couple of months ago, and ext3
errors showed up before the syslog filled up with IDE errors. The hard disk was
only 1.5 years old.

If the bad sectors happen to be where directory inodes are written, your directory
structure will be turned into Swiss cheese. That will *definitely* cause ext3
errors, and (on Red Hat systems, at least) dump you to a shell on reboot.

> Attached is the output of smartctl -a /dev/hda, in case it helps.
>
> Jerome
>
> (*) I accept tips on discovering and maybe recovering which files have
> been taken out of my system...
>

You might not have any luck. After fsck -f I thought I had saved the drive, so I
copied everything that was left onto another machine, and found that most of the
larger files had holes in them - MP3s had skips, JPEGs were completely corrupted,
etc.

That's what made me get a backup FireWire drive... :)

2005-02-04 10:45:35

by Bernd Eckenfels

Subject: Re: Huge unreliability - does Linux have something to do with it?

In article <[email protected]> you wrote:
> I halted the machine correctly last night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some way on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had a single problem with them.

There are a lot of possible problems with your particular hardware, like
interrupt handling, power control, DMA, ... Those are rare but possible.
Notebooks tend to require some special handling.

> Could a hardware failure look like bad sectors to fsck?

A failure of the bus or an earlier sporadic error can cause a defective fs, but
normally fsck gives you a read error, not a structure error.

Are you using hdparm? Is the system perhaps overheating or overclocked?

Greetings
Bernd

2005-02-04 11:27:24

by Andre Tomt

Subject: Re: Huge unreliability - does Linux have something to do with it?

jerome lacoste wrote:
> Attached is the output of smartctl -a /dev/hda, in case it helps.

Judging from the SMART output, this drive seems hosed. All of the
firmware-controlled extended off-line self-tests failed at LBA 92491576,
and it has a worrying number of reallocated sectors.

New laptop hard drives shouldn't be too hard to get hold of.
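Andre's reading of the attachment can be partly automated. The Python sketch below is illustrative only: it assumes the attribute-table and self-test-log layout printed by `smartctl -a` (attribute 5, `Reallocated_Sector_Ct`, with the raw value in the last column; failed self-tests ending with the LBA of the first error), which varies somewhat between smartmontools versions. The sample text is invented apart from the LBA quoted above, and the partition start of 63 sectors and the 4096-byte ext3 block size are assumptions, not values from Jerome's machine.

```python
SECTOR_SIZE = 512  # bytes per LBA (assumption: classic 512-byte sectors)


def reallocated_sectors(smart_text):
    """Raw value of attribute 5 (Reallocated_Sector_Ct), or None if absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "5" and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[-1])  # RAW_VALUE is the last column
    return None


def failed_selftest_lbas(smart_text):
    """LBA_of_first_error for each self-test log entry reporting a failure."""
    lbas = []
    for line in smart_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") and "failure" in stripped:
            last = stripped.split()[-1]
            if last.isdigit():  # "-" means no LBA was recorded
                lbas.append(int(last))
    return lbas


def lba_to_fs_block(lba, partition_start, block_size=4096):
    """Convert an absolute disk LBA into a block number inside its partition."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * SECTOR_SIZE // block_size


# Invented excerpt in the smartctl -a layout; only the LBA comes from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       23
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1234         92491576
# 2  Short offline       Completed without error       00%      1200         -
"""

print(reallocated_sectors(SAMPLE))       # 23
print(failed_selftest_lbas(SAMPLE))      # [92491576]
print(lba_to_fs_block(92491576, 63))     # 11561439 (partition start is hypothetical)
```

With the filesystem block number in hand, `debugfs -R "icheck 11561439" /dev/hdaN` names the inode that uses that block, and `debugfs -R "ncheck <inode>" /dev/hdaN` turns the inode into a path (`/dev/hdaN` stands for whichever partition actually contains the LBA).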

2005-02-04 11:28:17

by Jerome Lacoste

Subject: Re: Huge unreliability - does Linux have something to do with it?

>> Could a hardware failure look like bad sectors to fsck?
>
> A failure of the bus or an earlier sporadic error can cause a defective fs, but
> normally fsck gives you a read error, not a structure error.
>
> Are you using hdparm? Is the system perhaps overheating or overclocked?

No overclocking.
hdparm is used, but I cannot tell you exactly what the config is (the
machine has now been running memtest for 1.5 hours). I don't think I use
any special options: probably the defaults in my config file (mult_sect 16,
dma on, write_cache off).

Overheating: perhaps. The machine is hot and runs many hours per
day (usually 12-16). The fans run very often, but it's always
been like that. I've tried to control the fan, but then the
temperature rises very quickly. So I let the fans run.
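Since overheating is a suspect, logging temperatures over a long session would at least show whether hot spells line up with the disk errors. This is a hedged Python sketch, not something from the thread: it reads the `/sys/class/thermal/thermal_zone*/temp` files that later kernels expose (values in millidegrees Celsius). Kernels of the 2.6.10 era exposed ACPI zones under `/proc/acpi/thermal_zone/` instead, so the path and parsing would need adapting there. The function names are hypothetical.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone exists but cannot be read or parsed
        readings[zone] = millidegrees / 1000.0
    return readings


def log_temperatures(interval_s=60, samples=10, base="/sys/class/thermal"):
    """Print one timestamped line per zone, `samples` times, `interval_s` apart."""
    for i in range(samples):
        if i:
            time.sleep(interval_s)  # wait only between samples
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for zone, celsius in read_thermal_zones(base).items():
            print(f"{stamp} {zone} {celsius:.1f} C", flush=True)
```

The drive's own temperature is usually also reported by SMART as attribute 194 (`Temperature_Celsius`) in `smartctl -a` output, which is worth logging alongside the system zones.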

2005-02-04 11:55:40

by DervishD

Subject: Re: Huge unreliability - does Linux have something to do with it?

Hi Jerome :)

* jerome lacoste <[email protected]> wrote:
> [Sorry for the sensational title]

It caught my attention ;)))

> I halted the machine correctly last night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some way on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had a single problem with them.

Well, Linux may stress the hardware more than other operating
systems because it tries to optimize usage and performance. But in
this particular case I think you are just very unlucky O:) I've seen
that before, unfortunately.

> Could a hardware failure look like bad sectors to fsck?

Yes, depending on the hardware failure.

> (*) I accept tips on discovering and maybe recovering which files have
> been taken out of my system...

You should use 'integrit' (http://integrit.sourceforge.net). I
use it to know whether a file whose contents shouldn't change has
changed, but it has more uses. Also use memtest86 (there are two
versions out there) to check your RAM, just in case. Bad RAM can
cause 'apparent' hardware failures: a bad RAM chip can cause disk
errors (if you write to disk from *bad* RAM, you'll write *bad* data)
and other failures. Use 'integrit', and read the documentation for
details.
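The idea behind integrit, recording a checksum for every file once and comparing later, can be illustrated in a few lines. This Python sketch is not integrit and does not use its database format; `build_baseline`, `compare` and the JSON file are hypothetical choices, SHA-256 is an arbitrary hash, and the baseline should ideally live on other media so that a failing disk cannot corrupt it too. A file altered by bad RAM or a bad sector shows up as changed; one removed by fsck shows up as missing.

```python
import hashlib
import json
import os


def hash_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root):
    """Map each regular file under root (as a relative path) to its hash."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                baseline[os.path.relpath(full, root)] = hash_file(full)
    return baseline


def compare(old, new):
    """Return (missing, added, changed) as sorted lists of relative paths."""
    missing = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, added, changed


def save_baseline(baseline, path):
    with open(path, "w") as f:
        json.dump(baseline, f, indent=1, sort_keys=True)


def load_baseline(path):
    with open(path) as f:
        return json.load(f)
```

Typical use would be `save_baseline(build_baseline("/usr"), "/media/floppy/usr.json")` on a healthy system, then later `compare(load_baseline(...), build_baseline("/usr"))` after a suspicious fsck run (the paths are only examples).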

Good luck, you'll need it with that laptop :(

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736
http://www.dervishd.net & http://www.pleyades.net/
It's my PC and I'll cry if I want to...

2005-02-04 12:09:29

by Wakko Warner

Subject: Re: Huge unreliability - does Linux have something to do with it?

Please keep me CCd

jerome lacoste wrote:
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other

I have this exact same laptop. It works perfectly for me with Linux.
Originally started with a 2.4 kernel and recently went to 2.6.10. The modem
works well, the video card works well even with 3D accel. I replaced the
original 30 GB HDD with a 40 GB one (for space reasons). My only complaint
about this thing is that they used an nVidia video chip. I have seen
more than 4 months of uptime on it (I used to use it as a desktop).

I did have a hardware mouse problem, but replacing the touchpad/palm rest
fixed that. I'd give it 4 stars (out of five, mainly because of the video
chipset and the keyboard layout).

--
Lab tests show that use of micro$oft causes cancer in lab animals

2005-02-04 14:45:05

by Dmitry Torokhov

Subject: Re: Huge unreliability - does Linux have something to do with it?

On Fri, 4 Feb 2005 07:18:17 -0500, Wakko Warner <[email protected]> wrote:
> Please keep me CCd
>
> jerome lacoste wrote:
> > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
>
> I have this exact same laptop. It works perfectly for me with Linux.
> Originally started with a 2.4 kernel and recently went to 2.6.10. The modem
> works well, the video card works well even with 3D accel. I replaced the
> original 30 GB HDD with a 40 GB one (for space reasons). My only complaint
> about this thing is that they used an nVidia video chip. I have seen
> more than 4 months of uptime on it (I used to use it as a desktop).
>
> I did have a hardware mouse problem, but replacing the touchpad/palm rest
> fixed that. I'd give it 4 stars (out of five, mainly because of the video
> chipset and the keyboard layout).
>

Hmm, I guess it's hit and miss. I have replaced:

1. Fan assembly (was making grinding sounds after 1.5 years)
2. DVD-CDRW combo (Samsung SN-308B could barely read anything,
but burns pretty well)
3. LCD (the backlight burned out, and I managed to tear the connectors on
the panel trying to see if I could replace the light and which part I
should order). Well, that helped persuade my better half that I really
need 1600x1200 ;)
4. Original Hitachi hard drive died a horrible death - I returned home
and heard it making grinding sounds and hitting its heads against
something.
5. Replacement IBM drive has developed a few bad sectors; I need to
arrange a replacement.

But I guess all of it has something to do with being on 24/7. I am not
complaining, I like the box, especially the touchpad ;)

--
Dmitry

2005-02-04 18:33:57

by Horst H. von Brand

Subject: Re: Huge unreliability - does Linux have something to do with it?

jerome lacoste <[email protected]> said:
> Bernd Eckenfels <[email protected]> said:
> >> Could a hardware failure look like bad sectors to fsck?

> > A failure of the bus or an earlier sporadic error can cause a defective fs, but
> > normally fsck gives you a read error, not a structure error.
> >
> > Are you using hdparm? Is the system perhaps overheating or overclocked?

> No overclocking.
> hdparm is used, but I cannot tell you exactly what the config is (the
> machine has now been running memtest for 1.5 hours). I don't think I use
> any special options: probably the defaults in my config file (mult_sect 16,
> dma on, write_cache off).

There are combinations of IDE controller + disk that slowly corrupt filesystems
with DMA on; if the default setting is DMA off, _don't touch it_. Not all bad
combinations are caught by the code in the kernel (Intel + some Western
Digital disk is what drove me up the wall until I disabled DMA).

What machine is this, what disk?

> Overheating: perhaps. The machine is hot and runs many hours per
> day (usually 12-16). The fans run very often, but it's always
> been like that. I've tried to control the fan, but then the
> temperature rises very quickly. So I let the fans run.

Wise decision.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2005-02-05 10:44:36

by Jerome Lacoste

Subject: Re: Huge unreliability - does Linux have something to do with it?


On Fri, 4 Feb 2005 07:18:17 -0500, Wakko Warner <[email protected]> wrote:
> Please keep me CCd
>
> jerome lacoste wrote:
> > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
>
> I have this exact same laptop. It works perfectly for me with Linux.
> Originally started with a 2.4 kernel and recently went to 2.6.10. The modem
> works well, the video card works well even with 3D accel. I replaced the
> original 30 GB HDD with a 40 GB one (for space reasons). My only complaint
> about this thing is that they used an nVidia video chip. I have seen
> more than 4 months of uptime on it (I used to use it as a desktop).

I sometimes use it as a desktop. Thing is, as I never took the time to
get software suspend working, I'd rather have it running all the time
than restart it every now and then.

While looking for a replacement disk, I noticed that some new disks
are "designed for continuous, 24/7 operation".

E.g. http://www6.tomshardware.com/storage/20030813/mini-harddisks-01.html

Not sure how good that is, but I will certainly look into it...

Thanks to all who answered. If you want to continue the discussion, it may be
better to take it off LKML now.

J

2005-02-05 12:28:43

by Willy Tarreau

Subject: Re: Huge unreliability - does Linux have something to do with it?

On Fri, Feb 04, 2005 at 07:18:17AM -0500, Wakko Warner wrote:
> Please keep me CCd
>
> jerome lacoste wrote:
> > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
>
> I have this exact same laptop. It works perfectly for me with Linux.
> Originally started with a 2.4 kernel and recently went to 2.6.10. The modem
> works well, the video card works well even with 3D accel. I replaced the
> original 30 GB HDD with a 40 GB one (for space reasons). My only complaint
> about this thing is that they used an nVidia video chip. I have seen
> more than 4 months of uptime on it (I used to use it as a desktop).

I think it does not like being moved. A friend of mine had his repaired
several times because of hard disk failures, a backlight failure and
the machine refusing to boot at all. I've never seen such unreliable hardware!

Willy

2005-02-05 15:02:37

by Wakko Warner

Subject: Re: Huge unreliability - does Linux have something to do with it?

Please keep me CCd

Willy Tarreau wrote:
> On Fri, Feb 04, 2005 at 07:18:17AM -0500, Wakko Warner wrote:
> > I have this exact same laptop. It works perfectly for me with Linux.
> > Originally started with a 2.4 kernel and recently went to 2.6.10. The modem
> > works well, the video card works well even with 3D accel. I replaced the
> > original 30 GB HDD with a 40 GB one (for space reasons). My only complaint
> > about this thing is that they used an nVidia video chip. I have seen
> > more than 4 months of uptime on it (I used to use it as a desktop).
>
> I think it does not like being moved. A friend of mine had his repaired
> several times because of hard disk failures, a backlight failure and
> the machine refusing to boot at all. I've never seen such unreliable hardware!

Mine didn't have that problem. At the time it was the fastest machine I
had. I moved on from it, though, to my nice Xeon box =)

I have never heard of a machine that quits working if you move it.
That's bad. I have heard of a machine that quit working because someone
looked at it the wrong way.
