Hello kernel people,
Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
tried to delete it, this started showing up in my log files:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
sector=1803472
end_request: I/O error, dev 03:06 (hda), sector 1803472
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data
of [612671 612672 0x0 SD]
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
sector=1803472
end_request: I/O error, dev 03:06 (hda), sector 1803472
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data
of [612671 612677 0x0 SD]
and rm just reported 'Permission denied'.
I've looked up these errors on the net, and as far as I can tell it means that
the drive has some bad sectors at the given addresses and that it will
probably die on me sooner or later.
Can someone either confirm this to me or tell me what to do to fix it?
The drive involved is an IBM-DTLA-307060, which has served me without problems
now for about 2 years.
Thanks!
DK
--
If all the Chinese simultaneously jumped into the Pacific off a 10 foot
platform erected 10 feet off their coast, it would cause a tidal wave
that would destroy everything in this country west of Nebraska.
> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.
Have a look at:
http://csl.cse.ucsc.edu/smart.shtml
there you will find software for interrogating and monitoring the S.M.A.R.T. data available from your drive. It's a little late to start monitoring it, if the drive is already dying, but if, for example, it shows a lot of re-allocated sectors, or spin retries, you'll know something is wrong.
John.
> The drive involved is an IBM-DTLA-307060, which has served me without problems
> now for about 2 years.
IBM DeathStar 75gxp.
One of the worst hard drives ever made. It's quite likely it's failed,
and in fact, two years is pretty impressive out of one of these.
Make backups immediately. Run IBM's DFT tool, get the code to RMA this
thing back to IBM. Sell the replacement they send you to a sucker on
eBay, and buy yourself a new drive. You can pick up 80 gig drives for
around 80 bucks nowadays. I used to recommend Maxtors, until they said
they're cutting their warranty to one year from three. I don't know what
to use anymore.
Mike
Same problem I was having with 2.4.20-pre4-ac2-preempt.
Alan didn't want to hear it from me because of the -preempt.
My system was an E7500 chipset, dual Xeon, WD 40G drive, ext2 or ext3.
From this we can glean: preempt is not a factor, HD manufacturer is not a factor,
FS is not a factor. I don't know what chipset you are using.
I was also getting BadCRC errors.
On Friday 06 September 2002 11:13, DevilKin wrote:
> Hello kernel people,
>
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
>
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
> tried to delete it, this started showing up in my log files:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612677 0x0 SD]
>
> and rm just reported me 'Permission denied'.
>
> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.
>
> Thanks!
>
> DK
--
/**************************************************
** Mark Salisbury || [email protected] **
** If you would like to sponsor me for the **
** Mass Getaway, a 150 mile bicycle ride to for **
** MS, contact me to donate by cash or check or **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736
On Fri, 2002-09-06 at 16:13, DevilKin wrote:
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
That certainly looks like a drive error.
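For anyone who wants to locate the failing block, here is a rough sketch of the arithmetic on the two sector numbers in that log line. It rests on assumptions the log itself does not state: that LBAsect is the absolute address reported by the drive, that sector is counted from the start of the partition named in the end_request line (03:06, i.e. hda6), and that sectors are 512 bytes. The function names are made up for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector; an assumption, not read from the drive


def partition_start(lba_sect: int, rel_sector: int) -> int:
    """Absolute LBA of the partition start, if rel_sector is partition-relative."""
    return lba_sect - rel_sector


def byte_offset(sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of a sector from the start of whatever it is counted from."""
    return sector * sector_size


if __name__ == "__main__":
    lba_sect, rel_sector = 7072862, 1803472  # values taken from the log above
    print("partition starts at LBA", partition_start(lba_sect, rel_sector))
    print("bad sector is at byte", byte_offset(lba_sect), "of the whole disk")
```

If the assumptions hold, the first number should match the start of hda6 as shown by fdisk, which is a cheap way to check them before trusting the rest.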
> The drive involved is an IBM-DTLA-307060, which has served me without problems
> now for about 2 years.
Get the IBM disk tools, upgrade the firmware and see what the IBM tools
have to say. IBM drives have had some problems with spontaneous bad
blocks appearing that go away with new firmware and a run of the disk
tools. More importantly, if that's the problem, then after the firmware update
they don't come back until the drive really dies.
On Fri, 2002-09-06 at 16:37, mbs wrote:
> same problem I was having with 2.4.20-pre4-ac2-preempt.
I beg to differ. He has a dying disk; you have some weird CRC errors and other
goings-on.
On 6 Sep 2002, Alan Cox wrote:
> On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> > eBAY, and buy yourself a new drive. You can pickup 80 gig drives for
> > around 80 bucks nowadays. I used to recommend Maxtors, until they said
> > they're cutting their warranty to one year from three. I don't know what
> > to use anymore.
>
> At current drive density and reliabilities - raid. Software raid setups
> are so cheap there is little point not running RAID on IDE nowdays
>
Well, I was looking more on the side of the Windows PCs here at the
office; it's a bit expensive to start running RAID on those.
Mike
Forgot to say: my drive worked fine with 2.4.19-pre3-ac5-preempt before the
move to the -20 kernel.
It also worked fine after an fdisk/reinstall and continued to work fine until the
first time I booted a (freshly built) -20-ac version.
I thought it was the drive, so I replaced it with a brand new drive, and had
_EXACTLY_ the same failure pattern.
On Fri, 6 Sep 2002, Mike Dresser wrote:
> > The drive involved is an IBM-DTLA-307060, which has served me without problems
> > now for about 2 years.
>
> IBM DeathStar 75gxp.
>
> One of the worst hard drives ever made. It's quite likely it's failed,
> and in fact, two years is pretty impressive out of one of these.
>
> Make backups immediately. Run ibm's DFT tool, get the code to RMA this
> thing back to IBM. Sell the replacement they send you to a sucker on
> eBAY, and buy yourself a new drive. You can pickup 80 gig drives for
> around 80 bucks nowadays. I used to recommend Maxtors, until they said
> they're cutting their warranty to one year from three. I don't know what
> to use anymore.
>
> Mike
>
IBM DeathStar 75gxp.
Well put. Also, don't turn off this drive -- ever. If possible, back up
to something on a network, not to anything on the IDE bus. If you don't
have anything available, borrow something from work and make a temporary
LAN. With bad sectors and a relocation list already full, this drive
will seize the IDE bus and never let go once you trip it into failure.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.
On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> eBAY, and buy yourself a new drive. You can pickup 80 gig drives for
> around 80 bucks nowadays. I used to recommend Maxtors, until they said
> they're cutting their warranty to one year from three. I don't know what
> to use anymore.
At current drive density and reliabilities - RAID. Software RAID setups
are so cheap there is little point not running RAID on IDE nowadays.
On Fri, Sep 06, 2002 at 11:44:52AM -0400, Richard B. Johnson wrote:
> IBM DeathStar 75gxp.
>
> Well put. Also, don't turn off this drive --ever. If possible, back-up
> to something on a network, not to anything on the IDE bus.
I had one of these drives fail recently with the dreaded "clicking of death"
sounds (while it was retrying reads). What I discovered, while backing up
the disk, is that continuing sequential reads past the bad sectors without
an intervening operation would eventually cause the drive to get into a
messed up state where it erroneously reported the following good sectors
as bad.
My strategy to recover the good data was to read sequentially until I
got an error, then explicitly seek to the next good sector and continue
from there. This enabled me to copy the good data.
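A minimal sketch of that recovery loop, assuming a Unix system and a known sector count. It reads in chunks, and when a chunk fails it walks that chunk one sector at a time with explicit offsets, recording the sectors it could not read and leaving them as holes (zeros) in the copy. The name rescue_copy, the chunk size and the 512-byte sector size are illustrative choices, not anything taken from the tools mentioned in this thread.

```python
import os

SECTOR = 512  # bytes per sector; assumed


def rescue_copy(src_fd, dst_fd, total_sectors, chunk=64, read=os.pread):
    """Copy readable sectors from src_fd to dst_fd; return the unreadable ones.

    `read` takes (fd, length, offset) like os.pread, so every read is an
    explicit positioned read rather than a continuation of the previous one.
    """
    bad = []
    pos = 0
    while pos < total_sectors:
        n = min(chunk, total_sectors - pos)
        try:
            data = read(src_fd, n * SECTOR, pos * SECTOR)
        except OSError:
            data = None  # at least one sector in this chunk is unreadable
        if data is not None:
            os.pwrite(dst_fd, data, pos * SECTOR)
        else:
            # Walk the failed chunk sector by sector, skipping the bad ones.
            for s in range(pos, pos + n):
                try:
                    one = read(src_fd, SECTOR, s * SECTOR)
                except OSError:
                    bad.append(s)
                    continue
                os.pwrite(dst_fd, one, s * SECTOR)
        pos += n
    return bad
```

A real run would open the source with os.open("/dev/hda", os.O_RDONLY) and the destination on a different disk. Short reads and write errors are not handled here; dedicated rescue tools do considerably more.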
On Friday 06 September 2002 17:36, [email protected] wrote:
> > I've looked up these errors on the net, and as far as i can tell it means
> > that the drive has some bad sectors at the given addresses and that it
> > will probably die on me sooner or later.
> >
> > Can someone either confirm this to me or tell me what to do to fix it?
> >
> > The drive involved is an IBM-DTLA-307060, which has served me without
> > problems now for about 2 years.
>
> Have a look at:
>
> http://csl.cse.ucsc.edu/smart.shtml
>
> there you will find software for interrogating and monitoring the
> S.M.A.R.T. data available from your drive. It's a little late to start
> monitoring it, if the drive is already dying, but if, for example, it shows
> a lot of re-allocated sectors, or spin retries, you'll know something is
> wrong.
>
OK, I downloaded that and installed it, but well, frankly, it shows me very
little useful stuff.
Or I'm just not good at interpreting this.
DK
--
"I gained nothing at all from Supreme Enlightenment, and for that very
reason it is called Supreme Enlightenment."
-- Gotama Buddha
On Fri, 2002-09-06 at 11:42, Mike Dresser wrote:
> On 6 Sep 2002, Alan Cox wrote:
>
> > On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> > > eBAY, and buy yourself a new drive. You can pickup 80 gig drives for
> > > around 80 bucks nowadays. I used to recommend Maxtors, until they said
> > > they're cutting their warranty to one year from three. I don't know what
> > > to use anymore.
> >
> > At current drive density and reliabilities - raid. Software raid setups
> > are so cheap there is little point not running RAID on IDE nowdays
> >
> Well, I was looking more on the side of the Windows PC's here at the
> office, it's a bit expensive to start running raid on those.
>
> Mike
Well, I haven't examined this empirically, but as the quantity of disk
drives in an organization continues increasing, so does the probability
of disk failure, any one of which can mean lost time/money, etc. Drive
reliability is likely not increasing at the same rate that density is,
so the likelihood of lost data is probably increasing. Since LAN speeds
continue to increase, it might start making sense now in clusters of
more than a few machines to make each machine less reliant on its own
disk storage (to the point of not at all other than big swap space) and
use the LAN more. On the LAN put the money into a quality shared
resource - a heavy duty UPS'd, etc. RAID system. Especially if a RAID
system is as easy to build/maintain/use as Alan alludes to (don't know -
never built one).
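The point about drive counts can be put into numbers with a small sketch. It assumes drive failures are independent and that every drive has the same failure probability over the period of interest; both are simplifications, and the 3% figure in the usage note is a made-up example, not a measured rate.

```python
def p_any_failure(p_single: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "drives:", round(p_any_failure(0.03, n), 3))
```

With a hypothetical 3% per-drive rate, one drive is unlikely to fail in the period, but across a hundred desktops at least one failure becomes close to certain, which is the argument for centralising the storage.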
Billy
On 6 Sep 2002, Billy Harvey wrote:
> use the LAN more. On the LAN put the money into a quality shared
> resource - a heavy duty UPS'd, etc. RAID system. Especially if a RAID
> system is as easy to build/maintain/use as Alan alludes to (don't know -
> never built one).
>
> Billy
And don't forget the cost of cluebats to beat the users over the head
with. I've been trying for 3 years to get people to save their documents
to the H: drive. Still find stuff stored wherever they feel like storing
it.
So each facility has a backup server that nightly grabs their entire
drive, gzips it, and then dumps it to a DDS-4 tape. It also keeps X days of
daily full backups, and X weeks as well.
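A retention rule of that shape (some days of dailies plus some weeks of weeklies) might be sketched as follows. The post does not give the values of X, so days=7 and weeks=4 are placeholders, and picking Sunday as the weekly backup is likewise an assumption.

```python
from datetime import date, timedelta


def backups_to_keep(dates, today, days=7, weeks=4):
    """Return the backup dates to keep: recent dailies plus older Sunday backups."""
    keep = set()
    for d in dates:
        age = (today - d).days
        if 0 <= age < days:
            keep.add(d)                      # recent daily backup
        elif 0 <= age < weeks * 7 and d.weekday() == 6:
            keep.add(d)                      # older backup, kept only if taken on a Sunday
    return keep


if __name__ == "__main__":
    today = date(2002, 9, 6)
    nightly = [today - timedelta(days=i) for i in range(30)]
    for d in sorted(backups_to_keep(nightly, today)):
        print(d.isoformat())
```

Anything not returned by the function is a candidate for deletion or tape reuse.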
Aside from Windows filesharing being so slow (1500kps via smbtar is average
here), it works quite nicely. Even with a P4/2.53, I still can't get
more than the 1500kps that a P133 is capable of. All the P4 gives me is
the ability to gzip -9 or even bzip2 the files, instead of the gzip -1
that the P133 is capable of in real time.
Mike
> OK, I downloaded that and installed it, but well, frankly, it shows me very
> little useful stuff.
>
> Or i'm just not good at interpreting this.
Post the output of smartctl -a /dev/hda? to me, and I'll tell you what I can, but it's best to monitor the stats from when the drive is new (i.e. every drive you buy from now on :-) ).
John.
On Fri, 2002-09-06 at 17:26, Mike Dresser wrote:
> Make backups immediately. Run ibm's DFT tool, get the code to RMA this
> thing back to IBM. Sell the replacement they send you to a sucker on
> eBAY, and buy yourself a new drive. You can pickup 80 gig drives for
> around 80 bucks nowadays. I used to recommend Maxtors, until they said
> they're cutting their warranty to one year from three. I don't know what
> to use anymore.
I did exactly this and bought an 80gig Maxtor for EUR 100 (don't know why
it would be so much cheaper at your place, but anyway). Unfortunately
the drive was broken right away, let's see how long the replacement
drive keeps running...
Seems like every major brand is just producing crap nowadays....
--
Servus,
Daniel
On Fri, 2002-09-06 at 17:38, Alan Cox wrote:
> Get the IBM disk tools, upgrade the firmware and see what the ibm tools
> have to say. IBM drives have had some problems with spontaneous bad
> blocks appearing that go away with new firmware and a run of the disk
> tools.
The "run of the disk tools" that does away with the bad blocks is a
low-level format; a tedious way to spend one's time on a hard drive
that will die soon anyway.
> More importantly if thats the problem with the firmware update
> they dont come back until the drive really dies.
Right, which is probably shortly after. Especially on a two-year-old
drive I wouldn't go through all the trouble to back up 60GB of
data, low-level format the drive, restore the data and hope the
problems are gone; instead I'd rather get a new drive within the
warranty and cross my fingers.
BTW: I went the backup route exactly once and the drive came back to me
with new errors two weeks later.
--
Servus,
Daniel
fdisk/format and reinstall, but stick with a 2.4.19 or 2.4.19-ac kernel.
I would bet money that the problem is purely a .20-preX-acX thing.
Run it a while on 2.4.19 to verify that life is good. Then build a new
2.4.20-pre1-ac3 and boot it. I bet that within minutes of normal use, you
will have a problem.
(I have done this loop 3 times.)
On Friday 06 September 2002 11:13, DevilKin wrote:
> Hello kernel people,
>
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
>
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
> tried to delete it, this started showing up in my log files:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612677 0x0 SD]
>
> and rm just reported me 'Permission denied'.
>
> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.
>
> Thanks!
>
> DK
--
/**************************************************
** Mark Salisbury || [email protected] **
** If you would like to sponsor me for the **
** Mass Getaway, a 150 mile bicycle ride to for **
** MS, contact me to donate by cash or check or **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736
> On Fri, 2002-09-06 at 11:42, Mike Dresser wrote:
> > On 6 Sep 2002, Alan Cox wrote:
> >
> > > On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> > > > eBAY, and buy yourself a new drive. You can pickup 80 gig drives for
> > > > around 80 bucks nowadays. I used to recommend Maxtors, until they said
> > > > they're cutting their warranty to one year from three. I don't know what
> > > > to use anymore.
> > >
> > > At current drive density and reliabilities - raid. Software raid setups
> > > are so cheap there is little point not running RAID on IDE nowdays
> > >
> > Well, I was looking more on the side of the Windows PC's here at the
> > office, it's a bit expensive to start running raid on those.
> >
> > Mike
>
> Well, I haven't examined this empirically, but as the quantity of disk
> drives in an organization continues increasing, so does the probability
> of disk failure, any one of which can mean lost time/money, etc. Drive
> reliability is likely not increasing at the same rate that density is,
> so the likelihood of lost data is probably increasing. Since LAN speeds
> continue to increase, it might start making sense now in clusters of
> more than a few machines to make each machine less reliant on its own
> disk storage (to the point of not at all other than big swap space) and
> use the LAN more. On the LAN put the money into a quality shared
> resource - a heavy duty UPS'd, etc. RAID system. Especially if a RAID
> system is as easy to build/maintain/use as Alan alludes to (don't know -
> never built one).
A RAID array isn't a universal solution to all disk related problems, though, is it? I mean, we were talking about buggy firmware earlier on in this thread - if a drive which is part of an array returns corrupted data, without acknowledging it, then you'll read corrupted data from the RAID array. Also, an array of unreliable drives doesn't make a reliable array.
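A toy illustration of that point, using XOR parity as found in RAID 5. It is only a sketch: the block contents are made up and real arrays work on much larger stripes. It shows that a rebuild trusts whatever the surviving drives return, so a block that was silently corrupted poisons the reconstructed data, and that only an explicit parity check (a scrub) notices the mismatch.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving, parity):
    """Reconstruct the one missing data block from the survivors and parity."""
    return xor_blocks(list(surviving) + [parity])


def stripe_is_consistent(data_blocks, parity):
    """A scrub-style check: data XOR parity must be all zero bytes."""
    return not any(xor_blocks(list(data_blocks) + [parity]))


if __name__ == "__main__":
    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
    parity = xor_blocks([d0, d1, d2])
    print(rebuild([d0, d2], parity))          # healthy peers: d1 comes back intact
    print(rebuild([b"AAAZ", d2], parity))     # silently corrupted d0: garbage d1
    print(stripe_is_consistent([b"AAAZ", d1, d2], parity))
```

Normal reads never run the consistency check, which is why the corruption goes unnoticed until a rebuild or a scrub.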
Now that the Smart Suite S.M.A.R.T. applications are unmaintained, would there be any chance of implementing S.M.A.R.T. in the kernel IDE code? I know the IDE code is already a nightmare, but it would be a nice feature. S.M.A.R.T. is terribly underused at the moment - most people don't even know what it is. In fact, I could be wrong, but isn't a subset of S.M.A.R.T. implemented on modern SCSI disks, too?
Monitoring of any kind is always a nice feature to have...
John.
On Fri, 6 Sep 2002 [email protected] wrote:
> Infact, I could be wrong, but isn't a subset of S.M.A.R.T. implemented
> on modern SCSI disks, too?
Yes.
Mike
On Friday 06 September 2002 19:22, [email protected] wrote:
> > OK, I downloaded that and installed it, but well, frankly, it shows me
> > very little useful stuff.
> >
> > Or i'm just not good at interpreting this.
>
> Post the output of smartctl -a /dev/hda? to me, and I'll tell you what I
> can, but it's best to monitor the stats from when the drive is new, (I.E.
> every drive you buy from now on :-) ).
>
Well, there were 21 ATA errors, and it showed 5 error blocks, with disk 'live'
times of 629 hours.
Luckily I've been able to back up everything from the disk, and I'm running the
DFT now. The tests showed bad sectors; I'm currently running a disk erase.
DK
--
"What's that thing?"
"Well, it's a highly technical, sensitive instrument we use in
computer repair. Being a layman, you probably can't grasp exactly what
it does. We call it a two-by-four."
-- Jeff MacNelley, "Shoe"
On Fri, 2002-09-06 at 18:46, mbs wrote:
> fdisk/format and reinstall but stick with a 2.4.19 or 2.4.19-ac kernel.
>
> I would bet money that the problem is purely a .20-preX-acX thing.
It's a status entry direct from the drive. The drive says "uncorrectable
error", which means there is a media problem. It has nothing to do with
Linux.
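To make the "direct from the drive" point concrete, here is a small sketch that decodes the two register values from the log. The bit positions follow the classic ATA status and error register layout, and the names are chosen to match the strings the kernel printed in this thread; treat the less common entries as a best-effort mapping rather than a complete reference.

```python
# (mask, name) pairs, highest bit first
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x80, "BadSector/BadCRC"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all bits set in a register value."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("status=0x51:", decode(0x51, STATUS_BITS))
    print("error=0x40: ", decode(0x40, ERROR_BITS))
```

The 0x40 error bit is set by the drive itself when it cannot recover the data in a sector, which is why this report looks different from the CRC errors mentioned elsewhere in the thread (those involve the 0x80 bit and point at the transfer rather than the media).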
On Fri, 2002-09-06 at 18:33, Daniel Egger wrote:
> On Fri, 2002-09-06 at 17:38, Alan Cox wrote:
>
> > Get the IBM disk tools, upgrade the firmware and see what the ibm tools
> > have to say. IBM drives have had some problems with spontaneous bad
> > blocks appearing that go away with new firmware and a run of the disk
> > tools.
>
> The "run of the disk tools" that does away with the badblocks is a
> lowlevel format; a tedious way to spent ones' time on a harddrive
> that will die anyway soon.
For the IBMs it depends what the problem is. Spontaneous bad blocks
appearing during power off appear to be fixed by the firmware update.
Is a drive you can't rely on worth having?
On 06 Sep 2002 21:31:25 +0100 Alan Cox <[email protected]> wrote:
On Fri, 2002-09-06 at 21:40, [email protected] wrote:
> Is a drive you cant rely on worth having?
That's up to the owner. There are lots of uses for such drives - /tmp,
swap, in a RAID array, etc.
Mind you, I collect drives that have nice properties like "hangs the
entire SCSI bus when inserted into an SCA connector" for testing with.
On Friday 06 September 2002 22:40, [email protected] wrote:
> Is a drive you cant rely on worth having?
Very good question...
The DFT has finished its work, and tells me no more bad sectors are
present... for how long?
To the swap gurus: what does Linux do if it attempts to write to swap, and
gets an error code returned from the IDE layer?
DK
--
The streets are safe in Philadelphia, it's only the people who make
them unsafe.
-- Mayor Frank Rizzo
> > Is a drive you cant rely on worth having?
>
> Thats up to the owner. There are lots of uses for such drives - /tmp,
> swap, in a raid array, etc
..primary Windows partition :-)
well 9x is unreliable anyway..
On Fri, 6 Sep 2002 22:45:55 +0100 (BST) [email protected] wrote:
>>>>> "AC" == Alan Cox <[email protected]> writes:
AC> Thats up to the owner. There are lots of uses for such drives -
AC> /tmp, swap, in a raid array, etc
Be careful of these even in a RAID array; they will go bad silently.
I had one array (software RAID5, 8 75GXP drives on a 3w6800 in JBOD
mode, one hot spare) that was going fine until one drive died hard,
wouldn't spin up, etc. I replaced it, but during the RAID resync
three other drives were found to have errors. The array was trash,
but luckily all drives were dead just at the tail end, so I could copy
the data out during the RAID resync. Some of the failed drives had
the updated firmware.
3ware has background integrity scans now; I don't know if software
RAID has any equivalent besides an occasional 'dd', but even that's a
good idea.
- J<
> > Now that the Smart Suite S.M.A.R.T. applications are unmaintained, would
>
> what happened?
I'm not sure, but the last update to the S.M.A.R.T. Suite website, on 3 July this year, says that the page and the applications are no longer maintained.
Seems the Beta of version 2.0 never got finished either :-(.
> > there be any chance of implementing S.M.A.R.T. in to the kernel IDE code?
>
> what would be the benefit? as I understand it, smart is really
> a means of reporting long-term disk status, which is optimally done
> by user-space. even something exotic like failing over to a spare disk
> would clearly be best done in user-space.
You are right, the idea is to monitor the SMART info, ideally from when the drive is new, but at least over a period of time, so that a change in its behavior shows up.
> > I know the IDE code is already a nightmare, but it would be a nice feature.
>
> what did you have in mind?
Well, nothing very exotic, just some sanity checks on the SMART data when the IDE and SCSI interfaces are probed for devices. Something like:
* Device supports/does not support following SMART features:
* General attributes
* Vendor attributes
* Error log
* Selftest log
* Drive info
* SMART is currently enabled/disabled
* Total power-on time is currently foo hours
* Warning if any of the following is excessive:
* Last spin up time
* Calibration retry count
* UDMA CRC Error count
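The "warning if excessive" part of that list could look roughly like the sketch below, wherever it ends up running. It works on a plain dictionary of already-parsed values; the attribute names and the limits are hypothetical placeholders, since sensible thresholds are vendor specific and nothing in this thread defines them.

```python
# Hypothetical limits: a value above the limit triggers a warning.
HYPOTHETICAL_LIMITS = {
    "spin_up_time_ms": 10000,
    "calibration_retry_count": 0,
    "udma_crc_error_count": 0,
}


def smart_warnings(attrs, limits=HYPOTHETICAL_LIMITS):
    """Return one warning string per attribute that exceeds its limit."""
    return [
        f"{name}={attrs[name]} exceeds {limit}"
        for name, limit in limits.items()
        if name in attrs and attrs[name] > limit
    ]


if __name__ == "__main__":
    sample = {"spin_up_time_ms": 5000, "udma_crc_error_count": 21}  # made-up values
    for line in smart_warnings(sample):
        print("warning:", line)
```

Attributes the drive does not report are simply skipped, which matches the "supports/does not support" split at the top of the list.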
> > S.M.A.R.T. is terribly under used at the moment - most people don't even
> > know what it is. Infact, I could be wrong, but isn't a subset of
> > S.M.A.R.T. implemented on modern SCSI disks, too?
>
> I know that most people don't run it, but other than that, how is it
> underused?
Well, I can't see any reason for *not* using it where available - who wouldn't appreciate a warning on boot up, 'oh, by the way, /dev/hda is about to die in a couple of days :-)'
> > Monitoring of any kind is always a nice feature to have...
>
> certainly, though that doesn't mean it should move from userspace to
> kernel...
Agreed, there isn't any point in doing monitoring in kernelspace, but capabilities reporting, and sanity checks on boot might be useful.
John.
First BACK up what is left.
Next dig out smartsuite from http://www.linux-ide.org/smart.html
Run it in full capture mode, please use another disk to run root, or the
system will tank.
Read and save smart logs.
cat /dev/zero > /dev/hd{IBM-DTLA-307060}x
Rerun Smart in full capture mode.
Reread smart logs and compare.
cat /dev/urandom > /dev/hd{IBM-DTLA-307060}x
If you get no errors you can reuse the drive, for how long? Maybe 6 months
to a year.
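The "read, save, reread and compare" steps above come down to diffing two snapshots of the drive's counters. A minimal sketch, assuming the logs have already been parsed into name-to-number dictionaries (the parsing is not shown, and the attribute names and values below are invented for the example):

```python
def smart_diff(before, after):
    """Return {attribute: change} for every counter that moved between snapshots."""
    return {
        name: after[name] - before[name]
        for name in sorted(before.keys() & after.keys())
        if after[name] != before[name]
    }


if __name__ == "__main__":
    before = {"reallocated_sectors": 5, "power_on_hours": 629, "spin_retries": 0}
    after = {"reallocated_sectors": 9, "power_on_hours": 631, "spin_retries": 0}
    for name, delta in smart_diff(before, after).items():
        print(f"{name}: {delta:+d}")
```

Power-on hours will always move; a reallocation or retry count that grows across the zero-fill pass is the kind of change worth worrying about.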
Now, I can not tell you what, why, how things are going on.
Sheesh, I expect to be in a deep six for this series of events already.
Sorry, I can not say anymore.
If you do not like the above, you need to run out and buy another drive
fast.
Cheers,
On Fri, 6 Sep 2002, DevilKin wrote:
> Hello kernel people,
>
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
>
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
> tried to delete it, this started showing up in my log files:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data
> of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data
> of [612671 612677 0x0 SD]
>
> and rm just reported me 'Permission denied'.
>
> I've looked up these errors on the net, and as far as i can tell it means that
> the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without problems
> now for about 2 years.
>
> Thanks!
>
> DK
> --
> If all the Chinese simultaneously jumped into the Pacific off a 10 foot
> platform erected 10 feet off their coast, it would cause a tidal wave
> that would destroy everything in this country west of Nebraska.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Andre Hedrick
LAD Storage Consulting Group
Send me the results offline
On Fri, 6 Sep 2002, DevilKin wrote:
> On Friday 06 September 2002 17:36, [email protected] wrote:
> > > I've looked up these errors on the net, and as far as i can tell it means
> > > that the drive has some bad sectors at the given addresses and that it
> > > will probably die on me sooner or later.
> > >
> > > Can someone either confirm this to me or tell me what to do to fix it?
> > >
> > > The drive involved is an IBM-DTLA-307060, which has served me without
> > > problems now for about 2 years.
> >
> > Have a look at:
> >
> > http://csl.cse.ucsc.edu/smart.shtml
> >
> > there you will find software for interrogating and monitoring the
> > S.M.A.R.T. data available from your drive. It's a little late to start
> > monitoring it, if the drive is already dying, but if, for example, it shows
> > a lot of re-allocated sectors, or spin retries, you'll know something is
> > wrong.
> >
>
> OK, I downloaded that and installed it, but well, frankly, it shows me very
> little useful stuff.
>
> Or i'm just not good at interpreting this.
>
> DK
>
> --
> "I gained nothing at all from Supreme Enlightenment, and for that very
> reason it is called Supreme Enlightenment."
> -- Gotama Buddha
>
Andre Hedrick
LAD Storage Consulting Group
> Next dig out smartsuite from http://www.linux-ide.org/smart.html
I thought that smartsuite was now unmaintained, and posted a comment to that effect earlier in this thread - sorry for the mis-information.
John.
Technically it is, I am working to transfer the copyright/license to LAD.
Then I can update it and transform it to the preferred kernel API that is
not enabled by default. I expect it will require a subset of the
taskfile_ioctl calls to restrict various IO calls.
Cheers,
On Sat, 7 Sep 2002 [email protected] wrote:
> > Next dig out smartsuite from http://www.linux-ide.org/smart.html
>
> I thought that smartsuite was now unmaintained, and posted a comment to that effect earlier in this thread - sorry for the mis-information.
>
> John.
>
Andre Hedrick
LAD Storage Consulting Group
DevilKin wrote:
> Luckily I've been able to back up everything from the disk, and I'm running the
> DFT now. The tests showed bad sectors, I'm currently running a disk erase.
I have had success in firmware-upgrading these drives, after which all
problems were gone forever.
You can download the firmware programs from
http://anders.fugmann.dhs.org/ibm. There are upgrades for both the 75GXP and
60GXP, or you could contact IBM for the firmware upgrade; they are not
available on the IBM site. The programs are Windows executables that
create a bootable floppy.
Regards
Anders Fugmann
On Sat, 07 Sep 2002 11:30:01 +0200 Anders Fugmann (AF) wrote:
AF> You can download the firmware programs from
AF> http://anders.fugmann.dhs.org/ibm. There are both upgrade for 75GXP and
AF> 60GXP, or you could contact IBM for the firmware upgrade - They are not
AF> available on the ibm site. The programs are Windows thingies, which
AF> creates a floppy to be booted.
They are on the IBM site, but a bit hard to find:
http://www-1.ibm.com/support/docview.wss?rs=0&uid=psg1MIGR-39082
-Udo.
On Fri, 2002-09-06 at 22:32, Alan Cox wrote:
> It's a status entry direct from the drive. The drive says "uncorrectable
> error" which means there is a media problem. It's nothing to do with
> Linux
According to IBM tech staff it is an OS problem because the data
transfer to the drive got corrupted somehow and thus the drive forgot
about the sectors.
I was just laughing my ass off when I heard this, especially after
the 4th drive failing within a short period of time with the same
guy calling me on my cell phone and telling me the same shite over
and over again...
--
Servus,
Daniel
On Fri, 2002-09-06 at 23:01, Alan Cox wrote:
> That's up to the owner. There are lots of uses for such drives - /tmp,
> swap, in a raid array, etc
Having two such notoriously broken drives in a RAID array is also
not an option in many cases. Mirroring is meant to increase data
security in case a drive fails spontaneously; using particularly bad
drives for that purpose defeats the point.
> Mind you I collect drives that have nice properties like "hangs the
> entire scsi bus when inserted into an SCA connector" for testing with
You probably should keep a DeathStar as the worst drive ever made.
Heck, if my latest replacement drive from IBM ("serviceable used part")
starts failing again I might as well ship it to you instead of IBM.
--
Servus,
Daniel
On Fri, 2002-09-06 at 21:22, DevilKin wrote:
> Well, there were 21 ATA errors, and it showed 5 error blocks, with disk 'live'
> times of 629 hours.
No wonder it ran for 2 years. Are you using this machine frequently at
all? :)
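For scale, the 629 power-on hours can be set against two years of calendar time. A rough sketch of the arithmetic in Python, assuming "about 2 years" means 730 days:

```python
# Rough arithmetic only; "about 2 years" is taken as 730 days (an assumption).
power_on_hours = 629            # figure reported by the drive, quoted above
calendar_hours = 730 * 24       # 17520 hours in two 365-day years

duty_cycle = power_on_hours / calendar_hours
hours_per_day = power_on_hours / 730

print(f"powered on {duty_cycle:.1%} of the time, "
      f"about {hours_per_day:.2f} hours per day")
```

That is well under an hour a day on average, assuming the drive's own counter can be trusted.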
> The tests showed bad sectors, i'm currently running a disk erase.
This is exactly the mistake I've been meaning to warn you about.
The disk will corrupt again sooner or later and you'll have to go
through all the torture (possible backup/restore, missing data) again,
and if you're unlucky (which is quite possible with your frequency of
use) the warranty will have run out by the time the problems appear again.
--
Servus,
Daniel
> > The tests showed bad sectors, i'm currently running a disk erase.
>
> This is exactly the mistake I've been meaning to warn you of.
> The disk will corrupt sooner or later again and you'll have to go
> through all the torture (possible backup/restore, missing data) again
> and if you're unlucky (which is quite possible with your frequency of
> use) the warranty is void until the problems appear the next time.
There are two separate issues here, though:
* Buggy firmware
* Unreliable media
We have confirmed, (I believe), that the drive did have the buggy firmware. We do not know yet whether the media is defective or not, but we do know that the drives are not the best in the world.
Alan also confirmed that the errors were direct from the device, and so it is not a kernel bug.
However, I raise the question of whether the new kernel version caused different access patterns to the device, and showed up the firmware bug that was there all the time. Or maybe the compilation of the new kernel thrashed the disk and showed up the firmware bug. If the machine has been on for some time, (months), doing not very much, maybe a lot of disk data was cached in RAM, and the kernel compile caused it to be re-read from disk, showing up media defects.
I was hoping that he would actually post the output of:
smartctl -a /dev/hda?
because that tells you all sorts of things, like, for example, reallocated sector count, and calibration retry count.
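If the full output is too long to post, the interesting lines can be picked out first. A minimal Python sketch, assuming the output has been saved as text and that the attribute names contain the keywords below. The sample text is made up for illustration and is not real smartctl output, whose exact layout varies between versions:

```python
# Sketch: pick out lines of saved smartctl output that mention attributes
# worth posting. The keyword list and the sample text are assumptions.
KEYWORDS = ("reallocated", "calibration retry", "spin retry")

def interesting_lines(report, keywords=KEYWORDS):
    """Return the lines of `report` that mention any keyword, ignoring case."""
    return [line for line in report.splitlines()
            if any(word in line.lower() for word in keywords)]

# Hypothetical sample, not captured from a real drive.
sample = """Some Header Line
Reallocated Sector Count: (made-up value)
Power On Hours: (made-up value)
Calibration Retry Count: (made-up value)
"""

for line in interesting_lines(sample):
    print(line)
```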
Obviously, it is not a good idea to use the drive for anything important until it has been tested in a non-critical application first.
Besides, you *do* backup, don't you? (Or do what Linus suggested a while ago, and upload your stuff to an ftp site that is mirrored worldwide.)
I don't see the point of returning a disk that turns out not to be faulty after the firmware upgrade, for replacement under the warranty, even if it qualifies for a warranty replacement, (which it shouldn't do), because you might be exchanging a good disk for a bad disk.
John.
On Sat, 2002-09-07 at 15:08, [email protected] wrote:
> Besides, you *do* backup, don't you?
I do but besides that there is still data loss involved and my time is
expensive and limited, so I'd rather go for a hasslefree solution than
to poke around in mud with a stick in the hope it might clear up.
> (Or do what Linus suggested a while ago, and upload your stuff to an
> ftp site that is mirrored worldwide.)
Very practical advice.
> I don't see the point of returning a disk that turns out not to be
> faulty after the firmware upgrade,
The point is that by the time you know whether it really was the firmware,
you've spent so much time that it is much easier to return the drive.
> even if it qualifies for a warranty replacement, (which it shouldn't do)
A faulty drive is a faulty drive and thus qualifies for a
free replacement (at least in Germany). Nobody here can force
you to try several costly things which might solve the problem;
it is rather the manufacturer's duty to fix it at their cost.
> because you might be exchanging a good disk for a bad disk.
Very doubtful considering past experience. Also it's not very
probable (though it has happened) to receive a disk which is
more broken than broken.
--
Servus,
Daniel
> > Besides, you *do* backup, don't you?
>
> I do but besides that there is still data loss involved and my time is
> expensive and limited, so I'd rather go for a hasslefree solution than
> to poke around in mud with a stick in the hope it might clear up.
Fair enough, if you don't have the time to devote to it, it's best to replace the drive.
I assumed from the size of this thread, which has nothing to do with the kernel anymore, that we were trying to find out what was to blame.
If this is going to become a flamewar, please remove the cc: to the kernel list, as I doubt that it interests them.
> > (Or do what Linus suggested a while ago, and upload your stuff to an
> > ftp site that is mirrored worldwide.)
>
> Very practicable advise.
Whatever - it was a joke.
The reason I brought up backups, was because even if you have a RAID array, of high quality drives, with non-sequential serial numbers, on hot-pluggable interfaces, with known good firmware, you can still get silent data corruption.
Fact - *NO* SLED, or RAID array, can ever be guaranteed never to silently flip a bit.
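One way to at least notice such corruption is to record a checksum of each file when it is backed up and compare it later. A minimal Python sketch using the standard hashlib module; the function names are illustrative, and this only detects a flipped bit, it cannot repair one:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path, recorded_digest):
    """True if the file still matches the digest recorded at backup time."""
    return file_digest(path) == recorded_digest
```

The digest has to be stored somewhere other than the disk being checked, otherwise the record can rot along with the data.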
> > I don't see the point of returning a disk that turns out not to be
> > faulty after the firmware upgrade,
>
> The point is that until you know whether it really was the firmware,
> you've spend so much time that it is much easier to return the drive.
And the chances are you will get another drive of the same model, back from IBM. How does that help?
I already pointed out that there are two known issues here with these drives - firmware bugs, and media defects.
So far, all we can say is that the firmware problem is now fixed. On a replacement drive, you can't even say that.
The 'media errors' could have been caused entirely by the buggy firmware.
> > even if it qualifies for a warranty replacement, (which it shouldn't do)
>
> A faulty drive is a faulty drive and thus qualifies for a
> free replacement (at least in Germany). Nobody here can force
> you to try several costly things which might solve the problem;
> it is rather the manufacturers duty to fix it on their cost.
No, but you've upgraded the firmware, right? If that has fixed the problem, then it is not a faulty drive. If it is not a faulty drive, then what is the point in sending it back? If it is not a faulty drive, IBM would be justified in sending it right back to you at your expense. Oh, and it might get damaged in transit.
> > because you might be exchanging a good disk for a bad disk.
>
> Very doubtful considering past experience. Also it's not very
> probable (though it has happened) to receive a disk which is
> more broken than broken.
No, I would say it is very possible that you could receive a disk with the old firmware on it. So, you'll just plug in your 'new' disk, and in a few months, bad sectors will start appearing.
John.
Anders Fugmann wrote:
> I have had success in firmware-upgrading these drives, after which all
> problems were gone forever.
Which firmware version do your drives show? I ran the firmware upgrade
on my two DTLA half a year ago, and ended up with this:
Model=IBM-DTLA-307045, FwRev=TX6OA59A
Model=IBM-DTLA-305040, FwRev=TW4OA69A
(from hdparm -i output - the former 0A changed to 9A after the upgrade,
rest stayed the same)
Both work fine (they never failed me before the upgrade either).
However, at least the second drive still clicks often enough for me to
notice. I am still worried, though smartsuite says I'm fine - if I read
the output correctly.
It seems to click only when doing lots of write requests for extended
periods of time (like unbatching and spooling several megabytes of news
- one or two usually don't trigger it, larger batches do).
I wonder if it would be possible for the driver to monitor SMART and
lighten the load on the drive when things don't seem normal.
What is normal, anyway? For example, my Seagate Barracuda IV shows
continually increasing raw values for "Raw Read Error Rate", "Seek Error
Rate" and "Hardware ECC Recovered". It works fine, though. The older U5
I still have running has a high but pretty constant raw value for the
first, a slower rate of increase for the second and doesn't show the
third.
I don't really believe the 310617 power on hours my Maxtor (the old 60
gig with 4 platters) claims, either.
Holger
> I wonder if it would be possible for the driver to monitor SMART and
> lighten the load on the drive when things don't seem normal.
I think it would be fun to have SMART monitoring in the driver, but I'm not sure it's worth the bloat. It *can* be done in userspace, after all.
> What is normal, anyway?
Not sure what 'normal' is, but the manufacturer defines a threshold for each attribute, and an attribute whose normalized value drops to or below its threshold is to be interpreted as 'drive is failing'.
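As a rough illustration of that check in Python: for ATA drives each attribute carries a normalized value and a vendor-set threshold, and the usual convention is that an attribute counts as failed once its value is at or below the threshold. The attribute names and numbers here are invented, not read from any drive:

```python
# Sketch of the SMART threshold comparison. All names and numbers are
# invented for illustration; real values come from the drive itself.
def failing_attributes(attributes):
    """Return names of attributes whose normalized value is at or below
    the vendor threshold."""
    return [name for name, (value, threshold) in attributes.items()
            if value <= threshold]

example = {
    "Reallocated Sector Count": (100, 5),   # healthy: 100 is above 5
    "Spin Retry Count": (50, 60),           # failing: 50 is at or below 60
    "Power On Hours": (98, 0),              # threshold 0: never flagged here
}

print(failing_attributes(example))
```

Note that this says nothing about the raw values, which are vendor-specific and are the ones that keep climbing on a healthy drive.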
> I don't really believe the 310617 power on hours my Maxtor (the old 60
> gig with 4 platters) claims, either.
That's because it's reporting power on time in minutes :-)
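Taking the reported figure as minutes, the conversion works out as follows (plain arithmetic, sketched in Python):

```python
reported = 310617                 # value the Maxtor reports, quoted above

hours = reported / 60             # if the unit is minutes
days = hours / 24

print(f"{hours:.0f} hours, or about {days:.0f} days of power-on time")
print(f"read as hours it would be about {reported / 24 / 365:.1f} years")
```

Roughly 5177 hours is believable for that drive; some 35 years is not.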
John.
On Sat, 2002-09-07 at 17:02, [email protected] wrote:
> No, but you've upgraded the firmware, right?
Not exactly. According to IBM technical support there is no such thing
as a new firmware. The drives are alright, the OS is broken.
> If that has fixed the problem, then it is not a faulty drive.
Right, and how would you notice without sacrificing more data?
> So, you'll just plug in your 'new' disk, and in a few months,
> bad sectors will start appearing.
Not if you sold it at Ebay, which is what I did with all *new*
drives I received from IBM. I just kept the "serviceable used part"
one in case I need to install Windows to upgrade the firmware of
some drive or anything else in range.
--
Servus,
Daniel
This discussion is becoming stupid, but here we go:
> > No, but you've upgraded the firmware, right?
>
> Not exactly.
??? Either you did or didn't.
> According to IBM technical support there is no such thing
> as a new firmware. The drives are alright, the OS is broken.
Right, so you're calling Alan Cox a liar, then? I know who I believe.
> > If that has fixed the problem, then it is not a faulty drive.
> Right, and how would you notice without sacrifying more data?
smartctl -X /dev/hda?
'Execute Extended Self Test' might be a good start
or you could just copy data to/from it, generally hammer it and spin it up, down, and sideways, generally try to make it go wrong, and if your data is intact, then I would trust it more than a disk that arrived in a jiffy bag, with an assurance that 'this one works'.
> > So, you'll just plug in your 'new' disk, and in a few months,
> > bad sectors will start appearing.
>
> Not if you sold it at Ebay,
The bad sectors are just as likely to appear, but somebody else's data will be lost. Very nice gesture, not to mention that you probably violate the Ebay T&C by selling a product that you suspect is faulty.
> which is what I did with all *new* drives I received from IBM.
Well, I won't buy a second hand drive from you then :-).
> I just kept the "serviceable used part" one in case I need to install
> Windows to upgrade the firmware of some drive or anything else in range.
Fine, if that's what floats your boat.
In fact, I was completely wrong, OK? You were right all along, so there is no need to continue this pointless thread.
John.
On Sat, 2002-09-07 at 21:19, Daniel Egger wrote:
> On Sat, 2002-09-07 at 17:02, [email protected] wrote:
>
> > No, but you've upgraded the firmware, right?
>
> Not exactly. According to IBM technical support there is no such thing
> as a new firmware. The drives are alright, the OS is broken.
The IBM technical support I dealt with not only confirmed there was new
firmware, the tools updated it and said they had 8)
On Sat, 2002-09-07 at 21:41, [email protected] wrote:
> > According to IBM technical support there is no such thing
> > as a new firmware. The drives are alright, the OS is broken.
>
> Right, so you're calling Alan Cox a liar, then? I know who I believe.
Hardly. He said IBM tech support told him one thing, and they told me
another. Give it a rest
> On Sat, 2002-09-07 at 21:19, Daniel Egger wrote:
> > On Sat, 2002-09-07 at 17:02, [email protected] wrote:
> >
> > > No, but you've upgraded the firmware, right?
> >
> > Not exactly. According to IBM technical support there is no such thing
> > as a new firmware. The drives are alright, the OS is broken.
>
> The IBM technical support I dealt with not only confirmed there was new
> firmware, the tools updated it and said they had 8)
Here is the URL:
http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-39082
it expressly states that the firmware is intended for the DTLA-307060.
The page mentions that it enhances stability and SMART data collection.
John.
On 7 Sep 2002, Daniel Egger wrote:
> On Sat, 2002-09-07 at 17:02, [email protected] wrote:
>
> > No, but you've upgraded the firmware, right?
>
> Not exactly. According to IBM technical support there is no such thing
> as a new firmware. The drives are alright, the OS is broken.
They are full of CRAP!
IBM ran TASKFILE IO through their bus analyzers and it came up clean.
IBM also introduced FLAGGED versions of the diagnostic TASKFILE transport
for eventual use of their DFT (Drive Fitness Test).
You tell the service tech he is smoking crack.
The kernel passed with flying colors in their disk labs. If you read
in ide-taskfile.c version 0.33 and above, you will see they did some work
on the driver and verified issues.
Now earlier I published a method of how to stabilize the drive once you
back up all the data you can off of it. Since I do not yet have a source
version of DFT-Linux, or a binary, I cannot offer much more natively.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Sat, 7 Sep 2002 [email protected] wrote:
...
> Here is the URL:
>
> http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-39082
>
> it expressly states that the firmware is intended for the DTLA-307060.
The firmware update is for many more drives than that. My own
Model=IBM-DTLA-305040, FwRev=TW4OA60A
is also recommended, as well as many with a FwRev=xxxOyzzz with zzz<66A.
Now I have to find a Windows machine to try it out on...
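That rule of thumb can be turned into a quick check. This is only a sketch in Python that takes the pattern exactly as I wrote it above (three characters, the letter O, one character, then a three-character revision) and assumes the revision is two digits plus the letter A, compared numerically against 66. The function name is made up, and IBM's page remains the authority on which drives qualify:

```python
import re

# Pattern as quoted above: xxx, the letter O, y, then zzz (assumed to be
# two digits followed by one letter, e.g. "60A").
FWREV = re.compile(r"^(?P<xxx>.{3})O(?P<y>.)(?P<num>\d{2})(?P<suffix>[A-Z])$")

def update_recommended(fwrev):
    """Return True/False per the quoted zzz<66A rule, or None if the
    string does not fit the assumed pattern."""
    match = FWREV.match(fwrev)
    if match is None or match.group("suffix") != "A":
        return None
    return int(match.group("num")) < 66

print(update_recommended("TW4OA60A"))   # the FwRev quoted in this message
```

The FwRev string itself is the one shown by hdparm -i; anything that does not fit the pattern gets None rather than a guess.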
Dave,
--
Dave Forrest [email protected]
(804)642-0662h (434)924-3954w http://mug.sys.virginia.edu/~drf5n/
Why don't IBM do a Linux version? Don't they have a linux firmware utility? I thought IBM had a campaign to support linux...
Thanks. Regards, Dean McEwan. OpenModemTalk creator.
On Sat, 7 Sep 2002 19:19:21 -0400 (EDT) David Forrest <[email protected]> wrote:
Hell.Surfers,
Because when I was booted out of 2.5, effectively, the API for DFT was
deleted. So you take away the means to make it work, and they stop moving
that direction. Now that -AC series has the API, and 2.5 will see a full
return of it, I now have to restart the process to motivate them again.
However since Hitachi may end up buying out IBM's disk manufacturing
business and their product lines, I now have to go and try to court (old
dating term for the post 70's crowd) Hitachi.
Now, what do you want?
On Sun, 8 Sep 2002 [email protected] wrote:
> Why don't IBM do a Linux version? Don't they have a linux firmware utility? I thought IBM had a campaign to support linux...
>
> Thanks. Regards, Dean McEwan. OpenModemTalk creator.
>
> On Sat, 7 Sep 2002 19:19:21 -0400 (EDT) David Forrest <[email protected]> wrote:
>
Andre Hedrick
LAD Storage Consulting Group
I simply wondered why ibm moaned about supporting linux loads, yet don't have a decent linux flash update for their drives. If their stuff is written in C# why don't they port it to Ximian Mono?
Thanks. Regards, Dean McEwan. OpenModemTalk creator.
On Sat, 7 Sep 2002 17:17:25 -0700 (PDT) Andre Hedrick <[email protected]> wrote:
I am sorry, what part did you not understand?
IBM == Big Corporation of decoupled divisions equal to "Two Zaks" by Dr. Seuss
The communication is going the same direction.
Cheers, Have a Great Day, Bye.
Andre Hedrick
LAD Storage Consulting Group
On Sun, 8 Sep 2002 [email protected] wrote:
> I simply wondered why ibm moaned about supporting linux loads, yet don't have a decent linux flash update for their drives. If their stuff is written in C# why don't they port it to Ximian Mono?
>
> Thanks. Regards, Dean McEwan. OpenModemTalk creator.
>
> On Sat, 7 Sep 2002 17:17:25 -0700 (PDT) Andre Hedrick <[email protected]> wrote:
>
David Forrest <[email protected]> writes:
>On Sat, 7 Sep 2002 [email protected] wrote:
>...
>> Here is the URL:
>>
>> http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-39082
>>
>> it expressly states that the firmware is intended for the DTLA-307060.
>The firmware update is for many more drives than that, My own
> Model=IBM-DTLA-305040, FwRev=TW4OA60A
>is also recommended, as well as many with a FwRev=xxxOyzzz with zzz<66A.
>Now i have to find a windows machine to try it out on...
You don't need to. All you need is someone to run this tool and send you
the image it creates. I put mine as boot.img on a CD so I can upgrade
all the disks I have in boxes without floppy disk drives. It's a self
booting DOS disk.
Regards
Henning
--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [email protected]
Am Schwabachgrund 22 Fon.: 09131 / 50654-0 [email protected]
D-91054 Buckenhof Fax.: 09131 / 50654-20
> >The firmware update is for many more drives than that, My own
>
> > Model=IBM-DTLA-305040, FwRev=TW4OA60A
>
> >is also recommended, as well as many with a FwRev=xxxOyzzz with zzz<66A.
> >Now i have to find a windows machine to try it out on...
>
> You don't need to. All you need is someone run this tool and send you
> the image it creates. I put mine as boot.img on a CD so I can upgrade
> all the disks I have in boxes without floppy disk drives. It's a self
> booting DOS disk.
As the old firmware is known to be buggy, and those bugs are relevant when using Linux, and updated firmware is available, is it worth checking for the known buggy firmware version in the ide driver?
I realise that we cannot check every drive in the world for compatibility, but if this is a known issue...
John.
On 7 Sep 2002, Andre Hedrick wrote:
> On 7 Sep 2002, Daniel Egger wrote:
>
> > On Sat, 2002-09-07 at 17:02, [email protected] wrote:
> >
> > > No, but you've upgraded the firmware, right?
> >
> > Not exactly. According to IBM technical support there is no such thing
> > as a new firmware. The drives are alright, the OS is broken.
>
> They are full of CRAP!
>
> IBM ran TASKFILE IO through their bus analyzers and it came up clean.
> IBM also introduced FLAGGED versions of the diagnostic TASKFILE transport
> for eventual use of their DFT (Drive Fitness Test).
>
> You tell the service tech he is smoking crack.
> The kernel passed with flying colors in their disk labs. If you read
> in ide-taskfile.c version 0.33 and above, you will see they did some work
> on the driver and verified issues.
Sorry to step in, but you said that you are working on smartsuite (2.1+)
again?
Andre, can you fix start/stop counts, please?
SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sda
Device: IBM DDYS-T18350N Version: S96H
Device supports S.M.A.R.T. and is Enabled
Temperature Warning Disabled or Not Supported
S.M.A.R.T. Sense: Okay!
Current Drive Temperature: 31 C
Drive Trip Temperature: 85 C
Current start stop count: 131072 times
Recommended start stop count: 2555920 times
SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdb
Device: IBM DDRS-34560D Version: DC1B
Device supports S.M.A.R.T. and is Enabled
Temperature Warning Disabled or Not Supported
S.M.A.R.T. Sense: Okay!
SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdc
Device: IBM DDRS-34560W Version: S71D
Device supports S.M.A.R.T. and is Enabled
Temperature Warning Disabled or Not Supported
S.M.A.R.T. Sense: Okay!
Smartsuite-2.1 (at least) is missing some features for SCSI.
Regards,
Dieter
BTW
I had a double disk crash (same symptoms as in this thread) in a school's
RAID5 with four Fujitsu MPG3204AT-EF (the ones with gel bearings, silent and
reliable we hoped) last week...
The shop for which I work from time to time got 71 disks of this type back
(sold over the last 1.5 years). We switched to them after the "IBM" disaster.
Maybe a "misdecision" ;-)
What shall we sell safely, now...?
MAXTOR?
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel at hamburg.de (replace at with @)
> Andre, can you fix start/stop counts, please?
>
> SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sda
> Device: IBM DDYS-T18350N Version: S96H
> Device supports S.M.A.R.T. and is Enabled
> Temperature Warning Disabled or Not Supported
> S.M.A.R.T. Sense: Okay!
> Current Drive Temperature: 31 C
> Drive Trip Temperature: 85 C
> Current start stop count: 131072 times
> Recommended start stop count: 2555920 times
>
> SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdb
> Device: IBM DDRS-34560D Version: DC1B
> Device supports S.M.A.R.T. and is Enabled
> Temperature Warning Disabled or Not Supported
> S.M.A.R.T. Sense: Okay!
>
> SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdc
> Device: IBM DDRS-34560W Version: S71D
> Device supports S.M.A.R.T. and is Enabled
> Temperature Warning Disabled or Not Supported
> S.M.A.R.T. Sense: Okay!
>
> Smartsuite-2.1 (at least) is missing some features for SCSI.
Are you sure that it is not just the drive mis-reporting the start/stop counts? S.M.A.R.T. implementations are often flaky.
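Either way, the numbers themselves look odd: both are suspiciously tidy in hexadecimal. A quick Python check of the two values quoted above (the interpretation is a guess, not a diagnosis):

```python
# The two counts reported by smartctl for /dev/sda, quoted above.
current = 131072
recommended = 2555920

print(hex(current))       # 0x20000, i.e. exactly 2**17
print(hex(recommended))   # 0x270010

# A count of exactly 2**17 start/stop cycles is hard to believe, so a
# misread field (wrong offset or byte order) is at least as plausible
# as a genuine value. This is speculation based on the numbers alone.
is_power_of_two = current & (current - 1) == 0
print(is_power_of_two)
```

Whether the drive or the tool is at fault cannot be told from these two numbers alone.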
> BTW
> I had a double disk crash (same symptoms as in this thread) in a school's
> RAID5 with four Fujitsu MPG3204AT-EF (the ones with gel-lager, silent and
> reliable we hoped) last week...
> The shop for which I work from time to time got 71 disks of this type back
> (sold over the last 1.5 years). We switched to them after the "IBM" disaster.
> Maybe a "misdecision" ;-)
> What shall we sell safely, now...?
> MAXTOR?
I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu, Western Digital, and DEC drives all fail on me before.
It's disappointing that Maxtor are reducing their warranty from 3 years to 1 year, but on the other hand, I've never needed it at all.
John.
On Sun, 8 Sep 2002 [email protected] wrote:
> I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu, Western Digital, and DEC drives all fail on me before.
>
> It's dissapointing that Maxtor are reducing their warranty from 3 years to 1 year, but on the other hand, I've never needed it at all.
The problem is that you will eventually lose data. No matter what the
brand is. Some disks tend to work better for longer time. Sometimes you
are just out of luck. With some brands, luck seems to be running out
(Quantum). Other brands may work better, but they will eventually fail.
I've had failed Seagate, Maxtor, IBM, Fujitsu, Western Digital, Quantum,
Conner.
The only brand which never failed that I use is Samsung (probably due to
the fact that I only have 2 of them compared to the other brands).
I do expect them to fail and I have backups of the most important stuff I
need.
The best I found for reliability (except for backups) is having a
software raid 5 on many disks of about same capacity (but different
brand/model).
Nuitari wrote:
> On Sun, 8 Sep 2002 [email protected] wrote:
>
>>I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu, Western Digital, and DEC drives all fail on me before.
>>
>>It's dissapointing that Maxtor are reducing their warranty from 3 years to 1 year, but on the other hand, I've never needed it at all.
>
>
> The problem is that you will eventually lose data. No matter what the
> brand is. Some disks tend to work better for longer time. Sometimes you
> are just out of luck. With some brands, luck seems to be running out
> (Quantum). Other brands may work better, but they will eventually fail.
>
> I've had failed Seagate, Maxtor, IBM, Fujitsu, Western Digital, Quantum,
> Conner.
>
> The only brand which never failed that I use is Samsung (probably due to
> the fact that I only have 2 of them compared to the other brands).
>
> I do expect them to fail and I have backups of the most important stuff I
> need.
>
> The best I found for reliability (except for backups) is haveing a
> software raid 5 on many disks of about same capacity (but different
> brand/model).
>
Well, since most hard drives have a 5-10 year lifespan when operating below 30C,
you can see why many people's drives are dying or showing errors much
sooner. People are running their drives in the mid 30s or higher due
to poor air circulation or just generally high ambient temperatures in
the case. This cuts the time to first error significantly. There was a
chart on some hard drive website that showed how long you can expect
your data to be safe for different temperature ranges, and it was something
like 30-33C = 3 years, ~35C getting close to 1 year, and so on.
So yes, it's not surprising at all that brands that used to have a good
reputation no longer do: heat affects them all, and heat has become the
dominating problem in today's drives rather than simple quality of parts.
Keep your drive cool and you can expect to keep it around for a very
long time.
> Keep your drive cool and you can expect to keep it around for a very
> long time.
In a large tower case, it is worth while to leave a drive bay free between disks, instead of using them sequentially, like this:
------ ------
|****| |****|
------ ------
| | |****|
------ ------
|****| |****|
------ instead of ------
| | | |
------ ------
|****| | |
------ ------
| | | |
------ ------
Also, if you get a disk that suddenly doesn't spin up, don't assume that the motor has died - you can sometimes bring them back to life by connecting power to them, and giving them a very sharp angular jolt in the plane of the platters - the effect is called static friction, (A.K.A. stiction)
John.
On Sun, 8 Sep 2002 [email protected] wrote:
> > Keep your drive cool and you can expect to keep it around for a very
> > long time.
>
> In a large tower case, it is worth while to leave a drive bay free between disks, instead of using them sequentially, like this:
>
> ------ ------
> |****| |****|
> ------ ------
> | | |****|
> ------ ------
> |****| |****|
> ------ instead of ------
> | | | |
> ------ ------
> |****| | |
> ------ ------
> | | | |
> ------ ------
>
> Also, if you get a disk that suddenly doesn't spin up, don't assume that
> the motor has died - you can sometimes bring them back to life by
> connecting power to them, and giving them a very sharp angular jolt in
> the plane of the platters - the effect is called static friction,
> (A.K.A. stiction)
Depends on the ventilation you can put in.
In this computer I have all bays (5 1/4 and 3 1/2) full and the
temperature doesn't get over 30
There are a total of 4 HDs and 3 CD drives.
I think the spacing depends on the heat output of the drives.
I use my 2 Samsungs (they are always cold) between the Maxtor (makes a
lot of heat) and Fujitsu.
Of course the best is to strap 6" fans in front of them (which I did to
cool a stack of 6 full height disks that I put in an old full tower
case).
Just for the sake of the argument...
[email protected] writes:
>> BTW
>> I had a double disk crash (same symptoms as in this thread) in a
>> school's RAID5 with four Fujitsu MPG3204AT-EF (the ones with
>> gel-lager, silent and reliable we hoped) last week... The shop for
>> which I work from time to time got 71 disks of this type back (sold
>> over the last 1.5 years). We switched to them after the "IBM"
>> disaster. Maybe a "misdecision" ;-) What shall we sell safely,
>> now...? MAXTOR?
>
> I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> Western Digital, and DEC drives all fail on me before.
>
> It's dissapointing that Maxtor are reducing their warranty from 3
> years to 1 year, but on the other hand, I've never needed it at all.
And with good reason, it seems (the warranty reduction) - my Maxtor
6L060J3 (or whatever, the 7200rpm 60G ATA-100) died after approx. 8
weeks (bad sectors); warranty replacement; replacement dies after
approx. 16 weeks (bad sectors); I'm now on the 2nd replacement. Oh
joy.
I have to say that I have a few more (~ 5) Maxtor drives running which
didn't cause any trouble... so far.
Yes, I did switch to Maxtor because of excessive outages of IBM drives
(DeathStar), why do you ask?
So long,
Joe
--
"I use emacs, which might be thought of as a thermonuclear
word processor."
-- Neal Stephenson, "In the beginning... was the command line"
Hi,
On Sun, 8 Sep 2002 [email protected] wrote:
> I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> Western Digital, and DEC drives all fail on me before.
I can't confirm that. Yes, IBM failed, Fujitsu is often IBM, DEC isn't any
better either. But Western... I still have some quite old Western
drives, several years old, a lot longer than they were guaranteed for. They still
run our old database, and are used in some workstations. Some of them
have hit the floor more than once, and are still running like mad. No
end in sight. Then there are these ST-157A drives, stable as rocks, still
running here. You can even crash them at up to 60 G, if not more!!! That
is, they can stand being dropped very well.
Meanwhile, there were two to three broken Maxtor disks. Their spindles
broke after two years, so the data was physically moved upwards. We've
returned them and got another disk in return, no problem.
Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-
On Mon, 9 Sep 2002, Thunder from the hill wrote:
> Hi,
>
> On Sun, 8 Sep 2002 [email protected] wrote:
> > I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> > Western Digital, and DEC drives all fail on me before.
>
> I can't confirm that. Yes, IBM failed, Fujitsu is often IBM, and DEC isn't
> any better either. But Western... I still have some quite old Western
> drives, several years old, well past what they were guaranteed for. They still
WDC AC21600H.
Best damn drive ever made by any company.
I've got maybe 40 of these left in the systems here. They're coming up on
7-8 years old.
Sure, they're dog slow. Sure, they're pretty small (1.6 gig).
But they're rock stable and solid. I use them for boot drives for old
servers, and for the old Windows PCs.
Mike
If I knew how to check that, I would. We wanted to do the same for the
ancient WD, but there was no way to tell.
[email protected] wrote (ao):
> I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> Western Digital, and DEC drives all fail on me before.
>
> It's disappointing that Maxtor are reducing their warranty from 3
> years to 1 year, but on the other hand, I've never needed it at all.
FWIW:
On http://www.maxtor.com/products/enterprise_apps/default.htm
they say 3 years limited warranty.
On Tue, 10 Sep 2002, Ookhoi wrote:
> [email protected] wrote (ao):
> > I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> > Western Digital, and DEC drives all fail on me before.
> >
> > It's disappointing that Maxtor are reducing their warranty from 3
> > years to 1 year, but on the other hand, I've never needed it at all.
>
> FWIW:
> On http://www.maxtor.com/products/enterprise_apps/default.htm
> they say 3 years limited warranty.
That's only their MaxLine II drives. Their regular DiamondMax drives and
the like still go to one year starting in October. At least their SCSI
drives haven't been killed off yet.
OffTopic:
I wonder just who in upper management at Maxtor decided to help the
company commit suicide.
I've already ordered a few Seagate drives to test out here at our
offices, to replace my previous choice of Maxtor D740X's. I'll still be
looking at the MaxLine II's for backup servers because of the 3 year
warranty, but for desktops, I can't risk our data to drives that even the
manufacturer doesn't trust. The performance drop becomes secondary at
that point.
I know of a few local shops that will no longer carry Maxtor drives
because the warranty costs would kill their profit margin. They cannot
offer a 3 year warranty on the computer when the drive is only covered
for a year.
Mike Dresser,
Systems Administrator
Windsor Machine & Stamping
>
> On Tue, 10 Sep 2002, Ookhoi wrote:
>
> > [email protected] wrote (ao):
> > > I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> > > Western Digital, and DEC drives all fail on me before.
> > >
> > > It's disappointing that Maxtor are reducing their warranty from 3
> > > years to 1 year, but on the other hand, I've never needed it at all.
> >
> > FWIW:
> > On http://www.maxtor.com/products/enterprise_apps/default.htm
> > they say 3 years limited warranty.
>
> That's only their MaxLine II drives. Their regular DiamondMax drives and
> the like still go to one year starting in October. At least their SCSI
> drives haven't been killed off yet.
>
> OffTopic:
>
> I wonder just who in upper management at Maxtor decided to help the
> company commit suicide.
>
> I've already ordered a few Seagate drives to test out here at our
> offices, to replace my previous choice of Maxtor D740X's. I'll still be
> looking at the MaxLine II's for backup servers because of the 3 year
> warranty, but for desktops, I can't risk our data to drives that even the
> manufacturer doesn't trust. The performance drop becomes secondary at
> that point.
>
> I know of a few local shops that will no longer carry Maxtor drives
> because the warranty costs would kill their profit margin. They cannot
> offer a 3 year warranty on the computer when the drive is only covered
> for a year.
>
> Mike Dresser,
>
> Systems Administrator
> Windsor Machine & Stamping
>
According to this announcement:
http://www.shareholder.com/maxtor/news/20020909-89588.cfm
some of their new ATA drives will carry a three-year warranty.
John.
On Tue, 10 Sep 2002 [email protected] wrote:
> According to this announcement:
>
> http://www.shareholder.com/maxtor/news/20020909-89588.cfm
>
> some of their new ATA drives will carry a three-year warranty.
>
> John.
Right. Only the MaxLine II. The rest, not including SCSI, are 1 year.
It looks to me that Maxtor is exiting the consumer market. They sell
their crippled DiamondMax 9/16's, and the existing product lines, to the
OEM's who don't care about warranty as much.
They continue to sell to NAS/SAN manufacturers (even though they got out of
the business themselves), and to people building large servers, with their
MaxLine II.
Mike
On Tue, Sep 10, 2002 at 11:21:24AM -0400, Mike Dresser wrote:
> On Tue, 10 Sep 2002 [email protected] wrote:
>
> > According to this announcement:
> >
> > http://www.shareholder.com/maxtor/news/20020909-89588.cfm
> >
> > some of their new ATA drives will carry a three-year warranty.
> >
> > John.
>
> Right. Only the MaxLine II. The rest, not including SCSI, are 1 year.
>
> It looks to me that Maxtor is exiting the consumer market. They sell
> their crippled DiamondMax 9/16's, and the existing product lines, to the
> OEM's who don't care about warranty as much.
Well, can you blame them? Drive prices are coming down faster than processor
prices and it costs a lot more to produce a drive than a processor (production
costs, not development costs). Drives have parts. The head assembly isn't
free. It's unbelievable that we can get drives for $1/GB, at least it is
to me. And if any of us think we're getting reliable drives at this price,
a visit from the tooth fairy can't be far behind.
What we do here is mark on each drive the date we put it into production,
then cycle the drive out of production use after 24 months. We have lots of
build machines so the "old" drives go into those. We also put in 4 drives
for any data we care about (on a 3ware Escalade in JBOD mode) and then
mirror the data nightly to /nightly, /weekly, or /monthly. If I'm really
being paranoid, I mix manufacturers and release dates in the set of 4 drives
so I drop the likelihood of them all failing at once.
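For anyone who wants to automate the "24 months and out" part, here is a
rough sketch in Python. It is only an illustration of the rotation rule
described above: the function names, the drive labels and the idea of
keeping the in-service dates in a dict are all made up for the example,
not how any particular shop actually tracks its drives.

```python
import calendar
from datetime import date
from typing import Dict, List

# From the description above: drives leave production use after 24 months.
RETIRE_AFTER_MONTHS = 24


def retirement_date(in_service: date, months: int = RETIRE_AFTER_MONTHS) -> date:
    """Return the date on which a drive should leave production use."""
    total = in_service.month - 1 + months
    year = in_service.year + total // 12
    month = total % 12 + 1
    # Clamp the day so that e.g. Feb 29 lands on Feb 28 in a non-leap year.
    day = min(in_service.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def due_for_rotation(drives: Dict[str, date], today: date) -> List[str]:
    """Return the labels of drives whose retirement date has been reached."""
    return sorted(
        label
        for label, started in drives.items()
        if retirement_date(started) <= today
    )


if __name__ == "__main__":
    # Hypothetical inventory: label -> date written on the drive.
    inventory = {"hda": date(2000, 1, 15), "hdc": date(2002, 6, 1)}
    for label in due_for_rotation(inventory, date.today()):
        print(label, "is due to move to a build machine")
```

Run from cron, something like this would only nag you; moving the drive
and keeping the mirrors going is still up to you.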
Don't get me wrong, there is no love lost between BitMover and Maxtor,
they aren't a customer and we've had our own problems dealing with them
in the past. However, it seems unfair to get too unhappy with a product
that works as well as it does for the price that you pay. I'd hate to
be in the drive business, it looks like a losing proposition to me.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
MaxLine II is Serial ATA coming in at 250GB per disk.
On Tue, 10 Sep 2002, Ookhoi wrote:
> [email protected] wrote (ao):
> > I have *never* lost data to a Maxtor disk. I have had IBM, Fujitsu,
> > Western Digital, and DEC drives all fail on me before.
> >
> > It's disappointing that Maxtor are reducing their warranty from 3
> > years to 1 year, but on the other hand, I've never needed it at all.
>
> FWIW:
> On http://www.maxtor.com/products/enterprise_apps/default.htm
> they say 3 years limited warranty.
Andre Hedrick
LAD Storage Consulting Group