OK, I have heard that other people have been having this problem for a while
now, but I haven't been able to find much about what causes it. I have a
Western Digital hard drive in my computer (60GB, 5400 RPM). I can use it just
fine with no DMA, but it runs much faster with DMA. However, when I use DMA,
all my data is slowly corrupted, and I begin having to re-install packages
all the time. After about a month, my system deteriorates to the point where
I have to reinstall Slackware. I have no idea why this is happening, but I
know some people who have had the same experience under Linux with Western
Digital hard drives, but not with other brands. I am assuming this is a
problem with Western Digital's implementation of DMA, but shouldn't it do
something to prevent errors?
Thanks.
--
Adam Jaskiewicz
[email protected]
http://middlearth.d2g.com:31415
talk: [email protected]
--
Never tell a lie unless it is absolutely convenient.
> OK, I have heard that other people have been having this problem for a while
> now, but I haven't been able to find much about what causes it. I have a
> Western Digital hard drive in my computer (60GB, 5400 RPM). I can use it just
> fine with no DMA, but it runs much faster with DMA. However, when I use DMA,
> all my data is slowly corrupted, and I begin having to re-install packages
> [snip]
What is the chipset of the interface it's on?
John.
> first, what controller is it plugged into, and which kernel are you
> running, and what are the ide-related boot messages?
Well, at the moment it's 2.4.17, but I've had the problem ever since 2.4.5, which
was the first kernel installed on this machine. The chipset is Intel 440BX.
These are the IDE boot messages:
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev 39
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x1420-0x1427, BIOS settings: hda:pio, hdb:DMA
ide1: BM-DMA at 0x1428-0x142f, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD600AB-00BVA0, ATA DISK drive
hdb: WDC AC313600D, ATA DISK drive
hdc: TOSHIBA DVD-ROM SD-M1212, ATAPI CD/DVD-ROM drive
hdd: PCRW804, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 117231408 sectors (60022 MB) w/2048KiB Cache, CHS=7297/255/63, UDMA(33)
hdb: 26712000 sectors (13677 MB) w/1966KiB Cache, CHS=1662/255/63, UDMA(33)
hdc: ATAPI 32X DVD-ROM drive, 256kB Cache, UDMA(33)
hdd is running through ide-scsi, as it is a CD-RW. hda and hdb both have DMA
turned off later in the boot process by hdparm. Could it be that I wasn't
using those 80-conductor cables, and was getting crosstalk? I guess I could
buy some to test that theory out...
--
Adam Jaskiewicz
[email protected]
http://middlearth.d2g.com:31415
talk: [email protected]
--
... But we've only fondled the surface of that subject.
-- Virginia Masters
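The drive lines at the end of the boot messages above report what the kernel negotiated at boot (here UDMA(33) on hda, hdb and hdc), not what hdparm changes later in the boot process. A minimal Python sketch for pulling that out of a saved boot log; the regular expression is an assumption fitted to the 2.4-era lines quoted in this message, and `udma_speeds` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Assumed line shape, taken from the 2.4-era boot messages above, e.g.
#   hda: 117231408 sectors (60022 MB) w/2048KiB Cache, CHS=7297/255/63, UDMA(33)
DRIVE_UDMA = re.compile(r"^(hd[a-z]):.*\bUDMA\((\d+)\)\s*$")


def udma_speeds(boot_log: str) -> dict[str, int]:
    """Map each IDE device name to the UDMA rating (33, 66, ...) reported at boot."""
    speeds = {}
    for line in boot_log.splitlines():
        match = DRIVE_UDMA.match(line.strip())
        if match:
            speeds[match.group(1)] = int(match.group(2))
    return speeds


if __name__ == "__main__":
    log = """hda: WDC WD600AB-00BVA0, ATA DISK drive
hda: 117231408 sectors (60022 MB) w/2048KiB Cache, CHS=7297/255/63, UDMA(33)
hdc: ATAPI 32X DVD-ROM drive, 256kB Cache, UDMA(33)"""
    print(udma_speeds(log))  # {'hda': 33, 'hdc': 33}
```

Lines without a UDMA(...) suffix, such as the model-name lines, are simply skipped.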
> turned off later in the boot process by hdparm. Could it be that I wasn't
> using those 80-conductor cables, and was getting crosstalk? I guess I could
> buy some to test that theory out...
Have a look at the number of UDMA CRC errors reported by
smartctl -a /dev/hda.
John
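John's check can be scripted so it is easy to repeat over time: a UDMA CRC count that keeps climbing usually points at the path between controller and drive (cable, connectors) rather than at the platters. A minimal Python sketch; it assumes the attribute-table layout of current smartmontools (2002-era smartctl output may be laid out differently) and that the drive exposes SMART attribute 199, and `udma_crc_errors` is a hypothetical helper name.

```python
from __future__ import annotations


def udma_crc_errors(smartctl_output: str) -> int | None:
    """Return the raw value of SMART attribute 199 from `smartctl -a` text.

    Assumes the table columns of current smartmontools:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute 199 is conventionally UDMA_CRC_Error_Count; the raw
        # counter is the tenth column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None
```

Returning None rather than 0 keeps "drive does not report this attribute" distinct from "no errors seen".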
> hdd is running through ide-scsi, as it is a CD-RW. hda and hdb both have DMA
> turned off later in the boot process by hdparm. Could it be that I wasn't
> using those 80-conductor cables, and was getting crosstalk? I guess I could
> buy some to test that theory out...
if you have noisy cables and someone turns off udma,
yes, you could certainly see corruption. if you can
possibly ever use udma, it's a very good idea to do so;
only with it are transfers checksummed. 80-conductor
cables are always advantageous as well, though only
required above udma33. (remember that valid IDE cables
are always <= 18" long, with no stubs...)
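The point about checksums is that each Ultra DMA burst carries a CRC the receiver checks, so a bit flipped on a noisy cable is reported as an error (and typically retried) instead of being silently accepted; PIO and multiword DMA have no such check. A toy Python sketch of that idea: it uses the generator x^16 + x^12 + x^5 + 1 and seed 0x4ABA that the ATA standard specifies for Ultra DMA, but the bit ordering is a simplified MSB-first assumption, so the values will not match what a real host or drive computes. `crc16_words` is a hypothetical name.

```python
from __future__ import annotations

# Generator polynomial x^16 + x^12 + x^5 + 1 (0x1021) and seed 0x4ABA.
# Bit ordering is a plain MSB-first simplification, not the exact ATA layout.
POLY = 0x1021
SEED = 0x4ABA


def crc16_words(words: list[int], seed: int = SEED) -> int:
    """Bitwise CRC-16 over a burst of 16-bit data words."""
    crc = seed
    for word in words:
        crc ^= word & 0xFFFF
        for _ in range(16):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    burst = [0x1234, 0xABCD, 0x0000, 0xFFFF]
    damaged = list(burst)
    damaged[2] ^= 0x0040  # one bit flipped "on the cable"
    # The two CRCs differ, so the receiver would flag the damaged burst.
    print(hex(crc16_words(burst)), hex(crc16_words(damaged)))
```

Because the generator has a nonzero constant term, any single flipped bit in a burst changes the CRC, which is exactly the class of error crosstalk tends to produce.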
On Sun, Sep 08, 2002 at 04:46:37PM -0400, Adam Jaskiewicz wrote:
> OK, I have heard that other people have been having this problem for a while
> now, but I haven't been able to find much about what causes it. I have a
> Western Digital hard drive in my computer (60GB, 5400 RPM). I can use it just
[snip]
What brand of IDE controller does your computer have? WD drives often
don't get along with VIA IDE controllers. (I think very recent WD drives
might have fixed this, but I'm not sure.)
Also, I'm pretty sure that a 60GB WD drive is too new to be affected by
the DMA problems that their older drives had. I'd look at other factors
like the cables (they are 80-conductor, and 18" or shorter, right?), the
IDE controller (bad controllers can do this), or the power supply (that
was the cause for the case of data corruption that I most recently
investigated).
-Barry K. Nathan <[email protected]>
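The cable rules mentioned in this thread (80-conductor needed only above UDMA/33, and no more than 18 inches) can be written down as a small checklist. A minimal Python sketch; `cable_problems` is a hypothetical helper, the mode-to-rate table is the nominal Ultra DMA peak rates, and stubs or routing near noise sources are not modeled.

```python
from __future__ import annotations

# Nominal peak rates (MB/s) of Ultra DMA modes 0-6.
UDMA_MB_PER_S = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.3}
MAX_CABLE_INCHES = 18  # the limit quoted in this thread


def cable_problems(udma_mode: int, conductors: int, length_in: float) -> list[str]:
    """List reasons a ribbon cable is out of spec for the given UDMA mode."""
    if udma_mode not in UDMA_MB_PER_S:
        raise ValueError(f"unknown UDMA mode {udma_mode}")
    problems = []
    # Modes above 2 (UDMA/33) require an 80-conductor cable.
    if udma_mode > 2 and conductors != 80:
        problems.append(
            f"UDMA mode {udma_mode} ({UDMA_MB_PER_S[udma_mode]} MB/s) "
            "needs an 80-conductor cable"
        )
    if length_in > MAX_CABLE_INCHES:
        problems.append(
            f"cable is {length_in} in long; the limit is {MAX_CABLE_INCHES} in"
        )
    return problems
```

For a drive running UDMA(33), i.e. mode 2, a short 40-conductor cable passes this check, so by this rule alone the cable type would not be out of spec.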
> if you have noisy cables and someone turns off udma,
> yes, you could certainly see corruption. if you can
> possibly ever use udma, it's a very good idea to do so;
How do I enable UDMA as opposed to just DMA? I was having trouble with DMA,
but no trouble (other than VERY slow access) without DMA. I have an 80-conductor
cable SOMEWHERE (probably in the bottom of a box in the basement
lol), but I'm almost certain the cables in there now aren't more than 18 inches,
as it's a fairly stock Dell system and has the original ribbon cables.
--
Adam Jaskiewicz
[email protected]
http://middlearth.d2g.com:31415
talk: [email protected]
--
"I'd love to go out with you, but there are important world issues that
need worrying about."
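With hdparm, DMA is switched on with -d1 and a transfer mode is requested with -X, whose numeric argument is 8 + PIO mode, 32 + multiword DMA mode or 64 + Ultra DMA mode according to the hdparm manual page. A minimal Python sketch that only builds the command line rather than running it, since forcing a mode the drive, controller or cable cannot handle can hang the bus; `hdparm_command` is a hypothetical helper, and the exact flags should be checked against the manual page of the installed hdparm.

```python
import shlex

# hdparm -X numeric mode bases, per the hdparm manual page.
MODE_BASE = {"pio": 8, "mdma": 32, "udma": 64}


def hdparm_command(device: str, family: str, mode: int) -> list[str]:
    """Build, without running, an hdparm call that enables DMA and sets a mode."""
    return ["hdparm", "-d1", f"-X{MODE_BASE[family] + mode}", device]


if __name__ == "__main__":
    # UDMA mode 2 is UDMA/33, the mode hda negotiated in the boot log above.
    print(shlex.join(hdparm_command("/dev/hda", "udma", 2)))  # hdparm -d1 -X66 /dev/hda
```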
[email protected] said:
> > OK, I have heard that other people have been having this problem for a while
> > now, but I haven't been able to find much about what causes it. I have a
> > Western Digital hard drive in my computer (60GB, 5400 RPM). I can use it just
> > fine with no DMA, but it runs much faster with DMA. However, when I use DMA,
> > all my data is slowly corrupted, and I begin having to re-install packages
> > [snip]
>
> What is the chipset of the interface it's on?
Use DMA for a week or so, and / is mangled beyond recognition (seems to
happen with read-only access too). Chipset is Intel (SR440BX board, PIIX4E
IDE). Have heard of similar problems with DMA on WD drives, but got no
details.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
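Since the damage here appears gradually, and reportedly even with read-only access, one way to measure it rather than discover it through broken packages is to record a checksum manifest of a tree that should not change (for example /usr between package updates) and compare it with a later run. A minimal Python sketch; `manifest` and `changed_files` are hypothetical helpers, and SHA-256 is an arbitrary choice of hash.

```python
from __future__ import annotations

import hashlib
import os


def manifest(root: str) -> dict[str, str]:
    """SHA-256 of every regular file under root, keyed by path relative to root."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Paths present in both manifests whose contents differ."""
    return sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
```

Files reported by changed_files that no package update touched are candidates for silent corruption; running it with DMA on and then off gives a rough comparison.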
On Sun, 2002-09-08 at 23:28, Adam Jaskiewicz wrote:
> > if you have noisy cables and someone turns off udma,
> > yes, you could certainly see corruption. if you can
> > possibly ever use udma, it's a very good idea to do so;
>
> How do I enable UDMA as opposed to just DMA? I was having trouble with DMA,
> but no trouble (other than VERY slow access) without DMA. I have an 80-conductor
> cable SOMEWHERE (probably in the bottom of a box in the basement
> lol), but I'm almost certain the cables in there now aren't more than 18 inches,
> as it's a fairly stock Dell system and has the original ribbon cables.
UDMA is already on (the boot messages show UDMA(33)), so that doesn't explain what is happening at all.
On Mon, 2002-09-09 at 03:19, Horst von Brand wrote:
> Use DMA for a week or so, and / is mangled beyond recognition (seems to
> happen with read-only access too). Chipset is Intel (SR440BX board, PIIX4E
> IDE). Have heard of similar problems with DMA on WD drives, but got no
> details.
Old, old (we are talking the 340MB era here) WD drives had some DMA problems in a
few cases. We know about it and blacklist such drives. I'm aware of a
few "UDMA doesn't work" type incompatibilities with WD drives, but not
with PIIX, and those always showed up as UDMA CRC errors.
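Because the WD incompatibilities described above always surfaced as UDMA CRC errors, scanning the kernel log for them is a quick way to see whether a given case fits that pattern. A minimal Python sketch; the message wording in the pattern is an assumption based on IDE driver error lines of that era, so adjust it to what your logs actually contain, and `badcrc_counts` is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed 2.4-era IDE error line, e.g. in /var/log/messages:
#   kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
BADCRC = re.compile(r"\b(hd[a-z]): .*\bBadCRC\b")


def badcrc_counts(log_text: str) -> Counter:
    """Count kernel log lines reporting an Ultra DMA CRC error, per device."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = BADCRC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

An empty result while corruption continues would suggest the problem is something other than the CRC-error pattern described above.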
On Mon, Sep 09, 2002 at 10:08:14PM +0100, Alan Cox wrote:
> Old, old (we are talking the 340MB era here) WD drives had some DMA problems in a
> few cases. We know about it and blacklist such drives. I'm aware of a
Shouldn't these drives be _way_ past their MTBF by now and into the "wow,
it's still working" stage? Not that it's an argument against
blacklisting... (far from it!)
j.
--
toyota power: http://indigoid.net/
On Sun, 8 Sep 2002, Adam Jaskiewicz wrote:
> OK, I have heard that other people have been having this problem for a while
> now, but I haven't been able to find much about what causes it. I have a
> Western Digital hard drive in my computer (60GB, 5400 RPM). I can use it just
> fine with no DMA, but it runs much faster with DMA. However, when I use DMA,
> all my data is slowly corrupted, and I begin having to re-install packages
> [snip]
This _could_ be your power supply.
I had problems with two IBM 22GXP drives attached to a Tyan dual slot1 board
based on the BX chipset. The system seemed fine, except that the drives would
occasionally spin down and spin up, and the /var/log/messages output would show
DMA "drive not ready" errors.
I can't remember why I finally did it, but after replacing the power supply,
the system operated fine. The original was 3 years old, a generic unit to begin
with. (I've become a big fan of Antec power supplies and cases now that I've
upgraded to Athlon class CPUs.)
Andre's drivers (which your output shows you are using on 2.4.17) have been
very good on this type of equipment, IMHO. But software can't overcome
failing or poor-quality hardware, and the age of the BX chipset suggests this
is probably an older computer as well.
This is just a guess, but something you can try and investigate on your own
right now.
-- ----------------------------------------------------------------------------
Maxwell Spangler
Program Writer
Greenbelt, Maryland, U.S.A.
Washington D.C. Metropolitan Area
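The power-supply theory can be partly checked in software if the motherboard exposes voltage sensors: compare each rail against its nominal value. A minimal Python sketch; the ±5% tolerance on the +3.3 V, +5 V and +12 V rails follows the ATX power supply guidelines, the readings below are hypothetical (not measurements from this thread), and a steady reading cannot show ripple or the brief sag when drives spin up, so passing this check does not clear the supply. `out_of_tolerance` is a hypothetical helper.

```python
from __future__ import annotations

# Nominal voltage and allowed fractional deviation per rail (ATX: +/-5 %).
RAILS = {"+3.3V": (3.3, 0.05), "+5V": (5.0, 0.05), "+12V": (12.0, 0.05)}


def out_of_tolerance(readings: dict[str, float]) -> dict[str, float]:
    """Return each rail outside tolerance, with its deviation in percent."""
    bad = {}
    for rail, volts in readings.items():
        nominal, tolerance = RAILS[rail]
        deviation = (volts - nominal) / nominal
        if abs(deviation) > tolerance:
            bad[rail] = round(deviation * 100, 1)
    return bad


if __name__ == "__main__":
    # Hypothetical sensor readings.
    print(out_of_tolerance({"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.10}))  # {'+5V': -6.0}
```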
Maxwell Spangler <[email protected]> said:
[...]
> I can't remember why I finally did it, but after replacing the power
> supply, the system operated fine. The original was 3 years old, a
> generic unit to begin with. (I've become a big fan of Antec power supplies
> and cases now that I've upgraded to Athlon class CPUs.)
This is the second power supply here (the original died on me), and I get
corruption with both the old and the new one. I very much doubt both are broken
the same way. Problems started with 2.4 or thereabouts.
WDC WD135AA, 15GiB; PIIX4E (Intel SR440BX mobo, updated BIOS)
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513