Please forgive me if I've sent this to the wrong address, it is the one
listed in the 2.4.21 sources in the MAINTAINERS file as the IDE
maintainer.
Okay, I've spent a LONG time (>48 hrs) trying to fix this problem on my own, and I
have no idea what is causing this totally weird behavior.
I just recently replaced a broken system with a new gigabyte GA-7VAXP
motherboard and an athlon XP 2600 cpu. The Gigabyte motherboard has the
Via VT82C586/B/686A/B (according to lspci -v) chipset on ide0, which is
being used for the WDC WD102AA hard disk (according to
/proc/ide/ide0/hda/model)
I have disabled everything I possibly could in the bios without making
it impossible to boot the system. I have tried using the original cables
from the old machine, as well as the new cables that came with the
motherboard. I have tried the hard disk in both master and cable select modes.
I have both enabled and disabled ACPI, to see if that would make it
work. I have tried moving the hard disk to a different ide channel. I
have removed all other hard disks from the system. (All are experiencing
the problems, not just this one) I have asked everyone I know that knows
anything about computers what could be wrong - most of their replies were
variants of the above.
The problem I'm seeing is, even with literally every single setting
disabled in hdparm, the system is VERY VERY SLOW, and I'm often seeing
'hda: lost interrupt' in console when I try to read/write a large amount
of data.
It's so bad I'm actually having to compile my kernels on a separate
pentium 1 133 because it's compiling them *faster* than my computer can.
I am currently using the 2.4.21 kernel, although I started trying this
on the 2.4.20 kernel. Both exhibit the same problem unfortunately.
I am familiar with patching kernels, and am able to fix cosmetic to
minor problems in source, so sending me a patch and saying 'try this'
isn't a problem.
I *am* willing to experiment and try using 2.5.whatever but *only* if
the ide maintainer or someone familiar with the ide subsystem tells me
that it's safe to use in a certain configuration. I don't want to lose
the data on my hard disk, it doesn't have a backup. (long story short, I
was about to do a backup on the machine when the motherboard blew up.
Seriously!)
If someone gives me a patch which makes the machine stable and able to
work even if it's *slow* I'll be happy. I don't want the thing to lose
data, and the message the kernel is giving me could be really really bad
IIRC if it's trying to write when it loses the interrupt. :(
I currently am limited to using a keyboard only, and I'm stuck in
console as I am unable to use X windows due to problems I was attempting
to fix before the original system blew up. So... it's hard for me to copy
and paste anything - I have to type it in manually. I would fix X, but
considering doing 'make dep' on a 2.4.21 kernel currently takes longer
on my XP 2600 system than it does to compile the *entire kernel* on a P1
133 (I'm stone cold serious.) ... ... ... No. Until I can get this fixed,
I can't fix X.
I have attached output from lspci -v, /proc/interrupts, my kernel
.config and /proc/ioports in the hopes it is useful to you.
(You'll likely notice I've thrown the kitchen sink at it. *shrugs*)
If you can think of *anything* I can send you that might clear up this
problem, ask for it.
Special note: Please send all followups to this address, as I do not
have a subscription to the linux kernel mailing list. (Although I don't
mind followups that wind up on the list too - I just won't see them :)
Timothy C. McGrath
> Please forgive me if I've sent this to the wrong address, it is the one
> listed in the 2.4.21 sources in the MAINTAINERS file as the IDE
> maintainer.
> Okay, I've spent a LONG time (>48 hrs) trying to fix this problem on my own, and I
> have no idea what is causing this totally weird behavior.
> I just recently replaced a broken system with a new gigabyte GA-7VAXP
> motherboard and an athlon XP 2600 cpu. The Gigabyte motherboard has the
> Via VT82C586/B/686A/B (according to lspci -v) chipset on ide0, which is
> being used for the WDC WD102AA hard disk (according to
> /proc/ide/ide0/hda/model)
I have two Gigabyte GA-7VA based machines here with Athlon XP 2200+
cpus. This board also uses the VT82C586B chipset for IDE.
They work fine here, except for mis-detection of a 40-way cable as
80-way, (and the devices on the 40-way cable only support a maximum of
33.3MB/s anyway, so it's not a big problem at the moment).
> I have disabled everything I possibly could in the bios without making
> it impossible to boot the system. I have tried using the original cables
> from the old machine, as well as the new cables that came with the
> motherboard. I have tried the hard disk in both master and cable select modes.
> I have both enabled and disabled ACPI, to see if that would make it
> work. I have tried moving the hard disk to a different ide channel. I
> have removed all other hard disks from the system. (All are experiencing
> the problems, not just this one) I have asked everyone I know that knows
> anything about computers what could be wrong - most of their replies were
> variants of the above.
The default BIOS settings worked fine for me. I notice you've got the
IO-APIC enabled - I've left it disabled, basically because I don't
need the functionality yet, and it avoids any bugs in the kernel
IO-APIC code.
> The problem I'm seeing is, even with literally every single setting
> disabled in hdparm, the system is VERY VERY SLOW, and I'm often seeing
> 'hda: lost interrupt' in console when I try to read/write a large amount
> of data.
Hmmm, try disabling IO-APIC.
> It's so bad I'm actually having to compile my kernels on a separate
> pentium 1 133 because it's compiling them *faster* than my computer can.
Heh, must take about an hour :-). My rarely-used MMX-200 compiles
recent 2.4 trees in about 30-40 minutes.
> I am currently using the 2.4.21 kernel, although I started trying this
> on the 2.4.20 kernel. Both exhibit the same problem unfortunately.
I've used 2.4.21-RC1 and 2.4.21-RC2 on these boxes without problems,
but I've been too busy to try more recent trees. I did boot a recent
2.5 tree on one of them, and it booted successfully, but I didn't do
much testing, (due to lack of time). I installed KDE, and noticed
that it was much slower to start under 2.5, but I've not investigated
that.
> I am familiar with patching kernels, and am able to fix cosmetic to
> minor problems in source, so sending me a patch and saying 'try this'
> isn't a problem.
> I *am* willing to experiment and try using 2.5.whatever but *only* if
> the ide maintainer or someone familiar with the ide subsystem tells me
> that it's safe to use in a certain configuration. I don't want to lose
> the data on my hard disk, it doesn't have a backup. (long story short, I
> was about to do a backup on the machine when the motherboard blew up.
> Seriously!)
If you don't enable exotic options like IO-APIC and ACPI, I seriously
doubt you'll get massive file corruption. The only reasons I'm not
using 2.5 as a default on most of my production boxes, are time and
security fixes not having gone in yet.
> If someone gives me a patch which makes the machine stable and able to
> work even if it's *slow* I'll be happy. I don't want the thing to lose
> data, and the message the kernel is giving me could be really really bad
> IIRC if it's trying to write when it loses the interrupt. :(
Try booting a 2.5 kernel, and mount the root filesystem read-only if
you're really worried about corruption.
> I currently am limited to using a keyboard only, and I'm stuck in
> console as I am unable to use X windows due to problems I was attempting
> to fix before the original system blew up. So... it's hard for me to copy
> and paste anything - I have to type it in manually. I would fix X, but
> considering doing 'make dep' on a 2.4.21 kernel currently takes longer
> on my XP 2600 system than it does to compile the *entire kernel* on a P1
> 133 (I'm stone cold serious.) ... ... ... No. Until I can get this fixed,
> I can't fix X.
> I have attached output from lspci -v, /proc/interrupts, my kernel
> .config and /proc/ioports in the hopes it is useful to you.
> (You'll likely notice I've thrown the kitchen sink at it. *shrugs*)
> If you can think of *anything* I can send you that might clear up this
> problem, ask for it.
Compile a minimal kernel without things like IO-APIC and ACPI
enabled.
Oh, I don't use modules by the way, so any issues with things being
compiled as modules won't be apparent to me :-). So, you might want
to try compiling everything in.
John.
Hi,
have you already looked at the irq's from the lspci output? Especially
the usage from 16 to 19, for graphics-, sound- and network-card looks a
bit wrong.
From your kernel-configuration I can see that you have acpi enabled, the
first I would do is booting with acpi=off as kernel option. If you
really need acpi later on, you should try with pci=noacpi or pci=biosirq. Also
consider to use a more recent acpi from acpi.sf.net.
If this doesn't help, I would suggest that you also send some lines from
dmesg-output, so that the real experts can see what's going on with your
irq-routing.
Best regards,
Bernd
On Thu, Jun 19, 2003 at 02:29:40PM +0200, Bernd Schubert wrote:
> have you already looked at the irq's from the lspci output? Especially the
> usage from 16 to 19, for graphics-, sound- and network-card looks a bit
> wrong.
No, I hadn't noticed that. hmm. However I have both tried with and
without ACPI and APIC or whatever it is (One does stuff like APM and the
other one does the irq routing among other things IIRC.) People have
told me to both turn it on and off. Anyway, later on today I'm going to
try using a minimal kernel built with no options I absolutely don't need
to have to boot this thing. Might work.
dmesg attached.
Timothy C. McGrath
I believe I have nailed the problem to the wall. Your talk about the
bios misdetecting the cable got me to thinking - I hadn't actually been
able to see what the bios said it was configuring the disks attached to
since lilo's menu came up microseconds later.
I still haven't bothered checking, however I believe the bios is on a
very unhealthy volume of crack. :)
using hdparm -i /dev/hda shows the disk wasn't configured to do any pio
mode or udma/dma mode at boot time. Strange, right?
Stranger when you do hdparm -I on the disk again and it shows the disk
is set to use udma4 - and the disk only understands up to udma2! - now
add in the fact I currently have a 40 wire cable connected to the disk
and my brain starts frying :)
At a suggestion of a friend, I set the disk to use mdma2 - via the line:
hdparm -Xmdma2 -d1 /dev/hda
It worked, for all of two seconds. Remember, this is a WD drive. WD
drives, or at least mine, like to screw up in pretty amazing ways when
you turn dma on initially. Mine throws a screenful of CRC errors,
causing the kernel to reset the ide channel. Oddly, I noticed that dma
was still on despite the fact the channel had been reset - so I checked
with -I again, only to find out now the disk was told to use udma*3*! -
this wasn't getting me anywhere. >D Anyway, the simple fix was to force
it to keep settings across a reset by doing:
hdparm -Xmdma2 -k1 -d1 /dev/hda
- I am no longer getting any hda: lost interrupt messages, nor am I
getting any errors at all about the disk losing data or getting
confused. It's running slower than I'm used to, as I used to run it in
ata66 mode, but MUCH faster than it was a day ago. :) All I need to do
now is migrate the information from this disk to one of my maxtors and
I'm all set. Finally, I can start setting this machine up. Note, I
could get this disk to use ata66 again if I switched cables to the 80
wire variant - but I plan on replacing this disk asap anyway.
So, to summarize: The BIOS in the Gigabyte GA-7VAXP motherboard (and
likely all variants using the same bios) is getting confused and
misdetecting both the cable's abilities and the hard disk's abilities,
causing linux to have a very nasty fit when you try using it without
manually changing the settings using hdparm.
I have not tried, nor will I likely try, setting the PIO modes up with
this motherboard as I don't need to. However, it is very likely that the
same problem occurs with dma disabled as with dma enabled - you need to
manually reconfigure the hard disk and disk controller using hdparm to
the correct values, or it just basically gets all confused and whines.
Also note, I tested this setup after configuring with hdparm in three
ways: First, I did a test using hdparm -t -T /dev/hda - Passed. A little
slow, but understandable considering. Second, I did a simple test doing
find / - this almost always caused the thing to throw a hda: lost
interrupt before at some point or another. Passed. Finally, I'm
currently doing a kernel compile. As I said, a P1 133 was outpacing this
machine before. This is a AMD XP 2600+ - it's absolutely ludicrous for a
P1 to outpace this thing, unless some unsane overclocker ... no, I don't
want to encourage anyone. :P Anyway, even with the slow settings, the
kernel compile is going quite nicely, and is going much faster than the
P1 could ever hope to do.
Note that the bios in this motherboard does not support turning OFF dma
support - the only options are 'auto' 'ata33' and 'ata66/100/133' - all
of which don't appear to actually work. For instance, I have the bios
set to ata33 right now as I write this, and despite this, it was still
trying to set the disk up to use udma4!
A buggy bios a happy linux user does not make. :)
Thank you for all your help, time and effort. It was greatly
appreciated.
Tim McGrath
On Fri, Jun 20, 2003 at 03:52:51AM -0400, [email protected] wrote:
> I believe I have nailed the problem to the wall. Your talk about the
> bios misdetecting the cable got me to thinking - I hadn't actually been
> able to see what the bios said it was configuring the disks attached to
> since lilo's menu came up microseconds later.
>
> I still haven't bothered checking, however I believe the bios is on a
> very unhealthy volume of crack. :)
>
> using hdparm -i /dev/hda shows the disk wasn't configured to do any pio
> mode or udma/dma mode at boot time. Strange, right?
Not very strange. Some disks even cannot report it. And, any disk if set
to PIO only may not report any mode as active, since that is only
applicable to DMA modes.
> Stranger when you do hdparm -I on the disk again and it shows the disk
> is set to use udma4 - and the disk only understands up to udma2! - now
If you use hdparm -I and the drive reports udma4, it can understand
udma4, since hdparm -I is straight from the drive's mouth.
> add in the fact I currently have a 40 wire cable connected to the disk
> and my brain starts frying :)
That's a problem I assume. If the drive can do udma3 and higher, the
chipset can do udma3 and higher, and you have a 40-wire cable and some
bad luck, the BIOS or the driver may misdetect the cable and try to
operate the drive at, most likely, udma4. This won't work, of course.
> At a suggestion of a friend, I set the disk to use mdma2 - via the line:
>
> hdparm -Xmdma2 -d1 /dev/hda
Don't do this. If your drive supports udma, then there is no reason to
use mwdma. Ever. mwdma is not crc-protected and that can lead to drive
data corruption, namely in the case where you seem to have cabling
problems.
If you have a 40-wire cable, udma2 will work just fine on it. If you
want a slower speed, use udma1 or udma0, which is the same speed as
mwdma2, but is CRC protected and thanks to the udma signalling also more
robust.
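
For reference, a minimal sketch of how these mode names map onto the
numeric argument of hdparm -X, and of capping the UDMA level for a
40-wire cable. The offsets (PIO n is 8+n, multiword DMA n is 32+n,
UDMA n is 64+n) are the ones documented in the hdparm man page. The
function names xfer_code and cap_udma_for_cable are made up for this
illustration, and /dev/hda is simply the device discussed in this thread.

```shell
#!/bin/sh
# Translate a symbolic transfer mode into the numeric value for "hdparm -X".
# Offsets follow the hdparm man page: PIO n -> 8+n, MWDMA n -> 32+n, UDMA n -> 64+n.
xfer_code() {
    xc_family=$1
    xc_level=$2
    case "$xc_family" in
        pio)  echo $((8 + xc_level)) ;;
        mdma) echo $((32 + xc_level)) ;;
        udma) echo $((64 + xc_level)) ;;
        *)    echo "unknown mode family: $xc_family" >&2; return 1 ;;
    esac
}

# Cap a requested UDMA level by cable type (40 or 80 conductors).
# UDMA3 and above need an 80-conductor cable; UDMA0-2 are fine on 40.
cap_udma_for_cable() {
    cu_wanted=$1
    cu_cable=$2
    if [ "$cu_cable" -eq 40 ] && [ "$cu_wanted" -gt 2 ]; then
        echo 2
    else
        echo "$cu_wanted"
    fi
}

# Example (run as root, and only on a disk you have backed up):
#   level=$(cap_udma_for_cable 4 40)                  # -> 2
#   hdparm -d1 -X "$(xfer_code udma "$level")" /dev/hda
```

With these helpers udma2 comes out as -X66, the fastest mode a 40-wire
cable is specified for.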
> It worked, for all of two seconds. Remember, this is a WD drive. WD
> drives, or at least mine, like to screw up in pretty amazing ways when
> you turn dma on initially. Mine throws a screenful of CRC errors,
> causing the kernel to reset the ide channel.
CRC errors in mwdma mode? Weird. Those CRC errors must've come from the
drive itself - them not being transfer CRC errors but surface CRC
errors. That would mean the drive is dying and you should be getting
them in PIO mode as well.
> Oddly, I noticed that dma
> was still on despite the fact the channel had been reset - so I checked
> with -I again, only to find out now the disk was told to use udma*3*! -
> this wasn't getting me anywhere. >D
Note that the asterisk stays even after the drive is used back in PIO
mode. The way to check if DMA is being used is hdparm /dev/hd*
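
A small sketch of that check, assuming hdparm -d prints a line of the
form "using_dma = 1 (on)" (the exact spacing may differ between hdparm
versions). dma_state is a hypothetical helper name.

```shell
#!/bin/sh
# Read the output of "hdparm -d <device>" on stdin and print 1 if DMA is
# reported as on, 0 if off. Prints nothing if no using_dma line is found.
dma_state() {
    sed -n 's/^[[:space:]]*using_dma[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' | head -n 1
}

# Example:
#   hdparm -d /dev/hda | dma_state
```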
> Anyway, the simple fix was to force
> it to keep settings across a reset by doing:
>
> hdparm -Xmdma2 -k1 -d1 /dev/hda
>
> - I am no longer getting any hda: lost interrupt messages, nor am I
> getting any errors at all about the disk losing data or getting
> confused. It's running slower than I'm used to, as I used to run it in
> ata66 mode, but MUCH faster than it was a day ago. :) All I need to do
> now is migrate the information from this disk to one of my maxtors and
> I'm all set. Finally, I can start setting this machine up. Note, I
> could get this disk to use ata66 again if I switched cables to the 80
> wire variant - but I plan on replacing this disk asap anyway.
>
> So, to summarize: The BIOS in the Gigabyte GA-7VAXP motherboard (and
> likely all variants using the same bios) is getting confused and
> misdetecting both the cable's abilities and the hard disk's abilities,
> causing linux to have a very nasty fit when you try using it without
> manually changing the settings using hdparm.
>
> I have not tried, nor will I likely try, setting the PIO modes up with
> this motherboard as I don't need to. However, it is very likely that
> the same problem occurs with dma disabled as with dma enabled - you
> need to manually reconfigure the hard disk and disk controller using
> hdparm to the correct values, or it just basically gets all confused
> and whines.
>
> Also note, I tested this setup after configuring with hdparm in three
> ways: First, I did a test using hdparm -t -T /dev/hda - Passed. A
> little slow, but understandable considering. Second, I did a simple
> test doing find / - this almost always caused the thing to throw a
> hda: lost interrupt before at some point or another. Passed. Finally,
> I'm currently doing a kernel compile. As I said, a P1 133 was
> outpacing this machine before. This is a AMD XP 2600+ - it's
> absolutely ludicrous for a P1 to outpace this thing, unless some
> unsane overclocker ... no, I don't want to encourage anyone. :P
> Anyway, even with the slow settings, the kernel compile is going quite
> nicely, and is going much faster than the P1 could ever hope to do.
>
> Note that the bios in this motherboard does not support turning OFF
> dma support - the only options are 'auto' 'ata33' and 'ata66/100/133'
> - all of which don't appear to actually work. For instance, I have the
> bios set to ata33 right now as I write this, and despite this, it was
> still trying to set the disk up to use udma4!
>
> A buggy bios a happy linux user does not make. :)
>
> Thank you for all your help, time and effort. It was greatly
> appreciated.
>
> Tim McGrath
--
Vojtech Pavlik
SuSE Labs, SuSE CR
On Fri, Jun 20, 2003 at 10:58:53AM +0200, Vojtech Pavlik wrote:
> On Fri, Jun 20, 2003 at 03:52:51AM -0400, [email protected] wrote:
>
> > using hdparm -i /dev/hda shows the disk wasn't configured to do any pio
> > mode or udma/dma mode at boot time. Strange, right?
>
> Not very strange. Some disks even cannot report it. And, any disk if set
> to PIO only may not report any mode as active, since that is only
> applicable to DMA modes.
Hmm. Interesting, as the information shown with -i matches what the disk
supports, and -I matches what the controller supports. maybe a bug in
hdparm, or perhaps I was using it incorrectly. I'll investigate.
> If you use hdparm -I and the drive reports udma4, it can understand
> udma4, since hdparm -I is straight from the drive's mouth.
again that's interesting as -i shows the max the disk can do is udma2 -
and in fact, that *is* the max.
> > add in the fact I currently have a 40 wire cable connected to the disk
> > and my brain starts frying :)
>
> That's a problem I assume. If the drive can do udma3 and higher, the
> chipset can do udma3 and higher, and you have a 40-wire cable and some
> bad luck, the BIOS or the driver may misdetect the cable and try to
> operate the drive at, most likely, udma4. This won't work, of course.
hrm. The BIOS detects I have a 40 wire cable, however tells the chipset
that I have an 80 wire cable. (Confirmed by checking /proc/ide/via) So,
things are pretty f'd up in the BIOS, and I can't figure out why it's
doing this. Not that it really matters now that I can get it to work
*anyway* - but.
> > At a suggestion of a friend, I set the disk to use mdma2 - via the line:
> >
> > hdparm -Xmdma2 -d1 /dev/hda
>
> Don't do this. If your drive supports udma, then there is no reason to
> use mwdma. Ever. mwdma is not crc-protected and that can lead to drive
> data corruption, namely in the case where you seem to have cabling
> problems.
I was totally unaware of this. Thanks for the information!
>
> If you have a 40-wire cable, udma2 will work just fine on it. If you
> want a slower speed, use udma1 or udma0, which is the same speed as
> mwdma2, but is CRC protected and thanks to the udma signalling also more
> robust.
OK, I wasn't aware udma2 worked on this type of cable, and was planning
on switching back to the other 80 wire cable before I continued. Thanks.
>
> > It worked, for all of two seconds. Remember, this is a WD drive. WD
> > drives, or at least mine, like to screw up in pretty amazing ways when
> > you turn dma on initially. Mine throws a screenful of CRC errors,
> > causing the kernel to reset the ide channel.
>
> CRC errors in mwdma mode? Weird. Those CRC errors must've come from the
> drive itself - them not being transfer CRC errors but surface CRC
> errors. That would mean the drive is dying and you should be getting
> them in PIO mode as well.
I know a little more about this than you having dealt with this buggy
drive for more than two years. Basically wd drives don't deal well with
DMA as their hardware doesn't follow the spec - specifically, something
in the crc handling is f'd up, so occasionally the drive will report a
crc error that doesn't actually exist. And it will continue reporting
crc errors infinitely until the ide channel and all devices on it are
reset - which can be maddening at times when I'm playing quake,
something tries to load a file and for two seconds my computer is
frozen. Luckily due to the kernel's error handling, it can deal with
this, and life goes on. I've never lost any data due to this hardware
bug, it's just annoying. If you're in the market for a hard disk, I'd
avoid Western Digital hard disks for this very reason - they're cheap
for a reason. :)
The whole crc errors filling the screen thing - that's actually NORMAL
for this disk when you initially turn dma on. It used to happen on the
old system, so I was actually glad to see it happening again, because it
meant things were working. The error messages weren't showing up in
udma4 mode, and in fact it was acting as if dma was totally turned
off... I guess because the disk doesn't understand udma4 at all.
> > Oddly, I noticed that dma
> > was still on despite the fact the channel had been reset - so I checked
> > with -I again, only to find out now the disk was told to use udma*3*! -
> > this wasn't getting me anywhere. >D
>
> Note that the asterisk stays even after the drive is used back in PIO
> mode. The way to check if DMA is being used is hdparm /dev/hd*
Yes, I did that, which is why I noticed dma was still enabled to my
astonishment. Rather than get my hopes up, I checked what the drive was
set to with -i, and nothing showed up - so I checked with -I and my
eyebrow started twitching because it had neither stayed at the setting
I'd set nor gone back to the setting it was at before.
I've never set up a hard disk with PIO modes before, so I'd like to hear
any advice you have on that - currently whatever setup the hard disk is
trying to use with dma off is malfunctioning in a grand manner, and I'd
like to have a failsafe in case for whatever reason the hard disk throws
the gauntlet down and refuses to do dma in the future for whatever
reason.
Mostly, I want to know if what is listed by a -i/-I is assumed 'safe' to
use. Certainly I'm expecting you to say 'no, backup your data first
before experimenting' so, you need not surprise me. However, any other
advice you have would certainly be appreciated. Also, I have a 486 with
dma capable hard disks that has a non dma capable disk controller, so if
I can get PIO modes working on its hard disks it might speed things up
slightly, who knows.
Thanks for your comments,
Timothy C. McGrath
On Fri, Jun 20, 2003 at 01:57:41PM +0200, Vojtech Pavlik wrote:
> Well, send me the output of your dmesg (only the IDE parts),
> /proc/ide/via, hdparm /dev/hd*, hdparm -i /dev/hd* and I'll write a
> commentary on what means what and what's the maximum speed the thing is
> expected to operate at.
Will do this tomorrow, need to sleep now. Thanks for the help :)
Timothy C. McGrath
On Fri, Jun 20, 2003 at 07:40:30AM -0400, [email protected] wrote:
> Mostly, I want to know if what is listed by a -i/-I is assumed 'safe' to
> use. Certainly I'm expecting you to say 'no, backup your data first
> before experimenting' so, you need not surprise me. However, any other
> advice you have would certainly be appreciated. Also, I have a 486 with
> dma capable hard disks that has a non dma capable disk controller, so if
> I can get PIO modes working on its hard disks it might speed things up
> slightly, who knows.
Well, send me the output of your dmesg (only the IDE parts),
/proc/ide/via, hdparm /dev/hd*, hdparm -i /dev/hd* and I'll write a
commentary on what means what and what's the maximum speed the thing is
expected to operate at.
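
One possible way to gather all of that into a single file, as a rough
sketch. ide_lines and collect_ide_report are hypothetical helper names,
the grep pattern is only a guess at what counts as "the IDE parts" of
dmesg, and /proc/ide/via is assumed to exist (it does with the VIA IDE
driver loaded on 2.4).

```shell
#!/bin/sh
# Keep only dmesg lines that look IDE-related (pattern is an assumption).
ide_lines() {
    grep -E 'ide[0-9]|hd[a-h]|VP_IDE|lost interrupt'
}

# Usage: collect_ide_report OUTFILE DEVICE...
# Writes the IDE parts of dmesg, /proc/ide/via, and both hdparm views
# of each listed device into OUTFILE.
collect_ide_report() {
    cr_out=$1
    shift
    {
        echo "== dmesg (IDE parts) =="
        dmesg | ide_lines
        echo "== /proc/ide/via =="
        cat /proc/ide/via
        for cr_dev in "$@"; do
            echo "== hdparm $cr_dev =="
            hdparm "$cr_dev"
            echo "== hdparm -i $cr_dev =="
            hdparm -i "$cr_dev"
        done
    } > "$cr_out" 2>&1
}

# Example (as root):
#   collect_ide_report ide-report.txt /dev/hda /dev/hdc
```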
--
Vojtech Pavlik
SuSE Labs, SuSE CR
[email protected] writes:
>I believe I have nailed the problem to the wall. Your talk about the
>bios misdetecting the cable got me to thinking - I hadn't actually been
>able to see what the bios said it was configuring the disks attached to
>since lilo's menu came up microseconds later.
Put your Boot Sequence on "Floppy --> Anything else" and format a floppy
with just plain DOS. :-)
Works for me every time.
Regards
Henning
--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen INTERMETA GmbH
[email protected] +49 9131 50 654 0 http://www.intermeta.de/
Java, perl, Solaris, Linux, xSP Consulting, Web Services
freelance consultant -- Jakarta Turbine Development -- hero for hire
Alright, minor update. I was messing around tonight not really expecting
to get anywhere, and to my horror, saw a 'hda: lost interrupt' message.
Again. So. Off I go, trying to figure out what is going on and what do I
find out? You sir were totally right. even with the -k1 setting, the wd
drive changes its settings back to whatever it wants when it throws the
crc errors. Which incidentally appears to be settings it shouldn't be
capable of supporting, but... whatever. Anyway. A friend of mine (who
I'm very grateful for putting up with me) helped me screw around with
PIO modes, and we managed to get the disk working with dma off and PIO
mode 4 enabled.
What was likely fooling me into thinking the drive was working properly
is the enormous amount of ram my computer has now - I tend to forget
that with almost 512MB free of ram, my disk cache can be absolutely
enormous. Which of course means I can easily get fooled into thinking a
disk operation is working perfectly fine when in fact the disk isn't
even being touched at all.
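
One way to keep the disk cache from masking a broken disk during a
test, sketched under assumptions: read roughly twice as much data from
the raw device as the machine has RAM, so most of the reads have to hit
the disk. uncached_read_mb is a hypothetical helper, and it assumes
/proc/meminfo has a "MemTotal: <n> kB" line.

```shell
#!/bin/sh
# Read /proc/meminfo-style text on stdin and print a read size in MB
# equal to twice the total RAM, so the test cannot be served from cache.
uncached_read_mb() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 / 1024) * 2 }'
}

# Example (read-only, but needs access to the raw device):
#   dd if=/dev/hda of=/dev/null bs=1024k count="$(uncached_read_mb < /proc/meminfo)"
```

On a 512MB machine this asks dd for about 1GB, well past what the cache
can hold.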
it looks like I was very lucky with my original motherboard and that the
wd drive was able to communicate at its stock settings without having
any special setup - otherwise, this entire assumption would never have
happened - the disk worked perfectly fine with dma on my previous
motherboard, which is why I was so surprised things broke so damn fast
now.
So, the Gigabyte motherboard I'm using is still missetting the values
for the hard disks - but on the other hand, my hard disk was also
playing foul games.
I tested right after doing a reboot with the PIO4 settings, and it
appears to be working just fine. My test consisted of a hdparm -t -T
/dev/hda and also a tar c / > /dev/null for completeness. No problems.
I'll have more detailed information on my setup for you to look at
fairly soon.
After I get the data moved off of it, I plan on sticking this WD drive
into my 486, where it will happily work without any dma support at all.
And it can stay there, for all I care. :)
Timothy C. McGrath
Well, I just rebooted and checked - I can, in fact, set the CHS
settings. However, not to the settings I need. ... Which explains nicely
why my bios can't figure out what to do with my disk. Ah well, I tried.
Anyone with suggestions on how to get DOS booting and happy would be
appreciated.
Tim McGrath
To my astonishment, everything is working now that I've had time to sort
out all the problems.
What I had to do: First off, the WD drive was causing a lot of strange
problems all by itself. Removing it from my computer made everything
work a lot better - both maxtors have their DMA modes correctly detected
and turned on at boot time, although I still manually prefer to turn
them on and do small amounts of tuning myself using hdparm once the
kernel starts.
Weird problems I've noticed include that the kernel and motherboard
disagree with the CHS attributes of my larger disk, a 120GB maxtor 7200
rpm drive. I'm pretty sure the kernel is right, mostly because using the
CHS settings the motherboard insists on cause braindamage in DOS,
including but not limited to the partition table being misdetected, the
files on the disk I copied there from linux to dos going missing, 6mb of
'bad sectors' when I try formatting the partition (Which goes away when
I enter linux.) and when linux fscks the disk, it can find DOS's files
and believes the disk has enormous filesystem corruption. Oddly enough,
the 40GB maxtor hard disk I have (That incidentally originally came with
the ps2 linux kit) is correctly detected.
I haven't yet tried fiddling with the motherboard's settings for the
first hard disk - everything is still on auto, but I'm pretty sure there
is nothing I can do to manually set the CHS, which will be a problem if
I want to boot dos correctly. :( I've checked with maxtor's site, but
their maxblast program only works in windows according to it, which is a
pity since I don't have that installed here. I'll likely be forced to
use dosemu, although right now I can neither get sound nor VGA/SVGA to
work in it. *shrugs* I'm sure given enough time I can nail this too,
although I'd love suggestions. ... Hmm, just had a thought. Worst case,
I can make the 40GB disk have a dos partition and tell lilo to swap the
disk appearance for dos when I want to run dos. ... I might go with
that, actually.
Things that I've tried and haven't worked, include manually setting the
CHS in lilo's append command - obviously this only works for the linux
kernel. I've tried using grub as well, and it fixes the braindamage by
itself - problem is according to what little I've read of grub's
documentation there is no way to boot dos using it that I can see. If
I'm wrong, please inform me where the documentation for doing it is
located?
So. Problems with the Gigabyte GA-7VAXP motherboard I've found with
linux and its F11 variant of the bios include:
Western Digital drives go absofuckinglutely insane in linux, which
actually isn't abnormal from what I've read - seems to be an issue more
with the disk being insane in the first place, and telling the BIOS it
can do things it not only can't do, but won't do, nor will the disk
complain when you try to do them. If you have one of these disks, the
only thing I can really say to you is to force dma OFF and experiment
with PIO mode. Mode 4 worked on my hard disk, but yours might be
different from mine, so keep this in mind. Also note PIO mode is *slow* -
unbelievably slow compared to DMA. If you are stuck with one of these
disks and can return it and get your money back - do it. Otherwise,
you might consider getting a different manufacturer's hard disk - from
what I've read using google.com about western digital disks and linux,
they just plain don't work with dma in linux, and the company seems to
have no plans to fix this problem.
Large hard disks have their CHS values incorrectly detected.
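For the "force dma OFF and experiment with PIO mode" advice in the
Western Digital item above, here is a small Python sketch that only
builds the hdparm command line; it does not run anything. It assumes
hdparm's documented convention that -X takes 8 plus the PIO mode number,
and that the disk is /dev/hda as on this machine. hdparm_pio_args is a
made-up helper name.

```python
def hdparm_pio_args(device, pio_mode):
    """Build (not run) an hdparm command that disables DMA and sets a PIO mode."""
    if not 0 <= pio_mode <= 4:
        raise ValueError("PIO mode must be between 0 and 4")
    xfer = 8 + pio_mode  # hdparm -X: 8+n selects PIO mode n
    return ["hdparm", "-d0", f"-X{xfer}", device]


cmd = hdparm_pio_args("/dev/hda", 4)
print(" ".join(cmd))  # hdparm -d0 -X12 /dev/hda
```

The printed command would need to be run as root, and the hdparm man
page itself warns that -X should be used with caution, so try one mode
at a time and watch the console for 'lost interrupt' messages.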
Not including this, I've noted that the onboard fan that came with the
board which cools the agp chipset (at least I think it does - has an 'AGP
8x' sticker on it) is starting to wear out. I'm sure I can find a
replacement for it at radio shack, but I think it's a little silly for
it to be wearing out already.
I can't really recommend this board to other people, especially as I
haven't even tried enabling the other hard disk controller (which uses a
promise bios and can do ATA/RAID) and the problems I've had with this
motherboard were really absurd at some points. Also one of the things
you WON'T see on websites trying to sell you this board is that the bios
doesn't let you change important things - like memory timings, whether or
not dma should be totally turned OFF to hard disks, among other things.
I've not investigated yet, but I have a feeling I'm not going to be able
to manually set the CHS either, which would be a pity. None of these
things really makes a difference in linux, but on the other hand it
makes it a lot easier to track down problems if you can force the bios
to do the hard work for you when you're trying to figure out what's
wrong with the machine. Trying to fix problems on this motherboard makes
me feel half the time like I've got a hand tied behind my back. Another
thing is that the manual tends to be in engrish, so it's a little hard
to read at times. I won't be buying a gigabyte motherboard next time -
they're nice and slick, have pretty pictures, have an easy to use bios,
so long as you don't do things it doesn't understand, but although I
like having an 'autodetect' mode, I also like being able to manually set
the bios to do things. I had and still have this same type of problem
with my IBM PS/1 2155-78C(SL-B) which was built almost ten years ago -
it's silly to see people still building motherboards that don't have the
ability for the user to set settings manually.
Timothy C. McGrath
On 26 Jun 2003, Tim McGrath wrote:
> Well, I just rebooted and checked - I can, in fact, set the CHS
> settings. However, not to the settings I need. ... Which explains nicely
> why my bios can't figure out what to do with my disk. Ah well, I tried.
>
> Anyone with suggestions on how to get DOS booting and happy would be
> appreciated.
Maybe you can use dosemu under Linux instead?
http://www.dosemu.org
There is also GPLed DOS replacement, maybe it doesn't use CHS info,
if it does you have source code available ;-).
http://www.freedos.org
--
Bartlomiej
> Tim McGrath
On Thu, 2003-06-26 at 06:50, Bartlomiej Zolnierkiewicz wrote:
> Maybe you can use dosemu under Linux instead?
> http://www.dosemu.org
Well, yes, it works - problem is I can't figure out how to get either sound or
VGA/SVGA working in it, and I need both to play the games I use DOS for.
> There is also GPLed DOS replacement, maybe it doesn't use CHS info,
> if it does you have source code available ;-).
> http://www.freedos.org
Nope, not even going to try that one - I've used it before and the last
time I tried it, it had the same problems dos did. Not that it's a bad
program or anything.
Thanks for trying to help. If you know how to get dosemu working with
vga/svga and sound in linux console/x let me know.
Another option I'm going to try is booting off the 40GB disk I have
instead of the 100GB disk - should work as the CHS info given by the
bios agrees with the kernel.
We'll see.
Timothy C. McGrath