2002-08-18 06:40:46

by Ed Sweetman

Subject: cerberus errors on 2.4.19 (ide dma related)

(overview written in hindsight, after writing the email)
I ran all these tests on ide/host2/bus0/target0/lun0/part1.
Whenever DMA was enabled or disabled, it was done to both drives at the
same time.
I do not know whether cerberus cares where it is run from for its
tests, but the program was on the drive being tested when run, and
throughout the email I assume it runs its drive tests primarily on
the partition you run it from. I see now that this is probably wrong:
instead of changing where I run the test, I should alternate which
drive gets DMA enabled and disabled, and process of elimination will
show just what kind of DMA bug I'm seeing.
(/overview)


I've been trying to track down why I seem to get disk corruption on my
hard drives, every time, after a good amount of usage. It's been
happening for a long time, across a number of different kernel versions.
I suspect this is because I stick to the same board manufacturer, Abit,
and use VIA chipsets.

I ran cerberus with DMA enabled at UDMA4 and UDMA2. At UDMA4 cerberus
reports MEMORY errors and BBidehost2bus0target0lun0discN1 errors, but
mostly MEMORY errors, before the kernel panics after a minute or two. At
UDMA2 cerberus reports no errors but still panics after a minute or two.
I ran cerberus a couple of times at each mode; with UDMA4 it began to
error about 30 seconds into the test with MEMORY errors.

I thought this could be RAM errors, so I ran memtest for a couple of
hours. Nothing was reported as bad. I then thought the drive itself
could be the problem, so I ran e2fsck -c on the partition I was running
cerberus on, with DMA disabled via hdparm -d0, and it completed with no
errors found. I then rebooted, enabled UDMA2, and the kernel panicked
under the same test after a few minutes.
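
(For reference, the mode toggling was all done with hdparm; the device
name below is an assumption on my part - under devfs mine shows up as
/dev/ide/host2/bus0/target0/lun0/disc - and -X uses hdparm's 64+n
convention for UltraDMA modes:)

    hdparm -d1 -X68 /dev/hde   # enable DMA; 64+4 selects UDMA4
    hdparm -d1 -X66 /dev/hde   # 64+2 selects UDMA2
    hdparm -d0 /dev/hde        # disable DMA entirely for the PIO runs
    e2fsck -c /dev/hde1        # read-only badblocks pass over the test partition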

The rest of this email is just information regarding the setup


First off, my filesystems are set up as follows:

Swap and files are now all on my primary master IDE drive on the
motherboard IDE controller. Swap on the primary master of the Promise
controller seemed too problematic because of corruption, but I'm not
sure whether the corruption I've seen is related only to the Promise
controller or is not controller-specific. I'll have to run the test
without swap on the Promise drive, and then run the test on my primary
motherboard drive, again without swap.
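
(Moving swap between drives was just the usual dance; the partition
names here are hypothetical:)

    swapoff /dev/hde2    # hypothetical swap partition on the Promise drive
    mkswap /dev/hda3     # hypothetical spare partition on the motherboard drive
    swapon /dev/hda3
    # then point the swap line in /etc/fstab at the new partition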

cerberus version : 1.3.0pre4
dmesg info : http://signal-lost.homeip.net/lkml/dmesg
hdparm info : http://signal-lost.homeip.net/lkml/hdparm
pci info : http://signal-lost.homeip.net/lkml/lspci

tests completed before escaping in PIO mode:
http://signal-lost.homeip.net/lkml/tests_passed

Errors during the last test, which caused a kernel panic (UDMA2):
http://signal-lost.homeip.net/lkml/memory

Errors during the UDMA4 test (the first test):
http://signal-lost.homeip.net/lkml/memory2
http://signal-lost.homeip.net/lkml/dmesg2
(various segfaults of badblocks during the BBidehost tests)


I ran memtest for an extensive amount of time after the first test
reported memory errors and got absolutely no errors (I wasn't using DMA
mode at the time either). And since these errors aren't produced when
not using DMA on my drives, I find it very unlikely that system RAM is
the cause of them. I'm going to rerun the test on my motherboard's
primary drive after posting this, in case something happens and I hose
everything.





2002-08-18 07:11:18

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, 2002-08-18 at 02:44, Ed Sweetman wrote:
> [snip]

Forgot to add my kernel config.
http://signal-lost.homeip.net/lkml/config


2002-08-18 07:22:44

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

OK, I reran the test with a little process of elimination.
The problem occurs only when DMA is enabled on the Promise controller's hard drive.

The cerberus test ran for 15 minutes without any errors when DMA was disabled on the Promise controller, regardless of whether DMA was enabled or disabled on the VIA controller's drive. When DMA was enabled on the Promise controller's drive, cerberus reported MEMORY errors within 30 seconds both times, with DMA on the VIA drive both enabled and disabled.

It appears, then, that there are DMA issues with the Promise controller I have and its driver. My swap used to be on the drive on the Promise controller, which would explain filesystem corruption on both drives (swap caching and such).

If whoever develops this driver wants more bug testing or specific information, I can provide it. I'd like to help get the problem solved.
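
(For anyone who wants to reproduce the elimination, it amounts to
cycling through the four DMA on/off combinations; the device names are
assumptions, /dev/hda for the VIA-attached drive and /dev/hde for the
Promise-attached one:)

    for via in 0 1; do
      for pdc in 0 1; do
        hdparm -d$via /dev/hda   # toggle DMA on the VIA-attached drive
        hdparm -d$pdc /dev/hde   # toggle DMA on the Promise-attached drive
        # run the cerberus test here and note any MEMORY errors
      done
    done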







2002-08-18 08:56:00

by Barry K. Nathan

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, Aug 18, 2002 at 03:26:42AM -0400, Ed Sweetman wrote:
> OK, I reran the test with a little process of elimination.
> The problem occurs only when DMA is enabled on the Promise controller's
> hard drive.
[snip]

Looking at your dmesg, it seems you're using a Promise controller on a
VIA chipset. AFAIK this is a known problem and the only known solution
is to avoid the VIA/Promise combo.

-Barry K. Nathan <[email protected]>

2002-08-18 09:09:35

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, 2002-08-18 at 05:00, Barry K. Nathan wrote:
> On Sun, Aug 18, 2002 at 03:26:42AM -0400, Ed Sweetman wrote:
> > OK, I reran the test with a little process of elimination.
> > The problem occurs only when DMA is enabled on the Promise controller's
> > hard drive.
> [snip]
>
> Looking at your dmesg, it seems you're using a Promise controller on a
> VIA chipset. AFAIK this is a known problem and the only known solution
> is to avoid the VIA/Promise combo.

There are a lot fewer bug reports than there would be if this were
strictly a hardware problem. Promise controllers are by far the most
popular and most readily available add-on controllers, and the number of
people using VIA chipsets is significant; a genuine hardware problem
would cause far more than a handful of people to post problems directly
related to using Promise controllers on a VIA chipset. You'd get a lot
of people screaming and yelling. It could just as easily be the Promise
driver used with the VIA chipset. The only people who know are those
who work with the IDE drivers and hardware. Are we dealing with a
fundamental flaw in VIA and Promise combos, or some driver bug?

As for the solution: it's not a solution unless this is strictly a
hardware conflict, which I doubt. That is, unless you know a place that
trades IDE controllers and will take my Promise controller and give me
one that works at speeds equivalent to what the Promise would have
managed.


Assuming there aren't any such places, I'm looking for real pointers as
to what's going on and how to fix it, not ways to avoid problems. I
have ways to avoid the problem; they're not solutions to the problem,
though. Thanks anyway.

2002-08-18 09:06:54

by Alexander Viro

Subject: Re: cerberus errors on 2.4.19 (ide dma related)



On 18 Aug 2002, Ed Sweetman wrote:

> (overview written in hindsight, after writing the email)
> I ran all these tests on ide/host2/bus0/target0/lun0/part1.

Don't be silly - if you want to test anything, devfs is the last thing
you want on the system.

2002-08-18 09:12:22

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, 2002-08-18 at 05:10, Alexander Viro wrote:
>
>
> On 18 Aug 2002, Ed Sweetman wrote:
>
> > (overview written in hindsight, after writing the email)
> > I ran all these tests on ide/host2/bus0/target0/lun0/part1.
>
> Don't be silly - if you want to test anything, devfs is the last thing
> you want on the system.


OK, I can remove devfs, but I don't really see how that would make DMA
transfers (memory) become corrupted while PIO-mode transfers (memory) do
not.

I'm going to remove it, but I don't see how it's going to affect what's
going on.

2002-08-18 09:25:24

by Andre Hedrick

Subject: Re: cerberus errors on 2.4.19 (ide dma related)


2.4.19-preempt

All bets are off, because who knows what preempt is doing to the state
machines, and in PIO you are dead.

You cannot delay the transaction of data between interrupts without
having the transport help out. But preempt doesn't get that.

If you push your request size down to 8k or to a page, your preempt
problems will go away, purely because of the granularity of requests.
And the price is that performance goes in the tank. But this is
preempt, so who cares.

Cheers,


Andre Hedrick
LAD Storage Consulting Group

2002-08-18 17:46:48

by Jonathan Lundell

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

At 3:26 AM -0400 8/18/02, Ed Sweetman wrote:
>It appears, then, that there are DMA issues with the Promise
>controller I have and its driver. My swap used to be on the drive
>on the Promise controller, which would explain filesystem corruption
>on both drives (swap caching and such).

FWIW, this is a semi-well-known phenomenon with the IDE controller in
the ServerWorks OSB4 south bridge. As I recall from our testing, a
word appears to be dropped in the DMA transfer to the disk. We found
that both PIO and multi-word DMA worked OK.

What's your chipset?
--
/Jonathan Lundell.

2002-08-18 18:00:43

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)




The dmesg I included shows my chipsets:
VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
PDC20262: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
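
(Anyone wanting to compare their own setup can pull the same lines out
with something like the following; both commands are generic:)

    dmesg | grep -Ei 'via|pdc|ide'   # controller lines from the boot log
    lspci | grep -i ide              # PCI IDs of both IDE controllers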







2002-08-18 18:06:35

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

It appears I'm completely unable to stop using devfs. Attempting to run
the kernel without mounting devfs results in it still being mounted,
or, if devfs is not compiled in, the boot locks up. Booting the kernel
and mv'ing /dev does not work, umounting /dev does not work, and rm'ing
/dev does not work. I can't create the non-devfs nodes while devfs is
mounted, and I can't boot the kernel without devfs. It seems that no
uninstall procedure has been provided; I've read the documentation that
comes with the kernel about devfs, and it says nothing about how to
move back to the old device nodes from devfs.

Anyone have any suggestions?






2002-08-18 18:07:49

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, 2002-08-18 at 05:19, Andre Hedrick wrote:
>
> 2.4.19-preempt
>
> All bets are off, because who knows what preempt is doing to the state
> machines, and in PIO you are dead.
>
> You cannot delay the transaction of data between interrupts without
> having the transport help out. But preempt doesn't get that.
>
> If you push your request size down to 8k or to a page, your preempt
> problems will go away, purely because of the granularity of requests.
> And the price is that performance goes in the tank. But this is
> preempt, so who cares.
>

OK, that makes some sense, at least more than removing devfs. I'll
compile without preempt and test to see whether the problem still
occurs. As soon as I figure out how to stop using devfs, I'll try it
without that too.



2002-08-18 18:16:55

by Sean Neakums

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

commence Ed Sweetman quotation:

> It appears I'm completely unable to stop using devfs. Attempting to
> run the kernel without mounting devfs results in it still being
> mounted, or, if devfs is not compiled in, the boot locks up. Booting
> the kernel and mv'ing /dev does not work, umounting /dev does not
> work, and rm'ing /dev does not work. I can't create the non-devfs
> nodes while devfs is mounted, and I can't boot the kernel without
> devfs. It seems that no uninstall procedure has been provided; I've
> read the documentation that comes with the kernel about devfs, and it
> says nothing about how to move back to the old device nodes from
> devfs.
>
> Anyone have any suggestions?

Where does the boot hang? If it complains about not being able to
open /dev/console or some other device node, it may be that your /dev
has no nodes in it. This happened to me when I eradicated devfs (I
got fed up with fighting devfsd to get my permission changes to
stick, and had reshuffled filesystems in the meantime), so I booted from
a rescue disk, mounted my root FS, and recreated the device nodes in
/mnt/dev.

--
/ |
[|] Sean Neakums | Questions are a burden to others;
[|] <[email protected]> | answers a prison for oneself.
\ |

2002-08-18 18:27:31

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

Preempt doesn't seem to be the culprit. I reran the test without
preempt and got MEMORY errors within 22 seconds at UDMA2. So preempt
is not the problem here.




2002-08-18 18:25:31

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

I know I have no device nodes; I removed them all before installing
devfs. The devfs documentation says devfs doesn't need to be mounted
to work, but this doesn't seem to be true at all. Hence my confusion.
I know I can go download a bootable ISO and get that burned and
working, but I shouldn't have to do that.


2002-08-18 18:32:47

by Sean Neakums

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

commence Ed Sweetman quotation:

> On Sun, 2002-08-18 at 14:20, Sean Neakums wrote:
>> [snip]
>
> I know I have no device nodes; I removed them all before installing
> devfs.

You must have been really keen on devfs.

> The devfs documentation says devfs doesn't need to be mounted
> to work, but this doesn't seem to be true at all.

If it does say exactly that, then it is outrageously wrong.

> Hence my confusion. I know I can go download a bootable ISO and get
> that burned and working, but I shouldn't have to do that.

Uh, you deleted your device nodes, and now you want to boot the
system without devfs. You have to do precisely that, or something
equivalent.

--
/ |
[|] Sean Neakums | Questions are a burden to others;
[|] <[email protected]> | answers a prison for oneself.
\ |

2002-08-18 19:10:28

by Denis Vlasenko

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On 18 August 2002 16:10, Ed Sweetman wrote:
> It appears I'm completely unable to stop using devfs. Attempting to run
> the kernel without mounting devfs results in it still being mounted,
> or, if devfs is not compiled in, the boot locks up. Booting the kernel
> and mv'ing /dev does not work, umounting /dev does not work, and rm'ing
> /dev does not work. I can't create the non-devfs nodes while devfs is
> mounted, and I can't boot the kernel without devfs. It seems that no
> uninstall procedure has been provided; I've read the documentation that
> comes with the kernel about devfs, and it says nothing about how to
> move back to the old device nodes from devfs.
>
> Anyone have any suggestions?

Boot with devfs as usual.
Mount your root fs again, say

#mount /dev/hda2 /mnt/tmp

and you will see your /dev as it is on disk (i.e. without
devfs mounted over it) in /mnt/tmp/dev.
Now, mknod everything you need.
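
For example (these are the standard major/minor numbers; extend the
list to whatever your system needs):

    cd /mnt/tmp/dev
    mknod console c 5 1    # char 5,1: the system console
    mknod null c 1 3
    mknod zero c 1 5
    mknod hda b 3 0        # first drive on the first IDE channel
    mknod hda1 b 3 1       # ...and its first partition
    chmod 600 console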

BTW, NFS mount over loopback (127.0.0.1) works too.
--
vda

2002-08-18 19:49:58

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

OK, devfs was removed and I got the old way working again. cerberus
reports MEMORY errors when DMA is enabled on the Promise controller,
less than 30 seconds after the test begins - just like every other time
I've had DMA enabled on the Promise controller.

So it's not preempt, and it's not devfs. Now we have to face the fact
that it's either a hardware conflict that Linux cannot handle or a
device driver bug.

Any other suggestions?

Now that I'm down to vanilla 2.4.19, perhaps it's time for some real
tests?



2002-08-18 20:12:59

by Andre Hedrick

Subject: Re: cerberus errors on 2.4.19 (ide dma related)


Ed,

MEMORY errors - explain, please.

If you mean data corruption, please use those words; they are screaming
red flags for attention.


Andre Hedrick
LAD Storage Consulting Group

2002-08-18 20:25:04

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

They're both. Cerberus reports MEMORY errors only when DMA is enabled
for the Promise card; it doesn't matter for the VIA chipset. These
MEMORY errors are just a precursor to data corruption on the disks.
badblocks segfaulted during tests on both drives when DMA was enabled
on the Promise controller. Previously I got drive corruption on both
drives, but that was when I had swap on the Promise controller; since
then I have not experienced data corruption on the VIA drive. It's
still uncertain whether the data corruption is something at the
transfer level to the Promise controller or more general IDE DMA memory
corruption, because when DMA is enabled on the Promise controller and
the cerberus test is run, all I get is what I explained in my original
post, and then the kernel always panics after a number of errors (both
badblocks test errors and MEMORY errors).

Again, none of these errors show up when DMA is disabled on the Promise
controller.

So by MEMORY error, I mean what cerberus reports as "MEMORY" errors.
cerberus doesn't seem to report hard-drive data corruption; rather, for
some reason badblocks segfaults. If you have a data accuracy test you
like to run that I should try, I'll do that. But the data corruption
I've seen only occurs after a couple of days of uptime with DMA enabled
on the Promise card, and I haven't had time to stay up that long since
moving my swap off the Promise controller.





2002-08-18 21:49:55

by Barry K. Nathan

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, Aug 18, 2002 at 07:36:47PM +0100, Sean Neakums wrote:
> commence Ed Sweetman quotation:
[snip]
> > The devfs documentation says devfs doesn't need to be mounted
> > to work, but this doesn't seem to be true at all.
>
> If it does say exactly that, then it is outrageously wrong.

Starting at line 722 of
linux-2.4.19/Documentation/filesystems/devfs/README:

> In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
> devfs onto /dev is completely safe, and requires no
> configuration changes.

I skimmed through the documentation and it appears to assume that you're
not deleting all the stuff in /dev before switching over to devfs.

> > Hence my confusion. I know I can go download a bootable ISO and get
> > that burned and working, but I shouldn't have to do that.
>
> Uh, you deleted your device nodes, and now you want to boot the
> system without devfs. You have to do precisely that, or something
> equivalent.

Right, there's no way around that. If you deleted everything in /dev --
which you're not supposed to do -- then there's no way for anything to
find any devices if devfs isn't enabled. (And you should have a rescue
CD around anyway -- you never know when you might need it! BTW, what
distribution are you (Ed) using? Some distributions have special boot
options you can use when booting their install CDs to get into a rescue
mode.)

In any event, it might be a good idea to make the documentation a bit
more explicit about this, and I might send a patch to the mailing
list later today.

-Barry K. Nathan <[email protected]>

2002-08-18 22:22:35

by Ed Sweetman

Subject: devfs

On Sun, 2002-08-18 at 17:53, Barry K. Nathan wrote:
> On Sun, Aug 18, 2002 at 07:36:47PM +0100, Sean Neakums wrote:
> > commence Ed Sweetman quotation:
> [snip]
> > > The devfs documentation says devfs doesn't need to be mounted
> > > to work, but this doesn't seem to be true at all.
> >
> > If it does say exactly that, then it is outrageously wrong.
>
> Starting at line 722 of
> linux-2.4.19/Documentation/filesystems/devfs/README:
>
> > In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
> > devfs onto /dev is completely safe, and requires no
> > configuration changes.
>
> I skimmed through the documentation and it appears to assume that you're
> not deleting all the stuff in /dev before switching over to devfs.

This has nothing to do with not mounting devfs but still using devfs to
work with devices. If devfs is not mounted but you're still using
devfs, you shouldn't need anything in /dev. The documentation says you
can use devfs without mounting it, and this is what I'm saying is
problematic and doesn't seem possible in normal usage. It's an
optional config, so are we using devfs when we don't mount it or not?
And if not, then why make not mounting it an option?

If it's using the old device files in /dev, then how can it be using
devfs, and how can accessing physical inodes on the disk be intentional
under devfs?


> Right, there's no way around that. If you deleted everything in /dev --
> which you're not supposed to do -- then there's no way for anything to
> find any devices if devfs isn't enabled. (And you should have a rescue
> CD around anyway -- you never know when you might need it! BTW, what
> distribution are you (Ed) using? Some distributions have special boot
> options you can use when booting their install CDs to get into a rescue
> mode.)
>
> In any event, it might be a good idea to make the documentation a bit
> more explicit about this, and I might send a patch to the mailing
> list later today.

I'm not talking about booting without devfs enabled being the problem; I
know that booting without devfs enabled I'll have issues booting the
system without physical /dev entries. I was referring to having devfs
enabled and not mounting it, which according to the documentation should
be perfectly functional and valid. This is not the case, though. devfs
should not require the old /dev entries at all, since it doesn't use
them, so why would keeping them be required at all when using it (not
counting the "if I want to not use devfs" argument)? This is what should
be cleared up in the documentation.

2002-08-18 22:39:03

by Andrew Rodland

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On 18 Aug 2002 14:29:23 -0400
Ed Sweetman <[email protected]> wrote:

> I know I have no device nodes; I removed them all before installing
> devfs.

Well then you have no device nodes without devfs. D'uh? :)

> The devfs documentation says devfs doesn't need to be
> mounted to work, but this doesn't seem to be true at all.

No, the devfs documentation says that it is "safe" to have devfs
compiled in and not use it -- you will just use the standard /dev. It
does not imply in any way that you will be using devfs if you don't
mount it; it says that if you choose _not_ to use devfs, then it will be
able to fall cleanly back to standard /dev. In other words,
CONFIG_DEVFS_FS provides the _ability_ to use devfs, not a
_requirement_.

That's all it says.
To assume that it means anything else would be incredibly silly.



2002-08-18 22:43:33

by Sean Neakums

Subject: Re: devfs

commence Ed Sweetman quotation:

> On Sun, 2002-08-18 at 17:53, Barry K. Nathan wrote:
>> [snip]
>
> This has nothing to do with not mounting devfs but still using devfs to
> work with devices.

If you don't mount devfs, you are not using it to "work with devices".
You use your existing device nodes, unless you deleted them, in which
case you are in trouble, as you discovered.

> If devfs is not mounted but you're still using devfs, you shouldn't
> need anything in /dev.

If devfs is not mounted, you are not "using it".

> The documentation says you can use devfs without mounting it [...]

It does not. It says that if devfs is built as part of your kernel's
configuration and you do not mount it, everything works as before.
For everything to work as before, your device nodes need to be
present.

> and this is what I'm saying is problematic and doesn't seem possible
> in normal usage. It's an optional config, so are we using devfs when
> we don't mount it or not? And if not, then why make not mounting it
> an option?

I can imagine it being useful for vendors. It makes it easy to offer
devfs usage as a simple selection at install time without needing to
ship two different kernels.

> If it's using the old device files in /dev, then how can it be using
> devfs, and how can accessing physical inodes on the disk be
> intentional under devfs?

Dunno what you mean here.

>> [snip]
>
> I'm not talking about booting without devfs enabled being the problem; I
> know that booting without devfs enabled I'll have issues booting the
> system without physical /dev entries. I was referring to having devfs
> enabled and not mounting it, which according to the documentation should
> be perfectly functional and valid. This is not the case, though.

BECAUSE YOU DELETED YOUR DEVICE NODES.

> devfs should not require the old /dev entries at all, since it
> doesn't use them, so why would keeping them be required at all when
> using it (not counting the "if I want to not use devfs" argument)?
> This is what should be cleared up in the documentation.

devfs does not require your old device nodes. Nowhere does the
documentation say that it does. HOWEVER, if you want to boot a
devfs-capable kernel and not mount devfs (or simply boot a kernel with
no devfs capability at all), you have to have SOMETHING in /dev,
i.e. a set of standard device nodes.

--
/ |
[|] Sean Neakums | Questions are a burden to others;
[|] <[email protected]> | answers a prison for oneself.
\ |

2002-08-18 22:51:01

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

On Sun, 2002-08-18 at 18:41, Andrew Rodland wrote:
> [snip]

OK, so that's over and done with. It's not the topic of the thread, and
I'd already fixed things a while ago.

2002-08-18 22:59:42

by Alexander Viro

Subject: Re: devfs



On 18 Aug 2002, Ed Sweetman wrote:

> This has nothing to do with not mounting devfs but still using devfs to
> work with devices. If devfs is not mounted but you're still using
> devfs, you shouldn't need anything in /dev. The documentation says you
> can use devfs without mounting it, and this is what I'm saying is
> problematic and doesn't seem possible in normal usage. It's an
> optional config, so are we using devfs when we don't mount it or not?
> And if not, then why make not mounting it an option?

What? If program calls open("/dev/zero",...) and there's no such file,
how the fuck would having devfs enabled help you?

Come on, use common sense - devfs provides a tree with some device nodes.
You can mount it wherever you want (or not mount it anywhere). Just as
with any other filesystem.

If you mount it on /dev - well, duh, you see that tree on /dev. If you
do not - you see whatever is in /dev on underlying fs.

If program wants to access a device, it opens that device. Just as any
other file. By name. There is nothing magical about names that begin
with /dev/ - it's just a conventional place for device nodes.

devfs "mount" option is an idiotic kludge that makes _kernel_ mount
it on /dev after the root fs had been mounted. Why it had been
introduced is a great mistery, since the normal way is to have a
corresponding line in /etc/fstab and have userland mount whatever
it needs.

Said option is, indeed, not required for anything - in a sense that
it does nothing that system wouldn't be perfectly capable of in regular
ways.

But you _do_ need stuff in /dev, no matter what filesystem it comes
from. Kernel doesn't need it, but userland programs expect to find
it there. If you had deleted device nodes from underlying /dev and
do not care to mount something on top of it - well, there won't be
anything in that directory.
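
To make it concrete: devfs mounts like any other filesystem, wherever
you point it, and the fstab line is all the "mount" option buys you.

    mount -t devfs none /mnt/somewhere   # the device tree shows up there; no magic

    # the sane, userland way - a line in /etc/fstab:
    # none  /dev  devfs  defaults  0 0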

2002-08-18 23:11:49

by Ed Sweetman

Subject: Re: devfs

On Sun, 2002-08-18 at 19:03, Alexander Viro wrote:
> [snip]

OK, that all makes sense. I removed the /dev entries because I didn't
need them anymore, but I suppose it doesn't hurt anything to just keep
them there anyway, so I've fixed all that. Either way, removing devfs
did nothing, but apparently it was asked for to allow better testing
and/or debugging. Still, I've yet to get any reason why I removed devfs
to investigate the Promise IDE controller's DMA-related memory failures.
I've removed devfs and restored the old /dev entries, no problem. I'm
not getting off topic about that. It's all done, so I'm waiting for the
next step here.

2002-08-18 23:14:11

by Barry K. Nathan

Subject: Re: devfs

On Sun, Aug 18, 2002 at 06:26:35PM -0400, Ed Sweetman wrote:
> On Sun, 2002-08-18 at 17:53, Barry K. Nathan wrote:
[snip]
> > Starting at line 722 of
> > linux-2.4.19/Documentation/filesystems/devfs/README:
> >
> > > In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
> > > devfs onto /dev is completely safe, and requires no
> > > configuration changes.
> >
> > I skimmed through the documentation and it appears to assume that you're
> > not deleting all the stuff in /dev before switching over to devfs.
>
> This has nothing to do with not mounting devfs but still using devfs to
> work with devices. If devfs is not mounted but you're still using
> devfs, you shouldn't need anything in /dev.

IMO the combination of common sense and a little Linux/Unix knowledge
would suggest that you can't use a filesystem if it's not mounted.
(Also, see my next paragraph in this message.)

> The documentation says you
> can use devfs without mounting it

No, that's not what it says. It says that you can run a kernel with
devfs enabled but not mounted. This does not imply that devfs is in use
and providing the device nodes, just that it is enabled and present in
the event that it should be mounted.

> and this is what I'm saying is
> problematic and doesn't seem possible in normal usage. It's an
> optional config, so are we using devfs when we don't mount it or not?
> And if not, then why make not mounting it an option?

Why make not mounting it an option? There's more than one reason. You
might want to wait until well into the boot process before mounting it
(I think this one's mentioned in the documentation but I'm not 100%
sure). You might also want to temporarily disable devfs mounting to
avoid a security hole in the event that one is found in devfs, until an
updated kernel is available (this actually happened earlier this year
with recent Mandrake Linux releases that use devfs by default).

> If it's using the old device files in /dev, then how can it be using
> devfs, and how can accessing physical inodes on the disk be intentional
> under devfs?

If it's using the old /dev nodes then it's not using devfs -- but you
can switch over later, during or after boot.

> > In any event, it might be a good idea to make the documentation a bit
> > more explicit about this, and I might send a patch to the mailing
> > list later today.
>
> I'm not talking about booting without devfs enabled being the problem; I
> know that booting without devfs enabled I'll have issues booting the
> system without physical /dev entries. I was referring to having devfs
> enabled and not mounting it, which according to the documentation should
> be perfectly functional and valid. This is not the case, though. devfs
> should not require the old /dev entries at all, since it doesn't use
> them, so why would keeping them be required at all when using it (not
> counting the "if I want to not use devfs" argument)? This is what should
> be cleared up in the documentation.

I believe you misunderstood the existing documentation. Nonetheless it
probably should be clarified, and I'm about to send a patch to the
mailing list.

-Barry K. Nathan <[email protected]>

2002-08-19 01:02:16

by Olivier Galibert

Subject: Re: devfs

On Sun, Aug 18, 2002 at 07:03:42PM -0400, Alexander Viro wrote:
> devfs "mount" option is an idiotic kludge that makes _kernel_ mount
> it on /dev after the root fs had been mounted. Why it had been
> introduced is a great mistery, since the normal way is to have a
> corresponding line in /etc/fstab and have userland mount whatever
> it needs.

I've been wondering, imagine that in the future we have a working
dynamic device filesystem (be it devfs, driverfs, whatever) nice
enough that we don't want a disk-based /dev anymore. How are we
supposed to mount it so that the kernel's open("/dev/console")
succeeds?

OG.

2002-08-19 02:02:21

by Greg KH

Subject: Re: devfs

On Sun, Aug 18, 2002 at 09:06:18PM -0400, Olivier Galibert wrote:
>
> I've been wondering, imagine that in the future we have a working
> dynamic device filesystem (be it devfs, driverfs, whatever) nice
> enough that we don't want a disk-based /dev anymore. How are we
> supposed to mount it so that the kernel's open("/dev/console")
> succeeds?

initramfs might already contain a minimal /dev that has those kinds of
entries in it.

thanks,

greg k-h

2002-08-19 21:41:59

by Ed Sweetman

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

If the Promise controller is unable to function correctly with the VIA
vt82c686b, what PCI IDE controllers are suggested for Linux? And
shouldn't the good/bad IDE config disable DMA on the Promise controller
when used with this chipset, or will hdparm in init scripts be required,
fixing the problem entirely in userspace? In that case, maybe a list of
hardware conflicts should be included in the help for the chipset to
give some warning.
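
(In the meantime, the userspace workaround is a one-liner in an init
script; the device name is an assumption:)

    # e.g. in /etc/rc.local or an early boot script
    hdparm -d0 /dev/hde   # keep DMA off on the Promise-attached drive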

2002-08-21 04:45:07

by Richard Gooch

Subject: Re: devfs

Ed Sweetman writes:
> OK, that all makes sense. I removed the /dev entries because I didn't
> need them anymore, but I suppose it doesn't hurt anything to just keep
> them there anyway, so I've fixed all that. Either way, removing devfs
> did nothing, but apparently it was asked for to allow better testing
> and/or debugging. Still, I've yet to get any reason why I removed
> devfs to investigate the Promise IDE controller's DMA-related memory
> failures. I've removed devfs and restored the old /dev entries, no
> problem. I'm not getting off topic about that. It's all done, so I'm
> waiting for the next step here.

It seems the reason you removed devfs is that you followed Al's bad
advice:
Alexander Viro writes:
> Don't be silly - if you want to test anything, devfs is the last
> thing you want on the system.

In fact, devfs works quite robustly for many people, and wasn't
involved in the IDE problems you were having. Al is an absolutist: if
it's not 100% provably correct, it falls into his other category,
"spawn of satan".

So next time someone claims devfs is causing you problems, treat it
with the skepticism it deserves.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-08-21 04:59:48

by Ed Sweetman

Subject: Re: devfs

On Wed, 2002-08-21 at 00:49, Richard Gooch wrote:
> [snip]


OK, well, the happier the people who are able to identify the problem,
the better, so despite a simple USB issue, moving off devfs wasn't a
hassle. I'm willing to move back to vanilla if it helps in chasing a
problem down. I'm just kind of disappointed that the discussion of me
playing around with devfs got more attention from the IDE guys than the
actual problem. It seems to be strictly DMA related and not a drive
problem, so if it's a chipset conflict it should be documented: either
in the good/bad IDE list, so the kernel disables DMA on boot for this
particular Promise chipset when this particular VIA chipset is present,
or in the HELP text for the config option, so at least users are aware
that drive corruption has been known to happen with the Linux Promise
driver when DMA is enabled on some VIA chipsets. That is, unless it's a
fixable problem, which I rather hope it is. Neither the Abit
motherboard nor the Promise controller card is uncommon. But since
nobody I know has a VIA + Promise setup, I can't ask them to run the
cerberus test to check for the same behavior, to see the extent of the
Promise + VIA problems or whether it has anything to do with VIA at
all. (shrugs)

2002-08-23 00:00:47

by Samuel Flory

Subject: Re: cerberus errors on 2.4.19 (ide dma related)

BTW - if you run ctcs with the -p flag it will do read/write data
tests. Just be sure to give it a number (i.e. ./newburn -p 2).
