2006-11-07 15:21:44

by Christoph Anton Mitterer

Subject: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Hi.

I've got a very strange problem, which I'm going to try to explain here
in detail.
Since one could easily suppose that the issue results from hardware problems,
I'm going to explain it very thoroughly, because the facts make me
think that my hardware is OK (unfortunately I've got no other computer
on which to try to reproduce the problem).

I'm archiving my personal CD-DA collection and for that reason I had to
install Windows (yes I feel very ugly ;-) ) because I wanted to use EAC
(Exact Audio Copy) for that, due to its superior features.

So I installed Windows XP (on NTFS) and created an additional FAT32
partition to store the extracted audio.
I did several bad-block scans in addition to long SMART checks on the
whole drive, so the disk should be OK.

When extracting (under Windows), I extracted each CD twice, so
Windows/EAC should have written exactly the same data for each CD twice to the
FAT32 partition.
This is important because I think that if there were errors in the
drive/FAT32 filesystem/RAM/CPU, it would be likely that these files are
_not_ equal.

After ripping about 20 CDs I went back to linux and wanted to compare
the pairs of extracted data (originally I did that just to find any
errors in the ripping process).
Before doing anything I wrote sha512sums for all files.

At some point (I did that procedure for every CD) I copied (cp -a)
the directory with all data for that CD to a temporary location on the
FAT32 partition.
Right after that, I diffed the whole thing (diff -q -r dirA dirB).
And there were differences in one file!!!
I copied again, diffed again, and got differences again (but in another file).
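
A minimal sketch of the copy-and-compare step described above, assuming GNU coreutils and findutils. The function name `copy_and_verify` and its arguments are hypothetical, and the destination must not exist yet (otherwise `cp -a` would nest the source inside it). It records SHA-512 sums on the source, copies, and then checks the copy against those sums:

```shell
#!/bin/sh
# Hypothetical helper: copy a directory tree and verify the copy against
# checksums recorded from the source. $1 = source dir, $2 = new destination.
copy_and_verify() {
    src=$1
    dst=$2
    sums=$(mktemp) || return 1
    # Record SHA-512 sums with paths relative to the source directory.
    (cd "$src" && find . -type f -exec sha512sum {} +) > "$sums"
    cp -a "$src" "$dst" || { rm -f "$sums"; return 1; }
    # Verify the copy; --quiet prints only the files that fail.
    (cd "$dst" && sha512sum --quiet -c "$sums")
    status=$?
    rm -f "$sums"
    return $status
}
```

Note that a check run right after the copy is likely served from the page cache, so it says little about what actually reached the disk; repeating it after a reboot or remount exercises the on-disk data.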

First of all I thought that this would be a hardware issue. I
supposed the RAM could be damaged, because diff would probably use the
cached data from the files I had just copied.
So I ran extensive memory tests (memtest86+) for several hours/passes, but
no errors were found in my 4 GB of ECC/Reg RAM.
Then I supposed it could be a CPU-related problem (2x DualCore Opteron
275), so I started an mprime/GIMPS torture test on each core and let
it run for 16 hours with no errors at all.

Some days later I had the same or at least a very similar problem.
I copied, diffed, but this time there were _no_ differences.
I restarted the system (thus the file cache was cleared), diffed
again, and now there were differences!! This is another reason to believe
that the fault lies not in my RAM but in the writing to the FAT32 disk.

Ok,... the original files written by Windows/EAC seem to be ok and never
changed or corrupted. Why? First of all, the sha512sums are still equal,
but one could say, that the data was already damaged when calculating
those hashes.
But EAC stores internally a hash (some CRCxx) which is (afaik)
calculated from data from the RAM. So if the RAMs are ok (and I suppose
that because of my memtests) the hashes should be correct, too.

I compared those EAC hashes with the original data and all data seem to
be correct.

So as far as I can say, this is only a Linux/FAT32-related problem, as
the data written by Windows seems to be correct.
And as I've said, I'm pretty sure my hardware is OK, too.

The strange thing is that one time the differences were found directly
after copying (thus one would think RAM is damaged, because the data was
probably (I cannot tell this for sure) taken from file cache),
and the other time after restarting with a certainly empty file cache.


Any ideas? I'm willing to help with debugging and so on, but I must admit that
I need someone to tell me what to do :D

btw:
my system:
Debian sid (which should be unimportant)
kernel 2.6.18.2

For further data please ask :)


Thanks in advance,
Chris.



2006-11-07 18:55:21

by OGAWA Hirofumi

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer <[email protected]> writes:

> The strange thing is that one time the differences were found directly
> after copying (thus one would think RAM is damaged, because the data was
> probably (I cannot tell this for sure) taken from file cache),
> and the other time after restarting with a certainly empty file cache.
>
> Any ideas? I'm willing to help with debugging and so on, but I must admit that
> I need someone to tell me what to do :D

bit interesting. Could you send the output of diff? I'd like to see
how it's breaking.
--
OGAWA Hirofumi <[email protected]>

2006-11-07 21:32:31

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

OGAWA Hirofumi wrote:
> Christoph Anton Mitterer <[email protected]> writes:
>
>
>> The strange thing is that one time the differences were found directly
>> after copying (thus one would think RAM is damaged, because the data was
>> probably (I cannot tell this for sure) taken from file cache),
>> and the other time after restarting with a certainly empty file cache.
>>
>> Any ideas? I'm willing to help with debugging and so on, but I must admit that
>> I need someone to tell me what to do :D
>>
> bit interesting. Could you send the output of diff? I'd like to see
> how it's breaking.
>
Unfortunately I don't currently have any of the corrupted files (I deleted
them), but as soon as I encounter the issue again I'll send you the output :)

But as far as I remember there was no pattern: one time a small part
was replaced by 0x0's and the other time by arbitrary bytes.
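
One way to characterise such corruption, sketched here under the assumption that GNU `cmp` and a POSIX `awk` are available: `cmp -l` lists every differing byte with its 1-based offset, and the output can be condensed into a count and an offset range. The function name `summarize_diff` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: summarise where two files differ.
# cmp -l prints one line per differing byte: offset (decimal, 1-based)
# followed by the two byte values in octal.
summarize_diff() {
    cmp -l "$1" "$2" | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END {
            if (NR == 0) print "identical"
            else print NR " bytes differ, offsets " first "-" last
        }'
}
```

A short, contiguous offset range (for instance one that starts on a sector or page boundary) would hint at a different cause than bytes scattered across the whole file.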

Chris.

2006-11-09 18:32:50

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

OGAWA Hirofumi wrote:
> Christoph Anton Mitterer <[email protected]> writes:
>
>
>> The strange thing is that one time the differences were found directly
>> after copying (thus one would think RAM is damaged, because the data was
>> probably (I cannot tell this for sure) taken from file cache),
>> and the other time after restarting with a certainly empty file cache.
>>
>> Any ideas? I'm willing to help with debugging and so on, but I must admit that
>> I need someone to tell me what to do :D
>>
>
> bit interesting. Could you send the output of diff? I'd like to see
> how it's breaking.
>
OK, today I have perhaps some more information:
I've copied around 30 GB from FAT32 to ext3.
I diffed everything and got differences in one file. I recopied that one
file, rebooted, and diffed again: differences in another file:
euler:~# diff -q -r /mnt/tmp/CDDA_DATA_1 /mnt/CDDA/EAC_DATA_1
Files /mnt/tmp/CDDA_DATA_1/LOTR 1/16.01.wav and /mnt/CDDA/EAC_DATA_1/LOTR 1/16.01.wav differ

Then, after the complete diff had finished, I diffed the single file again:
euler:~# diff /mnt/tmp/CDDA_DATA_1/LOTR\ 1/16.01.wav /mnt/CDDA/EAC_DATA_1/LOTR\ 1/16.01.wav
=> this time, no differences?!
Am I crazy or what?

Is this now a memory problem? But if so, why does memtest give me no
errors?

Regards,
Chris.

2006-11-09 20:45:31

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> Christoph,
>
> Install the edac_mc module, and make sure through the
> sysctl command that pci parity checking is enabled.
>
> I have seen pci parity errors produce this sort of result,
> i.e. make 100 identical 50MB files and cksum them, and one
> will be wrong; do it again, and the "wrong" one is now
> right, but some other one is "wrong".
Ah, thanks. Is it in the vanilla kernel?
And do you know what consequences this issue can have? When I just
read data (see my original report about FAT32), is it possible that the
data has been modified or damaged?
Or is the only consequence that diff errors occur?

And what is responsible for those parity errors? Is it possible that some
hardware is damaged?

Thanks,
Chris.



2006-11-09 21:02:12

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> The failure can manifest itself in many ways, I have
>> only seen it as a read failure, but there should be no
>> reason why it cannot also show as a write failure.
>>
>> It should be in the later vanilla kernels, it won't
>> be in the earlier ones, I would do a
>> find /lib/modules -name "*edac*" -ls
>>
>> It is a hw issue: either something is running faster than
>> it should be (pci bus set too fast for the given hardware/config)
>> or something is broken.
> The strange thing is that it always occurs on the copied data, not
> the original (which is on another disk). But wouldn't those parity errors
> occur in general?
> For example, all my sha1sum -c sumfile checks work correctly on the
> original disk :/

It depends on which PCI bus has the issue and which hardware
is using the bus with the issue.

There are several different buses in most machines, and they are
broken out in different ways, so the error may affect only one
or two devices on a certain part of the bus.

Roger

2006-11-09 20:54:35

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)


Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> Christoph,
>>
>> Install the edac_mc module, and make sure through the
>> sysctl command that pci parity checking is enabled.
>>
>> I have seen pci parity errors produce this sort of result,
>> i.e. make 100 identical 50MB files and cksum them, and one
>> will be wrong; do it again, and the "wrong" one is now
>> right, but some other one is "wrong".
> Ah thx,... is it in the vanilla kernel?
> And do you know of any possible results that this issue has? When I just
> read data (see my original stuff with fat32) is it possible that this
> had been modified or damaged?
> Or are the only consequences that diff errors occur?
>
> And what is responsible for that parity errors? Is it possible that any
> hardware is damaged?

The failure can manifest itself in many ways, I have
only seen it as a read failure, but there should be no
reason why it cannot also show as a write failure.

It should be in the later vanilla kernels, it won't
be in the earlier ones, I would do a
find /lib/modules -name "*edac*" -ls

It is a hw issue: either something is running faster than
it should be (pci bus set too fast for the given hardware/config)
or something is broken.
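
The repeated-checksum test quoted above (many identical files, then cksum each) could be sketched roughly as follows, assuming GNU `dd` and coreutils. The function name `make_and_check` and the `file.N` naming are hypothetical, and the parameters are only illustrative:

```shell
#!/bin/sh
# Hypothetical reproduction helper: write COUNT identical files of
# SIZE_MB megabytes each, then report every file whose cksum differs
# from that of the first file.
make_and_check() {
    dir=$1
    count=$2
    size_mb=$3
    mkdir -p "$dir" || return 1
    # One reference file of pseudo-random data, then copies of it.
    dd if=/dev/urandom of="$dir/file.1" bs=1M count="$size_mb" 2>/dev/null || return 1
    i=2
    while [ "$i" -le "$count" ]; do
        cp "$dir/file.1" "$dir/file.$i" || return 1
        i=$((i + 1))
    done
    # Reading via stdin makes cksum print only "CRC size", without a name.
    ref=$(cksum < "$dir/file.1")
    bad=0
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "$(cksum < "$dir/file.$i")" != "$ref" ]; then
            echo "MISMATCH: $dir/file.$i"
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad of $count files differ"
    [ "$bad" -eq 0 ]
}
```

For example, `make_and_check /mnt/tmp/cktest 100 50` would mirror the 100 x 50MB case. With small data sets the reads are probably served from the page cache; to exercise the I/O path, the total size should exceed RAM, or the cache should be emptied first (as root, `sync; echo 3 > /proc/sys/vm/drop_caches` on kernels that provide it).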

Roger

2006-11-09 20:59:11

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> The failure can manifest itself in many ways, I have
> only seen it as a read failure, but there should be no
> reason why it cannot also show as a write failure.
>
> It should be in the later vanilla kernels, it won't
> be in the earlier ones, I would do a
> find /lib/modules -name "*edac*" -ls
>
> It is a hw issue: either something is running faster than
> it should be (pci bus set too fast for the given hardware/config)
> or something is broken.
The strange thing is that it always occurs on the copied data, not
the original (which is on another disk). But wouldn't those parity errors
occur in general?
For example, all my sha1sum -c sumfile checks work correctly on the
original disk :/



2006-11-09 21:03:45

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> The failure can manifest itself in many ways, I have
>> only seen it as a read failure, but there should be no
>> reason why it cannot also show as a write failure.
>>
>> It should be in the later vanilla kernels, it won't
>> be in the earlier ones, I would do a
>> find /lib/modules -name "*edac*" -ls
>>
>> It is a hw issue: either something is running faster than
>> it should be (pci bus set too fast for the given hardware/config)
>> or something is broken.
> The strange thing is that it always occurs on the copied data, not
> the original (which is on another disk). But wouldn't those parity errors
> occur in general?
> For example, all my sha1sum -c sumfile checks work correctly on the
> original disk :/

Are both disks of the same type and connected to the same
hardware?

Or do they have different physical connections/drivers to the
machine?

Roger

2006-11-09 21:11:13

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> Are both disks of the same type and connected to the same
> hardware?
>
> Or do they have different physical connections/drivers to the
> machine?

The system has 2 DualCore Opteron 275s on a Tyan S2895 board.
The disk with the original data is a PATA disk from IBM.
The disk I copied the stuff to is a SATA disk.

I did several diffs between the two disks over the last hours and experienced
what you described: sometimes there are no differences, sometimes there are
differences (in different files).

But note that the same already happened on the SAME disk.
In the beginning I copied the data to another place on the same disk,
then diffed, and there were the same problems.
So I still wonder why this never affects the original files. When I
check sha512sums there I never get an error.


Right now I'm compiling a new kernel with that module... and praying to God
that this is not a hardware error :/



2006-11-09 21:57:57

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

It seems that I don't get any data at all.
I only get the edac_mc module but none that seems to support my chipset
or so...
Any ideas?



2006-11-09 22:02:37

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer wrote:
> It seems that I don't get any data at all.
> I only get the edac_mc module but none that seems to support my chipset
> or so...
> Any ideas?

The mc part does pci parity; it is separate from the
chipset driver. I have even used the _mc part on an
Itanium with no chipset driver at all and had it report
parity errors properly, so I expect just the mc driver
to work.

You would need the k8 module for the cpu, but that is
only if you want ECC checking also.

If you have the _mc module loaded, do a "sysctl -a | grep mc" and
see how things are set, and reset
check_pci_parity to 1 if necessary.

Roger

2006-11-09 22:08:13

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> The mc part does pci parity, it is separate from the
> chipset driver,
What? I thought the MC part does ECC and the pci part the parity stuff?

> I have even used the _mc part on a
> Itanium with no chipset driver at all and had it report
> parity errors properly, so I expect just the mc driver
> to work.
>
> You would need the k8 module for the cpu, but that is
> only if you want ECC checking also.
>
Where do I get this? Only by patching from CVS?

> If you got the _mc loaded do a "sysctl -a | grep mc" and
> see what things are set how, and reset if necessary
> check_pci_parity to 1.
Well ok,.. module is loaded now:
I've set check_pci_parity to 1 everything else is 0 in sysfs...


# sysctl -a | grep mc
error: "Operation not permitted" reading key "net.ipv6.route.flush"
net.ipv6.neigh.eth1.mcast_solicit = 3
net.ipv6.neigh.eth0.mcast_solicit = 3
net.ipv6.neigh.lo.mcast_solicit = 3
net.ipv6.neigh.default.mcast_solicit = 3
net.ipv4.conf.ppp0.mc_forwarding = 0
net.ipv4.conf.eth1.mc_forwarding = 0
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.neigh.ppp0.mcast_solicit = 3
net.ipv4.neigh.eth1.mcast_solicit = 3
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.default.mcast_solicit = 3
error: "Operation not permitted" reading key "net.ipv4.route.flush"
error: "Invalid argument" reading key "fs.binfmt_misc.register"


But this has nothing to do with edac, has it?

And I've already had diff errors again,..
so if there had been some parity issue it should have been logged, right?



2006-11-09 22:14:36

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> The mc part does pci parity, it is separate from the
>> chipset driver,
> What? I thought the MC part does ECC and the pci part the parity stuff?
>

mc does pci parity all by itself, it is also the main module
holding the ecc stuff together, but you get no ecc without the
chipset/cpu specific module.

>> I have even used the _mc part on a
>> Itanium with no chipset driver at all and had it report
>> parity errors properly, so I expect just the mc driver
>> to work.
>>
>> You would need the k8 module for the cpu, but that is
>> only if you want ECC checking also.
>>
> Where do I get this only when patching from CVS?

I don't know what the status of the k8 modules is; some
distro kernels include them, but I don't know if vanilla has
them yet.

mcelog should also report ecc errors, but you would need
to be running the mcelog userspace program every so often
to realize that errors were happening.

>
>> If you got the _mc loaded do a "sysctl -a | grep mc" and
>> see what things are set how, and reset if necessary
>> check_pci_parity to 1.
> Well ok,.. module is loaded now:
> I've set check_pci_parity to 1 everything else is 0 in sysfs...
>
>
> # sysctl -a | grep mc
> error: "Operation not permitted" reading key "net.ipv6.route.flush"
> net.ipv6.neigh.eth1.mcast_solicit = 3
> net.ipv6.neigh.eth0.mcast_solicit = 3
> net.ipv6.neigh.lo.mcast_solicit = 3
> net.ipv6.neigh.default.mcast_solicit = 3
> net.ipv4.conf.ppp0.mc_forwarding = 0
> net.ipv4.conf.eth1.mc_forwarding = 0
> net.ipv4.conf.eth0.mc_forwarding = 0
> net.ipv4.conf.lo.mc_forwarding = 0
> net.ipv4.conf.default.mc_forwarding = 0
> net.ipv4.conf.all.mc_forwarding = 0
> net.ipv4.neigh.ppp0.mcast_solicit = 3
> net.ipv4.neigh.eth1.mcast_solicit = 3
> net.ipv4.neigh.eth0.mcast_solicit = 3
> net.ipv4.neigh.lo.mcast_solicit = 3
> net.ipv4.neigh.default.mcast_solicit = 3
> error: "Operation not permitted" reading key "net.ipv4.route.flush"
> error: "Invalid argument" reading key "fs.binfmt_misc.register"
>
>
> But this has nothing to do with edac, has it?
>
> And I've already had diff errors again,..
> so if there had been some parity issue it should have been logged, right?

The names and locations may have changed. I am more
familiar with the older versions that had the sysctl stuff
in them; the new parts may not have the sysctl stuff,
but if you make the adjustment through the /sys filesystem,
that should work just fine.
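
A small sketch of checking those settings through /sys instead of sysctl. The default directory below is an assumption based on the EDAC documentation of that era and may differ between kernel versions, so the helper (hypothetically named `show_pci_parity`) accepts another directory as its argument:

```shell
#!/bin/sh
# Assumed default location of the EDAC PCI control files; treat it as a
# guess and pass a different directory if your kernel places them elsewhere.
EDAC_PCI_DIR=${EDAC_PCI_DIR:-/sys/devices/system/edac/pci}

# Hypothetical helper: print the parity-check switch and the error counter.
show_pci_parity() {
    dir=${1:-$EDAC_PCI_DIR}
    for f in check_pci_parity pci_parity_count; do
        if [ -r "$dir/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s: not found under %s\n' "$f" "$dir"
        fi
    done
}
```

Enabling the check would then be a matter of writing 1 to the `check_pci_parity` file as root; a `pci_parity_count` that stays at 0 while corruption keeps occurring suggests the errors are not being caught as PCI parity errors.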

Roger

2006-11-09 22:24:14

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> Christoph Anton Mitterer wrote:
>
>> Roger Heflin wrote:
>>
>>> The mc part does pci parity, it is separate from the
>>> chipset driver,
>>>
>> What? I thought the MC part does ECC and the pci part the parity stuff?
>>
>>
>
> mc does pci parity all by itself, it is also the main module
> holding the ecc stuff together, but you get no ecc without the
> chipset/cpu specific module.
>
>
>>> I have even used the _mc part on a
>>> Itanium with no chipset driver at all and had it report
>>> parity errors properly, so I expect just the mc driver
>>> to work.
>>>
>>> You would need the k8 module for the cpu, but that is
>>> only if you want ECC checking also.
>>>
>>>
>> Where do I get this only when patching from CVS?
>>
>
> I don't know the status is of the k8 modules, some
> distro kernels include it, I don't know if vanilla has
> it yet.
>
> mcelog should also report ecc errors, but you would need
> to be running the mcelog userspace program every so often
> to realize that errors where happening.
>
>
>>> If you got the _mc loaded do a "sysctl -a | grep mc" and
>>> see what things are set how, and reset if necessary
>>> check_pci_parity to 1.
>>>
>> Well ok,.. module is loaded now:
>> I've set check_pci_parity to 1 everything else is 0 in sysfs...
>>
>>
>> # sysctl -a | grep mc
>> error: "Operation not permitted" reading key "net.ipv6.route.flush"
>> net.ipv6.neigh.eth1.mcast_solicit = 3
>> net.ipv6.neigh.eth0.mcast_solicit = 3
>> net.ipv6.neigh.lo.mcast_solicit = 3
>> net.ipv6.neigh.default.mcast_solicit = 3
>> net.ipv4.conf.ppp0.mc_forwarding = 0
>> net.ipv4.conf.eth1.mc_forwarding = 0
>> net.ipv4.conf.eth0.mc_forwarding = 0
>> net.ipv4.conf.lo.mc_forwarding = 0
>> net.ipv4.conf.default.mc_forwarding = 0
>> net.ipv4.conf.all.mc_forwarding = 0
>> net.ipv4.neigh.ppp0.mcast_solicit = 3
>> net.ipv4.neigh.eth1.mcast_solicit = 3
>> net.ipv4.neigh.eth0.mcast_solicit = 3
>> net.ipv4.neigh.lo.mcast_solicit = 3
>> net.ipv4.neigh.default.mcast_solicit = 3
>> error: "Operation not permitted" reading key "net.ipv4.route.flush"
>> error: "Invalid argument" reading key "fs.binfmt_misc.register"
>>
>>
>> But this has nothing to do with edac, has it?
>>
>> And I've already had diff errors again,..
>> so if there had been some parity issue it should have been logged, right?
>>
>
> The names and locations may have change, I am more
> familiar with the older versions that had the sysctl stuff
> in them, the new parts may not have the sysctl stuff,
> but if you make the adjustment with the /sys filesystem,
> that should work just fine.
>
Ahh now I see:
Parity Count:

'pci_parity_count'

This attribute file will display the number of parity errors that
have been detected.


but this is zero ...
So would that mean that I don't have any parity errors?

btw: I'm still always getting diff errors at different files...

Chris.



2006-11-09 22:23:30

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

OGAWA Hirofumi wrote:
> Christoph Anton Mitterer <[email protected]> writes:
>
>
>> The strange thing is that one time the differences were found directly
>> after copying (thus one would think RAM is damaged, because the data was
>> probably (I cannot tell this for sure) taken from file cache),
>> and the other time after restarting with a certainly empty file cache.
>>
>> Any ideas? I'm willing to help with debugging and so on, but I must admit that
>> I need someone to tell me what to do :D
>>
>
> bit interesting. Could you send the output of diff? I'd like to see
> how it's breaking.
>
I have now such a diff,... but where should I send it,.. it's quite big
(21266 bytes)

Regards,
Chris.



2006-11-09 22:35:13

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer wrote:
> Ahh now I see:
> Parity Count:
>
> 'pci_parity_count'
>
> This attribute file will display the number of parity errors that
> have been detected.
>
>
> but this is zero ...
> So would that mean that I don't have any parity errors?
>
> btw: I'm still always getting diff errors at different files...
>
> Chris.
>

That should mean that it is not a HW pci bus issue, though I
have still seen odd MB failures that cause corruption and don't
show up anywhere (pci, ecc, mcelog), and only show up with cksums
on specific pieces of hw.

I don't have any good way of finding those; we swapped one part
at a time until it quit doing it.

Roger

2006-11-09 22:38:49

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> That should mean that it is not a HW pci bus issue, though I
> still have seen odd MB failures that cause corruption and don't
> show anywhere (pci, ecc, mcelog), and only show up with cksums
> on specific pieces of hw.
>
> I don't have any good way of find those, we swapped one part
> at a time until it went quit doing it.
Would those errors also occur when just calculating message digests
(sha1sum)? Because if so, I could exclude those types of errors for my
issue, because as I've said, at least on the original files the sha
sums are always correct.

Regards,
Chris.



2006-11-09 22:42:50

by Roger Heflin

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> That should mean that it is not a HW pci bus issue, though I
>> still have seen odd MB failures that cause corruption and don't
>> show anywhere (pci, ecc, mcelog), and only show up with cksums
>> on specific pieces of hw.
>>
>> I don't have any good way of find those, we swapped one part
>> at a time until it went quit doing it.
> Would those errors also occur when just calculating message digests
> (sha1sum)? Because if so,.. I could exclude those types of errors for my
> issue because as I've told,.. at least on the original files the sha
> sums always are correct.
>
> Regards,
> Chris.

Usually it seemed to be IO related; the sums just happened
to show the issue. It did not seem to be a cpu issue;
something unknown outside of the cpu seemed to cause it.

Roger

2006-11-10 00:45:21

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Roger Heflin wrote:
> Usually it seemed to be IO related, the sums just happened
> to show it issue. It did not seem to be a cpu issue,
> something unknown outside of the cpu seemed to cause it.
>
OK, as this is obviously not FAT32 related (I just tested the whole
thing on ext3), I'll open a new thread to hopefully attract more people
for help :-)

btw: right now I'm going to try the whole thing with edac_mc plus
ECC support for K8.
mcelog did not return anything at all (it just quit silently).

Thanks so far and regards,
Chris.



2006-11-10 01:49:31

by OGAWA Hirofumi

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Christoph Anton Mitterer <[email protected]> writes:

> OGAWA Hirofumi wrote:
>> Christoph Anton Mitterer <[email protected]> writes:
>>
>>> The strange thing is that one time the differences were found directly
>>> after copying (thus one would think RAM is damaged, because the data was
>>> probably (I cannot tell this for sure) taken from file cache),
>>> and the other time after restarting with a certainly empty file cache.
>>>
>>> Any ideas? I'm willing to help with debugging and so on, but I must admit that
>>> I need someone to tell me what to do :D
>>>
>>
>> bit interesting. Could you send the output of diff? I'd like to see
>> how it's breaking.
>>
> I have now such a diff,... but where should I send it,.. it's quite big
> (21266 bytes)

I think it's not so big. If you care, please send it to me.
--
OGAWA Hirofumi <[email protected]>

2006-11-10 02:55:43

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

OGAWA Hirofumi wrote:
>> I have now such a diff,... but where should I send it,.. it's quite big
>> (21266 bytes)
>>
>
> I think it's not so big. If you care, please send it to me.
>
Sorry this must wait until monday,... I'm away over the weekend (but I'm
available via email) and the file I had got lost.... but I will "create"
a new one monday.
Regards,
Chris.



2006-11-10 10:28:12

by Alan

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

On Thu, 2006-11-09 at 23:08 +0100, Christoph Anton Mitterer wrote:
> And I've already had diff errors again,..
> so if there had been some parity issue it should have been logged, right?

If it was a PCI side parity error yes. If you have dodgy memory then the
K8 will MCE and report that if the MCE code is loaded. If the memory is
non ECC or the CPU doesn't support ECC memory you'll get silent strange
behaviour, but a long run of memtest86 can usually find any main memory
problems.

Alan

2006-11-11 16:01:47

by Christoph Anton Mitterer

Subject: Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)

Alan Cox wrote:
> If it was a PCI side parity error yes. If you have dodgy memory then the
> K8 will MCE and report that if the MCE code is loaded. If the memory is
> non ECC or the CPU doesn't support ECC memory you'll get silent strange
> behaviour, but a long run of memtest86 can usually find any main memory
> problems.
>
> Alan
>
Dear Alan,
The memory has ECC, and neither EDAC_MC with K8 support, nor mcelog (I
even tried compiling in both the AMD and Intel MCE support), nor memtest
shows me any errors.

Please have a look at my "new" post. As this is definitely not FAT32
related, I posted the whole thing under a new thread (I thought that would
be the correct way).
There you'll also find my latest results.

Thanks in advance for any further help :-)

Chris.