2004-01-14 16:40:56

by Samium Gromoff

Subject: Re: Something corrupts raid5 disks slightly during reboot


I know this sounds stupid, but anyway:

I have seen the very same symptom caused by RAM faults (RAM too slow
for the given clocks, to be exact).

And yes, gzipping/gunzipping a gigabyte of /dev/random data didn't show
a CRC error.
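
For reference, a rough sketch of that kind of round-trip check. It is not the
exact test that was run: it assumes Python, uses os.urandom in place of
/dev/random, and works in 1 MiB chunks rather than one gigabyte-sized stream.
The function name and the chunk size are made up for illustration.

```python
import gzip
import os
import zlib


def roundtrip_ok(size_bytes: int, chunk: int = 1 << 20) -> bool:
    """Compress and decompress random data chunk by chunk.

    Returns False as soon as a chunk fails to decompress (for example on a
    gzip CRC error) or comes back different from what went in.
    """
    remaining = size_bytes
    while remaining > 0:
        n = min(chunk, remaining)
        data = os.urandom(n)  # stand-in for reading /dev/random
        try:
            restored = gzip.decompress(gzip.compress(data))
        except (OSError, EOFError, zlib.error):
            # A bad CRC or damaged stream surfaces as one of these.
            return False
        if restored != data:
            return False
        remaining -= n
    return True
```

Calling roundtrip_ok(1 << 30) would cover roughly the gigabyte mentioned
above. A clean pass only shows that this particular workload did not trip
over the fault.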

That was an i845 chipset, by the way...

regards, Samium Gromoff



2004-01-14 22:31:02

by Ville Herva

Subject: Re: Something corrupts raid5 disks slightly during reboot

On Wed, Jan 14, 2004 at 07:39:37PM +0300, you [Samium Gromoff] wrote:
>
> I know this sounds stupid, but anyway:
>
> I have seen the very same symptome caused by RAM faults (too slow ram
> for given clocks, to be exact).

The very same? You mean that if you booted, wrote a few kB of data to disk,
synced, then pressed reset, the same three bytes were corrupted (set to zero)
each time after reboot?

I can buy the faulty RAM explanation for many symptoms, but somehow in this
case it seems very unlikely. The box can be doing its thing (backing up more
than 20 workstations onto 6 IDE disks) for weeks without ever corrupting
anything, and then when I power it down and up (after manually raidstopping
and umounting), three bytes get corrupted. (Well, sometimes a few bytes in
addition to the three, but usually just three.)
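
One way to pin down which bytes change is to compare a snapshot of the
affected region taken before the power cycle with one taken afterwards. The
following is only a sketch under assumptions: it is in Python, it expects the
two snapshots as equal-length byte strings already read from disk, and the
function names are hypothetical, not part of any RAID tooling.

```python
from typing import List


def differing_offsets(before: bytes, after: bytes) -> List[int]:
    """Return the offsets at which the two snapshots differ."""
    if len(before) != len(after):
        raise ValueError("snapshots must be the same length")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def zeroed_offsets(before: bytes, after: bytes) -> List[int]:
    """Return the differing offsets whose byte is zero in the later snapshot."""
    return [i for i in differing_offsets(before, after) if after[i] == 0]
```

If the same offsets come back from zeroed_offsets after every power cycle,
that matches the "same three bytes set to zero" pattern described above.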

> Yes gzipping/gunzipping a gigabyte of /dev/random data didn`t show up a
> crc error.

The box does survive memtest, but you're right that doesn't prove anything.

> That was an i845 chipset, by the way...

This is i815.


-- v --


2004-01-15 12:44:14

by Samium Gromoff

Subject: Re: Something corrupts raid5 disks slightly during reboot

At Thu, 15 Jan 2004 00:30:40 +0200,
Ville Herva wrote:
>
> On Wed, Jan 14, 2004 at 07:39:37PM +0300, you [Samium Gromoff] wrote:
> >
> > I know this sounds stupid, but anyway:
> >
> > I have seen the very same symptome caused by RAM faults (too slow ram
> > for given clocks, to be exact).
>
> The very same? You mean if booted, wrote few kB's of data to disk, synced,
> then pressed reset, the same three bytes were corrupted (set to zero) each
> time after reboot?

No, just corruption after reboot and flawless operation in between.

> [email protected]

regards, Samium Gromoff


2004-01-15 19:58:14

by Ville Herva

Subject: Re: Something corrupts raid5 disks slightly during reboot

On Thu, Jan 15, 2004 at 03:42:41PM +0300, you [Samium Gromoff] wrote:
> At Thu, 15 Jan 2004 00:30:40 +0200,
> Ville Herva wrote:
> >
> > On Wed, Jan 14, 2004 at 07:39:37PM +0300, you [Samium Gromoff] wrote:
> > >
> > > I know this sounds stupid, but anyway:
> > >
> > > I have seen the very same symptome caused by RAM faults (too slow ram
> > > for given clocks, to be exact).
> >
> > The very same? You mean if booted, wrote few kB's of data to disk, synced,
> > then pressed reset, the same three bytes were corrupted (set to zero) each
> > time after reboot?
>
> No, corruption after reboot and perfect work inbetween.

Very strange. And you got rid of it by replacing the memory?

Any theories on how faulty memory could actually cause something like this?
A bad spot in memory in an area where the BIOS code is cached, and which is
hence never used apart from running the BIOS startup (not even by memtest86)?


-- v --


2004-01-16 10:25:49

by Samium Gromoff

Subject: Re: Something corrupts raid5 disks slightly during reboot

At Thu, 15 Jan 2004 21:57:58 +0200,
Ville Herva wrote:
>
> On Thu, Jan 15, 2004 at 03:42:41PM +0300, you [Samium Gromoff] wrote:
> > At Thu, 15 Jan 2004 00:30:40 +0200,
> > Ville Herva wrote:
> > >
> > > On Wed, Jan 14, 2004 at 07:39:37PM +0300, you [Samium Gromoff] wrote:
> > > >
> > > > I know this sounds stupid, but anyway:
> > > >
> > > > I have seen the very same symptome caused by RAM faults (too slow ram
> > > > for given clocks, to be exact).
> > >
> > > The very same? You mean if booted, wrote few kB's of data to disk, synced,
> > > then pressed reset, the same three bytes were corrupted (set to zero) each
> > > time after reboot?
> >
> > No, corruption after reboot and perfect work inbetween.
>
> Very strange. And you got rid of it by replacing the memory?

Yeah.

> Any theories on how faulty memory could actually cause something like this?
> A bad spot in memory on an area where the bios code is cached, and hence is
> never used apart from running the bios startup (not even by memtest86)?

No idea, really :-)

> -- v --
>
> [email protected]

regards, Samium Gromoff