2007-05-04 15:30:30

by Bernd Schubert

Subject: mkfs.ext2 triggerd RAM corruption

Hi,

I'm presently rather puzzled; if this is really a kernel bug, it's a big bug.

Summary: The system ramdisk (initrd) gets corrupted while running mkfs.ext2 on
a local sata disk partition.

Reproduced on kernel versions: vanilla 2.6.16 - 2.6.20 (<2.6.16 doesn't run on
any of the systems I can do tests with).
Please note: I could reproduce this on several systems; all of them use ECC
memory, and on most of them the memory is monitored using EDAC.

Details:

1.) Our systems boot from an initrd, all system services are running from the
initrd/ramdisk.

2.) While setting up a lustre meta data storage server, lustre runs
mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/sda4
(Please note, I first observed this while using a lustre patched kernel, but I
could reproduce this with vanilla kernels).


While this mkfs.ext2 command was running, suddenly running commands such as
ps, top, ls, etc. resulted in segmentation faults.

To see what's going on, I copied the entire / (i.e. the initrd) into a tmpfs
root, chrooted into it, bind mounted the original / into this chroot as well, and
repeatedly compared the chroot's /bin with the bind-mounted /oldroot/bin while the
mkfs.ext2 command was running.

beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/sleep and /oldroot/bin/sleep differ
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
Binary files /bin/cat and /oldroot/bin/cat differ
...
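
The repeated `diff -r` runs above can also be scripted so that every pass reports which files changed. The following is only a minimal sketch, assuming a pristine copy of the tree in one directory and the suspect ramdisk tree in another; the names `file_digest` and `changed_files` are illustrative and not from the original post.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(reference_dir, suspect_dir):
    """Return relative paths of regular files that differ between the trees.

    Files present in reference_dir but missing from suspect_dir are
    reported too. Symlinks and special files are skipped.
    """
    changed = []
    for root, _dirs, files in os.walk(reference_dir):
        for name in files:
            ref_path = os.path.join(root, name)
            if os.path.islink(ref_path) or not os.path.isfile(ref_path):
                continue
            rel_path = os.path.relpath(ref_path, reference_dir)
            sus_path = os.path.join(suspect_dir, rel_path)
            if (not os.path.isfile(sus_path)
                    or file_digest(ref_path) != file_digest(sus_path)):
                changed.append(rel_path)
    return sorted(changed)
```

Calling `changed_files("/bin", "/oldroot/bin")` in a loop while mkfs.ext2 runs would correspond to the session shown above; an empty list means no difference was seen on that pass.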

I also tested different I/O schedulers; it happens at least with deadline and
anticipatory.

The corruption does NOT happen when running the mkfs command on /dev/sda1, but
happens with sda2, sda3 and sda4. It also doesn't happen with logical
partitions inside an extended sda1.

Any idea what's going on?


Thanks,
Bernd


--
Bernd Schubert
Q-Leap Networks GmbH


2007-05-04 20:40:14

by Jan-Benedict Glaw

Subject: Re: mkfs.ext2 triggerd RAM corruption

On Fri, 2007-05-04 16:59:51 +0200, Bernd Schubert wrote:
> To see whats going on, I copied the entire / (so the initrd) into a tmpfs
> root, chrooted into it, also bind mounted the main / into this chroot and
> compared several times /bin of chroot/bin and the bind-mounted /bin while the
> mkfs.ext2 command was running.
>
> beo-05:/# diff -r /bin /oldroot/bin/
> beo-05:/# diff -r /bin /oldroot/bin/
> beo-05:/# diff -r /bin /oldroot/bin/
> Binary files /bin/sleep and /oldroot/bin/sleep differ
> beo-05:/# diff -r /bin /oldroot/bin/
> Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
> Binary files /bin/cat and /oldroot/bin/cat differ
> ...
>
> Also tested different schedulers, at least happens with deadline and
> anticipatory.
>
> The corruption does NOT happen on running the mkfs command on /dev/sda1, but
> happens with sda2, sda3 and sda3. Also doesn't happen with extended
> partitions of sda1.

Is sda2 the largest filesystem out of sda2, sda3 (and the logical
partitions within the extended sda1, if these get mkfs'ed, too)?

I'm not too sure that this is a kernel bug, but probably a bad RAM
chip. Did you run memtest86 for a while? ...and can you reproduce this
problem on different machines?

MfG, JBG

--
Jan-Benedict Glaw +49-172-7608481
Signature of: Friends are relatives you make for yourself.
the second :



2007-05-05 00:42:31

by Theodore Ts'o

Subject: Re: mkfs.ext2 triggerd RAM corruption

On Fri, May 04, 2007 at 04:59:51PM +0200, Bernd Schubert wrote:
>
> I'm presently rather puzzled, if this is really a kernel bug, its a big bug.
>
> Summary: The system ramdisk (initrd) gets corrupted while running
> mkfs.ext2 on a local sata disk partition.

What distribution are you using? What's the hardware configuration,
including amount of memory? What does the partition table look
like for /dev/sda? What filesystems are mounted? If you have any
soft RAID partitions, are any of them using part of /dev/sda? What
swap partitions are you using? And do any of the swap partitions
overlap with /dev/sda? :-)

- Ted

2007-05-05 01:36:42

by Bernd Schubert

Subject: Re: mkfs.ext2 triggerd RAM corruption

Theodore Tso wrote:

> On Fri, May 04, 2007 at 04:59:51PM +0200, Bernd Schubert wrote:
>>
>> I'm presently rather puzzled, if this is really a kernel bug, its a big bug.
>>
>> Summary: The system ramdisk (initrd) gets corrupted while running
>> mkfs.ext2 on a local sata disk partition.
>
> What distribution are you using? What's the hardware configuration,

distribution: modified Debian sarge. In which respect is the distribution
important for this problem? mkfs.ext2 is supposed to write to /dev/sdaX
and not /dev/rd/0. Stracing it and grepping for open calls shows that
only /dev/sdaX is opened in read-write mode.
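
The "strace and grep for open calls" check can be sketched as a small filter over saved strace output. This is an illustration only: the function name `writable_opens` is made up, and the log is assumed to use the usual `open("path", FLAGS) = fd` line format, optionally with a leading PID as `strace -f` prints it.

```python
import re

# Matches open()/openat() lines as printed by strace, e.g.
#   open("/dev/sda4", O_RDWR) = 3
#   openat(AT_FDCWD, "/etc/mtab", O_RDONLY) = 4
OPEN_CALL = re.compile(
    r'^(?:\d+\s+)?open(?:at)?\((?:[A-Z_]+,\s*)?'
    r'"(?P<path>[^"]+)",\s*(?P<flags>[A-Z_|0-9]+)'
    r'.*\)\s*=\s*(?P<ret>-?\d+)'
)


def writable_opens(strace_lines):
    """Return sorted paths that were successfully opened for writing."""
    paths = set()
    for line in strace_lines:
        match = OPEN_CALL.match(line.strip())
        if match is None or int(match.group("ret")) < 0:
            continue  # not an open call, or the call failed
        flags = match.group("flags").split("|")
        if "O_RDWR" in flags or "O_WRONLY" in flags:
            paths.add(match.group("path"))
    return sorted(paths)
```

If the only path returned is the target partition, mkfs itself did not open the ramdisk device for writing, which is the point made above.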

hardware:
beo-05 and beo-06: cpu: xeon, acpi shows S3000PTH board, memory 2GB
(board too new for EDAC), piix sata controller

beo-106: Dual Core AMD Opteron, no idea what kind of board, 4GB memory
(k8_edac monitored), nforce sata controller

beo-01: Presently can't connect to it, afaik another intel system

(all systems are running in x86_64 mode)


> including amount of memory? What is the partition table look
> like for /dev/sda? What filesystems are mounted? If you have any

I already tested several partition types, e.g. something like this for a
test on sda3

beo-05:~# sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start= 63, size= 4208967, Id=83
/dev/sda2 : start= 4209030, size= 4209030, Id=83
/dev/sda3 : start= 8418060, size=313251435, Id=83
/dev/sda4 : start= 0, size= 0, Id= 0
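
As a quick sanity check of the table above, the start and size values can be verified for overlaps and converted to sizes. This is a small sketch; the 512-byte sector size is an assumption (sfdisk's "unit: sectors" on disks of that era), and the helper names are illustrative.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors

# (start sector, size in sectors), copied from the sfdisk dump above
PARTITIONS = {
    "sda1": (63, 4208967),
    "sda2": (4209030, 4209030),
    "sda3": (8418060, 313251435),
}


def overlapping_pairs(partitions):
    """Return pairs of partition names whose sector ranges overlap."""
    ordered = sorted(partitions.items(), key=lambda item: item[1][0])
    pairs = []
    for (name_a, (start_a, size_a)), (name_b, (start_b, _)) in zip(
            ordered, ordered[1:]):
        if start_a + size_a > start_b:
            pairs.append((name_a, name_b))
    return pairs


def size_gib(size_sectors):
    """Convert a size in sectors to GiB."""
    return size_sectors * SECTOR_BYTES / 2**30


for name, (start, size) in PARTITIONS.items():
    print(f"{name}: sectors {start}..{start + size - 1}, "
          f"{size_gib(size):.2f} GiB")
print("overlaps:", overlapping_pairs(PARTITIONS))
```

With these numbers each partition begins exactly where the previous one ends, so the layout itself has no overlap; sda3 comes out at roughly 149 GiB.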


For the tests nothing was mounted.


> soft RAID partitions, are any of them using part of /dev/sda? What

No raid during the tests on sda, of course.
When sdaX was part of a RAID and I ran the test on the RAID device, the
corruption did NOT happen.

> swap partitions are you using? And do any of the swap partitions

Swap already entirely disabled.

> overlap with /dev/sda? :-)

I suspected this at first too, but the tested partition was never used as a
swap partition (at first I always tested on sda4, while sda2 was used for swap);
later I disabled swap entirely.

Thanks,
Bernd


PS: It took me about 10 hours of testing before I wrote the first mail.
It took me that long to believe that it's really a kernel bug.

2007-05-05 01:38:23

by Bernd Schubert

Subject: Re: mkfs.ext2 triggerd RAM corruption

Jan-Benedict Glaw wrote:

> On Fri, 2007-05-04 16:59:51 +0200, Bernd Schubert wrote:
>> To see whats going on, I copied the entire / (so the initrd) into a tmpfs
>> root, chrooted into it, also bind mounted the main / into this chroot and
>> compared several times /bin of chroot/bin and the bind-mounted /bin while
>> the mkfs.ext2 command was running.
>>
>> beo-05:/# diff -r /bin /oldroot/bin/
>> beo-05:/# diff -r /bin /oldroot/bin/
>> beo-05:/# diff -r /bin /oldroot/bin/
>> Binary files /bin/sleep and /oldroot/bin/sleep differ
>> beo-05:/# diff -r /bin /oldroot/bin/
>> Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
>> Binary files /bin/cat and /oldroot/bin/cat differ
>> ...
>>
>> Also tested different schedulers, at least happens with deadline and
>> anticipatory.
>>
>> The corruption does NOT happen on running the mkfs command on /dev/sda1,
>> but happens with sda2, sda3 and sda3. Also doesn't happen with extended
>> partitions of sda1.
>
> Is sda2 the largest filesystem out of sda2, sda3 (and the logical
> partitions within the extended sda1, if these get mkfs'ed, too)?

I tested it that way:

- test on sda1, no further partitions
- test on sda2, sda1: ~2MB, everything else for sda2
- test on sda3, sda1: ~2MB, sda2: ~2MB, everything else for sda3
...
- test on sda5, sda1: the extended partition, everything in sda5

>
> I'm not too sure that this is a kernel bug, but probably a bad RAM
> chip. Did you run memtest86 for a while? ...and can you reproduce this
> problem on different machines?

Reproducible on 4 test systems (2 with identical hardware, so 2 + 1 + 1,
i.e. three entirely different hardware combinations) with ECC memory,
which is monitored by EDAC. Memory, CPU, etc. have already been stress tested
under real-life load with several applications, e.g. linpack.
Though I don't entirely agree, my colleagues in this group keep
telling me that their real-life stress tests reveal more memory
corruption than memtest does. As soon as I have physical access again, I can also
do a memtest86 run (I would like to do it over the weekend, but don't know how
to convince stupid rembo to boot memtest).
Anyway, memory corruption is more than unlikely on these systems for
several reasons.


Thanks,
Bernd

2007-05-05 18:57:42

by Theodore Ts'o

Subject: Re: mkfs.ext2 triggerd RAM corruption

On Sat, May 05, 2007 at 03:36:37AM +0200, Bernd Schubert wrote:
> distribution: modified debian sarge, in which aspect is the distribution
> important for this problem? mkfs2.ext2 is supposed to write to /dev/sdaX
> and not /dev/rd/0. Stracing it and grepping for open calls shows that
> only /dev/sdaX is opened in read-write mode.

/dev/rd/0? What's this? Is this the partition where your root
partition is found? What is it? Is it a ramdisk? Or is it some kind
of persistent storage device?

If it is a persistent storage device, do the corrupted files stay
corrupted when you reboot? (If it's a ramdisk which you load, then
obviously it's getting reloaded on reboot.) You didn't give enough
information to be sure exactly what's going on.

The next thing to ask is how the files are corrupted. Can you
save a copy of the corrupted files to stable storage, so you can see
*how* they were corrupted? Were large swaths of zeros getting written
into them?

Next question; if you don't use these mke2fs parameters, can you
reproduce the corruption?

mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/sda4

What if you change it to:

mkfs.ext2 -j -b 4096 /dev/sda4

Do you still see corruption problems?

> I already tested several partition types, e.g. something like this for a
> test on sda3
>
> beo-05:~# sfdisk -d /dev/sda
> # partition table of /dev/sda
> unit: sectors
>
> /dev/sda1 : start= 63, size= 4208967, Id=83
> /dev/sda2 : start= 4209030, size= 4209030, Id=83
> /dev/sda3 : start= 8418060, size=313251435, Id=83
> /dev/sda4 : start= 0, size= 0, Id= 0

What if the partition size is smaller; does that make the problem go
away? If so, can you do a binary search on the partition size where
the problem appears?
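
The binary search suggested here can be sketched as follows. It is only an outline: `corrupts` is a hypothetical callback standing for "repartition to this size, run mkfs, check the ramdisk for corruption", and the search assumes the problem appears at some threshold size and persists above it.

```python
def smallest_failing_size(low, high, corrupts):
    """Return the smallest size in (low, high] for which corrupts(size) is true.

    Assumes corrupts(low) is false, corrupts(high) is true, and that the
    outcome flips only once between them. Sizes are integers in whatever
    unit the caller picks (e.g. MiB).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if corrupts(mid):
            high = mid  # threshold is at or below mid
        else:
            low = mid   # threshold is above mid
    return high
```

For a partition of roughly 150 GB searched in MiB steps, this needs at most about 18 test runs rather than one per candidate size. If the corruption is intermittent, each probe would have to be repeated before trusting a "no corruption" result.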

And what can you say about the SATA driver you were using; were all of
the machines that you tested this on using the same SATA controller
and same driver?

Obviously if this were a generic kernel problem, we'd have been hearing
about this from a lot more people. So there has to be something
unique to your setup, and we need to figure out what that might
be.

- Ted

2007-05-05 19:12:51

by Jan Engelhardt

Subject: Re: mkfs.ext2 triggerd RAM corruption


On May 5 2007 14:57, Theodore Tso wrote:
>On Sat, May 05, 2007 at 03:36:37AM +0200, Bernd Schubert wrote:
>> distribution: modified debian sarge, in which aspect is the distribution
>> important for this problem? mkfs2.ext2 is supposed to write to /dev/sdaX
>> and not /dev/rd/0. Stracing it and grepping for open calls shows that
>> only /dev/sdaX is opened in read-write mode.
>
>/dev/rd/0? What's this?

devfs (hint hint) naming for /dev/ram0.


Jan
--

2007-05-05 22:06:58

by Bernd Schubert

Subject: Re: mkfs.ext2 triggerd RAM corruption

On Sat, May 05, 2007 at 09:12:02PM +0200, Jan Engelhardt wrote:
>
> On May 5 2007 14:57, Theodore Tso wrote:
> >On Sat, May 05, 2007 at 03:36:37AM +0200, Bernd Schubert wrote:
> >> distribution: modified debian sarge, in which aspect is the distribution
> >> important for this problem? mkfs2.ext2 is supposed to write to /dev/sdaX
> >> and not /dev/rd/0. Stracing it and grepping for open calls shows that
> >> only /dev/sdaX is opened in read-write mode.
> >
> >/dev/rd/0? What's this?
>
> devfs (hint hint) naming for /dev/ram0.

Yep, but udev knows the devfs style too ... (I already said that I tested
vanilla kernels, so no patches).

Bernd

2007-05-05 23:09:11

by Bernd Schubert

Subject: Re: mkfs.ext2 triggerd RAM corruption

On Sat, May 05, 2007 at 02:57:35PM -0400, Theodore Tso wrote:
> On Sat, May 05, 2007 at 03:36:37AM +0200, Bernd Schubert wrote:
> > distribution: modified debian sarge, in which aspect is the distribution
> > important for this problem? mkfs2.ext2 is supposed to write to /dev/sdaX
> > and not /dev/rd/0. Stracing it and grepping for open calls shows that
> > only /dev/sdaX is opened in read-write mode.
>
> /dev/rd/0? What's this? Is this the partition where your root
> partition is found? What is it? Is it a ramdisk? Or is it some kind
> of persistent storage device?
>
> If it is a persistant storage device, do the corrupted files stay
> corrupted when you reboot? (If it's a ramdisk which you load, then
> obviously it's getting reloaded on reboot.) You didn't give enough
> information to be sure exactly what's going on.

Sorry, I should have expressed myself more clearly: /dev/rd/0 is the
devfs-style name of the first ram disk device (I don't like those devfs
names myself, but since I'm rather new in this group I couldn't convince
my boss to switch to short names yet ;) ). However, it's only the
devfs-style naming done by udev and not devfs itself.

>
> The next thing to ask is how the files are corrupted. Can you see
> save a copy of the corrupted files to stable storage, so you can see
> *how* they were corrupted. Were large swaths of zeros getting written
> into it?

Yes, many zeros. Binary files, hexdump and diff are here:
http://www.q-leap.com/~bschubert/data-corruption
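
To characterise such damage without reading a full hexdump, one can list the byte ranges where the corrupted file differs from a good copy and flag the ranges that are entirely zero. This is a minimal sketch working on in-memory byte strings; the function name `differing_runs` is illustrative.

```python
def differing_runs(good, bad):
    """Return (offset, length, all_zero) for each run of differing bytes.

    all_zero is True when every byte of the run in `bad` is 0x00, i.e. the
    region looks overwritten with zeros. Only the common prefix length of
    the two inputs is compared.
    """
    runs = []
    start = None
    limit = min(len(good), len(bad))
    for index in range(limit):
        if good[index] != bad[index]:
            if start is None:
                start = index
        elif start is not None:
            runs.append((start, index - start, not any(bad[start:index])))
            start = None
    if start is not None:
        runs.append((start, limit - start, not any(bad[start:limit])))
    return runs
```

Reading both files with `open(path, "rb").read()` and passing them in gives one tuple per damaged region. Offsets and lengths that line up on 4096-byte boundaries would suggest whole pages being replaced rather than stray bit flips.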

>
> Next question; if you don't use these mke2fs parameters, can you
> reproduce the corruption?
>
> mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/sda4
>
> What if you change the it to:
>
> mkfs.ext2 -j -b 4096 /dev/sda4
>
> Do you still see corruption problems?

No, no observable corruption.
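
One difference between the two command lines is how much inode-table space they request. The estimate below is a rough sketch, not mke2fs's exact computation (which rounds per block group). It uses the sda3 size from the sfdisk dump earlier in the thread and assumes 512-byte sectors; the "default" values of 8192 bytes per inode and 128-byte inodes are assumptions about the e2fsprogs version in use.

```python
def inode_table_bytes(partition_bytes, bytes_per_inode, inode_size):
    """Estimate the total inode table size for the given mke2fs ratios."""
    inode_count = partition_bytes // bytes_per_inode
    return inode_count * inode_size


SDA3_BYTES = 313251435 * 512  # size from the sfdisk dump, 512-byte sectors

# -i 4096 -I 512, as in the failing command line
large = inode_table_bytes(SDA3_BYTES, bytes_per_inode=4096, inode_size=512)
# assumed defaults when -i and -I are omitted
small = inode_table_bytes(SDA3_BYTES, bytes_per_inode=8192, inode_size=128)

print(f"-i 4096 -I 512:   {large / 2**30:.2f} GiB of inode tables")
print(f"assumed defaults: {small / 2**30:.2f} GiB of inode tables")
```

Under these assumptions, the failing invocation sets aside about one eighth of the partition for inode tables, roughly eight times what the assumed defaults would. To the extent mke2fs writes those tables out at creation time, the failing command pushes far more data through the kernel, which may be why only it triggers the problem.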

>
> > I already tested several partition types, e.g. something like this for a
> > test on sda3
> >
> > beo-05:~# sfdisk -d /dev/sda
> > # partition table of /dev/sda
> > unit: sectors
> >
> > /dev/sda1 : start= 63, size= 4208967, Id=83
> > /dev/sda2 : start= 4209030, size= 4209030, Id=83
> > /dev/sda3 : start= 8418060, size=313251435, Id=83
> > /dev/sda4 : start= 0, size= 0, Id= 0
>
> What if the partition size is smaller; does that make the problem go
> away? If so, can you do a binary search on the partition size where
> the problem appears?

I need to test this thoroughly, but will do it tomorrow; it's too late
here for this kind of test.

>
> And what can you say about the SATA driver you were using; were all of
> the machines that you tested this on using the same SATA controller
> and same driver?

As you can see from my previous reply ;) I tested with at least two
different controllers, Intel and NVIDIA (I will reboot the 4th system on Monday to
figure out its hardware; once the corruption has happened, the systems tend to
stop working).

>
> Obviously if this were a generic kernel problem, we'd been hearing
> about this from a lot more people. So there has to be something
> unique to your setup, and we need to figure out what that might happen
> to be.

I also still find it hard to believe it's a generic problem...


Thanks for your help,
Bernd

2007-05-07 18:42:33

by Bill Davidsen

Subject: Re: mkfs.ext2 triggerd RAM corruption

Jan-Benedict Glaw wrote:
> On Fri, 2007-05-04 16:59:51 +0200, Bernd Schubert wrote:
>> To see whats going on, I copied the entire / (so the initrd) into a tmpfs
>> root, chrooted into it, also bind mounted the main / into this chroot and
>> compared several times /bin of chroot/bin and the bind-mounted /bin while the
>> mkfs.ext2 command was running.
>>
>> beo-05:/# diff -r /bin /oldroot/bin/
>> beo-05:/# diff -r /bin /oldroot/bin/
>> beo-05:/# diff -r /bin /oldroot/bin/
>> Binary files /bin/sleep and /oldroot/bin/sleep differ
>> beo-05:/# diff -r /bin /oldroot/bin/
>> Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
>> Binary files /bin/cat and /oldroot/bin/cat differ
>> ...
>>
>> Also tested different schedulers, at least happens with deadline and
>> anticipatory.
>>
>> The corruption does NOT happen on running the mkfs command on /dev/sda1, but
>> happens with sda2, sda3 and sda3. Also doesn't happen with extended
>> partitions of sda1.
>
> Is sda2 the largest filesystem out of sda2, sda3 (and the logical
> partitions within the extended sda1, if these get mkfs'ed, too)?
>
> I'm not too sure that this is a kernel bug, but probably a bad RAM
> chip. Did you run memtest86 for a while? ...and can you reproduce this
> problem on different machines?
>
> MfG, JBG
>
Was this missing from your copy of the original post, or did you delete
it without reading? Note last sentence...

> Summary: The system ramdisk (initrd) gets corrupted while running
> mkfs.ext2 on a local sata disk partition.
>
> Reproduced on kernel versions: vanilla 2.6.16 - 2.6.20 (<2.6.16
> doesn't run on any of the systems I can do tests with). Please note:
> I could reproduce this on serveral systems, all of them use ECC
> memory and the memory of most of them the memory is monitored using
> EDAC.


--
Bill Davidsen
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot