2002-07-15 16:43:20

by Maurice Volaski

Subject: Mount corrupts an ext2 filesystem on a RAM disk

I just wanted you to note that this is an old issue which is still
waiting to be resolved. (I tried contacting Andrew Morton recently,
but haven't heard back from him.)

Ideally, this issue should be addressed before 2.4.19 is officially
released. (I have not tested the recent 2.4.19 pre-releases, so I
can't say whether it has been fixed incidentally.)

Also, I later learned that one must "cd" into the mounted RAM disk
to trigger the corruption.

(All this was done on a Red Hat 7.1 system with kernel 2.4.18 and
mount-2.11n-7. The problem does not happen on a Red Hat 7.1 system
with kernel 2.4.17.)

I discovered the following while attempting to use mkcdrec to make a backup.

I do the following to set up a RAM disk on /dev/ram0:

dd if=/dev/zero of=/dev/ram0 bs=1k count=4096
mkfs.ext2 /dev/ram0 -m 0 -N 4096

This RAM disk checks out OK with fsck -f.

When I mount it, the lost+found directory is already missing.

If I unmount it and force an fsck, I get:

fsck 1.25 (20-Sep-2001)
e2fsck 1.25 (20-Sep-2001)
Pass 1: Checking inodes, blocks, and sizes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 2: 52
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 1 inodes containing duplicate/bad blocks.)

File / (inode #2, mod time Fri Mar 1 21:03:59 2002)
has 1 duplicate block(s), shared with 1 file(s):
<filesystem metadata>
Clone duplicate/bad blocks<y>? yes

Pass 2: Checking directory structure
Directory inode 2, block 0, offset 0: directory corrupted
Salvage<y>? yes

Missing '.' in directory inode 2.
Fix<y>? yes

Setting filetype for entry '.' in ??? (2) to 2.
Missing '..' in directory inode 2.
Fix<y>? yes

Setting filetype for entry '..' in ??? (2) to 2.
Pass 3: Checking directory connectivity
'..' in / (2) is <The NULL inode> (0), should be / (2).
Fix<y>? yes

Unconnected directory inode 11 (/???)
Connect to /lost+found<y>? yes

/lost+found not found. Create<y>? yes

Pass 4: Checking reference counts
Inode 2 ref count is 10, should be 3. Fix<y>? yes

Inode 11 ref count is 3, should be 2. Fix<y>? yes

Pass 5: Checking group summary information
Free blocks count wrong for group #0 (3566, counted=3565).
Fix<y>? yes

Free blocks count wrong (3566, counted=3565).
Fix<y>? yes

Free inodes count wrong for group #0 (4085, counted=4084).
Fix<y>? yes

Directories count wrong for group #0 (2, counted=3).
Fix<y>? yes

Free inodes count wrong (4085, counted=4084).
Fix<y>? yes

If I mount the disk again, lost+found is still missing. If I unmount
it and force an fsck, I get the same result as above.
--

Maurice Volaski
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


2002-07-15 18:29:20

by DervishD

Subject: Re: Mount corrupts an ext2 filesystem on a RAM disk

Hi Maurice :)

>Also, later on I learned that one must "cd" into the mounted ramdisk
>to cause the corruption.

I've reproduced all your steps and my ramdisk didn't get
corrupted. See below.

>I do the following to setup a ram disk on /dev/ram0...
>dd if=/dev/zero of=/dev/ram0 bs=1k count=4096
>mkfs.ext2 /dev/ram0 -m 0 -N 4096

Identical commands issued...

>I mount it and already the lost+found directory is not there.

Mine is OK. The lost+found is there. I don't suffer from any of the
other problems you describe, either.

Maybe you have bad RAM chips or a damaged mke2fs (unlikely), but
the kernel seems to work OK. I've tested with 2.4.18 and 2.4.17.

Raúl

2002-07-15 18:37:37

by Richard B. Johnson

Subject: Re: Mount corrupts an ext2 filesystem on a RAM disk

On Mon, 15 Jul 2002, DervishD wrote:

> Hi Maurice :)
>
> >Also, later on I learned that one must "cd" into the mounted ramdisk
> >to cause the corruption.
>
> I've reproduced all your steps and my ramdisk didn't get
> corrupted. See below.
>
> >I do the following to setup a ram disk on /dev/ram0...
> >dd if=/dev/zero of=/dev/ram0 bs=1k count=4096
> >mkfs.ext2 /dev/ram0 -m 0 -N 4096
>
> Identical commands issued...
>
> >I mount it and already the lost+found directory is not there.
>
> Mine is OK. The lost+found is there. I don't suffer from any of the
> other problems you describe, either.
>
> Maybe you have bad RAM chips or a damaged mke2fs (unlikely), but
> the kernel seems to work OK. I've tested with 2.4.18 and 2.4.17.
>
> Raúl

It also works okay here. Maybe, just maybe, you booted with an initrd
but didn't unmount it before you started mucking with the RAM disk
(see the `umount /initrd` in the script below).



Script started on Mon Jul 15 13:08:16 2002
# cat xxx.xxx
umount /initrd 2>/dev/null
umount /mnt 2>/dev/null
dd if=/dev/zero of=/dev/ram0 bs=1k count=4096
mkfs.ext2 /dev/ram0 -m 0 -N 4096
fsck -f /dev/ram0
mount /dev/ram0 /mnt
ls /mnt
cat </dev/zero >/mnt/foo
ls -la /mnt
umount /mnt
fsck -f /dev/ram0

# sh xxx.xxx
4096+0 records in
4096+0 records out
mke2fs 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
4096 inodes, 4096 blocks
0 blocks (0.00%) reserved for the super user
First data block=1
1 block group
8192 blocks per group, 8192 fragments per group
4096 inodes per group

Writing inode tables: 0/1done
Writing superblocks and filesystem accounting information: done
Parallelizing fsck version 1.19 (13-Jul-2000)
e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/ram0: 11/4096 files (0.0% non-contiguous), 530/4096 blocks
lost+found
cat: write error: No space left on device
total 3580
drwxr-xr-x 3 root root 1024 Jul 15 13:08 .
drwxr-xr-x 24 root root 4096 Jul 15 04:09 ..
-rw-r--r-- 1 root root 3633152 Jul 15 13:08 foo
drwxr-xr-x 2 root root 12288 Jul 15 13:08 lost+found
Parallelizing fsck version 1.19 (13-Jul-2000)
e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/ram0: 12/4096 files (0.0% non-contiguous), 4093/4096 blocks
# exit
exit

Script done on Mon Jul 15 13:08:28 2002
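Rather than always running `umount /initrd`, a script could first check the kernel's mount table for a leftover initrd. The following is a minimal sketch, not something from the thread: the function name `ram0_busy` is hypothetical, and it assumes a leftover initrd shows up either as a mount on /initrd (as in the df output later in this thread) or as a mount of /dev/ram0.

```shell
# Hypothetical helper (not from the thread): report whether /dev/ram0 may
# still be in use, e.g. by an initrd that was never unmounted.
# Reads a mount table in /proc/mounts format:
#   device mountpoint fstype options dump pass
# Usage: ram0_busy [mount-table]   (defaults to /proc/mounts)
# Prints any matching lines; returns 0 if something matched, 1 otherwise.
ram0_busy() {
    table=${1:-/proc/mounts}
    awk '$2 == "/initrd" || $1 == "/dev/ram0" { print; found = 1 }
         END { exit (found ? 0 : 1) }' "$table"
}
```

A script like the one above could then call `ram0_busy && exit 1` before the `dd`, so that it stops and shows the offending mount instead of overwriting a RAM disk that is still mounted.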


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-07-16 16:08:07

by Maurice Volaski

Subject: Re: Mount corrupts an ext2 filesystem on a RAM disk

>It also works okay here. Maybe, just maybe, you booted with an initrd
>but didn't unmount it before you started mucking with the RAM disk
>(see the `umount /initrd` in the script below).
>

I retested the ramdisk without executing the initial umount
instructions you proposed and saw the problem as I expected, but when
I first ran your umount instructions and then tested again, the fsck
problem didn't show up!

I tested this on all three of my boxes, and they all behaved the same.

So I guess the problem lies elsewhere, but I don't know where. The possibilities are:

1) The initrd really was still mounted, even though df does not show it.

2) No ghost initrd is mounted, but merely running the (bogus) umount
commands somehow prevents the problem.

If it is the first, there are two questions: why is the initrd not
getting unmounted, and why doesn't it show up in df?
If it is the second, what could be going on?
--

Maurice Volaski
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University

2002-07-16 16:38:11

by Richard B. Johnson

Subject: Re: Mount corrupts an ext2 filesystem on a RAM disk

On Tue, 16 Jul 2002, Maurice Volaski wrote:

> >It also works okay here. Maybe, just maybe, you booted with an initrd
> >but didn't unmount it before you started mucking with the RAM disk
> >(see the `umount /initrd` in the script below).
> >
>
> I retested the ramdisk without executing the initial umount
> instructions you proposed and saw the problem as I expected, but when
> I first ran your umount instructions and then tested again, the fsck
> problem didn't show up!
>
> I tested this on all three of my boxes, and they all behaved the same.
>
> So I guess the problem lies elsewhere, but I don't know where. The
> possibilities are:
>
> 1) The initrd really was still mounted, even though df does not show it.
>
> 2) No ghost initrd is mounted, but merely running the (bogus) umount
> commands somehow prevents the problem.
>
> If it is the first, there are two questions: why is the initrd not
> getting unmounted, and why doesn't it show up in df?
> If it is the second, what could be going on?
> --

Depending on the distribution, the initrd may never be unmounted.
It would normally be 'transferred' to /initrd on the new root
file-system when that is mounted, and then just left alone. It's not
going to show up in 'df' because it got mounted before /etc/mtab
existed, and `df` (most versions I've seen) reads /etc/mtab. To
fix that problem, as root do:

rm /etc/mtab
ln -s /proc/mounts /etc/mtab

That substitutes /proc/mounts, which has the kernel's real
information about what actually got mounted.
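If you would rather not replace /etc/mtab, the two tables can also be compared directly to see which mounts an mtab-reading df would miss. This is a minimal sketch, assuming both files use the six-field /proc/mounts layout; the function name `mtab_missing` is hypothetical. It compares mount points rather than device names, so the /dev/root versus /dev/sdb1 naming difference in the transcript below does not produce false hits.

```shell
# Hypothetical helper: print mount points that the kernel reports but
# /etc/mtab does not list, i.e. mounts that an mtab-reading df will not show.
# Usage: mtab_missing [kernel-table] [mtab]
#   defaults: /proc/mounts and /etc/mtab
mtab_missing() {
    kernel=${1:-/proc/mounts}
    mtab=${2:-/etc/mtab}
    # Lines from the first file (mtab) only record their mount point (field 2).
    # Lines from the second file (kernel) are printed if that mount point
    # was never recorded. Testing FILENAME instead of NR == FNR keeps this
    # correct even when mtab is empty.
    awk 'FILENAME == ARGV[1] { seen[$2] = 1; next }
         !($2 in seen) { print $2 }' "$mtab" "$kernel"
}
```

Run on a system in the state shown below (before the symlink is made), its output should include /initrd.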


Script started on Tue Jul 16 12:38:07 2002
# cat /etc/mtab
/dev/sdb1 / ext2 rw,noatime 0 0
/dev/sdc1 /alt ext2 rw,noatime 0 0
/dev/sdc3 /home/users ext2 rw,noatime 0 0
none /proc proc rw 0 0
/dev/sda1 /dos/drive_C msdos rw 0 0
/dev/sda5 /dos/drive_D msdos rw 0 0
# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sdb1 16603376 5906288 9853680 37% /
/dev/sdc1 6356624 1198700 4835020 20% /alt
/dev/sdc3 2253284 1768524 370300 83% /home/users
/dev/sda1 1048272 281840 766432 27% /dos/drive_C
/dev/sda5 1046224 181280 864944 17% /dos/drive_D
# rm /etc/mtab
# ln -s /proc/mounts /etc/mtab
# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/root.old 1766 1723 0 100% /initrd
/dev/root 16603376 5906284 9853684 37% /
/dev/sdc1 6356624 1198700 4835020 20% /alt
/dev/sdc3 2253284 1768524 370300 83% /home/users
/dev/sda1 1048272 281840 766432 27% /dos/drive_C
/dev/sda5 1046224 181280 864944 17% /dos/drive_D
# ls /initrd
bin dev etc lib linuxrc sbin
# umount /initrd
# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/root 16603376 5906284 9853684 37% /
/dev/sdc1 6356624 1198700 4835020 20% /alt
/dev/sdc3 2253284 1768524 370300 83% /home/users
/dev/sda1 1048272 281840 766432 27% /dos/drive_C
/dev/sda5 1046224 181280 864944 17% /dos/drive_D
# exit
exit
Script done on Tue Jul 16 12:39:28 2002

I don't normally set up the link I demonstrated, because the actual
device name of my root file-system gets lost; it becomes /dev/root.
If I run lilo in this configuration, with the parameter
"root=current", something strange happens: the root file-system
can't be found at boot.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.