2001-11-23 23:55:56

by Petr Vandrovec

Subject: 2.5.0 breakage even with fix?

Hi Al,
I'm now running 2.5.0 with fix you posted - and now during dselect
run I received:

Unpacking replacement manpages ...
EXT2-fs error (device ide0(3,3)): ext2_check_page: bad entry in directory
#3801539: unaligned directory entry - offset=0, inode=1801675088,
rec_len=26465, name_len=101
Remounting filesystem read-only
rm: cannot remove directory `/var/lib/dpkg/tmp.ci': Read-only file system
...

and system is obviously unusable. I'll probably reboot and run fsck again.
If someone can show me how I can dump contents of some inode by number
(and not by name) in debugfs, I can look into inode itself... I found
only 'ncheck', to convert number to name, and this is running and running...

System was running 2.5.0 without patch for some time, but I followed
your guidelines for rebooting:

fuser -k /
sync
mount -o remount,ro /
sync
reboot

After reboot fsck was NOT run, so it is possible that there
might be some corruption - but I ran fsck on my non-root partition
after boot, and it did not show any problems.
Thanks,
Petr Vandrovec
[email protected]



2001-11-24 00:06:27

by Petr Vandrovec

Subject: Re: 2.5.0 breakage even with fix?

On 24 Nov 01 at 0:54, [email protected] wrote:
> Hi Al,
> I'm now running 2.5.0 with fix you posted - and now during dselect
> run I received:
>
> Unpacking replacement manpages ...
> EXT2-fs error (device ide0(3,3)): ext2_check_page: bad entry in directory
> #3801539: unaligned directory entry - offset=0, inode=1801675088,
> rec_len=26465, name_len=101
> Remounting filesystem read-only
> rm: cannot remove directory `/var/lib/dpkg/tmp.ci': Read-only file system

Well, ncheck finished.

debugfs: stat /var/lib/dpkg/tmp.ci
Inode: 3801539 Type: directory Mode: 0755 Flags: 0x0 Generation: 537829
User: 0 Group: 0 Size: 4096
File ACL: 0 Directory ACL: 0
Links: 2 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x3bfede00 -- Sat Nov 24 00:38:40 2001
atime: dtto
mtime: dtto
BLOCKS:
(0):7603845
TOTAL: 1

debugfs: cat /var/lib/dpkg/tmp.ci
Package: diff
Version: 2.7-28
Section: base
Priority: required
Architecture: i386
...

It does not look like a directory to me. Unfortunately, as we
do not have coherent /dev/hda3 cache, I have no idea how to read
real contents of /var/lib/dpkg/tmp.ci, but
ls -l /var/lib/dpkg/tmp.ci/ re-emitted the error message about
ext2-fs error, so I think that it is real problem, and my tmp.ci directory
contains some file contents instead. And I'm 100% sure that
/var/lib/dpkg/tmp.ci was created with patched kernel :-(
Petr Vandrovec
[email protected]
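
The numbers in the kernel message line up with the file contents that debugfs printed. As a minimal sketch, assuming the usual little-endian ext2 directory entry header (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type), the first bytes of "Package: diff" can be decoded as if they were a directory entry. The helper names here are hypothetical and are not taken from the kernel source.

```python
import struct

# Assumed on-disk header of an ext2 directory entry (little-endian):
# 32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type.
DIRENT_HEADER = struct.Struct("<IHBB")


def decode_dirent_header(block: bytes) -> dict:
    """Interpret the first 8 bytes of a block as a directory entry header."""
    inode, rec_len, name_len, file_type = DIRENT_HEADER.unpack_from(block, 0)
    return {
        "inode": inode,
        "rec_len": rec_len,
        "name_len": name_len,
        "file_type": file_type,
    }


def is_aligned(rec_len: int) -> bool:
    """Directory records are expected to be a multiple of 4 bytes long."""
    return rec_len % 4 == 0


# First line of the data that "debugfs: cat" showed for the directory inode.
header = decode_dirent_header(b"Package: diff\n")
print(header)                          # inode=1801675088, rec_len=26465, name_len=101
print(is_aligned(header["rec_len"]))   # False, hence "unaligned directory entry"
```

The bytes "Pack", "ag" and "e" give exactly inode=1801675088, rec_len=26465 and name_len=101, which supports the reading that the directory's data block holds a dpkg control file rather than directory entries.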

2001-11-24 00:14:18

by Alexander Viro

Subject: Re: 2.5.0 breakage even with fix?



On Sat, 24 Nov 2001, Petr Vandrovec wrote:

> After reboot fsck was NOT run, so it is possible that there
> might be some corruption - but I ran fsck on my non-root partition
> after boot, and it did not show any problems.

fsck -f

Filesystem _is_ marked clean, so unless you do forced fsck no checks
are done.

Moreover, attempt to work with corrupted fs can break in very interesting
ways, so unless you do fsck -f even correct kernel (be it patched 2.4.15
or something earlier than 2.4.15-pre9) will not help.
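
To illustrate what "marked clean" refers to, here is a rough sketch that reads the state field from an ext2 superblock. It assumes the standard layout (superblock at byte 1024 of the device, s_magic at offset 56 and s_state at offset 58 within it, both little-endian 16-bit). The function names and the synthetic superblock are made up for the example. The real e2fsck also considers mount counts and check intervals, which this ignores.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes into the device
EXT2_MAGIC = 0xEF53
EXT2_VALID_FS = 0x0001     # set when the filesystem was cleanly unmounted
EXT2_ERROR_FS = 0x0002     # set when the kernel detected errors


def marked_clean(superblock: bytes) -> bool:
    """Return True if the state field says 'clean' and no errors are flagged."""
    magic, state = struct.unpack_from("<HH", superblock, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock")
    return bool(state & EXT2_VALID_FS) and not (state & EXT2_ERROR_FS)


def make_superblock(state: int) -> bytes:
    """Build a synthetic 1024-byte superblock with only magic and state set."""
    sb = bytearray(1024)
    struct.pack_into("<HH", sb, 56, EXT2_MAGIC, state)
    return bytes(sb)


# A cleanly unmounted filesystem carries the flag even if its contents were
# damaged earlier, which is why a forced check is needed in that case.
print(marked_clean(make_superblock(EXT2_VALID_FS)))   # True
print(marked_clean(make_superblock(0)))               # False
```

On a real device the 1024 bytes would be read from offset SUPERBLOCK_OFFSET, which typically requires root privileges.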

2001-11-24 00:17:08

by Jeff Merkey

Subject: Re: Re: 2.5.0 breakage even with fix?


I am seeing file system corruption in NWFS in 2.5.0 with the patch. It's
not as severe as ext2 directly was, and is simply creating mirror mismatches
between the FAT and DIR tables, and is easily recovered, but it is annoying.
I am also getting "resource busy" during reboot when I try to reboot with a
mounted NWFS volume.

Jeff


2001-11-24 00:20:48

by Jeff Merkey

Subject: Re: Re: 2.5.0 breakage even with fix?


I fscked and vrepaired the NWFS volumes, and it seems to be OK now. Might
have been left over from the previous errors. I will let you know if I see
other problems.

:-)

Jeff


2001-11-24 00:25:58

by Andreas Dilger

Subject: Re: 2.5.0 breakage even with fix?

On Nov 24, 2001 00:54 +0000, Petr Vandrovec wrote:
> I'm now running 2.5.0 with fix you posted - and now during dselect
> run I received:
>
> Unpacking replacement manpages ...
> EXT2-fs error (device ide0(3,3)): ext2_check_page: bad entry in directory
> #3801539: unaligned directory entry - offset=0, inode=1801675088,
> rec_len=26465, name_len=101
> Remounting filesystem read-only
> rm: cannot remove directory `/var/lib/dpkg/tmp.ci': Read-only file system

Did you run e2fsck -f after running unpatched 2.4.15/2.5.0? This may be
left-over garbage from the other problem.

> and system is obviously unusable. I'll probably reboot and run fsck again.
> If someone can show me how I can dump contents of some inode by number
> (and not by name) in debugfs, I can look into inode itself... I found
> only 'ncheck', to convert number to name, and this is running and running...

debugfs> stat <inum>
debugfs> dump <inum> /tmp/file


Note that you need to include the <> around the inode number.
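
For scripted use, the same requests can be passed to debugfs non-interactively with its -R option. The sketch below only builds the argument list. The helper name is hypothetical, and the device and inode number are the ones mentioned earlier in this thread.

```python
def debugfs_argv(request: str, inode: int, device: str, *extra: str) -> list:
    """Build an argv list for a one-shot debugfs request on an inode number.

    The inode number is wrapped in <> so debugfs treats it as a number
    rather than as a path name.
    """
    command = " ".join([request, f"<{inode}>", *extra])
    return ["debugfs", "-R", command, device]


print(debugfs_argv("stat", 3801539, "/dev/hda3"))
print(debugfs_argv("dump", 3801539, "/dev/hda3", "/tmp/file"))
```

Passing the list to something like subprocess.run avoids a shell, so the angle brackets are not mistaken for redirections.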

> System was running 2.5.0 without patch for some time, but I followed
> your guidelines for rebooting:
>
> fuser -k /
> sync
> mount -o remount,ro /
> sync
> reboot
>
> After reboot fsck was NOT run, so it is possible that there
> might be some corruption - but I ran fsck on my non-root partition
> after boot, and it did not show any problems.

Ah, yes. Definitely sounds like left over corruption.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-24 22:44:53

by Robert Boermans

Subject: Re: 2.5.0 breakage even with fix?

Alexander Viro wrote:

> fsck -f
>
> Filesystem _is_ marked clean, so unless you do forced fsck no checks
> are done.
>
> Moreover, attempt to work with corrupted fs can break in very interesting
> ways, so unless you do fsck -f even correct kernel (be it patched 2.4.15
> or something earlier than 2.4.15-pre9) will not help.

If the filesystem is marked clean, does that mean that people with
journalling file systems are fscked? (since there might be no journal entry
of what hasn't finished.)

Just guessing, I don't know how these work, but if ext2 gets the 'clean' bit
set, I can imagine the journaling file systems refusing to check anything...

Robert.

2001-11-24 23:03:45

by Alexander Viro

Subject: Re: 2.5.0 breakage even with fix?



On Sun, 25 Nov 2001, Robert Boermans wrote:

> If the filesystem is marked clean, does that mean that people with
> journalling file systems are fscked? (since there might be no journal entry
> of what hasn't finished.)

Well, if filesystem doesn't have a recovery tool that would allow forced
check mode - you _are_ screwed. As you will be again and again if you get
memory corruption/driver bugs/fs bugs/RAID bugs/physical disk problems/etc.

Again, if filesystem trusts clean bit to the extent that you have no way
to convince it that checks _are_ needed - it's unfit for any serious use.
I suspect that by now everybody had learnt that much - that used to be
a permanent source of problems with early journalling filesystems and AFAIK
all of them had been fixed since then.

2001-11-24 23:15:47

by Jeff Merkey

Subject: Re: 2.5.0 breakage even with fix?

Al,

I am not seeing any more breakage with this fix with NWFS.

Jeff
