2008-01-30 22:50:10

by Andrew Morton

Subject: Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

On Wed, 30 Jan 2008 14:29:27 -0800 (PST)
[email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9855
>
> Summary: ext3 ACL corruption
> Product: File System
> Version: 2.5
> KernelVersion: 2.6.23
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: ext3
> AssignedTo: [email protected]
> ReportedBy: [email protected]
>
>
> Latest working kernel version: Unknown
> Earliest failing kernel version: Definitely 2.6.23 and 2.6.23.8 but earlier is
> possible
> Distribution: Debian Etch
> Hardware Environment: Multiple x86 machines
>
> Software Environment:
> Filesystem is Ext3 on LVM on RAID-1 (on SATA).
> # e2fsck -V
> e2fsck 1.40-WIP (14-Nov-2006)
> Using EXT2FS Library version 1.40-WIP, 14-Nov-2006
>
> Problem Description:
> On several occasions now I have had e2fsck prune away ACLs on my file systems
> during a file system check after rebooting a number of (reasonably) long
> running Samba servers. This morning I decided to manually run fsck before
> rebooting one of these:
>
> # e2fsck -pfv /dev/mapper/vg_main-lv_samba
> (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> /dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 has a value
> offset (56) which is invalid
> CLEARED.
> (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> /dev/mapper/vg_main-lv_samba: Extended attribute in inode 262146 has a value
> offset (56) which is invalid
> CLEARED.
> [ snip lots of (near) identical errors]
>
> 8301 inodes used (0.08%)
> 1621 non-contiguous inodes (19.5%)
> # of inodes with ind/dind/tind blocks: 3837/24/0
> 1108478 blocks used (5.29%)
> 0 bad blocks
> 1 large file
>
> 7590 regular files
> 662 directories
> 0 character device files
> 0 block device files
> 0 fifos
> 0 links
> 40 symbolic links (38 fast symbolic links)
> 0 sockets
> --------
> 8292 files
>
> (Note: after remounting)
> # tune2fs -l /dev/mapper/vg_main-lv_samba
> tune2fs 1.40-WIP (14-Nov-2006)
> Filesystem volume name: <none>
> Last mounted on: <not available>
> Filesystem UUID: 88677414-c1f8-41ba-b737-d9f6170d771b
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype
> needs_recovery sparse_super large_file
> Filesystem flags: signed directory hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 10485760
> Block count: 20971520
> Reserved block count: 1048576
> Free blocks: 19863038
> Free inodes: 10477459
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 1019
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 16384
> Inode blocks per group: 1024
> Filesystem created: Wed Feb 21 21:38:33 2007
> Last mount time: Thu Jan 31 03:18:54 2008
> Last write time: Thu Jan 31 03:18:54 2008
> Mount count: 1
> Maximum mount count: 30
> Last checked: Thu Jan 31 03:16:51 2008
> Check interval: 15552000 (6 months)
> Next check after: Tue Jul 29 02:16:51 2008
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Journal inode: 8
> Default directory hash: tea
> Directory Hash Seed: be8c201b-3563-4fa5-a2a6-e2864e4b73e2
> Journal backup: inode blocks
>
>
> Steps to reproduce:
> Unfortunately, precise steps are not known. Restoring all the filesystem's ACLs
> from a recent dump made using "getfacl -RP" fixes the ACLs without causing the
> corruption to return.
>
> These are production Samba servers making fairly extensive use of file and
> directory ACLs. Thus far, I've only noticed the corruptions when it came time
> to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
> Note that I've never noticed any issues at runtime because of this - only when
> I later realised that ACLs had been removed from random files and/or
> directories.
>
> I think I will implement some scripts to unmount and run fsck nightly from
> cron, so I can at least detect the corruption a little earlier. If there is
> some more helpful debugging output I can provide, please let me know.
>
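The quoted tune2fs -l output is internally consistent, and the standard ext2/ext3 block-group arithmetic shows how the 256-byte inode size enlarges each group's inode table. A minimal sketch; the constants are copied from the output above and the helper names are illustrative only:

```python
# Constants copied from the quoted "tune2fs -l" output.
BLOCK_COUNT = 20971520
INODE_COUNT = 10485760
BLOCKS_PER_GROUP = 32768
INODES_PER_GROUP = 16384
BLOCK_SIZE = 4096
INODE_SIZE = 256


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def geometry(inode_size: int = INODE_SIZE) -> dict:
    """Derive block-group counts and per-group inode-table size."""
    return {
        # Number of block groups, computed from blocks and from inodes.
        "groups_by_blocks": ceil_div(BLOCK_COUNT, BLOCKS_PER_GROUP),
        "groups_by_inodes": ceil_div(INODE_COUNT, INODES_PER_GROUP),
        # Blocks used by each group's inode table ("Inode blocks per group").
        "inode_table_blocks": INODES_PER_GROUP * inode_size // BLOCK_SIZE,
    }


if __name__ == "__main__":
    print(geometry())
    print(geometry(inode_size=128))
```

Both group counts come out at 640 and the inode table at 1024 blocks per group, matching "Inode blocks per group: 1024"; with 128-byte inodes the same table would need 512 blocks.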


2008-01-31 07:49:19

by Andreas Dilger

Subject: Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

On Jan 30, 2008 14:49 -0800, Andrew Morton wrote:
> > Problem Description:
> > On several occasions now I have had e2fsck prune away ACLs on my file systems
> > during a file system check after rebooting a number of (reasonably) long
> > running Samba servers. This morning I decided to manually run fsck before
> > rebooting one of these:
> >
> > # e2fsck -pfv /dev/mapper/vg_main-lv_samba
> > (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> > /dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 has a value
> > offset (56) which is invalid
> > CLEARED.
> > (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> > /dev/mapper/vg_main-lv_samba: Extended attribute in inode 262146 has a value
> > offset (56) which is invalid
> > CLEARED.

These error messages still exist in e2fsck, but the code around them
appears to have changed somewhat, since e2fsprogs 1.40.5 no longer
prints these same messages.
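The size of each rejected attribute value can be recovered from the quoted e2fsck lines by subtraction, assuming (the thread does not say so explicitly) that the parenthesised debug line prints e_value_offs + e_value_size and that the "value offset" in the next message is e_value_offs. A minimal sketch; the regular expressions and the decode() helper are hypothetical, and the wrapped problem line has been joined into one string:

```python
import re

# Lines copied from the quoted e2fsck output (the second one un-wrapped).
DEBUG_LINE = "(entry->e_value_offs + entry->e_value_size: 116, offs: 120)"
PROBLEM_LINE = ("/dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 "
                "has a value offset (56) which is invalid")


def decode(debug_line: str, problem_line: str) -> dict:
    """Recover inode number, value offset and value size from the two lines.

    Assumes the first number in the debug line is e_value_offs + e_value_size
    and the parenthesised number in the problem line is e_value_offs.
    """
    m = re.search(r"e_value_size: (\d+), offs: (\d+)", debug_line)
    if m is None:
        raise ValueError("unrecognised debug line")
    value_end, offs = int(m.group(1)), int(m.group(2))

    p = re.search(r"inode (\d+) has a value offset \((\d+)\)", problem_line)
    if p is None:
        raise ValueError("unrecognised problem line")
    inode, value_offs = int(p.group(1)), int(p.group(2))

    return {
        "inode": inode,
        "e_value_offs": value_offs,
        "e_value_size": value_end - value_offs,  # 116 - 56
        "value_end": value_end,
        "offs": offs,
    }


if __name__ == "__main__":
    print(decode(DEBUG_LINE, PROBLEM_LINE))
```

For both reported inodes this gives a 60-byte value ending at byte 116, four bytes short of offs (120); what offs represents in that 1.40-WIP build is not established in the thread.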

> > Inode size: 256

This is a bit interesting, since it isn't very common to use large inodes.
I suspect this relates to the problem.
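With inodes larger than 128 bytes, ext3 can keep extended attributes (ACLs included) inside the inode itself, after the classic 128-byte inode, instead of only in a separate EA block. A rough sketch of how much room that leaves; the 4-byte i_extra_isize and the 4-byte in-inode xattr header are assumptions about the 2.6.23-era on-disk format, not values stated in the thread:

```python
GOOD_OLD_INODE_SIZE = 128  # size of the classic ext2/ext3 on-disk inode


def in_inode_ea_space(inode_size: int, extra_isize: int = 4,
                      header_size: int = 4) -> int:
    """Bytes left for in-inode xattr entries and values.

    extra_isize and header_size are ASSUMED defaults (see lead-in),
    not values read from the filesystem.
    """
    if inode_size <= GOOD_OLD_INODE_SIZE:
        return 0  # 128-byte inodes have no in-inode attribute area at all
    return inode_size - GOOD_OLD_INODE_SIZE - extra_isize - header_size


if __name__ == "__main__":
    for size in (128, 256):
        print(size, in_inode_ea_space(size))
```

Under these assumptions a 256-byte inode has about 120 bytes available for attributes, while a 128-byte inode has none; the in-inode code path is only exercised on filesystems like this one, which fits the suspicion that the inode size matters.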

> > These are production Samba servers making fairly extensive use of file and
> > directory ACLs. Thus far, I've only noticed the corruptions when it came time
> > to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
> > Note that I've never noticed any issues at runtime because of this - only when
> > I later realised that ACLs had been removed from random files and/or
> > directories.
> >
> > I think I will implement some scripts to unmount and run fsck nightly from
> > cron, so I can at least detect the corruption a little earlier. If there is
> > some more helpful debugging output I can provide, please let me know.

There is just such a script in the thread "forced fsck (again?)", which
should suit your setup since you are using LVs for the filesystem.
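The general idea behind checking a filesystem on an LV without unmounting it is to take an LVM snapshot, run e2fsck against the snapshot, and then remove it. The sketch below only builds the command lines (it does not execute them); it is not the script from the "forced fsck (again?)" thread, and the snapshot name and 1G size are hypothetical placeholders:

```python
import shlex
from typing import List, Optional


def snapshot_check_commands(vg: str, lv: str, snap: Optional[str] = None,
                            size: str = "1G") -> List[List[str]]:
    """Return, in order, the commands for a snapshot-based e2fsck run.

    snap and size are hypothetical defaults; the snapshot must be large
    enough to absorb writes to the origin while the check runs.
    """
    snap = snap or f"{lv}_fsck"
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap}"
    return [
        # 1. Create a copy-on-write snapshot of the (still mounted) origin LV.
        ["lvcreate", "-s", "-L", size, "-n", snap, origin],
        # 2. Force a full check of the snapshot; -y lets e2fsck replay the
        #    journal and make repairs on the snapshot only, never the origin.
        ["e2fsck", "-f", "-y", snapshot],
        # 3. Discard the snapshot.
        ["lvremove", "-f", snapshot],
    ]


if __name__ == "__main__":
    for cmd in snapshot_check_commands("vg_main", "lv_samba"):
        print(shlex.join(cmd))
```

From cron, each list could be passed to subprocess.run(); a non-zero e2fsck exit status (1 when errors were corrected, 4 when errors were left uncorrected) is the signal to raise an alert.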

If you are able to reproduce this, could you please dump the inode and EA
block before fixing the problem.
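Dumping an inode by hand starts with finding it. In ext2/ext3, inode numbers start at 1, so inode N lives in block group (N - 1) // inodes_per_group, at slot (N - 1) % inodes_per_group of that group's inode table. A minimal sketch using the tune2fs values quoted above (16384 inodes per group, 256-byte inodes, 4096-byte blocks); the start block of each group's inode table is not in the quoted output (dumpe2fs reports it per group), so only the position relative to the table start is computed:

```python
from typing import NamedTuple

INODES_PER_GROUP = 16384  # from the quoted tune2fs -l output
INODE_SIZE = 256
BLOCK_SIZE = 4096


class InodeLocation(NamedTuple):
    group: int            # block group holding the inode
    index: int            # slot within that group's inode table
    block_in_table: int   # block offset from the start of the inode table
    offset_in_block: int  # byte offset of the inode within that block


def locate_inode(ino: int) -> InodeLocation:
    """Position of inode `ino` relative to its group's inode table."""
    if ino < 1:
        raise ValueError("ext2/ext3 inode numbers start at 1")
    group, index = divmod(ino - 1, INODES_PER_GROUP)
    block_in_table, offset_in_block = divmod(index * INODE_SIZE, BLOCK_SIZE)
    return InodeLocation(group, index, block_in_table, offset_in_block)


if __name__ == "__main__":
    for ino in (163841, 262146):
        print(ino, locate_inode(ino))
```

The two inodes flagged above sit at the very start of the inode tables of groups 10 and 16. Adding the table start block from dumpe2fs gives an absolute block that could be copied out with dd before e2fsck runs; debugfs's stat command shows the inode's contents, including the "File ACL" (EA block) number if one is in use.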

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-01-31 14:58:29

by Eric Sandeen

Subject: Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

Andreas Dilger wrote:
> On Jan 30, 2008 14:49 -0800, Andrew Morton wrote:
>>> Problem Description:

>>> Inode size: 256
>
> This is a bit interesting, since it isn't very common to use large inodes.
> I suspect this relates to the problem.

I think it is somewhat common on Samba servers, though.

And 256-byte inodes are the new default in the latest e2fsprogs, so maybe
something will shake out in the Fedora 9 (F9) development cycle.

>>> These are production Samba servers making fairly extensive use of file and
>>> directory ACLs. Thus far, I've only noticed the corruptions when it came time
>>> to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
>>> Note that I've never noticed any issues at runtime because of this - only when
>>> I later realised that ACLs had been removed from random files and/or
>>> directories.
>>>
>>> I think I will implement some scripts to unmount and run fsck nightly from
>>> cron, so I can at least detect the corruption a little earlier. If there is
>>> some more helpful debugging output I can provide, please let me know.
>
> There is just such a script in the thread "forced fsck (again?)", which
> should suit your setup since you are using LVs for the filesystem.

That thread is on the ext3-users list, by the way.

> If you are able to reproduce this, could you please dump the inode and EA
> block before fixing the problem.

Do you need instructions on doing that?

-Eric