Hi,
In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
getfattr can't search the extended attribute (EA) after remount.
Condition:
1. the inode size is over 128 byte
2. "lost+found" whose inode number is 11 was removed
3. the 11th inode is used for a file.
4. the EA locates in-inode
This happens because of following logic:
i_extra_isize is set to over 0 by ext3_new_inode() when we create
a file whose inode number is 11 after removing "lost+found".
Therefore setfattr creates the EA in-inode.
After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
when we lookup the file, so getfattr tries to search the EA out-inode.
However, the EA locates in-inode, so getfattr can't search the EA.
How to reproduce:
1. mkfs.ext3 -I 256 /dev/sdXX
2. mount -o acl,user_xattr /dev/sdXX /TEST
3. rm -rf /TEST/*
4. touch /TEST/file (whose inode number is 11)
5. cd /TEST; setfattr -n user.foo0 -v bar0 file
6. cd /TEST; getfattr -d file
-> can see foo0/bar0
7. umount /dev/sdXX
8. mount -o acl,user_xattr /dev/sdXX /TEST
9. cd /TEST; getfattr -d file
-> can't see foo0/bar0
Though the 11th inode is used for "lost+found" normally, the other
file can also use it. Therefore, i_extra_isize of 11th inode should be set
to the suitable value by ext3_iget().
Signed-off-by: Masayoshi MIZUMA <[email protected]>
---
fs/ext3/inode.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 735f019..85e8574 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2881,8 +2881,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
atomic_set(&ei->i_datasync_tid, tid);
}
- if (inode->i_ino >= EXT3_FIRST_INO(inode->i_sb) + 1 &&
- EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
+ if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
/*
* When mke2fs creates big inodes it does not zero out
* the unused bytes above EXT3_GOOD_OLD_INODE_SIZE,
--
1.6.2.5
Thanks,
Masayoshi
On 2010-08-19, at 20:20, Masayoshi MIZUMA wrote:
> In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
> getfattr can't search the extended attribute (EA) after remount.
>
> This happens because of following logic:
> i_extra_isize is set to over 0 by ext3_new_inode() when we create
> a file whose inode number is 11 after removing "lost+found".
> Therefore setfattr creates the EA in-inode.
> After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
> when we lookup the file, so getfattr tries to search the EA out-inode.
> However, the EA locates in-inode, so getfattr can't search the EA.
This was a workaround for a bug in mke2fs a couple of years ago, and is probably no longer needed.
> @@ -2881,8 +2881,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
> atomic_set(&ei->i_datasync_tid, tid);
> }
>
> - if (inode->i_ino >= EXT3_FIRST_INO(inode->i_sb) + 1 &&
> - EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
> + if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
> /*
> * When mke2fs creates big inodes it does not zero out
> * the unused bytes above EXT3_GOOD_OLD_INODE_SIZE,
This should also remove the above comment, which is no longer relevant.
Cheers, Andreas
On Fri, 20 Aug 2010 01:42:25 -0600
Andreas Dilger <[email protected]> wrote:
> On 2010-08-19, at 20:20, Masayoshi MIZUMA wrote:
> > In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
> > getfattr can't search the extended attribute (EA) after remount.
> >
> > This happens because of following logic:
> > i_extra_isize is set to over 0 by ext3_new_inode() when we create
> > a file whose inode number is 11 after removing "lost+found".
> > Therefore setfattr creates the EA in-inode.
> > After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
> > when we lookup the file, so getfattr tries to search the EA out-inode.
> > However, the EA locates in-inode, so getfattr can't search the EA.
>
> This was a workaround for a bug in mke2fs a couple of years ago, and is probably no longer needed.
I understand it. Thanks.
>
> > @@ -2881,8 +2881,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
> > atomic_set(&ei->i_datasync_tid, tid);
> > }
> >
> > - if (inode->i_ino >= EXT3_FIRST_INO(inode->i_sb) + 1 &&
> > - EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
> > + if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
> > /*
> > * When mke2fs creates big inodes it does not zero out
> > * the unused bytes above EXT3_GOOD_OLD_INODE_SIZE,
>
> This should also remove the above comment, which is no longer relevant.
OK. I will resend the patch which is removed the above comment.
Thanks,
Masayoshi
In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
getfattr can't search the extended attribute (EA) after remount.
Condition:
1. the inode size is over 128 byte
2. "lost+found" whose inode number is 11 was removed
3. the 11th inode is used for a file.
4. the EA locates in-inode
This happens because of following logic:
i_extra_isize is set to over 0 by ext3_new_inode() when we create
a file whose inode number is 11 after removing "lost+found".
Therefore setfattr creates the EA in-inode.
After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
when we lookup the file, so getfattr tries to search the EA out-inode.
However, the EA locates in-inode, so getfattr can't search the EA.
How to reproduce:
1. mkfs.ext3 -I 256 /dev/sdXX
2. mount -o acl,user_xattr /dev/sdXX /TEST
3. rm -rf /TEST/*
4. touch /TEST/file (whose inode number is 11)
5. cd /TEST; setfattr -n user.foo0 -v bar0 file
6. cd /TEST; getfattr -d file
-> can see foo0/bar0
7. umount /dev/sdXX
8. mount -o acl,user_xattr /dev/sdXX /TEST
9. cd /TEST; getfattr -d file
-> can't see foo0/bar0
Though the 11th inode is used for "lost+found" normally, the other
file can also use it. Therefore, i_extra_isize of 11th inode should be set
to the suitable value by ext3_iget().
CC: Andreas Dilger <[email protected]>
Signed-off-by: Masayoshi MIZUMA <[email protected]>
---
fs/ext3/inode.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 5e0faf4..69c3d47 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2914,13 +2914,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
atomic_set(&ei->i_datasync_tid, tid);
}
- if (inode->i_ino >= EXT3_FIRST_INO(inode->i_sb) + 1 &&
- EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
- /*
- * When mke2fs creates big inodes it does not zero out
- * the unused bytes above EXT3_GOOD_OLD_INODE_SIZE,
- * so ignore those first few inodes.
- */
+ if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
if (EXT3_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
EXT3_INODE_SIZE(inode->i_sb)) {
--
1.6.2.5
Thanks,
Masayoshi
On Mon 23-08-10 09:50:56, Masayoshi MIZUMA wrote:
> In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
> getfattr can't search the extended attribute (EA) after remount.
>
> Condition:
> 1. the inode size is over 128 byte
> 2. "lost+found" whose inode number is 11 was removed
> 3. the 11th inode is used for a file.
> 4. the EA locates in-inode
>
> This happens because of following logic:
> i_extra_isize is set to over 0 by ext3_new_inode() when we create
> a file whose inode number is 11 after removing "lost+found".
> Therefore setfattr creates the EA in-inode.
> After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
> when we lookup the file, so getfattr tries to search the EA out-inode.
> However, the EA locates in-inode, so getfattr can't search the EA.
>
> How to reproduce:
> 1. mkfs.ext3 -I 256 /dev/sdXX
> 2. mount -o acl,user_xattr /dev/sdXX /TEST
> 3. rm -rf /TEST/*
> 4. touch /TEST/file (whose inode number is 11)
> 5. cd /TEST; setfattr -n user.foo0 -v bar0 file
> 6. cd /TEST; getfattr -d file
> -> can see foo0/bar0
> 7. umount /dev/sdXX
> 8. mount -o acl,user_xattr /dev/sdXX /TEST
> 9. cd /TEST; getfattr -d file
> -> can't see foo0/bar0
>
> Though the 11th inode is used for "lost+found" normally, the other
> file can also use it. Therefore, i_extra_isize of 11th inode should be set
> to the suitable value by ext3_iget().
Hmm, with which kernel have you tested that? Because my 2.6.32 kernel
works just fine (and looking into the code, all should be handled well).
Look:
mount -o loop,user_xattr ~jack/fs-images/ext3-image /mnt/
quack:/crypted/home/jack # cd /mnt/
quack:/mnt # touch file
quack:/mnt # ls -i file
11 file
quack:/mnt # setfattr -n user.foo0 -v bar0 file
quack:/mnt # getfattr -d file
# file: file
user.foo0="bar0"
quack:/mnt # cd
quack:~ # umount /mnt
quack:~ # mount -o loop,user_xattr ~jack/fs-images/ext3-image /mnt/
quack:~ # getfattr -d /mnt/file
getfattr: Removing leading '/' from absolute path names
# file: mnt/file
user.foo0="bar0"
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Tue, 24 Aug 2010 15:17:36 +0200
Jan Kara <[email protected]> wrote:
> On Mon 23-08-10 09:50:56, Masayoshi MIZUMA wrote:
> > In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
> > getfattr can't search the extended attribute (EA) after remount.
> >
> > Condition:
> > 1. the inode size is over 128 byte
> > 2. "lost+found" whose inode number is 11 was removed
> > 3. the 11th inode is used for a file.
> > 4. the EA locates in-inode
> >
> > This happens because of following logic:
> > i_extra_isize is set to over 0 by ext3_new_inode() when we create
> > a file whose inode number is 11 after removing "lost+found".
> > Therefore setfattr creates the EA in-inode.
> > After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
> > when we lookup the file, so getfattr tries to search the EA out-inode.
> > However, the EA locates in-inode, so getfattr can't search the EA.
> >
> > How to reproduce:
> > 1. mkfs.ext3 -I 256 /dev/sdXX
> > 2. mount -o acl,user_xattr /dev/sdXX /TEST
> > 3. rm -rf /TEST/*
> > 4. touch /TEST/file (whose inode number is 11)
> > 5. cd /TEST; setfattr -n user.foo0 -v bar0 file
> > 6. cd /TEST; getfattr -d file
> > -> can see foo0/bar0
> > 7. umount /dev/sdXX
> > 8. mount -o acl,user_xattr /dev/sdXX /TEST
> > 9. cd /TEST; getfattr -d file
> > -> can't see foo0/bar0
> >
> > Though the 11th inode is used for "lost+found" normally, the other
> > file can also use it. Therefore, i_extra_isize of 11th inode should be set
> > to the suitable value by ext3_iget().
> Hmm, with which kernel have you tested that? Because my 2.6.32 kernel
> works just fine (and looking into the code, all should be handled well).
I tested at 2.6.35.
> Look:
> mount -o loop,user_xattr ~jack/fs-images/ext3-image /mnt/
> quack:/crypted/home/jack # cd /mnt/
> quack:/mnt # touch file
> quack:/mnt # ls -i file
> 11 file
> quack:/mnt # setfattr -n user.foo0 -v bar0 file
> quack:/mnt # getfattr -d file
> # file: file
> user.foo0="bar0"
>
> quack:/mnt # cd
> quack:~ # umount /mnt
> quack:~ # mount -o loop,user_xattr ~jack/fs-images/ext3-image /mnt/
> quack:~ # getfattr -d /mnt/file
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/file
> user.foo0="bar0"
What size is the inode ? This problem happens if the size is larger than 128 byte.
(I tested at 256 byte inode.)
Thanks,
Masayoshi
On Wed 25-08-10 09:05:52, Masayoshi MIZUMA wrote:
> On Tue, 24 Aug 2010 15:17:36 +0200
> Jan Kara <[email protected]> wrote:
> > On Mon 23-08-10 09:50:56, Masayoshi MIZUMA wrote:
> > > In ext3 filesystem, if following conditions 1., 2., 3. and 4. is satisfied,
> > > getfattr can't search the extended attribute (EA) after remount.
> > >
> > > Condition:
> > > 1. the inode size is over 128 byte
> > > 2. "lost+found" whose inode number is 11 was removed
> > > 3. the 11th inode is used for a file.
> > > 4. the EA locates in-inode
> > >
> > > This happens because of following logic:
> > > i_extra_isize is set to over 0 by ext3_new_inode() when we create
> > > a file whose inode number is 11 after removing "lost+found".
> > > Therefore setfattr creates the EA in-inode.
> > > After remount, i_extra_isize of 11th inode is set to 0 by ext3_iget()
> > > when we lookup the file, so getfattr tries to search the EA out-inode.
> > > However, the EA locates in-inode, so getfattr can't search the EA.
> > >
> > > How to reproduce:
> > > 1. mkfs.ext3 -I 256 /dev/sdXX
> > > 2. mount -o acl,user_xattr /dev/sdXX /TEST
> > > 3. rm -rf /TEST/*
> > > 4. touch /TEST/file (whose inode number is 11)
> > > 5. cd /TEST; setfattr -n user.foo0 -v bar0 file
> > > 6. cd /TEST; getfattr -d file
> > > -> can see foo0/bar0
> > > 7. umount /dev/sdXX
> > > 8. mount -o acl,user_xattr /dev/sdXX /TEST
> > > 9. cd /TEST; getfattr -d file
> > > -> can't see foo0/bar0
> > >
> > > Though the 11th inode is used for "lost+found" normally, the other
> > > file can also use it. Therefore, i_extra_isize of 11th inode should be set
> > > to the suitable value by ext3_iget().
> > Hmm, with which kernel have you tested that? Because my 2.6.32 kernel
> > works just fine (and looking into the code, all should be handled well).
> I tested at 2.6.35.
>
> > Look:
> > mount -o loop,user_xattr ~jack/fs-images/ext3-image /mnt/
> > quack:/crypted/home/jack # cd /mnt/
> > quack:/mnt # touch file
> > quack:/mnt # ls -i file
> > 11 file
> > quack:/mnt # setfattr -n user.foo0 -v bar0 file
> > quack:/mnt # getfattr -d file
> > # file: file
> > user.foo0="bar0"
> >
> > quack:/mnt # cd
> > quack:~ # umount /mnt
> > quack:~ # mount -o loop,user_xattr ~jack/fs-images/ext3-image /mnt/
> > quack:~ # getfattr -d /mnt/file
> > getfattr: Removing leading '/' from absolute path names
> > # file: mnt/file
> > user.foo0="bar0"
> What size is the inode ? This problem happens if the size is larger than 128 byte.
> (I tested at 256 byte inode.)
Ah, that was the reason. Thanks. But looking at the implications, I'm a bit
reluctant to do the change you propose. If someone has a filesystem created
by old mkfs, he could suddently see corrupted xattrs in his lost+found
directory with the new kernel. Not that there would be a big chance this
happens but people run various strange environments...
So I'd rather choose a safer approach for ext3 - see attached patch - it
fixes the problem for me. For ext4 your patch is definitely a way to go,
so please port it to ext4 and send it to Ted Tso. Thanks.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2010-08-25, at 17:04, Jan Kara wrote:
>> What size is the inode ? This problem happens if the size is larger than 128 byte. (I tested at 256 byte inode.)
>
> Ah, that was the reason. Thanks. But looking at the implications, I'm a bit
> reluctant to do the change you propose. If someone has a filesystem created
> by old mkfs, he could suddently see corrupted xattrs in his lost+found
> directory with the new kernel. Not that there would be a big chance this
> happens but people run various strange environments...
The fix to e2fsck for this issue has been around for a long time, AFAIK. It was only needed in the kernel while the broken mke2fs was in wide use, and before a fixed e2fsck was available.
Cheers, Andreas
On Thu, 26 Aug 2010 01:04:42 +0200
Jan Kara <[email protected]> wrote:
> So I'd rather choose a safer approach for ext3 - see attached patch - it
> fixes the problem for me. For ext4 your patch is definitely a way to go,
> so please port it to ext4 and send it to Ted Tso. Thanks.
This problem doesn't happen at ext4. Because, the i_extra_isize of 11th
inode is set to the suitable value by ext4_iget().
-----------------------------------------------------------
struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
{
(snip)
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
EXT4_INODE_SIZE(inode->i_sb)) {
ret = -EIO;
goto bad_inode;
}
if (ei->i_extra_isize == 0) {
/* The extra space is currently unused. Use it. */
ei->i_extra_isize = sizeof(struct ext4_inode) -
EXT4_GOOD_OLD_INODE_SIZE;
} else {
__le32 *magic = (void *)raw_inode +
EXT4_GOOD_OLD_INODE_SIZE +
ei->i_extra_isize;
if (*magic == cpu_to_le32(EXT4_XATTR_MAGIC))
ext4_set_inode_state(inode, EXT4_STATE_XATTR);
}
} else
ei->i_extra_isize = 0;
-----------------------------------------------------------
Thanks,
Masayoshi
On Wed, Aug 25, 2010 at 05:39:11PM -0600, Andreas Dilger wrote:
>
> The fix to e2fsck for this issue has been around for a long time,
> AFAIK. It was only needed in the kernel while the broken mke2fs was
> in wide use, and before a fixed e2fsck was available.
The mke2fs in question was fixed in e2fsprogs 1.37, over five years
ago. Mizuma-san, why are you using such an __ancient__ mke2fs? It
would seem that instead of fixing the kernel, the better thing to do
is to fix your version of e2fsprogs. There have been enough other bug
fixes to e2fsck that using such an ancient version of e2fsck, if you
are willing to update the kernel, seems to not make a whole lot of
sense.
I certainly don't see the point of fixing making any changes in ext4
to accomodate such a change, since e2fsprogs which is that old won't
support ext4 file system features anyway.
- Ted
On 2010-08-25, at 18:36, Ted Ts'o wrote:
> On Wed, Aug 25, 2010 at 05:39:11PM -0600, Andreas Dilger wrote:
>>
>> The fix to e2fsck for this issue has been around for a long time,
>> AFAIK. It was only needed in the kernel while the broken mke2fs was
>> in wide use, and before a fixed e2fsck was available.
>
> The mke2fs in question was fixed in e2fsprogs 1.37, over five years
> ago. Mizuma-san, why are you using such an __ancient__ mke2fs? It
> would seem that instead of fixing the kernel, the better thing to do
> is to fix your version of e2fsprogs.
I think you are missing the point. I don't think it matters which mke2fs is in use. The problem is that ext3_iget() has a workaround for this 5-year-old mke2fs bug that is actively causing a loss of xattr data if lost+found is reallocated.
> I certainly don't see the point of fixing making any changes in ext4
> to accomodate such a change, since e2fsprogs which is that old won't
> support ext4 file system features anyway.
There is no patch to ext4, which never had the workaround (this is a clear difference between ext3_iget() and ext4_iget()). That was because we knew ext4 would not be in use on these old filesystems and there was no point to include the workaround.
Cheers, Andreas
On Wed, Aug 25, 2010 at 06:55:49PM -0600, Andreas Dilger wrote:
>
> I think you are missing the point. I don't think it matters which
> mke2fs is in use. The problem is that ext3_iget() has a workaround
> for this 5-year-old mke2fs bug that is actively causing a loss of
> xattr data if lost+found is reallocated.
Ah, OK. But it's still the case that the right answer at this point
is to make ext3 code conform to ext4, and drop the workaround
altogether.
- Ted
On Wed, 25 Aug 2010 18:55:49 -0600
Andreas Dilger <[email protected]> wrote:
> On 2010-08-25, at 18:36, Ted Ts'o wrote:
> > On Wed, Aug 25, 2010 at 05:39:11PM -0600, Andreas Dilger wrote:
> >>
> >> The fix to e2fsck for this issue has been around for a long time,
> >> AFAIK. It was only needed in the kernel while the broken mke2fs was
> >> in wide use, and before a fixed e2fsck was available.
> >
> > The mke2fs in question was fixed in e2fsprogs 1.37, over five years
> > ago. Mizuma-san, why are you using such an __ancient__ mke2fs? It
> > would seem that instead of fixing the kernel, the better thing to do
> > is to fix your version of e2fsprogs.
>
> I think you are missing the point. I don't think it matters which mke2fs is in use. The problem is that ext3_iget() has a workaround for this 5-year-old mke2fs bug that is actively causing a loss of xattr data if lost+found is reallocated.
I used e2fsprogs 1.41. I think this problem isn't mke2fs's problem,
so I agree with Andreas.
Thanks,
Masayoshi
On 2010-08-25, at 19:21, Ted Ts'o wrote:
> On Wed, Aug 25, 2010 at 06:55:49PM -0600, Andreas Dilger wrote:
>>
>> I think you are missing the point. I don't think it matters which
>> mke2fs is in use. The problem is that ext3_iget() has a workaround
>> for this 5-year-old mke2fs bug that is actively causing a loss of
>> xattr data if lost+found is reallocated.
>
> Ah, OK. But it's still the case that the right answer at this point
> is to make ext3 code conform to ext4, and drop the workaround
> altogether.
That's exactly what the patch is doing. :-)
Cheers, Andreas
On Wed 25-08-10 17:39:11, Andreas Dilger wrote:
> On 2010-08-25, at 17:04, Jan Kara wrote:
> >> What size is the inode ? This problem happens if the size is larger than 128 byte. (I tested at 256 byte inode.)
> >
> > Ah, that was the reason. Thanks. But looking at the implications, I'm a bit
> > reluctant to do the change you propose. If someone has a filesystem created
> > by old mkfs, he could suddently see corrupted xattrs in his lost+found
> > directory with the new kernel. Not that there would be a big chance this
> > happens but people run various strange environments...
>
> The fix to e2fsck for this issue has been around for a long time, AFAIK.
> It was only needed in the kernel while the broken mke2fs was in wide use,
> and before a fixed e2fsck was available.
I agree but rather old e2fsprogs are still in use and if a filesystem
created by these e2fsprogs would be (possibly on a different machine)
accessed by the new kernel it would see corrupted xattrs. I've looked at our
supported products (the oldest is currently SLES9 SP3) and it has e2fsprogs
1.38. This should be new enough. But RHEL3 which is also still supported
for another three years has e2fsprogs 1.32 so these are buggy. So I'd
rather be on the safe side and fix the bug by consistently refusing to
store extented attributes in inode for inodes <= EXT3_FIRST_INO + 1 as I
don't think that really costs us much...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu 26-08-10 09:26:37, Masayoshi MIZUMA wrote:
> On Thu, 26 Aug 2010 01:04:42 +0200
> Jan Kara <[email protected]> wrote:
>
> > So I'd rather choose a safer approach for ext3 - see attached patch - it
> > fixes the problem for me. For ext4 your patch is definitely a way to go,
> > so please port it to ext4 and send it to Ted Tso. Thanks.
> This problem doesn't happen at ext4. Because, the i_extra_isize of 11th
> inode is set to the suitable value by ext4_iget().
Ah, you are right. Sorry for confusion.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2010-08-26, at 06:27, Jan Kara wrote:
> On Wed 25-08-10 17:39:11, Andreas Dilger wrote:
>> The fix to e2fsck for this issue has been around for a long time, AFAIK.
>> It was only needed in the kernel while the broken mke2fs was in wide use,
>> and before a fixed e2fsck was available.
>
> I agree but rather old e2fsprogs are still in use and if a filesystem
> created by these e2fsprogs would be (possibly on a different machine)
> accessed by the new kernel it would see corrupted xattrs.
The kernel should detect if there is the xattr magic before accessing this space. I think the only fallout of an uninitialized i_extra_isize is that it might waste some space in the inode, or more likely it will detect that i_extra_isize is invalid.
In that case, ext3 could be more friendly for (i_ino == EXT3_FIRST_INO(inode->i_sb)) it makes sense to just set i_extra_isize = 0 instead of returning -EIO and marking it a bad inode:
if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
if (EXT3_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
EXT3_INODE_SIZE(inode->i_sb)) {
/*
* Old mke2fs <= 1.37 didn't zero i_extra_isize for large
* reserved inodes. Instead of assuming corruption and
* returning an error, just reset i_extra_isize for them.
* Remove this in 2013 (RHEL3 EOL).
*/
if (inode->i_ino <= EXT3_FIRST_INO(inode->i_sb)) {
ei->i_extra_isize = 0;
} else {
brelse (bh);
ret = -EIO;
goto bad_inode;
}
}
> I've looked at our
> supported products (the oldest is currently SLES9 SP3) and it has e2fsprogs
> 1.38. This should be new enough. But RHEL3 which is also still supported
> for another three years has e2fsprogs 1.32 so these are buggy. So I'd
> rather be on the safe side and fix the bug by consistently refusing to
> store extented attributes in inode for inodes <= EXT3_FIRST_INO + 1 as I
> don't think that really costs us much...
The question is what problem are you trying to prevent? Do people run an ancient RHEL3 userspace with a spanking-new 2.6.37 kernel? Won't there be all sorts of other problems there, because RHEL3 was released with a 2.4.x kernel that would prevent this from happening? It may even be that Eric back-ported this fix to RHEL4 at the time...
Generally, either people leave their software alone, because they need stability, or the people who upgrade a lot will tend to also upgrade everything at the same time. The only realistic scenario is hardware failure that forces a new kernel install to support the new hardware, but applications that depend on the old distro.
The question is whether RHEL3 has a realistic chance to work with such a new kernel? Secondly, they would have had to format their filesystem with 256-byte inodes, which was almost unheard of at that time. Finally, they would have to delete lost+found and re-use that inode. I don't think the chances of this happening are very high.
In any case, I think the above work-around is sufficient, and it prevents a REAL problem with non-broken, non-ancient distros in case lost+found is deleted.
Cheers, Andreas
Hi Andreas,
first let me explain one thing: I definitely agree that the issue
spotted by Masayoshi MIZUMA is more serious than some possible problem
with ancient e2fsprogs. What I was discussing about is, whether we should
fix the issue with the original patch (removing the workaround from
ext3_iget) or with my patch (putting the same logic into ext3_new_inode so
that it does not create xattrs which ext3_iget will just ignore).
The disadvantage of my fix is that xattrs for inos <= EXT3_FIRST_INO will
be always stored out of inode, the disadvantage of the original patch is
the remote possibility of problems with ancient e2fsprogs.
So do you disagree with my way of fixing things or is that fine with you
and we just misunderstood?
BTW, I've attached my fix for reference again.
On Thu 26-08-10 17:59:31, Andreas Dilger wrote:
> On 2010-08-26, at 06:27, Jan Kara wrote:
> > On Wed 25-08-10 17:39:11, Andreas Dilger wrote:
> >> The fix to e2fsck for this issue has been around for a long time, AFAIK.
> >> It was only needed in the kernel while the broken mke2fs was in wide use,
> >> and before a fixed e2fsck was available.
> >
> > I agree but rather old e2fsprogs are still in use and if a filesystem
> > created by these e2fsprogs would be (possibly on a different machine)
> > accessed by the new kernel it would see corrupted xattrs.
>
> The kernel should detect if there is the xattr magic before accessing
> this space. I think the only fallout of an uninitialized i_extra_isize
> is that it might waste some space in the inode, or more likely it will
> detect that i_extra_isize is invalid.
>
> In that case, ext3 could be more friendly for (i_ino ==
> EXT3_FIRST_INO(inode->i_sb)) it makes sense to just set i_extra_isize = 0
> instead of returning -EIO and marking it a bad inode:
>
> if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
> ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
> if (EXT3_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
> EXT3_INODE_SIZE(inode->i_sb)) {
> /*
> * Old mke2fs <= 1.37 didn't zero i_extra_isize for large
> * reserved inodes. Instead of assuming corruption and
> * returning an error, just reset i_extra_isize for them.
> * Remove this in 2013 (RHEL3 EOL).
> */
> if (inode->i_ino <= EXT3_FIRST_INO(inode->i_sb)) {
> ei->i_extra_isize = 0;
> } else {
> brelse (bh);
> ret = -EIO;
> goto bad_inode;
> }
> }
Ah, right I didn't know about the XATTR magic. So with the above
workaround I'd feel safe also with the original solution.
> > I've looked at our
> > supported products (the oldest is currently SLES9 SP3) and it has e2fsprogs
> > 1.38. This should be new enough. But RHEL3 which is also still supported
> > for another three years has e2fsprogs 1.32 so these are buggy. So I'd
> > rather be on the safe side and fix the bug by consistently refusing to
> > store extented attributes in inode for inodes <= EXT3_FIRST_INO + 1 as I
> > don't think that really costs us much...
>
> The question is what problem are you trying to prevent? Do people run an
> ancient RHEL3 userspace with a spanking-new 2.6.37 kernel? Won't there
> be all sorts of other problems there, because RHEL3 was released with a
> 2.4.x kernel that would prevent this from happening? It may even be that
> Eric back-ported this fix to RHEL4 at the time...
>
> Generally, either people leave their software alone, because they need
> stability, or the people who upgrade a lot will tend to also upgrade
> everything at the same time. The only realistic scenario is hardware
> failure that forces a new kernel install to support the new hardware, but
> applications that depend on the old distro.
>
> The question is whether RHEL3 has a realistic chance to work with such a
> new kernel? Secondly, they would have had to format their filesystem
> with 256-byte inodes, which was almost unheard of at that time. Finally,
> they would have to delete lost+found and re-use that inode. I don't
> think the chances of this happening are very high.
I agree that a combination of new kernel + ancient e2fsprogs is highly
unlikely for a single machine. But a situation where you format your
portable USB drive on an ancient server and then attach it to your laptop
isn't so remote, although the chance that you specify inode size 256 puts
it in the realm of almost-fiction as well.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2010-08-27, at 07:58, Jan Kara wrote:
> first let me explain one thing: I definitely agree that the issue
> spotted by Masayoshi MIZUMA is more serious than some possible problem
> with ancient e2fsprogs. What I was discussing about is, whether we should
> fix the issue with the original patch (removing the workaround from
> ext3_iget) or with my patch (putting the same logic into ext3_new_inode so
> that it does not create xattrs which ext3_iget will just ignore).
I agree that this is a safe fix, but it will propagate the workaround far into the future instead of actually fixing it.
> The disadvantage of my fix is that xattrs for inos <= EXT3_FIRST_INO will
> be always stored out of inode, the disadvantage of the original patch is
> the remote possibility of problems with ancient e2fsprogs.
I don't think there are realistic chances of problems with older filesystems running newer kernels. I think the workaround that was suggested below is also totally safe - instead of silently erasing the xattr (as kernels do today), or returning an error with a bad i_extra_isize (as would happen with the originally proposed patch) it will "know" that this bad value on low-numbered inodes was caused by mke2fs and just reset it instead of returning the error.
There cannot possibly be xattrs stored in-inode for ext3 due to the existing bug, and in the case of inode #11 (normally lost+found) being reallocated, the ext3_new_inode() code will correctly initialize i_extra_isize the way it does for any other new inode.
That leaves the only potential inode with a bad (but not completely invalid) i_extra_isize as root, and in (65535-127)/65535 of the cases, any bogus value would be zeroed. The remaining 127/65536 chance would leave less space in the inode for xattrs, but have no other ill effect (the i_extra_isize space is not used for anything in ext3).
> So do you disagree with my way of fixing things or is that fine with you
> and we just misunderstood?
> BTW, I've attached my fix for reference again.
>
> On Thu 26-08-10 17:59:31, Andreas Dilger wrote:
>> On 2010-08-26, at 06:27, Jan Kara wrote:
>>> On Wed 25-08-10 17:39:11, Andreas Dilger wrote:
>>>> The fix to e2fsck for this issue has been around for a long time, AFAIK.
>>>> It was only needed in the kernel while the broken mke2fs was in wide use,
>>>> and before a fixed e2fsck was available.
>>>
>>> I agree but rather old e2fsprogs are still in use and if a filesystem
>>> created by these e2fsprogs would be (possibly on a different machine)
>>> accessed by the new kernel it would see corrupted xattrs.
>>
>> The kernel should detect if there is the xattr magic before accessing
>> this space. I think the only fallout of an uninitialized i_extra_isize
>> is that it might waste some space in the inode, or more likely it will
>> detect that i_extra_isize is invalid.
>>
>> In that case, ext3 could be more friendly for (i_ino ==
>> EXT3_FIRST_INO(inode->i_sb)) it makes sense to just set i_extra_isize = 0
>> instead of returning -EIO and marking it a bad inode:
>>
>> if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
>> ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
>> if (EXT3_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
>> EXT3_INODE_SIZE(inode->i_sb)) {
>> /*
>> * Old mke2fs <= 1.37 didn't zero i_extra_isize for large
>> * reserved inodes. Instead of assuming corruption and
>> * returning an error, just reset i_extra_isize for them.
>> * Remove this in 2013 (RHEL3 EOL).
>> */
>> if (inode->i_ino <= EXT3_FIRST_INO(inode->i_sb)) {
>> ei->i_extra_isize = 0;
>> } else {
>> brelse (bh);
>> ret = -EIO;
>> goto bad_inode;
>> }
>> }
> Ah, right I didn't know about the XATTR magic. So with the above
> workaround I'd feel safe also with the original solution.
>
>>> I've looked at our
>>> supported products (the oldest is currently SLES9 SP3) and it has e2fsprogs
>>> 1.38. This should be new enough. But RHEL3 which is also still supported
>>> for another three years has e2fsprogs 1.32 so these are buggy. So I'd
>>> rather be on the safe side and fix the bug by consistently refusing to
>>> store extented attributes in inode for inodes <= EXT3_FIRST_INO + 1 as I
>>> don't think that really costs us much...
>>
>> The question is what problem are you trying to prevent? Do people run an
>> ancient RHEL3 userspace with a spanking-new 2.6.37 kernel? Won't there
>> be all sorts of other problems there, because RHEL3 was released with a
>> 2.4.x kernel that would prevent this from happening? It may even be that
>> Eric back-ported this fix to RHEL4 at the time...
>>
>> Generally, either people leave their software alone, because they need
>> stability, or the people who upgrade a lot will tend to also upgrade
>> everything at the same time. The only realistic scenario is hardware
>> failure that forces a new kernel install to support the new hardware, but
>> applications that depend on the old distro.
>>
>> The question is whether RHEL3 has a realistic chance to work with such a
>> new kernel? Secondly, they would have had to format their filesystem
>> with 256-byte inodes, which was almost unheard of at that time. Finally,
>> they would have to delete lost+found and re-use that inode. I don't
>> think the chances of this happening are very high.
> I agree that a combination of new kernel + ancient e2fsprogs is highly
> unlikely for a single machine. But a situation where you format your
> portable USB drive on an ancient server and then attach it to your laptop
> isn't so remote, although the chance that you specify inode size 256 puts
> it in the realm of almost-fiction as well.
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
> <0001-ext3-Fix-lost-extented-attributes-for-inode-with-ino.patch>
On Fri 27-08-10 12:57:47, Andreas Dilger wrote:
> On 2010-08-27, at 07:58, Jan Kara wrote:
> > first let me explain one thing: I definitely agree that the issue
> > spotted by Masayoshi MIZUMA is more serious than some possible problem
> > with ancient e2fsprogs. What I was discussing about is, whether we should
> > fix the issue with the original patch (removing the workaround from
> > ext3_iget) or with my patch (putting the same logic into ext3_new_inode so
> > that it does not create xattrs which ext3_iget will just ignore).
>
> I agree that this is a safe fix, but it will propagate the workaround far
> into the future instead of actually fixing it.
>
> > The disadvantage of my fix is that xattrs for inos <= EXT3_FIRST_INO will
> > be always stored out of inode, the disadvantage of the original patch is
> > the remote possibility of problems with ancient e2fsprogs.
>
> I don't think there are realistic chances of problems with older
> filesystems running newer kernels. I think the workaround that was
> suggested below is also totally safe - instead of silently erasing the
> xattr (as kernels do today), or returning an error with a bad
> i_extra_isize (as would happen with the originally proposed patch) it
> will "know" that this bad value on low-numbered inodes was caused by
> mke2fs and just reset it instead of returning the error.
Sigh, sorry to bring this up again... I wanted to use the original patch
with the workaround you suggested but realized there's still a hole. The
new patched kernel will happily use extra inode space for xattrs for inode
11 but if you mount the filesystem with older kernel it will reset
i_extra_size to zero and thus you'll lose the stored xattrs.
So to get out of this compatibility trap we'd have to accept xattrs in
the extended inode space but never store it there and after a sufficiently
long time, we could remove the workaround. But this seems not worth the
effort to me given we speak about a single inode and rather trivial
workaround in ext3_iget and ext3_new_inode.
Any ideas Andreas?
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2010-09-09, at 10:38, Jan Kara wrote:
> Sigh, sorry to bring this up again... I wanted to use the original patch
> with the workaround you suggested but realized there's still a hole. The
> new patched kernel will happily use extra inode space for xattrs for inode
> 11 but if you mount the filesystem with older kernel it will reset
> i_extra_size to zero and thus you'll lose the stored xattrs.
> So to get out of this compatibility trap we'd have to accept xattrs in
> the extended inode space but never store it there and after a sufficiently
> long time, we could remove the workaround. But this seems not worth the
> effort to me given we speak about a single inode and rather trivial
> workaround in ext3_iget and ext3_new_inode.
The old saying goes "Be strict in what you send, but generous in what you receive", so I think it makes sense that we allow reading xattrs from inode #11 (and also the root inode #2 is affected), and as you suggest we should only store these xattrs in external blocks.
The unfortunate thing is that xatts on the root inode will probably be the most heavily used, but it will also most likely be in cache so it probably won't be a serious performance issue (it can't be any worse than it is today).
While I think it would be slightly less complex to deny in-inode xattrs entirely, that doesn't give us any path toward actually removing this problem.
Of course, I think in the next 6 months or so (once ext4 is the default on RHEL6, widely in use at Google, and available on SLES11 SP1) we should probably consider ext4 stable enough that we should consider removing ext3 entirely from the tree and just have the "ext3" filesystem type be handled by ext4 as well.
Cheers, Andreas.