From: Tao Ma <[email protected]>
Now when we set the group inode free count, we don't have a proper
group lock so that multiple threads may decrease the inode free
count at the same time. And e2fsck will complain something like:
Free inodes count wrong for group #1 (1, counted=0).
Fix? no
Free inodes count wrong for group #2 (3, counted=0).
Fix? no
Directories count wrong for group #2 (780, counted=779).
Fix? no
Free inodes count wrong for group #3 (2272, counted=2273).
Fix? no
So this patch try to protect it with the ext4_lock_group.
btw, it is found by xfstests test case 269 and I have run it 100
times and the error in e2fsck doesn't show up again.
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/ialloc.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 409c2ee..25595e2 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -772,7 +772,9 @@ got:
ext4_itable_unused_set(sb, gdp,
(EXT4_INODES_PER_GROUP(sb) - ino));
up_read(&grp->alloc_sem);
- }
+ } else
+ ext4_lock_group(sb, group);
+
ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
if (S_ISDIR(mode)) {
ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
@@ -782,10 +784,9 @@ got:
atomic_inc(&sbi->s_flex_groups[f].used_dirs);
}
}
- if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
- ext4_unlock_group(sb, group);
- }
+ ext4_unlock_group(sb, group);
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
--
1.7.1
On 5/16/12 3:49 AM, Tao Ma wrote:
> From: Tao Ma <[email protected]>
>
> Now when we set the group inode free count, we don't have a proper
> group lock so that multiple threads may decrease the inode free
> count at the same time. And e2fsck will complain something like:
This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess?
That would be worth mentioning in the summary & changelog.
I guess you were testing without that for some reason?
-eric
> Free inodes count wrong for group #1 (1, counted=0).
> Fix? no
>
> Free inodes count wrong for group #2 (3, counted=0).
> Fix? no
>
> Directories count wrong for group #2 (780, counted=779).
> Fix? no
>
> Free inodes count wrong for group #3 (2272, counted=2273).
> Fix? no
>
> So this patch try to protect it with the ext4_lock_group.
>
> btw, it is found by xfstests test case 269 and I have run it 100
> times and the error in e2fsck doesn't show up again.
>
> Cc: Theodore Ts'o <[email protected]>
> Signed-off-by: Tao Ma <[email protected]>
> ---
> fs/ext4/ialloc.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 409c2ee..25595e2 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -772,7 +772,9 @@ got:
> ext4_itable_unused_set(sb, gdp,
> (EXT4_INODES_PER_GROUP(sb) - ino));
> up_read(&grp->alloc_sem);
> - }
> + } else
> + ext4_lock_group(sb, group);
> +
> ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
> if (S_ISDIR(mode)) {
> ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
> @@ -782,10 +784,9 @@ got:
> atomic_inc(&sbi->s_flex_groups[f].used_dirs);
> }
> }
> - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
> - ext4_unlock_group(sb, group);
> - }
> + ext4_unlock_group(sb, group);
>
> BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
On 05/16/2012 09:43 PM, Eric Sandeen wrote:
> On 5/16/12 3:49 AM, Tao Ma wrote:
>> From: Tao Ma <[email protected]>
>>
>> Now when we set the group inode free count, we don't have a proper
>> group lock so that multiple threads may decrease the inode free
>> count at the same time. And e2fsck will complain something like:
>
> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess?
> That would be worth mentioning in the summary & changelog.
sure, I will add it in v2.
>
> I guess you were testing without that for some reason?
See my comments below. I found it when running xfstests 269.
Thanks
Tao
>
> -eric
>
>> Free inodes count wrong for group #1 (1, counted=0).
>> Fix? no
>>
>> Free inodes count wrong for group #2 (3, counted=0).
>> Fix? no
>>
>> Directories count wrong for group #2 (780, counted=779).
>> Fix? no
>>
>> Free inodes count wrong for group #3 (2272, counted=2273).
>> Fix? no
>>
>> So this patch try to protect it with the ext4_lock_group.
>>
>> btw, it is found by xfstests test case 269 and I have run it 100
>> times and the error in e2fsck doesn't show up again.
>>
>> Cc: Theodore Ts'o <[email protected]>
>> Signed-off-by: Tao Ma <[email protected]>
>> ---
>> fs/ext4/ialloc.c | 9 +++++----
>> 1 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index 409c2ee..25595e2 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -772,7 +772,9 @@ got:
>> ext4_itable_unused_set(sb, gdp,
>> (EXT4_INODES_PER_GROUP(sb) - ino));
>> up_read(&grp->alloc_sem);
>> - }
>> + } else
>> + ext4_lock_group(sb, group);
>> +
>> ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
>> if (S_ISDIR(mode)) {
>> ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
>> @@ -782,10 +784,9 @@ got:
>> atomic_inc(&sbi->s_flex_groups[f].used_dirs);
>> }
>> }
>> - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
>> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
>> gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
>> - ext4_unlock_group(sb, group);
>> - }
>> + ext4_unlock_group(sb, group);
>>
>> BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
>> err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-05-16, at 2:49 AM, Tao Ma wrote:
> From: Tao Ma <[email protected]>
>
> Now when we set the group inode free count, we don't have a proper
> group lock so that multiple threads may decrease the inode free
> count at the same time. And e2fsck will complain something like:
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 409c2ee..25595e2 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -772,7 +772,9 @@ got:
> ext4_itable_unused_set(sb, gdp,
> (EXT4_INODES_PER_GROUP(sb) - ino));
> up_read(&grp->alloc_sem);
> - }
> + } else
> + ext4_lock_group(sb, group);
Minor coding style fix - when one half of if/else is using braces
then the other half should also use braces, like:
if {
:
} else {
:
}
> +
> ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
> if (S_ISDIR(mode)) {
> ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
> @@ -782,10 +784,9 @@ got:
> atomic_inc(&sbi->s_flex_groups[f].used_dirs);
> }
> }
> - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
> - ext4_unlock_group(sb, group);
> - }
> + ext4_unlock_group(sb, group);
>
> BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
On 05/16/2012 10:58 PM, Andreas Dilger wrote:
> On 2012-05-16, at 2:49 AM, Tao Ma wrote:
>> From: Tao Ma <[email protected]>
>>
>> Now when we set the group inode free count, we don't have a proper
>> group lock so that multiple threads may decrease the inode free
>> count at the same time. And e2fsck will complain something like:
>>
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index 409c2ee..25595e2 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -772,7 +772,9 @@ got:
>> ext4_itable_unused_set(sb, gdp,
>> (EXT4_INODES_PER_GROUP(sb) - ino));
>> up_read(&grp->alloc_sem);
>> - }
>> + } else
>> + ext4_lock_group(sb, group);
>
> Minor coding style fix - when one half of if/else is using braces
> then the other half should also use braces, like:
>
> if {
> :
> } else {
> :
> }
thank you Andreas, and I will change it in the next version.
Thanks
Tao
>
>> +
>> ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
>> if (S_ISDIR(mode)) {
>> ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
>> @@ -782,10 +784,9 @@ got:
>> atomic_inc(&sbi->s_flex_groups[f].used_dirs);
>> }
>> }
>> - if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
>> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
>> gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
>> - ext4_unlock_group(sb, group);
>> - }
>> + ext4_unlock_group(sb, group);
>>
>> BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
>> err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/16/12 9:55 AM, Tao Ma wrote:
> On 05/16/2012 09:43 PM, Eric Sandeen wrote:
>> On 5/16/12 3:49 AM, Tao Ma wrote:
>>> From: Tao Ma <[email protected]>
>>>
>>> Now when we set the group inode free count, we don't have a proper
>>> group lock so that multiple threads may decrease the inode free
>>> count at the same time. And e2fsck will complain something like:
>>
>> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess?
>> That would be worth mentioning in the summary & changelog.
> sure, I will add it in v2.
>>
>> I guess you were testing without that for some reason?
> See my comments below. I found it when running xfstests 269.
Still not sure how you got a filesystem w/o that feature though, unless
I am forgetting something obvious. Isn't it on by default?
-Eric
On 05/16/2012 11:49 PM, Eric Sandeen wrote:
> On 5/16/12 9:55 AM, Tao Ma wrote:
>> On 05/16/2012 09:43 PM, Eric Sandeen wrote:
>>> On 5/16/12 3:49 AM, Tao Ma wrote:
>>>> From: Tao Ma <[email protected]>
>>>>
>>>> Now when we set the group inode free count, we don't have a proper
>>>> group lock so that multiple threads may decrease the inode free
>>>> count at the same time. And e2fsck will complain something like:
>>>
>>> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess?
>>> That would be worth mentioning in the summary & changelog.
>> sure, I will add it in v2.
>>>
>>> I guess you were testing without that for some reason?
>> See my comments below. I found it when running xfstests 269.
>
> Still not sure how you got a filesystem w/o that feature though, unless
> I am forgetting something obvious. Isn't it on by default?
oh, I see. Yes, we mkfs the system with the following configurations:
mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr
Maybe that's the reason why it has never be met by others before. ;)
Thanks
Tao
On 5/16/12 9:17 PM, Tao Ma wrote:
> On 05/16/2012 11:49 PM, Eric Sandeen wrote:
>> On 5/16/12 9:55 AM, Tao Ma wrote:
>>> On 05/16/2012 09:43 PM, Eric Sandeen wrote:
>>>> On 5/16/12 3:49 AM, Tao Ma wrote:
>>>>> From: Tao Ma <[email protected]>
>>>>>
>>>>> Now when we set the group inode free count, we don't have a proper
>>>>> group lock so that multiple threads may decrease the inode free
>>>>> count at the same time. And e2fsck will complain something like:
>>>>
>>>> This is only in the ! EXT4_FEATURE_RO_COMPAT_GDT_CSUM case I guess?
>>>> That would be worth mentioning in the summary & changelog.
>>> sure, I will add it in v2.
>>>>
>>>> I guess you were testing without that for some reason?
>>> See my comments below. I found it when running xfstests 269.
>>
>> Still not sure how you got a filesystem w/o that feature though, unless
>> I am forgetting something obvious. Isn't it on by default?
> oh, I see. Yes, we mkfs the system with the following configurations:
> mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr
> Maybe that's the reason why it has never be met by others before. ;)
Ok, good. I figured it was in some barely-reachable corner
of the infinite test matrix ;)
-Eric
> Thanks
> Tao
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 17, 2012 at 10:17:34AM +0800, Tao Ma wrote:
> oh, I see. Yes, we mkfs the system with the following configurations:
> mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr
> Maybe that's the reason why it has never be met by others before. ;)
I'm curious -- is there a specific reason you're disabling the group
descriptor checksum? Or was that just something that was picked at
one point and you haven't changed it since?
Thanks,
- Ted
On 05/29/2012 06:22 AM, Ted Ts'o wrote:
> On Thu, May 17, 2012 at 10:17:34AM +0800, Tao Ma wrote:
>> oh, I see. Yes, we mkfs the system with the following configurations:
>> mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr
>> Maybe that's the reason why it has never be met by others before. ;)
>
> I'm curious -- is there a specific reason you're disabling the group
> descriptor checksum? Or was that just something that was picked at
> one point and you haven't changed it since?
We are just disabling the uninit_bg so as to let the block group
initialization happen in the mkfs time. I don't know why the checksum is
also disabled by ^uninit_bg.
Thanks
Tao
>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-05-28, at 8:18 PM, Tao Ma wrote:
> On 05/29/2012 06:22 AM, Ted Ts'o wrote:
>> On Thu, May 17, 2012 at 10:17:34AM +0800, Tao Ma wrote:
>>> oh, I see. Yes, we mkfs the system with the following configurations:
>>> mke2fs -O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr
>>> Maybe that's the reason why it has never be met by others before. ;)
>>
>> I'm curious -- is there a specific reason you're disabling the group
>> descriptor checksum? Or was that just something that was picked at
>> one point and you haven't changed it since?
>
> We are just disabling the uninit_bg so as to let the block group
> initialization happen in the mkfs time. I don't know why the checksum is
> also disabled by ^uninit_bg.
The checksum is controlled by uninit_bg, because there was a need to
ensure the bg_itable_unused count and the UNINIT flags could be
trusted when doing an e2fsck.
If you don't want to do lazy inode table initialization, that is
disabled by "mke2fs -E lazy_itable_init=0".
Cheers, Andreas
On Mon, May 28, 2012 at 11:57:59PM -0600, Andreas Dilger wrote:
> > We are just disabling the uninit_bg so as to let the block group
> > initialization happen in the mkfs time. I don't know why the checksum is
> > also disabled by ^uninit_bg.
>
> The checksum is controlled by uninit_bg, because there was a need to
> ensure the bg_itable_unused count and the UNINIT flags could be
> trusted when doing an e2fsck.
>
> If you don't want to do lazy inode table initialization, that is
> disabled by "mke2fs -E lazy_itable_init=0".
You can also disable whether or not lazy inode table initialization
happens by default via /etc/mke2fs.conf. At work we have a very
fairly havily modified /etc/mke2fs.conf which has our
production-specific mke2fs parameters defined, so that someone who
runs mke2fs by hand will get the same result at as the automated
systems.
I can easily see how if you are trying for predictable latency
numbers, disabling lazy inode table initialization makes a huge amount
of sense. I'd recommend doing it via /etc/mke2fs.conf, though.
Regards,
- Ted
On 05/29/2012 08:39 PM, Ted Ts'o wrote:
> On Mon, May 28, 2012 at 11:57:59PM -0600, Andreas Dilger wrote:
>>> We are just disabling the uninit_bg so as to let the block group
>>> initialization happen in the mkfs time. I don't know why the checksum is
>>> also disabled by ^uninit_bg.
>>
>> The checksum is controlled by uninit_bg, because there was a need to
>> ensure the bg_itable_unused count and the UNINIT flags could be
>> trusted when doing an e2fsck.
>>
>> If you don't want to do lazy inode table initialization, that is
>> disabled by "mke2fs -E lazy_itable_init=0".
>
> You can also disable whether or not lazy inode table initialization
> happens by default via /etc/mke2fs.conf. At work we have a very
> fairly havily modified /etc/mke2fs.conf which has our
> production-specific mke2fs parameters defined, so that someone who
> runs mke2fs by hand will get the same result at as the automated
> systems.
>
> I can easily see how if you are trying for predictable latency
> numbers, disabling lazy inode table initialization makes a huge amount
> of sense. I'd recommend doing it via /etc/mke2fs.conf, though.
OK, thank you all for the good suggestion.
Thanks
Tao