2011-11-25 01:08:23

by Theodore Ts'o

Subject: [PATCH] ext4: fix race condition when loading block or inode bitmaps

We use a separate flag in the buffer head to determine whether the bitmap
is valid. This is distinct from it being uptodate, due to the
uninit_bg feature. More details about the rationale for this flag can
be found in commit 2ccb5fb9f1. We set this bitmap_uptodate bit before
issuing the read request, so if another CPU attempts to load the same
block or inode bitmap, since ext4_read_{block,inode}_bitmap() checks
the bitmap_uptodate flag without locking the buffer head, hilarity
ensues.

The result of this bug is that occasionally a block or inode gets
allocated twice, which gets noticed when the second user of the block
gets deleted, or when a directory suddenly becomes a regular file or
a symlink. I'm *really* surprised this doesn't happen more often; but
in actual practice the fact that we tend to search for a zero bit in
the bitmap without taking a lock, and then taking the block group lock
and double checking to see if we actually got the allocation tends to
protect us.

This bug was introduced in commit 2ccb5fb9f1, which dates back to
January 2009 and 2.6.29. So this bug has been around for a *long*
time. (We've seen it for over a year, but rarely enough that we
could never find a repro case so we could study it in controlled
circumstances.)

Google-Bug-Id: 2828254
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
---
 fs/ext4/balloc.c |   12 ++++++------
 fs/ext4/ialloc.c |   12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 12ccacd..4501aab 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -372,7 +372,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 	ext4_unlock_group(sb, block_group);
 	if (buffer_uptodate(bh)) {
 		/*
-		 * if not uninit if bh is uptodate,
+		 * if not uninit && bh is uptodate,
 		 * bitmap is also uptodate
 		 */
 		set_bitmap_uptodate(bh);
@@ -380,13 +380,12 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 		return bh;
 	}
 	/*
-	 * submit the buffer_head for read. We can
-	 * safely mark the bitmap as uptodate now.
-	 * We do it here so the bitmap uptodate bit
-	 * get set with buffer lock held.
+	 * submit the buffer_head for read.  It's important that we
+	 * *not* mark the bitmap up to date until the read is
+	 * completed, since we check bitmap_uptodate() above without
+	 * locking the buffer for speed reasons.
 	 */
 	trace_ext4_read_block_bitmap_load(sb, block_group);
-	set_bitmap_uptodate(bh);
 	if (bh_submit_read(bh) < 0) {
 		put_bh(bh);
 		ext4_error(sb, "Cannot read block bitmap - "
@@ -394,6 +393,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
+	set_bitmap_uptodate(bh);
 	ext4_valid_block_bitmap(sb, desc, block_group, bh);
 	/*
 	 * file system mounted not to panic on error,
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 00beb4f..6fbae6d 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -139,7 +139,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)

 	if (buffer_uptodate(bh)) {
 		/*
-		 * if not uninit if bh is uptodate,
+		 * if not uninit && bh is uptodate,
 		 * bitmap is also uptodate
 		 */
 		set_bitmap_uptodate(bh);
@@ -147,13 +147,12 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 		return bh;
 	}
 	/*
-	 * submit the buffer_head for read. We can
-	 * safely mark the bitmap as uptodate now.
-	 * We do it here so the bitmap uptodate bit
-	 * get set with buffer lock held.
+	 * submit the buffer_head for read.  It's important that we
+	 * *not* mark the bitmap up to date until the read is
+	 * completed, since we check bitmap_uptodate() above without
+	 * locking the buffer for speed reasons.
 	 */
 	trace_ext4_load_inode_bitmap(sb, block_group);
-	set_bitmap_uptodate(bh);
 	if (bh_submit_read(bh) < 0) {
 		put_bh(bh);
 		ext4_error(sb, "Cannot read inode bitmap - "
@@ -161,6 +160,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
+	set_bitmap_uptodate(bh);
 	return bh;
 }

--
1.7.4.1.22.gec8e1.dirty
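
The commit message credits the allocator's habit of searching for a zero bit without a lock, then taking the block group lock and double checking, with hiding the bug most of the time. A minimal user-space sketch of that search-then-confirm pattern follows. It is not the ext4 code: `struct group`, `find_zero_hint()`, `claim_bit()` and `alloc_bit()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for the block group lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define GROUP_BITS 64

/* Hypothetical stand-in for one block group: a bitmap plus its lock. */
struct group {
	atomic_flag lock;			/* models the block group lock */
	atomic_uchar bits[GROUP_BITS / 8];	/* one bit per block or inode */
};

static void group_init(struct group *g)
{
	atomic_flag_clear(&g->lock);
	for (size_t i = 0; i < GROUP_BITS / 8; i++)
		atomic_init(&g->bits[i], 0);
}

static bool bit_is_set(struct group *g, size_t nr)
{
	unsigned char byte = atomic_load(&g->bits[nr / 8]);

	return byte & (1u << (nr % 8));
}

/* Lockless scan: the result is only a hint and may already be stale. */
static long find_zero_hint(struct group *g)
{
	for (size_t nr = 0; nr < GROUP_BITS; nr++)
		if (!bit_is_set(g, nr))
			return (long)nr;
	return -1;
}

/* Re-check the bit under the lock; fail if another caller took it first. */
static bool claim_bit(struct group *g, size_t nr)
{
	bool won = false;

	while (atomic_flag_test_and_set(&g->lock))
		;	/* spin until the lock is free */
	if (!bit_is_set(g, nr)) {
		atomic_fetch_or(&g->bits[nr / 8],
				(unsigned char)(1u << (nr % 8)));
		won = true;
	}
	atomic_flag_clear(&g->lock);
	return won;
}

/* Retry the hint/claim cycle until a claim succeeds or the group is full. */
static long alloc_bit(struct group *g)
{
	long nr;

	while ((nr = find_zero_hint(g)) >= 0)
		if (claim_bit(g, (size_t)nr))
			return nr;
	return -1;
}
```

Because the claim is re-validated under the lock, a stale hint costs only a retry. The pattern offers no protection if the bitmap contents themselves are wrong, which is the failure the patch was trying to address.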



2011-11-25 02:34:59

by Tao Ma

Subject: Re: [PATCH] ext4: fix race condition when loading block or inode bitmaps

Hi Ted,
On 11/25/2011 09:08 AM, Theodore Ts'o wrote:
> We use a separate flag in the buffer head to determine whether the bitmap
> is valid. This is distinct from it being uptodate, due to the
> uninit_bg feature. More details about the rationale for this flag can
> be found in commit 2ccb5fb9f1. We set this bitmap_uptodate bit before
> issuing the read request, so if another CPU attempts to load the same
> block or inode bitmap, since ext4_read_{block,inode}_bitmap() checks
> the bitmap_uptodate flag without locking the buffer head, hilarity
> ensues.
>
> The result of this bug is that occasionally a block or inode gets
> allocated twice, which gets noticed when the second user of the block
> gets deleted, or when a directory suddenly becomes a regular file or
> a symlink. I'm *really* surprised this doesn't happen more often; but
> in actual practice the fact that we tend to search for a zero bit in
> the bitmap without taking a lock, and then taking the block group lock
> and double checking to see if we actually got the allocation tends to
> protect us.
Sorry, but I don't follow your reasoning here.
bitmap_uptodate() checks both the BH_Uptodate and the BH_BITMAP_UPTODATE
flags, and your patch below only moves set_bitmap_uptodate() to after the
buffer has become uptodate. So I don't think the above scenario
would ever happen. Could you please explain it in more detail?

Thanks
Tao
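
Tao's objection can be checked with a small user-space model. This is only a sketch: the flag values, `struct buf` and the helper names are hypothetical, and it assumes the buffer is not yet uptodate when the read is submitted, as the earlier `buffer_uptodate()` branch in the patch context implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real bits live in bh->b_state. */
enum {
	ST_UPTODATE        = 1 << 0,	/* buffer data has been read */
	ST_BITMAP_UPTODATE = 1 << 1,	/* bitmap contents are initialized */
};

struct buf {
	unsigned state;
};

/* Models the check Tao describes: both bits must be set. */
static bool model_bitmap_uptodate(const struct buf *b)
{
	return (b->state & ST_UPTODATE) && (b->state & ST_BITMAP_UPTODATE);
}

/* Pre-patch order: set the bitmap flag, then submit the read. */
static bool in_flight_before_patch(void)
{
	struct buf b = { 0 };

	b.state |= ST_BITMAP_UPTODATE;	/* flag set before the read */
	/* Read submitted but not complete: ST_UPTODATE is still clear. */
	return model_bitmap_uptodate(&b);
}

/* Post-patch order: submit the read, set the flag after completion. */
static bool in_flight_after_patch(void)
{
	struct buf b = { 0 };

	/* Read submitted but not complete: neither bit is set yet. */
	return model_bitmap_uptodate(&b);
}
```

In this model a lockless reader sees "not uptodate" while the read is in flight under either ordering, so moving the `set_bitmap_uptodate()` call does not change what the unlocked check observes.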


2011-11-25 03:41:15

by Yongqiang Yang

Subject: Re: [PATCH] ext4: fix race condition when loading block or inode bitmaps

On Fri, Nov 25, 2011 at 9:08 AM, Theodore Ts'o <[email protected]> wrote:
> We use a separate flag in the buffer head to determine whether the bitmap
> is valid. This is distinct from it being uptodate, due to the
> uninit_bg feature. More details about the rationale for this flag can
> be found in commit 2ccb5fb9f1. We set this bitmap_uptodate bit before
> issuing the read request, so if another CPU attempts to load the same
> block or inode bitmap, since ext4_read_{block,inode}_bitmap() checks
> the bitmap_uptodate flag without locking the buffer head, hilarity
> ensues.
>
> The result of this bug is that occasionally a block or inode gets
> allocated twice, which gets noticed when the second user of the block
> gets deleted, or when a directory suddenly becomes a regular file or
> a symlink. I'm *really* surprised this doesn't happen more often; but
> in actual practice the fact that we tend to search for a zero bit in
> the bitmap without taking a lock, and then taking the block group lock
> and double checking to see if we actually got the allocation tends to
> protect us.
That is true for the inode bitmap, but the block bitmap is another story:
blocks are allocated from the buddy allocator, and mb_load_buddy does not
call block_read_bitmap at all. block_read_bitmap is called in
mark_space_used, free_blocks and free_inode_pa. So I am guessing that if
mb_load_buddy called block_read_bitmap, the bug would be reproduced easily.


BTW: It seems that we should factor out the code that reads bitmaps.
Right now there is one function per bitmap. We could pass
read_block/inode_bitmap a flag indicating a sync or async read and a flag
indicating which kind of bitmap. Then mb_load_buddy could use
read_bitmap as well, and the code would be much more maintainable.

ext4-snapshot has an exclude bitmap, so it would be much better if we
read all bitmaps via a single function.

Any opinion?

Yongqiang.





--
Best Wishes
Yongqiang Yang
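
The refactoring proposed above (a single bitmap reader that takes the bitmap kind and a sync/async flag) might look roughly like the sketch below. Nothing here exists in ext4: `enum bitmap_kind`, `enum bitmap_read_mode`, `struct group_desc`, `struct bitmap_request` and `read_bitmap()` are all hypothetical, and the function only builds a request instead of doing I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap kinds; "exclude" is the ext4-snapshot bitmap. */
enum bitmap_kind { BITMAP_BLOCK, BITMAP_INODE, BITMAP_EXCLUDE };

/* Hypothetical read mode: wait for the I/O or return immediately. */
enum bitmap_read_mode { BITMAP_READ_SYNC, BITMAP_READ_ASYNC };

/* Hypothetical, simplified group descriptor: where each bitmap lives. */
struct group_desc {
	uint64_t block_bitmap;
	uint64_t inode_bitmap;
	uint64_t exclude_bitmap;
};

/* What a unified reader would hand to the I/O layer. */
struct bitmap_request {
	uint64_t blocknr;
	enum bitmap_kind kind;
	int wait;	/* nonzero: caller waits for the read to finish */
};

/* The only per-kind logic: which on-disk block holds the bitmap. */
static uint64_t bitmap_location(const struct group_desc *gd,
				enum bitmap_kind kind)
{
	switch (kind) {
	case BITMAP_BLOCK:
		return gd->block_bitmap;
	case BITMAP_INODE:
		return gd->inode_bitmap;
	case BITMAP_EXCLUDE:
		return gd->exclude_bitmap;
	}
	return 0;
}

/* Single entry point shared by every bitmap kind and read mode. */
static struct bitmap_request read_bitmap(const struct group_desc *gd,
					 enum bitmap_kind kind,
					 enum bitmap_read_mode mode)
{
	struct bitmap_request req = {
		.blocknr = bitmap_location(gd, kind),
		.kind = kind,
		.wait = (mode == BITMAP_READ_SYNC),
	};

	return req;
}
```

With one entry point, an ordering rule for the uptodate flags would need to be stated only once instead of once per bitmap type.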

2011-11-25 16:19:58

by Theodore Ts'o

Subject: Re: [PATCH] ext4: fix race condition when loading block or inode bitmaps

On Fri, Nov 25, 2011 at 10:34:54AM +0800, Tao Ma wrote:
> Sorry, but I don't follow your reasoning here.
> bitmap_uptodate() checks both the BH_Uptodate and the BH_BITMAP_UPTODATE
> flags, and your patch below only moves set_bitmap_uptodate() to after the
> buffer has become uptodate. So I don't think the above scenario
> would ever happen. Could you please explain it in more detail?

Yes, you're right. I didn't realize bitmap_uptodate() checked both
bits. I had assumed it used the same convention as the other
functions that set/get bits in bh_state.

Rats, and I thought I had finally nailed the bug...

- Ted
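
The convention Ted refers to is that flag accessors are usually stamped out by a macro, so each test helper looks at exactly one bit. The user-space sketch below illustrates why that assumption misleads here. `FLAG_FNS()` is a hypothetical macro loosely modeled on the kernel's `BUFFER_FNS()` pattern, and `struct buf`, the bit numbers and the helper names are likewise invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this model. */
enum { BIT_UPTODATE = 0, BIT_BITMAP_UPTODATE = 1 };

struct buf {
	unsigned long state;
};

/*
 * Hypothetical accessor generator: every generated test helper
 * inspects exactly one bit of the state word.
 */
#define FLAG_FNS(bit, name)						\
static inline void set_buf_##name(struct buf *b)			\
{									\
	b->state |= 1UL << (bit);					\
}									\
static inline bool buf_##name(const struct buf *b)			\
{									\
	return b->state & (1UL << (bit));				\
}

FLAG_FNS(BIT_UPTODATE, uptodate)
FLAG_FNS(BIT_BITMAP_UPTODATE, bitmap_flag)

/* The exception: a handwritten test that requires both bits. */
static inline bool buf_bitmap_uptodate(const struct buf *b)
{
	return buf_uptodate(b) && buf_bitmap_flag(b);
}
```

A reader who assumes the handwritten helper behaves like the generated single-bit helper would expect it to return true as soon as the bitmap bit is set. In this model it stays false until the data bit is set as well.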