2013-05-15 00:01:22

by Jaegeuk Kim

Subject: [PATCH 1/2] f2fs: fix inconsistency of block count during recovery

Currently, f2fs recovers the dentries of fsynced files.
When power-off recovery is conducted, each newly recovered inode should increase
the node block count as well as the inode block count.

This patch resolves the inconsistency, which can be reproduced as follows:

1. create a file
2. write data
3. fsync
4. reboot without sync
5. mount and recover the file
6. node block count is 1 and inode block count is 2
   : the filesystem falls into an inconsistent state
7. unlink the file
   : this triggers the following BUG_ON

------------[ cut here ]------------
kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716!
Call Trace:
[<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs]
[<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs]
[<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs]
[<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs]
[<ffffffff811c7a67>] evict+0xa7/0x1a0
[<ffffffff811c82b5>] iput+0x105/0x190
[<ffffffff811c2b30>] d_kill+0xe0/0x120
[<ffffffff811c2c57>] dput+0xe7/0x1e0
[<ffffffff811acc3d>] __fput+0x19d/0x2d0
[<ffffffff811acd7e>] ____fput+0xe/0x10
[<ffffffff81070645>] task_work_run+0xb5/0xe0
[<ffffffff81002941>] do_notify_resume+0x71/0xb0
[<ffffffff8175f14a>] int_signal+0x12/0x17

Reported-by: Chris Fries <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/node.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 3df43b4..9641534 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1492,6 +1492,8 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
 	new_ni = old_ni;
 	new_ni.ino = ino;

+	if (!inc_valid_node_count(sbi, NULL, 1))
+		WARN_ON(1);
 	set_node_addr(sbi, &new_ni, NEW_ADDR);
 	inc_valid_inode_count(sbi);

--
1.8.1.3.566.gaa39828
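
The accounting problem described in the patch above can be modelled in user space. The following is a minimal sketch, not kernel code: `struct toy_sbi` and every `toy_*` function are hypothetical names, and the model assumes only that each inode occupies one node block, so the node count should never be smaller than the inode count. Which check actually fires at f2fs.h:716 is not shown in the thread, so the sketch stops at detecting the inconsistent state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two counters discussed above; not the kernel's f2fs_sb_info. */
struct toy_sbi {
	unsigned int valid_node_count;   /* node blocks, which include inode blocks */
	unsigned int valid_inode_count;  /* inodes */
};

/* Assumed invariant of this model: every inode occupies one node block. */
static bool toy_counts_consistent(const struct toy_sbi *sbi)
{
	return sbi->valid_node_count >= sbi->valid_inode_count;
}

/* Model of recovering one inode page; `patched` selects the behaviour after the fix. */
static void toy_recover_inode(struct toy_sbi *sbi, bool patched)
{
	if (patched)
		sbi->valid_node_count++;  /* the accounting added by the patch */
	sbi->valid_inode_count++;
}

/* Steps 5 and 6 of the reproduction: mount with one existing inode, recover one more. */
static bool toy_recovery_is_consistent(bool patched)
{
	struct toy_sbi sbi = { .valid_node_count = 1, .valid_inode_count = 1 };

	assert(toy_counts_consistent(&sbi));  /* the starting state is sane */
	toy_recover_inode(&sbi, patched);
	return toy_counts_consistent(&sbi);
}
```

With `patched` set to false the model ends with a node count of 1 and an inode count of 2, the state listed in step 6; with `patched` set to true both counts are 2.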


2013-05-15 00:01:37

by Jaegeuk Kim

Subject: [PATCH 2/2] f2fs: fix the inconsistent state of data pages

In get_lock_data_page(), if get_dnode_of_data() for the node block races with
grab_cache_page() for the data page, f2fs can hit the following
BUG_ON(dn.data_blkaddr == NEW_ADDR).

kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
[<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
Call Trace:
[<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
[<ffffffff811a0920>] ? fillonedir+0x100/0x100
[<ffffffff811a0920>] ? fillonedir+0x100/0x100
[<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
[<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
[<ffffffff816d7999>] system_call_fastpath+0x16/0x1b

This bug can occur when the block address of the data block is changed after
f2fs_put_dnode().
To avoid that, this patch fixes the lock order of node and data blocks so that
the node block lock is covered by the data block lock.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/data.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 91ff93b..05fb5c6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -233,18 +233,23 @@ struct page *get_lock_data_page(struct inode *inode, pgoff_t index)
 	struct page *page;
 	int err;

+repeat:
+	page = grab_cache_page(mapping, index);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
 	set_new_dnode(&dn, inode, NULL, NULL, 0);
 	err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
-	if (err)
+	if (err) {
+		f2fs_put_page(page, 1);
 		return ERR_PTR(err);
+	}
 	f2fs_put_dnode(&dn);

-	if (dn.data_blkaddr == NULL_ADDR)
+	if (dn.data_blkaddr == NULL_ADDR) {
+		f2fs_put_page(page, 1);
 		return ERR_PTR(-ENOENT);
-repeat:
-	page = grab_cache_page(mapping, index);
-	if (!page)
-		return ERR_PTR(-ENOMEM);
+	}

 	if (PageUptodate(page))
 		return page;
--
1.8.1.3.566.gaa39828
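
The lock-order change in the patch above can be illustrated with a deterministic, single-threaded model. This is only a sketch under stated assumptions: `struct toy_file`, `toy_updater()` and the two lookup functions are hypothetical, and the model assumes that any path changing the block address must hold the data page lock first. It shows why looking up the address before locking the page leaves a window, and why locking the page first closes it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NEW_ADDR 0xffffffffu

struct toy_file {
	bool page_locked;        /* stands in for the data page lock */
	unsigned int blkaddr;    /* stands in for the address stored in the node block */
};

/* A concurrent updater: it may change the address only while holding the page lock. */
static void toy_updater(struct toy_file *f)
{
	if (f->page_locked)
		return;              /* would block until the reader unlocks the page */
	f->page_locked = true;
	f->blkaddr = TOY_NEW_ADDR;
	f->page_locked = false;
}

/* Old order: look up the address, drop the node, then lock the page. */
static bool toy_lookup_old(struct toy_file *f, unsigned int *seen)
{
	assert(f && seen);
	*seen = f->blkaddr;      /* get_dnode_of_data() + f2fs_put_dnode() */
	toy_updater(f);          /* race window: nothing stops the updater here */
	f->page_locked = true;   /* grab_cache_page() */
	bool stale = (*seen != f->blkaddr);
	f->page_locked = false;
	return stale;            /* true if the address changed under the reader */
}

/* New order: lock the page first, then look up the address under it. */
static bool toy_lookup_new(struct toy_file *f, unsigned int *seen)
{
	assert(f && seen);
	f->page_locked = true;   /* grab_cache_page() */
	*seen = f->blkaddr;      /* get_dnode_of_data() + f2fs_put_dnode() */
	toy_updater(f);          /* the updater cannot proceed while the page is locked */
	bool stale = (*seen != f->blkaddr);
	f->page_locked = false;
	return stale;
}
```

In this model `toy_lookup_old()` reports a stale address because the updater runs inside the window, while `toy_lookup_new()` does not, since the updater is kept out for as long as the page stays locked.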

2013-05-15 03:47:24

by Namjae Jeon

Subject: Re: [PATCH 1/2] f2fs: fix inconsistency of block count during recovery

>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 3df43b4..9641534 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1492,6 +1492,8 @@ int recover_inode_page(struct f2fs_sb_info *sbi,
> struct page *page)
> new_ni = old_ni;
> new_ni.ino = ino;
>
Hi Jaegeuk,

I have a minor comment.
> + if (!inc_valid_node_count(sbi, NULL, 1))
> + WARN_ON(1);
How about changing this to WARN_ON(!inc_valid_node_count(sbi, NULL, 1)); ?

Reviewed-by: Namjae Jeon <[email protected]>
Thanks.
> set_node_addr(sbi, &new_ni, NEW_ADDR);
> inc_valid_inode_count(sbi);
>
> --
> 1.8.1.3.566.gaa39828
>

2013-05-15 04:04:10

by Namjae Jeon

Subject: Re: [PATCH 2/2] f2fs: fix the inconsistent state of data pages

2013/5/15, Jaegeuk Kim <[email protected]>:
> In get_lock_data_page, if there is a data race between get_dnode_of_data
> for
> node and grab_cache_page for data, f2fs is able to face with the following
> BUG_ON(dn.data_blkaddr == NEW_ADDR).
>
> kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
> [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
> Call Trace:
> [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
> [<ffffffff811a0920>] ? fillonedir+0x100/0x100
> [<ffffffff811a0920>] ? fillonedir+0x100/0x100
> [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
> [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
> [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b
>
> This bug is able to be occurred when the block address of the data block is
> changed after f2fs_put_dnode().
> In order to avoid that, this patch fixes the lock order of node and data
> blocks in which the node block lock is covered by the data block lock.
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/data.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 91ff93b..05fb5c6 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -233,18 +233,23 @@ struct page *get_lock_data_page(struct inode *inode,
> pgoff_t index)
> struct page *page;
> int err;
>
> +repeat:
> + page = grab_cache_page(mapping, index);
> + if (!page)
> + return ERR_PTR(-ENOMEM);
> +
> set_new_dnode(&dn, inode, NULL, NULL, 0);
> err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
> - if (err)
> + if (err) {
> + f2fs_put_page(page, 1);
> return ERR_PTR(err);
> + }
> f2fs_put_dnode(&dn);
>
> - if (dn.data_blkaddr == NULL_ADDR)
> + if (dn.data_blkaddr == NULL_ADDR) {
> + f2fs_put_page(page, 1);
> return ERR_PTR(-ENOENT);
> -repeat:
> - page = grab_cache_page(mapping, index);
> - if (!page)
> - return ERR_PTR(-ENOMEM);
> + }
>
> if (PageUptodate(page))
> return page;
Isn't there a need to move the PageUptodate check so that it comes right
after grab_cache_page?

Thanks.
> --
> 1.8.1.3.566.gaa39828
>

2013-05-15 04:39:28

by Jaegeuk Kim

Subject: Re: [PATCH 1/2] f2fs: fix inconsistency of block count during recovery

2013-05-15 (Wed), 12:47 +0900, Namjae Jeon:
> >
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 3df43b4..9641534 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1492,6 +1492,8 @@ int recover_inode_page(struct f2fs_sb_info *sbi,
> > struct page *page)
> > new_ni = old_ni;
> > new_ni.ino = ino;
> >
> Hi. Jaegeuk.
>
> I have a minor comment.
> > + if (!inc_valid_node_count(sbi, NULL, 1))
> > + WARN_ON(1);

Hi Namjae,

I'd rather not, since inc_valid_node_count() is not a debugging call.
IMO, for readability, we need to make clear what the function call is
used for.
Thank you for the review. :)
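
One general reason to keep a call with side effects on its own line, separate from a check macro, can be sketched in user space. This is an analogy only: `TOY_CHECK` is a hypothetical macro that discards its argument, much as the C standard `assert()` does under NDEBUG, and nothing here is a claim about how the kernel's WARN_ON behaves. `toy_inc_count()` is likewise a hypothetical stand-in for an accounting call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check macro that is compiled out. The operand of sizeof
 * is not evaluated, so any side effect in `cond` is lost.
 */
#define TOY_CHECK(cond) ((void)sizeof(cond))

static unsigned int toy_warnings;  /* how many times a failure was reported */

/* Hypothetical accounting call with a side effect; fails when the limit is reached. */
static bool toy_inc_count(unsigned int *count, unsigned int limit)
{
	assert(count != NULL);
	if (*count >= limit)
		return false;
	(*count)++;
	return true;
}

/* The call stands on its own line, so it always runs. */
static void toy_account_explicit(unsigned int *count, unsigned int limit)
{
	if (!toy_inc_count(count, limit))
		toy_warnings++;
}

/* The call is folded into the check, so it disappears together with the check. */
static void toy_account_folded(unsigned int *count, unsigned int limit)
{
	TOY_CHECK(toy_inc_count(count, limit));
}
```

After one call each on a zeroed counter, the explicit form leaves the counter at 1 while the folded form leaves it at 0. The explicit form also makes it visible to a reader that the call is the operation itself and not a diagnostic.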

> How about change WARN_ON(!inc_valid_node_count(sbi, NULL, 1)); ?
>
> Reviewed-by: Namjae Jeon <[email protected]>
> Thanks.
> > set_node_addr(sbi, &new_ni, NEW_ADDR);
> > inc_valid_inode_count(sbi);
> >
> > --
> > 1.8.1.3.566.gaa39828
> >

--
Jaegeuk Kim
Samsung



2013-05-15 04:42:44

by Jaegeuk Kim

Subject: Re: [PATCH 2/2] f2fs: fix the inconsistent state of data pages

2013-05-15 (Wed), 13:04 +0900, Namjae Jeon:
> 2013/5/15, Jaegeuk Kim <[email protected]>:
> > In get_lock_data_page, if there is a data race between get_dnode_of_data
> > for
> > node and grab_cache_page for data, f2fs is able to face with the following
> > BUG_ON(dn.data_blkaddr == NEW_ADDR).
> >
> > kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
> > [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
> > Call Trace:
> > [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
> > [<ffffffff811a0920>] ? fillonedir+0x100/0x100
> > [<ffffffff811a0920>] ? fillonedir+0x100/0x100
> > [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
> > [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
> > [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b
> >
> > This bug is able to be occurred when the block address of the data block is
> > changed after f2fs_put_dnode().
> > In order to avoid that, this patch fixes the lock order of node and data
> > blocks in which the node block lock is covered by the data block lock.
> >
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/data.c | 17 +++++++++++------
> > 1 file changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 91ff93b..05fb5c6 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -233,18 +233,23 @@ struct page *get_lock_data_page(struct inode *inode,
> > pgoff_t index)
> > struct page *page;
> > int err;
> >
> > +repeat:
> > + page = grab_cache_page(mapping, index);
> > + if (!page)
> > + return ERR_PTR(-ENOMEM);
> > +
> > set_new_dnode(&dn, inode, NULL, NULL, 0);
> > err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
> > - if (err)
> > + if (err) {
> > + f2fs_put_page(page, 1);
> > return ERR_PTR(err);
> > + }
> > f2fs_put_dnode(&dn);
> >
> > - if (dn.data_blkaddr == NULL_ADDR)
> > + if (dn.data_blkaddr == NULL_ADDR) {
> > + f2fs_put_page(page, 1);
> > return ERR_PTR(-ENOENT);
> > -repeat:
> > - page = grab_cache_page(mapping, index);
> > - if (!page)
> > - return ERR_PTR(-ENOMEM);
> > + }
> >
> > if (PageUptodate(page))
> > return page;
> Is there no need to move PageUptodate condition checking to
> grab_cache_page next ?

For data consistency, I'd like to check the index in its node block
before checking PageUptodate.
Thanks,
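
The order of checks described in this reply can be sketched as a small user-space model. It is one reading of the reply, not the kernel implementation: `struct toy_node`, `struct toy_page` and `toy_get_data_page()` are hypothetical, and the "read from disk" step is reduced to setting a flag. The point is that the node block is consulted first, so a cached page is returned only when a block still exists behind that index.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NULL_ADDR 0u
#define TOY_NR_INDEX  4

struct toy_page {
	bool uptodate;                       /* stands in for PageUptodate() */
};

struct toy_node {
	unsigned int blkaddr[TOY_NR_INDEX];  /* per-index block addresses in the node block */
};

/*
 * Returns 0 and stores the page in *out on success, or a negative errno.
 * The node block is consulted before the cached page is trusted.
 */
static int toy_get_data_page(const struct toy_node *node, struct toy_page *cache,
			     size_t index, struct toy_page **out)
{
	assert(node && cache && out);
	if (index >= TOY_NR_INDEX)
		return -EINVAL;
	if (node->blkaddr[index] == TOY_NULL_ADDR)
		return -ENOENT;              /* no block behind this index, whatever the cache says */
	if (!cache[index].uptodate)
		cache[index].uptodate = true;  /* stands in for reading the block from disk */
	*out = &cache[index];
	return 0;
}
```

In this model an index whose address is TOY_NULL_ADDR yields -ENOENT even if its cached page is marked up to date, which would not be the case if the up-to-date check came first.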

>
> Thanks.
> > --
> > 1.8.1.3.566.gaa39828
> >

--
Jaegeuk Kim
Samsung

