2020-02-03 15:10:52

by Zhang Yi

Subject: [PATCH 0/2] jbd2: fix an oops problem

Hi, Ted and Jan
We encountered a jbd2 oops problem on an aarch64 machine with 4K block
size and 64K page size when doing stress tests.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
...
user pgtable: 64k pages, 42-bit VAs, pgdp = (____ptrval____)
...
pc : jbd2_journal_put_journal_head+0x7c/0x284
lr : jbd2_journal_put_journal_head+0x3c/0x284
...
Call trace:
jbd2_journal_put_journal_head+0x7c/0x284
__jbd2_journal_refile_buffer+0x164/0x188
jbd2_journal_commit_transaction+0x12a0/0x1a50
kjournald2+0xd0/0x260
kthread+0x134/0x138
ret_from_fork+0x10/0x1c
Code: 51000400 b9000ac0 35000760 f9402274 (b9400a80)
---[ end trace 8fa99273d06aeb63 ]---

This patch set fixes the issue. The first patch is just a cleanup, and the
second one describes the root cause and fixes it. Please review.

Thanks,
Yi.

zhangyi (F) (2):
jbd2: move the clearing of b_modified flag to the
journal_unmap_buffer()
jbd2: do not clear the BH_Mapped flag when forgetting a metadata
buffer

fs/jbd2/commit.c | 36 +++++++++++++-----------------------
fs/jbd2/transaction.c | 25 ++++++++++++-------------
include/linux/jbd2.h | 2 ++
3 files changed, 27 insertions(+), 36 deletions(-)

--
2.17.2


2020-02-03 15:10:52

by Zhang Yi

Subject: [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()

There is no need to delay the clearing of b_modified flag to the
transaction committing time when unmapping the journalled buffer, so
just move it to the journal_unmap_buffer().

Signed-off-by: zhangyi (F) <[email protected]>
---
fs/jbd2/commit.c | 43 +++++++++++++++----------------------------
fs/jbd2/transaction.c | 24 +++++++++++-------------
2 files changed, 26 insertions(+), 41 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 2494095e0340..6396fe70085b 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -976,34 +976,21 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		 * it. */

 		/*
-		 * A buffer which has been freed while still being journaled by
-		 * a previous transaction.
-		 */
-		if (buffer_freed(bh)) {
-			/*
-			 * If the running transaction is the one containing
-			 * "add to orphan" operation (b_next_transaction !=
-			 * NULL), we have to wait for that transaction to
-			 * commit before we can really get rid of the buffer.
-			 * So just clear b_modified to not confuse transaction
-			 * credit accounting and refile the buffer to
-			 * BJ_Forget of the running transaction. If the just
-			 * committed transaction contains "add to orphan"
-			 * operation, we can completely invalidate the buffer
-			 * now. We are rather through in that since the
-			 * buffer may be still accessible when blocksize <
-			 * pagesize and it is attached to the last partial
-			 * page.
-			 */
-			jh->b_modified = 0;
-			if (!jh->b_next_transaction) {
-				clear_buffer_freed(bh);
-				clear_buffer_jbddirty(bh);
-				clear_buffer_mapped(bh);
-				clear_buffer_new(bh);
-				clear_buffer_req(bh);
-				bh->b_bdev = NULL;
-			}
+		 * A buffer which has been freed while still being journaled
+		 * by a previous transaction, refile the buffer to BJ_Forget of
+		 * the running transaction. If the just committed transaction
+		 * contains "add to orphan" operation, we can completely
+		 * invalidate the buffer now. We are rather through in that
+		 * since the buffer may be still accessible when blocksize <
+		 * pagesize and it is attached to the last partial page.
+		 */
+		if (buffer_freed(bh) && !jh->b_next_transaction) {
+			clear_buffer_freed(bh);
+			clear_buffer_jbddirty(bh);
+			clear_buffer_mapped(bh);
+			clear_buffer_new(bh);
+			clear_buffer_req(bh);
+			bh->b_bdev = NULL;
 		}

 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e77a5a0b4e46..a479cbf8ae54 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2337,11 +2337,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 		set_buffer_freed(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
 			jh->b_next_transaction = journal->j_running_transaction;
-		spin_unlock(&journal->j_list_lock);
-		spin_unlock(&jh->b_state_lock);
-		write_unlock(&journal->j_state_lock);
-		jbd2_journal_put_journal_head(jh);
-		return 0;
+		may_free = 0;
 	} else {
 		/* Good, the buffer belongs to the running transaction.
 		 * We are writing our own transaction's data, not any
@@ -2369,14 +2365,16 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 	write_unlock(&journal->j_state_lock);
 	jbd2_journal_put_journal_head(jh);
 zap_buffer_unlocked:
-	clear_buffer_dirty(bh);
-	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
-	clear_buffer_mapped(bh);
-	clear_buffer_req(bh);
-	clear_buffer_new(bh);
-	clear_buffer_delay(bh);
-	clear_buffer_unwritten(bh);
-	bh->b_bdev = NULL;
+	if (!buffer_freed(bh)) {
+		clear_buffer_dirty(bh);
+		J_ASSERT_BH(bh, !buffer_jbddirty(bh));
+		clear_buffer_mapped(bh);
+		clear_buffer_req(bh);
+		clear_buffer_new(bh);
+		clear_buffer_delay(bh);
+		clear_buffer_unwritten(bh);
+		bh->b_bdev = NULL;
+	}
 	return may_free;
 }

--
2.17.2

2020-02-03 15:10:52

by Zhang Yi

Subject: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") sets the BH_Freed flag when forgetting a metadata
buffer that belongs to the committing transaction, which tells the commit
process to clear the dirty bits when it is done with the buffer. But the
commit process also clears the BH_Mapped flag at the same time, which may
trigger the NULL pointer oops below when block_size < PAGE_SIZE.

rmdir 1             kjournald2                            mkdir 2
                    jbd2_journal_commit_transaction
                     commit transaction N
jbd2_journal_forget
 set_buffer_freed(bh1)
                    jbd2_journal_commit_transaction
                     commit transaction N+1
                     ...
                     clear_buffer_mapped(bh1)
                                                          ext4_getblk(bh2 unmapped)
                                                          ...
                                                          grow_dev_page
                                                           init_page_buffers
                                                            bh1->b_private=NULL
                                                            bh2->b_private=NULL
                     jbd2_journal_put_journal_head(jh1)
                      __journal_remove_journal_head(bh1)
                       jh1 is NULL and trigger oops

*) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has already
been unmapped.

For the metadata buffer we are forgetting, clearing the dirty flags is
enough, so this patch adds a BH_Unmap flag for the journal_unmap_buffer()
case and keeps the mapped flag for the metadata buffer.
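
The interleaving above can be sketched as a rough userspace model, for
illustration only. The names model_jh, model_bh, model_init_page_buffers()
and model_jh_survives() are hypothetical stand-ins, not kernel definitions,
and the model assumes (as the trace suggests) that init_page_buffers()
reinitialises b_private for every buffer of the page that is not mapped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct journal_head and struct buffer_head. */
struct model_jh { int b_jcount; };
struct model_bh {
	bool mapped;			/* models BH_Mapped */
	struct model_jh *b_private;	/* models bh->b_private (attached jh) */
};

/*
 * Models the assumed behaviour of init_page_buffers(): every buffer of
 * the page that is not mapped is reinitialised, dropping b_private.
 */
static void model_init_page_buffers(struct model_bh *bhs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!bhs[i].mapped) {
			bhs[i].b_private = NULL;
			bhs[i].mapped = true;
		}
	}
}

/* Returns true if bh1 still has its journal head after the sequence. */
static bool model_jh_survives(bool commit_clears_mapped)
{
	struct model_jh jh1 = { .b_jcount = 1 };
	/* bh1 and bh2 share one page; bh2 is already unmapped. */
	struct model_bh page[2] = {
		{ .mapped = true,  .b_private = &jh1 },
		{ .mapped = false, .b_private = NULL },
	};

	if (commit_clears_mapped)
		page[0].mapped = false;	/* kjournald2: clear_buffer_mapped(bh1) */

	/* mkdir: ext4_getblk(bh2) -> grow_dev_page -> init_page_buffers */
	model_init_page_buffers(page, 2);

	return page[0].b_private == &jh1;
}
```

In this model, model_jh_survives(true) loses the journal head of bh1, which
corresponds to the later jbd2_journal_put_journal_head() finding a NULL jh,
while model_jh_survives(false) keeps it attached.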

Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Signed-off-by: zhangyi (F) <[email protected]>
---
fs/jbd2/commit.c | 11 +++++++----
fs/jbd2/transaction.c | 1 +
include/linux/jbd2.h | 2 ++
3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6396fe70085b..a649cdd1c5e5 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		if (buffer_freed(bh) && !jh->b_next_transaction) {
 			clear_buffer_freed(bh);
 			clear_buffer_jbddirty(bh);
-			clear_buffer_mapped(bh);
-			clear_buffer_new(bh);
-			clear_buffer_req(bh);
-			bh->b_bdev = NULL;
+			if (buffer_unmap(bh)) {
+				clear_buffer_unmap(bh);
+				clear_buffer_mapped(bh);
+				clear_buffer_new(bh);
+				clear_buffer_req(bh);
+				bh->b_bdev = NULL;
+			}
 		}

 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a479cbf8ae54..717964eec9d3 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2335,6 +2335,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 		 * should clear dirty bits when it is done with the buffer.
 		 */
 		set_buffer_freed(bh);
+		set_buffer_unmap(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
 			jh->b_next_transaction = journal->j_running_transaction;
 		may_free = 0;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f613d8529863..f74906ebc73a 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -310,6 +310,7 @@ enum jbd_state_bits {
 	  = BH_PrivateStart,
 	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
 	BH_Freed,		/* Has been freed (truncated) */
+	BH_Unmap,		/* Has been freed and need to unmap */
 	BH_Revoked,		/* Has been revoked from the log */
 	BH_RevokeValid,		/* Revoked flag is valid */
 	BH_JBDDirty,		/* Is dirty but journaled */
@@ -328,6 +329,7 @@ TAS_BUFFER_FNS(Revoked, revoked)
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(Unmap, unmap)
 BUFFER_FNS(Shadow, shadow)
 BUFFER_FNS(Verified, verified)

--
2.17.2

2020-02-06 11:20:16

by Jan Kara

Subject: Re: [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()

On Mon 03-02-20 22:04:57, zhangyi (F) wrote:
> There is no need to delay the clearing of b_modified flag to the
> transaction committing time when unmapping the journalled buffer, so
> just move it to the journal_unmap_buffer().
>
> Signed-off-by: zhangyi (F) <[email protected]>

Thanks for the patch. It looks good, just one small comment below:

> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index e77a5a0b4e46..a479cbf8ae54 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -2337,11 +2337,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
>  		set_buffer_freed(bh);
>  		if (journal->j_running_transaction && buffer_jbddirty(bh))
>  			jh->b_next_transaction = journal->j_running_transaction;
> -		spin_unlock(&journal->j_list_lock);
> -		spin_unlock(&jh->b_state_lock);
> -		write_unlock(&journal->j_state_lock);
> -		jbd2_journal_put_journal_head(jh);
> -		return 0;
> +		may_free = 0;

I'd rather add b_modified clearing here than trying to reuse the tail of
the function. Because this condition is different from the other ones that
end up in zap_buffer_locked - here we really want to keep bh and jh mostly
intact.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-02-06 11:52:56

by Jan Kara

Subject: Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
> an older transaction") set the BH_Freed flag when forgetting a metadata
> buffer which belongs to the committing transaction, it indicate the
> committing process clear dirty bits when it is done with the buffer. But
> it also clear the BH_Mapped flag at the same time, which may trigger
> below NULL pointer oops when block_size < PAGE_SIZE.
>
> rmdir 1             kjournald2                            mkdir 2
>                     jbd2_journal_commit_transaction
>                      commit transaction N
> jbd2_journal_forget
>  set_buffer_freed(bh1)
>                     jbd2_journal_commit_transaction
>                      commit transaction N+1
>                      ...
>                      clear_buffer_mapped(bh1)
>                                                           ext4_getblk(bh2 ummapped)
>                                                           ...
>                                                           grow_dev_page
>                                                            init_page_buffers
>                                                             bh1->b_private=NULL
>                                                             bh2->b_private=NULL
>                      jbd2_journal_put_journal_head(jh1)
>                       __journal_remove_journal_head(hb1)
>                        jh1 is NULL and trigger oops
>
> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
> already been unmapped.
>
> For the metadata buffer we forgetting, clear the dirty flags is enough,
> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
> keep the mapped flag for the metadata buffer.
>
> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
> Signed-off-by: zhangyi (F) <[email protected]>

Good spotting! Thanks for the patch. Some comments below:

> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 6396fe70085b..a649cdd1c5e5 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
>  			clear_buffer_freed(bh);
>  			clear_buffer_jbddirty(bh);
> -			clear_buffer_mapped(bh);
> -			clear_buffer_new(bh);
> -			clear_buffer_req(bh);
> -			bh->b_bdev = NULL;
> +			if (buffer_unmap(bh)) {
> +				clear_buffer_unmap(bh);
> +				clear_buffer_mapped(bh);
> +				clear_buffer_new(bh);
> +				clear_buffer_req(bh);
> +				bh->b_bdev = NULL;
> +			}

Any reason why you don't want to clear buffer_req and buffer_new flags for
all buffers as well? I agree that b_bdev setting and buffer_mapped need
special treatment.

Also rather than introducing this new buffer_unmap bit, I'd use the fact
this special treatment is needed only for buffers coming from the block device
mapping. And we can check for that like:

	/*
	 * We can (and need to) unmap buffer only for normal mappings.
	 * Block device buffers need to stay mapped all the time.
	 * We need to be careful about the check because the page
	 * mapping can get cleared under our hands.
	 */
	mapping = READ_ONCE(bh->b_page->mapping);
	if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
		...
	}

Longer term, we might want to rework how the handling of truncated buffers
works with JBD2. There's lots of duplication between jbd2_journal_forget()
and jbd2_journal_unmap_buffer(), the dirtiness is tracked in jh->b_modified
as well as buffer_jbddirty() and it is further redundant with the journal
list the buffer is currently on. So I suspect it could all be simplified if
we took a fresh look at things.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-02-06 15:28:34

by Zhang Yi

Subject: Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

Thanks for the comments.

On 2020/2/6 19:46, Jan Kara wrote:
> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
[..]
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index 6396fe70085b..a649cdd1c5e5 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
>>  			clear_buffer_freed(bh);
>>  			clear_buffer_jbddirty(bh);
>> -			clear_buffer_mapped(bh);
>> -			clear_buffer_new(bh);
>> -			clear_buffer_req(bh);
>> -			bh->b_bdev = NULL;
>> +			if (buffer_unmap(bh)) {
>> +				clear_buffer_unmap(bh);
>> +				clear_buffer_mapped(bh);
>> +				clear_buffer_new(bh);
>> +				clear_buffer_req(bh);
>> +				bh->b_bdev = NULL;
>> +			}
>
> Any reason why you don't want to clear buffer_req and buffer_new flags for
> all buffers as well? I agree that b_bdev setting and buffer_mapped need
> special treatment.
>
IIUC, a buffer coming from jbd2_journal_forget() is always a 'block
device backed' metadata buffer (I'm not completely sure), and for these
metadata buffers the buffer_new flag will not be set. At the same time,
since such a buffer is always mapped, it's fine to keep the buffer_req
flag even though the buffer has now been freed by the filesystem, because
it means the block device has committed this buffer, and it does not seem
to affect reusing the buffer. Am I missing something?

> Also rather than introducing this new buffer_unmap bit, I'd use the fact
> this special treatment is needed only for buffers coming from the block device
> mapping. And we can check for that like:
>
> 	/*
> 	 * We can (and need to) unmap buffer only for normal mappings.
> 	 * Block device buffers need to stay mapped all the time.
> 	 * We need to be careful about the check because the page
> 	 * mapping can get cleared under our hands.
> 	 */
> 	mapping = READ_ONCE(bh->b_page->mapping);
> 	if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> 		...
> 	}
>
It looks better, I will use this check in the next iteration.

> Longer term, we might want to rework how the handling of truncated buffers
> works with JDB2. There's lots of duplication between jbd2_journal_forget()
> and jbd2_journal_unmap_buffer(), the dirtiness is tracked in jh->b_modified
> as well as buffer_jbddirty() and it is further redundant with the journal
> list the buffer is currently on. So I suspect it could all be simplified if
> we took a fresh look at things.
>
Indeed, it is tricky and not easy to understand now; refactoring this
in the future would be great.

Thanks,
Yi.

2020-02-11 07:29:36

by Zhang Yi

Subject: Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

On 2020/2/6 19:46, Jan Kara wrote:
> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
>> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
>> an older transaction") set the BH_Freed flag when forgetting a metadata
>> buffer which belongs to the committing transaction, it indicate the
>> committing process clear dirty bits when it is done with the buffer. But
>> it also clear the BH_Mapped flag at the same time, which may trigger
>> below NULL pointer oops when block_size < PAGE_SIZE.
>>
>> rmdir 1             kjournald2                            mkdir 2
>>                     jbd2_journal_commit_transaction
>>                      commit transaction N
>> jbd2_journal_forget
>>  set_buffer_freed(bh1)
>>                     jbd2_journal_commit_transaction
>>                      commit transaction N+1
>>                      ...
>>                      clear_buffer_mapped(bh1)
>>                                                           ext4_getblk(bh2 ummapped)
>>                                                           ...
>>                                                           grow_dev_page
>>                                                            init_page_buffers
>>                                                             bh1->b_private=NULL
>>                                                             bh2->b_private=NULL
>>                      jbd2_journal_put_journal_head(jh1)
>>                       __journal_remove_journal_head(hb1)
>>                        jh1 is NULL and trigger oops
>>
>> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
>> already been unmapped.
>>
>> For the metadata buffer we forgetting, clear the dirty flags is enough,
>> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
>> keep the mapped flag for the metadata buffer.
>>
>> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
>> Signed-off-by: zhangyi (F) <[email protected]>
[..]
>
> Also rather than introducing this new buffer_unmap bit, I'd use the fact
> this special treatment is needed only for buffers coming from the block device
> mapping. And we can check for that like:
>
> 	/*
> 	 * We can (and need to) unmap buffer only for normal mappings.
> 	 * Block device buffers need to stay mapped all the time.
> 	 * We need to be careful about the check because the page
> 	 * mapping can get cleared under our hands.
> 	 */
> 	mapping = READ_ONCE(bh->b_page->mapping);
> 	if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> 		...
> 	}

Thinking about it again, this may miss clearing the mapped flag if the
'mapping' of a journalled data page was cleared, and finally trigger an
exception if we reuse the buffer. So I think it should be:

	if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
		...
	}

Thanks,
Yi.

2020-02-12 08:46:06

by Jan Kara

Subject: Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

On Thu 06-02-20 23:28:01, zhangyi (F) wrote:
> Thanks for the comments.
>
> On 2020/2/6 19:46, Jan Kara wrote:
> > On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> [..]
> >> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> >> index 6396fe70085b..a649cdd1c5e5 100644
> >> --- a/fs/jbd2/commit.c
> >> +++ b/fs/jbd2/commit.c
> >> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> >>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
> >>  			clear_buffer_freed(bh);
> >>  			clear_buffer_jbddirty(bh);
> >> -			clear_buffer_mapped(bh);
> >> -			clear_buffer_new(bh);
> >> -			clear_buffer_req(bh);
> >> -			bh->b_bdev = NULL;
> >> +			if (buffer_unmap(bh)) {
> >> +				clear_buffer_unmap(bh);
> >> +				clear_buffer_mapped(bh);
> >> +				clear_buffer_new(bh);
> >> +				clear_buffer_req(bh);
> >> +				bh->b_bdev = NULL;
> >> +			}
> >
> > Any reason why you don't want to clear buffer_req and buffer_new flags for
> > all buffers as well? I agree that b_bdev setting and buffer_mapped need
> > special treatment.
> >
> IIUC, for the buffer coming from jbd2_journal_forget() is always 'block
> device backed' metadata buffer (not pretty sure), and for these metadata

Yes, it is.

> buffer, buffer_new flag will not be set. At the same time, since it's
> always mapped, so it's fine to keep the buffer_req flag even it's freed
> by the filesystem now, because it means the block device has committed
> this buffer, and it seems that it does not affect we reuse this buffer.
> Am I missing something ?

OK, you're right that buffer_new shouldn't be ever set for block backed
buffers and we don't care about buffer_req. So let's keep the split of bits
to clear as you did and just add a comment that for block device buffers it
is enough to clear buffer_jbddirty and buffer_freed, for file mapping
buffers (i.e., journalled data) we have to be more careful and clear more
bits.
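
The split could be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions: the M_* bit names and
model_state_after_commit() are hypothetical, the !jh->b_next_transaction
condition is left out, and the real code works on struct buffer_head state
bits and also resets bh->b_bdev in the file mapping case:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit names modelling the buffer_head state flags involved. */
#define M_MAPPED	(1u << 0)
#define M_NEW		(1u << 1)
#define M_REQ		(1u << 2)
#define M_FREED		(1u << 3)
#define M_JBDDIRTY	(1u << 4)

/* Returns the modelled buffer state after the commit-time cleanup. */
static unsigned int model_state_after_commit(unsigned int state,
					     bool blkdev_backed)
{
	/* Nothing to do unless the buffer was marked freed. */
	if (!(state & M_FREED))
		return state;

	/* Enough for block device (metadata) buffers. */
	state &= ~(M_FREED | M_JBDDIRTY);

	/* File mapping buffers (journalled data) are unmapped as well. */
	if (!blkdev_backed)
		state &= ~(M_MAPPED | M_NEW | M_REQ);

	return state;
}
```

A block device buffer keeps M_MAPPED and M_REQ in this model, while a
journalled data buffer ends up with all five bits cleared.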

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-02-12 08:48:03

by Jan Kara

Subject: Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

On Tue 11-02-20 14:51:10, zhangyi (F) wrote:
> On 2020/2/6 19:46, Jan Kara wrote:
> > On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> >> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
> >> an older transaction") set the BH_Freed flag when forgetting a metadata
> >> buffer which belongs to the committing transaction, it indicate the
> >> committing process clear dirty bits when it is done with the buffer. But
> >> it also clear the BH_Mapped flag at the same time, which may trigger
> >> below NULL pointer oops when block_size < PAGE_SIZE.
> >>
> >> rmdir 1             kjournald2                            mkdir 2
> >>                     jbd2_journal_commit_transaction
> >>                      commit transaction N
> >> jbd2_journal_forget
> >>  set_buffer_freed(bh1)
> >>                     jbd2_journal_commit_transaction
> >>                      commit transaction N+1
> >>                      ...
> >>                      clear_buffer_mapped(bh1)
> >>                                                           ext4_getblk(bh2 ummapped)
> >>                                                           ...
> >>                                                           grow_dev_page
> >>                                                            init_page_buffers
> >>                                                             bh1->b_private=NULL
> >>                                                             bh2->b_private=NULL
> >>                      jbd2_journal_put_journal_head(jh1)
> >>                       __journal_remove_journal_head(hb1)
> >>                        jh1 is NULL and trigger oops
> >>
> >> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
> >> already been unmapped.
> >>
> >> For the metadata buffer we forgetting, clear the dirty flags is enough,
> >> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
> >> keep the mapped flag for the metadata buffer.
> >>
> >> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
> >> Signed-off-by: zhangyi (F) <[email protected]>
> [..]
> >
> > Also rather than introducing this new buffer_unmap bit, I'd use the fact
> > this special treatment is needed only for buffers coming from the block device
> > mapping. And we can check for that like:
> >
> > 	/*
> > 	 * We can (and need to) unmap buffer only for normal mappings.
> > 	 * Block device buffers need to stay mapped all the time.
> > 	 * We need to be careful about the check because the page
> > 	 * mapping can get cleared under our hands.
> > 	 */
> > 	mapping = READ_ONCE(bh->b_page->mapping);
> > 	if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> > 		...
> > 	}
>
> Think about it again, it may missing clearing of mapped flag if 'mapping'
> of journalled data page was cleared, and finally trigger exception if
> we reuse the buffer again. So I think it should be:
>
> 	if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
> 		...
> 	}

Well, if b_page->mapping got cleared, it means the page got fully truncated
and in such case buffers can never be reused - the page and buffers will be
freed once we are done with them. So what you are concerned about cannot
happen. But you're right it is good to explain this in the comment.
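
To make the difference between the two conditions concrete, here is a small
userspace sketch. The function names are made up for illustration, and the
boolean arguments stand in for "mapping != NULL" and
"sb_is_blkdev_sb(mapping->host->i_sb)":

```c
#include <assert.h>
#include <stdbool.h>

/* Condition from the review: unmap only for a live, non-blkdev mapping. */
static bool model_unmap_if_normal_mapping(bool has_mapping, bool is_blkdev)
{
	return has_mapping && !is_blkdev;
}

/* Alternative condition: unmap unless the mapping is known to be blkdev. */
static bool model_unmap_unless_blkdev(bool has_mapping, bool is_blkdev)
{
	return !(has_mapping && is_blkdev);
}
```

The two predicates agree whenever the mapping is still set and differ only
when it has been cleared, which is the fully truncated case where the
buffers are freed rather than reused.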

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-02-12 13:16:44

by Zhang Yi

Subject: Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

Hi,

On 2020/2/12 16:47, Jan Kara wrote:
> On Tue 11-02-20 14:51:10, zhangyi (F) wrote:
>> On 2020/2/6 19:46, Jan Kara wrote:
>>> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
>>>> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
>>>> an older transaction") set the BH_Freed flag when forgetting a metadata
>>>> buffer which belongs to the committing transaction, it indicate the
>>>> committing process clear dirty bits when it is done with the buffer. But
>>>> it also clear the BH_Mapped flag at the same time, which may trigger
>>>> below NULL pointer oops when block_size < PAGE_SIZE.
>>>>
>>>> rmdir 1             kjournald2                            mkdir 2
>>>>                     jbd2_journal_commit_transaction
>>>>                      commit transaction N
>>>> jbd2_journal_forget
>>>>  set_buffer_freed(bh1)
>>>>                     jbd2_journal_commit_transaction
>>>>                      commit transaction N+1
>>>>                      ...
>>>>                      clear_buffer_mapped(bh1)
>>>>                                                           ext4_getblk(bh2 ummapped)
>>>>                                                           ...
>>>>                                                           grow_dev_page
>>>>                                                            init_page_buffers
>>>>                                                             bh1->b_private=NULL
>>>>                                                             bh2->b_private=NULL
>>>>                      jbd2_journal_put_journal_head(jh1)
>>>>                       __journal_remove_journal_head(hb1)
>>>>                        jh1 is NULL and trigger oops
>>>>
>>>> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
>>>> already been unmapped.
>>>>
>>>> For the metadata buffer we forgetting, clear the dirty flags is enough,
>>>> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
>>>> keep the mapped flag for the metadata buffer.
>>>>
>>>> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
>>>> Signed-off-by: zhangyi (F) <[email protected]>
>> [..]
>>>
>>> Also rather than introducing this new buffer_unmap bit, I'd use the fact
>>> this special treatment is needed only for buffers coming from the block device
>>> mapping. And we can check for that like:
>>>
>>> 	/*
>>> 	 * We can (and need to) unmap buffer only for normal mappings.
>>> 	 * Block device buffers need to stay mapped all the time.
>>> 	 * We need to be careful about the check because the page
>>> 	 * mapping can get cleared under our hands.
>>> 	 */
>>> 	mapping = READ_ONCE(bh->b_page->mapping);
>>> 	if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
>>> 		...
>>> 	}
>>
>> Think about it again, it may missing clearing of mapped flag if 'mapping'
>> of journalled data page was cleared, and finally trigger exception if
>> we reuse the buffer again. So I think it should be:
>>
>> 	if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
>> 		...
>> 	}
>
> Well, if b_page->mapping got cleared, it means the page got fully truncated
> and in such case buffers can never be reused - the page and buffers will be
> freed once we are done with them. So what you are concerned about cannot
> happen. But you're right it is good to explain this in the comment.
>
Yes, you are right, the page and buffer will be freed in release_buffer_page()
and it seems there is no exception. After testing, I will send V3, which goes
back to the condition you suggested and adds comments.

Thanks,
Yi.