2022-07-01 02:34:07

by Xiubo Li

[permalink] [raw]
Subject: [PATCH 0/2] netfs, ceph: fix the crash when unlocking the folio

From: Xiubo Li <[email protected]>

kernel: page:00000000c9746ff1 refcount:2 mapcount:0 mapping:00000000dc2785bb index:0x1 pfn:0x141afc
kernel: memcg:ffff88810f766000
kernel: aops:ceph_aops [ceph] ino:100000005e7 dentry name:"postgresql-Fri.log"
kernel: flags: 0x5ffc000000201c(uptodate|dirty|lru|private|node=0|zone=2|lastcpupid=0x7ff)
kernel: raw: 005ffc000000201c ffffea000a9eeb48 ffffea00060ade48 ffff888193ed8228
kernel: raw: 0000000000000001 ffff88810cc96500 00000002ffffffff ffff88810f766000
kernel: page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
kernel: ------------[ cut here ]------------
kernel: kernel BUG at mm/filemap.c:1559!
kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
kernel: CPU: 4 PID: 131697 Comm: postmaster Tainted: G S 5.19.0-rc2-ceph-g822a4c74e05d #1
kernel: Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
kernel: RIP: 0010:folio_unlock+0x26/0x30
kernel: Code: 00 0f 1f 00 0f 1f 44 00 00 48 8b 07 a8 01 74 0e f0 80 27 fe 78 01 c3 31 f6 e9 d6 fe ff ff 48 c7 c6 c0 81 37 82 e8 aa 64 04 00 <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 b8 01 00 00
kernel: RSP: 0018:ffffc90004377bc8 EFLAGS: 00010246
kernel: RAX: 000000000000003f RBX: ffff888193ed8228 RCX: 0000000000000001
kernel: RDX: 0000000000000000 RSI: ffffffff823a3569 RDI: 00000000ffffffff
kernel: RBP: ffffffff828a0058 R08: 0000000000000001 R09: 0000000000000001
kernel: R10: 000000007c6b0fd2 R11: 0000000000000034 R12: 0000000000000001
kernel: R13: 00000000fffffe00 R14: ffffea000506bf00 R15: ffff888193ed8000
kernel: FS: 00007f4993626340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000555789ee8000 CR3: 000000017a52a006 CR4: 00000000001706e0
kernel: Call Trace:
kernel: <TASK>
kernel: netfs_write_begin+0x130/0x950 [netfs]
kernel: ceph_write_begin+0x46/0xd0 [ceph]
kernel: generic_perform_write+0xef/0x200
kernel: ? file_update_time+0xd4/0x110
kernel: ceph_write_iter+0xb01/0xcd0 [ceph]
kernel: ? lock_is_held_type+0xe3/0x140
kernel: ? new_sync_write+0x106/0x180
kernel: new_sync_write+0x106/0x180
kernel: vfs_write+0x29a/0x3a0
kernel: ksys_write+0x5c/0xd0
kernel: do_syscall_64+0x34/0x80
kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
kernel: RIP: 0033:0x7f49903205c8
kernel: Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 d5 3f 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
kernel: RSP: 002b:00007fff104bd178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000048 RCX: 00007f49903205c8
kernel: RDX: 0000000000000048 RSI: 000055944d3c1ea0 RDI: 000000000000000b
kernel: RBP: 000055944d3c1ea0 R08: 000055944d3963d0 R09: 00007fff1055b080
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055944d3962f0
kernel: R13: 0000000000000048 R14: 00007f49905bb880 R15: 0000000000000048
kernel: </TASK>


Xiubo Li (2):
netfs: release the folio lock and put the folio before retrying
ceph: do not release the folio lock in kceph

fs/ceph/addr.c | 6 +++---
fs/netfs/buffered_read.c | 5 ++++-
2 files changed, 7 insertions(+), 4 deletions(-)

--
2.36.0.rc1


2022-07-01 02:34:48

by Xiubo Li

[permalink] [raw]
Subject: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying

From: Xiubo Li <[email protected]>

The lower layer filesystem should always make sure the folio is
locked and do the unlock and put the folio in netfs layer.

URL: https://tracker.ceph.com/issues/56423
Signed-off-by: Xiubo Li <[email protected]>
---
fs/netfs/buffered_read.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 42f892c5712e..257fd37c2461 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
if (ret < 0) {
trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
- if (ret == -EAGAIN)
+ if (ret == -EAGAIN) {
+ folio_unlock(folio);
+ folio_put(folio);
goto retry;
+ }
goto error;
}
}
--
2.36.0.rc1

2022-07-01 02:45:02

by Xiubo Li

[permalink] [raw]
Subject: [PATCH 2/2] ceph: do not release the folio lock in kceph

From: Xiubo Li <[email protected]>

The netfs layer should be responsible to unlock and put the folio,
and we will always return 0 when succeeds.

URL: https://tracker.ceph.com/issues/56423
Signed-off-by: Xiubo Li <[email protected]>
---
fs/ceph/addr.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index fe6147f20dee..3ef5200e2005 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1310,16 +1310,16 @@ static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned
if (snapc) {
int r;

- folio_unlock(folio);
- folio_put(folio);
if (IS_ERR(snapc))
return PTR_ERR(snapc);

+ folio_unlock(folio);
ceph_queue_writeback(inode);
r = wait_event_killable(ci->i_cap_wq,
context_is_writeable_or_written(inode, snapc));
ceph_put_snap_context(snapc);
- return r == 0 ? -EAGAIN : r;
+ folio_lock(folio);
+ return r == 0 ? -EAGAIN : 0;
}
return 0;
}
--
2.36.0.rc1

2022-07-01 10:50:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying

On Fri, 2022-07-01 at 10:29 +0800, [email protected] wrote:
> From: Xiubo Li <[email protected]>
>
> The lower layer filesystem should always make sure the folio is
> locked and do the unlock and put the folio in netfs layer.
>
> URL: https://tracker.ceph.com/issues/56423
> Signed-off-by: Xiubo Li <[email protected]>
> ---
> fs/netfs/buffered_read.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> index 42f892c5712e..257fd37c2461 100644
> --- a/fs/netfs/buffered_read.c
> +++ b/fs/netfs/buffered_read.c
> @@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
> ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
> if (ret < 0) {
> trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
> - if (ret == -EAGAIN)
> + if (ret == -EAGAIN) {
> + folio_unlock(folio);
> + folio_put(folio);
> goto retry;
> + }
> goto error;
> }
> }

I don't know here... I think it might be better to just expect that when
this function returns an error that the folio has already been unlocked.
Doing it this way will mean that you will lock and unlock the folio a
second time for no reason.

Maybe something like this instead?

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 42f892c5712e..8ae7b0f4c909 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
if (ret == -EAGAIN)
goto retry;
- goto error;
+ goto error_unlocked;
}
}

@@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
error:
folio_unlock(folio);
folio_put(folio);
+error_unlocked:
_leave(" = %d", ret);
return ret;
}

2022-07-04 01:24:24

by Xiubo Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying


On 7/1/22 6:38 PM, Jeff Layton wrote:
> On Fri, 2022-07-01 at 10:29 +0800, [email protected] wrote:
>> From: Xiubo Li <[email protected]>
>>
>> The lower layer filesystem should always make sure the folio is
>> locked and do the unlock and put the folio in netfs layer.
>>
>> URL: https://tracker.ceph.com/issues/56423
>> Signed-off-by: Xiubo Li <[email protected]>
>> ---
>> fs/netfs/buffered_read.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
>> index 42f892c5712e..257fd37c2461 100644
>> --- a/fs/netfs/buffered_read.c
>> +++ b/fs/netfs/buffered_read.c
>> @@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
>> ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
>> if (ret < 0) {
>> trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>> - if (ret == -EAGAIN)
>> + if (ret == -EAGAIN) {
>> + folio_unlock(folio);
>> + folio_put(folio);
>> goto retry;
>> + }
>> goto error;
>> }
>> }
> I don't know here... I think it might be better to just expect that when
> this function returns an error that the folio has already been unlocked.
> Doing it this way will mean that you will lock and unlock the folio a
> second time for no reason.
>
> Maybe something like this instead?
>
> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> index 42f892c5712e..8ae7b0f4c909 100644
> --- a/fs/netfs/buffered_read.c
> +++ b/fs/netfs/buffered_read.c
> @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
> if (ret == -EAGAIN)
> goto retry;
> - goto error;
> + goto error_unlocked;
> }
> }
>
> @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> error:
> folio_unlock(folio);
> folio_put(folio);
> +error_unlocked:
> _leave(" = %d", ret);
> return ret;
> }

Then the "afs" won't work correctly:

377 static int afs_check_write_begin(struct file *file, loff_t pos,
unsigned len,
378????????????????????????????????? struct folio *folio, void **_fsdata)
379 {
380???????? struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
381
382???????? return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
383 }

The "afs" does nothing with the folio lock.

-- Xiubo


2022-07-04 02:16:32

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying

On Mon, Jul 04, 2022 at 09:13:44AM +0800, Xiubo Li wrote:
> On 7/1/22 6:38 PM, Jeff Layton wrote:
> > I don't know here... I think it might be better to just expect that when
> > this function returns an error that the folio has already been unlocked.
> > Doing it this way will mean that you will lock and unlock the folio a
> > second time for no reason.
> >
> > Maybe something like this instead?
> >
> > diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> > index 42f892c5712e..8ae7b0f4c909 100644
> > --- a/fs/netfs/buffered_read.c
> > +++ b/fs/netfs/buffered_read.c
> > @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> > trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
> > if (ret == -EAGAIN)
> > goto retry;
> > - goto error;
> > + goto error_unlocked;
> > }
> > }
> > @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> > error:
> > folio_unlock(folio);
> > folio_put(folio);
> > +error_unlocked:
> > _leave(" = %d", ret);
> > return ret;
> > }
>
> Then the "afs" won't work correctly:
>
> 377 static int afs_check_write_begin(struct file *file, loff_t pos, unsigned
> len,
> 378????????????????????????????????? struct folio *folio, void **_fsdata)
> 379 {
> 380???????? struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> 381
> 382???????? return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
> 383 }
>
> The "afs" does nothing with the folio lock.

It's OK to fix AFS too.

2022-07-04 02:55:30

by Xiubo Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying


On 7/4/22 10:10 AM, Matthew Wilcox wrote:
> On Mon, Jul 04, 2022 at 09:13:44AM +0800, Xiubo Li wrote:
>> On 7/1/22 6:38 PM, Jeff Layton wrote:
>>> I don't know here... I think it might be better to just expect that when
>>> this function returns an error that the folio has already been unlocked.
>>> Doing it this way will mean that you will lock and unlock the folio a
>>> second time for no reason.
>>>
>>> Maybe something like this instead?
>>>
>>> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
>>> index 42f892c5712e..8ae7b0f4c909 100644
>>> --- a/fs/netfs/buffered_read.c
>>> +++ b/fs/netfs/buffered_read.c
>>> @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>>> trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>>> if (ret == -EAGAIN)
>>> goto retry;
>>> - goto error;
>>> + goto error_unlocked;
>>> }
>>> }
>>> @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>>> error:
>>> folio_unlock(folio);
>>> folio_put(folio);
>>> +error_unlocked:
>>> _leave(" = %d", ret);
>>> return ret;
>>> }
>> Then the "afs" won't work correctly:
>>
>> 377 static int afs_check_write_begin(struct file *file, loff_t pos, unsigned
>> len,
>> 378                                  struct folio *folio, void **_fsdata)
>> 379 {
>> 380         struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
>> 381
>> 382         return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
>> 383 }
>>
>> The "afs" does nothing with the folio lock.
> It's OK to fix AFS too.
>
Okay, will fix it. Thanks!

-- Xiubo


2022-07-04 07:28:19

by Xiubo Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying


On 7/1/22 6:38 PM, Jeff Layton wrote:
> On Fri, 2022-07-01 at 10:29 +0800, [email protected] wrote:
>> From: Xiubo Li <[email protected]>
>>
>> The lower layer filesystem should always make sure the folio is
>> locked and do the unlock and put the folio in netfs layer.
>>
>> URL: https://tracker.ceph.com/issues/56423
>> Signed-off-by: Xiubo Li <[email protected]>
>> ---
>> fs/netfs/buffered_read.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
>> index 42f892c5712e..257fd37c2461 100644
>> --- a/fs/netfs/buffered_read.c
>> +++ b/fs/netfs/buffered_read.c
>> @@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
>> ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
>> if (ret < 0) {
>> trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>> - if (ret == -EAGAIN)
>> + if (ret == -EAGAIN) {
>> + folio_unlock(folio);
>> + folio_put(folio);
>> goto retry;
>> + }
>> goto error;
>> }
>> }
> I don't know here... I think it might be better to just expect that when
> this function returns an error that the folio has already been unlocked.
> Doing it this way will mean that you will lock and unlock the folio a
> second time for no reason.
>
> Maybe something like this instead?
>
> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> index 42f892c5712e..8ae7b0f4c909 100644
> --- a/fs/netfs/buffered_read.c
> +++ b/fs/netfs/buffered_read.c
> @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
> if (ret == -EAGAIN)
> goto retry;
> - goto error;
> + goto error_unlocked;
> }
> }
>
> @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> error:
> folio_unlock(folio);
> folio_put(folio);
> +error_unlocked:

Should we also put the folio in ceph and afs ? Won't it introduce
something like use-after-free bug ?

Maybe we should unlock it in ceph and afs and put it in netfs layer.

-- Xiubo



> _leave(" = %d", ret);
> return ret;
> }
>

2022-07-05 14:38:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying

On Tue, 2022-07-05 at 14:21 +0100, David Howells wrote:
> Jeff Layton <[email protected]> wrote:
>
> > I don't know here... I think it might be better to just expect that when
> > this function returns an error that the folio has already been unlocked.
> > Doing it this way will mean that you will lock and unlock the folio a
> > second time for no reason.
>
> I seem to remember there was some reason you wanted the folio unlocking and
> putting. I guess you need to drop the ref to flush it.
>
> Would it make sense for ->check_write_begin() to be passed a "struct folio
> **folio" rather than "struct folio *folio" and then the filesystem can clear
> *folio if it disposes of the page?
>

I'd be OK with that too.
--
Jeff Layton <[email protected]>

2022-07-05 14:40:46

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying

Jeff Layton <[email protected]> wrote:

> I don't know here... I think it might be better to just expect that when
> this function returns an error that the folio has already been unlocked.
> Doing it this way will mean that you will lock and unlock the folio a
> second time for no reason.

I seem to remember there was some reason you wanted the folio unlocking and
putting. I guess you need to drop the ref to flush it.

Would it make sense for ->check_write_begin() to be passed a "struct folio
**folio" rather than "struct folio *folio" and then the filesystem can clear
*folio if it disposes of the page?

David

2022-07-06 01:30:31

by Xiubo Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] netfs: release the folio lock and put the folio before retrying


On 7/5/22 9:21 PM, David Howells wrote:
> Jeff Layton <[email protected]> wrote:
>
>> I don't know here... I think it might be better to just expect that when
>> this function returns an error that the folio has already been unlocked.
>> Doing it this way will mean that you will lock and unlock the folio a
>> second time for no reason.
> I seem to remember there was some reason you wanted the folio unlocking and
> putting. I guess you need to drop the ref to flush it.
>
> Would it make sense for ->check_write_begin() to be passed a "struct folio
> **folio" rather than "struct folio *folio" and then the filesystem can clear
> *folio if it disposes of the page?

Yeah, this also sounds good to me.

-- Xiubo


> David
>