2024-01-29 15:50:21

by David Wysochanski

[permalink] [raw]
Subject: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt

The loop inside nfs_netfs_issue_read() currently does not disable
interrupts while iterating through pages in the xarray to submit
for NFS read. This is not safe though since after taking xa_lock,
another page in the mapping could be processed for writeback inside
an interrupt, and deadlock can occur. To fix this, use the
irqsave/irqrestore primitives for the xa_lock.

The problem is easily reproduced with the following test:
mount -o vers=3,fsc 127.0.0.1:/export /mnt/nfs
dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/nfs/file1.bin of=/dev/null
umount /mnt/nfs

On the console with a lockdep-enabled kernel a message similar to the
following will be seen:

================================
WARNING: inconsistent lock state
6.7.0-lockdbg+ #10 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
test5/1708 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888127baa598 (&xa->xa_lock#4){+.?.}-{3:3}, at: nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
{IN-SOFTIRQ-W} state was registered at:
lock_acquire+0x144/0x380
_raw_spin_lock_irqsave+0x4e/0xa0
__folio_end_writeback+0x17e/0x5c0
folio_end_writeback+0x93/0x1b0
iomap_finish_ioend+0xeb/0x6a0
blk_update_request+0x204/0x7f0
blk_mq_end_request+0x30/0x1c0
blk_complete_reqs+0x7e/0xa0
__do_softirq+0x113/0x544
__irq_exit_rcu+0xfe/0x120
irq_exit_rcu+0xe/0x20
sysvec_call_function_single+0x6f/0x90
asm_sysvec_call_function_single+0x1a/0x20
pv_native_safe_halt+0xf/0x20
default_idle+0x9/0x20
default_idle_call+0x67/0xa0
do_idle+0x2b5/0x300
cpu_startup_entry+0x34/0x40
start_secondary+0x19d/0x1c0
secondary_startup_64_no_verify+0x18f/0x19b
irq event stamp: 176891
hardirqs last enabled at (176891): [<ffffffffa67a0be4>] _raw_spin_unlock_irqrestore+0x44/0x60
hardirqs last disabled at (176890): [<ffffffffa67a0899>] _raw_spin_lock_irqsave+0x79/0xa0
softirqs last enabled at (176646): [<ffffffffa515d91e>] __irq_exit_rcu+0xfe/0x120
softirqs last disabled at (176633): [<ffffffffa515d91e>] __irq_exit_rcu+0xfe/0x120

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&xa->xa_lock#4);
<Interrupt>
lock(&xa->xa_lock#4);

*** DEADLOCK ***

2 locks held by test5/1708:
#0: ffff888127baa498 (&sb->s_type->i_mutex_key#22){++++}-{4:4}, at: nfs_start_io_read+0x28/0x90 [nfs]
#1: ffff888127baa650 (mapping.invalidate_lock#3){.+.+}-{4:4}, at: page_cache_ra_unbounded+0xa4/0x280

stack backtrace:
CPU: 6 PID: 1708 Comm: test5 Kdump: loaded Not tainted 6.7.0-lockdbg+ #10
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
Call Trace:
dump_stack_lvl+0x5b/0x90
mark_lock+0xb3f/0xd20
__lock_acquire+0x77b/0x3360
_raw_spin_lock+0x34/0x80
nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
netfs_begin_read+0x77f/0x980 [netfs]
nfs_netfs_readahead+0x45/0x60 [nfs]
nfs_readahead+0x323/0x5a0 [nfs]
read_pages+0xf3/0x5c0
page_cache_ra_unbounded+0x1c8/0x280
filemap_get_pages+0x38c/0xae0
filemap_read+0x206/0x5e0
nfs_file_read+0xb7/0x140 [nfs]
vfs_read+0x2a9/0x460
ksys_read+0xb7/0x140

Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Dave Wysochanski <[email protected]>
---
fs/nfs/fscache.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index b05717fe0d4e..de7ec89bfe8d 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -308,6 +308,7 @@ static void nfs_netfs_issue_read(struct netfs_io_subrequest *sreq)
struct nfs_open_context *ctx = sreq->rreq->netfs_priv;
struct page *page;
int err;
+ unsigned long flags;
pgoff_t start = (sreq->start + sreq->transferred) >> PAGE_SHIFT;
pgoff_t last = ((sreq->start + sreq->len -
sreq->transferred - 1) >> PAGE_SHIFT);
@@ -322,19 +323,19 @@ static void nfs_netfs_issue_read(struct netfs_io_subrequest *sreq)

pgio.pg_netfs = netfs; /* used in completion */

- xas_lock(&xas);
+ xas_lock_irqsave(&xas, flags);
xas_for_each(&xas, page, last) {
/* nfs_read_add_folio() may schedule() due to pNFS layout and other RPCs */
xas_pause(&xas);
- xas_unlock(&xas);
+ xas_unlock_irqrestore(&xas, flags);
err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
if (err < 0) {
netfs->error = err;
goto out;
}
- xas_lock(&xas);
+ xas_lock_irqsave(&xas, flags);
}
- xas_unlock(&xas);
+ xas_unlock_irqrestore(&xas, flags);
out:
nfs_pageio_complete_read(&pgio);
nfs_netfs_put(netfs);
--
2.39.3



2024-01-29 17:26:20

by David Howells

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt

Dave Wysochanski <[email protected]> wrote:

> - xas_lock(&xas);
> + xas_lock_irqsave(&xas, flags);
> xas_for_each(&xas, page, last) {

You probably want to use RCU, not xas_lock(). The pages are locked and so
cannot be evicted from the xarray.

David


2024-01-29 17:35:43

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt

On Mon, Jan 29, 2024 at 12:15 PM David Howells <[email protected]> wrote:
>
> Dave Wysochanski <[email protected]> wrote:
>
> > - xas_lock(&xas);
> > + xas_lock_irqsave(&xas, flags);
> > xas_for_each(&xas, page, last) {
>
> You probably want to use RCU, not xas_lock(). The pages are locked and so
> cannot be evicted from the xarray.
>

I tried RCU originally and ran into a problem because NFS can schedule
(see comment on line 328 below)

326 xas_lock_irqsave(&xas, flags);
327 xas_for_each(&xas, page, last) {
328 /* nfs_read_add_folio() may schedule() due to pNFS
layout and other RPCs */
329 xas_pause(&xas);
330 xas_unlock_irqrestore(&xas, flags);
331 err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
332 if (err < 0) {
333 netfs->error = err;
334 goto out;
335 }
336 xas_lock_irqsave(&xas, flags);
337 }
338 xas_unlock_irqrestore(&xas, flags);


2024-01-29 17:46:04

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt

On Mon, 2024-01-29 at 12:34 -0500, David Wysochanski wrote:
> On Mon, Jan 29, 2024 at 12:15 PM David Howells <[email protected]> wrote:
> >
> > Dave Wysochanski <[email protected]> wrote:
> >
> > > - xas_lock(&xas);
> > > + xas_lock_irqsave(&xas, flags);
> > > xas_for_each(&xas, page, last) {
> >
> > You probably want to use RCU, not xas_lock(). The pages are locked and so
> > cannot be evicted from the xarray.
> >
>
> I tried RCU originally and ran into a problem because NFS can schedule
> (see comment on line 328 below)
>
> 326 xas_lock_irqsave(&xas, flags);
> 327 xas_for_each(&xas, page, last) {
> 328 /* nfs_read_add_folio() may schedule() due to pNFS
> layout and other RPCs */
> 329 xas_pause(&xas);
> 330 xas_unlock_irqrestore(&xas, flags);
> 331 err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
> 332 if (err < 0) {
> 333 netfs->error = err;
> 334 goto out;
> 335 }
> 336 xas_lock_irqsave(&xas, flags);
> 337 }
> 338 xas_unlock_irqrestore(&xas, flags);
>

Looking at it more closely, I think you might want to just use
xa_for_each_start(). That will do the traversal using the rcu_read_lock
under the hood, and you should be able to block on every iteration.

--
Jeff Layton <[email protected]>

2024-01-30 14:56:51

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt

On Mon, Jan 29, 2024 at 12:44 PM Jeff Layton <[email protected]> wrote:
>
> On Mon, 2024-01-29 at 12:34 -0500, David Wysochanski wrote:
> > On Mon, Jan 29, 2024 at 12:15 PM David Howells <[email protected]> wrote:
> > >
> > > Dave Wysochanski <[email protected]> wrote:
> > >
> > > > - xas_lock(&xas);
> > > > + xas_lock_irqsave(&xas, flags);
> > > > xas_for_each(&xas, page, last) {
> > >
> > > You probably want to use RCU, not xas_lock(). The pages are locked and so
> > > cannot be evicted from the xarray.
> > >
> >
> > I tried RCU originally and ran into a problem because NFS can schedule
> > (see comment on line 328 below)
> >
> > 326 xas_lock_irqsave(&xas, flags);
> > 327 xas_for_each(&xas, page, last) {
> > 328 /* nfs_read_add_folio() may schedule() due to pNFS
> > layout and other RPCs */
> > 329 xas_pause(&xas);
> > 330 xas_unlock_irqrestore(&xas, flags);
> > 331 err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
> > 332 if (err < 0) {
> > 333 netfs->error = err;
> > 334 goto out;
> > 335 }
> > 336 xas_lock_irqsave(&xas, flags);
> > 337 }
> > 338 xas_unlock_irqrestore(&xas, flags);
> >
>
> Looking at it more closely, I think you might want to just use
> xa_for_each_start(). That will do the traversal using the rcu_read_lock
> under the hood, and you should be able to block on every iteration.
>
Thanks Jeff. Yes, I agree after looking at this further, this is a
good approach, and much cleaner. I'll work on a v2 patch (actually
with xa_for_each_range as you suggested off list) and send after
a bit of testing -- so far, so good.

FWIW, my original usage of RCU was outside the whole loop.
I ran into problems due to nfs_read_add_folio().


2024-01-30 15:00:05

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt

On Tue, 2024-01-30 at 09:56 -0500, David Wysochanski wrote:
> On Mon, Jan 29, 2024 at 12:44 PM Jeff Layton <[email protected]> wrote:
> >
> > On Mon, 2024-01-29 at 12:34 -0500, David Wysochanski wrote:
> > > On Mon, Jan 29, 2024 at 12:15 PM David Howells <[email protected]> wrote:
> > > >
> > > > Dave Wysochanski <[email protected]> wrote:
> > > >
> > > > > - xas_lock(&xas);
> > > > > + xas_lock_irqsave(&xas, flags);
> > > > > xas_for_each(&xas, page, last) {
> > > >
> > > > You probably want to use RCU, not xas_lock(). The pages are locked and so
> > > > cannot be evicted from the xarray.
> > > >
> > >
> > > I tried RCU originally and ran into a problem because NFS can schedule
> > > (see comment on line 328 below)
> > >
> > > 326 xas_lock_irqsave(&xas, flags);
> > > 327 xas_for_each(&xas, page, last) {
> > > 328 /* nfs_read_add_folio() may schedule() due to pNFS
> > > layout and other RPCs */
> > > 329 xas_pause(&xas);
> > > 330 xas_unlock_irqrestore(&xas, flags);
> > > 331 err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
> > > 332 if (err < 0) {
> > > 333 netfs->error = err;
> > > 334 goto out;
> > > 335 }
> > > 336 xas_lock_irqsave(&xas, flags);
> > > 337 }
> > > 338 xas_unlock_irqrestore(&xas, flags);
> > >
> >
> > Looking at it more closely, I think you might want to just use
> > xa_for_each_start(). That will do the traversal using the rcu_read_lock
> > under the hood, and you should be able to block on every iteration.
> >
> Thanks Jeff. Yes, I agree after looking at this further, this is a
> good approach, and much cleaner. I'll work on a v2 patch (actually
> with xa_for_each_range as you suggested off list) and send after
> a bit of testing -- so far, so good.
>
> FWIW, my original usage of RCU was outside the whole loop.
> I ran into problems due to nfs_read_add_folio().
>

Makes sense. In principle you could do this by just dropping and
acquiring the rcu_read_lock in the same places you do the spinlock in
the original patch, but using xa_for_each_range is much simpler.

--
Jeff Layton <[email protected]>