2023-11-16 17:12:00

by Benjamin Coddington

[permalink] [raw]
Subject: Blocklayoutdriver deadlock with knfsd

I ran into this old problem again recently, last discussed here:
https://lore.kernel.org/linux-nfs/[email protected]/#t

Problem is that clients can easily issue enough WRITEs that end up in

__break_lease
xfs_break_leased_layouts
...
nfsd_vfs_write
...
svc_process
svc_recv
nfsd

.. so that all the knfsds are there, and nothing can process the
LAYOUTRETURN.

I'm pretty sure we can make the linux client a bit smarter about it (I saw
one LAYOUTGET and one conflicting WRITE in the same TCP segment, in that
order).

But what can knfsd do to defend itself?

Ben


2023-11-16 17:18:38

by Benjamin Coddington

[permalink] [raw]
Subject: Re: Blocklayoutdriver deadlock with knfsd

On 16 Nov 2023, at 12:11, Benjamin Coddington wrote:

> I ran into this old problem again recently, last discussed here:
> https://lore.kernel.org/linux-nfs/[email protected]/#t
>
> Problem is that clients can easily issue enough WRITEs that end up in
>
> __break_lease
> xfs_break_leased_layouts
> ...
> nfsd_vfs_write
> ...
> svc_process
> svc_recv
> nfsd
>
> .. so that all the knfsds are there, and nothing can process the
> LAYOUTRETURN.
>
> I'm pretty sure we can make the linux client a bit smarter about it (I saw
> one LAYOUTGET and one conflicting WRITE in the same TCP segment, in that
> order).

Actually, I can't imagine any block/SCSI/NVMe server implementation that
would be fine with a client writing to the device at the same time the server
does, and so why shouldn't the client preemptively return the layout?

Ben

2023-11-16 17:34:50

by Jeffrey Layton

[permalink] [raw]
Subject: Re: Blocklayoutdriver deadlock with knfsd

On Thu, 2023-11-16 at 12:17 -0500, Benjamin Coddington wrote:
> On 16 Nov 2023, at 12:11, Benjamin Coddington wrote:
>
> > I ran into this old problem again recently, last discussed here:
> > https://lore.kernel.org/linux-nfs/[email protected]/#t
> >
> > Problem is that clients can easily issue enough WRITEs that end up in
> >
> > __break_lease
> > xfs_break_leased_layouts
> > ...
> > nfsd_vfs_write
> > ...
> > svc_process
> > svc_recv
> > nfsd
> >
> > .. so that all the knfsds are there, and nothing can process the
> > LAYOUTRETURN.
> >
> > I'm pretty sure we can make the linux client a bit smarter about it (I saw
> > one LAYOUTGET and one conflicting WRITE in the same TCP segment, in that
> > order).
>
> Actually, I can't imagine any block/SCSI/NVMe server implementation that
> would be fine with a client writing to the device at the same time the server
> does, and so why shouldn't the client preemptively return the layout?
>
> Ben
>

That may be, but I don't think we can rely on that from the server-side.
I know that Neil and Chuck were discussing spinning up more nfsd threads
dynamically as required. That may be the best fix for this sort of
problem.

Until then, the guidance should probably be to start a _lot_ of threads
if you're running a pNFS SCSI enabled nfsd server.

--
Jeff Layton <[email protected]>

2023-11-16 18:19:31

by Chuck Lever

[permalink] [raw]
Subject: Re: Blocklayoutdriver deadlock with knfsd



> On Nov 16, 2023, at 12:11 PM, Benjamin Coddington <[email protected]> wrote:
>
> I ran into this old problem again recently, last discussed here:
> https://lore.kernel.org/linux-nfs/[email protected]/#t
>
> Problem is that clients can easily issue enough WRITEs that end up in
>
> __break_lease
> xfs_break_leased_layouts
> ...
> nfsd_vfs_write
> ...
> svc_process
> svc_recv
> nfsd
>
> .. so that all the knfsds are there, and nothing can process the
> LAYOUTRETURN.
>
> I'm pretty sure we can make the linux client a bit smarter about it (I saw
> one LAYOUTGET and one conflicting WRITE in the same TCP segment, in that
> order).
>
> But what can knfsd do to defend itself?

If nfsd threads are waiting indefinitely, that's a potential DoS
vector. Ideally the thread should preserve the waiting request
somehow (or return NFS4ERR_DELAY, maybe?). At some later point
when the lease conflict is resolved, the requests can be reprocessed.

That's my naive 800,000 ft view.


--
Chuck Lever


2023-11-16 20:33:03

by Jeffrey Layton

[permalink] [raw]
Subject: Re: Blocklayoutdriver deadlock with knfsd

On Thu, 2023-11-16 at 18:18 +0000, Chuck Lever III wrote:
>
> > On Nov 16, 2023, at 12:11 PM, Benjamin Coddington <[email protected]> wrote:
> >
> > I ran into this old problem again recently, last discussed here:
> > https://lore.kernel.org/linux-nfs/[email protected]/#t
> >
> > Problem is that clients can easily issue enough WRITEs that end up in
> >
> > __break_lease
> > xfs_break_leased_layouts
> > ...
> > nfsd_vfs_write
> > ...
> > svc_process
> > svc_recv
> > nfsd
> >
> > .. so that all the knfsds are there, and nothing can process the
> > LAYOUTRETURN.
> >
> > I'm pretty sure we can make the linux client a bit smarter about it (I saw
> > one LAYOUTGET and one conflicting WRITE in the same TCP segment, in that
> > order).
> >
> > But what can knfsd do to defend itself?

One thing that might help would be make the nfsd threadpool more
dynamic. If it could just spin up another thread, we'd be OK.

Maybe the server could keep an emergency thread around that is just for
processing LAYOUTRETURN/DELEGRETURN (maybe also CLOSE, etc.) calls?
Sometimes these ops are mixed into compounds with other sorts of calls
though, so we'd need to deal with that somehow.


>
> If nfsd threads are waiting indefinitely, that's a potential DoS
> vector. Ideally the thread should preserve the waiting request
> somehow (or return NFS4ERR_DELAY, maybe?). At some later point
> when the lease conflict is resolved, the requests can be reprocessed.
>
> That's my naive 800,000 ft view.
>

Yeah, that would probably be ideal. xfs_break_leased_layouts is pretty
heavy handed right now. It currently does this:

while ((error = break_layout(inode, false)) == -EWOULDBLOCK) {
xfs_iunlock(ip, *iolock);
*did_unlock = true;
error = break_layout(inode, true);
*iolock &= ~XFS_IOLOCK_SHARED;
*iolock |= XFS_IOLOCK_EXCL;
xfs_ilock(ip, *iolock);
}

So you can always get stuck in that inner break_layout call. We could
try to use IOCB_NOWAIT for I/Os coming from nfsd, which would prevent
that, but that seems like it could change the I/O behavior in other ways
we don't want. It's not clear to me how that would work alongside
IOCB_SYNC anyway.

It's not immediately clear to me how we'd do this with the existing
IOCB_* flags, so we might need a new one (IOCB_NFSD?). Then, we could
just make sure that break_layout call is always nonblocking if that flag
is set.
--
Jeff Layton <[email protected]>

2023-11-17 13:01:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Blocklayoutdriver deadlock with knfsd

On Thu, Nov 16, 2023 at 03:32:34PM -0500, Jeff Layton wrote:
> One thing that might help would be make the nfsd threadpool more
> dynamic. If it could just spin up another thread, we'd be OK.
>
> Maybe the server could keep an emergency thread around that is just for
> processing LAYOUTRETURN/DELEGRETURN (maybe also CLOSE, etc.) calls?
> Sometimes these ops are mixed into compounds with other sorts of calls
> though, so we'd need to deal with that somehow.

I think having an extra thread just for LAYOUTRETURN/DELEGRETURN is
fundamentally the right thing to do, as they need to be processed
to allow everyone else to progress.

> > If nfsd threads are waiting indefinitely, that's a potential DoS
> > vector. Ideally the thread should preserve the waiting request
> > somehow (or return NFS4ERR_DELAY, maybe?). At some later point
> > when the lease conflict is resolved, the requests can be reprocessed.
> >
> > That's my naive 800,000 ft view.
> >

That is probably a good idea on top of the above.

> So you can always get stuck in that inner break_layout call. We could
> try to use IOCB_NOWAIT for I/Os coming from nfsd, which would prevent
> that, but that seems like it could change the I/O behavior in other ways
> we don't want. It's not clear to me how that would work alongside
> IOCB_SYNC anyway.

So I definitively think using IOCB_NOWAIT from nfsd and not block
the thread for trivial I/O is a good thing. This might even enable
not offloading I/O to threads until the IOCB_NOWAIT failed. But any
IOCB_NOWAIT that returned -EAGAIN needs to eventually fall back to
doing blocking I/O as there is no progress guarantee for IOCB_NOWAIT,
and some I/O simply is entirely impossible with IOCB_NOWAIT.

> It's not immediately clear to me how we'd do this with the existing
> IOCB_* flags, so we might need a new one (IOCB_NFSD?). Then, we could
> just make sure that break_layout call is always nonblocking if that flag
> is set.

I'd name it about the behavior that it controls and not the callers,
but otherwise this too seems like a good idea.