2015-06-22 07:53:31

by NeilBrown

[permalink] [raw]
Subject: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.


Hi,
this is my proposed solution to the problem I outlined in
NFSv4 state management issue - Linux disagrees with Netapp.
I haven't tested it yet (no direct access to the Netapp), but
I'll try to get some testing done. RFC for now.

NeilBrown


When opening a file, nfs _nfs4_do_open() will return any
incompatible delegation, meaning if the delegation held for
that file does not give all the permissions required, it is
returned.
This is because various places assume that the current delegation
provides all necessary access.

However when a delegation is received, it is not validated in the
same way so it is possible to, for example, hold a read-only
delegation while the file is open write-only.
When that delegation is recalled, the NFS client will try to
reclaim the write-only open, and that will fail.

So when considering a new delegation, reject it if it is incompatible
with any open.

Signed-off-by: NeilBrown <[email protected]>
URL: https://bugzilla.suse.com/show_bug.cgi?id=934203

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 029d688..92cecd3 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -377,6 +377,25 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
old_delegation, clp);
if (freeme == NULL)
goto out;
+ } else {
+ /* Don't accept an incompatible delegation */
+ int incompatible = 0;
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs4_state *state;
+
+ spin_lock(&inode->i_lock);
+ list_for_each_entry(state, &nfsi->open_states, inode_states) {
+ if ((state->state & delegation->type) != state->state) {
+ incompatible = 1;
+ break;
+ }
+ }
+ spin_unlock(&inode->i_lock);
+ if (incompatible) {
+ freeme = delegation;
+ delegation = NULL;
+ goto out;
+ }
}
list_add_tail_rcu(&delegation->super_list, &server->delegations);
rcu_assign_pointer(nfsi->delegation, delegation);


2015-06-22 11:41:12

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.

Hi Neil,

On Mon, Jun 22, 2015 at 3:53 AM, NeilBrown <[email protected]> wrote:
>
> Hi,
> this is my proposed solution to the problem I outlined in
> NFSv4 state management issue - Linux disagrees with Netapp.
> I haven't tested it yet (no direct access to the Netapp), but
> I'll try to get some testing done. RFC for now.
>
> NeilBrown
>
>
> When opening a file, nfs _nfs4_do_open() will return any
> incompatible delegation, meaning if the delegation held for
> that file does not give all the permissions required, it is
> returned.
> This is because various places assume that the current delegation
> provides all necessary access.
>
> However when a delegation is received, it is not validated in the
> same way so it is possible to, for example, hold a read-only
> delegation while the file is open write-only.
> When that delegation is recalled, the NFS client will try to
> reclaim the write-only open, and that will fail.
>

I'd argue that the bug here is the attempt to reclaim the write-only
open; your previous email appeared to show that the client already
held a corresponding open stateid.

Cheers
Trond

2015-06-22 21:04:22

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.

On Mon, 22 Jun 2015 07:41:11 -0400
Trond Myklebust <[email protected]> wrote:

> Hi Neil,
>
> On Mon, Jun 22, 2015 at 3:53 AM, NeilBrown <[email protected]> wrote:
> >
> > Hi,
> > this is my proposed solution to the problem I outlined in
> > NFSv4 state management issue - Linux disagrees with Netapp.
> > I haven't tested it yet (no direct access to the Netapp), but
> > I'll try to get some testing done. RFC for now.
> >
> > NeilBrown
> >
> >
> > When opening a file, nfs _nfs4_do_open() will return any
> > incompatible delegation, meaning if the delegation held for
> > that file does not give all the permissions required, it is
> > returned.
> > This is because various places assume that the current delegation
> > provides all necessary access.
> >
> > However when a delegation is received, it is not validated in the
> > same way so it is possible to, for example, hold a read-only
> > delegation while the file is open write-only.
> > When that delegation is recalled, the NFS client will try to
> > reclaim the write-only open, and that will fail.
> >
>
> I'd argue that the bug here is the attempt to reclaim the write-only
> open; your previous email appeared to show that the client already
> held a corresponding open stateid.

I did consider that approach, but I managed to talk myself out of it...
Let's see if I can talk you out of it too.

There are potentially two state ids available for each open_owner+inode
- an open_stateid and a delegation stateid.

Linux does track which of read/write the delegation stateid permits,
but does *not* track which the open_stateid permits.
So when returning a delegation it does not know which of "read" and
"write" need to be reclaimed (because open_stateid doesn't provide
them) but it does know which cannot be reclaimed (because delegation
stateid didn't provide them) - so it could just reclaim whatever it
needs that the delegation *could* have provided.
So this particular bug could be fixed that way.

However, consider the scenario I described up to just before the 'link'
system call.
The client holds a write-only open_stateid and a read-only delegation
stateid.
If the client (same lockowner) opens the file read-only again the open
will succeed without talking to the server on the strength of the
delegation.
update_open_stateid will then copy the delegation stateid into the state
and all IO will use that stateid. If a write is attempted with the
still-open write-only fd, it will use the read-only delegation stateid
and presumably get an error.

Unless I've missed something there is no code in Linux/NFS to
selectively use one stateid for reads and another for writes - both
coming from the same lockowner to the same inode.

Presumably this is the reason that we have
nf4_return_incompatible_delegation(): because Linux/NFS assumes that if
it holds a delegation, that delegation covers all active open modes.
For exactly the same reason, we need to reject a delegation if it
doesn't cover all the open modes that are already active.

Certainly we *could* track exactly which accesses the open_stateid
allows, and could have (potentially) separate "read" and "write"
stateids, but that paths wasn't the easiest so I didn't follow it.

Convinced?

NeilBrown

2015-06-22 21:34:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.

On Mon, Jun 22, 2015 at 5:04 PM, NeilBrown <[email protected]> wrote:
>
> On Mon, 22 Jun 2015 07:41:11 -0400
> Trond Myklebust <[email protected]> wrote:
>
> > Hi Neil,
> >
> > On Mon, Jun 22, 2015 at 3:53 AM, NeilBrown <[email protected]> wrote:
> > >
> > > Hi,
> > > this is my proposed solution to the problem I outlined in
> > > NFSv4 state management issue - Linux disagrees with Netapp.
> > > I haven't tested it yet (no direct access to the Netapp), but
> > > I'll try to get some testing done. RFC for now.
> > >
> > > NeilBrown
> > >
> > >
> > > When opening a file, nfs _nfs4_do_open() will return any
> > > incompatible delegation, meaning if the delegation held for
> > > that file does not give all the permissions required, it is
> > > returned.
> > > This is because various places assume that the current delegation
> > > provides all necessary access.
> > >
> > > However when a delegation is received, it is not validated in the
> > > same way so it is possible to, for example, hold a read-only
> > > delegation while the file is open write-only.
> > > When that delegation is recalled, the NFS client will try to
> > > reclaim the write-only open, and that will fail.
> > >
> >
> > I'd argue that the bug here is the attempt to reclaim the write-only
> > open; your previous email appeared to show that the client already
> > held a corresponding open stateid.
>
> I did consider that approach, but I managed to talk myself out of it...
> Let's see if I can talk you out of it too.
>
> There are potentially two state ids available for each open_owner+inode
> - an open_stateid and a delegation stateid.
>
> Linux does track which of read/write the delegation stateid permits,
> but does *not* track which the open_stateid permits.
> So when returning a delegation it does not know which of "read" and
> "write" need to be reclaimed (because open_stateid doesn't provide
> them) but it does know which cannot be reclaimed (because delegation
> stateid didn't provide them) - so it could just reclaim whatever it
> needs that the delegation *could* have provided.
> So this particular bug could be fixed that way.
>
> However, consider the scenario I described up to just before the 'link'
> system call.
> The client holds a write-only open_stateid and a read-only delegation
> stateid.
> If the client (same lockowner) opens the file read-only again the open
> will succeed without talking to the server on the strength of the
> delegation.
> update_open_stateid will then copy the delegation stateid into the state
> and all IO will use that stateid. If a write is attempted with the
> still-open write-only fd, it will use the read-only delegation stateid
> and presumably get an error.

This is incorrect. As far as I know, a 4.1 client will do the following:

The NFSv4 open() code will catch the delegation as being insufficient
using can_open_delegated(), and will ensure that the client calls OPEN
in this case. The resulting open stateid is then saved in the
state->open_stateid.

If an I/O attempt is then made for an I/O type for which the
delegation cannot be used, then nfs4_select_rw_stateid() will return
either the lock stateid or the open stateid; whichever is appropriate.


> Unless I've missed something there is no code in Linux/NFS to
> selectively use one stateid for reads and another for writes - both
> coming from the same lockowner to the same inode.

See above.

> Presumably this is the reason that we have
> nf4_return_incompatible_delegation(): because Linux/NFS assumes that if
> it holds a delegation, that delegation covers all active open modes.
> For exactly the same reason, we need to reject a delegation if it
> doesn't cover all the open modes that are already active.
>
> Certainly we *could* track exactly which accesses the open_stateid
> allows, and could have (potentially) separate "read" and "write"
> stateids, but that paths wasn't the easiest so I didn't follow it.
>

I'm rather thinking that the simplest fix is simply to have
nfs4_open_delegation_recall() skip those file modes for which the
current delegation stateid is not appropriate. From a client
perspective, that should always make sense.

Cheers
Trond

2015-06-23 01:07:26

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.

On Mon, 22 Jun 2015 17:34:12 -0400
Trond Myklebust <[email protected]> wrote:

> On Mon, Jun 22, 2015 at 5:04 PM, NeilBrown <[email protected]> wrote:
> >
> > On Mon, 22 Jun 2015 07:41:11 -0400
> > Trond Myklebust <[email protected]> wrote:
> >
> > > Hi Neil,
> > >
> > > On Mon, Jun 22, 2015 at 3:53 AM, NeilBrown <[email protected]> wrote:
> > > >
> > > > Hi,
> > > > this is my proposed solution to the problem I outlined in
> > > > NFSv4 state management issue - Linux disagrees with Netapp.
> > > > I haven't tested it yet (no direct access to the Netapp), but
> > > > I'll try to get some testing done. RFC for now.
> > > >
> > > > NeilBrown
> > > >
> > > >
> > > > When opening a file, nfs _nfs4_do_open() will return any
> > > > incompatible delegation, meaning if the delegation held for
> > > > that file does not give all the permissions required, it is
> > > > returned.
> > > > This is because various places assume that the current delegation
> > > > provides all necessary access.
> > > >
> > > > However when a delegation is received, it is not validated in the
> > > > same way so it is possible to, for example, hold a read-only
> > > > delegation while the file is open write-only.
> > > > When that delegation is recalled, the NFS client will try to
> > > > reclaim the write-only open, and that will fail.
> > > >
> > >
> > > I'd argue that the bug here is the attempt to reclaim the write-only
> > > open; your previous email appeared to show that the client already
> > > held a corresponding open stateid.
> >
> > I did consider that approach, but I managed to talk myself out of it...
> > Let's see if I can talk you out of it too.
> >
> > There are potentially two state ids available for each open_owner+inode
> > - an open_stateid and a delegation stateid.
> >
> > Linux does track which of read/write the delegation stateid permits,
> > but does *not* track which the open_stateid permits.
> > So when returning a delegation it does not know which of "read" and
> > "write" need to be reclaimed (because open_stateid doesn't provide
> > them) but it does know which cannot be reclaimed (because delegation
> > stateid didn't provide them) - so it could just reclaim whatever it
> > needs that the delegation *could* have provided.
> > So this particular bug could be fixed that way.
> >
> > However, consider the scenario I described up to just before the 'link'
> > system call.
> > The client holds a write-only open_stateid and a read-only delegation
> > stateid.
> > If the client (same lockowner) opens the file read-only again the open
> > will succeed without talking to the server on the strength of the
> > delegation.
> > update_open_stateid will then copy the delegation stateid into the state
> > and all IO will use that stateid. If a write is attempted with the
> > still-open write-only fd, it will use the read-only delegation stateid
> > and presumably get an error.
>
> This is incorrect. As far as I know, a 4.1 client will do the following:
>
> The NFSv4 open() code will catch the delegation as being insufficient
> using can_open_delegated(), and will ensure that the client calls OPEN
> in this case. The resulting open stateid is then saved in the
> state->open_stateid.

In my scenario, the new open is a read-only open. The delegation is a
read-only delegation. So can_open_delegated() will succeed.

>
> If an I/O attempt is then made for an I/O type for which the
> delegation cannot be used, then nfs4_select_rw_stateid() will return
> either the lock stateid or the open stateid; whichever is appropriate.

This is the bit I was missing - thanks. nfs4_select_rw_stateid().

I was thinking that state->stateid was used for all IO, but it isn't.
It is only used to detect if a delegation was used for any of the
active opens on the file.

>
>
> > Unless I've missed something there is no code in Linux/NFS to
> > selectively use one stateid for reads and another for writes - both
> > coming from the same lockowner to the same inode.
>
> See above.
>
> > Presumably this is the reason that we have
> > nf4_return_incompatible_delegation(): because Linux/NFS assumes that if
> > it holds a delegation, that delegation covers all active open modes.
> > For exactly the same reason, we need to reject a delegation if it
> > doesn't cover all the open modes that are already active.
> >
> > Certainly we *could* track exactly which accesses the open_stateid
> > allows, and could have (potentially) separate "read" and "write"
> > stateids, but that paths wasn't the easiest so I didn't follow it.
> >
>
> I'm rather thinking that the simplest fix is simply to have
> nfs4_open_delegation_recall() skip those file modes for which the
> current delegation stateid is not appropriate. From a client
> perspective, that should always make sense.

So maybe something like this:

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 55e1e3a..ce5f1489 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1553,6 +1553,10 @@ static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmod
struct nfs4_state *newstate;
int ret;

+ if ((opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEGATE_CUR ||
+ opendata->o-arg.claim == NFS4_OPEN_CLAIM_DELE_CUR_FH) &&
+ (opendata->o_arg.u_delegation_type & mode) != mode)
+ return 0;
opendata->o_arg.open_flags = 0;
opendata->o_arg.fmode = fmode;
opendata->o_arg.share_access = nfs4_map_atomic_open_share(


I'm not entirely clear on the process of reclaiming opens and
delegations after a server reboot, so this may need to be adjusted to
handle that correctly.

I'll keep looking and try to arrange some testing.

Thanks,
NeilBrown

2015-06-23 01:16:38

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.

On Mon, Jun 22, 2015 at 9:07 PM, NeilBrown <[email protected]> wrote:
> On Mon, 22 Jun 2015 17:34:12 -0400
> Trond Myklebust <[email protected]> wrote:
>
>> On Mon, Jun 22, 2015 at 5:04 PM, NeilBrown <[email protected]> wrote:
>> >
>> > On Mon, 22 Jun 2015 07:41:11 -0400
>> > Trond Myklebust <[email protected]> wrote:
>> >
>> > > Hi Neil,
>> > >
>> > > On Mon, Jun 22, 2015 at 3:53 AM, NeilBrown <[email protected]> wrote:
>> > > >
>> > > > Hi,
>> > > > this is my proposed solution to the problem I outlined in
>> > > > NFSv4 state management issue - Linux disagrees with Netapp.
>> > > > I haven't tested it yet (no direct access to the Netapp), but
>> > > > I'll try to get some testing done. RFC for now.
>> > > >
>> > > > NeilBrown
>> > > >
>> > > >
>> > > > When opening a file, nfs _nfs4_do_open() will return any
>> > > > incompatible delegation, meaning if the delegation held for
>> > > > that file does not give all the permissions required, it is
>> > > > returned.
>> > > > This is because various places assume that the current delegation
>> > > > provides all necessary access.
>> > > >
>> > > > However when a delegation is received, it is not validated in the
>> > > > same way so it is possible to, for example, hold a read-only
>> > > > delegation while the file is open write-only.
>> > > > When that delegation is recalled, the NFS client will try to
>> > > > reclaim the write-only open, and that will fail.
>> > > >
>> > >
>> > > I'd argue that the bug here is the attempt to reclaim the write-only
>> > > open; your previous email appeared to show that the client already
>> > > held a corresponding open stateid.
>> >
>> > I did consider that approach, but I managed to talk myself out of it...
>> > Let's see if I can talk you out of it too.
>> >
>> > There are potentially two state ids available for each open_owner+inode
>> > - an open_stateid and a delegation stateid.
>> >
>> > Linux does track which of read/write the delegation stateid permits,
>> > but does *not* track which the open_stateid permits.
>> > So when returning a delegation it does not know which of "read" and
>> > "write" need to be reclaimed (because open_stateid doesn't provide
>> > them) but it does know which cannot be reclaimed (because delegation
>> > stateid didn't provide them) - so it could just reclaim whatever it
>> > needs that the delegation *could* have provided.
>> > So this particular bug could be fixed that way.
>> >
>> > However, consider the scenario I described up to just before the 'link'
>> > system call.
>> > The client holds a write-only open_stateid and a read-only delegation
>> > stateid.
>> > If the client (same lockowner) opens the file read-only again the open
>> > will succeed without talking to the server on the strength of the
>> > delegation.
>> > update_open_stateid will then copy the delegation stateid into the state
>> > and all IO will use that stateid. If a write is attempted with the
>> > still-open write-only fd, it will use the read-only delegation stateid
>> > and presumably get an error.
>>
>> This is incorrect. As far as I know, a 4.1 client will do the following:
>>
>> The NFSv4 open() code will catch the delegation as being insufficient
>> using can_open_delegated(), and will ensure that the client calls OPEN
>> in this case. The resulting open stateid is then saved in the
>> state->open_stateid.
>
> In my scenario, the new open is a read-only open. The delegation is a
> read-only delegation. So can_open_delegated() will succeed.

Right, but my point was that any read-write or write-only open will
fail that test, and so should result in another on-the-wire OPEN.
Those particular open states should therefore not normally need to be
reclaimed when the read delegation is returned.

>>
>> If an I/O attempt is then made for an I/O type for which the
>> delegation cannot be used, then nfs4_select_rw_stateid() will return
>> either the lock stateid or the open stateid; whichever is appropriate.
>
> This is the bit I was missing - thanks. nfs4_select_rw_stateid().
>
> I was thinking that state->stateid was used for all IO, but it isn't.
> It is only used to detect if a delegation was used for any of the
> active opens on the file.
>
>>
>>
>> > Unless I've missed something there is no code in Linux/NFS to
>> > selectively use one stateid for reads and another for writes - both
>> > coming from the same lockowner to the same inode.
>>
>> See above.
>>
>> > Presumably this is the reason that we have
>> > nf4_return_incompatible_delegation(): because Linux/NFS assumes that if
>> > it holds a delegation, that delegation covers all active open modes.
>> > For exactly the same reason, we need to reject a delegation if it
>> > doesn't cover all the open modes that are already active.
>> >
>> > Certainly we *could* track exactly which accesses the open_stateid
>> > allows, and could have (potentially) separate "read" and "write"
>> > stateids, but that paths wasn't the easiest so I didn't follow it.
>> >
>>
>> I'm rather thinking that the simplest fix is simply to have
>> nfs4_open_delegation_recall() skip those file modes for which the
>> current delegation stateid is not appropriate. From a client
>> perspective, that should always make sense.
>
> So maybe something like this:
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 55e1e3a..ce5f1489 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1553,6 +1553,10 @@ static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmod
> struct nfs4_state *newstate;
> int ret;
>
> + if ((opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEGATE_CUR ||
> + opendata->o-arg.claim == NFS4_OPEN_CLAIM_DELE_CUR_FH) &&
> + (opendata->o_arg.u_delegation_type & mode) != mode)
> + return 0;
> opendata->o_arg.open_flags = 0;
> opendata->o_arg.fmode = fmode;
> opendata->o_arg.share_access = nfs4_map_atomic_open_share(
>
>
> I'm not entirely clear on the process of reclaiming opens and
> delegations after a server reboot, so this may need to be adjusted to
> handle that correctly.

The above is along the lines of what I was suggesting. I hope it tests out OK.

> I'll keep looking and try to arrange some testing.

Thanks for your efforts! They are very much appreciated.

Cheers
Trond

2015-06-29 04:24:14

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH/RFC] NFSv4 - do not accept an incompatible delegation.

On Mon, 22 Jun 2015 21:16:37 -0400 Trond Myklebust
<[email protected]> wrote:

> On Mon, Jun 22, 2015 at 9:07 PM, NeilBrown <[email protected]> wrote:

> > So maybe something like this:
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 55e1e3a..ce5f1489 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -1553,6 +1553,10 @@ static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmod
> > struct nfs4_state *newstate;
> > int ret;
> >
> > + if ((opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEGATE_CUR ||
> > + opendata->o-arg.claim == NFS4_OPEN_CLAIM_DELE_CUR_FH) &&
> > + (opendata->o_arg.u_delegation_type & mode) != mode)
> > + return 0;
> > opendata->o_arg.open_flags = 0;
> > opendata->o_arg.fmode = fmode;
> > opendata->o_arg.share_access = nfs4_map_atomic_open_share(
> >
> >
> > I'm not entirely clear on the process of reclaiming opens and
> > delegations after a server reboot, so this may need to be adjusted to
> > handle that correctly.
>
> The above is along the lines of what I was suggesting. I hope it tests out OK.
>

Once I fixed the syntax errors, this was all I needed.
I was concerned there might be similar issues with
OPEN_CLAIM_DELEGATE_PREV (or whatever it is called) but Linux doesn't
appear to use that.

I tested it by commenting out

status = vfs_setlease(filp, fl->fl_type, &fl, NULL);

in nfs4_setlease in the Linux nfsd, so that it would always return a
read delegation. This was enough to demonstrate the problem and verify
the fix.
I haven't yet received confirmation from people with access to the
Netapp but I would be very surprised if this doesn't work (though I
think there are other issues that still need to be examined).

Separately I've discovered a bug that causes a livelock when a TCP
sendmsg fails because kmalloc failed. I'll send patches for both.

NeilBrown