2018-09-05 22:39:27

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 1/4] pNFS: Ensure we return the error if someone kills a waiting layoutget

If someone interrupts a wait on one or more outstanding layoutgets in
pnfs_update_layout() then return the ERESTARTSYS/EINTR error.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pnfs.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e8f232de484f..7d9a51e6b847 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1740,16 +1740,16 @@ static bool pnfs_within_mdsthreshold(struct nfs_open_context *ctx,
return ret;
}

-static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
+static int pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
{
/*
* send layoutcommit as it can hold up layoutreturn due to lseg
* reference
*/
pnfs_layoutcommit_inode(lo->plh_inode, false);
- return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
+ return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
nfs_wait_bit_killable,
- TASK_UNINTERRUPTIBLE);
+ TASK_KILLABLE);
}

static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
@@ -1830,7 +1830,9 @@ pnfs_update_layout(struct inode *ino,
}

lookup_again:
- nfs4_client_recover_expired_lease(clp);
+ lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp));
+ if (IS_ERR(lseg))
+ goto out;
first = false;
spin_lock(&ino->i_lock);
lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
@@ -1863,9 +1865,9 @@ pnfs_update_layout(struct inode *ino,
if (list_empty(&lo->plh_segs) &&
atomic_read(&lo->plh_outstanding) != 0) {
spin_unlock(&ino->i_lock);
- if (wait_var_event_killable(&lo->plh_outstanding,
- atomic_read(&lo->plh_outstanding) == 0
- || !list_empty(&lo->plh_segs)))
+ lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
+ atomic_read(&lo->plh_outstanding)));
+ if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
goto out_put_layout_hdr;
pnfs_put_layout_hdr(lo);
goto lookup_again;
@@ -1898,8 +1900,11 @@ pnfs_update_layout(struct inode *ino,
if (test_and_set_bit(NFS_LAYOUT_FIRST_LAYOUTGET,
&lo->plh_flags)) {
spin_unlock(&ino->i_lock);
- wait_on_bit(&lo->plh_flags, NFS_LAYOUT_FIRST_LAYOUTGET,
- TASK_UNINTERRUPTIBLE);
+ lseg = ERR_PTR(wait_on_bit(&lo->plh_flags,
+ NFS_LAYOUT_FIRST_LAYOUTGET,
+ TASK_KILLABLE));
+ if (IS_ERR(lseg))
+ goto out_put_layout_hdr;
pnfs_put_layout_hdr(lo);
dprintk("%s retrying\n", __func__);
goto lookup_again;
@@ -1925,7 +1930,8 @@ pnfs_update_layout(struct inode *ino,
if (test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags)) {
spin_unlock(&ino->i_lock);
dprintk("%s wait for layoutreturn\n", __func__);
- if (pnfs_prepare_to_retry_layoutget(lo)) {
+ lseg = ERR_PTR(pnfs_prepare_to_retry_layoutget(lo));
+ if (!IS_ERR(lseg)) {
if (first)
pnfs_clear_first_layoutget(lo);
pnfs_put_layout_hdr(lo);
--
2.17.1


2018-09-05 22:39:27

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 2/4] NFSv4: Fix a tracepoint Oops in initiate_file_draining()

Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to
change the test in the tracepoint.

Fixes: ce5624f7e6675 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...")
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4trace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index a275fba93170..708342f4692f 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -1194,7 +1194,7 @@ DECLARE_EVENT_CLASS(nfs4_inode_stateid_callback_event,
TP_fast_assign(
__entry->error = error;
__entry->fhandle = nfs_fhandle_hash(fhandle);
- if (inode != NULL) {
+ if (!IS_ERR_OR_NULL(inode)) {
__entry->fileid = NFS_FILEID(inode);
__entry->dev = inode->i_sb->s_dev;
} else {
--
2.17.1

2018-09-05 22:39:28

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 3/4] NFSv4.1 fix infinite loop on I/O.

The previous fix broke recovery of delegated stateids because it assumes
that if we did not mark the delegation as suspect, then the delegation has
effectively been revoked, and so it removes that delegation irrespectively
of whether or not it is valid and still in use. While this is "mostly
harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
in an infinite loop while complaining that we're using an invalid stateid
(in this case the all-zero stateid).

What we rather want to do here is ensure that the delegation is always
correctly marked as needing testing when that is the case. So we want
to close the loophole offered by nfs4_schedule_stateid_recovery(),
which marks the state as needing to be reclaimed, but not the
delegation that may be backing it.

Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4proc.c | 10 +++++++---
fs/nfs/nfs4state.c | 2 ++
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 34830f6457ea..2e4456ac9556 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2676,14 +2676,18 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
}

nfs4_stateid_copy(&stateid, &delegation->stateid);
- if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags) ||
- !test_and_clear_bit(NFS_DELEGATION_TEST_EXPIRED,
- &delegation->flags)) {
+ if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) {
rcu_read_unlock();
nfs_finish_clear_delegation_stateid(state, &stateid);
return;
}

+ if (!test_and_clear_bit(NFS_DELEGATION_TEST_EXPIRED,
+ &delegation->flags)) {
+ rcu_read_unlock();
+ return;
+ }
+
cred = get_rpccred(delegation->cred);
rcu_read_unlock();
status = nfs41_test_and_free_expired_stateid(server, &stateid, cred);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3df0eb52da1c..40a08cd483f0 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1390,6 +1390,8 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_

if (!nfs4_state_mark_reclaim_nograce(clp, state))
return -EBADF;
+ nfs_inode_find_delegation_state_and_recover(state->inode,
+ &state->stateid);
dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
clp->cl_hostname);
nfs4_schedule_state_manager(clp);
--
2.17.1

2018-09-05 22:39:28

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 4/4] NFS: Don't open code clearing of delegation state

Add a helper for the case when the nfs4 open state has been set to use
a delegation stateid, and we want to revert to using the open stateid.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4proc.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 2e4456ac9556..8220a168282e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1637,6 +1637,14 @@ static void nfs_state_set_delegation(struct nfs4_state *state,
write_sequnlock(&state->seqlock);
}

+static void nfs_state_clear_delegation(struct nfs4_state *state)
+{
+ write_seqlock(&state->seqlock);
+ nfs4_stateid_copy(&state->stateid, &state->open_stateid);
+ clear_bit(NFS_DELEGATED_STATE, &state->flags);
+ write_sequnlock(&state->seqlock);
+}
+
static int update_open_stateid(struct nfs4_state *state,
const nfs4_stateid *open_stateid,
const nfs4_stateid *delegation,
@@ -2145,10 +2153,7 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx,
if (IS_ERR(opendata))
return PTR_ERR(opendata);
nfs4_stateid_copy(&opendata->o_arg.u.delegation, stateid);
- write_seqlock(&state->seqlock);
- nfs4_stateid_copy(&state->stateid, &state->open_stateid);
- write_sequnlock(&state->seqlock);
- clear_bit(NFS_DELEGATED_STATE, &state->flags);
+ nfs_state_clear_delegation(state);
switch (type & (FMODE_READ|FMODE_WRITE)) {
case FMODE_READ|FMODE_WRITE:
case FMODE_WRITE:
@@ -2601,10 +2606,7 @@ static void nfs_finish_clear_delegation_stateid(struct nfs4_state *state,
const nfs4_stateid *stateid)
{
nfs_remove_bad_delegation(state->inode, stateid);
- write_seqlock(&state->seqlock);
- nfs4_stateid_copy(&state->stateid, &state->open_stateid);
- write_sequnlock(&state->seqlock);
- clear_bit(NFS_DELEGATED_STATE, &state->flags);
+ nfs_state_clear_delegation(state);
}

static void nfs40_clear_delegation_stateid(struct nfs4_state *state)
@@ -2672,13 +2674,14 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
delegation = rcu_dereference(NFS_I(state->inode)->delegation);
if (delegation == NULL) {
rcu_read_unlock();
+ nfs_state_clear_delegation(state);
return;
}

nfs4_stateid_copy(&stateid, &delegation->stateid);
if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) {
rcu_read_unlock();
- nfs_finish_clear_delegation_stateid(state, &stateid);
+ nfs_state_clear_delegation(state);
return;
}

--
2.17.1

2019-03-12 20:04:41

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 1/4] pNFS: Ensure we return the error if someone kills a waiting layoutget

Hi Trond,

I'm seeing a hang when testing xfstests generic/013 on v4.1 with pNFS after this
patch:

On Wed, 2018-09-05 at 14:07 -0400, Trond Myklebust wrote:
> If someone interrupts a wait on one or more outstanding layoutgets in
> pnfs_update_layout() then return the ERESTARTSYS/EINTR error.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/pnfs.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index e8f232de484f..7d9a51e6b847 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1740,16 +1740,16 @@ static bool pnfs_within_mdsthreshold(struct
> nfs_open_context *ctx,
> return ret;
> }
>
> -static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
> +static int pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
> {
> /*
> * send layoutcommit as it can hold up layoutreturn due to lseg
> * reference
> */
> pnfs_layoutcommit_inode(lo->plh_inode, false);
> - return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> + return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> nfs_wait_bit_killable,
> - TASK_UNINTERRUPTIBLE);
> + TASK_KILLABLE);
> }
>
> static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
> @@ -1830,7 +1830,9 @@ pnfs_update_layout(struct inode *ino,
> }
>
> lookup_again:
> - nfs4_client_recover_expired_lease(clp);
> + lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp));
> + if (IS_ERR(lseg))
> + goto out;
> first = false;
> spin_lock(&ino->i_lock);
> lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
> @@ -1863,9 +1865,9 @@ pnfs_update_layout(struct inode *ino,
> if (list_empty(&lo->plh_segs) &&
> atomic_read(&lo->plh_outstanding) != 0) {
> spin_unlock(&ino->i_lock);
> - if (wait_var_event_killable(&lo->plh_outstanding,
> - atomic_read(&lo->plh_outstanding) == 0
> - || !list_empty(&lo->plh_segs)))
> + lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
> + atomic_read(&lo->plh_outstanding)));
> + if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))

Was dropping the "== 0" condition attached to the atomic_read() here a mistake?
I think what's happening is that my client is waiting for plh_outstanding to be
anything other than 0 when there isn't any work left to do.

Thanks,
Anna

> goto out_put_layout_hdr;
> pnfs_put_layout_hdr(lo);
> goto lookup_again;
> @@ -1898,8 +1900,11 @@ pnfs_update_layout(struct inode *ino,
> if (test_and_set_bit(NFS_LAYOUT_FIRST_LAYOUTGET,
> &lo->plh_flags)) {
> spin_unlock(&ino->i_lock);
> - wait_on_bit(&lo->plh_flags, NFS_LAYOUT_FIRST_LAYOUTGET,
> - TASK_UNINTERRUPTIBLE);
> + lseg = ERR_PTR(wait_on_bit(&lo->plh_flags,
> + NFS_LAYOUT_FIRST_LAYOUTGET,
> + TASK_KILLABLE));
> + if (IS_ERR(lseg))
> + goto out_put_layout_hdr;
> pnfs_put_layout_hdr(lo);
> dprintk("%s retrying\n", __func__);
> goto lookup_again;
> @@ -1925,7 +1930,8 @@ pnfs_update_layout(struct inode *ino,
> if (test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags)) {
> spin_unlock(&ino->i_lock);
> dprintk("%s wait for layoutreturn\n", __func__);
> - if (pnfs_prepare_to_retry_layoutget(lo)) {
> + lseg = ERR_PTR(pnfs_prepare_to_retry_layoutget(lo));
> + if (!IS_ERR(lseg)) {
> if (first)
> pnfs_clear_first_layoutget(lo);
> pnfs_put_layout_hdr(lo);

2019-03-12 20:12:31

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/4] pNFS: Ensure we return the error if someone kills a waiting layoutget

On Tue, 2019-03-12 at 20:04 +0000, Schumaker, Anna wrote:
> Hi Trond,
>
> I'm seeing a hang when testing xfstests generic/013 on v4.1 with pNFS
> after this
> patch:
>
> On Wed, 2018-09-05 at 14:07 -0400, Trond Myklebust wrote:
> > If someone interrupts a wait on one or more outstanding layoutgets
> > in
> > pnfs_update_layout() then return the ERESTARTSYS/EINTR error.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> > fs/nfs/pnfs.c | 26 ++++++++++++++++----------
> > 1 file changed, 16 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index e8f232de484f..7d9a51e6b847 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -1740,16 +1740,16 @@ static bool pnfs_within_mdsthreshold(struct
> > nfs_open_context *ctx,
> > return ret;
> > }
> >
> > -static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr
> > *lo)
> > +static int pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr
> > *lo)
> > {
> > /*
> > * send layoutcommit as it can hold up layoutreturn due to lseg
> > * reference
> > */
> > pnfs_layoutcommit_inode(lo->plh_inode, false);
> > - return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> > + return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> > nfs_wait_bit_killable,
> > - TASK_UNINTERRUPTIBLE);
> > + TASK_KILLABLE);
> > }
> >
> > static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
> > @@ -1830,7 +1830,9 @@ pnfs_update_layout(struct inode *ino,
> > }
> >
> > lookup_again:
> > - nfs4_client_recover_expired_lease(clp);
> > + lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp));
> > + if (IS_ERR(lseg))
> > + goto out;
> > first = false;
> > spin_lock(&ino->i_lock);
> > lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
> > @@ -1863,9 +1865,9 @@ pnfs_update_layout(struct inode *ino,
> > if (list_empty(&lo->plh_segs) &&
> > atomic_read(&lo->plh_outstanding) != 0) {
> > spin_unlock(&ino->i_lock);
> > - if (wait_var_event_killable(&lo->plh_outstanding,
> > - atomic_read(&lo-
> > >plh_outstanding) == 0
> > - || !list_empty(&lo->plh_segs)))
> > + lseg = ERR_PTR(wait_var_event_killable(&lo-
> > >plh_outstanding,
> > + atomic_read(&lo-
> > >plh_outstanding)));
> > + if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
>
> Was dropping the "== 0" condition attached to the atomic_read() here
> a mistake?
> I think what's happening is that my client is waiting for
> plh_outstanding to be
> anything other than 0 when there isn't any work left to do.

Yes. That's a bug. How about the following patch?

8<---------------------------------------------------
From 400417b05f3ec0531544ca5f94e64d838d8b8849 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Tue, 12 Mar 2019 16:04:51 -0400
Subject: [PATCH] pNFS: Fix a typo in pnfs_update_layout

We're supposed to wait for the outstanding layout count to go to zero,
but that got lost somehow.

Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...")
Reported-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pnfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 8247bd1634cb..7066cd7c7aff 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1889,7 +1889,7 @@ pnfs_update_layout(struct inode *ino,
atomic_read(&lo->plh_outstanding) != 0) {
spin_unlock(&ino->i_lock);
lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
- atomic_read(&lo->plh_outstanding)));
+ !atomic_read(&lo->plh_outstanding)));
if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
goto out_put_layout_hdr;
pnfs_put_layout_hdr(lo);
--
2.20.1

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-12 21:22:55

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 1/4] pNFS: Ensure we return the error if someone kills a waiting layoutget

On Tue, 2019-03-12 at 20:12 +0000, Trond Myklebust wrote:
> On Tue, 2019-03-12 at 20:04 +0000, Schumaker, Anna wrote:
> > Hi Trond,
> >
> > I'm seeing a hang when testing xfstests generic/013 on v4.1 with pNFS
> > after this
> > patch:
> >
> > On Wed, 2018-09-05 at 14:07 -0400, Trond Myklebust wrote:
> > > If someone interrupts a wait on one or more outstanding layoutgets
> > > in
> > > pnfs_update_layout() then return the ERESTARTSYS/EINTR error.
> > >
> > > Signed-off-by: Trond Myklebust <[email protected]>
> > > ---
> > > fs/nfs/pnfs.c | 26 ++++++++++++++++----------
> > > 1 file changed, 16 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > > index e8f232de484f..7d9a51e6b847 100644
> > > --- a/fs/nfs/pnfs.c
> > > +++ b/fs/nfs/pnfs.c
> > > @@ -1740,16 +1740,16 @@ static bool pnfs_within_mdsthreshold(struct
> > > nfs_open_context *ctx,
> > > return ret;
> > > }
> > >
> > > -static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr
> > > *lo)
> > > +static int pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr
> > > *lo)
> > > {
> > > /*
> > > * send layoutcommit as it can hold up layoutreturn due to lseg
> > > * reference
> > > */
> > > pnfs_layoutcommit_inode(lo->plh_inode, false);
> > > - return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> > > + return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> > > nfs_wait_bit_killable,
> > > - TASK_UNINTERRUPTIBLE);
> > > + TASK_KILLABLE);
> > > }
> > >
> > > static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
> > > @@ -1830,7 +1830,9 @@ pnfs_update_layout(struct inode *ino,
> > > }
> > >
> > > lookup_again:
> > > - nfs4_client_recover_expired_lease(clp);
> > > + lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp));
> > > + if (IS_ERR(lseg))
> > > + goto out;
> > > first = false;
> > > spin_lock(&ino->i_lock);
> > > lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
> > > @@ -1863,9 +1865,9 @@ pnfs_update_layout(struct inode *ino,
> > > if (list_empty(&lo->plh_segs) &&
> > > atomic_read(&lo->plh_outstanding) != 0) {
> > > spin_unlock(&ino->i_lock);
> > > - if (wait_var_event_killable(&lo->plh_outstanding,
> > > - atomic_read(&lo-
> > > > plh_outstanding) == 0
> > > - || !list_empty(&lo->plh_segs)))
> > > + lseg = ERR_PTR(wait_var_event_killable(&lo-
> > > > plh_outstanding,
> > > + atomic_read(&lo-
> > > > plh_outstanding)));
> > > + if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
> >
> > Was dropping the "== 0" condition attached to the atomic_read() here
> > a mistake?
> > I think what's happening is that my client is waiting for
> > plh_outstanding to be
> > anything other than 0 when there isn't any work left to do.
>
> Yes. That's a bug. How about the following patch?

This patch works for me, but for some reason doing "!atomic_read()" takes 8
minutes longer to complete compared to doing "atomic_read() == 0". I have not
run this multiple times to confirm that it's always the case.

Anna

>
> 8<---------------------------------------------------
> From 400417b05f3ec0531544ca5f94e64d838d8b8849 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Tue, 12 Mar 2019 16:04:51 -0400
> Subject: [PATCH] pNFS: Fix a typo in pnfs_update_layout
>
> We're supposed to wait for the outstanding layout count to go to zero,
> but that got lost somehow.
>
> Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...")
> Reported-by: Anna Schumaker <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/pnfs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 8247bd1634cb..7066cd7c7aff 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1889,7 +1889,7 @@ pnfs_update_layout(struct inode *ino,
> atomic_read(&lo->plh_outstanding) != 0) {
> spin_unlock(&ino->i_lock);
> lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
> - atomic_read(&lo->plh_outstanding)));
> + !atomic_read(&lo->plh_outstanding)));
> if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
> goto out_put_layout_hdr;
> pnfs_put_layout_hdr(lo);
> --
> 2.20.1
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>