LinuxLists.cc - [RFC PATCH 0/6] nfs: pnfs layout pipelining

2016-05-05 18:24:55

Subject: [RFC PATCH 0/6] nfs: pnfs layout pipelining

At Primary Data, one of the things we're most interested in is data
mobility. IOW, we want to be able to change the layout for an inode
seamlessly, with little interruption to I/O patterns.

The problem we have now is that CB_LAYOUTRECALLs interrupt I/O. When one
comes in, most pNFS servers refuse to hand out new layouts until the
recalled ones have been returned (or the client indicates that it no
longer knows about them). It doesn't have to be this way though. RFC5661
allows for concurrent LAYOUTGET and LAYOUTRETURN calls.

Furthermore, servers are expected to deal with old stateids in
LAYOUTRETURN. From RFC5661, section 18.44.3:

If the client returns the layout in response to a CB_LAYOUTRECALL
where the lor_recalltype field of the clora_recall field was
LAYOUTRECALL4_FILE, the client should use the lor_stateid value from
CB_LAYOUTRECALL as the value for lrf_stateid. Otherwise, it should
use logr_stateid (from a previous LAYOUTGET result) or lorr_stateid
(from a previous LAYRETURN result). This is done to indicate the
point in time (in terms of layout stateid transitions) when the
recall was sent.

The way I'm interpreting this is that we can treat a LAYOUTRETURN with
an old stateid as returning all layouts that matched the given iomode,
at the time that that seqid was current.

With that, we can allow a LAYOUTGET on the same fh to proceed even when
there are still recalled layouts outstanding. This should allow the
client to pivot to a new layout while it's still draining I/Os
that are pinning the ones to be returned.

This patchset is a first draft of the client side piece that allows
this. Basically whenever we get a new layout segment, we'll tag it with
the seqid that was in the LAYOUTGET stateid that grants it.

When a CB_LAYOUTRECALL comes in, we tag the return seqid in the layout
header with the one that was in the request. When we do a LAYOUTRETURN
in response to a CB_LAYOUTRECALL, we craft the seqid such that we're
only returning the layouts that were recalled. Nothing that has been
granted since then will be returned.

I think I've done this in a way that the existing behavior is preserved
in the case where the server enforces the serialization of these
operations, but please do have a look and let me know if you see any
potential problems here. Testing this is still a WIP...

Jeff Layton (6):
nfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
nfs: record sequence in pnfs_layout_segment when it's created
nfs: keep track of the return sequence number in pnfs_layout_hdr
nfs4: only tear down lsegs that precede seqid in LAYOUTRETURN args
nfs4: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED in
flexfiles code
nfs4: add kerneldoc header to nfs4_ff_layout_prepare_ds

fs/nfs/callback_proc.c | 3 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +--
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 26 +++++++----
fs/nfs/nfs42proc.c | 2 +-
fs/nfs/nfs4proc.c | 5 ++-
fs/nfs/pnfs.c | 71 +++++++++++++++++++++----------
fs/nfs/pnfs.h | 5 ++-
7 files changed, 78 insertions(+), 40 deletions(-)

--
2.5.5

2016-05-05 18:24:56

by Jeff Layton

[permalink] [raw]

Subject: [RFC PATCH 1/6] nfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set

Otherwise, we'll end up returning layouts that we've just received if
the client issues a new LAYOUTGET prior to the LAYOUTRETURN.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 61b27d99df18..0ee2613b8aa6 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -298,6 +298,8 @@ ff_lseg_merge(struct pnfs_layout_segment *new,
{
u64 new_end, old_end;

+ if (test_bit(NFS_LSEG_LAYOUTRETURN, &old->pls_flags))
+ return false;
if (new->pls_range.iomode != old->pls_range.iomode)
return false;
old_end = pnfs_calc_offset_end(old->pls_range.offset,
@@ -318,8 +320,6 @@ ff_lseg_merge(struct pnfs_layout_segment *new,
new_end);
if (test_bit(NFS_LSEG_ROC, &old->pls_flags))
set_bit(NFS_LSEG_ROC, &new->pls_flags);
- if (test_bit(NFS_LSEG_LAYOUTRETURN, &old->pls_flags))
- set_bit(NFS_LSEG_LAYOUTRETURN, &new->pls_flags);
return true;
}

--
2.5.5

2016-05-05 18:24:57

by Jeff Layton

[permalink] [raw]

Subject: [RFC PATCH 2/6] nfs: record sequence in pnfs_layout_segment when it's created

In later patches, we're going to teach the client to be more selective
about how it returns layouts. This means keeping a record of what the
stateid's seqid was at the time that the server handed out a layout
segment.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/pnfs.c | 1 +
fs/nfs/pnfs.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 89a5ef4df08a..7cc2c431f355 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1697,6 +1697,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)

init_lseg(lo, lseg);
lseg->pls_range = res->range;
+ lseg->pls_seq = be32_to_cpu(res->stateid.seqid);

spin_lock(&ino->i_lock);
if (pnfs_layoutgets_blocked(lo)) {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1ac1db5f6dad..62e61cf0ad98 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -64,6 +64,7 @@ struct pnfs_layout_segment {
struct list_head pls_lc_list;
struct pnfs_layout_range pls_range;
atomic_t pls_refcount;
+ u32 pls_seq;
unsigned long pls_flags;
struct pnfs_layout_hdr *pls_layout;
struct work_struct pls_work;
--
2.5.5

2016-05-05 18:24:59

by Jeff Layton

[permalink] [raw]

Subject: [RFC PATCH 4/6] nfs4: only tear down lsegs that precede seqid in LAYOUTRETURN args

LAYOUTRETURN is "special" in that servers and clients are expected to
work with old stateids. When the client sends a LAYOUTRETURN with an old
stateid in it then the server is expected to only tear down layout
segments that were present when that seqid was current. Ensure that the
client handles its accounting accordingly.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback_proc.c | 3 ++-
fs/nfs/nfs42proc.c | 2 +-
fs/nfs/nfs4proc.c | 5 +++--
fs/nfs/pnfs.c | 61 +++++++++++++++++++++++++++++++++-----------------
fs/nfs/pnfs.h | 3 ++-
5 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 618ced381a14..755838df9996 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -217,7 +217,8 @@ static u32 initiate_file_draining(struct nfs_client *clp,
}

if (pnfs_mark_matching_lsegs_return(lo, &free_me_list,
- &args->cbl_range)) {
+ &args->cbl_range,
+ be32_to_cpu(args->cbl_stateid.seqid))) {
rv = NFS4_OK;
goto unlock;
}
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index dff83460e5a6..198bcc3e103d 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -232,7 +232,7 @@ nfs42_layoutstat_done(struct rpc_task *task, void *calldata)
* with the current stateid.
*/
set_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags);
- pnfs_mark_matching_lsegs_invalid(lo, &head, NULL);
+ pnfs_mark_matching_lsegs_invalid(lo, &head, NULL, 0);
spin_unlock(&inode->i_lock);
pnfs_free_lseg_list(&head);
} else
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 327b8c34d360..c0645b53c4bc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7926,7 +7926,7 @@ static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
* with the current stateid.
*/
set_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags);
- pnfs_mark_matching_lsegs_invalid(lo, &head, NULL);
+ pnfs_mark_matching_lsegs_invalid(lo, &head, NULL, 0);
spin_unlock(&inode->i_lock);
pnfs_free_lseg_list(&head);
} else
@@ -8118,7 +8118,8 @@ static void nfs4_layoutreturn_release(void *calldata)

dprintk("--> %s\n", __func__);
spin_lock(&lo->plh_inode->i_lock);
- pnfs_mark_matching_lsegs_invalid(lo, &freeme, &lrp->args.range);
+ pnfs_mark_matching_lsegs_invalid(lo, &freeme, &lrp->args.range,
+ be32_to_cpu(lrp->args.stateid.seqid));
pnfs_mark_layout_returned_if_empty(lo);
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 12016ed54e1d..dc75db4c4849 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -270,7 +270,7 @@ pnfs_mark_layout_stateid_invalid(struct pnfs_layout_hdr *lo,
};

set_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags);
- return pnfs_mark_matching_lsegs_invalid(lo, lseg_list, &range);
+ return pnfs_mark_matching_lsegs_invalid(lo, lseg_list, &range, 0);
}

static int
@@ -308,7 +308,7 @@ pnfs_layout_io_set_failed(struct pnfs_layout_hdr *lo, u32 iomode)

spin_lock(&inode->i_lock);
pnfs_layout_set_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
- pnfs_mark_matching_lsegs_invalid(lo, &head, &range);
+ pnfs_mark_matching_lsegs_invalid(lo, &head, &range, 0);
spin_unlock(&inode->i_lock);
pnfs_free_lseg_list(&head);
dprintk("%s Setting layout IOMODE_%s fail bit\n", __func__,
@@ -522,13 +522,35 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
return rv;
}

-/* Returns count of number of matching invalid lsegs remaining in list
- * after call.
+/*
+ * Compare 2 layout stateid sequence ids, to see which is newer,
+ * taking into account wraparound issues.
+ */
+static bool pnfs_seqid_is_newer(u32 s1, u32 s2)
+{
+ return (s32)(s1 - s2) > 0;
+}
+
+/**
+ * pnfs_mark_matching_lsegs_invalid - tear down lsegs or mark them for later
+ * @lo: layout header containing the lsegs
+ * @tmp_list: list head where doomed lsegs should go
+ * @recall_range: optional recall range argument to match (may be NULL)
+ * @seq: only invalidate lsegs obtained prior to this sequence (may be 0)
+ *
+ * Walk the list of lsegs in the layout header, and tear down any that should
+ * be destroyed. If "recall_range" is specified then the segment must match
+ * that range. If "seq" is non-zero, then only match segments that were handed
+ * out at or before that sequence.
+ *
+ * Returns number of matching invalid lsegs remaining in list after scanning
+ * it and purging them.
*/
int
pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
struct list_head *tmp_list,
- const struct pnfs_layout_range *recall_range)
+ const struct pnfs_layout_range *recall_range,
+ u32 seq)
{
struct pnfs_layout_segment *lseg, *next;
int remaining = 0;
@@ -540,10 +562,12 @@ pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list)
if (!recall_range ||
should_free_lseg(&lseg->pls_range, recall_range)) {
- dprintk("%s: freeing lseg %p iomode %d "
+ if (seq && pnfs_seqid_is_newer(lseg->pls_seq, seq))
+ continue;
+ dprintk("%s: freeing lseg %p iomode %d seq %u"
"offset %llu length %llu\n", __func__,
- lseg, lseg->pls_range.iomode, lseg->pls_range.offset,
- lseg->pls_range.length);
+ lseg, lseg->pls_range.iomode, lseg->pls_seq,
+ lseg->pls_range.offset, lseg->pls_range.length);
if (!mark_lseg_invalid(lseg, tmp_list))
remaining++;
}
@@ -730,15 +754,6 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
pnfs_destroy_layouts_byclid(clp, false);
}

-/*
- * Compare 2 layout stateid sequence ids, to see which is newer,
- * taking into account wraparound issues.
- */
-static bool pnfs_seqid_is_newer(u32 s1, u32 s2)
-{
- return (s32)(s1 - s2) > 0;
-}
-
/* update lo->plh_stateid with new if is more recent */
void
pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo, const nfs4_stateid *new,
@@ -1014,7 +1029,7 @@ _pnfs_return_layout(struct inode *ino)
pnfs_get_layout_hdr(lo);
empty = list_empty(&lo->plh_segs);
pnfs_clear_layoutcommit(ino, &tmp_list);
- pnfs_mark_matching_lsegs_invalid(lo, &tmp_list, NULL);
+ pnfs_mark_matching_lsegs_invalid(lo, &tmp_list, NULL, 0);

if (NFS_SERVER(ino)->pnfs_curr_ld->return_range) {
struct pnfs_layout_range range = {
@@ -1721,7 +1736,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
* inode invalid, and don't bother validating the stateid
* sequence number.
*/
- pnfs_mark_matching_lsegs_invalid(lo, &free_me, NULL);
+ pnfs_mark_matching_lsegs_invalid(lo, &free_me, NULL, 0);

nfs4_stateid_copy(&lo->plh_stateid, &res->stateid);
lo->plh_barrier = be32_to_cpu(res->stateid.seqid);
@@ -1775,7 +1790,8 @@ pnfs_set_plh_return_info(struct pnfs_layout_hdr *lo, enum pnfs_iomode iomode,
int
pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
struct list_head *tmp_list,
- const struct pnfs_layout_range *return_range)
+ const struct pnfs_layout_range *return_range,
+ u32 seq)
{
struct pnfs_layout_segment *lseg, *next;
int remaining = 0;
@@ -1798,8 +1814,11 @@ pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
continue;
remaining++;
set_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags);
- pnfs_set_plh_return_info(lo, return_range->iomode, lseg->pls_seq);
}
+
+ if (remaining)
+ pnfs_set_plh_return_info(lo, return_range->iomode, seq);
+
return remaining;
}

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 47042084888c..bb68d354255e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -266,7 +266,8 @@ int pnfs_choose_layoutget_stateid(nfs4_stateid *dst,
struct nfs4_state *open_state);
int pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
struct list_head *tmp_list,
- const struct pnfs_layout_range *recall_range);
+ const struct pnfs_layout_range *recall_range,
+ u32 seq);
int pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
struct list_head *tmp_list,
const struct pnfs_layout_range *recall_range);
--
2.5.5

2016-05-05 18:24:58

by Jeff Layton

[permalink] [raw]

Subject: [RFC PATCH 3/6] nfs: keep track of the return sequence number in pnfs_layout_hdr

When we want to selectively do a LAYOUTRETURN, we need to specify a
stateid that represents most recent layout acquisition that is to be
returned.

When we mark a layout stateid to be returned, we update the return
sequence number in the layout header with that value, if it's newer
than the existing one. Then, when we go to do a LAYOUTRETURN on
layout header put, we overwrite the seqid in the stateid with the
saved one, and then zero it out.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/pnfs.c | 11 ++++++++---
fs/nfs/pnfs.h | 1 +
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7cc2c431f355..12016ed54e1d 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -899,6 +899,7 @@ pnfs_prepare_layoutreturn(struct pnfs_layout_hdr *lo)
if (test_and_set_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
return false;
lo->plh_return_iomode = 0;
+ lo->plh_return_seq = 0;
pnfs_get_layout_hdr(lo);
clear_bit(NFS_LAYOUT_RETURN_REQUESTED, &lo->plh_flags);
return true;
@@ -969,6 +970,7 @@ static void pnfs_layoutreturn_before_put_layout_hdr(struct pnfs_layout_hdr *lo)
bool send;

nfs4_stateid_copy(&stateid, &lo->plh_stateid);
+ stateid.seqid = cpu_to_be32(lo->plh_return_seq);
iomode = lo->plh_return_iomode;
send = pnfs_prepare_layoutreturn(lo);
spin_unlock(&inode->i_lock);
@@ -1747,7 +1749,8 @@ out_forget_reply:
}

static void
-pnfs_set_plh_return_iomode(struct pnfs_layout_hdr *lo, enum pnfs_iomode iomode)
+pnfs_set_plh_return_info(struct pnfs_layout_hdr *lo, enum pnfs_iomode iomode,
+ u32 seq)
{
if (lo->plh_return_iomode == iomode)
return;
@@ -1755,6 +1758,8 @@ pnfs_set_plh_return_iomode(struct pnfs_layout_hdr *lo, enum pnfs_iomode iomode)
iomode = IOMODE_ANY;
lo->plh_return_iomode = iomode;
set_bit(NFS_LAYOUT_RETURN_REQUESTED, &lo->plh_flags);
+ if (!lo->plh_return_seq || pnfs_seqid_is_newer(seq, lo->plh_return_seq))
+ lo->plh_return_seq = seq;
}

/**
@@ -1793,7 +1798,7 @@ pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
continue;
remaining++;
set_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags);
- pnfs_set_plh_return_iomode(lo, return_range->iomode);
+ pnfs_set_plh_return_info(lo, return_range->iomode, lseg->pls_seq);
}
return remaining;
}
@@ -1811,7 +1816,7 @@ void pnfs_error_mark_layout_for_return(struct inode *inode,
bool return_now = false;

spin_lock(&inode->i_lock);
- pnfs_set_plh_return_iomode(lo, range.iomode);
+ pnfs_set_plh_return_info(lo, range.iomode, lseg->pls_seq);
/*
* mark all matching lsegs so that we are sure to have no live
* segments at hand when sending layoutreturn. See pnfs_put_lseg()
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 62e61cf0ad98..47042084888c 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -195,6 +195,7 @@ struct pnfs_layout_hdr {
unsigned long plh_flags;
nfs4_stateid plh_stateid;
u32 plh_barrier; /* ignore lower seqids */
+ u32 plh_return_seq;
enum pnfs_iomode plh_return_iomode;
loff_t plh_lwb; /* last write byte for layoutcommit */
struct rpc_cred *plh_lc_cred; /* layoutcommit cred */
--
2.5.5

2016-05-05 18:25:00

by Jeff Layton

[permalink] [raw]

Subject: [RFC PATCH 5/6] nfs4: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED in flexfiles code

Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything,
unless there are lsegs that are also being marked for return. At the
point where that happens this flag is also set, so these set_bit calls
don't do anything useful.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 2 --
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 8 +-------
2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 0ee2613b8aa6..e2bb3eaaff32 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1249,8 +1249,6 @@ static int ff_layout_read_done_cb(struct rpc_task *task,
hdr->pgio_mirror_idx + 1,
&hdr->pgio_mirror_idx))
goto out_eagain;
- set_bit(NFS_LAYOUT_RETURN_REQUESTED,
- &hdr->lseg->pls_layout->plh_flags);
pnfs_read_resend_pnfs(hdr);
return task->tk_status;
case -NFS4ERR_RESET_TO_MDS:
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index 56296f3df19c..5eff05e9eb4d 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -393,13 +393,7 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
mirror, lseg->pls_range.offset,
lseg->pls_range.length, NFS4ERR_NXIO,
OP_ILLEGAL, GFP_NOIO);
- if (!fail_return) {
- if (ff_layout_has_available_ds(lseg))
- set_bit(NFS_LAYOUT_RETURN_REQUESTED,
- &lseg->pls_layout->plh_flags);
- else
- pnfs_error_mark_layout_for_return(ino, lseg);
- } else
+ if (fail_return || !ff_layout_has_available_ds(lseg))
pnfs_error_mark_layout_for_return(ino, lseg);
ds = NULL;
}
--
2.5.5

2016-05-05 18:25:01

by Jeff Layton

[permalink] [raw]

Subject: [RFC PATCH 6/6] nfs4: add kerneldoc header to nfs4_ff_layout_prepare_ds

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index 5eff05e9eb4d..61094d59ee3a 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -342,7 +342,23 @@ out:
return fh;
}

-/* Upon return, either ds is connected, or ds is NULL */
+/**
+ * nfs4_ff_layout_prepare_ds - prepare a DS connection for an RPC call
+ * @lseg: the layout segment we're operating on
+ * @ds_idx: index of the DS to use
+ * @fail_return: return layout on connect failure?
+ *
+ * Try to prepare a DS connection to accept an RPC call. This involves
+ * selecting a mirror to use and connecting the client to it if it's not
+ * already connected.
+ *
+ * Since we only need single functioning mirror to satisfy a read, we don't
+ * want to return the layout if there is one. For writes though, any down
+ * mirror should result in a LAYOUTRETURN. @fail_return is how we distinguish
+ * between the two cases.
+ *
+ * Returns a pointer to a connected DS object on success or NULL on failure.
+ */
struct nfs4_pnfs_ds *
nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
bool fail_return)
--
2.5.5

2016-05-05 18:40:19

by Jeff Layton

[permalink] [raw]

Subject: Re: [RFC PATCH 4/6] nfs4: only tear down lsegs that precede seqid in LAYOUTRETURN args

On Thu, 2016-05-05 at 14:24 -0400, Jeff Layton wrote:
> LAYOUTRETURN is "special" in that servers and clients are expected to
> work with old stateids. When the client sends a LAYOUTRETURN with an old
> stateid in it then the server is expected to only tear down layout
> segments that were present when that seqid was current. Ensure that the
> client handles its accounting accordingly.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/callback_proc.c |  3 ++-
> fs/nfs/nfs42proc.c     |  2 +-
> fs/nfs/nfs4proc.c      |  5 +++--
> fs/nfs/pnfs.c          | 61 +++++++++++++++++++++++++++++++++-----------------
> fs/nfs/pnfs.h          |  3 ++-
> 5 files changed, 48 insertions(+), 26 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 618ced381a14..755838df9996 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -217,7 +217,8 @@ static u32 initiate_file_draining(struct nfs_client *clp,
> }
>
> if (pnfs_mark_matching_lsegs_return(lo, &free_me_list,
> - &args->cbl_range)) {
> + &args->cbl_range,
> + be32_to_cpu(args->cbl_stateid.seqid))) {
> rv = NFS4_OK;
> goto unlock;
> }
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index dff83460e5a6..198bcc3e103d 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -232,7 +232,7 @@ nfs42_layoutstat_done(struct rpc_task *task, void *calldata)
> * with the current stateid.
> */
> set_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags);
> - pnfs_mark_matching_lsegs_invalid(lo, &head, NULL);
> + pnfs_mark_matching_lsegs_invalid(lo, &head, NULL, 0);
> spin_unlock(&inode->i_lock);
> pnfs_free_lseg_list(&head);
> } else
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 327b8c34d360..c0645b53c4bc 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7926,7 +7926,7 @@ static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
> * with the current stateid.
> */
> set_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags);
> - pnfs_mark_matching_lsegs_invalid(lo, &head, NULL);
> + pnfs_mark_matching_lsegs_invalid(lo, &head, NULL, 0);
> spin_unlock(&inode->i_lock);
> pnfs_free_lseg_list(&head);
> } else
> @@ -8118,7 +8118,8 @@ static void nfs4_layoutreturn_release(void *calldata)
>
> dprintk("--> %s\n", __func__);
> spin_lock(&lo->plh_inode->i_lock);
> - pnfs_mark_matching_lsegs_invalid(lo, &freeme, &lrp->args.range);
> + pnfs_mark_matching_lsegs_invalid(lo, &freeme, &lrp->args.range,
> + be32_to_cpu(lrp->args.stateid.seqid));
> pnfs_mark_layout_returned_if_empty(lo);
> if (lrp->res.lrs_present)
> pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 12016ed54e1d..dc75db4c4849 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -270,7 +270,7 @@ pnfs_mark_layout_stateid_invalid(struct pnfs_layout_hdr *lo,
> };
>
> set_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags);
> - return pnfs_mark_matching_lsegs_invalid(lo, lseg_list, &range);
> + return pnfs_mark_matching_lsegs_invalid(lo, lseg_list, &range, 0);
> }
>
> static int
> @@ -308,7 +308,7 @@ pnfs_layout_io_set_failed(struct pnfs_layout_hdr *lo, u32 iomode)
>
> spin_lock(&inode->i_lock);
> pnfs_layout_set_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
> - pnfs_mark_matching_lsegs_invalid(lo, &head, &range);
> + pnfs_mark_matching_lsegs_invalid(lo, &head, &range, 0);
> spin_unlock(&inode->i_lock);
> pnfs_free_lseg_list(&head);
> dprintk("%s Setting layout IOMODE_%s fail bit\n", __func__,
> @@ -522,13 +522,35 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
> return rv;
> }
>
> -/* Returns count of number of matching invalid lsegs remaining in list
> - * after call.
> +/*
> + * Compare 2 layout stateid sequence ids, to see which is newer,
> + * taking into account wraparound issues.
> + */
> +static bool pnfs_seqid_is_newer(u32 s1, u32 s2)
> +{
> + return (s32)(s1 - s2) > 0;
> +}
> +
> +/**
> + * pnfs_mark_matching_lsegs_invalid - tear down lsegs or mark them for later
> + * @lo: layout header containing the lsegs
> + * @tmp_list: list head where doomed lsegs should go
> + * @recall_range: optional recall range argument to match (may be NULL)
> + * @seq: only invalidate lsegs obtained prior to this sequence (may be 0)
> + *
> + * Walk the list of lsegs in the layout header, and tear down any that should
> + * be destroyed. If "recall_range" is specified then the segment must match
> + * that range. If "seq" is non-zero, then only match segments that were handed
> + * out at or before that sequence.
> + *
> + * Returns number of matching invalid lsegs remaining in list after scanning
> + * it and purging them.
> */
> int
> pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
>     struct list_head *tmp_list,
> -     const struct pnfs_layout_range *recall_range)
> +     const struct pnfs_layout_range *recall_range,
> +     u32 seq)
> {
> struct pnfs_layout_segment *lseg, *next;
> int remaining = 0;
> @@ -540,10 +562,12 @@ pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
> list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list)
> if (!recall_range ||
>     should_free_lseg(&lseg->pls_range, recall_range)) {
> - dprintk("%s: freeing lseg %p iomode %d "
> + if (seq && pnfs_seqid_is_newer(lseg->pls_seq, seq))
> + continue;
> + dprintk("%s: freeing lseg %p iomode %d seq %u"
> "offset %llu length %llu\n", __func__,
> - lseg, lseg->pls_range.iomode, lseg->pls_range.offset,
> - lseg->pls_range.length);
> + lseg, lseg->pls_range.iomode, lseg->pls_seq,
> + lseg->pls_range.offset, lseg->pls_range.length);
> if (!mark_lseg_invalid(lseg, tmp_list))
> remaining++;
> }
> @@ -730,15 +754,6 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
> pnfs_destroy_layouts_byclid(clp, false);
> }
>
> -/*
> - * Compare 2 layout stateid sequence ids, to see which is newer,
> - * taking into account wraparound issues.
> - */
> -static bool pnfs_seqid_is_newer(u32 s1, u32 s2)
> -{
> - return (s32)(s1 - s2) > 0;
> -}
> -
> /* update lo->plh_stateid with new if is more recent */
> void
> pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo, const nfs4_stateid *new,
> @@ -1014,7 +1029,7 @@ _pnfs_return_layout(struct inode *ino)
> pnfs_get_layout_hdr(lo);
> empty = list_empty(&lo->plh_segs);
> pnfs_clear_layoutcommit(ino, &tmp_list);
> - pnfs_mark_matching_lsegs_invalid(lo, &tmp_list, NULL);
> + pnfs_mark_matching_lsegs_invalid(lo, &tmp_list, NULL, 0);
>
> if (NFS_SERVER(ino)->pnfs_curr_ld->return_range) {
> struct pnfs_layout_range range = {
> @@ -1721,7 +1736,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
> * inode invalid, and don't bother validating the stateid
> * sequence number.
> */
> - pnfs_mark_matching_lsegs_invalid(lo, &free_me, NULL);
> + pnfs_mark_matching_lsegs_invalid(lo, &free_me, NULL, 0);
>
> nfs4_stateid_copy(&lo->plh_stateid, &res->stateid);
> lo->plh_barrier = be32_to_cpu(res->stateid.seqid);
> @@ -1775,7 +1790,8 @@ pnfs_set_plh_return_info(struct pnfs_layout_hdr *lo, enum pnfs_iomode iomode,
> int
> pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
> struct list_head *tmp_list,
> - const struct pnfs_layout_range *return_range)
> + const struct pnfs_layout_range *return_range,
> + u32 seq)
> {
> struct pnfs_layout_segment *lseg, *next;
> int remaining = 0;
> @@ -1798,8 +1814,11 @@ pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
> continue;
> remaining++;
> set_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags);
> - pnfs_set_plh_return_info(lo, return_range->iomode, lseg->pls_seq);
> }
> +
> + if (remaining)
> + pnfs_set_plh_return_info(lo, return_range->iomode, seq);
> +
> return remaining;
> }
>
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 47042084888c..bb68d354255e 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -266,7 +266,8 @@ int pnfs_choose_layoutget_stateid(nfs4_stateid *dst,
>   struct nfs4_state *open_state);
> int pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
> struct list_head *tmp_list,
> - const struct pnfs_layout_range *recall_range);
> + const struct pnfs_layout_range *recall_range,
> + u32 seq);
> int pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
> struct list_head *tmp_list,
> const struct pnfs_layout_range *recall_range);

Oops, I sent an earlier version of this set that didn't have a needed
fix. Must also fix the prototype for
pnfs_mark_matching_lsegs_return here, and the caller
in pnfs_error_mark_layout_for_return to pass in lseg->pls_seq for
"seq".

--
Jeff Layton <[email protected]>