Hi,
This patchset introduces the Flexfile Layout Module for the
client.
It corresponds to draft 4
(http://tools.ietf.org/id/draft-ietf-nfsv4-flex-files-02.txt)
of the Parallel NFS (pNFS) Flexible File Layout
(https://datatracker.ietf.org/doc/draft-ietf-nfsv4-flex-files/).
This version addresses the following review comments:
1) XDR should be for Draft 4 and not Draft 2.
2) Can you use get_nfs_version() / put_nfs_version() here rather than
exposing nfs_v3 to the entire client? If this *has* to be
global then please put it in nfs3_fs.h.
3) Can we consolidate "nfs/flexfiles: send layoutreturn before freeing lseg"
and "nfs/flexfiles: defer sending layoutreturn in pnfs_put_lseg"
to remove some temporary, hard-to-read code?
Thanks,
Tom
Peng Tao (35):
nfs41: pull data server cache from file layout to generic pnfs
nfs41: pull nfs4_ds_connect from file layout to generic pnfs
nfs41: pull decode_ds_addr from file layout to generic pnfs
nfs41: allow LD to choose DS connection auth flavor
nfs41: move file layout macros to generic pnfs
nfsv3: introduce nfs3_set_ds_client
nfs41: allow LD to choose DS connection version/minor_version
nfs41: create NFSv3 DS connection if specified
nfs: allow different protocol in nfs_initiate_commit
nfs4: pass slot table to nfs40_setup_sequence
nfs4: export nfs4_sequence_done
nfs: allow to specify cred in nfs_initiate_pgio
nfs: set hostname when creating nfsv3 ds connection
nfs/flexclient: export pnfs_layoutcommit_inode
nfs41: close a small race window when adding new layout to global list
nfs41: serialize first layoutget of a file
nfs: save server READ/WRITE/COMMIT status
nfs41: pass iomode through layoutreturn args
nfs41: make a helper function to send layoutreturn
nfs41: add a helper to mark layout for return
nfs41: don't use a layout if it is marked for returning
nfs41: send layoutreturn in last put_lseg
nfs41: clear NFS_LAYOUT_RETURN if layoutreturn is sent or failed to send
nfs/filelayout: use pnfs_error_mark_layout_for_return
nfs41: add a debug warning if we destroy an unempty layout
nfs: only reset desc->pg_mirror_idx when mirroring is supported
nfs: add nfs_pgio_current_mirror helper
pnfs: allow LD to ask to resend read through pnfs
nfs41: add range to layoutreturn args
nfs41: allow async version layoutreturn
nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
nfs/flexfiles: send layoutreturn before freeing lseg
nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
Tom Haynes (4):
pnfs: Prepare for flexfiles by pulling out common code
pnfs: Do not grab the commit_info lock twice when rescheduling writes
pnfs: Add nfs_rpc_ops in calls to nfs_initiate_pgio
pnfs/flexfiles: Add the FlexFile Layout Driver
Trond Myklebust (1):
NFSv4.1/NFSv3: Add pNFS callbacks for nfs3_(read|write|commit)_done()
Weston Andros Adamson (9):
sunrpc: add rpc_count_iostats_idx
nfs: introduce pg_cleanup op for pgio descriptors
pnfs: release lseg in pnfs_generic_pg_cleanup
nfs: handle overlapping reqs in lock_and_join
nfs: rename pgio header ds_idx to ds_commit_idx
pnfs: pass ds_commit_idx through the commit path
nfs: add mirroring support to pgio layer
nfs: mirroring support for direct io
pnfs: fail comparison when bucket verifier not set
fs/nfs/Kconfig | 5 +
fs/nfs/Makefile | 3 +-
fs/nfs/blocklayout/blocklayout.c | 2 +
fs/nfs/direct.c | 108 +-
fs/nfs/filelayout/filelayout.c | 315 +-----
fs/nfs/filelayout/filelayout.h | 40 -
fs/nfs/filelayout/filelayoutdev.c | 469 +--------
fs/nfs/flexfilelayout/Makefile | 5 +
fs/nfs/flexfilelayout/flexfilelayout.c | 1600 +++++++++++++++++++++++++++++
fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
fs/nfs/internal.h | 31 +-
fs/nfs/nfs3_fs.h | 3 +-
fs/nfs/nfs3client.c | 41 +
fs/nfs/nfs3proc.c | 9 +
fs/nfs/nfs3super.c | 2 +-
fs/nfs/nfs3xdr.c | 3 +
fs/nfs/nfs4_fs.h | 6 +
fs/nfs/nfs4client.c | 7 +-
fs/nfs/nfs4proc.c | 45 +-
fs/nfs/nfs4xdr.c | 9 +-
fs/nfs/objlayout/objio_osd.c | 5 +-
fs/nfs/pagelist.c | 294 +++++-
fs/nfs/pnfs.c | 407 ++++++--
fs/nfs/pnfs.h | 119 ++-
fs/nfs/pnfs_dev.c | 522 ++++++++++
fs/nfs/pnfs_nfsio.c | 283 +++++
fs/nfs/read.c | 33 +-
fs/nfs/write.c | 49 +-
include/linux/nfs4.h | 1 +
include/linux/nfs_page.h | 22 +-
include/linux/nfs_xdr.h | 6 +-
include/linux/sunrpc/metrics.h | 2 +
net/sunrpc/stats.c | 26 +-
34 files changed, 4182 insertions(+), 1000 deletions(-)
create mode 100644 fs/nfs/flexfilelayout/Makefile
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
create mode 100644 fs/nfs/pnfs_nfsio.c
--
1.9.3
The flexfilelayout driver will share some common code
with the filelayout driver. This set of changes refactors
that common code out into fs/nfs/pnfs_nfsio.c to avoid any
module dependencies between the two drivers.
Signed-off-by: Tom Haynes <[email protected]>
---
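Note for reviewers: the only driver-specific piece left in the commit path
is the initiate_commit callback passed to pnfs_generic_commit_pagelist().
The userspace sketch below is only an illustration of that shape, not the
kernel code. Every name in it (struct commit_data, generic_commit_list and
the two *_initiate_commit functions) is a hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_commit_data. */
struct commit_data {
	int bucket;	/* which data-server bucket this commit covers */
	int sent_by;	/* records which driver callback handled it */
};

/* Driver-specific hook, mirroring the initiate_commit argument. */
typedef int (*initiate_commit_fn)(struct commit_data *data, int how);

/* Shared helper: walks the commits and defers the send to the driver. */
static int generic_commit_list(struct commit_data *cd, size_t n, int how,
			       initiate_commit_fn initiate_commit)
{
	size_t i;
	int sent = 0;

	for (i = 0; i < n; i++) {
		if (initiate_commit(&cd[i], how) == 0)
			sent++;
	}
	return sent;
}

/* Two hypothetical drivers that share the helper without
 * depending on each other. */
static int files_initiate_commit(struct commit_data *data, int how)
{
	(void)how;
	data->sent_by = 1;
	return 0;
}

static int flexfiles_initiate_commit(struct commit_data *data, int how)
{
	(void)how;
	data->sent_by = 2;
	return 0;
}
```

Because the helper only sees a function pointer, neither driver has to
export symbols to the other. Both link against the common code instead.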
fs/nfs/Makefile | 2 +-
fs/nfs/filelayout/filelayout.c | 291 ++------------------------------------
fs/nfs/filelayout/filelayout.h | 11 --
fs/nfs/filelayout/filelayoutdev.c | 2 +-
fs/nfs/pnfs.h | 23 +++
fs/nfs/pnfs_nfsio.c | 291 ++++++++++++++++++++++++++++++++++++++
6 files changed, 330 insertions(+), 290 deletions(-)
create mode 100644 fs/nfs/pnfs_nfsio.c
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 04cb830..7973c4e3 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -27,7 +27,7 @@ nfsv4-y := nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o nfs4super.o nfs4file.o
dns_resolve.o nfs4trace.o
nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
nfsv4-$(CONFIG_SYSCTL) += nfs4sysctl.o
-nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o
+nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o pnfs_nfsio.o
nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 7afb52f..bc36ed3 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -118,13 +118,6 @@ static void filelayout_reset_read(struct nfs_pgio_header *hdr)
}
}
-static void filelayout_fenceme(struct inode *inode, struct pnfs_layout_hdr *lo)
-{
- if (!test_and_clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
- return;
- pnfs_return_layout(inode);
-}
-
static int filelayout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
@@ -339,16 +332,6 @@ static void filelayout_read_count_stats(struct rpc_task *task, void *data)
rpc_count_iostats(task, NFS_SERVER(hdr->inode)->client->cl_metrics);
}
-static void filelayout_read_release(void *data)
-{
- struct nfs_pgio_header *hdr = data;
- struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
-
- filelayout_fenceme(lo->plh_inode, lo);
- nfs_put_client(hdr->ds_clp);
- hdr->mds_ops->rpc_release(data);
-}
-
static int filelayout_write_done_cb(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
@@ -371,17 +354,6 @@ static int filelayout_write_done_cb(struct rpc_task *task,
return 0;
}
-/* Fake up some data that will cause nfs_commit_release to retry the writes. */
-static void prepare_to_resend_writes(struct nfs_commit_data *data)
-{
- struct nfs_page *first = nfs_list_entry(data->pages.next);
-
- data->task.tk_status = 0;
- memcpy(&data->verf.verifier, &first->wb_verf,
- sizeof(data->verf.verifier));
- data->verf.verifier.data[0]++; /* ensure verifier mismatch */
-}
-
static int filelayout_commit_done_cb(struct rpc_task *task,
struct nfs_commit_data *data)
{
@@ -393,7 +365,7 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
switch (err) {
case -NFS4ERR_RESET_TO_MDS:
- prepare_to_resend_writes(data);
+ pnfs_generic_prepare_to_resend_writes(data);
return -EAGAIN;
case -EAGAIN:
rpc_restart_call_prepare(task);
@@ -451,16 +423,6 @@ static void filelayout_write_count_stats(struct rpc_task *task, void *data)
rpc_count_iostats(task, NFS_SERVER(hdr->inode)->client->cl_metrics);
}
-static void filelayout_write_release(void *data)
-{
- struct nfs_pgio_header *hdr = data;
- struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
-
- filelayout_fenceme(lo->plh_inode, lo);
- nfs_put_client(hdr->ds_clp);
- hdr->mds_ops->rpc_release(data);
-}
-
static void filelayout_commit_prepare(struct rpc_task *task, void *data)
{
struct nfs_commit_data *wdata = data;
@@ -471,14 +433,6 @@ static void filelayout_commit_prepare(struct rpc_task *task, void *data)
task);
}
-static void filelayout_write_commit_done(struct rpc_task *task, void *data)
-{
- struct nfs_commit_data *wdata = data;
-
- /* Note this may cause RPC to be resent */
- wdata->mds_ops->rpc_call_done(task, data);
-}
-
static void filelayout_commit_count_stats(struct rpc_task *task, void *data)
{
struct nfs_commit_data *cdata = data;
@@ -486,35 +440,25 @@ static void filelayout_commit_count_stats(struct rpc_task *task, void *data)
rpc_count_iostats(task, NFS_SERVER(cdata->inode)->client->cl_metrics);
}
-static void filelayout_commit_release(void *calldata)
-{
- struct nfs_commit_data *data = calldata;
-
- data->completion_ops->completion(data);
- pnfs_put_lseg(data->lseg);
- nfs_put_client(data->ds_clp);
- nfs_commitdata_release(data);
-}
-
static const struct rpc_call_ops filelayout_read_call_ops = {
.rpc_call_prepare = filelayout_read_prepare,
.rpc_call_done = filelayout_read_call_done,
.rpc_count_stats = filelayout_read_count_stats,
- .rpc_release = filelayout_read_release,
+ .rpc_release = pnfs_generic_rw_release,
};
static const struct rpc_call_ops filelayout_write_call_ops = {
.rpc_call_prepare = filelayout_write_prepare,
.rpc_call_done = filelayout_write_call_done,
.rpc_count_stats = filelayout_write_count_stats,
- .rpc_release = filelayout_write_release,
+ .rpc_release = pnfs_generic_rw_release,
};
static const struct rpc_call_ops filelayout_commit_call_ops = {
.rpc_call_prepare = filelayout_commit_prepare,
- .rpc_call_done = filelayout_write_commit_done,
+ .rpc_call_done = pnfs_generic_write_commit_done,
.rpc_count_stats = filelayout_commit_count_stats,
- .rpc_release = filelayout_commit_release,
+ .rpc_release = pnfs_generic_commit_release,
};
static enum pnfs_try_status
@@ -1004,33 +948,6 @@ static u32 select_bucket_index(struct nfs4_filelayout_segment *fl, u32 j)
return j;
}
-/* The generic layer is about to remove the req from the commit list.
- * If this will make the bucket empty, it will need to put the lseg reference.
- * Note this is must be called holding the inode (/cinfo) lock
- */
-static void
-filelayout_clear_request_commit(struct nfs_page *req,
- struct nfs_commit_info *cinfo)
-{
- struct pnfs_layout_segment *freeme = NULL;
-
- if (!test_and_clear_bit(PG_COMMIT_TO_DS, &req->wb_flags))
- goto out;
- cinfo->ds->nwritten--;
- if (list_is_singular(&req->wb_list)) {
- struct pnfs_commit_bucket *bucket;
-
- bucket = list_first_entry(&req->wb_list,
- struct pnfs_commit_bucket,
- written);
- freeme = bucket->wlseg;
- bucket->wlseg = NULL;
- }
-out:
- nfs_request_remove_commit_list(req, cinfo);
- pnfs_put_lseg_locked(freeme);
-}
-
static void
filelayout_mark_request_commit(struct nfs_page *req,
struct pnfs_layout_segment *lseg,
@@ -1064,7 +981,7 @@ filelayout_mark_request_commit(struct nfs_page *req,
* is normally transferred to the COMMIT call and released
* there. It could also be released if the last req is pulled
* off due to a rewrite, in which case it will be done in
- * filelayout_clear_request_commit
+ * pnfs_generic_clear_request_commit
*/
buckets[i].wlseg = pnfs_get_lseg(lseg);
}
@@ -1142,97 +1059,11 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
&filelayout_commit_call_ops, how,
RPC_TASK_SOFTCONN);
out_err:
- prepare_to_resend_writes(data);
- filelayout_commit_release(data);
+ pnfs_generic_prepare_to_resend_writes(data);
+ pnfs_generic_commit_release(data);
return -EAGAIN;
}
-static int
-transfer_commit_list(struct list_head *src, struct list_head *dst,
- struct nfs_commit_info *cinfo, int max)
-{
- struct nfs_page *req, *tmp;
- int ret = 0;
-
- list_for_each_entry_safe(req, tmp, src, wb_list) {
- if (!nfs_lock_request(req))
- continue;
- kref_get(&req->wb_kref);
- if (cond_resched_lock(cinfo->lock))
- list_safe_reset_next(req, tmp, wb_list);
- nfs_request_remove_commit_list(req, cinfo);
- clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
- nfs_list_add_request(req, dst);
- ret++;
- if ((ret == max) && !cinfo->dreq)
- break;
- }
- return ret;
-}
-
-/* Note called with cinfo->lock held. */
-static int
-filelayout_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
- struct nfs_commit_info *cinfo,
- int max)
-{
- struct list_head *src = &bucket->written;
- struct list_head *dst = &bucket->committing;
- int ret;
-
- ret = transfer_commit_list(src, dst, cinfo, max);
- if (ret) {
- cinfo->ds->nwritten -= ret;
- cinfo->ds->ncommitting += ret;
- bucket->clseg = bucket->wlseg;
- if (list_empty(src))
- bucket->wlseg = NULL;
- else
- pnfs_get_lseg(bucket->clseg);
- }
- return ret;
-}
-
-/* Move reqs from written to committing lists, returning count of number moved.
- * Note called with cinfo->lock held.
- */
-static int filelayout_scan_commit_lists(struct nfs_commit_info *cinfo,
- int max)
-{
- int i, rv = 0, cnt;
-
- for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
- cnt = filelayout_scan_ds_commit_list(&cinfo->ds->buckets[i],
- cinfo, max);
- max -= cnt;
- rv += cnt;
- }
- return rv;
-}
-
-/* Pull everything off the committing lists and dump into @dst */
-static void filelayout_recover_commit_reqs(struct list_head *dst,
- struct nfs_commit_info *cinfo)
-{
- struct pnfs_commit_bucket *b;
- struct pnfs_layout_segment *freeme;
- int i;
-
-restart:
- spin_lock(cinfo->lock);
- for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
- if (transfer_commit_list(&b->written, dst, cinfo, 0)) {
- freeme = b->wlseg;
- b->wlseg = NULL;
- spin_unlock(cinfo->lock);
- pnfs_put_lseg(freeme);
- goto restart;
- }
- }
- cinfo->ds->nwritten = 0;
- spin_unlock(cinfo->lock);
-}
-
/* filelayout_search_commit_reqs - Search lists in @cinfo for the head reqest
* for @page
* @cinfo - commit info for current inode
@@ -1263,108 +1094,14 @@ filelayout_search_commit_reqs(struct nfs_commit_info *cinfo, struct page *page)
return NULL;
}
-static void filelayout_retry_commit(struct nfs_commit_info *cinfo, int idx)
-{
- struct pnfs_ds_commit_info *fl_cinfo = cinfo->ds;
- struct pnfs_commit_bucket *bucket;
- struct pnfs_layout_segment *freeme;
- int i;
-
- for (i = idx; i < fl_cinfo->nbuckets; i++) {
- bucket = &fl_cinfo->buckets[i];
- if (list_empty(&bucket->committing))
- continue;
- nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
- spin_lock(cinfo->lock);
- freeme = bucket->clseg;
- bucket->clseg = NULL;
- spin_unlock(cinfo->lock);
- pnfs_put_lseg(freeme);
- }
-}
-
-static unsigned int
-alloc_ds_commits(struct nfs_commit_info *cinfo, struct list_head *list)
-{
- struct pnfs_ds_commit_info *fl_cinfo;
- struct pnfs_commit_bucket *bucket;
- struct nfs_commit_data *data;
- int i;
- unsigned int nreq = 0;
-
- fl_cinfo = cinfo->ds;
- bucket = fl_cinfo->buckets;
- for (i = 0; i < fl_cinfo->nbuckets; i++, bucket++) {
- if (list_empty(&bucket->committing))
- continue;
- data = nfs_commitdata_alloc();
- if (!data)
- break;
- data->ds_commit_index = i;
- spin_lock(cinfo->lock);
- data->lseg = bucket->clseg;
- bucket->clseg = NULL;
- spin_unlock(cinfo->lock);
- list_add(&data->pages, list);
- nreq++;
- }
-
- /* Clean up on error */
- filelayout_retry_commit(cinfo, i);
- /* Caller will clean up entries put on list */
- return nreq;
-}
-
-/* This follows nfs_commit_list pretty closely */
static int
filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
int how, struct nfs_commit_info *cinfo)
{
- struct nfs_commit_data *data, *tmp;
- LIST_HEAD(list);
- unsigned int nreq = 0;
-
- if (!list_empty(mds_pages)) {
- data = nfs_commitdata_alloc();
- if (data != NULL) {
- data->lseg = NULL;
- list_add(&data->pages, &list);
- nreq++;
- } else {
- nfs_retry_commit(mds_pages, NULL, cinfo);
- filelayout_retry_commit(cinfo, 0);
- cinfo->completion_ops->error_cleanup(NFS_I(inode));
- return -ENOMEM;
- }
- }
-
- nreq += alloc_ds_commits(cinfo, &list);
-
- if (nreq == 0) {
- cinfo->completion_ops->error_cleanup(NFS_I(inode));
- goto out;
- }
-
- atomic_add(nreq, &cinfo->mds->rpcs_out);
-
- list_for_each_entry_safe(data, tmp, &list, pages) {
- list_del_init(&data->pages);
- if (!data->lseg) {
- nfs_init_commit(data, mds_pages, NULL, cinfo);
- nfs_initiate_commit(NFS_CLIENT(inode), data,
- data->mds_ops, how, 0);
- } else {
- struct pnfs_commit_bucket *buckets;
-
- buckets = cinfo->ds->buckets;
- nfs_init_commit(data, &buckets[data->ds_commit_index].committing, data->lseg, cinfo);
- filelayout_initiate_commit(data, how);
- }
- }
-out:
- cinfo->ds->ncommitting = 0;
- return PNFS_ATTEMPTED;
+ return pnfs_generic_commit_pagelist(inode, mds_pages, how, cinfo,
+ filelayout_initiate_commit);
}
+
static struct nfs4_deviceid_node *
filelayout_alloc_deviceid_node(struct nfs_server *server,
struct pnfs_device *pdev, gfp_t gfp_flags)
@@ -1421,9 +1158,9 @@ static struct pnfs_layoutdriver_type filelayout_type = {
.pg_write_ops = &filelayout_pg_write_ops,
.get_ds_info = &filelayout_get_ds_info,
.mark_request_commit = filelayout_mark_request_commit,
- .clear_request_commit = filelayout_clear_request_commit,
- .scan_commit_lists = filelayout_scan_commit_lists,
- .recover_commit_reqs = filelayout_recover_commit_reqs,
+ .clear_request_commit = pnfs_generic_clear_request_commit,
+ .scan_commit_lists = pnfs_generic_scan_commit_lists,
+ .recover_commit_reqs = pnfs_generic_recover_commit_reqs,
.search_commit_reqs = filelayout_search_commit_reqs,
.commit_pagelist = filelayout_commit_pagelist,
.read_pagelist = filelayout_read_pagelist,
diff --git a/fs/nfs/filelayout/filelayout.h b/fs/nfs/filelayout/filelayout.h
index 7c9f800..a5ce9b4 100644
--- a/fs/nfs/filelayout/filelayout.h
+++ b/fs/nfs/filelayout/filelayout.h
@@ -119,17 +119,6 @@ FILELAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg)
return &FILELAYOUT_LSEG(lseg)->dsaddr->id_node;
}
-static inline void
-filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
-{
- u32 *p = (u32 *)&node->deviceid;
-
- printk(KERN_WARNING "NFS: Deviceid [%x%x%x%x] marked out of use.\n",
- p[0], p[1], p[2], p[3]);
-
- set_bit(NFS_DEVICEID_INVALID, &node->flags);
-}
-
static inline bool
filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
{
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index bfecac7..d21080a 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -708,7 +708,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
if (ds == NULL) {
printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
__func__, ds_idx);
- filelayout_mark_devid_invalid(devid);
+ pnfs_generic_mark_devid_invalid(devid);
goto out;
}
smp_rmb();
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9ae5b76..88eede0 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -275,6 +275,23 @@ void nfs4_mark_deviceid_unavailable(struct nfs4_deviceid_node *node);
bool nfs4_test_deviceid_unavailable(struct nfs4_deviceid_node *node);
void nfs4_deviceid_purge_client(const struct nfs_client *);
+/* pnfs_nfsio.c */
+void pnfs_generic_clear_request_commit(struct nfs_page *req,
+ struct nfs_commit_info *cinfo);
+void pnfs_generic_commit_release(void *calldata);
+void pnfs_generic_prepare_to_resend_writes(struct nfs_commit_data *data);
+void pnfs_generic_rw_release(void *data);
+void pnfs_generic_recover_commit_reqs(struct list_head *dst,
+ struct nfs_commit_info *cinfo);
+int pnfs_generic_commit_pagelist(struct inode *inode,
+ struct list_head *mds_pages,
+ int how,
+ struct nfs_commit_info *cinfo,
+ int (*initiate_commit)(struct nfs_commit_data *data,
+ int how));
+int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo, int max);
+void pnfs_generic_write_commit_done(struct rpc_task *task, void *data);
+
static inline struct nfs4_deviceid_node *
nfs4_get_deviceid(struct nfs4_deviceid_node *d)
{
@@ -317,6 +334,12 @@ pnfs_get_ds_info(struct inode *inode)
return ld->get_ds_info(inode);
}
+static inline void
+pnfs_generic_mark_devid_invalid(struct nfs4_deviceid_node *node)
+{
+ set_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
static inline bool
pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
struct nfs_commit_info *cinfo)
diff --git a/fs/nfs/pnfs_nfsio.c b/fs/nfs/pnfs_nfsio.c
new file mode 100644
index 0000000..e5f841c
--- /dev/null
+++ b/fs/nfs/pnfs_nfsio.c
@@ -0,0 +1,291 @@
+/*
+ * Common NFS I/O operations for the pnfs file based
+ * layout drivers.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tom Haynes <[email protected]>
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
+
+#include "internal.h"
+#include "pnfs.h"
+
+static void pnfs_generic_fenceme(struct inode *inode,
+ struct pnfs_layout_hdr *lo)
+{
+ if (!test_and_clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
+ return;
+ pnfs_return_layout(inode);
+}
+
+void pnfs_generic_rw_release(void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+ struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
+
+ pnfs_generic_fenceme(lo->plh_inode, lo);
+ nfs_put_client(hdr->ds_clp);
+ hdr->mds_ops->rpc_release(data);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_rw_release);
+
+/* Fake up some data that will cause nfs_commit_release to retry the writes. */
+void pnfs_generic_prepare_to_resend_writes(struct nfs_commit_data *data)
+{
+ struct nfs_page *first = nfs_list_entry(data->pages.next);
+
+ data->task.tk_status = 0;
+ memcpy(&data->verf.verifier, &first->wb_verf,
+ sizeof(data->verf.verifier));
+ data->verf.verifier.data[0]++; /* ensure verifier mismatch */
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_prepare_to_resend_writes);
+
+void pnfs_generic_write_commit_done(struct rpc_task *task, void *data)
+{
+ struct nfs_commit_data *wdata = data;
+
+ /* Note this may cause RPC to be resent */
+ wdata->mds_ops->rpc_call_done(task, data);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_write_commit_done);
+
+void pnfs_generic_commit_release(void *calldata)
+{
+ struct nfs_commit_data *data = calldata;
+
+ data->completion_ops->completion(data);
+ pnfs_put_lseg(data->lseg);
+ nfs_put_client(data->ds_clp);
+ nfs_commitdata_release(data);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_commit_release);
+
+/* The generic layer is about to remove the req from the commit list.
+ * If this will make the bucket empty, it will need to put the lseg reference.
+ * Note this is must be called holding the inode (/cinfo) lock
+ */
+void
+pnfs_generic_clear_request_commit(struct nfs_page *req,
+ struct nfs_commit_info *cinfo)
+{
+ struct pnfs_layout_segment *freeme = NULL;
+
+ if (!test_and_clear_bit(PG_COMMIT_TO_DS, &req->wb_flags))
+ goto out;
+ cinfo->ds->nwritten--;
+ if (list_is_singular(&req->wb_list)) {
+ struct pnfs_commit_bucket *bucket;
+
+ bucket = list_first_entry(&req->wb_list,
+ struct pnfs_commit_bucket,
+ written);
+ freeme = bucket->wlseg;
+ bucket->wlseg = NULL;
+ }
+out:
+ nfs_request_remove_commit_list(req, cinfo);
+ pnfs_put_lseg_locked(freeme);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_clear_request_commit);
+
+static int
+pnfs_generic_transfer_commit_list(struct list_head *src, struct list_head *dst,
+ struct nfs_commit_info *cinfo, int max)
+{
+ struct nfs_page *req, *tmp;
+ int ret = 0;
+
+ list_for_each_entry_safe(req, tmp, src, wb_list) {
+ if (!nfs_lock_request(req))
+ continue;
+ kref_get(&req->wb_kref);
+ if (cond_resched_lock(cinfo->lock))
+ list_safe_reset_next(req, tmp, wb_list);
+ nfs_request_remove_commit_list(req, cinfo);
+ clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
+ nfs_list_add_request(req, dst);
+ ret++;
+ if ((ret == max) && !cinfo->dreq)
+ break;
+ }
+ return ret;
+}
+
+/* Note called with cinfo->lock held. */
+static int
+pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
+ struct nfs_commit_info *cinfo,
+ int max)
+{
+ struct list_head *src = &bucket->written;
+ struct list_head *dst = &bucket->committing;
+ int ret;
+
+ ret = pnfs_generic_transfer_commit_list(src, dst, cinfo, max);
+ if (ret) {
+ cinfo->ds->nwritten -= ret;
+ cinfo->ds->ncommitting += ret;
+ bucket->clseg = bucket->wlseg;
+ if (list_empty(src))
+ bucket->wlseg = NULL;
+ else
+ pnfs_get_lseg(bucket->clseg);
+ }
+ return ret;
+}
+
+/* Move reqs from written to committing lists, returning count of number moved.
+ * Note called with cinfo->lock held.
+ */
+int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
+ int max)
+{
+ int i, rv = 0, cnt;
+
+ for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
+ cnt = pnfs_generic_scan_ds_commit_list(&cinfo->ds->buckets[i],
+ cinfo, max);
+ max -= cnt;
+ rv += cnt;
+ }
+ return rv;
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_scan_commit_lists);
+
+/* Pull everything off the committing lists and dump into @dst */
+void pnfs_generic_recover_commit_reqs(struct list_head *dst,
+ struct nfs_commit_info *cinfo)
+{
+ struct pnfs_commit_bucket *b;
+ struct pnfs_layout_segment *freeme;
+ int i;
+
+restart:
+ spin_lock(cinfo->lock);
+ for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
+ if (pnfs_generic_transfer_commit_list(&b->written, dst,
+ cinfo, 0)) {
+ freeme = b->wlseg;
+ b->wlseg = NULL;
+ spin_unlock(cinfo->lock);
+ pnfs_put_lseg(freeme);
+ goto restart;
+ }
+ }
+ cinfo->ds->nwritten = 0;
+ spin_unlock(cinfo->lock);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_recover_commit_reqs);
+
+static void pnfs_generic_retry_commit(struct nfs_commit_info *cinfo, int idx)
+{
+ struct pnfs_ds_commit_info *fl_cinfo = cinfo->ds;
+ struct pnfs_commit_bucket *bucket;
+ struct pnfs_layout_segment *freeme;
+ int i;
+
+ for (i = idx; i < fl_cinfo->nbuckets; i++) {
+ bucket = &fl_cinfo->buckets[i];
+ if (list_empty(&bucket->committing))
+ continue;
+ nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
+ spin_lock(cinfo->lock);
+ freeme = bucket->clseg;
+ bucket->clseg = NULL;
+ spin_unlock(cinfo->lock);
+ pnfs_put_lseg(freeme);
+ }
+}
+
+static unsigned int
+pnfs_generic_alloc_ds_commits(struct nfs_commit_info *cinfo,
+ struct list_head *list)
+{
+ struct pnfs_ds_commit_info *fl_cinfo;
+ struct pnfs_commit_bucket *bucket;
+ struct nfs_commit_data *data;
+ int i;
+ unsigned int nreq = 0;
+
+ fl_cinfo = cinfo->ds;
+ bucket = fl_cinfo->buckets;
+ for (i = 0; i < fl_cinfo->nbuckets; i++, bucket++) {
+ if (list_empty(&bucket->committing))
+ continue;
+ data = nfs_commitdata_alloc();
+ if (!data)
+ break;
+ data->ds_commit_index = i;
+ spin_lock(cinfo->lock);
+ data->lseg = bucket->clseg;
+ bucket->clseg = NULL;
+ spin_unlock(cinfo->lock);
+ list_add(&data->pages, list);
+ nreq++;
+ }
+
+ /* Clean up on error */
+ pnfs_generic_retry_commit(cinfo, i);
+ return nreq;
+}
+
+/* This follows nfs_commit_list pretty closely */
+int
+pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
+ int how, struct nfs_commit_info *cinfo,
+ int (*initiate_commit)(struct nfs_commit_data *data,
+ int how))
+{
+ struct nfs_commit_data *data, *tmp;
+ LIST_HEAD(list);
+ unsigned int nreq = 0;
+
+ if (!list_empty(mds_pages)) {
+ data = nfs_commitdata_alloc();
+ if (data != NULL) {
+ data->lseg = NULL;
+ list_add(&data->pages, &list);
+ nreq++;
+ } else {
+ nfs_retry_commit(mds_pages, NULL, cinfo);
+ pnfs_generic_retry_commit(cinfo, 0);
+ cinfo->completion_ops->error_cleanup(NFS_I(inode));
+ return -ENOMEM;
+ }
+ }
+
+ nreq += pnfs_generic_alloc_ds_commits(cinfo, &list);
+
+ if (nreq == 0) {
+ cinfo->completion_ops->error_cleanup(NFS_I(inode));
+ goto out;
+ }
+
+ atomic_add(nreq, &cinfo->mds->rpcs_out);
+
+ list_for_each_entry_safe(data, tmp, &list, pages) {
+ list_del_init(&data->pages);
+ if (!data->lseg) {
+ nfs_init_commit(data, mds_pages, NULL, cinfo);
+ nfs_initiate_commit(NFS_CLIENT(inode), data,
+ data->mds_ops, how, 0);
+ } else {
+ struct pnfs_commit_bucket *buckets;
+
+ buckets = cinfo->ds->buckets;
+ nfs_init_commit(data,
+ &buckets[data->ds_commit_index].committing,
+ data->lseg,
+ cinfo);
+ initiate_commit(data, how);
+ }
+ }
+out:
+ cinfo->ds->ncommitting = 0;
+ return PNFS_ATTEMPTED;
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_commit_pagelist);
--
1.9.3
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
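Note for reviewers: with this change the caller takes cinfo->lock once, and
recover_commit_reqs() expects it held, dropping it only around
pnfs_put_lseg() before rescanning. The userspace sketch below is a rough
model of that convention under stated assumptions. The toy_* names are
hypothetical, and the "lock" is just a flag that asserts on double
acquisition rather than a real spinlock.

```c
#include <assert.h>

/* Hypothetical non-recursive lock: acquiring it twice trips an assert. */
struct toy_lock { int held; };

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct toy_cinfo {
	struct toy_lock lock;
	int buckets[4];	/* pending requests per bucket */
	int nbuckets;
	int puts;	/* how many times toy_put() ran */
};

/* Stand-in for pnfs_put_lseg(), which the diff calls with the lock dropped. */
static void toy_put(struct toy_cinfo *c)
{
	assert(!c->lock.held);
	c->puts++;
}

/* Caller must hold c->lock; it is released only around toy_put(). */
static int toy_recover(struct toy_cinfo *c)
{
	int i, moved = 0;

	assert(c->lock.held);	/* plays the role of lockdep_assert_held() */
restart:
	for (i = 0; i < c->nbuckets; i++) {
		if (c->buckets[i] == 0)
			continue;
		moved += c->buckets[i];
		c->buckets[i] = 0;
		toy_release(&c->lock);
		toy_put(c);
		toy_acquire(&c->lock);
		goto restart;	/* state may have changed while unlocked */
	}
	return moved;
}

/* One acquire/release pair covers both the pNFS and the MDS scans. */
static int toy_scan_commit_list(struct toy_cinfo *c, int mds_pending)
{
	int total;

	toy_acquire(&c->lock);
	total = toy_recover(c);
	total += mds_pending;	/* stand-in for nfs_scan_commit_list() */
	toy_release(&c->lock);
	return total;
}
```

The rescan after every unlocked window matches the "goto restart" kept in
pnfs_generic_recover_commit_reqs(): anything observed before the lock was
dropped has to be treated as stale.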
fs/nfs/direct.c | 19 +++++++++++++++----
fs/nfs/pnfs.h | 15 ---------------
fs/nfs/pnfs_nfsio.c | 15 ++++++++-------
3 files changed, 23 insertions(+), 26 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 10bf072..e84f764 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -573,6 +573,20 @@ out:
return result;
}
+static void
+nfs_direct_write_scan_commit_list(struct inode *inode,
+ struct list_head *list,
+ struct nfs_commit_info *cinfo)
+{
+ spin_lock(cinfo->lock);
+#ifdef CONFIG_NFS_V4_1
+ if (cinfo->ds != NULL && cinfo->ds->nwritten != 0)
+ NFS_SERVER(inode)->pnfs_curr_ld->recover_commit_reqs(list, cinfo);
+#endif
+ nfs_scan_commit_list(&cinfo->mds->list, list, cinfo, 0);
+ spin_unlock(cinfo->lock);
+}
+
static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
{
struct nfs_pageio_descriptor desc;
@@ -582,10 +596,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
LIST_HEAD(failed);
nfs_init_cinfo_from_dreq(&cinfo, dreq);
- pnfs_recover_commit_reqs(dreq->inode, &reqs, &cinfo);
- spin_lock(cinfo.lock);
- nfs_scan_commit_list(&cinfo.mds->list, &reqs, &cinfo, 0);
- spin_unlock(cinfo.lock);
+ nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
dreq->count = 0;
get_dreq(dreq);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 88eede0..f666bc6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -375,15 +375,6 @@ pnfs_scan_commit_lists(struct inode *inode, struct nfs_commit_info *cinfo,
return NFS_SERVER(inode)->pnfs_curr_ld->scan_commit_lists(cinfo, max);
}
-static inline void
-pnfs_recover_commit_reqs(struct inode *inode, struct list_head *list,
- struct nfs_commit_info *cinfo)
-{
- if (cinfo->ds == NULL || cinfo->ds->nwritten == 0)
- return;
- NFS_SERVER(inode)->pnfs_curr_ld->recover_commit_reqs(list, cinfo);
-}
-
static inline struct nfs_page *
pnfs_search_commit_reqs(struct inode *inode, struct nfs_commit_info *cinfo,
struct page *page)
@@ -554,12 +545,6 @@ pnfs_scan_commit_lists(struct inode *inode, struct nfs_commit_info *cinfo,
return 0;
}
-static inline void
-pnfs_recover_commit_reqs(struct inode *inode, struct list_head *list,
- struct nfs_commit_info *cinfo)
-{
-}
-
static inline struct nfs_page *
pnfs_search_commit_reqs(struct inode *inode, struct nfs_commit_info *cinfo,
struct page *page)
diff --git a/fs/nfs/pnfs_nfsio.c b/fs/nfs/pnfs_nfsio.c
index e5f841c..fd2a2f0 100644
--- a/fs/nfs/pnfs_nfsio.c
+++ b/fs/nfs/pnfs_nfsio.c
@@ -66,7 +66,7 @@ EXPORT_SYMBOL_GPL(pnfs_generic_commit_release);
/* The generic layer is about to remove the req from the commit list.
* If this will make the bucket empty, it will need to put the lseg reference.
- * Note this is must be called holding the inode (/cinfo) lock
+ * Note this must be called holding the inode (/cinfo) lock
*/
void
pnfs_generic_clear_request_commit(struct nfs_page *req,
@@ -115,7 +115,6 @@ pnfs_generic_transfer_commit_list(struct list_head *src, struct list_head *dst,
return ret;
}
-/* Note called with cinfo->lock held. */
static int
pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
struct nfs_commit_info *cinfo,
@@ -125,6 +124,7 @@ pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
struct list_head *dst = &bucket->committing;
int ret;
+ lockdep_assert_held(cinfo->lock);
ret = pnfs_generic_transfer_commit_list(src, dst, cinfo, max);
if (ret) {
cinfo->ds->nwritten -= ret;
@@ -138,14 +138,15 @@ pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
return ret;
}
-/* Move reqs from written to committing lists, returning count of number moved.
- * Note called with cinfo->lock held.
+/* Move reqs from written to committing lists, returning count
+ * of number moved.
*/
int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
int max)
{
int i, rv = 0, cnt;
+ lockdep_assert_held(cinfo->lock);
for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
cnt = pnfs_generic_scan_ds_commit_list(&cinfo->ds->buckets[i],
cinfo, max);
@@ -156,7 +157,7 @@ int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
}
EXPORT_SYMBOL_GPL(pnfs_generic_scan_commit_lists);
-/* Pull everything off the committing lists and dump into @dst */
+/* Pull everything off the committing lists and dump into @dst. */
void pnfs_generic_recover_commit_reqs(struct list_head *dst,
struct nfs_commit_info *cinfo)
{
@@ -164,8 +165,8 @@ void pnfs_generic_recover_commit_reqs(struct list_head *dst,
struct pnfs_layout_segment *freeme;
int i;
+ lockdep_assert_held(cinfo->lock);
restart:
- spin_lock(cinfo->lock);
for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
if (pnfs_generic_transfer_commit_list(&b->written, dst,
cinfo, 0)) {
@@ -173,11 +174,11 @@ restart:
b->wlseg = NULL;
spin_unlock(cinfo->lock);
pnfs_put_lseg(freeme);
+ spin_lock(cinfo->lock);
goto restart;
}
}
cinfo->ds->nwritten = 0;
- spin_unlock(cinfo->lock);
}
EXPORT_SYMBOL_GPL(pnfs_generic_recover_commit_reqs);
--
1.9.3
From: Peng Tao <[email protected]>
Also pull struct nfs4_pnfs_ds_addr and struct nfs4_pnfs_ds into generic pNFS.
They can all be reused by the flexfile layout as well.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
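For reviewers who have not worked with the data server cache, the reference counting that moves in this patch (set to 1 on allocation, incremented on a cache hit, freed on the last put) can be illustrated with a minimal user-space sketch. Everything below is a hypothetical stand-in: `struct ds_entry`, `ds_add()` and `ds_put()` are not kernel names, a single address string replaces the `ds_addrs` list, and the sketch is single-threaded, whereas the kernel code serializes with nfs4_ds_cache_lock and atomic_dec_and_lock().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct nfs4_pnfs_ds. */
struct ds_entry {
	struct ds_entry *next;
	char *addr;	/* stands in for the ds_addrs list */
	int refcount;	/* stands in for ds_count */
};

/* Stands in for nfs4_data_server_cache; no locking in this sketch. */
static struct ds_entry *ds_cache;

/* Return the cached entry with an extra reference, or insert a new one. */
struct ds_entry *ds_add(const char *addr)
{
	struct ds_entry *ds;
	size_t len;

	for (ds = ds_cache; ds != NULL; ds = ds->next) {
		if (strcmp(ds->addr, addr) == 0) {
			ds->refcount++;		/* already cached */
			return ds;
		}
	}

	ds = calloc(1, sizeof(*ds));
	if (ds == NULL)
		return NULL;
	len = strlen(addr) + 1;
	ds->addr = malloc(len);
	if (ds->addr == NULL) {
		free(ds);
		return NULL;
	}
	memcpy(ds->addr, addr, len);
	ds->refcount = 1;			/* set to 1 on allocation */
	ds->next = ds_cache;
	ds_cache = ds;
	return ds;
}

/* Drop one reference; unlink and free on the last put. Returns 1 if freed. */
int ds_put(struct ds_entry *ds)
{
	struct ds_entry **pp;

	assert(ds->refcount > 0);
	if (--ds->refcount > 0)
		return 0;

	for (pp = &ds_cache; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ds) {
			*pp = ds->next;
			break;
		}
	}
	free(ds->addr);
	free(ds);
	return 1;
}
```

Two device IDs that map to the same address would call `ds_add()` twice and share one entry; the entry goes away only after both have called `ds_put()`.
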
fs/nfs/filelayout/filelayout.h | 19 ---
fs/nfs/filelayout/filelayoutdev.c | 235 +------------------------------------
fs/nfs/pnfs.h | 21 ++++
fs/nfs/pnfs_dev.c | 240 ++++++++++++++++++++++++++++++++++++++
4 files changed, 263 insertions(+), 252 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.h b/fs/nfs/filelayout/filelayout.h
index a5ce9b4..f97eea6 100644
--- a/fs/nfs/filelayout/filelayout.h
+++ b/fs/nfs/filelayout/filelayout.h
@@ -56,24 +56,6 @@ enum stripetype4 {
STRIPE_DENSE = 2
};
-/* Individual ip address */
-struct nfs4_pnfs_ds_addr {
- struct sockaddr_storage da_addr;
- size_t da_addrlen;
- struct list_head da_node; /* nfs4_pnfs_dev_hlist dev_dslist */
- char *da_remotestr; /* human readable addr+port */
-};
-
-struct nfs4_pnfs_ds {
- struct list_head ds_node; /* nfs4_pnfs_dev_hlist dev_dslist */
- char *ds_remotestr; /* comma sep list of addrs */
- struct list_head ds_addrs;
- struct nfs_client *ds_clp;
- atomic_t ds_count;
- unsigned long ds_state;
-#define NFS4DS_CONNECTING 0 /* ds is establishing connection */
-};
-
struct nfs4_file_layout_dsaddr {
struct nfs4_deviceid_node id_node;
u32 stripe_count;
@@ -131,7 +113,6 @@ filelayout_test_devid_unavailable(struct nfs4_deviceid_node *node);
extern struct nfs_fh *
nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
-extern void print_ds(struct nfs4_pnfs_ds *ds);
u32 nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset);
u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j);
struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index d21080a..fbfbb70 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -43,114 +43,6 @@ static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
/*
- * Data server cache
- *
- * Data servers can be mapped to different device ids.
- * nfs4_pnfs_ds reference counting
- * - set to 1 on allocation
- * - incremented when a device id maps a data server already in the cache.
- * - decremented when deviceid is removed from the cache.
- */
-static DEFINE_SPINLOCK(nfs4_ds_cache_lock);
-static LIST_HEAD(nfs4_data_server_cache);
-
-/* Debug routines */
-void
-print_ds(struct nfs4_pnfs_ds *ds)
-{
- if (ds == NULL) {
- printk("%s NULL device\n", __func__);
- return;
- }
- printk(" ds %s\n"
- " ref count %d\n"
- " client %p\n"
- " cl_exchange_flags %x\n",
- ds->ds_remotestr,
- atomic_read(&ds->ds_count), ds->ds_clp,
- ds->ds_clp ? ds->ds_clp->cl_exchange_flags : 0);
-}
-
-static bool
-same_sockaddr(struct sockaddr *addr1, struct sockaddr *addr2)
-{
- struct sockaddr_in *a, *b;
- struct sockaddr_in6 *a6, *b6;
-
- if (addr1->sa_family != addr2->sa_family)
- return false;
-
- switch (addr1->sa_family) {
- case AF_INET:
- a = (struct sockaddr_in *)addr1;
- b = (struct sockaddr_in *)addr2;
-
- if (a->sin_addr.s_addr == b->sin_addr.s_addr &&
- a->sin_port == b->sin_port)
- return true;
- break;
-
- case AF_INET6:
- a6 = (struct sockaddr_in6 *)addr1;
- b6 = (struct sockaddr_in6 *)addr2;
-
- /* LINKLOCAL addresses must have matching scope_id */
- if (ipv6_addr_src_scope(&a6->sin6_addr) ==
- IPV6_ADDR_SCOPE_LINKLOCAL &&
- a6->sin6_scope_id != b6->sin6_scope_id)
- return false;
-
- if (ipv6_addr_equal(&a6->sin6_addr, &b6->sin6_addr) &&
- a6->sin6_port == b6->sin6_port)
- return true;
- break;
-
- default:
- dprintk("%s: unhandled address family: %u\n",
- __func__, addr1->sa_family);
- return false;
- }
-
- return false;
-}
-
-static bool
-_same_data_server_addrs_locked(const struct list_head *dsaddrs1,
- const struct list_head *dsaddrs2)
-{
- struct nfs4_pnfs_ds_addr *da1, *da2;
-
- /* step through both lists, comparing as we go */
- for (da1 = list_first_entry(dsaddrs1, typeof(*da1), da_node),
- da2 = list_first_entry(dsaddrs2, typeof(*da2), da_node);
- da1 != NULL && da2 != NULL;
- da1 = list_entry(da1->da_node.next, typeof(*da1), da_node),
- da2 = list_entry(da2->da_node.next, typeof(*da2), da_node)) {
- if (!same_sockaddr((struct sockaddr *)&da1->da_addr,
- (struct sockaddr *)&da2->da_addr))
- return false;
- }
- if (da1 == NULL && da2 == NULL)
- return true;
-
- return false;
-}
-
-/*
- * Lookup DS by addresses. nfs4_ds_cache_lock is held
- */
-static struct nfs4_pnfs_ds *
-_data_server_lookup_locked(const struct list_head *dsaddrs)
-{
- struct nfs4_pnfs_ds *ds;
-
- list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
- if (_same_data_server_addrs_locked(&ds->ds_addrs, dsaddrs))
- return ds;
- return NULL;
-}
-
-/*
* Create an rpc connection to the nfs4_pnfs_ds data server
* Currently only supports IPv4 and IPv6 addresses
*/
@@ -195,30 +87,6 @@ out_put:
goto out;
}
-static void
-destroy_ds(struct nfs4_pnfs_ds *ds)
-{
- struct nfs4_pnfs_ds_addr *da;
-
- dprintk("--> %s\n", __func__);
- ifdebug(FACILITY)
- print_ds(ds);
-
- nfs_put_client(ds->ds_clp);
-
- while (!list_empty(&ds->ds_addrs)) {
- da = list_first_entry(&ds->ds_addrs,
- struct nfs4_pnfs_ds_addr,
- da_node);
- list_del_init(&da->da_node);
- kfree(da->da_remotestr);
- kfree(da);
- }
-
- kfree(ds->ds_remotestr);
- kfree(ds);
-}
-
void
nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
{
@@ -229,113 +97,14 @@ nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
for (i = 0; i < dsaddr->ds_num; i++) {
ds = dsaddr->ds_list[i];
- if (ds != NULL) {
- if (atomic_dec_and_lock(&ds->ds_count,
- &nfs4_ds_cache_lock)) {
- list_del_init(&ds->ds_node);
- spin_unlock(&nfs4_ds_cache_lock);
- destroy_ds(ds);
- }
- }
+ if (ds != NULL)
+ nfs4_pnfs_ds_put(ds);
}
kfree(dsaddr->stripe_indices);
kfree(dsaddr);
}
/*
- * Create a string with a human readable address and port to avoid
- * complicated setup around many dprinks.
- */
-static char *
-nfs4_pnfs_remotestr(struct list_head *dsaddrs, gfp_t gfp_flags)
-{
- struct nfs4_pnfs_ds_addr *da;
- char *remotestr;
- size_t len;
- char *p;
-
- len = 3; /* '{', '}' and eol */
- list_for_each_entry(da, dsaddrs, da_node) {
- len += strlen(da->da_remotestr) + 1; /* string plus comma */
- }
-
- remotestr = kzalloc(len, gfp_flags);
- if (!remotestr)
- return NULL;
-
- p = remotestr;
- *(p++) = '{';
- len--;
- list_for_each_entry(da, dsaddrs, da_node) {
- size_t ll = strlen(da->da_remotestr);
-
- if (ll > len)
- goto out_err;
-
- memcpy(p, da->da_remotestr, ll);
- p += ll;
- len -= ll;
-
- if (len < 1)
- goto out_err;
- (*p++) = ',';
- len--;
- }
- if (len < 2)
- goto out_err;
- *(p++) = '}';
- *p = '\0';
- return remotestr;
-out_err:
- kfree(remotestr);
- return NULL;
-}
-
-static struct nfs4_pnfs_ds *
-nfs4_pnfs_ds_add(struct list_head *dsaddrs, gfp_t gfp_flags)
-{
- struct nfs4_pnfs_ds *tmp_ds, *ds = NULL;
- char *remotestr;
-
- if (list_empty(dsaddrs)) {
- dprintk("%s: no addresses defined\n", __func__);
- goto out;
- }
-
- ds = kzalloc(sizeof(*ds), gfp_flags);
- if (!ds)
- goto out;
-
- /* this is only used for debugging, so it's ok if its NULL */
- remotestr = nfs4_pnfs_remotestr(dsaddrs, gfp_flags);
-
- spin_lock(&nfs4_ds_cache_lock);
- tmp_ds = _data_server_lookup_locked(dsaddrs);
- if (tmp_ds == NULL) {
- INIT_LIST_HEAD(&ds->ds_addrs);
- list_splice_init(dsaddrs, &ds->ds_addrs);
- ds->ds_remotestr = remotestr;
- atomic_set(&ds->ds_count, 1);
- INIT_LIST_HEAD(&ds->ds_node);
- ds->ds_clp = NULL;
- list_add(&ds->ds_node, &nfs4_data_server_cache);
- dprintk("%s add new data server %s\n", __func__,
- ds->ds_remotestr);
- } else {
- kfree(remotestr);
- kfree(ds);
- atomic_inc(&tmp_ds->ds_count);
- dprintk("%s data server %s found, inc'ed ds_count to %d\n",
- __func__, tmp_ds->ds_remotestr,
- atomic_read(&tmp_ds->ds_count));
- ds = tmp_ds;
- }
- spin_unlock(&nfs4_ds_cache_lock);
-out:
- return ds;
-}
-
-/*
* Currently only supports ipv4, ipv6 and one multi-path address.
*/
static struct nfs4_pnfs_ds_addr *
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f666bc6..d0b8e0c 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -40,6 +40,24 @@ enum {
NFS_LSEG_LAYOUTCOMMIT, /* layoutcommit bit set for layoutcommit */
};
+/* Individual ip address */
+struct nfs4_pnfs_ds_addr {
+ struct sockaddr_storage da_addr;
+ size_t da_addrlen;
+ struct list_head da_node; /* nfs4_pnfs_dev_hlist dev_dslist */
+ char *da_remotestr; /* human readable addr+port */
+};
+
+struct nfs4_pnfs_ds {
+ struct list_head ds_node; /* nfs4_pnfs_dev_hlist dev_dslist */
+ char *ds_remotestr; /* comma sep list of addrs */
+ struct list_head ds_addrs;
+ struct nfs_client *ds_clp;
+ atomic_t ds_count;
+ unsigned long ds_state;
+#define NFS4DS_CONNECTING 0 /* ds is establishing connection */
+};
+
struct pnfs_layout_segment {
struct list_head pls_list;
struct list_head pls_lc_list;
@@ -274,6 +292,9 @@ bool nfs4_put_deviceid_node(struct nfs4_deviceid_node *);
void nfs4_mark_deviceid_unavailable(struct nfs4_deviceid_node *node);
bool nfs4_test_deviceid_unavailable(struct nfs4_deviceid_node *node);
void nfs4_deviceid_purge_client(const struct nfs_client *);
+void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
+struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
+ gfp_t gfp_flags);
/* pnfs_nfsio.c */
void pnfs_generic_clear_request_commit(struct nfs_page *req,
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index aa2ec00..26d7e8d 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -358,3 +358,243 @@ nfs4_deviceid_mark_client_invalid(struct nfs_client *clp)
}
rcu_read_unlock();
}
+
+/*
+ * Data server cache
+ *
+ * Data servers can be mapped to different device ids.
+ * nfs4_pnfs_ds reference counting
+ * - set to 1 on allocation
+ * - incremented when a device id maps a data server already in the cache.
+ * - decremented when deviceid is removed from the cache.
+ */
+static DEFINE_SPINLOCK(nfs4_ds_cache_lock);
+static LIST_HEAD(nfs4_data_server_cache);
+
+/* Debug routines */
+static void
+print_ds(struct nfs4_pnfs_ds *ds)
+{
+ if (ds == NULL) {
+ printk(KERN_WARNING "%s NULL device\n", __func__);
+ return;
+ }
+ printk(KERN_WARNING " ds %s\n"
+ " ref count %d\n"
+ " client %p\n"
+ " cl_exchange_flags %x\n",
+ ds->ds_remotestr,
+ atomic_read(&ds->ds_count), ds->ds_clp,
+ ds->ds_clp ? ds->ds_clp->cl_exchange_flags : 0);
+}
+
+static bool
+same_sockaddr(struct sockaddr *addr1, struct sockaddr *addr2)
+{
+ struct sockaddr_in *a, *b;
+ struct sockaddr_in6 *a6, *b6;
+
+ if (addr1->sa_family != addr2->sa_family)
+ return false;
+
+ switch (addr1->sa_family) {
+ case AF_INET:
+ a = (struct sockaddr_in *)addr1;
+ b = (struct sockaddr_in *)addr2;
+
+ if (a->sin_addr.s_addr == b->sin_addr.s_addr &&
+ a->sin_port == b->sin_port)
+ return true;
+ break;
+
+ case AF_INET6:
+ a6 = (struct sockaddr_in6 *)addr1;
+ b6 = (struct sockaddr_in6 *)addr2;
+
+ /* LINKLOCAL addresses must have matching scope_id */
+ if (ipv6_addr_src_scope(&a6->sin6_addr) ==
+ IPV6_ADDR_SCOPE_LINKLOCAL &&
+ a6->sin6_scope_id != b6->sin6_scope_id)
+ return false;
+
+ if (ipv6_addr_equal(&a6->sin6_addr, &b6->sin6_addr) &&
+ a6->sin6_port == b6->sin6_port)
+ return true;
+ break;
+
+ default:
+ dprintk("%s: unhandled address family: %u\n",
+ __func__, addr1->sa_family);
+ return false;
+ }
+
+ return false;
+}
+
+static bool
+_same_data_server_addrs_locked(const struct list_head *dsaddrs1,
+ const struct list_head *dsaddrs2)
+{
+ struct nfs4_pnfs_ds_addr *da1, *da2;
+
+ /* step through both lists, comparing as we go */
+ for (da1 = list_first_entry(dsaddrs1, typeof(*da1), da_node),
+ da2 = list_first_entry(dsaddrs2, typeof(*da2), da_node);
+ da1 != NULL && da2 != NULL;
+ da1 = list_entry(da1->da_node.next, typeof(*da1), da_node),
+ da2 = list_entry(da2->da_node.next, typeof(*da2), da_node)) {
+ if (!same_sockaddr((struct sockaddr *)&da1->da_addr,
+ (struct sockaddr *)&da2->da_addr))
+ return false;
+ }
+ if (da1 == NULL && da2 == NULL)
+ return true;
+
+ return false;
+}
+
+/*
+ * Lookup DS by addresses. nfs4_ds_cache_lock is held
+ */
+static struct nfs4_pnfs_ds *
+_data_server_lookup_locked(const struct list_head *dsaddrs)
+{
+ struct nfs4_pnfs_ds *ds;
+
+ list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
+ if (_same_data_server_addrs_locked(&ds->ds_addrs, dsaddrs))
+ return ds;
+ return NULL;
+}
+
+static void destroy_ds(struct nfs4_pnfs_ds *ds)
+{
+ struct nfs4_pnfs_ds_addr *da;
+
+ dprintk("--> %s\n", __func__);
+ ifdebug(FACILITY)
+ print_ds(ds);
+
+ nfs_put_client(ds->ds_clp);
+
+ while (!list_empty(&ds->ds_addrs)) {
+ da = list_first_entry(&ds->ds_addrs,
+ struct nfs4_pnfs_ds_addr,
+ da_node);
+ list_del_init(&da->da_node);
+ kfree(da->da_remotestr);
+ kfree(da);
+ }
+
+ kfree(ds->ds_remotestr);
+ kfree(ds);
+}
+
+void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds)
+{
+ if (atomic_dec_and_lock(&ds->ds_count,
+ &nfs4_ds_cache_lock)) {
+ list_del_init(&ds->ds_node);
+ spin_unlock(&nfs4_ds_cache_lock);
+ destroy_ds(ds);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_put);
+
+/*
+ * Create a string with a human readable address and port to avoid
+ * complicated setup around many dprintks.
+ */
+static char *
+nfs4_pnfs_remotestr(struct list_head *dsaddrs, gfp_t gfp_flags)
+{
+ struct nfs4_pnfs_ds_addr *da;
+ char *remotestr;
+ size_t len;
+ char *p;
+
+ len = 3; /* '{', '}' and eol */
+ list_for_each_entry(da, dsaddrs, da_node) {
+ len += strlen(da->da_remotestr) + 1; /* string plus comma */
+ }
+
+ remotestr = kzalloc(len, gfp_flags);
+ if (!remotestr)
+ return NULL;
+
+ p = remotestr;
+ *(p++) = '{';
+ len--;
+ list_for_each_entry(da, dsaddrs, da_node) {
+ size_t ll = strlen(da->da_remotestr);
+
+ if (ll > len)
+ goto out_err;
+
+ memcpy(p, da->da_remotestr, ll);
+ p += ll;
+ len -= ll;
+
+ if (len < 1)
+ goto out_err;
+ (*p++) = ',';
+ len--;
+ }
+ if (len < 2)
+ goto out_err;
+ *(p++) = '}';
+ *p = '\0';
+ return remotestr;
+out_err:
+ kfree(remotestr);
+ return NULL;
+}
+
+/*
+ * Given a list of multipath struct nfs4_pnfs_ds_addr, add it to ds cache if
+ * uncached and return cached struct nfs4_pnfs_ds.
+ */
+struct nfs4_pnfs_ds *
+nfs4_pnfs_ds_add(struct list_head *dsaddrs, gfp_t gfp_flags)
+{
+ struct nfs4_pnfs_ds *tmp_ds, *ds = NULL;
+ char *remotestr;
+
+ if (list_empty(dsaddrs)) {
+ dprintk("%s: no addresses defined\n", __func__);
+ goto out;
+ }
+
+ ds = kzalloc(sizeof(*ds), gfp_flags);
+ if (!ds)
+ goto out;
+
+ /* this is only used for debugging, so it's ok if it's NULL */
+ remotestr = nfs4_pnfs_remotestr(dsaddrs, gfp_flags);
+
+ spin_lock(&nfs4_ds_cache_lock);
+ tmp_ds = _data_server_lookup_locked(dsaddrs);
+ if (tmp_ds == NULL) {
+ INIT_LIST_HEAD(&ds->ds_addrs);
+ list_splice_init(dsaddrs, &ds->ds_addrs);
+ ds->ds_remotestr = remotestr;
+ atomic_set(&ds->ds_count, 1);
+ INIT_LIST_HEAD(&ds->ds_node);
+ ds->ds_clp = NULL;
+ list_add(&ds->ds_node, &nfs4_data_server_cache);
+ dprintk("%s add new data server %s\n", __func__,
+ ds->ds_remotestr);
+ } else {
+ kfree(remotestr);
+ kfree(ds);
+ atomic_inc(&tmp_ds->ds_count);
+ dprintk("%s data server %s found, inc'ed ds_count to %d\n",
+ __func__, tmp_ds->ds_remotestr,
+ atomic_read(&tmp_ds->ds_count));
+ ds = tmp_ds;
+ }
+ spin_unlock(&nfs4_ds_cache_lock);
+out:
+ return ds;
+}
+EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_add);
--
1.9.3
From: Peng Tao <[email protected]>
Move nfs4_ds_connect into generic pNFS as nfs4_pnfs_ds_connect so that
the flexfiles layout client can reuse it.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
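The connection helper relies on the NFS4DS_CONNECTING bit so that only one caller attempts the connection while the others wait. A minimal user-space sketch of that claim-then-clear pattern follows, using C11 atomics in place of test_and_set_bit()/clear_bit(). The names `struct ds_state`, `ds_begin_connect()` and `ds_finish_connect()` are hypothetical, and the sketch does not block: where it returns false, the kernel code would wait on the bit instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DS_CONNECTING 0x1u	/* stands in for the NFS4DS_CONNECTING bit */

/* Hypothetical user-space stand-in for the relevant nfs4_pnfs_ds state. */
struct ds_state {
	atomic_uint flags;
	int connected;		/* stands in for ds->ds_clp != NULL */
	int unavailable;	/* stands in for the deviceid unavailable mark */
};

void ds_state_init(struct ds_state *ds)
{
	atomic_init(&ds->flags, 0u);
	ds->connected = 0;
	ds->unavailable = 0;
}

/*
 * Try to claim the right to connect. Returns true for exactly one caller
 * until the bit is cleared again; losers would wait in the kernel code.
 */
bool ds_begin_connect(struct ds_state *ds)
{
	unsigned int old = atomic_fetch_or(&ds->flags, DS_CONNECTING);

	return (old & DS_CONNECTING) == 0;
}

/* Record the outcome of the connection attempt, then release the bit. */
void ds_finish_connect(struct ds_state *ds, int err)
{
	assert(atomic_load(&ds->flags) & DS_CONNECTING);
	if (err)
		ds->unavailable = 1;	/* a failed connect marks the device */
	else
		ds->connected = 1;
	atomic_fetch_and(&ds->flags, ~DS_CONNECTING);
}
```

As in the patch, a caller is expected to check whether the data server is already connected before trying to claim the bit, and to test the unavailable mark afterwards.
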
fs/nfs/filelayout/filelayoutdev.c | 78 +++------------------------------------
fs/nfs/pnfs.h | 3 ++
2 files changed, 8 insertions(+), 73 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index fbfbb70..eb2e93b 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -42,51 +42,6 @@
static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
-/*
- * Create an rpc connection to the nfs4_pnfs_ds data server
- * Currently only supports IPv4 and IPv6 addresses
- */
-static int
-nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
-{
- struct nfs_client *clp = ERR_PTR(-EIO);
- struct nfs4_pnfs_ds_addr *da;
- int status = 0;
-
- dprintk("--> %s DS %s au_flavor %d\n", __func__, ds->ds_remotestr,
- mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
-
- list_for_each_entry(da, &ds->ds_addrs, da_node) {
- dprintk("%s: DS %s: trying address %s\n",
- __func__, ds->ds_remotestr, da->da_remotestr);
-
- clp = nfs4_set_ds_client(mds_srv->nfs_client,
- (struct sockaddr *)&da->da_addr,
- da->da_addrlen, IPPROTO_TCP,
- dataserver_timeo, dataserver_retrans);
- if (!IS_ERR(clp))
- break;
- }
-
- if (IS_ERR(clp)) {
- status = PTR_ERR(clp);
- goto out;
- }
-
- status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
- if (status)
- goto out_put;
-
- smp_wmb();
- ds->ds_clp = clp;
- dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
-out:
- return status;
-out_put:
- nfs_put_client(clp);
- goto out;
-}
-
void
nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
{
@@ -450,22 +405,7 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
return flseg->fh_array[i];
}
-static void nfs4_wait_ds_connect(struct nfs4_pnfs_ds *ds)
-{
- might_sleep();
- wait_on_bit_action(&ds->ds_state, NFS4DS_CONNECTING,
- nfs_wait_bit_killable, TASK_KILLABLE);
-}
-
-static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
-{
- smp_mb__before_atomic();
- clear_bit(NFS4DS_CONNECTING, &ds->ds_state);
- smp_mb__after_atomic();
- wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
-}
-
-
+/* Upon return, either ds is connected, or ds is NULL */
struct nfs4_pnfs_ds *
nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
{
@@ -473,6 +413,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
struct nfs4_pnfs_ds *ret = ds;
+ struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
if (ds == NULL) {
printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
@@ -484,18 +425,9 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
if (ds->ds_clp)
goto out_test_devid;
- if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
- struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
- int err;
-
- err = nfs4_ds_connect(s, ds);
- if (err)
- nfs4_mark_deviceid_unavailable(devid);
- nfs4_clear_ds_conn_bit(ds);
- } else {
- /* Either ds is connected, or ds is NULL */
- nfs4_wait_ds_connect(ds);
- }
+ nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
+ dataserver_retrans);
+
out_test_devid:
if (filelayout_test_devid_unavailable(devid))
ret = NULL;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d0b8e0c..a213c2d 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -295,6 +295,9 @@ void nfs4_deviceid_purge_client(const struct nfs_client *);
void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
+void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
+ struct nfs4_deviceid_node *devid, unsigned int timeo,
+ unsigned int retrans);
/* pnfs_nfsio.c */
void pnfs_generic_clear_request_commit(struct nfs_page *req,
--
1.9.3
From: Peng Tao <[email protected]>
Move decode_ds_addr into generic pNFS as nfs4_decode_mp_ds_addr so that
the flexfile layout can reuse it.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
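The decoder being moved parses an RFC 5665 universal address, in which the last two dot-separated decimal fields are the high and low octets of the port. The sketch below shows that split in user space, under stated assumptions: `parse_uaddr()` is a hypothetical helper, it leaves address validation to the caller (the kernel code uses rpc_pton()), and unlike the moved code it checks the result of sscanf() and rejects octets above 255.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a universal address such as "192.0.2.10.8.1" into a host string
 * and a port number. Returns 0 on success and -1 on malformed input.
 */
int parse_uaddr(const char *uaddr, char *host, size_t hostlen,
		unsigned int *port)
{
	char buf[128];
	char *hi_dot, *lo_dot;
	unsigned int hi, lo;
	size_t len = strlen(uaddr);

	assert(host != NULL && port != NULL);
	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, uaddr, len + 1);

	/* The last dot separates the two port octets. */
	lo_dot = strrchr(buf, '.');
	if (lo_dot == NULL)
		return -1;
	*lo_dot = '\0';

	/* The dot before that separates the address from the port. */
	hi_dot = strrchr(buf, '.');
	if (hi_dot == NULL)
		return -1;
	*hi_dot = '\0';

	if (sscanf(hi_dot + 1, "%u", &hi) != 1 ||
	    sscanf(lo_dot + 1, "%u", &lo) != 1 ||
	    hi > 255 || lo > 255)
		return -1;

	if (strlen(buf) >= hostlen)
		return -1;
	strcpy(host, buf);
	*port = (hi << 8) | lo;	/* e.g. 8.1 becomes 2049 */
	return 0;
}
```

Because the split searches from the right, the same routine handles IPv6 text such as "2001:db8::1.8.1", where the address part itself contains no dots.
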
fs/nfs/filelayout/filelayoutdev.c | 152 +------------------------
fs/nfs/pnfs.h | 3 +
fs/nfs/pnfs_dev.c | 229 ++++++++++++++++++++++++++++++++++++++
3 files changed, 234 insertions(+), 150 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index eb2e93b..27bdd8c 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -31,7 +31,6 @@
#include <linux/nfs_fs.h>
#include <linux/vmalloc.h>
#include <linux/module.h>
-#include <linux/sunrpc/addr.h>
#include "../internal.h"
#include "../nfs4session.h"
@@ -59,153 +58,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
kfree(dsaddr);
}
-/*
- * Currently only supports ipv4, ipv6 and one multi-path address.
- */
-static struct nfs4_pnfs_ds_addr *
-decode_ds_addr(struct net *net, struct xdr_stream *streamp, gfp_t gfp_flags)
-{
- struct nfs4_pnfs_ds_addr *da = NULL;
- char *buf, *portstr;
- __be16 port;
- int nlen, rlen;
- int tmp[2];
- __be32 *p;
- char *netid, *match_netid;
- size_t len, match_netid_len;
- char *startsep = "";
- char *endsep = "";
-
-
- /* r_netid */
- p = xdr_inline_decode(streamp, 4);
- if (unlikely(!p))
- goto out_err;
- nlen = be32_to_cpup(p++);
-
- p = xdr_inline_decode(streamp, nlen);
- if (unlikely(!p))
- goto out_err;
-
- netid = kmalloc(nlen+1, gfp_flags);
- if (unlikely(!netid))
- goto out_err;
-
- netid[nlen] = '\0';
- memcpy(netid, p, nlen);
-
- /* r_addr: ip/ip6addr with port in dec octets - see RFC 5665 */
- p = xdr_inline_decode(streamp, 4);
- if (unlikely(!p))
- goto out_free_netid;
- rlen = be32_to_cpup(p);
-
- p = xdr_inline_decode(streamp, rlen);
- if (unlikely(!p))
- goto out_free_netid;
-
- /* port is ".ABC.DEF", 8 chars max */
- if (rlen > INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN + 8) {
- dprintk("%s: Invalid address, length %d\n", __func__,
- rlen);
- goto out_free_netid;
- }
- buf = kmalloc(rlen + 1, gfp_flags);
- if (!buf) {
- dprintk("%s: Not enough memory\n", __func__);
- goto out_free_netid;
- }
- buf[rlen] = '\0';
- memcpy(buf, p, rlen);
-
- /* replace port '.' with '-' */
- portstr = strrchr(buf, '.');
- if (!portstr) {
- dprintk("%s: Failed finding expected dot in port\n",
- __func__);
- goto out_free_buf;
- }
- *portstr = '-';
-
- /* find '.' between address and port */
- portstr = strrchr(buf, '.');
- if (!portstr) {
- dprintk("%s: Failed finding expected dot between address and "
- "port\n", __func__);
- goto out_free_buf;
- }
- *portstr = '\0';
-
- da = kzalloc(sizeof(*da), gfp_flags);
- if (unlikely(!da))
- goto out_free_buf;
-
- INIT_LIST_HEAD(&da->da_node);
-
- if (!rpc_pton(net, buf, portstr-buf, (struct sockaddr *)&da->da_addr,
- sizeof(da->da_addr))) {
- dprintk("%s: error parsing address %s\n", __func__, buf);
- goto out_free_da;
- }
-
- portstr++;
- sscanf(portstr, "%d-%d", &tmp[0], &tmp[1]);
- port = htons((tmp[0] << 8) | (tmp[1]));
-
- switch (da->da_addr.ss_family) {
- case AF_INET:
- ((struct sockaddr_in *)&da->da_addr)->sin_port = port;
- da->da_addrlen = sizeof(struct sockaddr_in);
- match_netid = "tcp";
- match_netid_len = 3;
- break;
-
- case AF_INET6:
- ((struct sockaddr_in6 *)&da->da_addr)->sin6_port = port;
- da->da_addrlen = sizeof(struct sockaddr_in6);
- match_netid = "tcp6";
- match_netid_len = 4;
- startsep = "[";
- endsep = "]";
- break;
-
- default:
- dprintk("%s: unsupported address family: %u\n",
- __func__, da->da_addr.ss_family);
- goto out_free_da;
- }
-
- if (nlen != match_netid_len || strncmp(netid, match_netid, nlen)) {
- dprintk("%s: ERROR: r_netid \"%s\" != \"%s\"\n",
- __func__, netid, match_netid);
- goto out_free_da;
- }
-
- /* save human readable address */
- len = strlen(startsep) + strlen(buf) + strlen(endsep) + 7;
- da->da_remotestr = kzalloc(len, gfp_flags);
-
- /* NULL is ok, only used for dprintk */
- if (da->da_remotestr)
- snprintf(da->da_remotestr, len, "%s%s%s:%u", startsep,
- buf, endsep, ntohs(port));
-
- dprintk("%s: Parsed DS addr %s\n", __func__, da->da_remotestr);
- kfree(buf);
- kfree(netid);
- return da;
-
-out_free_da:
- kfree(da);
-out_free_buf:
- dprintk("%s: Error parsing DS addr: %s\n", __func__, buf);
- kfree(buf);
-out_free_netid:
- kfree(netid);
-out_err:
- return NULL;
-}
-
/* Decode opaque device data and return the result */
struct nfs4_file_layout_dsaddr *
nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
@@ -308,8 +160,8 @@ nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
mp_count = be32_to_cpup(p); /* multipath count */
for (j = 0; j < mp_count; j++) {
- da = decode_ds_addr(server->nfs_client->cl_net,
- &stream, gfp_flags);
+ da = nfs4_decode_mp_ds_addr(server->nfs_client->cl_net,
+ &stream, gfp_flags);
if (da)
list_add_tail(&da->da_node, &dsaddrs);
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a213c2d..4c53f16 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -298,6 +298,9 @@ struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
unsigned int retrans);
+struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
+ struct xdr_stream *xdr,
+ gfp_t gfp_flags);
/* pnfs_nfsio.c */
void pnfs_generic_clear_request_commit(struct nfs_page *req,
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 26d7e8d..a4e33aa 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -29,6 +29,7 @@
*/
#include <linux/export.h>
+#include <linux/sunrpc/addr.h>
#include <linux/nfs_fs.h>
#include "nfs4session.h"
#include "internal.h"
@@ -598,3 +599,231 @@ out:
return ds;
}
EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_add);
+
+static void nfs4_wait_ds_connect(struct nfs4_pnfs_ds *ds)
+{
+ might_sleep();
+ wait_on_bit(&ds->ds_state, NFS4DS_CONNECTING,
+ TASK_KILLABLE);
+}
+
+static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
+{
+ smp_mb__before_atomic();
+ clear_bit(NFS4DS_CONNECTING, &ds->ds_state);
+ smp_mb__after_atomic();
+ wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
+}
+
+static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
+ struct nfs4_pnfs_ds *ds,
+ unsigned int timeo,
+ unsigned int retrans)
+{
+ struct nfs_client *clp = ERR_PTR(-EIO);
+ struct nfs4_pnfs_ds_addr *da;
+ int status = 0;
+
+ dprintk("--> %s DS %s au_flavor %d\n", __func__, ds->ds_remotestr,
+ mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
+
+ list_for_each_entry(da, &ds->ds_addrs, da_node) {
+ dprintk("%s: DS %s: trying address %s\n",
+ __func__, ds->ds_remotestr, da->da_remotestr);
+
+ clp = nfs4_set_ds_client(mds_srv->nfs_client,
+ (struct sockaddr *)&da->da_addr,
+ da->da_addrlen, IPPROTO_TCP,
+ timeo, retrans);
+ if (!IS_ERR(clp))
+ break;
+ }
+
+ if (IS_ERR(clp)) {
+ status = PTR_ERR(clp);
+ goto out;
+ }
+
+ status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
+ if (status)
+ goto out_put;
+
+ smp_wmb();
+ ds->ds_clp = clp;
+ dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
+out:
+ return status;
+out_put:
+ nfs_put_client(clp);
+ goto out;
+}
+
+/*
+ * Create an rpc connection to the nfs4_pnfs_ds data server.
+ * Currently only supports IPv4 and IPv6 addresses.
+ * If connection fails, make devid unavailable.
+ */
+void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
+ struct nfs4_deviceid_node *devid, unsigned int timeo,
+ unsigned int retrans)
+{
+ if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
+ int err = 0;
+
+ err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans);
+ if (err)
+ nfs4_mark_deviceid_unavailable(devid);
+ nfs4_clear_ds_conn_bit(ds);
+ } else {
+ nfs4_wait_ds_connect(ds);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_connect);
+
+/*
+ * Currently only supports ipv4, ipv6 and one multi-path address.
+ */
+struct nfs4_pnfs_ds_addr *
+nfs4_decode_mp_ds_addr(struct net *net, struct xdr_stream *xdr, gfp_t gfp_flags)
+{
+ struct nfs4_pnfs_ds_addr *da = NULL;
+ char *buf, *portstr;
+ __be16 port;
+ int nlen, rlen;
+ int tmp[2];
+ __be32 *p;
+ char *netid, *match_netid;
+ size_t len, match_netid_len;
+ char *startsep = "";
+ char *endsep = "";
+
+
+ /* r_netid */
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_err;
+ nlen = be32_to_cpup(p++);
+
+ p = xdr_inline_decode(xdr, nlen);
+ if (unlikely(!p))
+ goto out_err;
+
+ netid = kmalloc(nlen+1, gfp_flags);
+ if (unlikely(!netid))
+ goto out_err;
+
+ netid[nlen] = '\0';
+ memcpy(netid, p, nlen);
+
+ /* r_addr: ip/ip6addr with port in dec octets - see RFC 5665 */
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_free_netid;
+ rlen = be32_to_cpup(p);
+
+ p = xdr_inline_decode(xdr, rlen);
+ if (unlikely(!p))
+ goto out_free_netid;
+
+ /* port is ".ABC.DEF", 8 chars max */
+ if (rlen > INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN + 8) {
+ dprintk("%s: Invalid address, length %d\n", __func__,
+ rlen);
+ goto out_free_netid;
+ }
+ buf = kmalloc(rlen + 1, gfp_flags);
+ if (!buf) {
+ dprintk("%s: Not enough memory\n", __func__);
+ goto out_free_netid;
+ }
+ buf[rlen] = '\0';
+ memcpy(buf, p, rlen);
+
+ /* replace port '.' with '-' */
+ portstr = strrchr(buf, '.');
+ if (!portstr) {
+ dprintk("%s: Failed finding expected dot in port\n",
+ __func__);
+ goto out_free_buf;
+ }
+ *portstr = '-';
+
+ /* find '.' between address and port */
+ portstr = strrchr(buf, '.');
+ if (!portstr) {
+ dprintk("%s: Failed finding expected dot between address and "
+ "port\n", __func__);
+ goto out_free_buf;
+ }
+ *portstr = '\0';
+
+ da = kzalloc(sizeof(*da), gfp_flags);
+ if (unlikely(!da))
+ goto out_free_buf;
+
+ INIT_LIST_HEAD(&da->da_node);
+
+ if (!rpc_pton(net, buf, portstr-buf, (struct sockaddr *)&da->da_addr,
+ sizeof(da->da_addr))) {
+ dprintk("%s: error parsing address %s\n", __func__, buf);
+ goto out_free_da;
+ }
+
+ portstr++;
+ sscanf(portstr, "%d-%d", &tmp[0], &tmp[1]);
+ port = htons((tmp[0] << 8) | (tmp[1]));
+
+ switch (da->da_addr.ss_family) {
+ case AF_INET:
+ ((struct sockaddr_in *)&da->da_addr)->sin_port = port;
+ da->da_addrlen = sizeof(struct sockaddr_in);
+ match_netid = "tcp";
+ match_netid_len = 3;
+ break;
+
+ case AF_INET6:
+ ((struct sockaddr_in6 *)&da->da_addr)->sin6_port = port;
+ da->da_addrlen = sizeof(struct sockaddr_in6);
+ match_netid = "tcp6";
+ match_netid_len = 4;
+ startsep = "[";
+ endsep = "]";
+ break;
+
+ default:
+ dprintk("%s: unsupported address family: %u\n",
+ __func__, da->da_addr.ss_family);
+ goto out_free_da;
+ }
+
+ if (nlen != match_netid_len || strncmp(netid, match_netid, nlen)) {
+ dprintk("%s: ERROR: r_netid \"%s\" != \"%s\"\n",
+ __func__, netid, match_netid);
+ goto out_free_da;
+ }
+
+ /* save human readable address */
+ len = strlen(startsep) + strlen(buf) + strlen(endsep) + 7;
+ da->da_remotestr = kzalloc(len, gfp_flags);
+
+ /* NULL is ok, only used for dprintk */
+ if (da->da_remotestr)
+ snprintf(da->da_remotestr, len, "%s%s%s:%u", startsep,
+ buf, endsep, ntohs(port));
+
+ dprintk("%s: Parsed DS addr %s\n", __func__, da->da_remotestr);
+ kfree(buf);
+ kfree(netid);
+ return da;
+
+out_free_da:
+ kfree(da);
+out_free_buf:
+ dprintk("%s: Error parsing DS addr: %s\n", __func__, buf);
+ kfree(buf);
+out_free_netid:
+ kfree(netid);
+out_err:
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(nfs4_decode_mp_ds_addr);
--
1.9.3
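Note: the universal-address handling in nfs4_decode_mp_ds_addr() above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: split_uaddr() is a hypothetical helper, not a kernel function, and it copies into a fixed buffer instead of decoding from an XDR stream. It follows the same two steps as the code above: turn the last '.' into '-' so that the two port octets stay together, then cut the string at the last remaining '.'.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper, not part of the patch.
 * Splits "h1.h2.h3.h4.p1.p2" into a host string and a port number
 * (host byte order). Returns 0 on success, -1 on malformed input.
 */
int split_uaddr(const char *uaddr, char *host, size_t hostlen,
		unsigned int *port)
{
	char buf[128];
	char *dot;
	unsigned int hi, lo;

	if (strlen(uaddr) >= sizeof(buf))
		return -1;
	strcpy(buf, uaddr);

	/* Replace the '.' between the two port octets with '-'. */
	dot = strrchr(buf, '.');
	if (!dot)
		return -1;
	*dot = '-';

	/* The last remaining '.' separates the address from the port. */
	dot = strrchr(buf, '.');
	if (!dot)
		return -1;
	*dot = '\0';

	/* Unlike the kernel code above, check the sscanf() result. */
	if (sscanf(dot + 1, "%u-%u", &hi, &lo) != 2 || hi > 255 || lo > 255)
		return -1;
	if (strlen(buf) >= hostlen)
		return -1;

	strcpy(host, buf);
	*port = (hi << 8) | lo;
	return 0;
}
```

With the input "10.0.0.5.8.1" the helper yields the host "10.0.0.5" and the port (8 << 8) | 1 = 2049. The range check on the two octets is an addition of this sketch; the function above does not perform it.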
From: Peng Tao <[email protected]>
The flexfile layout may use a different auth flavor, as specified by the MDS.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 3 ++-
fs/nfs/internal.h | 3 ++-
fs/nfs/nfs4client.c | 5 +++--
fs/nfs/pnfs.h | 2 +-
fs/nfs/pnfs_dev.c | 10 ++++++----
5 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index 27bdd8c..5e4b0ce 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -278,7 +278,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
goto out_test_devid;
nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
- dataserver_retrans);
+ dataserver_retrans,
+ s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
out_test_devid:
if (filelayout_test_devid_unavailable(devid))
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index efaa31c..7d7c36f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -189,7 +189,8 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr,
int ds_addrlen, int ds_proto,
unsigned int ds_timeo,
- unsigned int ds_retrans);
+ unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor);
extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
struct inode *);
#ifdef CONFIG_PROC_FS
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 0331125..61d552d 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -837,7 +837,8 @@ error:
*/
struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr, int ds_addrlen,
- int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans)
+ int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor)
{
struct nfs_client_initdata cl_init = {
.addr = ds_addr,
@@ -862,7 +863,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
*/
nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
- mds_clp->cl_rpcclient->cl_auth->au_flavor);
+ au_flavor);
dprintk("<-- %s %p\n", __func__, clp);
return clp;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 4c53f16..cb666e8 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -297,7 +297,7 @@ struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans);
+ unsigned int retrans, rpc_authflavor_t au_flavor);
struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
struct xdr_stream *xdr,
gfp_t gfp_flags);
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index a4e33aa..f819aa3 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -618,7 +618,8 @@ static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
struct nfs4_pnfs_ds *ds,
unsigned int timeo,
- unsigned int retrans)
+ unsigned int retrans,
+ rpc_authflavor_t au_flavor)
{
struct nfs_client *clp = ERR_PTR(-EIO);
struct nfs4_pnfs_ds_addr *da;
@@ -634,7 +635,7 @@ static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
clp = nfs4_set_ds_client(mds_srv->nfs_client,
(struct sockaddr *)&da->da_addr,
da->da_addrlen, IPPROTO_TCP,
- timeo, retrans);
+ timeo, retrans, au_flavor);
if (!IS_ERR(clp))
break;
}
@@ -665,12 +666,13 @@ out_put:
*/
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans)
+ unsigned int retrans, rpc_authflavor_t au_flavor)
{
if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
int err = 0;
- err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans);
+ err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo,
+ retrans, au_flavor);
if (err)
nfs4_mark_deviceid_unavailable(devid);
nfs4_clear_ds_conn_bit(ds);
--
1.9.3
From: Peng Tao <[email protected]>
They can be reused by the flexfile layout as well.
Also add an error code, NFS4ERR_RESET_TO_PNFS, so that if a read
fails on one DS and other DSes are available, the client does not
resend through the MDS but through pNFS, and can read
from the other DSes.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
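Note: a minimal sketch of how a layout driver might choose between the two internal error codes when a READ fails on a DS. The numeric values are the ones defined in this patch; read_resend_code() and its two counters are hypothetical and only illustrate the intent described in the commit message.

```c
/* Values taken from the pnfs.h hunk in this patch. */
#define NFS4ERR_RESET_TO_MDS	12001
#define NFS4ERR_RESET_TO_PNFS	12002

/*
 * Hypothetical helper, not part of the patch.
 * ds_total:  number of data servers that hold the data
 * ds_failed: number of those that have already failed
 *
 * While at least one DS is still usable, retry through pNFS so the
 * read can go to another DS. Otherwise fall back to the MDS.
 */
int read_resend_code(unsigned int ds_total, unsigned int ds_failed)
{
	if (ds_failed < ds_total)
		return NFS4ERR_RESET_TO_PNFS;
	return NFS4ERR_RESET_TO_MDS;
}
```

For example, with two data servers and one failure the helper returns NFS4ERR_RESET_TO_PNFS. Once both have failed it returns NFS4ERR_RESET_TO_MDS.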
fs/nfs/filelayout/filelayout.h | 10 ----------
fs/nfs/pnfs.h | 11 +++++++++++
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.h b/fs/nfs/filelayout/filelayout.h
index f97eea6..2896cb8 100644
--- a/fs/nfs/filelayout/filelayout.h
+++ b/fs/nfs/filelayout/filelayout.h
@@ -33,13 +33,6 @@
#include "../pnfs.h"
/*
- * Default data server connection timeout and retrans vaules.
- * Set by module paramters dataserver_timeo and dataserver_retrans.
- */
-#define NFS4_DEF_DS_TIMEO 600 /* in tenths of a second */
-#define NFS4_DEF_DS_RETRANS 5
-
-/*
* Field testing shows we need to support up to 4096 stripe indices.
* We store each index as a u8 (u32 on the wire) to keep the memory footprint
* reasonable. This in turn means we support a maximum of 256
@@ -48,9 +41,6 @@
#define NFS4_PNFS_MAX_STRIPE_CNT 4096
#define NFS4_PNFS_MAX_MULTI_CNT 256 /* 256 fit into a u8 stripe_index */
-/* error codes for internal use */
-#define NFS4ERR_RESET_TO_MDS 12001
-
enum stripetype4 {
STRIPE_SPARSE = 1,
STRIPE_DENSE = 2
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index cb666e8..588b2f1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -77,6 +77,17 @@ enum pnfs_try_status {
#define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
+/*
+ * Default data server connection timeout and retrans values.
+ * Set by module parameters dataserver_timeo and dataserver_retrans.
+ */
+#define NFS4_DEF_DS_TIMEO 600 /* in tenths of a second */
+#define NFS4_DEF_DS_RETRANS 5
+
+/* error codes for internal use */
+#define NFS4ERR_RESET_TO_MDS 12001
+#define NFS4ERR_RESET_TO_PNFS 12002
+
enum {
NFS_LAYOUT_RO_FAILED = 0, /* get ro layout failed stop trying */
NFS_LAYOUT_RW_FAILED, /* get rw layout failed stop trying */
--
1.9.3
From: Peng Tao <[email protected]>
The flexfiles layout needs to create DS connections over NFSv3.
Add nfs3_set_ds_client to allow that to happen.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/internal.h | 4 ++++
fs/nfs/nfs3_fs.h | 3 ++-
fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
fs/nfs/nfs3super.c | 2 +-
4 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 7d7c36f..7332ba1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
rpc_authflavor_t au_flavor);
extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
struct inode *);
+extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr, int ds_addrlen,
+ int ds_proto, unsigned int ds_timeo,
+ unsigned int ds_retrans, rpc_authflavor_t au_flavor);
#ifdef CONFIG_PROC_FS
extern int __init nfs_fs_proc_init(void);
extern void nfs_fs_proc_exit(void);
diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
index 333ae40..fc9cd85 100644
--- a/fs/nfs/nfs3_fs.h
+++ b/fs/nfs/nfs3_fs.h
@@ -29,6 +29,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subversion *);
struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
struct nfs_fattr *, rpc_authflavor_t);
-
+/* nfs3super.c */
+extern struct nfs_subversion nfs_v3;
#endif /* __LINUX_FS_NFS_NFS3_FS_H */
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 8c1b437..52e2344 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
nfs_init_server_aclclient(server);
return server;
}
+
+/*
+ * Set up a pNFS Data Server client over NFSv3.
+ *
+ * Return any existing nfs_client that matches server address,port,version
+ * and minorversion.
+ *
+ * For a new nfs_client, use a soft mount (default), a low retrans and a
+ * low timeout interval so that if a connection is lost, we retry through
+ * the MDS.
+ */
+struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr, int ds_addrlen,
+ int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor)
+{
+ struct nfs_client_initdata cl_init = {
+ .addr = ds_addr,
+ .addrlen = ds_addrlen,
+ .nfs_mod = &nfs_v3,
+ .proto = ds_proto,
+ .net = mds_clp->cl_net,
+ };
+ struct rpc_timeout ds_timeout;
+ struct nfs_client *clp;
+
+ /* Use the MDS nfs_client cl_ipaddr. */
+ nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
+ clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
+ au_flavor);
+
+ return clp;
+}
+EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
index 6af29c2..5c4394e 100644
--- a/fs/nfs/nfs3super.c
+++ b/fs/nfs/nfs3super.c
@@ -7,7 +7,7 @@
#include "nfs3_fs.h"
#include "nfs.h"
-static struct nfs_subversion nfs_v3 = {
+struct nfs_subversion nfs_v3 = {
.owner = THIS_MODULE,
.nfs_fs = &nfs_fs_type,
.rpc_vers = &nfs_version3,
--
1.9.3
From: Peng Tao <[email protected]>
The flexfile layout may need to set the NFS version and minor version
when making DS connections.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 3 ++-
fs/nfs/internal.h | 1 +
fs/nfs/nfs4client.c | 4 ++--
fs/nfs/pnfs.h | 3 ++-
fs/nfs/pnfs_dev.c | 11 +++++++----
5 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index 5e4b0ce..4f372e2 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -278,7 +278,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
goto out_test_devid;
nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
- dataserver_retrans,
+ dataserver_retrans, 4,
+ s->nfs_client->cl_minorversion,
s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
out_test_devid:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 7332ba1..5543850 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -190,6 +190,7 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
int ds_addrlen, int ds_proto,
unsigned int ds_timeo,
unsigned int ds_retrans,
+ u32 minor_version,
rpc_authflavor_t au_flavor);
extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
struct inode *);
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 61d552d..d303627 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -838,14 +838,14 @@ error:
struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr, int ds_addrlen,
int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
- rpc_authflavor_t au_flavor)
+ u32 minor_version, rpc_authflavor_t au_flavor)
{
struct nfs_client_initdata cl_init = {
.addr = ds_addr,
.addrlen = ds_addrlen,
.nfs_mod = &nfs_v4,
.proto = ds_proto,
- .minorversion = mds_clp->cl_minorversion,
+ .minorversion = minor_version,
.net = mds_clp->cl_net,
};
struct rpc_timeout ds_timeout;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 588b2f1..a5b168c 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -308,7 +308,8 @@ struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans, rpc_authflavor_t au_flavor);
+ unsigned int retrans, u32 version, u32 minor_version,
+ rpc_authflavor_t au_flavor);
struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
struct xdr_stream *xdr,
gfp_t gfp_flags);
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index f819aa3..56f5c16 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -619,6 +619,7 @@ static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
struct nfs4_pnfs_ds *ds,
unsigned int timeo,
unsigned int retrans,
+ u32 minor_version,
rpc_authflavor_t au_flavor)
{
struct nfs_client *clp = ERR_PTR(-EIO);
@@ -635,7 +636,8 @@ static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
clp = nfs4_set_ds_client(mds_srv->nfs_client,
(struct sockaddr *)&da->da_addr,
da->da_addrlen, IPPROTO_TCP,
- timeo, retrans, au_flavor);
+ timeo, retrans, minor_version,
+ au_flavor);
if (!IS_ERR(clp))
break;
}
@@ -666,13 +668,14 @@ out_put:
*/
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans, rpc_authflavor_t au_flavor)
+ unsigned int retrans, u32 version,
+ u32 minor_version, rpc_authflavor_t au_flavor)
{
if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
int err = 0;
- err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo,
- retrans, au_flavor);
+ err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans,
+ minor_version, au_flavor);
if (err)
nfs4_mark_deviceid_unavailable(devid);
nfs4_clear_ds_conn_bit(ds);
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
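Note: the dispatch this patch adds to nfs4_pnfs_ds_connect() can be sketched in userspace as below. All types and helpers here (struct ds_conn, connect_v3(), connect_v4(), ds_connect()) are hypothetical stand-ins. Only the control flow and the -EPROTONOSUPPORT result for an unknown version mirror the patch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel structures. */
enum ds_kind { DS_NONE, DS_V3, DS_V4 };

struct ds_conn {
	enum ds_kind kind;
	unsigned int minor;
};

/* NFSv3 has no minor version, so none is passed in. */
static int connect_v3(struct ds_conn *c)
{
	c->kind = DS_V3;
	c->minor = 0;
	return 0;
}

static int connect_v4(struct ds_conn *c, unsigned int minor)
{
	c->kind = DS_V4;
	c->minor = minor;
	return 0;
}

/* Pick the connect routine from the version the layout driver asked for. */
int ds_connect(struct ds_conn *c, unsigned int version, unsigned int minor)
{
	switch (version) {
	case 3:
		return connect_v3(c);
	case 4:
		return connect_v4(c, minor);
	default:
		return -EPROTONOSUPPORT;
	}
}
```

In the patch itself a non-zero result also marks the device ID as unavailable. That step is left out of the sketch.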
fs/nfs/pnfs_dev.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 56f5c16..655333d 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -615,7 +615,44 @@ static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
}
-static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
+static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
+ struct nfs4_pnfs_ds *ds,
+ unsigned int timeo,
+ unsigned int retrans,
+ rpc_authflavor_t au_flavor)
+{
+ struct nfs_client *clp = ERR_PTR(-EIO);
+ struct nfs4_pnfs_ds_addr *da;
+ int status = 0;
+
+ dprintk("--> %s DS %s au_flavor %d\n", __func__,
+ ds->ds_remotestr, au_flavor);
+
+ list_for_each_entry(da, &ds->ds_addrs, da_node) {
+ dprintk("%s: DS %s: trying address %s\n",
+ __func__, ds->ds_remotestr, da->da_remotestr);
+
+ clp = nfs3_set_ds_client(mds_srv->nfs_client,
+ (struct sockaddr *)&da->da_addr,
+ da->da_addrlen, IPPROTO_TCP,
+ timeo, retrans, au_flavor);
+ if (!IS_ERR(clp))
+ break;
+ }
+
+ if (IS_ERR(clp)) {
+ status = PTR_ERR(clp);
+ goto out;
+ }
+
+ smp_wmb();
+ ds->ds_clp = clp;
+ dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
+out:
+ return status;
+}
+
+static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
struct nfs4_pnfs_ds *ds,
unsigned int timeo,
unsigned int retrans,
@@ -674,8 +711,19 @@ void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
int err = 0;
- err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans,
- minor_version, au_flavor);
+ if (version == 3) {
+ err = _nfs4_pnfs_v3_ds_connect(mds_srv, ds, timeo,
+ retrans, au_flavor);
+ } else if (version == 4) {
+ err = _nfs4_pnfs_v4_ds_connect(mds_srv, ds, timeo,
+ retrans, minor_version,
+ au_flavor);
+ } else {
+ dprintk("%s: unsupported DS version %u\n", __func__,
+ version);
+ err = -EPROTONOSUPPORT;
+ }
+
if (err)
nfs4_mark_deviceid_unavailable(devid);
nfs4_clear_ds_conn_bit(ds);
--
1.9.3
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 4 ++--
fs/nfs/internal.h | 1 +
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/read.c | 3 ++-
fs/nfs/write.c | 3 ++-
include/linux/nfs_page.h | 1 +
6 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index bc36ed3..25c4896 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -501,7 +501,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr,
+ nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
&filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
}
@@ -542,7 +542,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr,
+ nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
&filelayout_write_call_ops, sync,
RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5543850..1d15ffa 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -251,6 +251,7 @@ void nfs_pgio_header_free(struct nfs_pgio_header *);
void nfs_pgio_data_destroy(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
+ const struct nfs_rpc_ops *,
const struct rpc_call_ops *, int, int);
void nfs_free_request(struct nfs_page *req);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 2b5e769..35a2626 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -597,6 +597,7 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
}
int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
@@ -616,7 +617,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
};
int ret = 0;
- hdr->rw_ops->rw_initiate(hdr, &msg, &task_setup_data, how);
+ hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
dprintk("NFS: %5u initiated pgio call "
"(req %s/%llu, %u bytes @ offset %llu)\n",
@@ -792,7 +793,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0)
ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
- hdr, desc->pg_rpc_callops,
+ hdr, NFS_PROTO(hdr->inode),
+ desc->pg_rpc_callops,
desc->pg_ioflags, 0);
return ret;
}
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index c91a479..092ab49 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -168,13 +168,14 @@ out:
static void nfs_initiate_read(struct nfs_pgio_header *hdr,
struct rpc_message *msg,
+ const struct nfs_rpc_ops *rpc_ops,
struct rpc_task_setup *task_setup_data, int how)
{
struct inode *inode = hdr->inode;
int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
task_setup_data->flags |= swap_flags;
- NFS_PROTO(inode)->read_setup(hdr, msg);
+ rpc_ops->read_setup(hdr, msg);
}
static void
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index af3af68..e5ed21c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1240,13 +1240,14 @@ static int flush_task_priority(int how)
static void nfs_initiate_write(struct nfs_pgio_header *hdr,
struct rpc_message *msg,
+ const struct nfs_rpc_ops *rpc_ops,
struct rpc_task_setup *task_setup_data, int how)
{
struct inode *inode = hdr->inode;
int priority = flush_task_priority(how);
task_setup_data->priority = priority;
- NFS_PROTO(inode)->write_setup(hdr, msg);
+ rpc_ops->write_setup(hdr, msg);
nfs4_state_protect_write(NFS_SERVER(inode)->nfs_client,
&task_setup_data->rpc_client, msg, hdr);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 6c3e06e..4c3aa80 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -69,6 +69,7 @@ struct nfs_rw_ops {
struct inode *);
void (*rw_result)(struct rpc_task *, struct nfs_pgio_header *);
void (*rw_initiate)(struct nfs_pgio_header *, struct rpc_message *,
+ const struct nfs_rpc_ops *,
struct rpc_task_setup *, int);
};
--
1.9.3
From: Peng Tao <[email protected]>
The pNFS flexfile layout client may want to use NFSv3 ops rather
than the default MDS v4 ops.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
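Note: the idea of passing the protocol ops explicitly, rather than deriving them from the inode, can be illustrated with the userspace sketch below. struct io_data, struct proto_ops and initiate_commit() are hypothetical names chosen for the example. They are not the kernel's nfs_commit_data, nfs_rpc_ops or nfs_initiate_commit().

```c
#include <string.h>

/* Hypothetical stand-in for the per-request data. */
struct io_data {
	char proc[16];	/* name of the procedure that was set up */
};

/* Hypothetical stand-in for a per-protocol ops table. */
struct proto_ops {
	void (*commit_setup)(struct io_data *data);
};

static void v3_commit_setup(struct io_data *data)
{
	strcpy(data->proc, "NFS3_COMMIT");
}

static void v4_commit_setup(struct io_data *data)
{
	strcpy(data->proc, "NFS4_COMMIT");
}

const struct proto_ops v3_ops = { .commit_setup = v3_commit_setup };
const struct proto_ops v4_ops = { .commit_setup = v4_commit_setup };

/*
 * The caller chooses the ops table. A v4 mount can therefore issue
 * a v3 COMMIT to a data server by passing &v3_ops.
 */
void initiate_commit(struct io_data *data, const struct proto_ops *ops)
{
	ops->commit_setup(data);
}
```

Existing callers keep their behaviour by passing the ops of the mount, which is what the NFS_PROTO() arguments in this patch do.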
fs/nfs/filelayout/filelayout.c | 2 +-
fs/nfs/internal.h | 1 +
fs/nfs/pnfs_nfsio.c | 1 +
fs/nfs/write.c | 7 ++++---
4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 25c4896..e5a3c5b 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -1055,7 +1055,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
if (fh)
data->args.fh = fh;
- return nfs_initiate_commit(ds_clnt, data,
+ return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
RPC_TASK_SOFTCONN);
out_err:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 1d15ffa..98dee83 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -436,6 +436,7 @@ extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
extern void nfs_commit_prepare(struct rpc_task *task, void *calldata);
extern int nfs_initiate_commit(struct rpc_clnt *clnt,
struct nfs_commit_data *data,
+ const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
int how, int flags);
extern void nfs_init_commit(struct nfs_commit_data *data,
diff --git a/fs/nfs/pnfs_nfsio.c b/fs/nfs/pnfs_nfsio.c
index fd2a2f0..329447c 100644
--- a/fs/nfs/pnfs_nfsio.c
+++ b/fs/nfs/pnfs_nfsio.c
@@ -273,6 +273,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
if (!data->lseg) {
nfs_init_commit(data, mds_pages, NULL, cinfo);
nfs_initiate_commit(NFS_CLIENT(inode), data,
+ NFS_PROTO(data->inode),
data->mds_ops, how, 0);
} else {
struct pnfs_commit_bucket *buckets;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e5ed21c..ab392af 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1466,6 +1466,7 @@ void nfs_commitdata_release(struct nfs_commit_data *data)
EXPORT_SYMBOL_GPL(nfs_commitdata_release);
int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
+ const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
int how, int flags)
{
@@ -1487,7 +1488,7 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
.priority = priority,
};
/* Set up the initial task struct. */
- NFS_PROTO(data->inode)->commit_setup(data, &msg);
+ nfs_ops->commit_setup(data, &msg);
dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
@@ -1590,8 +1591,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
/* Set up the argument struct */
nfs_init_commit(data, head, NULL, cinfo);
atomic_inc(&cinfo->mds->rpcs_out);
- return nfs_initiate_commit(NFS_CLIENT(inode), data, data->mds_ops,
- how, 0);
+ return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
+ data->mds_ops, how, 0);
out_bad:
nfs_retry_commit(head, NULL, cinfo);
cinfo->completion_ops->error_cleanup(NFS_I(inode));
--
1.9.3
From: Peng Tao <[email protected]>
The flexfile client needs this because there is no nfs_server associated
with a DS connection.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
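Note: a rough userspace sketch of why taking the slot table directly is more flexible than taking the server. The structures and both functions are hypothetical and much simpler than the kernel's. The point is only that a caller holding just a slot table, with no server object, can still allocate a slot.

```c
#include <errno.h>

#define NSLOTS 4

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct slot_table {
	unsigned char used[NSLOTS];
};

struct client {
	struct slot_table slot_tbl;
};

struct server {
	struct client *client;
};

/*
 * Takes the slot table itself. Returns the index of the slot that
 * was claimed, or -EAGAIN when the table is full.
 */
int setup_sequence(struct slot_table *tbl)
{
	int i;

	for (i = 0; i < NSLOTS; i++) {
		if (!tbl->used[i]) {
			tbl->used[i] = 1;
			return i;
		}
	}
	return -EAGAIN;
}

/* Callers that do have a server simply pass its client's table. */
int server_setup_sequence(struct server *srv)
{
	return setup_sequence(&srv->client->slot_tbl);
}
```

The wrapper corresponds to the updated call sites in this patch, which now pass server->nfs_client->cl_slot_tbl.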
fs/nfs/nfs4_fs.h | 4 ++++
fs/nfs/nfs4proc.c | 24 +++++++++++++-----------
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index a081787..90c4ffe 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -443,6 +443,10 @@ extern void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid);
extern void nfs_increment_lock_seqid(int status, struct nfs_seqid *seqid);
extern void nfs_release_seqid(struct nfs_seqid *seqid);
extern void nfs_free_seqid(struct nfs_seqid *seqid);
+extern int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
+ struct nfs4_sequence_args *args,
+ struct nfs4_sequence_res *res,
+ struct rpc_task *task);
extern void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e7f8d5f..9b1a481 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -495,12 +495,11 @@ static void nfs4_set_sequence_privileged(struct nfs4_sequence_args *args)
args->sa_privileged = 1;
}
-static int nfs40_setup_sequence(const struct nfs_server *server,
- struct nfs4_sequence_args *args,
- struct nfs4_sequence_res *res,
- struct rpc_task *task)
+int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
+ struct nfs4_sequence_args *args,
+ struct nfs4_sequence_res *res,
+ struct rpc_task *task)
{
- struct nfs4_slot_table *tbl = server->nfs_client->cl_slot_tbl;
struct nfs4_slot *slot;
/* slot already allocated? */
@@ -535,6 +534,7 @@ out_sleep:
spin_unlock(&tbl->slot_tbl_lock);
return -EAGAIN;
}
+EXPORT_SYMBOL_GPL(nfs40_setup_sequence);
static int nfs40_sequence_done(struct rpc_task *task,
struct nfs4_sequence_res *res)
@@ -777,7 +777,8 @@ static int nfs4_setup_sequence(const struct nfs_server *server,
int ret = 0;
if (!session)
- return nfs40_setup_sequence(server, args, res, task);
+ return nfs40_setup_sequence(server->nfs_client->cl_slot_tbl,
+ args, res, task);
dprintk("--> %s clp %p session %p sr_slot %u\n",
__func__, session->clp, session, res->sr_slot ?
@@ -818,7 +819,8 @@ static int nfs4_setup_sequence(const struct nfs_server *server,
struct nfs4_sequence_res *res,
struct rpc_task *task)
{
- return nfs40_setup_sequence(server, args, res, task);
+ return nfs40_setup_sequence(server->nfs_client->cl_slot_tbl,
+ args, res, task);
}
static int nfs4_sequence_done(struct rpc_task *task,
@@ -1681,8 +1683,8 @@ static void nfs4_open_confirm_prepare(struct rpc_task *task, void *calldata)
{
struct nfs4_opendata *data = calldata;
- nfs40_setup_sequence(data->o_arg.server, &data->c_arg.seq_args,
- &data->c_res.seq_res, task);
+ nfs40_setup_sequence(data->o_arg.server->nfs_client->cl_slot_tbl,
+ &data->c_arg.seq_args, &data->c_res.seq_res, task);
}
static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
@@ -5965,8 +5967,8 @@ static void nfs4_release_lockowner_prepare(struct rpc_task *task, void *calldata
{
struct nfs_release_lockowner_data *data = calldata;
struct nfs_server *server = data->server;
- nfs40_setup_sequence(server, &data->args.seq_args,
- &data->res.seq_res, task);
+ nfs40_setup_sequence(server->nfs_client->cl_slot_tbl,
+ &data->args.seq_args, &data->res.seq_res, task);
data->args.lock_owner.clientid = server->nfs_client->cl_clientid;
data->timestamp = jiffies;
}
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4_fs.h | 2 ++
fs/nfs/nfs4proc.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 90c4ffe..b3c771e 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -447,6 +447,8 @@ extern int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
struct nfs4_sequence_args *args,
struct nfs4_sequence_res *res,
struct rpc_task *task);
+extern int nfs4_sequence_done(struct rpc_task *task,
+ struct nfs4_sequence_res *res);
extern void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9b1a481..4883a42 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -694,8 +694,7 @@ out_retry:
}
EXPORT_SYMBOL_GPL(nfs41_sequence_done);
-static int nfs4_sequence_done(struct rpc_task *task,
- struct nfs4_sequence_res *res)
+int nfs4_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
{
if (res->sr_slot == NULL)
return 1;
@@ -703,6 +702,7 @@ static int nfs4_sequence_done(struct rpc_task *task,
return nfs40_sequence_done(task, res);
return nfs41_sequence_done(task, res);
}
+EXPORT_SYMBOL_GPL(nfs4_sequence_done);
int nfs41_setup_sequence(struct nfs4_session *session,
struct nfs4_sequence_args *args,
--
1.9.3
From: Peng Tao <[email protected]>
Allow callers to specify the cred so that the flexfile layout client can
pass in a DS credential instead of the user cred. That change follows
in the next patch.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 11 ++++++-----
fs/nfs/internal.h | 6 +++---
fs/nfs/pagelist.c | 8 +++++---
3 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index e5a3c5b..bfa8547 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -501,8 +501,9 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
- &filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
+ nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
+ 0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
}
@@ -542,9 +543,9 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
- &filelayout_write_call_ops, sync,
- RPC_TASK_SOFTCONN);
+ nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
+ sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 98dee83..e9305e9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -250,9 +250,9 @@ struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
void nfs_pgio_data_destroy(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
- const struct nfs_rpc_ops *,
- const struct rpc_call_ops *, int, int);
+int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
+ struct rpc_cred *cred, const struct nfs_rpc_ops *rpc_ops,
+ const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
static inline void nfs_iocounter_init(struct nfs_io_counter *c)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 35a2626..c4d1758 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -597,14 +597,14 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
}
int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct nfs_rpc_ops *rpc_ops,
+ struct rpc_cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
struct rpc_message msg = {
.rpc_argp = &hdr->args,
.rpc_resp = &hdr->res,
- .rpc_cred = hdr->cred,
+ .rpc_cred = cred,
};
struct rpc_task_setup task_setup_data = {
.rpc_client = clnt,
@@ -793,7 +793,9 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0)
ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
- hdr, NFS_PROTO(hdr->inode),
+ hdr,
+ hdr->cred,
+ NFS_PROTO(hdr->inode),
desc->pg_rpc_callops,
desc->pg_ioflags, 0);
return ret;
--
1.9.3
From: Trond Myklebust <[email protected]>
Enable pNFS callbacks to allow flex files to work correctly with an
NFSv3-enabled data server.
Signed-off-by: Trond Myklebust <[email protected]>
---
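Note: the override pattern added to the three NFSv3 done handlers can be sketched as follows. struct io_hdr, layout_read_done() and v3_read_done() are hypothetical userspace stand-ins. The value 7 is arbitrary and only makes the callback path distinguishable in a test.

```c
#include <stddef.h>

/* Hypothetical stand-in for nfs_pgio_header. */
struct io_hdr {
	int default_ran;			/* set when the default path ran */
	int (*done_cb)(struct io_hdr *hdr);	/* optional layout callback */
};

/* Example callback a layout driver might install. */
int layout_read_done(struct io_hdr *hdr)
{
	(void)hdr;
	return 7;
}

/*
 * If the layout driver installed a callback, it takes over completely
 * and the default v3 handling is skipped.
 */
int v3_read_done(struct io_hdr *hdr)
{
	if (hdr->done_cb != NULL)
		return hdr->done_cb(hdr);

	hdr->default_ran = 1;
	return 0;
}
```

A header without a callback goes through the default path exactly as before, so plain NFSv3 mounts should see no change in behaviour.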
fs/nfs/nfs3proc.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 524f9f8..78e557c 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -800,6 +800,9 @@ static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
{
struct inode *inode = hdr->inode;
+ if (hdr->pgio_done_cb != NULL)
+ return hdr->pgio_done_cb(task, hdr);
+
if (nfs3_async_handle_jukebox(task, inode))
return -EAGAIN;
@@ -825,6 +828,9 @@ static int nfs3_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
{
struct inode *inode = hdr->inode;
+ if (hdr->pgio_done_cb != NULL)
+ return hdr->pgio_done_cb(task, hdr);
+
if (nfs3_async_handle_jukebox(task, inode))
return -EAGAIN;
if (task->tk_status >= 0)
@@ -845,6 +851,9 @@ static void nfs3_proc_commit_rpc_prepare(struct rpc_task *task, struct nfs_commi
static int nfs3_commit_done(struct rpc_task *task, struct nfs_commit_data *data)
{
+ if (data->commit_done_cb != NULL)
+ return data->commit_done_cb(task, data);
+
if (nfs3_async_handle_jukebox(task, data->inode))
return -EAGAIN;
nfs_refresh_inode(data->inode, data->res.fattr);
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Add a call to tally stats for a task under a different statidx than
the one contained in the task structure.
This is needed to properly account for pnfs reads/writes when the
DS nfs version != the MDS version.
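
For illustration only, the split between "tally into a given slot" and
"tally into the slot named by the task" can be sketched in userspace C.
The struct and function names below are simplified stand-ins invented
for this sketch, not the sunrpc definitions, and the NULL check in the
wrapper is an addition of the sketch:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct iostats {
	unsigned long ops;
	unsigned long bytes_sent;
};

struct task {
	size_t statidx;			/* slot the task would normally use */
	unsigned long bytes_sent;
};

/* Tally one completed task into an explicitly chosen slot. */
static void count_iostats_metrics(const struct task *t, struct iostats *slot)
{
	if (!t || !slot)
		return;
	slot->ops++;
	slot->bytes_sent += t->bytes_sent;
}

/* Default behaviour: use the index stored in the task itself. */
static void count_iostats(const struct task *t, struct iostats *stats)
{
	if (!t || !stats)
		return;
	count_iostats_metrics(t, &stats[t->statidx]);
}
```

A caller that knows the DS speaks a different protocol version than the
MDS would call the first function with the slot it wants, instead of
relying on the index recorded in the task.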
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
include/linux/sunrpc/metrics.h | 2 ++
net/sunrpc/stats.c | 26 +++++++++++++++++++-------
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
index eecb5a7..89f2ca1 100644
--- a/include/linux/sunrpc/metrics.h
+++ b/include/linux/sunrpc/metrics.h
@@ -79,6 +79,8 @@ struct rpc_clnt;
struct rpc_iostats * rpc_alloc_iostats(struct rpc_clnt *);
void rpc_count_iostats(const struct rpc_task *,
struct rpc_iostats *);
+void rpc_count_iostats_metrics(const struct rpc_task *,
+ struct rpc_iostats *);
void rpc_print_iostats(struct seq_file *, struct rpc_clnt *);
void rpc_free_iostats(struct rpc_iostats *);
diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 9711a15..2ecb994 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -140,22 +140,20 @@ void rpc_free_iostats(struct rpc_iostats *stats)
EXPORT_SYMBOL_GPL(rpc_free_iostats);
/**
- * rpc_count_iostats - tally up per-task stats
+ * rpc_count_iostats_metrics - tally up per-task stats
* @task: completed rpc_task
- * @stats: array of stat structures
+ * @op_metrics: stat structure for OP that will accumulate stats from @task
*/
-void rpc_count_iostats(const struct rpc_task *task, struct rpc_iostats *stats)
+void rpc_count_iostats_metrics(const struct rpc_task *task,
+ struct rpc_iostats *op_metrics)
{
struct rpc_rqst *req = task->tk_rqstp;
- struct rpc_iostats *op_metrics;
ktime_t delta, now;
- if (!stats || !req)
+ if (!op_metrics || !req)
return;
now = ktime_get();
- op_metrics = &stats[task->tk_msg.rpc_proc->p_statidx];
-
spin_lock(&op_metrics->om_lock);
op_metrics->om_ops++;
@@ -175,6 +173,20 @@ void rpc_count_iostats(const struct rpc_task *task, struct rpc_iostats *stats)
spin_unlock(&op_metrics->om_lock);
}
+EXPORT_SYMBOL_GPL(rpc_count_iostats_metrics);
+
+/**
+ * rpc_count_iostats - tally up per-task stats
+ * @task: completed rpc_task
+ * @stats: array of stat structures
+ *
+ * Uses the statidx from @task
+ */
+void rpc_count_iostats(const struct rpc_task *task, struct rpc_iostats *stats)
+{
+ rpc_count_iostats_metrics(task,
+ &stats[task->tk_msg.rpc_proc->p_statidx]);
+}
EXPORT_SYMBOL_GPL(rpc_count_iostats);
static void _print_name(struct seq_file *seq, unsigned int op,
--
1.9.3
From: Peng Tao <[email protected]>
lockd assumes that a hostname exists; without one, the kernel oopses.
It can be reproduced with the following steps:
1. mount flexfile MDS
2. write some files
3. mount DS via nfsv3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8134f332>] strlen+0x2/0x20
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nfsd(F) nfs_layout_flexfiles(F) rpcsec_gss_krb5(F) auth_rpcgss(F) nfsv4(F) dns_resolver(F) nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) ebtable_nat(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) bnep(F) snd_ens1371(F) snd_rawmidi(F) snd_ac97_codec(F) btusb(F) ac97_bus(F) snd_seq(F) snd_seq_device(F) snd_pcm(F) ppdev(F) bluetooth(F) 6lowpan_iphc(F) rfkill(F) vmw_balloon(F) snd_timer(F) snd(F) soundcore(F) gameport(F) i2c_piix4(F) e1000(F) vmw_vmci(F) parport_pc(F) parport(F) shpchp(F) uinput(F) xfs(F) libcrc32c(F) vmwgfx(F) ttm(F) drm(F) mptspi(F) scsi_transport_spi(F) mptscsih(F) mptbase(F) i2c_core(F)
CPU: 0 PID: 10397 Comm: mount.nfs Tainted: GF 3.14.7-100.pd_client.001.fc16.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff880008942600 ti: ffff880007990000 task.ti: ffff880007990000
RIP: 0010:[<ffffffff8134f332>] [<ffffffff8134f332>] strlen+0x2/0x20
RSP: 0018:ffff880007991aa0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880038d39c20 RCX: 0000000000000004
RDX: 0000000000000006 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff880007991b38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000014600 R11: 0000000000000400 R12: ffffffff81cc8580
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000004
FS: 00007f90cd2ef880(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001710000 CR4: 00000000001407f0
Stack:
ffffffffa045f52c ffff880001782230 ffff880004141e28 0006880007991ac8
ffffffff816dc14b ffff880000000000 ffff880038d39c20 0000000000000010
0000000481cc0006 0000000000000000 ffffffffa0410be8 000000000000c014
Call Trace:
[<ffffffffa045f52c>] ? nlmclnt_lookup_host+0x4c/0x2c0 [lockd]
[<ffffffff816dc14b>] ? _raw_spin_unlock_bh+0x1b/0x20
[<ffffffffa0410be8>] ? svc_destroy+0xb8/0x140 [sunrpc]
[<ffffffffa045c323>] nlmclnt_init+0x53/0xc0 [lockd]
[<ffffffffa047d2dc>] ? nfs_get_client+0x1cc/0x340 [nfs]
[<ffffffffa047c2e7>] nfs_start_lockd+0xa7/0xd0 [nfs]
[<ffffffffa047df71>] nfs_create_server+0x181/0x5c0 [nfs]
[<ffffffffa04460f3>] nfs3_create_server+0x13/0x30 [nfsv3]
[<ffffffffa048a0bc>] nfs_try_mount+0x21c/0x300 [nfs]
[<ffffffff811ca32d>] ? __kmalloc_track_caller+0x1ad/0x240
[<ffffffffa048b677>] ? nfs_fs_mount+0xc37/0xd80 [nfs]
[<ffffffffa048ad05>] nfs_fs_mount+0x2c5/0xd80 [nfs]
[<ffffffffa048a830>] ? nfs_clone_super+0x140/0x140 [nfs]
[<ffffffffa048a240>] ? nfs_clone_sb_security+0x40/0x40 [nfs]
[<ffffffff811e7e43>] mount_fs+0x43/0x1b0
[<ffffffff81193100>] ? __alloc_percpu+0x10/0x20
[<ffffffff812026e6>] vfs_kern_mount+0x76/0x120
[<ffffffff81204917>] do_mount+0x237/0xa80
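
As a rough userspace analogue of the fix (the patch itself uses
rpc_ntop() in the kernel), a printable address can stand in for the
missing hostname. The helper name addr_to_hostname() is made up for
this sketch:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Format the address in @sa into @buf so it can be used where a
 * hostname string is required. Returns @buf, or NULL on failure or
 * for an unsupported address family.
 */
static const char *addr_to_hostname(const struct sockaddr *sa,
				    char *buf, socklen_t len)
{
	switch (sa->sa_family) {
	case AF_INET:
		return inet_ntop(AF_INET,
				 &((const struct sockaddr_in *)sa)->sin_addr,
				 buf, len);
	case AF_INET6:
		return inet_ntop(AF_INET6,
				 &((const struct sockaddr_in6 *)sa)->sin6_addr,
				 buf, len);
	default:
		return NULL;
	}
}
```

The buffer only has to outlive the call that consumes the hostname,
which is why a stack buffer is enough in the patch below.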
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs3client.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 52e2344..9e9fa34 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -1,5 +1,6 @@
#include <linux/nfs_fs.h>
#include <linux/nfs_mount.h>
+#include <linux/sunrpc/addr.h>
#include "internal.h"
#include "nfs3_fs.h"
@@ -89,6 +90,12 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
};
struct rpc_timeout ds_timeout;
struct nfs_client *clp;
+ char buf[INET6_ADDRSTRLEN + 1];
+
+ /* fake a hostname because lockd wants it */
+ if (rpc_ntop(ds_addr, buf, sizeof(buf)) <= 0)
+ return ERR_PTR(-EINVAL);
+ cl_init.hostname = buf;
/* Use the MDS nfs_client cl_ipaddr. */
nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
--
1.9.3
From: Peng Tao <[email protected]>
Flexfiles needs to start layoutcommit when necessary, so export
pnfs_layoutcommit_inode().
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/pnfs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0a5dda4..2d25670 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1966,6 +1966,7 @@ clear_layoutcommitting:
pnfs_clear_layoutcommitting(inode);
goto out;
}
+EXPORT_SYMBOL_GPL(pnfs_layoutcommit_inode);
struct nfs4_threshold *pnfs_mdsthreshold_alloc(void)
{
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2d25670..fa00b56 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1288,7 +1288,6 @@ pnfs_update_layout(struct inode *ino,
struct nfs_client *clp = server->nfs_client;
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg = NULL;
- bool first;
if (!pnfs_enabled_sb(NFS_SERVER(ino)))
goto out;
@@ -1321,16 +1320,15 @@ pnfs_update_layout(struct inode *ino,
if (pnfs_layoutgets_blocked(lo, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
-
- first = list_empty(&lo->plh_layouts) ? true : false;
spin_unlock(&ino->i_lock);
- if (first) {
+ if (list_empty(&lo->plh_layouts)) {
/* The lo must be on the clp list if there is any
* chance of a CB_LAYOUTRECALL(FILE) coming in.
*/
spin_lock(&clp->cl_lock);
- list_add_tail(&lo->plh_layouts, &server->layouts);
+ if (list_empty(&lo->plh_layouts))
+ list_add_tail(&lo->plh_layouts, &server->layouts);
spin_unlock(&clp->cl_lock);
}
--
1.9.3
From: Peng Tao <[email protected]>
Per RFC 5661 Errata 3208:
| A client MAY always forget its layout state and associated
| layout stateid at any time (See also section 12.5.5.1).
| In such case, the client MUST use a non-layout stateid for the next
| LAYOUTGET operation. This will signal the server that the client has
| no more layouts on the file and its respective layout state can be
| released before issuing a new layout in response to LAYOUTGET.
To make such a signal unambiguous to the server, the client needs to
serialize all layoutgets that use a non-layout stateid. We implement this
by serializing layoutgets while the client holds no layout segments.
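
The flag-as-lock idea can be sketched in userspace with C11 atomics.
This is only an illustration under assumed names (struct layout_hdr,
FIRST_LAYOUTGET_BIT and both helpers are hypothetical); the patch uses
test_and_set_bit(), wait_on_bit() and wake_up_bit() on plh_flags, and
the sleeping and wakeup parts are not modelled here:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FIRST_LAYOUTGET_BIT 0x1UL	/* hypothetical flag value */

struct layout_hdr {
	atomic_ulong flags;
};

/*
 * Try to become the one caller allowed to send the first LAYOUTGET.
 * Returns true if the flag was clear and is now owned by the caller.
 */
static bool first_layoutget_trylock(struct layout_hdr *lo)
{
	unsigned long old = atomic_fetch_or(&lo->flags, FIRST_LAYOUTGET_BIT);

	return !(old & FIRST_LAYOUTGET_BIT);
}

/* Drop the flag so that a waiting caller can retry its lookup. */
static void first_layoutget_unlock(struct layout_hdr *lo)
{
	atomic_fetch_and(&lo->flags, ~FIRST_LAYOUTGET_BIT);
}
```

A caller that loses the race would wait for the flag to clear and then
redo the lookup from the top, since the winner may have obtained a
segment that already covers the requested range.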
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 35 +++++++++++++++++++++++++++++++----
fs/nfs/pnfs.h | 1 +
2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index fa00b56..7e1bac1 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1288,6 +1288,7 @@ pnfs_update_layout(struct inode *ino,
struct nfs_client *clp = server->nfs_client;
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg = NULL;
+ bool first;
if (!pnfs_enabled_sb(NFS_SERVER(ino)))
goto out;
@@ -1295,6 +1296,8 @@ pnfs_update_layout(struct inode *ino,
if (pnfs_within_mdsthreshold(ctx, ino, iomode))
goto out;
+lookup_again:
+ first = false;
spin_lock(&ino->i_lock);
lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
if (lo == NULL) {
@@ -1312,10 +1315,27 @@ pnfs_update_layout(struct inode *ino,
if (pnfs_layout_io_test_failed(lo, iomode))
goto out_unlock;
- /* Check to see if the layout for the given range already exists */
- lseg = pnfs_find_lseg(lo, &arg);
- if (lseg)
- goto out_unlock;
+ first = list_empty(&lo->plh_segs);
+ if (first) {
+ /* The first layoutget for the file. Need to serialize per
+ * RFC 5661 Errata 3208.
+ */
+ if (test_and_set_bit(NFS_LAYOUT_FIRST_LAYOUTGET,
+ &lo->plh_flags)) {
+ spin_unlock(&ino->i_lock);
+ wait_on_bit(&lo->plh_flags, NFS_LAYOUT_FIRST_LAYOUTGET,
+ TASK_UNINTERRUPTIBLE);
+ pnfs_put_layout_hdr(lo);
+ goto lookup_again;
+ }
+ } else {
+ /* Check to see if the layout for the given range
+ * already exists
+ */
+ lseg = pnfs_find_lseg(lo, &arg);
+ if (lseg)
+ goto out_unlock;
+ }
if (pnfs_layoutgets_blocked(lo, 0))
goto out_unlock;
@@ -1343,6 +1363,13 @@ pnfs_update_layout(struct inode *ino,
lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
atomic_dec(&lo->plh_outstanding);
out_put_layout_hdr:
+ if (first) {
+ unsigned long *bitlock = &lo->plh_flags;
+
+ clear_bit_unlock(NFS_LAYOUT_FIRST_LAYOUTGET, bitlock);
+ smp_mb__after_atomic();
+ wake_up_bit(bitlock, NFS_LAYOUT_FIRST_LAYOUTGET);
+ }
pnfs_put_layout_hdr(lo);
out:
dprintk("%s: inode %s/%llu pNFS layout segment %s for "
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a5b168c..6594429 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -95,6 +95,7 @@ enum {
NFS_LAYOUT_ROC, /* some lseg had roc bit set */
NFS_LAYOUT_RETURN, /* Return this layout ASAP */
NFS_LAYOUT_INVALID_STID, /* layout stateid id is invalid */
+ NFS_LAYOUT_FIRST_LAYOUTGET, /* Serialize first layoutget */
};
enum layoutdriver_policy_flags {
--
1.9.3
From: Peng Tao <[email protected]>
The flexfiles layout needs the server READ/WRITE/COMMIT status so that
it can report DS I/O status.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs3xdr.c | 3 +++
fs/nfs/nfs4xdr.c | 3 +++
include/linux/nfs_xdr.h | 2 ++
3 files changed, 8 insertions(+)
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 8f4cbe7..2a932fd 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -1636,6 +1636,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr,
error = decode_post_op_attr(xdr, result->fattr);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS3_OK)
goto out_status;
error = decode_read3resok(xdr, result);
@@ -1708,6 +1709,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr,
error = decode_wcc_data(xdr, result->fattr);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS3_OK)
goto out_status;
error = decode_write3resok(xdr, result);
@@ -2323,6 +2325,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req,
error = decode_wcc_data(xdr, result->fattr);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS3_OK)
goto out_status;
error = decode_writeverf3(xdr, &result->verf->verifier);
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index cb4376b..7d8d7a4 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6567,6 +6567,7 @@ static int nfs4_xdr_dec_read(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
int status;
status = decode_compound_hdr(xdr, &hdr);
+ res->op_status = hdr.status;
if (status)
goto out;
status = decode_sequence(xdr, &res->seq_res, rqstp);
@@ -6592,6 +6593,7 @@ static int nfs4_xdr_dec_write(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
int status;
status = decode_compound_hdr(xdr, &hdr);
+ res->op_status = hdr.status;
if (status)
goto out;
status = decode_sequence(xdr, &res->seq_res, rqstp);
@@ -6621,6 +6623,7 @@ static int nfs4_xdr_dec_commit(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
int status;
status = decode_compound_hdr(xdr, &hdr);
+ res->op_status = hdr.status;
if (status)
goto out;
status = decode_sequence(xdr, &res->seq_res, rqstp);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 467c84e..962f461 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -513,6 +513,7 @@ struct nfs_pgio_res {
struct nfs4_sequence_res seq_res;
struct nfs_fattr * fattr;
__u32 count;
+ __u32 op_status;
int eof; /* used by read */
struct nfs_writeverf * verf; /* used by write */
const struct nfs_server *server; /* used by write */
@@ -532,6 +533,7 @@ struct nfs_commitargs {
struct nfs_commitres {
struct nfs4_sequence_res seq_res;
+ __u32 op_status;
struct nfs_fattr *fattr;
struct nfs_writeverf *verf;
const struct nfs_server *server;
--
1.9.3
From: Peng Tao <[email protected]>
This makes it possible to return layouts of a specific iomode.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4xdr.c | 2 +-
fs/nfs/pnfs.c | 1 +
include/linux/nfs_xdr.h | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 7d8d7a4..3c3ff63 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2012,7 +2012,7 @@ encode_layoutreturn(struct xdr_stream *xdr,
p = reserve_space(xdr, 16);
*p++ = cpu_to_be32(0); /* reclaim. always 0 for now */
*p++ = cpu_to_be32(args->layout_type);
- *p++ = cpu_to_be32(IOMODE_ANY);
+ *p++ = cpu_to_be32(args->iomode);
*p = cpu_to_be32(RETURN_FILE);
p = reserve_space(xdr, 16);
p = xdr_encode_hyper(p, 0);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7e1bac1..1b544c1 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -914,6 +914,7 @@ _pnfs_return_layout(struct inode *ino)
lrp->args.stateid = stateid;
lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
lrp->args.inode = ino;
+ lrp->args.iomode = IOMODE_ANY;
lrp->args.layout = lo;
lrp->clp = NFS_SERVER(ino)->nfs_client;
lrp->cred = lo->plh_lc_cred;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 962f461..4fd7793 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -293,6 +293,7 @@ struct nfs4_layoutreturn_args {
struct nfs4_sequence_args seq_args;
struct pnfs_layout_hdr *layout;
struct inode *inode;
+ enum pnfs_iomode iomode;
nfs4_stateid stateid;
__u32 layout_type;
};
--
1.9.3
From: Peng Tao <[email protected]>
The helper allows the caller to specify which iomode to return.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 53 +++++++++++++++++++++++++++++++++--------------------
1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 1b544c1..1b97209 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -845,6 +845,38 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
}
}
+static int
+pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
+ enum pnfs_iomode iomode)
+{
+ struct inode *ino = lo->plh_inode;
+ struct nfs4_layoutreturn *lrp;
+ int status = 0;
+
+ lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+ if (unlikely(lrp == NULL)) {
+ status = -ENOMEM;
+ spin_lock(&ino->i_lock);
+ lo->plh_block_lgets--;
+ spin_unlock(&ino->i_lock);
+ pnfs_put_layout_hdr(lo);
+ goto out;
+ }
+
+ lrp->args.stateid = stateid;
+ lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
+ lrp->args.inode = ino;
+ lrp->args.iomode = iomode;
+ lrp->args.layout = lo;
+ lrp->clp = NFS_SERVER(ino)->nfs_client;
+ lrp->cred = lo->plh_lc_cred;
+
+ status = nfs4_proc_layoutreturn(lrp);
+out:
+ dprintk("<-- %s status: %d\n", __func__, status);
+ return status;
+}
+
/*
* Initiates a LAYOUTRETURN(FILE), and removes the pnfs_layout_hdr
* when the layout segment list is empty.
@@ -859,7 +891,6 @@ _pnfs_return_layout(struct inode *ino)
struct pnfs_layout_hdr *lo = NULL;
struct nfs_inode *nfsi = NFS_I(ino);
LIST_HEAD(tmp_list);
- struct nfs4_layoutreturn *lrp;
nfs4_stateid stateid;
int status = 0, empty;
@@ -901,25 +932,7 @@ _pnfs_return_layout(struct inode *ino)
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&tmp_list);
- lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
- if (unlikely(lrp == NULL)) {
- status = -ENOMEM;
- spin_lock(&ino->i_lock);
- lo->plh_block_lgets--;
- spin_unlock(&ino->i_lock);
- pnfs_put_layout_hdr(lo);
- goto out;
- }
-
- lrp->args.stateid = stateid;
- lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
- lrp->args.inode = ino;
- lrp->args.iomode = IOMODE_ANY;
- lrp->args.layout = lo;
- lrp->clp = NFS_SERVER(ino)->nfs_client;
- lrp->cred = lo->plh_lc_cred;
-
- status = nfs4_proc_layoutreturn(lrp);
+ status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
--
1.9.3
From: Peng Tao <[email protected]>
It marks all matching layout segments as NFS_LSEG_LAYOUTRETURN,
which tells pnfs_put_lseg() to send a layoutreturn and also
prevents pnfs_update_layout() from using the returning segments.
Once the flag is set on a segment, it is never cleared.
It also sets the proper I/O failure bit so that the pNFS path can be
retried once PNFS_LAYOUTGET_RETRY_TIMEOUT has elapsed.
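
As a rough illustration of how the pending return iomode is accumulated
in pnfs_error_mark_layout_for_return(), the following standalone sketch
mirrors the if/else chain in the diff. The enum is redefined locally,
and IOMODE_NONE is a name invented here for the "nothing pending" value
of 0:

```c
/* Local copy for the sketch; IOMODE_NONE is a hypothetical name for 0. */
enum iomode {
	IOMODE_NONE = 0,
	IOMODE_READ = 1,
	IOMODE_RW = 2,
	IOMODE_ANY = 3,
};

/*
 * Combine the iomode already pending for return with the iomode of
 * the segment that just failed. Two different iomodes widen to
 * IOMODE_ANY.
 */
static enum iomode merge_return_iomode(enum iomode pending,
				       enum iomode failed)
{
	if (pending == IOMODE_NONE)
		return failed;
	if (pending != failed)
		return IOMODE_ANY;
	return pending;
}
```

Once the value has widened to IOMODE_ANY it stays there until the
layoutreturn is sent and the pending iomode is reset.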
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/pnfs.h | 4 ++++
2 files changed, 59 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 1b97209..0bd149b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1479,6 +1479,61 @@ out_forget_reply:
goto out;
}
+static void
+pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
+ struct list_head *tmp_list,
+ struct pnfs_layout_range *return_range)
+{
+ struct pnfs_layout_segment *lseg, *next;
+
+ dprintk("%s:Begin lo %p\n", __func__, lo);
+
+ if (list_empty(&lo->plh_segs))
+ return;
+
+ list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list)
+ if (should_free_lseg(&lseg->pls_range, return_range)) {
+ dprintk("%s: marking lseg %p iomode %d "
+ "offset %llu length %llu\n", __func__,
+ lseg, lseg->pls_range.iomode,
+ lseg->pls_range.offset,
+ lseg->pls_range.length);
+ set_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags);
+ mark_lseg_invalid(lseg, tmp_list);
+ }
+}
+
+void pnfs_error_mark_layout_for_return(struct inode *inode,
+ struct pnfs_layout_segment *lseg)
+{
+ struct pnfs_layout_hdr *lo = NFS_I(inode)->layout;
+ int iomode = pnfs_iomode_to_fail_bit(lseg->pls_range.iomode);
+ struct pnfs_layout_range range = {
+ .iomode = lseg->pls_range.iomode,
+ .offset = 0,
+ .length = NFS4_MAX_UINT64,
+ };
+ LIST_HEAD(free_me);
+
+ spin_lock(&inode->i_lock);
+ /* set failure bit so that pnfs path will be retried later */
+ pnfs_layout_set_fail_bit(lo, iomode);
+ set_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ if (lo->plh_return_iomode == 0)
+ lo->plh_return_iomode = range.iomode;
+ else if (lo->plh_return_iomode != range.iomode)
+ lo->plh_return_iomode = IOMODE_ANY;
+ /*
+ * mark all matching lsegs so that we are sure to have no live
+ * segments at hand when sending layoutreturn. See pnfs_put_lseg()
+ * for how it works.
+ */
+ pnfs_mark_matching_lsegs_return(lo, &free_me, &range);
+ spin_unlock(&inode->i_lock);
+ pnfs_free_lseg_list(&free_me);
+}
+EXPORT_SYMBOL_GPL(pnfs_error_mark_layout_for_return);
+
void
pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
{
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 6594429..3ce292e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -38,6 +38,7 @@ enum {
NFS_LSEG_VALID = 0, /* cleared when lseg is recalled/returned */
NFS_LSEG_ROC, /* roc bit received from server */
NFS_LSEG_LAYOUTCOMMIT, /* layoutcommit bit set for layoutcommit */
+ NFS_LSEG_LAYOUTRETURN, /* layoutreturn bit set for layoutreturn */
};
/* Individual ip address */
@@ -184,6 +185,7 @@ struct pnfs_layout_hdr {
u32 plh_barrier; /* ignore lower seqids */
unsigned long plh_retry_timestamp;
unsigned long plh_flags;
+ enum pnfs_iomode plh_return_iomode;
loff_t plh_lwb; /* last write byte for layoutcommit */
struct rpc_cred *plh_lc_cred; /* layoutcommit cred */
struct inode *plh_inode;
@@ -274,6 +276,8 @@ void nfs4_deviceid_mark_client_invalid(struct nfs_client *clp);
int pnfs_read_done_resend_to_mds(struct nfs_pgio_header *);
int pnfs_write_done_resend_to_mds(struct nfs_pgio_header *);
struct nfs4_threshold *pnfs_mdsthreshold_alloc(void);
+void pnfs_error_mark_layout_for_return(struct inode *inode,
+ struct pnfs_layout_segment *lseg);
/* nfs4_deviceid_flags */
enum {
--
1.9.3
From: Peng Tao <[email protected]>
Do not use a layout segment that is marked for return. Also, if we are
about to return layouts of the same iomode, don't bother sending more
layoutgets.
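
A minimal standalone sketch of the "is a return pending for this
iomode" test follows. The struct and its fields are simplified
stand-ins assumed for the example; the patch reads NFS_LAYOUT_RETURN
from plh_flags and compares plh_return_iomode:

```c
#include <stdbool.h>

/* Local copy for the sketch. */
enum iomode {
	IOMODE_READ = 1,
	IOMODE_RW = 2,
	IOMODE_ANY = 3,
};

/* Hypothetical, simplified layout header. */
struct layout_hdr {
	bool return_pending;		/* stands in for NFS_LAYOUT_RETURN */
	enum iomode return_iomode;	/* iomode queued for return */
};

/* True if a new LAYOUTGET for @requested would be pointless right now. */
static bool layout_returning(const struct layout_hdr *lo,
			     enum iomode requested)
{
	return lo->return_pending &&
	       (lo->return_iomode == IOMODE_ANY ||
		lo->return_iomode == requested);
}
```

With a READ return pending, a RW layoutget is still allowed, while a
READ layoutget is blocked until the return has completed.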
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
fs/nfs/pnfs.c | 23 ++++++++++++++++++-----
fs/nfs/pnfs.h | 1 +
3 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4883a42..b5bbe35 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7530,6 +7530,7 @@ nfs4_layoutget_prepare(struct rpc_task *task, void *calldata)
return;
if (pnfs_choose_layoutget_stateid(&lgp->args.stateid,
NFS_I(lgp->args.inode)->layout,
+ &lgp->args.range,
lgp->args.ctx->state)) {
rpc_exit(task, NFS4_OK);
}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0bd149b..853b544 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -740,25 +740,37 @@ pnfs_layout_stateid_blocked(const struct pnfs_layout_hdr *lo,
return !pnfs_seqid_is_newer(seqid, lo->plh_barrier);
}
+static bool
+pnfs_layout_returning(const struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range)
+{
+ return test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags) &&
+ (lo->plh_return_iomode == IOMODE_ANY ||
+ lo->plh_return_iomode == range->iomode);
+}
+
/* lget is set to 1 if called from inside send_layoutget call chain */
static bool
-pnfs_layoutgets_blocked(const struct pnfs_layout_hdr *lo, int lget)
+pnfs_layoutgets_blocked(const struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range, int lget)
{
return lo->plh_block_lgets ||
test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
(list_empty(&lo->plh_segs) &&
- (atomic_read(&lo->plh_outstanding) > lget));
+ (atomic_read(&lo->plh_outstanding) > lget)) ||
+ pnfs_layout_returning(lo, range);
}
int
pnfs_choose_layoutget_stateid(nfs4_stateid *dst, struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range,
struct nfs4_state *open_state)
{
int status = 0;
dprintk("--> %s\n", __func__);
spin_lock(&lo->plh_inode->i_lock);
- if (pnfs_layoutgets_blocked(lo, 1)) {
+ if (pnfs_layoutgets_blocked(lo, range, 1)) {
status = -EAGAIN;
} else if (!nfs4_valid_open_stateid(open_state)) {
status = -EBADF;
@@ -1192,6 +1204,7 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo,
list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags) &&
+ !test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags) &&
pnfs_lseg_range_match(&lseg->pls_range, range)) {
ret = pnfs_get_lseg(lseg);
break;
@@ -1351,7 +1364,7 @@ lookup_again:
goto out_unlock;
}
- if (pnfs_layoutgets_blocked(lo, 0))
+ if (pnfs_layoutgets_blocked(lo, &arg, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
spin_unlock(&ino->i_lock);
@@ -1432,7 +1445,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
goto out_forget_reply;
}
- if (pnfs_layoutgets_blocked(lo, 1)) {
+ if (pnfs_layoutgets_blocked(lo, &lgp->args.range, 1)) {
dprintk("%s forget reply due to state\n", __func__);
goto out_forget_reply;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 3ce292e..4863991 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -249,6 +249,7 @@ void pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
bool update_barrier);
int pnfs_choose_layoutget_stateid(nfs4_stateid *dst,
struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range,
struct nfs4_state *open_state);
int pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
struct list_head *tmp_list,
--
1.9.3
From: Peng Tao <[email protected]>
If the current lseg is the last lseg marked with NFS_LSEG_LAYOUTRETURN,
send a layoutreturn.
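
The "last marked segment" condition can be sketched in isolation as
below. The types are hypothetical and an array stands in for the
plh_segs list; the sketch assumes, as in pnfs_put_lseg(), that the
segment being released has already been unlinked from the layout:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layout segment. */
struct seg {
	bool marked_for_return;	/* stands in for NFS_LSEG_LAYOUTRETURN */
};

/*
 * @released:  the segment whose last reference was just dropped
 * @remaining: segments still attached to the layout
 * @n:         number of entries in @remaining
 *
 * A layoutreturn is needed only if @released was marked and no other
 * marked segment is still attached.
 */
static bool need_layoutreturn(const struct seg *released,
			      const struct seg *remaining, size_t n)
{
	size_t i;

	if (!released->marked_for_return)
		return false;

	for (i = 0; i < n; i++)
		if (remaining[i].marked_for_return)
			return false;

	return true;
}
```

Unmarked segments that remain on the layout do not delay the return;
only other marked segments do.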
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 853b544..e9acfcf 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -50,6 +50,10 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
*/
static LIST_HEAD(pnfs_modules_tbl);
+static int
+pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
+ enum pnfs_iomode iomode);
+
/* Return the registered pnfs layout driver module matching given id */
static struct pnfs_layoutdriver_type *
find_pnfs_driver_locked(u32 id)
@@ -337,6 +341,29 @@ pnfs_layout_remove_lseg(struct pnfs_layout_hdr *lo,
rpc_wake_up(&NFS_SERVER(inode)->roc_rpcwaitq);
}
+/* Return true if layoutreturn is needed */
+static bool
+pnfs_layout_need_return(struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_segment *lseg,
+ nfs4_stateid *stateid, enum pnfs_iomode *iomode)
+{
+ struct pnfs_layout_segment *s;
+
+ if (!test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags))
+ return false;
+
+ list_for_each_entry(s, &lo->plh_segs, pls_list)
+ if (test_bit(NFS_LSEG_LAYOUTRETURN, &s->pls_flags))
+ return false;
+
+ *stateid = lo->plh_stateid;
+ *iomode = lo->plh_return_iomode;
+ /* decreased in pnfs_send_layoutreturn() */
+ lo->plh_block_lgets++;
+ lo->plh_return_iomode = 0;
+ return true;
+}
+
void
pnfs_put_lseg(struct pnfs_layout_segment *lseg)
{
@@ -352,11 +379,20 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
lo = lseg->pls_layout;
inode = lo->plh_inode;
if (atomic_dec_and_lock(&lseg->pls_refcount, &inode->i_lock)) {
+ bool need_return;
+ nfs4_stateid stateid;
+ enum pnfs_iomode iomode;
+
pnfs_get_layout_hdr(lo);
pnfs_layout_remove_lseg(lo, lseg);
+ need_return = pnfs_layout_need_return(lo, lseg,
+ &stateid, &iomode);
spin_unlock(&inode->i_lock);
pnfs_free_lseg(lseg);
- pnfs_put_layout_hdr(lo);
+ if (need_return)
+ pnfs_send_layoutreturn(lo, stateid, iomode);
+ else
+ pnfs_put_layout_hdr(lo);
}
}
EXPORT_SYMBOL_GPL(pnfs_put_lseg);
--
1.9.3
From: Peng Tao <[email protected]>
Clear NFS_LAYOUT_RETURN once the layoutreturn has been sent or has
failed to send, so that the pNFS path is not disabled forever.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
fs/nfs/pnfs.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b5bbe35..bf5ef58 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7786,6 +7786,7 @@ static void nfs4_layoutreturn_release(void *calldata)
spin_lock(&lo->plh_inode->i_lock);
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
+ clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
lo->plh_block_lgets--;
spin_unlock(&lo->plh_inode->i_lock);
pnfs_put_layout_hdr(lrp->args.layout);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e9acfcf..63992c8 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -921,6 +921,11 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = nfs4_proc_layoutreturn(lrp);
out:
+ if (status) {
+ spin_lock(&ino->i_lock);
+ clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ spin_unlock(&ino->i_lock);
+ }
dprintk("<-- %s status: %d\n", __func__, status);
return status;
}
--
1.9.3
From: Peng Tao <[email protected]>
Instead of calling layoutreturn directly, call
pnfs_error_mark_layout_for_return() to mark layouts for return, and let
the generic code return the layout when its layout segments are freed.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
Conflicts:
fs/nfs/filelayout/filelayout.c
---
fs/nfs/filelayout/filelayout.c | 2 +-
fs/nfs/pnfs_nfsio.c | 10 ----------
2 files changed, 1 insertion(+), 11 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index bfa8547..5d2eadc 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -200,7 +200,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
dprintk("%s DS connection error %d\n", __func__,
task->tk_status);
nfs4_mark_deviceid_unavailable(devid);
- set_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ pnfs_error_mark_layout_for_return(inode, lseg);
rpc_wake_up(&tbl->slot_tbl_waitq);
/* fall through */
default:
diff --git a/fs/nfs/pnfs_nfsio.c b/fs/nfs/pnfs_nfsio.c
index 329447c..c6c3efc 100644
--- a/fs/nfs/pnfs_nfsio.c
+++ b/fs/nfs/pnfs_nfsio.c
@@ -13,20 +13,10 @@
#include "internal.h"
#include "pnfs.h"
-static void pnfs_generic_fenceme(struct inode *inode,
- struct pnfs_layout_hdr *lo)
-{
- if (!test_and_clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
- return;
- pnfs_return_layout(inode);
-}
-
void pnfs_generic_rw_release(void *data)
{
struct nfs_pgio_header *hdr = data;
- struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
- pnfs_generic_fenceme(lo->plh_inode, lo);
nfs_put_client(hdr->ds_clp);
hdr->mds_ops->rpc_release(data);
}
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Add a new operation to nfs_pageio_ops that is called on nfs_pageio_complete.
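
The optional-callback pattern used here can be sketched as follows.
All names are simplified stand-ins for illustration, not the
nfs_pageio definitions:

```c
#include <stddef.h>

struct pageio_desc;

/* Hypothetical ops table with an optional cleanup hook. */
struct pageio_ops {
	void (*pg_cleanup)(struct pageio_desc *desc);
};

struct pageio_desc {
	const struct pageio_ops *ops;
	int cleaned;		/* counts cleanup calls, for the example */
};

/* Finish I/O on the descriptor, then run the cleanup hook if present. */
static void pageio_complete(struct pageio_desc *desc)
{
	/* flushing of any pending I/O would happen here */

	if (desc->ops->pg_cleanup)
		desc->ops->pg_cleanup(desc);
}

/* Example hook: just record that it ran. */
static void example_cleanup(struct pageio_desc *desc)
{
	desc->cleaned++;
}
```

Leaving the hook NULL keeps existing ops tables working unchanged,
which is why the call site checks the pointer before using it.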
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/pagelist.c | 5 ++++-
include/linux/nfs_page.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index c4d1758..1c03187 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1050,7 +1050,7 @@ int nfs_pageio_resend(struct nfs_pageio_descriptor *desc,
EXPORT_SYMBOL_GPL(nfs_pageio_resend);
/**
- * nfs_pageio_complete - Complete I/O on an nfs_pageio_descriptor
+ * nfs_pageio_complete - Complete I/O then cleanup an nfs_pageio_descriptor
* @desc: pointer to io descriptor
*/
void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
@@ -1062,6 +1062,9 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
if (!nfs_do_recoalesce(desc))
break;
}
+
+ if (desc->pg_ops->pg_cleanup)
+ desc->pg_ops->pg_cleanup(desc);
}
/**
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 4c3aa80..479c566 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -58,6 +58,7 @@ struct nfs_pageio_ops {
size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
struct nfs_page *);
int (*pg_doio)(struct nfs_pageio_descriptor *);
+ void (*pg_cleanup)(struct nfs_pageio_descriptor *);
};
struct nfs_rw_ops {
--
1.9.3
From: Weston Andros Adamson <[email protected]>
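This is needed to support mirrored writes: the first write can't just
trash the lseg, because we need to keep it around until all mirrors have
written. Move the final pnfs_put_lseg() of desc->pg_lseg into a new
pnfs_generic_pg_cleanup() helper, called through the pg_cleanup op.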
Signed-off-by: Weston Andros Adamson <[email protected]>
---
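Note (not part of the commit message): a simplified user-space model of
the reference handling this patch moves to. The types seg and desc and
the helpers seg_get, seg_put, write_one_mirror and desc_cleanup are
hypothetical. In the kernel the header's reference is dropped when the
header is freed, not synchronously as in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a reference-counted layout segment. */
struct seg {
	int refcount;
};

struct desc {
	struct seg *seg;	/* reference owned by the descriptor */
};

static struct seg *seg_get(struct seg *s)
{
	if (s)
		s->refcount++;
	return s;
}

static void seg_put(struct seg *s)
{
	if (s) {
		assert(s->refcount > 0);
		s->refcount--;
	}
}

/* One write to one mirror: takes and drops only its own reference. */
static void write_one_mirror(struct desc *d)
{
	struct seg *hdr_seg = seg_get(d->seg);

	/* ... the I/O for this mirror would be issued here ... */
	seg_put(hdr_seg);
}

/* Called once, after every mirror has been written. */
static void desc_cleanup(struct desc *d)
{
	if (d->seg) {
		seg_put(d->seg);
		d->seg = NULL;
	}
}
```

Because the descriptor's own reference is only dropped in the cleanup
step, the segment stays valid for the second and later mirrors.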
fs/nfs/blocklayout/blocklayout.c | 2 ++
fs/nfs/filelayout/filelayout.c | 2 ++
fs/nfs/objlayout/objio_osd.c | 2 ++
fs/nfs/pnfs.c | 32 ++++++++++++++------------------
fs/nfs/pnfs.h | 1 +
5 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 77fec6a..1cac3c1 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -860,12 +860,14 @@ static const struct nfs_pageio_ops bl_pg_read_ops = {
.pg_init = bl_pg_init_read,
.pg_test = bl_pg_test_read,
.pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static const struct nfs_pageio_ops bl_pg_write_ops = {
.pg_init = bl_pg_init_write,
.pg_test = bl_pg_test_write,
.pg_doio = pnfs_generic_pg_writepages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static struct pnfs_layoutdriver_type blocklayout_type = {
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 5d2eadc..2af32fc 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -933,12 +933,14 @@ static const struct nfs_pageio_ops filelayout_pg_read_ops = {
.pg_init = filelayout_pg_init_read,
.pg_test = filelayout_pg_test,
.pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static const struct nfs_pageio_ops filelayout_pg_write_ops = {
.pg_init = filelayout_pg_init_write,
.pg_test = filelayout_pg_test,
.pg_doio = pnfs_generic_pg_writepages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static u32 select_bucket_index(struct nfs4_filelayout_segment *fl, u32 j)
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 9e5bc42..d007780 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -607,12 +607,14 @@ static const struct nfs_pageio_ops objio_pg_read_ops = {
.pg_init = objio_init_read,
.pg_test = objio_pg_test,
.pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static const struct nfs_pageio_ops objio_pg_write_ops = {
.pg_init = objio_init_write,
.pg_test = objio_pg_test,
.pg_doio = pnfs_generic_pg_writepages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static struct pnfs_layoutdriver_type objlayout_type = {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 63992c8..2da2e77 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1631,6 +1631,16 @@ pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_init_write);
+void
+pnfs_generic_pg_cleanup(struct nfs_pageio_descriptor *desc)
+{
+ if (desc->pg_lseg) {
+ pnfs_put_lseg(desc->pg_lseg);
+ desc->pg_lseg = NULL;
+ }
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
+
/*
* Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
* of bytes (maximum @req->wb_bytes) that can be coalesced.
@@ -1756,11 +1766,9 @@ pnfs_do_write(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- desc->pg_lseg = NULL;
trypnfs = pnfs_try_to_write_data(hdr, call_ops, lseg, how);
if (trypnfs == PNFS_NOT_ATTEMPTED)
pnfs_write_through_mds(desc, hdr);
- pnfs_put_lseg(lseg);
}
static void pnfs_writehdr_free(struct nfs_pgio_header *hdr)
@@ -1779,17 +1787,13 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
desc->pg_completion_ops->error_cleanup(&desc->pg_list);
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
+
hdr->lseg = pnfs_get_lseg(desc->pg_lseg);
ret = nfs_generic_pgio(desc, hdr);
- if (ret != 0) {
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
- } else
+ if (!ret)
pnfs_do_write(desc, hdr, desc->pg_ioflags);
return ret;
}
@@ -1874,11 +1878,9 @@ pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- desc->pg_lseg = NULL;
trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
if (trypnfs == PNFS_NOT_ATTEMPTED)
pnfs_read_through_mds(desc, hdr);
- pnfs_put_lseg(lseg);
}
static void pnfs_readhdr_free(struct nfs_pgio_header *hdr)
@@ -1897,18 +1899,12 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
desc->pg_completion_ops->error_cleanup(&desc->pg_list);
- ret = -ENOMEM;
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
- return ret;
+ return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
hdr->lseg = pnfs_get_lseg(desc->pg_lseg);
ret = nfs_generic_pgio(desc, hdr);
- if (ret != 0) {
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
- } else
+ if (!ret)
pnfs_do_read(desc, hdr);
return ret;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 4863991..d3cbb6e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -230,6 +230,7 @@ void pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *, struct nfs_page *
int pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc);
void pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
struct nfs_page *req, u64 wb_size);
+void pnfs_generic_pg_cleanup(struct nfs_pageio_descriptor *);
int pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc);
size_t pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req);
--
1.9.3
From: Weston Andros Adamson <[email protected]>
This is needed for mirrored DS support, where multiple requests
cover the same range.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
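Note (not part of the commit message): a stand-alone sketch of the
relaxed range check, assuming a flat array of (offset, bytes) pairs in
place of the kernel's page group list. struct range and
group_covered_bytes are hypothetical names. A request either extends the
covered range contiguously or must lie entirely inside what is already
covered (a mirrored repeat); anything else is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range {
	unsigned int offset;
	unsigned int bytes;
};

/*
 * Walk the requests in order. On success, store the number of bytes the
 * group covers in *total and return true. Return false if a request is
 * neither contiguous with the covered range nor contained in it.
 */
static bool group_covered_bytes(const struct range *reqs, size_t n,
				unsigned int *total)
{
	unsigned int head, covered = 0;
	size_t i;

	assert(reqs != NULL && total != NULL);
	if (n == 0)
		return false;

	head = reqs[0].offset;
	for (i = 0; i < n; i++) {
		if (reqs[i].offset == head + covered) {
			/* contiguous: extends the covered range */
			covered += reqs[i].bytes;
		} else if (reqs[i].offset < head ||
			   reqs[i].offset + reqs[i].bytes > head + covered) {
			/* outside the range covered so far */
			return false;
		}
		/* otherwise: a repeat of an already covered range */
	}
	*total = covered;
	return true;
}
```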
fs/nfs/write.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ab392af..a6eadac 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -473,13 +473,18 @@ try_again:
do {
/*
* Subrequests are always contiguous, non overlapping
- * and in order. If not, it's a programming error.
+ * and in order - but may be repeated (mirrored writes).
*/
- WARN_ON_ONCE(subreq->wb_offset !=
- (head->wb_offset + total_bytes));
-
- /* keep track of how many bytes this group covers */
- total_bytes += subreq->wb_bytes;
+ if (subreq->wb_offset == (head->wb_offset + total_bytes)) {
+ /* keep track of how many bytes this group covers */
+ total_bytes += subreq->wb_bytes;
+ } else if (WARN_ON_ONCE(subreq->wb_offset < head->wb_offset ||
+ ((subreq->wb_offset + subreq->wb_bytes) >
+ (head->wb_offset + total_bytes)))) {
+ nfs_page_group_unlock(head);
+ spin_unlock(&inode->i_lock);
+ return ERR_PTR(-EIO);
+ }
if (!nfs_lock_request(subreq)) {
/* releases page group bit lock and
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Rename the 'ds_idx' member of struct nfs_pgio_header to 'ds_commit_idx'.
'ds_commit_idx' is a better name - it is used to select the right
commit bucket for pnfs.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 14 ++++++--------
fs/nfs/filelayout/filelayout.c | 4 ++--
include/linux/nfs_xdr.h | 2 +-
3 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e84f764..d7c2d43 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -112,22 +112,22 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
* nfs_direct_select_verf - select the right verifier
* @dreq - direct request possibly spanning multiple servers
* @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
- * @ds_idx - index of data server in data server list, only valid if ds_clp set
+ * @commit_idx - commit bucket index for the DS
*
* returns the correct verifier to use given the role of the server
*/
static struct nfs_writeverf *
nfs_direct_select_verf(struct nfs_direct_req *dreq,
struct nfs_client *ds_clp,
- int ds_idx)
+ int commit_idx)
{
struct nfs_writeverf *verfp = &dreq->verf;
#ifdef CONFIG_NFS_V4_1
if (ds_clp) {
/* pNFS is in use, use the DS verf */
- if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
- verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
+ if (commit_idx >= 0 && commit_idx < dreq->ds_cinfo.nbuckets)
+ verfp = &dreq->ds_cinfo.buckets[commit_idx].direct_verf;
else
WARN_ON_ONCE(1);
}
@@ -148,8 +148,7 @@ static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
{
struct nfs_writeverf *verfp;
- verfp = nfs_direct_select_verf(dreq, hdr->ds_clp,
- hdr->ds_idx);
+ verfp = nfs_direct_select_verf(dreq, hdr->ds_clp, hdr->ds_commit_idx);
WARN_ON_ONCE(verfp->committed >= 0);
memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
WARN_ON_ONCE(verfp->committed < 0);
@@ -169,8 +168,7 @@ static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
{
struct nfs_writeverf *verfp;
- verfp = nfs_direct_select_verf(dreq, hdr->ds_clp,
- hdr->ds_idx);
+ verfp = nfs_direct_select_verf(dreq, hdr->ds_clp, hdr->ds_commit_idx);
if (verfp->committed < 0) {
nfs_direct_set_hdr_verf(dreq, hdr);
return 0;
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 2af32fc..520cbc5 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -492,7 +492,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
/* No multipath support. Use first DS */
atomic_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- hdr->ds_idx = idx;
+ hdr->ds_commit_idx = idx;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
hdr->args.fh = fh;
@@ -536,7 +536,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->pgio_done_cb = filelayout_write_done_cb;
atomic_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- hdr->ds_idx = idx;
+ hdr->ds_commit_idx = idx;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
hdr->args.fh = fh;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 4fd7793..5bc99f0 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1328,7 +1328,7 @@ struct nfs_pgio_header {
__u64 mds_offset; /* Filelayout dense stripe */
struct nfs_page_array page_array;
struct nfs_client *ds_clp; /* pNFS data server */
- int ds_idx; /* ds index if ds_clp is set */
+ int ds_commit_idx; /* ds index if ds_clp is set */
};
struct nfs_mds_commit_info {
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Pass ds_commit_idx through the nfs commit path. It's used to select
the commit bucket when using pnfs and is ignored when not using pnfs.
Several functions had to be changed: nfs_retry_commit,
nfs_mark_request_commit, pnfs_mark_request_commit and the pnfs layout
driver .mark_request_commit functions.
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/direct.c | 5 +++--
fs/nfs/filelayout/filelayout.c | 3 ++-
fs/nfs/internal.h | 6 ++++--
fs/nfs/pnfs.h | 9 +++++----
fs/nfs/pnfs_nfsio.c | 4 ++--
fs/nfs/write.c | 14 ++++++++------
6 files changed, 24 insertions(+), 17 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index d7c2d43..1ee41d7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -649,7 +649,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
nfs_list_remove_request(req);
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) {
/* Note the rewrite will go through mds */
- nfs_mark_request_commit(req, NULL, &cinfo);
+ nfs_mark_request_commit(req, NULL, &cinfo, 0);
} else
nfs_release_request(req);
nfs_unlock_and_release_request(req);
@@ -748,7 +748,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
nfs_list_remove_request(req);
if (request_commit) {
kref_get(&req->wb_kref);
- nfs_mark_request_commit(req, hdr->lseg, &cinfo);
+ nfs_mark_request_commit(req, hdr->lseg, &cinfo,
+ hdr->ds_commit_idx);
}
nfs_unlock_and_release_request(req);
}
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 520cbc5..3c97694 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -954,7 +954,8 @@ static u32 select_bucket_index(struct nfs4_filelayout_segment *fl, u32 j)
static void
filelayout_mark_request_commit(struct nfs_page *req,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx)
{
struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e9305e9..05f9a87 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -450,13 +450,15 @@ int nfs_scan_commit(struct inode *inode, struct list_head *dst,
struct nfs_commit_info *cinfo);
void nfs_mark_request_commit(struct nfs_page *req,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo);
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx);
int nfs_write_need_commit(struct nfs_pgio_header *);
int nfs_generic_commit_list(struct inode *inode, struct list_head *head,
int how, struct nfs_commit_info *cinfo);
void nfs_retry_commit(struct list_head *page_list,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo);
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx);
void nfs_commitdata_release(struct nfs_commit_data *data);
void nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
struct nfs_commit_info *cinfo);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d3cbb6e..ebb4e82 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -137,7 +137,8 @@ struct pnfs_layoutdriver_type {
struct pnfs_ds_commit_info *(*get_ds_info) (struct inode *inode);
void (*mark_request_commit) (struct nfs_page *req,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo);
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx);
void (*clear_request_commit) (struct nfs_page *req,
struct nfs_commit_info *cinfo);
int (*scan_commit_lists) (struct nfs_commit_info *cinfo,
@@ -388,14 +389,14 @@ pnfs_generic_mark_devid_invalid(struct nfs4_deviceid_node *node)
static inline bool
pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo, u32 ds_commit_idx)
{
struct inode *inode = req->wb_context->dentry->d_inode;
struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
if (lseg == NULL || ld->mark_request_commit == NULL)
return false;
- ld->mark_request_commit(req, lseg, cinfo);
+ ld->mark_request_commit(req, lseg, cinfo, ds_commit_idx);
return true;
}
@@ -573,7 +574,7 @@ pnfs_get_ds_info(struct inode *inode)
static inline bool
pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo, u32 ds_commit_idx)
{
return false;
}
diff --git a/fs/nfs/pnfs_nfsio.c b/fs/nfs/pnfs_nfsio.c
index c6c3efc..842258e 100644
--- a/fs/nfs/pnfs_nfsio.c
+++ b/fs/nfs/pnfs_nfsio.c
@@ -183,7 +183,7 @@ static void pnfs_generic_retry_commit(struct nfs_commit_info *cinfo, int idx)
bucket = &fl_cinfo->buckets[i];
if (list_empty(&bucket->committing))
continue;
- nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
+ nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo, i);
spin_lock(cinfo->lock);
freeme = bucket->clseg;
bucket->clseg = NULL;
@@ -242,7 +242,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
list_add(&data->pages, &list);
nreq++;
} else {
- nfs_retry_commit(mds_pages, NULL, cinfo);
+ nfs_retry_commit(mds_pages, NULL, cinfo, 0);
pnfs_generic_retry_commit(cinfo, 0);
cinfo->completion_ops->error_cleanup(NFS_I(inode));
return -ENOMEM;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a6eadac..db802d9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -847,9 +847,9 @@ EXPORT_SYMBOL_GPL(nfs_init_cinfo);
*/
void
nfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo, u32 ds_commit_idx)
{
- if (pnfs_mark_request_commit(req, lseg, cinfo))
+ if (pnfs_mark_request_commit(req, lseg, cinfo, ds_commit_idx))
return;
nfs_request_add_commit_list(req, &cinfo->mds->list, cinfo);
}
@@ -905,7 +905,8 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
}
if (nfs_write_need_commit(hdr)) {
memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
- nfs_mark_request_commit(req, hdr->lseg, &cinfo);
+ nfs_mark_request_commit(req, hdr->lseg, &cinfo,
+ 0);
goto next;
}
remove_req:
@@ -1561,14 +1562,15 @@ EXPORT_SYMBOL_GPL(nfs_init_commit);
void nfs_retry_commit(struct list_head *page_list,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx)
{
struct nfs_page *req;
while (!list_empty(page_list)) {
req = nfs_list_entry(page_list->next);
nfs_list_remove_request(req);
- nfs_mark_request_commit(req, lseg, cinfo);
+ nfs_mark_request_commit(req, lseg, cinfo, ds_commit_idx);
if (!cinfo->dreq) {
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
dec_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
@@ -1599,7 +1601,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how, 0);
out_bad:
- nfs_retry_commit(head, NULL, cinfo);
+ nfs_retry_commit(head, NULL, cinfo, 0);
cinfo->completion_ops->error_cleanup(NFS_I(inode));
return -ENOMEM;
}
--
1.9.3
From: Weston Andros Adamson <[email protected]>
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.
The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
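Note (not part of the commit message): a reduced user-space model of the
split described above, assuming only two per-mirror counters. The names
mirror_state, desc, desc_init, desc_add_request and desc_complete are
hypothetical, and MIRROR_MAX simply restates the limit of 16 mentioned
in the description. The point is that each mirror carries its own
queued and written byte counts while the descriptor keeps the shared
fields.

```c
#include <assert.h>
#include <stddef.h>

#define MIRROR_MAX 16	/* limit named in the patch description */

/* State that cannot be shared between mirrors. */
struct mirror_state {
	size_t count;		/* bytes queued, not yet issued */
	size_t bytes_written;	/* bytes issued so far */
};

/* State shared by all mirrors, plus the per-mirror array. */
struct desc {
	struct mirror_state mirrors[MIRROR_MAX];
	unsigned int mirror_count;
	unsigned int mirror_idx;
};

static void desc_init(struct desc *d, unsigned int mirror_count)
{
	unsigned int i;

	assert(mirror_count >= 1 && mirror_count <= MIRROR_MAX);
	for (i = 0; i < MIRROR_MAX; i++) {
		d->mirrors[i].count = 0;
		d->mirrors[i].bytes_written = 0;
	}
	d->mirror_count = mirror_count;
	d->mirror_idx = 0;
}

/* Queue the same request once on every active mirror. */
static void desc_add_request(struct desc *d, size_t bytes)
{
	for (d->mirror_idx = 0; d->mirror_idx < d->mirror_count;
	     d->mirror_idx++)
		d->mirrors[d->mirror_idx].count += bytes;
	d->mirror_idx = 0;
}

/* Flush every active mirror in turn. */
static void desc_complete(struct desc *d)
{
	unsigned int i;

	for (i = 0; i < d->mirror_count; i++) {
		struct mirror_state *m = &d->mirrors[i];

		m->bytes_written += m->count;
		m->count = 0;
	}
}
```

With mirror_count left at 1 the model behaves like the pre-patch
descriptor, which matches the "default is one mirror" statement above.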
fs/nfs/direct.c | 17 ++-
fs/nfs/internal.h | 1 +
fs/nfs/objlayout/objio_osd.c | 3 +-
fs/nfs/pagelist.c | 270 +++++++++++++++++++++++++++++++++++--------
fs/nfs/pnfs.c | 26 +++--
fs/nfs/read.c | 30 ++++-
fs/nfs/write.c | 10 +-
include/linux/nfs_page.h | 20 +++-
include/linux/nfs_xdr.h | 1 +
9 files changed, 311 insertions(+), 67 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1ee41d7..0178d4f 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -360,8 +360,14 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
spin_lock(&dreq->lock);
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
dreq->error = hdr->error;
- else
- dreq->count += hdr->good_bytes;
+ else {
+ /*
+ * FIXME: right now this only accounts for bytes written
+ * to the first mirror
+ */
+ if (hdr->pgio_mirror_idx == 0)
+ dreq->count += hdr->good_bytes;
+ }
spin_unlock(&dreq->lock);
while (!list_empty(&hdr->pages)) {
@@ -724,7 +730,12 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
dreq->error = hdr->error;
}
if (dreq->error == 0) {
- dreq->count += hdr->good_bytes;
+ /*
+ * FIXME: right now this only accounts for bytes written
+ * to the first mirror
+ */
+ if (hdr->pgio_mirror_idx == 0)
+ dreq->count += hdr->good_bytes;
if (nfs_write_need_commit(hdr)) {
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
request_commit = true;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 05f9a87..ef1c703 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -469,6 +469,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
struct nfs_direct_req *dreq);
int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
+void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index d007780..9a5f2ee 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -537,11 +537,12 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req)
{
+ struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
unsigned int size;
size = pnfs_generic_pg_test(pgio, prev, req);
- if (!size || pgio->pg_count + req->wb_bytes >
+ if (!size || mirror->pg_count + req->wb_bytes >
(unsigned long)pgio->pg_layout_private)
return 0;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 1c03187..eec12b7 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -46,17 +46,22 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr,
void (*release)(struct nfs_pgio_header *hdr))
{
- hdr->req = nfs_list_entry(desc->pg_list.next);
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
+ hdr->req = nfs_list_entry(mirror->pg_list.next);
hdr->inode = desc->pg_inode;
hdr->cred = hdr->req->wb_context->cred;
hdr->io_start = req_offset(hdr->req);
- hdr->good_bytes = desc->pg_count;
+ hdr->good_bytes = mirror->pg_count;
hdr->dreq = desc->pg_dreq;
hdr->layout_private = desc->pg_layout_private;
hdr->release = release;
hdr->completion_ops = desc->pg_completion_ops;
if (hdr->completion_ops->init_hdr)
hdr->completion_ops->init_hdr(hdr);
+
+ hdr->pgio_mirror_idx = desc->pg_mirror_idx;
}
EXPORT_SYMBOL_GPL(nfs_pgheader_init);
@@ -480,7 +485,10 @@ nfs_wait_on_request(struct nfs_page *req)
size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev, struct nfs_page *req)
{
- if (desc->pg_count > desc->pg_bsize) {
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
+ if (mirror->pg_count > mirror->pg_bsize) {
/* should never happen */
WARN_ON_ONCE(1);
return 0;
@@ -490,11 +498,11 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
* Limit the request size so that we can still allocate a page array
* for it without upsetting the slab allocator.
*/
- if (((desc->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
+ if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
sizeof(struct page) > PAGE_SIZE)
return 0;
- return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
+ return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
}
EXPORT_SYMBOL_GPL(nfs_generic_pg_test);
@@ -651,10 +659,18 @@ EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror;
+ u32 midx;
+
set_bit(NFS_IOHDR_REDO, &hdr->flags);
nfs_pgio_data_destroy(hdr);
hdr->completion_ops->completion(hdr);
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ /* TODO: Make sure it's right to clean up all mirrors here
+ * and not just hdr->pgio_mirror_idx */
+ for (midx = 0; midx < desc->pg_mirror_count; midx++) {
+ mirror = &desc->pg_mirrors[midx];
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
+ }
return -ENOMEM;
}
@@ -671,6 +687,17 @@ static void nfs_pgio_release(void *calldata)
hdr->completion_ops->completion(hdr);
}
+static void nfs_pageio_mirror_init(struct nfs_pgio_mirror *mirror,
+ unsigned int bsize)
+{
+ INIT_LIST_HEAD(&mirror->pg_list);
+ mirror->pg_bytes_written = 0;
+ mirror->pg_count = 0;
+ mirror->pg_bsize = bsize;
+ mirror->pg_base = 0;
+ mirror->pg_recoalesce = 0;
+}
+
/**
* nfs_pageio_init - initialise a page io descriptor
* @desc: pointer to descriptor
@@ -687,13 +714,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
size_t bsize,
int io_flags)
{
- INIT_LIST_HEAD(&desc->pg_list);
- desc->pg_bytes_written = 0;
- desc->pg_count = 0;
- desc->pg_bsize = bsize;
- desc->pg_base = 0;
+ struct nfs_pgio_mirror *new;
+ int i;
+
desc->pg_moreio = 0;
- desc->pg_recoalesce = 0;
desc->pg_inode = inode;
desc->pg_ops = pg_ops;
desc->pg_completion_ops = compl_ops;
@@ -703,6 +727,26 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
desc->pg_lseg = NULL;
desc->pg_dreq = NULL;
desc->pg_layout_private = NULL;
+ desc->pg_bsize = bsize;
+
+ desc->pg_mirror_count = 1;
+ desc->pg_mirror_idx = 0;
+
+ if (pg_ops->pg_get_mirror_count) {
+ /* until we have a request, we don't have an lseg and no
+ * idea how many mirrors there will be */
+ new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
+ sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
+ desc->pg_mirrors_dynamic = new;
+ desc->pg_mirrors = new;
+
+ for (i = 0; i < NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX; i++)
+ nfs_pageio_mirror_init(&desc->pg_mirrors[i], bsize);
+ } else {
+ desc->pg_mirrors_dynamic = NULL;
+ desc->pg_mirrors = desc->pg_mirrors_static;
+ nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
+ }
}
EXPORT_SYMBOL_GPL(nfs_pageio_init);
@@ -738,14 +782,16 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_page *req;
struct page **pages,
*last_page;
- struct list_head *head = &desc->pg_list;
+ struct list_head *head = &mirror->pg_list;
struct nfs_commit_info cinfo;
unsigned int pagecount, pageused;
- pagecount = nfs_page_array_len(desc->pg_base, desc->pg_count);
+ pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
if (!nfs_pgarray_set(&hdr->page_array, pagecount))
return nfs_pgio_error(desc, hdr);
@@ -773,7 +819,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
desc->pg_ioflags &= ~FLUSH_COND_STABLE;
/* Set up the argument struct */
- nfs_pgio_rpcsetup(hdr, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
+ nfs_pgio_rpcsetup(hdr, mirror->pg_count, 0, desc->pg_ioflags, &cinfo);
desc->pg_rpc_callops = &nfs_pgio_common_ops;
return 0;
}
@@ -781,12 +827,17 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror;
struct nfs_pgio_header *hdr;
int ret;
+ mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ /* TODO: make sure this is right with mirroring - or
+ * should it back out all mirrors? */
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
@@ -801,6 +852,49 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
return ret;
}
+/*
+ * nfs_pageio_setup_mirroring - determine if mirroring is to be used
+ * by calling the pg_get_mirror_count op
+ */
+static int nfs_pageio_setup_mirroring(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ int mirror_count = 1;
+
+ if (!pgio->pg_ops->pg_get_mirror_count)
+ return 0;
+
+ mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
+
+ if (!mirror_count || mirror_count > NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX)
+ return -EINVAL;
+
+ if (WARN_ON_ONCE(!pgio->pg_mirrors_dynamic))
+ return -EINVAL;
+
+ pgio->pg_mirror_count = mirror_count;
+
+ return 0;
+}
+
+/*
+ * nfs_pageio_stop_mirroring - stop using mirroring (set mirror count to 1)
+ */
+void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio)
+{
+ pgio->pg_mirror_count = 1;
+ pgio->pg_mirror_idx = 0;
+}
+
+static void nfs_pageio_cleanup_mirroring(struct nfs_pageio_descriptor *pgio)
+{
+ pgio->pg_mirror_count = 1;
+ pgio->pg_mirror_idx = 0;
+ pgio->pg_mirrors = pgio->pg_mirrors_static;
+ kfree(pgio->pg_mirrors_dynamic);
+ pgio->pg_mirrors_dynamic = NULL;
+}
+
static bool nfs_match_open_context(const struct nfs_open_context *ctx1,
const struct nfs_open_context *ctx2)
{
@@ -867,19 +961,22 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_page *prev = NULL;
- if (desc->pg_count != 0) {
- prev = nfs_list_entry(desc->pg_list.prev);
+
+ if (mirror->pg_count != 0) {
+ prev = nfs_list_entry(mirror->pg_list.prev);
} else {
if (desc->pg_ops->pg_init)
desc->pg_ops->pg_init(desc, req);
- desc->pg_base = req->wb_pgbase;
+ mirror->pg_base = req->wb_pgbase;
}
if (!nfs_can_coalesce_requests(prev, req, desc))
return 0;
nfs_list_remove_request(req);
- nfs_list_add_request(req, &desc->pg_list);
- desc->pg_count += req->wb_bytes;
+ nfs_list_add_request(req, &mirror->pg_list);
+ mirror->pg_count += req->wb_bytes;
return 1;
}
@@ -888,16 +985,19 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
*/
static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
{
- if (!list_empty(&desc->pg_list)) {
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
+ if (!list_empty(&mirror->pg_list)) {
int error = desc->pg_ops->pg_doio(desc);
if (error < 0)
desc->pg_error = error;
else
- desc->pg_bytes_written += desc->pg_count;
+ mirror->pg_bytes_written += mirror->pg_count;
}
- if (list_empty(&desc->pg_list)) {
- desc->pg_count = 0;
- desc->pg_base = 0;
+ if (list_empty(&mirror->pg_list)) {
+ mirror->pg_count = 0;
+ mirror->pg_base = 0;
}
}
@@ -915,10 +1015,14 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_page *subreq;
unsigned int bytes_left = 0;
unsigned int offset, pgbase;
+ WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
+
nfs_page_group_lock(req, false);
subreq = req;
@@ -938,7 +1042,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
nfs_pageio_doio(desc);
if (desc->pg_error < 0)
return 0;
- if (desc->pg_recoalesce)
+ if (mirror->pg_recoalesce)
return 0;
/* retry add_request for this subreq */
nfs_page_group_lock(req, false);
@@ -976,14 +1080,16 @@ err_ptr:
static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
LIST_HEAD(head);
do {
- list_splice_init(&desc->pg_list, &head);
- desc->pg_bytes_written -= desc->pg_count;
- desc->pg_count = 0;
- desc->pg_base = 0;
- desc->pg_recoalesce = 0;
+ list_splice_init(&mirror->pg_list, &head);
+ mirror->pg_bytes_written -= mirror->pg_count;
+ mirror->pg_count = 0;
+ mirror->pg_base = 0;
+ mirror->pg_recoalesce = 0;
+
desc->pg_moreio = 0;
while (!list_empty(&head)) {
@@ -997,11 +1103,11 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
return 0;
break;
}
- } while (desc->pg_recoalesce);
+ } while (mirror->pg_recoalesce);
return 1;
}
-int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
+static int nfs_pageio_add_request_mirror(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
int ret;
@@ -1014,9 +1120,78 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
break;
ret = nfs_do_recoalesce(desc);
} while (ret);
+
return ret;
}
+int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
+ struct nfs_page *req)
+{
+ u32 midx;
+ unsigned int pgbase, offset, bytes;
+ struct nfs_page *dupreq, *lastreq;
+
+ pgbase = req->wb_pgbase;
+ offset = req->wb_offset;
+ bytes = req->wb_bytes;
+
+ nfs_pageio_setup_mirroring(desc, req);
+
+ for (midx = 0; midx < desc->pg_mirror_count; midx++) {
+ if (midx) {
+ nfs_page_group_lock(req, false);
+
+ /* find the last request */
+ for (lastreq = req->wb_head;
+ lastreq->wb_this_page != req->wb_head;
+ lastreq = lastreq->wb_this_page)
+ ;
+
+ dupreq = nfs_create_request(req->wb_context,
+ req->wb_page, lastreq, pgbase, bytes);
+
+ if (IS_ERR(dupreq)) {
+ nfs_page_group_unlock(req);
+ return 0;
+ }
+
+ nfs_lock_request(dupreq);
+ nfs_page_group_unlock(req);
+ dupreq->wb_offset = offset;
+ dupreq->wb_index = req->wb_index;
+ } else
+ dupreq = req;
+
+ desc->pg_mirror_idx = midx;
+ if (!nfs_pageio_add_request_mirror(desc, dupreq))
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * nfs_pageio_complete_mirror - Complete I/O on the current mirror of an
+ * nfs_pageio_descriptor
+ * @desc: pointer to io descriptor
+ */
+static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
+ u32 mirror_idx)
+{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
+ u32 restore_idx = desc->pg_mirror_idx;
+
+ desc->pg_mirror_idx = mirror_idx;
+ for (;;) {
+ nfs_pageio_doio(desc);
+ if (!mirror->pg_recoalesce)
+ break;
+ if (!nfs_do_recoalesce(desc))
+ break;
+ }
+ desc->pg_mirror_idx = restore_idx;
+}
+
/*
* nfs_pageio_resend - Transfer requests to new descriptor and resend
* @hdr - the pgio header to move request from
@@ -1055,16 +1230,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_resend);
*/
void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
{
- for (;;) {
- nfs_pageio_doio(desc);
- if (!desc->pg_recoalesce)
- break;
- if (!nfs_do_recoalesce(desc))
- break;
- }
+ u32 midx;
+
+ for (midx = 0; midx < desc->pg_mirror_count; midx++)
+ nfs_pageio_complete_mirror(desc, midx);
if (desc->pg_ops->pg_cleanup)
desc->pg_ops->pg_cleanup(desc);
+ nfs_pageio_cleanup_mirroring(desc);
}
/**
@@ -1080,10 +1253,17 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
*/
void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
{
- if (!list_empty(&desc->pg_list)) {
- struct nfs_page *prev = nfs_list_entry(desc->pg_list.prev);
- if (index != prev->wb_index + 1)
- nfs_pageio_complete(desc);
+ struct nfs_pgio_mirror *mirror;
+ struct nfs_page *prev;
+ u32 midx;
+
+ for (midx = 0; midx < desc->pg_mirror_count; midx++) {
+ mirror = &desc->pg_mirrors[midx];
+ if (!list_empty(&mirror->pg_list)) {
+ prev = nfs_list_entry(mirror->pg_list.prev);
+ if (index != prev->wb_index + 1)
+ nfs_pageio_complete_mirror(desc, midx);
+ }
}
}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2da2e77..5f7c422 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1646,8 +1646,8 @@ EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
* of bytes (maximum @req->wb_bytes) that can be coalesced.
*/
size_t
-pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
- struct nfs_page *req)
+pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *prev, struct nfs_page *req)
{
unsigned int size;
u64 seg_end, req_start, seg_left;
@@ -1729,10 +1729,12 @@ static void
pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
- list_splice_tail_init(&hdr->pages, &desc->pg_list);
+ list_splice_tail_init(&hdr->pages, &mirror->pg_list);
nfs_pageio_reset_write_mds(desc);
- desc->pg_recoalesce = 1;
+ mirror->pg_recoalesce = 1;
}
nfs_pgio_data_destroy(hdr);
}
@@ -1781,12 +1783,14 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
int
pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_pgio_header *hdr;
int ret;
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
@@ -1795,6 +1799,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (!ret)
pnfs_do_write(desc, hdr, desc->pg_ioflags);
+
return ret;
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
@@ -1839,10 +1844,13 @@ static void
pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
- list_splice_tail_init(&hdr->pages, &desc->pg_list);
+ list_splice_tail_init(&hdr->pages, &mirror->pg_list);
nfs_pageio_reset_read_mds(desc);
- desc->pg_recoalesce = 1;
+ mirror->pg_recoalesce = 1;
}
nfs_pgio_data_destroy(hdr);
}
@@ -1893,12 +1901,14 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
int
pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_pgio_header *hdr;
int ret;
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 092ab49..568ecf0 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -70,8 +70,15 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
{
+ struct nfs_pgio_mirror *mirror;
+
pgio->pg_ops = &nfs_pgio_rw_ops;
- pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
+
+ /* read path should never have more than one mirror */
+ WARN_ON_ONCE(pgio->pg_mirror_count != 1);
+
+ mirror = &pgio->pg_mirrors[0];
+ mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
}
EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
@@ -81,6 +88,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
struct nfs_page *new;
unsigned int len;
struct nfs_pageio_descriptor pgio;
+ struct nfs_pgio_mirror *pgm;
len = nfs_page_length(page);
if (len == 0)
@@ -97,7 +105,13 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
&nfs_async_read_completion_ops);
nfs_pageio_add_request(&pgio, new);
nfs_pageio_complete(&pgio);
- NFS_I(inode)->read_io += pgio.pg_bytes_written;
+
+ /* It doesn't make sense to do mirrored reads! */
+ WARN_ON_ONCE(pgio.pg_mirror_count != 1);
+
+ pgm = &pgio.pg_mirrors[0];
+ NFS_I(inode)->read_io += pgm->pg_bytes_written;
+
return 0;
}
@@ -352,6 +366,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages)
{
struct nfs_pageio_descriptor pgio;
+ struct nfs_pgio_mirror *pgm;
struct nfs_readdesc desc = {
.pgio = &pgio,
};
@@ -387,10 +402,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
&nfs_async_read_completion_ops);
ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
-
nfs_pageio_complete(&pgio);
- NFS_I(inode)->read_io += pgio.pg_bytes_written;
- npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+
+ /* It doesn't make sense to do mirrored reads! */
+ WARN_ON_ONCE(pgio.pg_mirror_count != 1);
+
+ pgm = &pgio.pg_mirrors[0];
+ NFS_I(inode)->read_io += pgm->pg_bytes_written;
+ npages = (pgm->pg_bytes_written + PAGE_CACHE_SIZE - 1) >>
+ PAGE_CACHE_SHIFT;
nfs_add_stats(inode, NFSIOS_READPAGES, npages);
read_complete:
put_nfs_open_context(desc.ctx);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index db802d9..2f6ee8e 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -906,7 +906,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
if (nfs_write_need_commit(hdr)) {
memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
nfs_mark_request_commit(req, hdr->lseg, &cinfo,
- 0);
+ hdr->pgio_mirror_idx);
goto next;
}
remove_req:
@@ -1305,8 +1305,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_write);
void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
{
+ struct nfs_pgio_mirror *mirror;
+
pgio->pg_ops = &nfs_pgio_rw_ops;
- pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
+
+ nfs_pageio_stop_mirroring(pgio);
+
+ mirror = &pgio->pg_mirrors[0];
+ mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
}
EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 479c566..3eb072d 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -58,6 +58,8 @@ struct nfs_pageio_ops {
size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
struct nfs_page *);
int (*pg_doio)(struct nfs_pageio_descriptor *);
+ unsigned int (*pg_get_mirror_count)(struct nfs_pageio_descriptor *,
+ struct nfs_page *);
void (*pg_cleanup)(struct nfs_pageio_descriptor *);
};
@@ -74,15 +76,17 @@ struct nfs_rw_ops {
struct rpc_task_setup *, int);
};
-struct nfs_pageio_descriptor {
+struct nfs_pgio_mirror {
struct list_head pg_list;
unsigned long pg_bytes_written;
size_t pg_count;
size_t pg_bsize;
unsigned int pg_base;
- unsigned char pg_moreio : 1,
- pg_recoalesce : 1;
+ unsigned char pg_recoalesce : 1;
+};
+struct nfs_pageio_descriptor {
+ unsigned char pg_moreio : 1;
struct inode *pg_inode;
const struct nfs_pageio_ops *pg_ops;
const struct nfs_rw_ops *pg_rw_ops;
@@ -93,8 +97,18 @@ struct nfs_pageio_descriptor {
struct pnfs_layout_segment *pg_lseg;
struct nfs_direct_req *pg_dreq;
void *pg_layout_private;
+ unsigned int pg_bsize; /* default bsize for mirrors */
+
+ u32 pg_mirror_count;
+ struct nfs_pgio_mirror *pg_mirrors;
+ struct nfs_pgio_mirror pg_mirrors_static[1];
+ struct nfs_pgio_mirror *pg_mirrors_dynamic;
+ u32 pg_mirror_idx; /* current mirror */
};
+/* arbitrarily selected limit to number of mirrors */
+#define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
+
#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 5bc99f0..6400a1e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1329,6 +1329,7 @@ struct nfs_pgio_header {
struct nfs_page_array page_array;
struct nfs_client *ds_clp; /* pNFS data server */
int ds_commit_idx; /* ds index if ds_clp is set */
+ int pgio_mirror_idx;/* mirror index in pgio layer */
};
struct nfs_mds_commit_info {
--
1.9.3
From: Weston Andros Adamson <[email protected]>
The current mirroring code only notices short writes to the first
mirror. This patch keeps per-mirror byte counts and only considers
a byte to be written once all mirrors report so.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
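Note for readers: the accounting rule described above (a byte counts as
written only once every mirror has reported it) can be sketched in
isolation as follows. This is a minimal userspace illustration, not the
kernel code; the sketch_* names and SKETCH_MIRROR_MAX are hypothetical
stand-ins for nfs_direct_req, its mirrors[] array and
NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIRROR_MAX 16	/* hypothetical stand-in for the mirror limit */

/* Hypothetical stand-in for the per-mirror state kept in nfs_direct_req. */
struct sketch_dreq {
	long mirror_bytes[SKETCH_MIRROR_MAX];	/* bytes acknowledged per mirror */
	int mirror_count;			/* number of mirrors in use */
	long count;				/* bytes acknowledged by all mirrors */
};

/* Credit good_bytes to one mirror, then recompute the agreed total. */
static void sketch_good_bytes(struct sketch_dreq *dreq, int mirror_idx,
			      long good_bytes)
{
	long count;
	int i;

	assert(mirror_idx >= 0 && mirror_idx < dreq->mirror_count);

	dreq->mirror_bytes[mirror_idx] += good_bytes;

	/* A byte counts only once every mirror has reported it. */
	count = dreq->mirror_bytes[0];
	for (i = 1; i < dreq->mirror_count; i++)
		if (dreq->mirror_bytes[i] < count)
			count = dreq->mirror_bytes[i];

	dreq->count = count;
}
```

With two mirrors, crediting 4096 bytes to mirror 0 leaves the total at 0
until mirror 1 reports as well; a short write on either mirror therefore
caps the total.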
fs/nfs/direct.c | 71 +++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 57 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 0178d4f..651387b 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -66,6 +66,10 @@ static struct kmem_cache *nfs_direct_cachep;
/*
* This represents a set of asynchronous requests that we're waiting on
*/
+struct nfs_direct_mirror {
+ ssize_t count;
+};
+
struct nfs_direct_req {
struct kref kref; /* release manager */
@@ -78,6 +82,10 @@ struct nfs_direct_req {
/* completion state */
atomic_t io_count; /* i/os we're waiting for */
spinlock_t lock; /* protect completion state */
+
+ struct nfs_direct_mirror mirrors[NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX];
+ int mirror_count;
+
ssize_t count, /* bytes actually processed */
bytes_left, /* bytes left to be sent */
error; /* any reported error */
@@ -108,6 +116,29 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
return atomic_dec_and_test(&dreq->io_count);
}
+static void
+nfs_direct_good_bytes(struct nfs_direct_req *dreq, struct nfs_pgio_header *hdr)
+{
+ int i;
+ ssize_t count;
+
+ WARN_ON_ONCE(hdr->pgio_mirror_idx >= dreq->mirror_count);
+
+ dreq->mirrors[hdr->pgio_mirror_idx].count += hdr->good_bytes;
+
+ if (hdr->pgio_mirror_idx == 0)
+ dreq->count += hdr->good_bytes;
+
+ /* update the dreq->count by finding the minimum agreed count from all
+ * mirrors */
+ count = dreq->mirrors[0].count;
+
+ for (i = 1; i < dreq->mirror_count; i++)
+ count = min(count, dreq->mirrors[i].count);
+
+ dreq->count = count;
+}
+
/*
* nfs_direct_select_verf - select the right verifier
* @dreq - direct request possibly spanning multiple servers
@@ -241,6 +272,18 @@ void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
cinfo->completion_ops = &nfs_direct_commit_completion_ops;
}
+static inline void nfs_direct_setup_mirroring(struct nfs_direct_req *dreq,
+ struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ int mirror_count = 1;
+
+ if (pgio->pg_ops->pg_get_mirror_count)
+ mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
+
+ dreq->mirror_count = mirror_count;
+}
+
static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
{
struct nfs_direct_req *dreq;
@@ -255,6 +298,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
INIT_LIST_HEAD(&dreq->mds_cinfo.list);
dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
+ dreq->mirror_count = 1;
spin_lock_init(&dreq->lock);
return dreq;
@@ -360,14 +404,9 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
spin_lock(&dreq->lock);
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
dreq->error = hdr->error;
- else {
- /*
- * FIXME: right now this only accounts for bytes written
- * to the first mirror
- */
- if (hdr->pgio_mirror_idx == 0)
- dreq->count += hdr->good_bytes;
- }
+ else
+ nfs_direct_good_bytes(dreq, hdr);
+
spin_unlock(&dreq->lock);
while (!list_empty(&hdr->pages)) {
@@ -598,17 +637,23 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
LIST_HEAD(reqs);
struct nfs_commit_info cinfo;
LIST_HEAD(failed);
+ int i;
nfs_init_cinfo_from_dreq(&cinfo, dreq);
nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
dreq->count = 0;
+ for (i = 0; i < dreq->mirror_count; i++)
+ dreq->mirrors[i].count = 0;
get_dreq(dreq);
nfs_pageio_init_write(&desc, dreq->inode, FLUSH_STABLE, false,
&nfs_direct_write_completion_ops);
desc.pg_dreq = dreq;
+ req = nfs_list_entry(reqs.next);
+ nfs_direct_setup_mirroring(dreq, &desc, req);
+
list_for_each_entry_safe(req, tmp, &reqs, wb_list) {
if (!nfs_pageio_add_request(&desc, req)) {
nfs_list_remove_request(req);
@@ -730,12 +775,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
dreq->error = hdr->error;
}
if (dreq->error == 0) {
- /*
- * FIXME: right now this only accounts for bytes written
- * to the first mirror
- */
- if (hdr->pgio_mirror_idx == 0)
- dreq->count += hdr->good_bytes;
+ nfs_direct_good_bytes(dreq, hdr);
if (nfs_write_need_commit(hdr)) {
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
request_commit = true;
@@ -841,6 +881,9 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
result = PTR_ERR(req);
break;
}
+
+ nfs_direct_setup_mirroring(dreq, &desc, req);
+
nfs_lock_request(req);
req->wb_index = pos >> PAGE_SHIFT;
req->wb_offset = pos & ~PAGE_MASK;
--
1.9.3
From: Weston Andros Adamson <[email protected]>
When the verifier has not been set, return failure directly instead of
hitting the WARN_ON_ONCE. This doesn't change behavior (the memcmp would
fail anyway).
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
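Note for readers: the early return can be illustrated outside the kernel
as below. This is a sketch under stated assumptions: struct
sketch_writeverf is a simplified, hypothetical stand-in for struct
nfs_writeverf, and SKETCH_INVALID_STABLE_HOW stands in for
NFS_INVALID_STABLE_HOW. Unlike the patch, the sketch compares field by
field rather than using memcmp over the whole structure, to avoid
depending on padding bytes.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_INVALID_STABLE_HOW (-1)	/* hypothetical "not set yet" marker */

/* Hypothetical, simplified write verifier. */
struct sketch_writeverf {
	int committed;		/* negative means "never set" */
	unsigned char data[8];	/* opaque server verifier */
};

/* Returns 0 when the verifiers match, nonzero otherwise. */
static int sketch_cmp_verf(const struct sketch_writeverf *saved,
			   const struct sketch_writeverf *reply)
{
	assert(saved != NULL && reply != NULL);

	/* An unset verifier can never match, so fail without warning. */
	if (saved->committed < 0)
		return 1;

	if (saved->committed != reply->committed)
		return 1;
	return memcmp(saved->data, reply->data, sizeof(saved->data)) != 0;
}
```

A caller would treat any nonzero result as "verifier changed" and
reschedule the writes.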
fs/nfs/direct.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 651387b..eb81478 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -222,7 +222,11 @@ static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
verfp = nfs_direct_select_verf(dreq, data->ds_clp,
data->ds_commit_index);
- WARN_ON_ONCE(verfp->committed < 0);
+
+ /* verifier not set so always fail */
+ if (verfp->committed < 0)
+ return 1;
+
return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
}
--
1.9.3
From: Peng Tao <[email protected]>
So that we can detect the case where some layout segments are still
pinned when the layout is destroyed, which is surely a bug that we need
to fix.
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/pnfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5f7c422..e123cfc 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -242,6 +242,8 @@ pnfs_put_layout_hdr(struct pnfs_layout_hdr *lo)
struct inode *inode = lo->plh_inode;
if (atomic_dec_and_lock(&lo->plh_refcount, &inode->i_lock)) {
+ if (!list_empty(&lo->plh_segs))
+ WARN_ONCE(1, "NFS: BUG unfreed layout segments.\n");
pnfs_detach_layout_hdr(lo);
spin_unlock(&inode->i_lock);
pnfs_free_layout_hdr(lo);
--
1.9.3
From: Peng Tao <[email protected]>
So that we don't reset desc->pg_mirror_idx for reads unnecessarily.
Remove the WARN_ON_ONCE from __nfs_pageio_add_request to allow the LD to
set pg_mirror_idx for reads, where pg_mirror_count is always 1.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
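Note for readers: the guard described above can be sketched in userspace
as follows. The sketch_* names are hypothetical; struct sketch_desc keeps
only the two fields of nfs_pageio_descriptor that matter here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down descriptor with only the mirror bookkeeping. */
struct sketch_desc {
	unsigned int mirror_count;	/* always >= 1 */
	unsigned int mirror_idx;	/* current mirror; for reads, owned by the LD */
};

static bool sketch_has_mirroring(const struct sketch_desc *desc)
{
	assert(desc->mirror_count >= 1);
	return desc->mirror_count > 1;
}

/*
 * Select a mirror on behalf of the generic layer, leaving the index
 * untouched for unmirrored (read) descriptors.
 */
static void sketch_select_mirror(struct sketch_desc *desc, unsigned int midx)
{
	if (sketch_has_mirroring(desc))
		desc->mirror_idx = midx;
}
```

For a descriptor with a single mirror, whatever index the layout driver
stored survives a call to sketch_select_mirror(); with two or more
mirrors the generic layer's choice is applied.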
fs/nfs/internal.h | 7 +++++++
fs/nfs/pagelist.c | 8 ++++----
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index ef1c703..5be06bc 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -6,6 +6,7 @@
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/crc32.h>
+#include <linux/nfs_page.h>
#define NFS_MS_MASK (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_SYNCHRONOUS)
@@ -261,6 +262,12 @@ static inline void nfs_iocounter_init(struct nfs_io_counter *c)
atomic_set(&c->io_count, 0);
}
+static inline bool nfs_pgio_has_mirroring(struct nfs_pageio_descriptor *desc)
+{
+ WARN_ON_ONCE(desc->pg_mirror_count < 1);
+ return desc->pg_mirror_count > 1;
+}
+
/* nfs2xdr.c */
extern struct rpc_procinfo nfs_procedures[];
extern int nfs2_decode_dirent(struct xdr_stream *,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index eec12b7..f9d8c46 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1021,8 +1021,6 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
unsigned int bytes_left = 0;
unsigned int offset, pgbase;
- WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
-
nfs_page_group_lock(req, false);
subreq = req;
@@ -1162,7 +1160,8 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
} else
dupreq = req;
- desc->pg_mirror_idx = midx;
+ if (nfs_pgio_has_mirroring(desc))
+ desc->pg_mirror_idx = midx;
if (!nfs_pageio_add_request_mirror(desc, dupreq))
return 0;
}
@@ -1181,7 +1180,8 @@ static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
u32 restore_idx = desc->pg_mirror_idx;
- desc->pg_mirror_idx = mirror_idx;
+ if (nfs_pgio_has_mirroring(desc))
+ desc->pg_mirror_idx = mirror_idx;
for (;;) {
nfs_pageio_doio(desc);
if (!mirror->pg_recoalesce)
--
1.9.3
From: Peng Tao <[email protected]>
Let it return the nfs_pgio_mirror currently in use, depending on
pg_mirror_count. For reads we always use pg_mirrors[0], so this
effectively gives us the freedom to use pg_mirror_idx to track the actual
mirror to read from throughout the IO stack.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
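Note for readers: the lookup rule can be shown with a small standalone
sketch. The types below are hypothetical reductions of nfs_pgio_mirror
and nfs_pageio_descriptor; only the selection logic is meant to match
the helper added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of struct nfs_pgio_mirror. */
struct sketch_mirror {
	size_t count;
};

/* Hypothetical reduction of struct nfs_pageio_descriptor. */
struct sketch_pgio {
	struct sketch_mirror *mirrors;
	unsigned int mirror_count;	/* always >= 1 */
	unsigned int mirror_idx;	/* only indexes mirrors[] when mirroring */
};

/* Return the mirror in use: mirrors[idx] when mirroring, else mirrors[0]. */
static struct sketch_mirror *sketch_current_mirror(struct sketch_pgio *desc)
{
	assert(desc->mirror_count >= 1);
	return desc->mirror_count > 1 ? &desc->mirrors[desc->mirror_idx]
				      : &desc->mirrors[0];
}
```

Because the unmirrored case never dereferences mirrors[mirror_idx], the
index may hold a value outside the array without causing an
out-of-bounds access.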
fs/nfs/internal.h | 2 ++
fs/nfs/objlayout/objio_osd.c | 2 +-
fs/nfs/pagelist.c | 25 +++++++++++++++++--------
fs/nfs/pnfs.c | 9 ++++-----
4 files changed, 24 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5be06bc..ffe4b7a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -255,6 +255,8 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
struct rpc_cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
+struct nfs_pgio_mirror *
+nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
static inline void nfs_iocounter_init(struct nfs_io_counter *c)
{
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 9a5f2ee..24e1d74 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -537,7 +537,7 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(pgio);
unsigned int size;
size = pnfs_generic_pg_test(pgio, prev, req);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index f9d8c46..960c99f 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -42,11 +42,20 @@ static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
return p->pagevec != NULL;
}
+struct nfs_pgio_mirror *
+nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc)
+{
+ return nfs_pgio_has_mirroring(desc) ?
+ &desc->pg_mirrors[desc->pg_mirror_idx] :
+ &desc->pg_mirrors[0];
+}
+EXPORT_SYMBOL_GPL(nfs_pgio_current_mirror);
+
void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr,
void (*release)(struct nfs_pgio_header *hdr))
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
hdr->req = nfs_list_entry(mirror->pg_list.next);
@@ -485,7 +494,7 @@ nfs_wait_on_request(struct nfs_page *req)
size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev, struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (mirror->pg_count > mirror->pg_bsize) {
@@ -782,7 +791,7 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_page *req;
struct page **pages,
@@ -831,7 +840,7 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
struct nfs_pgio_header *hdr;
int ret;
- mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ mirror = nfs_pgio_current_mirror(desc);
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
@@ -961,7 +970,7 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_page *prev = NULL;
@@ -985,7 +994,7 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
*/
static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (!list_empty(&mirror->pg_list)) {
@@ -1015,7 +1024,7 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_page *subreq;
unsigned int bytes_left = 0;
@@ -1078,7 +1087,7 @@ err_ptr:
static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
LIST_HEAD(head);
do {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e123cfc..b822b17 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1731,7 +1731,7 @@ static void
pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
list_splice_tail_init(&hdr->pages, &mirror->pg_list);
@@ -1785,7 +1785,7 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
int
pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_pgio_header *hdr;
int ret;
@@ -1846,8 +1846,7 @@ static void
pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
-
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
list_splice_tail_init(&hdr->pages, &mirror->pg_list);
@@ -1903,7 +1902,7 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
int
pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_pgio_header *hdr;
int ret;
--
1.9.3
From: Peng Tao <[email protected]>
If the current IO cannot be completed due to a transient error, the
LD may want to ask the generic layer to resend the request through
pnfs again.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
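Note for readers: the decision made in pnfs_do_read() can be modelled as
a pure function. This is only a sketch; the sketch_* enums are
hypothetical, and resend_err stands for whatever pnfs_read_resend_pnfs()
would return (0 or a negative errno).

```c
#include <assert.h>

/* Hypothetical mirror of enum pnfs_try_status. */
enum sketch_try_status {
	SKETCH_ATTEMPTED = 0,
	SKETCH_NOT_ATTEMPTED = 1,
	SKETCH_TRY_AGAIN = 2,
};

/* Which path ends up carrying the read. */
enum sketch_path {
	SKETCH_PATH_DS,		/* layout driver issued the IO */
	SKETCH_PATH_RESENT,	/* requests requeued through pnfs */
	SKETCH_PATH_MDS,	/* fell back to the metadata server */
};

static enum sketch_path sketch_do_read(enum sketch_try_status status,
				       int resend_err)
{
	int err = 0;

	assert(resend_err <= 0);	/* 0 or a negative errno */

	if (status == SKETCH_TRY_AGAIN)
		err = resend_err;
	/* Fall back to the MDS if pnfs was not tried or the resend failed. */
	if (status == SKETCH_NOT_ATTEMPTED || err)
		return SKETCH_PATH_MDS;
	return status == SKETCH_TRY_AGAIN ? SKETCH_PATH_RESENT : SKETCH_PATH_DS;
}
```

The MDS path therefore remains the last resort: it is taken when pnfs
was never attempted, and also when a requested resend could not be set
up.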
fs/nfs/pnfs.c | 15 ++++++++++++++-
fs/nfs/pnfs.h | 2 ++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b822b17..685af4f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1880,15 +1880,28 @@ pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
return trypnfs;
}
+/* Resend all requests through pnfs. */
+int pnfs_read_resend_pnfs(struct nfs_pgio_header *hdr)
+{
+ struct nfs_pageio_descriptor pgio;
+
+ nfs_pageio_init_read(&pgio, hdr->inode, false, hdr->completion_ops);
+ return nfs_pageio_resend(&pgio, hdr);
+}
+EXPORT_SYMBOL_GPL(pnfs_read_resend_pnfs);
+
static void
pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
{
const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
+ int err = 0;
trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
- if (trypnfs == PNFS_NOT_ATTEMPTED)
+ if (trypnfs == PNFS_TRY_AGAIN)
+ err = pnfs_read_resend_pnfs(hdr);
+ if (trypnfs == PNFS_NOT_ATTEMPTED || err)
pnfs_read_through_mds(desc, hdr);
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index ebb4e82..26e7cd8 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -72,6 +72,7 @@ struct pnfs_layout_segment {
enum pnfs_try_status {
PNFS_ATTEMPTED = 0,
PNFS_NOT_ATTEMPTED = 1,
+ PNFS_TRY_AGAIN = 2,
};
#ifdef CONFIG_NFS_V4_1
@@ -268,6 +269,7 @@ int _pnfs_return_layout(struct inode *);
int pnfs_commit_and_return_layout(struct inode *);
void pnfs_ld_write_done(struct nfs_pgio_header *);
void pnfs_ld_read_done(struct nfs_pgio_header *);
+int pnfs_read_resend_pnfs(struct nfs_pgio_header *);
struct pnfs_layout_segment *pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
loff_t pos,
--
1.9.3
From: Peng Tao <[email protected]>
So that callers can specify which range to return.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
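Note for readers: on the wire the range is two XDR hypers (64-bit,
big-endian), offset first and then length. The following userspace
sketch shows that encoding; sketch_put_hyper() is a hypothetical
stand-in for xdr_encode_hyper(), and SKETCH_MAX_UINT64 for
NFS4_MAX_UINT64 (assumed here to be all ones, meaning "to end of file").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_UINT64 UINT64_MAX	/* stand-in for NFS4_MAX_UINT64 */

/* Hypothetical reduction of struct pnfs_layout_range. */
struct sketch_range {
	uint32_t iomode;
	uint64_t offset;
	uint64_t length;
};

/* Write a 64-bit value in XDR (big-endian) order; returns the next position. */
static unsigned char *sketch_put_hyper(unsigned char *p, uint64_t v)
{
	int shift;

	for (shift = 56; shift >= 0; shift -= 8)
		*p++ = (unsigned char)(v >> shift);
	return p;
}

/* Encode offset then length into buf (16 bytes); returns bytes written. */
static size_t sketch_encode_range(unsigned char *buf,
				  const struct sketch_range *r)
{
	unsigned char *p = buf;

	assert(buf != NULL && r != NULL);

	p = sketch_put_hyper(p, r->offset);
	p = sketch_put_hyper(p, r->length);
	return (size_t)(p - buf);
}
```

Passing offset 0 and length SKETCH_MAX_UINT64 reproduces the whole-file
return that existing callers keep using; a layout driver could pass a
narrower range instead.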
fs/nfs/nfs4xdr.c | 6 +++---
fs/nfs/pnfs.c | 14 +++++++++-----
include/linux/nfs_xdr.h | 2 +-
3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 3c3ff63..56d4c91 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2012,11 +2012,11 @@ encode_layoutreturn(struct xdr_stream *xdr,
p = reserve_space(xdr, 16);
*p++ = cpu_to_be32(0); /* reclaim. always 0 for now */
*p++ = cpu_to_be32(args->layout_type);
- *p++ = cpu_to_be32(args->iomode);
+ *p++ = cpu_to_be32(args->range.iomode);
*p = cpu_to_be32(RETURN_FILE);
p = reserve_space(xdr, 16);
- p = xdr_encode_hyper(p, 0);
- p = xdr_encode_hyper(p, NFS4_MAX_UINT64);
+ p = xdr_encode_hyper(p, args->range.offset);
+ p = xdr_encode_hyper(p, args->range.length);
spin_lock(&args->inode->i_lock);
encode_nfs4_stateid(xdr, &args->stateid);
spin_unlock(&args->inode->i_lock);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 685af4f..63beace 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -52,7 +52,7 @@ static LIST_HEAD(pnfs_modules_tbl);
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
- enum pnfs_iomode iomode);
+ enum pnfs_iomode iomode, u64 offset, u64 length);
/* Return the registered pnfs layout driver module matching given id */
static struct pnfs_layoutdriver_type *
@@ -392,7 +392,8 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
spin_unlock(&inode->i_lock);
pnfs_free_lseg(lseg);
if (need_return)
- pnfs_send_layoutreturn(lo, stateid, iomode);
+ pnfs_send_layoutreturn(lo, stateid, iomode, 0,
+ NFS4_MAX_UINT64);
else
pnfs_put_layout_hdr(lo);
}
@@ -897,7 +898,7 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
- enum pnfs_iomode iomode)
+ enum pnfs_iomode iomode, u64 offset, u64 length)
{
struct inode *ino = lo->plh_inode;
struct nfs4_layoutreturn *lrp;
@@ -916,7 +917,9 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
lrp->args.stateid = stateid;
lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
lrp->args.inode = ino;
- lrp->args.iomode = iomode;
+ lrp->args.range.iomode = iomode;
+ lrp->args.range.offset = offset;
+ lrp->args.range.length = length;
lrp->args.layout = lo;
lrp->clp = NFS_SERVER(ino)->nfs_client;
lrp->cred = lo->plh_lc_cred;
@@ -987,7 +990,8 @@ _pnfs_return_layout(struct inode *ino)
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&tmp_list);
- status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY);
+ status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
+ NFS4_MAX_UINT64);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 6400a1e..3637923 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -293,7 +293,7 @@ struct nfs4_layoutreturn_args {
struct nfs4_sequence_args seq_args;
struct pnfs_layout_hdr *layout;
struct inode *inode;
- enum pnfs_iomode iomode;
+ struct pnfs_layout_range range;
nfs4_stateid stateid;
__u32 layout_type;
};
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 11 +++++++++--
fs/nfs/pnfs.c | 12 +++++++-----
fs/nfs/pnfs.h | 2 +-
3 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index bf5ef58..53df457 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7800,7 +7800,7 @@ static const struct rpc_call_ops nfs4_layoutreturn_call_ops = {
.rpc_release = nfs4_layoutreturn_release,
};
-int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
+int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
{
struct rpc_task *task;
struct rpc_message msg = {
@@ -7814,16 +7814,23 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
.rpc_message = &msg,
.callback_ops = &nfs4_layoutreturn_call_ops,
.callback_data = lrp,
+ .flags = RPC_TASK_ASYNC,
};
- int status;
+ int status = 0;
dprintk("--> %s\n", __func__);
nfs4_init_sequence(&lrp->args.seq_args, &lrp->res.seq_res, 1);
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
+ if (sync == false)
+ goto out;
+ status = nfs4_wait_for_completion_rpc_task(task);
+ if (status != 0)
+ goto out;
status = task->tk_status;
trace_nfs4_layoutreturn(lrp->args.inode, status);
+out:
dprintk("<-- %s status=%d\n", __func__, status);
rpc_put_task(task);
return status;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 63beace..e889b97 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -52,7 +52,8 @@ static LIST_HEAD(pnfs_modules_tbl);
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
- enum pnfs_iomode iomode, u64 offset, u64 length);
+ enum pnfs_iomode iomode, u64 offset, u64 length,
+ bool sync);
/* Return the registered pnfs layout driver module matching given id */
static struct pnfs_layoutdriver_type *
@@ -393,7 +394,7 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
pnfs_free_lseg(lseg);
if (need_return)
pnfs_send_layoutreturn(lo, stateid, iomode, 0,
- NFS4_MAX_UINT64);
+ NFS4_MAX_UINT64, true);
else
pnfs_put_layout_hdr(lo);
}
@@ -898,7 +899,8 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
- enum pnfs_iomode iomode, u64 offset, u64 length)
+ enum pnfs_iomode iomode, u64 offset, u64 length,
+ bool sync)
{
struct inode *ino = lo->plh_inode;
struct nfs4_layoutreturn *lrp;
@@ -924,7 +926,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
lrp->clp = NFS_SERVER(ino)->nfs_client;
lrp->cred = lo->plh_lc_cred;
- status = nfs4_proc_layoutreturn(lrp);
+ status = nfs4_proc_layoutreturn(lrp, sync);
out:
if (status) {
spin_lock(&ino->i_lock);
@@ -991,7 +993,7 @@ _pnfs_return_layout(struct inode *ino)
pnfs_free_lseg_list(&tmp_list);
status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
- NFS4_MAX_UINT64);
+ NFS4_MAX_UINT64, true);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 26e7cd8..7a33c50 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -219,7 +219,7 @@ extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
struct pnfs_device *dev,
struct rpc_cred *cred);
extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags);
-extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
+extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
/* pnfs.c */
void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
--
1.9.3
From: Peng Tao <[email protected]>
When it is set, generic pnfs tries to send layoutreturn right
before the last close/delegation_return, regardless of whether
NFS_LAYOUT_ROC is set. The LD can then make sure layoutreturn is
always sent rather than being omitted.
Unlike NFS_LAYOUT_RETURN, NFS_LAYOUT_RETURN_BEFORE_CLOSE does not
block use of the layout, so the LD can set it and still expect the
generic layer to try the pnfs path in the meantime.
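The difference between the two flags can be sketched in plain userspace C. This is only an illustrative model: the model_* names are hypothetical, the bit operations are not atomic as the kernel's are, and "usable" stands in for whatever checks the generic layer really performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flag bits. */
enum {
	MODEL_LAYOUT_RETURN,			/* return ASAP, blocks layout use */
	MODEL_LAYOUT_RETURN_BEFORE_CLOSE,	/* return at last close only */
};

struct model_layout {
	unsigned long flags;
};

static bool model_test_bit(int nr, const unsigned long *flags)
{
	return (*flags >> nr) & 1UL;
}

/* Non-atomic model of test_and_clear_bit(). */
static bool model_test_and_clear_bit(int nr, unsigned long *flags)
{
	bool was_set = model_test_bit(nr, flags);

	*flags &= ~(1UL << nr);
	return was_set;
}

/* Only the "return ASAP" flag keeps I/O off the pnfs path. */
static bool model_layout_usable(const struct model_layout *lo)
{
	return !model_test_bit(MODEL_LAYOUT_RETURN, &lo->flags);
}

/* At last close: true means a layoutreturn should be sent now. */
static bool model_close_needs_layoutreturn(struct model_layout *lo)
{
	return model_test_and_clear_bit(MODEL_LAYOUT_RETURN_BEFORE_CLOSE,
					&lo->flags);
}
```

In this model a layout with only the before-close bit set stays usable, and the first close consumes the bit so a second close does not send another layoutreturn.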
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 2 ++
fs/nfs/pnfs.c | 40 +++++++++++++++++++++++++++++++++-------
fs/nfs/pnfs.h | 1 +
3 files changed, 36 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 53df457..72c5e01 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7787,6 +7787,8 @@ static void nfs4_layoutreturn_release(void *calldata)
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE, &lo->plh_flags);
+ rpc_wake_up(&NFS_SERVER(lo->plh_inode)->roc_rpcwaitq);
lo->plh_block_lgets--;
spin_unlock(&lo->plh_inode->i_lock);
pnfs_put_layout_hdr(lrp->args.layout);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e889b97..e80014a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -911,6 +911,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = -ENOMEM;
spin_lock(&ino->i_lock);
lo->plh_block_lgets--;
+ rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
spin_unlock(&ino->i_lock);
pnfs_put_layout_hdr(lo);
goto out;
@@ -928,11 +929,6 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = nfs4_proc_layoutreturn(lrp, sync);
out:
- if (status) {
- spin_lock(&ino->i_lock);
- clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
- spin_unlock(&ino->i_lock);
- }
dprintk("<-- %s status: %d\n", __func__, status);
return status;
}
@@ -1031,8 +1027,9 @@ bool pnfs_roc(struct inode *ino)
{
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg, *tmp;
+ nfs4_stateid stateid;
LIST_HEAD(tmp_list);
- bool found = false;
+ bool found = false, layoutreturn = false;
spin_lock(&ino->i_lock);
lo = NFS_I(ino)->layout;
@@ -1053,7 +1050,20 @@ bool pnfs_roc(struct inode *ino)
return true;
out_nolayout:
+ if (lo) {
+ stateid = lo->plh_stateid;
+ layoutreturn =
+ test_and_clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &lo->plh_flags);
+ if (layoutreturn) {
+ lo->plh_block_lgets++;
+ pnfs_get_layout_hdr(lo);
+ }
+ }
spin_unlock(&ino->i_lock);
+ if (layoutreturn)
+ pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
+ NFS4_MAX_UINT64, true);
return false;
}
@@ -1088,8 +1098,9 @@ bool pnfs_roc_drain(struct inode *ino, u32 *barrier, struct rpc_task *task)
struct nfs_inode *nfsi = NFS_I(ino);
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg;
+ nfs4_stateid stateid;
u32 current_seqid;
- bool found = false;
+ bool found = false, layoutreturn = false;
spin_lock(&ino->i_lock);
list_for_each_entry(lseg, &nfsi->layout->plh_segs, pls_list)
@@ -1106,7 +1117,22 @@ bool pnfs_roc_drain(struct inode *ino, u32 *barrier, struct rpc_task *task)
*/
*barrier = current_seqid + atomic_read(&lo->plh_outstanding);
out:
+ if (!found) {
+ stateid = lo->plh_stateid;
+ layoutreturn =
+ test_and_clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &lo->plh_flags);
+ if (layoutreturn) {
+ lo->plh_block_lgets++;
+ pnfs_get_layout_hdr(lo);
+ }
+ }
spin_unlock(&ino->i_lock);
+ if (layoutreturn) {
+ rpc_sleep_on(&NFS_SERVER(ino)->roc_rpcwaitq, task, NULL);
+ pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
+ NFS4_MAX_UINT64, false);
+ }
return found;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7a33c50..04a5a31 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -96,6 +96,7 @@ enum {
NFS_LAYOUT_BULK_RECALL, /* bulk recall affecting layout */
NFS_LAYOUT_ROC, /* some lseg had roc bit set */
NFS_LAYOUT_RETURN, /* Return this layout ASAP */
+ NFS_LAYOUT_RETURN_BEFORE_CLOSE, /* Return this layout before close */
NFS_LAYOUT_INVALID_STID, /* layout stateid id is invalid */
NFS_LAYOUT_FIRST_LAYOUTGET, /* Serialize first layoutget */
};
--
1.9.3
From: Peng Tao <[email protected]>
Otherwise we'll lose error tracking information when
encoding layoutreturn.
pnfs_put_lseg may be called from rpc callbacks, so we must not
call pnfs_send_layoutreturn directly there because it can deadlock
in the rpc layer. Defer the layoutreturn and the lseg free to
nfsiod_workqueue instead.
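The ordering this patch aims for (send layoutreturn while the lseg is still intact, then free it, and do both outside the caller's context) can be sketched as a small userspace model. Everything named model_* is hypothetical; the single-threaded list below only stands in for a real workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_work {
	void (*fn)(struct model_work *w);
	struct model_work *next;
};

struct model_workqueue {
	struct model_work *head, *tail;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_lseg {
	int refcount;
	bool need_return;	/* layout is marked for return */
	bool returned;		/* layoutreturn has been "sent" */
	bool freed;
	struct model_work work;
};

static void model_queue_work(struct model_workqueue *wq, struct model_work *w)
{
	w->next = NULL;
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
}

/* Run queued items; returns how many were executed. */
static int model_run_pending(struct model_workqueue *wq)
{
	int n = 0;

	while (wq->head) {
		struct model_work *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		w->fn(w);
		n++;
	}
	return n;
}

static void model_layoutreturn_free_lseg(struct model_work *w)
{
	struct model_lseg *lseg = model_container_of(w, struct model_lseg, work);

	lseg->returned = true;	/* encode/send while lseg data is intact */
	lseg->freed = true;	/* only then release the segment */
}

/* Last put: defer when a return is needed, free directly otherwise. */
static void model_put_lseg(struct model_workqueue *wq, struct model_lseg *lseg)
{
	if (--lseg->refcount > 0)
		return;
	if (lseg->need_return) {
		lseg->work.fn = model_layoutreturn_free_lseg;
		model_queue_work(wq, &lseg->work);
	} else {
		lseg->freed = true;
	}
}
```

The point of the sketch is that model_put_lseg() never performs the return itself, so a caller running in a callback context only pays for an enqueue.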
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 76 ++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 55 insertions(+), 21 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e80014a..9e7092f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -347,8 +347,7 @@ pnfs_layout_remove_lseg(struct pnfs_layout_hdr *lo,
/* Return true if layoutreturn is needed */
static bool
pnfs_layout_need_return(struct pnfs_layout_hdr *lo,
- struct pnfs_layout_segment *lseg,
- nfs4_stateid *stateid, enum pnfs_iomode *iomode)
+ struct pnfs_layout_segment *lseg)
{
struct pnfs_layout_segment *s;
@@ -356,17 +355,55 @@ pnfs_layout_need_return(struct pnfs_layout_hdr *lo,
return false;
list_for_each_entry(s, &lo->plh_segs, pls_list)
- if (test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags))
+ if (s != lseg && test_bit(NFS_LSEG_LAYOUTRETURN, &s->pls_flags))
return false;
- *stateid = lo->plh_stateid;
- *iomode = lo->plh_return_iomode;
- /* decreased in pnfs_send_layoutreturn() */
- lo->plh_block_lgets++;
- lo->plh_return_iomode = 0;
return true;
}
+static void pnfs_layoutreturn_free_lseg(struct work_struct *work)
+{
+ struct pnfs_layout_segment *lseg;
+ struct pnfs_layout_hdr *lo;
+ struct inode *inode;
+
+ lseg = container_of(work, struct pnfs_layout_segment, pls_work);
+ WARN_ON(atomic_read(&lseg->pls_refcount));
+ lo = lseg->pls_layout;
+ inode = lo->plh_inode;
+
+ spin_lock(&inode->i_lock);
+ if (pnfs_layout_need_return(lo, lseg)) {
+ nfs4_stateid stateid;
+ enum pnfs_iomode iomode;
+
+ stateid = lo->plh_stateid;
+ iomode = lo->plh_return_iomode;
+ /* decreased in pnfs_send_layoutreturn() */
+ lo->plh_block_lgets++;
+ lo->plh_return_iomode = 0;
+ spin_unlock(&inode->i_lock);
+
+ pnfs_send_layoutreturn(lo, stateid, iomode, 0,
+ NFS4_MAX_UINT64, true);
+ spin_lock(&inode->i_lock);
+ } else
+ /* match pnfs_get_layout_hdr #2 in pnfs_put_lseg */
+ pnfs_put_layout_hdr(lo);
+ pnfs_layout_remove_lseg(lo, lseg);
+ spin_unlock(&inode->i_lock);
+ pnfs_free_lseg(lseg);
+ /* match pnfs_get_layout_hdr #1 in pnfs_put_lseg */
+ pnfs_put_layout_hdr(lo);
+}
+
+static void
+pnfs_layoutreturn_free_lseg_async(struct pnfs_layout_segment *lseg)
+{
+ INIT_WORK(&lseg->pls_work, pnfs_layoutreturn_free_lseg);
+ queue_work(nfsiod_workqueue, &lseg->pls_work);
+}
+
void
pnfs_put_lseg(struct pnfs_layout_segment *lseg)
{
@@ -382,21 +419,18 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
lo = lseg->pls_layout;
inode = lo->plh_inode;
if (atomic_dec_and_lock(&lseg->pls_refcount, &inode->i_lock)) {
- bool need_return;
- nfs4_stateid stateid;
- enum pnfs_iomode iomode;
-
pnfs_get_layout_hdr(lo);
- pnfs_layout_remove_lseg(lo, lseg);
- need_return = pnfs_layout_need_return(lo, lseg,
- &stateid, &iomode);
- spin_unlock(&inode->i_lock);
- pnfs_free_lseg(lseg);
- if (need_return)
- pnfs_send_layoutreturn(lo, stateid, iomode, 0,
- NFS4_MAX_UINT64, true);
- else
+ if (pnfs_layout_need_return(lo, lseg)) {
+ spin_unlock(&inode->i_lock);
+ /* hdr reference dropped in nfs4_layoutreturn_release */
+ pnfs_get_layout_hdr(lo);
+ pnfs_layoutreturn_free_lseg_async(lseg);
+ } else {
+ pnfs_layout_remove_lseg(lo, lseg);
+ spin_unlock(&inode->i_lock);
+ pnfs_free_lseg(lseg);
pnfs_put_layout_hdr(lo);
+ }
}
}
EXPORT_SYMBOL_GPL(pnfs_put_lseg);
--
1.9.3
From: Peng Tao <[email protected]>
Use it to indicate that the LD wants to retry layoutget. The LD can
set it whenever it wants the common pnfs code to return the layout
and retry the pnfs path through a new layout.
The bit gets cleared when the client does a new layoutget, when the
client closes the file (ROC case), or when the kernel needs to evict
the inode (non-ROC case).
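The helpers added to pnfs.h by this patch pair the bit with a layout header reference, so setting it twice must not take two references. A minimal userspace sketch of that pairing, using hypothetical model_* names and plain (non-atomic) fields:

```c
#include <assert.h>
#include <stdbool.h>

struct model_layout_hdr {
	bool retry_layoutget;	/* models NFS_LAYOUT_RETRY_LAYOUTGET */
	int refcount;		/* models plh_refcount */
};

/* Take one reference the first time the bit is set. */
static void model_set_retry_layoutget(struct model_layout_hdr *lo)
{
	if (!lo->retry_layoutget) {
		lo->retry_layoutget = true;
		lo->refcount++;
	}
}

/* Drop that reference only if the bit was actually set. */
static void model_clear_retry_layoutget(struct model_layout_hdr *lo)
{
	if (lo->retry_layoutget) {
		lo->retry_layoutget = false;
		lo->refcount--;
	}
}

static bool model_should_retry_layoutget(const struct model_layout_hdr *lo)
{
	return lo->retry_layoutget;
}
```

Because set and clear are both idempotent in this model, the three clearing sites listed above can each call the clear helper without knowing whether another site got there first.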
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/pnfs.c | 3 +++
fs/nfs/pnfs.h | 18 ++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 9e7092f..fec1d897 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -617,6 +617,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
pnfs_get_layout_hdr(lo);
pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
+ pnfs_clear_retry_layoutget(lo);
spin_unlock(&nfsi->vfs_inode.i_lock);
pnfs_free_lseg_list(&tmp_list);
pnfs_put_layout_hdr(lo);
@@ -1070,6 +1071,7 @@ bool pnfs_roc(struct inode *ino)
if (!lo || !test_and_clear_bit(NFS_LAYOUT_ROC, &lo->plh_flags) ||
test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags))
goto out_nolayout;
+ pnfs_clear_retry_layoutget(lo);
list_for_each_entry_safe(lseg, tmp, &lo->plh_segs, pls_list)
if (test_bit(NFS_LSEG_ROC, &lseg->pls_flags)) {
mark_lseg_invalid(lseg, &tmp_list);
@@ -1497,6 +1499,7 @@ lookup_again:
arg.length = PAGE_CACHE_ALIGN(arg.length);
lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
+ pnfs_clear_retry_layoutget(lo);
atomic_dec(&lo->plh_outstanding);
out_put_layout_hdr:
if (first) {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 04a5a31..67a436b 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -99,6 +99,7 @@ enum {
NFS_LAYOUT_RETURN_BEFORE_CLOSE, /* Return this layout before close */
NFS_LAYOUT_INVALID_STID, /* layout stateid id is invalid */
NFS_LAYOUT_FIRST_LAYOUTGET, /* Serialize first layoutget */
+ NFS_LAYOUT_RETRY_LAYOUTGET, /* Retry layoutget */
};
enum layoutdriver_policy_flags {
@@ -349,6 +350,23 @@ nfs4_get_deviceid(struct nfs4_deviceid_node *d)
return d;
}
+static inline void pnfs_set_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ if (!test_and_set_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags))
+ atomic_inc(&lo->plh_refcount);
+}
+
+static inline void pnfs_clear_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ if (test_and_clear_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags))
+ atomic_dec(&lo->plh_refcount);
+}
+
+static inline bool pnfs_should_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ return test_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags);
+}
+
static inline struct pnfs_layout_segment *
pnfs_get_lseg(struct pnfs_layout_segment *lseg)
{
--
1.9.3
From: Peng Tao <[email protected]>
Allow the pnfs LD to ask for direct writes to be resent.
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/direct.c | 6 ++++++
fs/nfs/internal.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index eb81478..4fad6b7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -116,6 +116,12 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
return atomic_dec_and_test(&dreq->io_count);
}
+void nfs_direct_set_resched_writes(struct nfs_direct_req *dreq)
+{
+ dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
+}
+EXPORT_SYMBOL_GPL(nfs_direct_set_resched_writes);
+
static void
nfs_direct_good_bytes(struct nfs_direct_req *dreq, struct nfs_pgio_header *hdr)
{
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index ffe4b7a..44e8496 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -502,6 +502,7 @@ static inline void nfs_inode_dio_wait(struct inode *inode)
inode_dio_wait(inode);
}
extern ssize_t nfs_dreq_bytes_left(struct nfs_direct_req *dreq);
+extern void nfs_direct_set_resched_writes(struct nfs_direct_req *dreq);
/* nfs4proc.c */
extern void __nfs4_read_done_cb(struct nfs_pgio_header *);
--
1.9.3
From: Peng Tao <[email protected]>
Also take care to stop waiting if someone clears the retry bit.
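The wait condition can be modelled in userspace as a loop that gives up as soon as the retry bit disappears. This is a hedged sketch only: the model_* names are hypothetical, and the step callback stands in for sleeping until another context changes the flags and wakes the waiter.

```c
#include <assert.h>

enum {
	MODEL_RETURN_BIT,	/* models NFS_LAYOUT_RETURN */
	MODEL_RETRY_BIT,	/* models NFS_LAYOUT_RETRY_LAYOUTGET */
};

/*
 * Returns 0 once the return bit has been cleared, or 1 when the wait
 * was abandoned because nobody wants a retry any more.
 */
static int model_wait_for_layoutreturn(unsigned long *flags,
				       void (*step)(unsigned long *flags))
{
	while (*flags & (1UL << MODEL_RETURN_BIT)) {
		if (!(*flags & (1UL << MODEL_RETRY_BIT)))
			return 1;
		step(flags);	/* stands in for sleeping until woken */
	}
	return 0;
}

/* Another context finishes the layoutreturn. */
static void model_step_return_done(unsigned long *flags)
{
	*flags &= ~(1UL << MODEL_RETURN_BIT);
}

/* Another context cancels the retry instead. */
static void model_step_retry_cancelled(unsigned long *flags)
{
	*flags &= ~(1UL << MODEL_RETRY_BIT);
}
```

A zero result corresponds to the "retry the lookup" branch in the patch, while a nonzero result corresponds to giving up on the wait.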
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs4proc.c | 4 +++-
fs/nfs/pnfs.c | 39 ++++++++++++++++++++++++++++++++++++++-
fs/nfs/pnfs.h | 5 ++++-
3 files changed, 45 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 72c5e01..f05e965 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7786,7 +7786,9 @@ static void nfs4_layoutreturn_release(void *calldata)
spin_lock(&lo->plh_inode->i_lock);
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
- clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ clear_bit_unlock(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ smp_mb__after_atomic();
+ wake_up_bit(&lo->plh_flags, NFS_LAYOUT_RETURN);
clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE, &lo->plh_flags);
rpc_wake_up(&NFS_SERVER(lo->plh_inode)->roc_rpcwaitq);
lo->plh_block_lgets--;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index fec1d897..8c1440d 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1404,6 +1404,26 @@ static bool pnfs_within_mdsthreshold(struct nfs_open_context *ctx,
return ret;
}
+/* stop waiting if someone clears NFS_LAYOUT_RETRY_LAYOUTGET bit. */
+static int pnfs_layoutget_retry_bit_wait(struct wait_bit_key *key)
+{
+ if (!test_bit(NFS_LAYOUT_RETRY_LAYOUTGET, key->flags))
+ return 1;
+ return nfs_wait_bit_killable(key);
+}
+
+static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ /*
+ * send layoutcommit as it can hold up layoutreturn due to lseg
+ * reference
+ */
+ pnfs_layoutcommit_inode(lo->plh_inode, false);
+ return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
+ pnfs_layoutget_retry_bit_wait,
+ TASK_UNINTERRUPTIBLE);
+}
+
/*
* Layout segment is retreived from the server if not cached.
* The appropriate layout segment is referenced and returned to the caller.
@@ -1450,7 +1470,8 @@ lookup_again:
}
/* if LAYOUTGET already failed once we don't try again */
- if (pnfs_layout_io_test_failed(lo, iomode))
+ if (pnfs_layout_io_test_failed(lo, iomode) &&
+ !pnfs_should_retry_layoutget(lo))
goto out_unlock;
first = list_empty(&lo->plh_segs);
@@ -1475,6 +1496,22 @@ lookup_again:
goto out_unlock;
}
+ /*
+ * Because we free lsegs before sending LAYOUTRETURN, we need to wait
+ * for LAYOUTRETURN even if first is true.
+ */
+ if (!lseg && pnfs_should_retry_layoutget(lo) &&
+ test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags)) {
+ spin_unlock(&ino->i_lock);
+ dprintk("%s wait for layoutreturn\n", __func__);
+ if (pnfs_prepare_to_retry_layoutget(lo)) {
+ pnfs_put_layout_hdr(lo);
+ dprintk("%s retrying\n", __func__);
+ goto lookup_again;
+ }
+ goto out_put_layout_hdr;
+ }
+
if (pnfs_layoutgets_blocked(lo, &arg, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 67a436b..c2b4328 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -358,8 +358,11 @@ static inline void pnfs_set_retry_layoutget(struct pnfs_layout_hdr *lo)
static inline void pnfs_clear_retry_layoutget(struct pnfs_layout_hdr *lo)
{
- if (test_and_clear_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags))
+ if (test_and_clear_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags)) {
atomic_dec(&lo->plh_refcount);
+ /* wake up waiters for LAYOUTRETURN as that is not needed */
+ wake_up_bit(&lo->plh_flags, NFS_LAYOUT_RETURN);
+ }
}
static inline bool pnfs_should_retry_layoutget(struct pnfs_layout_hdr *lo)
--
1.9.3
The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
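As decoded by ff_layout_alloc_lseg in this patch, the layout body starts with an 8-byte stripe unit followed by a 4-byte mirror count, both big-endian as usual for XDR. A self-contained userspace sketch of that first step follows; the model_* names and the max_mirrors parameter are illustrative assumptions, not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_ff_header {
	uint64_t stripe_unit;
	uint32_t mirror_cnt;
};

/* Read one big-endian 32-bit XDR word. */
static uint32_t model_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode stripe unit (hyper) and mirror count (u32).
 * Returns 0 on success, -1 on a short buffer or an out-of-range count.
 */
static int model_decode_ff_header(const unsigned char *buf, size_t len,
				  uint32_t max_mirrors,
				  struct model_ff_header *out)
{
	if (len < 8 + 4)
		return -1;
	out->stripe_unit = ((uint64_t)model_get_be32(buf) << 32) |
			   model_get_be32(buf + 4);
	out->mirror_cnt = model_get_be32(buf + 8);
	if (out->mirror_cnt == 0 || out->mirror_cnt > max_mirrors)
		return -1;
	return 0;
}
```

Rejecting a zero or oversized mirror count before allocating anything mirrors the order of checks in the driver, where the count bounds the later per-mirror allocations.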
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
Signed-off-by: Tao Peng <[email protected]>
---
fs/nfs/Kconfig | 5 +
fs/nfs/Makefile | 1 +
fs/nfs/flexfilelayout/Makefile | 5 +
fs/nfs/flexfilelayout/flexfilelayout.c | 1600 +++++++++++++++++++++++++++++
fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
include/linux/nfs4.h | 1 +
7 files changed, 2322 insertions(+)
create mode 100644 fs/nfs/flexfilelayout/Makefile
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 3dece03..c7abc10 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -128,6 +128,11 @@ config PNFS_OBJLAYOUT
depends on NFS_V4_1 && SCSI_OSD_ULD
default NFS_V4
+config PNFS_FLEXFILE_LAYOUT
+ tristate
+ depends on NFS_V4_1 && NFS_V3
+ default m
+
config NFS_V4_1_IMPLEMENTATION_ID_DOMAIN
string "NFSv4.1 Implementation ID Domain"
depends on NFS_V4_1
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 7973c4e3..3c97bd9 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -33,3 +33,4 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayout/
obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
+obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/
diff --git a/fs/nfs/flexfilelayout/Makefile b/fs/nfs/flexfilelayout/Makefile
new file mode 100644
index 0000000..1d2c9f6
--- /dev/null
+++ b/fs/nfs/flexfilelayout/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the pNFS Flexfile Layout Driver kernel module
+#
+obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += nfs_layout_flexfiles.o
+nfs_layout_flexfiles-y := flexfilelayout.o flexfilelayoutdev.o
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
new file mode 100644
index 0000000..fddd3e6
--- /dev/null
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -0,0 +1,1600 @@
+/*
+ * Module for pnfs flexfile layout driver.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <[email protected]>
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
+#include <linux/module.h>
+
+#include <linux/sunrpc/metrics.h>
+
+#include "flexfilelayout.h"
+#include "../nfs4session.h"
+#include "../internal.h"
+#include "../delegation.h"
+#include "../nfs4trace.h"
+#include "../iostat.h"
+#include "../nfs.h"
+
+#define NFSDBG_FACILITY NFSDBG_PNFS_LD
+
+#define FF_LAYOUT_POLL_RETRY_MAX (15*HZ)
+
+static struct pnfs_layout_hdr *
+ff_layout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
+{
+	struct nfs4_flexfile_layout *ffl;
+
+	ffl = kzalloc(sizeof(*ffl), gfp_flags);
+	if (!ffl)
+		return NULL;
+	INIT_LIST_HEAD(&ffl->error_list);
+	return &ffl->generic_hdr;
+}
+
+static void
+ff_layout_free_layout_hdr(struct pnfs_layout_hdr *lo)
+{
+ struct nfs4_ff_layout_ds_err *err, *n;
+
+ list_for_each_entry_safe(err, n, &FF_LAYOUT_FROM_HDR(lo)->error_list,
+ list) {
+ list_del(&err->list);
+ kfree(err);
+ }
+ kfree(FF_LAYOUT_FROM_HDR(lo));
+}
+
+static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
+{
+ __be32 *p;
+
+ p = xdr_inline_decode(xdr, NFS4_STATEID_SIZE);
+ if (unlikely(p == NULL))
+ return -ENOBUFS;
+ memcpy(stateid, p, NFS4_STATEID_SIZE);
+ dprintk("%s: stateid id= [%x%x%x%x]\n", __func__,
+ p[0], p[1], p[2], p[3]);
+ return 0;
+}
+
+static int decode_deviceid(struct xdr_stream *xdr, struct nfs4_deviceid *devid)
+{
+ __be32 *p;
+
+ p = xdr_inline_decode(xdr, NFS4_DEVICEID4_SIZE);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ memcpy(devid, p, NFS4_DEVICEID4_SIZE);
+ nfs4_print_deviceid(devid);
+ return 0;
+}
+
+static int decode_nfs_fh(struct xdr_stream *xdr, struct nfs_fh *fh)
+{
+ __be32 *p;
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -ENOBUFS;
+	fh->size = be32_to_cpup(p++);
+	if (fh->size > NFS_MAXFHSIZE) {
+ printk(KERN_ERR "NFS flexfiles: Too big fh received %d\n",
+ fh->size);
+ return -EOVERFLOW;
+ }
+ /* fh.data */
+ p = xdr_inline_decode(xdr, fh->size);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ memcpy(&fh->data, p, fh->size);
+ dprintk("%s: fh len %d\n", __func__, fh->size);
+
+ return 0;
+}
+
+/*
+ * we only handle AUTH_NONE and AUTH_UNIX for now.
+ *
+ * For AUTH_UNIX, we want to parse
+ * struct authsys_parms {
+ * unsigned int stamp;
+ * string machinename<255>;
+ * unsigned int uid;
+ * unsigned int gid;
+ * unsigned int gids<16>;
+ * };
+ */
+static int
+ff_layout_parse_auth(struct xdr_stream *xdr,
+ struct nfs4_ff_layout_mirror *mirror)
+{
+ __be32 *p;
+ int flavor, len, gid_it = 0;
+
+ /* authflavor(4) + opaque_length(4)*/
+ p = xdr_inline_decode(xdr, 8);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ flavor = be32_to_cpup(p++);
+ len = be32_to_cpup(p++);
+ if (flavor < RPC_AUTH_NULL || flavor >= RPC_AUTH_MAXFLAVOR ||
+ len < 0)
+ return -EINVAL;
+
+ dprintk("%s: flavor %u len %u\n", __func__, flavor, len);
+
+ if (flavor == RPC_AUTH_NULL && len == 0)
+ goto out_fill;
+
+ /* opaque body */
+ p = xdr_inline_decode(xdr, len);
+ if (unlikely(!p))
+ return -ENOBUFS;
+
+ if (flavor == RPC_AUTH_NULL) {
+ mirror->uid = -1;
+ mirror->gid = -1;
+ } else if (flavor == RPC_AUTH_UNIX) {
+ int len2;
+
+ p++; /* stamp */
+ len2 = be32_to_cpup(p++); /* machinename length */
+ dprintk("%s: machinename length %u\n", __func__, len2);
+ if (len2 < 0 || len2 >= len || len2 > 255)
+ return -EINVAL;
+ p += XDR_QUADLEN(len2); /* machinename */
+
+ mirror->uid = be32_to_cpup(p++);
+ mirror->gid = be32_to_cpup(p++);
+
+ len2 = be32_to_cpup(p++); /* gid array length */
+ dprintk("%s: gid array length %u\n", __func__, len2);
+ if (len2 > 16)
+ return -EINVAL;
+ for (; gid_it < len2; gid_it++)
+ mirror->gids[gid_it] = be32_to_cpup(p++);
+ } else {
+ return -EPROTONOSUPPORT;
+ }
+
+out_fill:
+ /* filling the rest of gids */
+ for (; gid_it < 16; gid_it++)
+ mirror->gids[gid_it] = -1;
+
+ return 0;
+}
+
+static void ff_layout_free_mirror_array(struct nfs4_ff_layout_segment *fls)
+{
+ int i;
+
+ if (fls->mirror_array) {
+ for (i = 0; i < fls->mirror_array_cnt; i++) {
+ /* normally mirror_ds is freed in
+ * .free_deviceid_node but we still do it here
+ * for .alloc_lseg error path */
+ if (fls->mirror_array[i]) {
+ kfree(fls->mirror_array[i]->fh_versions);
+ nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
+ kfree(fls->mirror_array[i]);
+ }
+ }
+ kfree(fls->mirror_array);
+ fls->mirror_array = NULL;
+ }
+}
+
+static int ff_layout_check_layout(struct nfs4_layoutget_res *lgr)
+{
+ int ret = 0;
+
+ dprintk("--> %s\n", __func__);
+
+ /* FIXME: remove this check when layout segment support is added */
+ if (lgr->range.offset != 0 ||
+ lgr->range.length != NFS4_MAX_UINT64) {
+ dprintk("%s Only whole file layouts supported. Use MDS i/o\n",
+ __func__);
+ ret = -EINVAL;
+ }
+
+	dprintk("<-- %s returns %d\n", __func__, ret);
+ return ret;
+}
+
+static void _ff_layout_free_lseg(struct nfs4_ff_layout_segment *fls)
+{
+ if (fls) {
+ ff_layout_free_mirror_array(fls);
+ kfree(fls);
+ }
+}
+
+static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
+{
+ struct nfs4_ff_layout_mirror *tmp;
+ int i, j;
+
+ for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
+ for (j = i + 1; j < fls->mirror_array_cnt; j++)
+ if (fls->mirror_array[i]->efficiency <
+ fls->mirror_array[j]->efficiency) {
+ tmp = fls->mirror_array[i];
+ fls->mirror_array[i] = fls->mirror_array[j];
+ fls->mirror_array[j] = tmp;
+ }
+ }
+}
+
+static struct pnfs_layout_segment *
+ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
+ struct nfs4_layoutget_res *lgr,
+ gfp_t gfp_flags)
+{
+ struct pnfs_layout_segment *ret;
+ struct nfs4_ff_layout_segment *fls = NULL;
+ struct xdr_stream stream;
+ struct xdr_buf buf;
+ struct page *scratch;
+ u64 stripe_unit;
+ u32 mirror_array_cnt;
+ __be32 *p;
+ int i, rc;
+
+ dprintk("--> %s\n", __func__);
+ scratch = alloc_page(gfp_flags);
+ if (!scratch)
+ return ERR_PTR(-ENOMEM);
+
+ xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages,
+ lgr->layoutp->len);
+ xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+
+ /* stripe unit and mirror_array_cnt */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 8 + 4);
+ if (!p)
+ goto out_err_free;
+
+ p = xdr_decode_hyper(p, &stripe_unit);
+ mirror_array_cnt = be32_to_cpup(p++);
+ dprintk("%s: stripe_unit=%llu mirror_array_cnt=%u\n", __func__,
+ stripe_unit, mirror_array_cnt);
+
+ if (mirror_array_cnt > NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT ||
+ mirror_array_cnt == 0)
+ goto out_err_free;
+
+ rc = -ENOMEM;
+ fls = kzalloc(sizeof(*fls), gfp_flags);
+ if (!fls)
+ goto out_err_free;
+
+ fls->mirror_array_cnt = mirror_array_cnt;
+ fls->stripe_unit = stripe_unit;
+ fls->mirror_array = kcalloc(fls->mirror_array_cnt,
+ sizeof(fls->mirror_array[0]), gfp_flags);
+ if (fls->mirror_array == NULL)
+ goto out_err_free;
+
+ for (i = 0; i < fls->mirror_array_cnt; i++) {
+ struct nfs4_deviceid devid;
+ struct nfs4_deviceid_node *idnode;
+ u32 ds_count;
+ u32 fh_count;
+ int j;
+
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ ds_count = be32_to_cpup(p);
+
+ /* FIXME: allow for striping? */
+ if (ds_count != 1)
+ goto out_err_free;
+
+ fls->mirror_array[i] =
+ kzalloc(sizeof(struct nfs4_ff_layout_mirror),
+ gfp_flags);
+ if (fls->mirror_array[i] == NULL) {
+ rc = -ENOMEM;
+ goto out_err_free;
+ }
+
+ spin_lock_init(&fls->mirror_array[i]->lock);
+ fls->mirror_array[i]->ds_count = ds_count;
+
+ /* deviceid */
+ rc = decode_deviceid(&stream, &devid);
+ if (rc)
+ goto out_err_free;
+
+ idnode = nfs4_find_get_deviceid(NFS_SERVER(lh->plh_inode),
+ &devid, lh->plh_lc_cred,
+ gfp_flags);
+ /*
+ * upon success, mirror_ds is allocated by previous
+ * getdeviceinfo, or newly by .alloc_deviceid_node
+		 * nfs4_find_get_deviceid failure is indeed getdeviceinfo failure
+ */
+ if (idnode)
+ fls->mirror_array[i]->mirror_ds =
+ FF_LAYOUT_MIRROR_DS(idnode);
+ else
+ goto out_err_free;
+
+ /* efficiency */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ fls->mirror_array[i]->efficiency = be32_to_cpup(p);
+
+ /* stateid */
+ rc = decode_stateid(&stream, &fls->mirror_array[i]->stateid);
+ if (rc)
+ goto out_err_free;
+
+		/* fh */
+		rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ fh_count = be32_to_cpup(p);
+
+		fls->mirror_array[i]->fh_versions =
+			kcalloc(fh_count, sizeof(struct nfs_fh),
+				gfp_flags);
+ if (fls->mirror_array[i]->fh_versions == NULL) {
+ rc = -ENOMEM;
+ goto out_err_free;
+ }
+
+ for (j = 0; j < fh_count; j++) {
+ rc = decode_nfs_fh(&stream,
+ &fls->mirror_array[i]->fh_versions[j]);
+ if (rc)
+ goto out_err_free;
+ }
+
+ fls->mirror_array[i]->fh_versions_cnt = fh_count;
+
+ /* opaque_auth */
+ rc = ff_layout_parse_auth(&stream, fls->mirror_array[i]);
+ if (rc)
+ goto out_err_free;
+
+ dprintk("%s: uid %d gid %d\n", __func__,
+ fls->mirror_array[i]->uid,
+ fls->mirror_array[i]->gid);
+ }
+
+ ff_layout_sort_mirrors(fls);
+ rc = ff_layout_check_layout(lgr);
+ if (rc)
+ goto out_err_free;
+
+ ret = &fls->generic_hdr;
+ dprintk("<-- %s (success)\n", __func__);
+out_free_page:
+ __free_page(scratch);
+ return ret;
+out_err_free:
+ _ff_layout_free_lseg(fls);
+ ret = ERR_PTR(rc);
+ dprintk("<-- %s (%d)\n", __func__, rc);
+ goto out_free_page;
+}
+
+static bool ff_layout_has_rw_segments(struct pnfs_layout_hdr *layout)
+{
+ struct pnfs_layout_segment *lseg;
+
+ list_for_each_entry(lseg, &layout->plh_segs, pls_list)
+ if (lseg->pls_range.iomode == IOMODE_RW)
+ return true;
+
+ return false;
+}
+
+static void
+ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
+{
+ struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
+ int i;
+
+ dprintk("--> %s\n", __func__);
+
+ for (i = 0; i < fls->mirror_array_cnt; i++) {
+ if (fls->mirror_array[i]) {
+ nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
+ fls->mirror_array[i]->mirror_ds = NULL;
+ if (fls->mirror_array[i]->cred) {
+ put_rpccred(fls->mirror_array[i]->cred);
+ fls->mirror_array[i]->cred = NULL;
+ }
+ }
+ }
+
+ if (lseg->pls_range.iomode == IOMODE_RW) {
+ struct nfs4_flexfile_layout *ffl;
+ struct inode *inode;
+
+ ffl = FF_LAYOUT_FROM_HDR(lseg->pls_layout);
+ inode = ffl->generic_hdr.plh_inode;
+ spin_lock(&inode->i_lock);
+ if (!ff_layout_has_rw_segments(lseg->pls_layout)) {
+ ffl->commit_info.nbuckets = 0;
+ kfree(ffl->commit_info.buckets);
+ ffl->commit_info.buckets = NULL;
+ }
+ spin_unlock(&inode->i_lock);
+ }
+ _ff_layout_free_lseg(fls);
+}
+
+/* Return 1 until we have multiple lsegs support */
+static int
+ff_layout_get_lseg_count(struct nfs4_ff_layout_segment *fls)
+{
+ return 1;
+}
+
+static int
+ff_layout_alloc_commit_info(struct pnfs_layout_segment *lseg,
+ struct nfs_commit_info *cinfo,
+ gfp_t gfp_flags)
+{
+ struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
+ struct pnfs_commit_bucket *buckets;
+ int size;
+
+ if (cinfo->ds->nbuckets != 0) {
+ /* This assumes there is only one RW lseg per file.
+ * To support multiple lseg per file, we need to
+ * change struct pnfs_commit_bucket to allow dynamic
+ * increasing nbuckets.
+ */
+ return 0;
+ }
+
+ size = ff_layout_get_lseg_count(fls) * FF_LAYOUT_MIRROR_COUNT(lseg);
+
+ buckets = kcalloc(size, sizeof(struct pnfs_commit_bucket),
+ gfp_flags);
+ if (!buckets)
+ return -ENOMEM;
+ else {
+ int i;
+
+ spin_lock(cinfo->lock);
+ if (cinfo->ds->nbuckets != 0)
+ kfree(buckets);
+ else {
+ cinfo->ds->buckets = buckets;
+ cinfo->ds->nbuckets = size;
+ for (i = 0; i < size; i++) {
+ INIT_LIST_HEAD(&buckets[i].written);
+ INIT_LIST_HEAD(&buckets[i].committing);
+ /* mark direct verifier as unset */
+ buckets[i].direct_verf.committed =
+ NFS_INVALID_STABLE_HOW;
+ }
+ }
+ spin_unlock(cinfo->lock);
+ return 0;
+ }
+}
+
+static struct nfs4_pnfs_ds *
+ff_layout_choose_best_ds_for_read(struct nfs_pageio_descriptor *pgio,
+ int *best_idx)
+{
+ struct nfs4_ff_layout_segment *fls;
+ struct nfs4_pnfs_ds *ds;
+ int idx;
+
+ fls = FF_LAYOUT_LSEG(pgio->pg_lseg);
+ /* mirrors are sorted by efficiency */
+ for (idx = 0; idx < fls->mirror_array_cnt; idx++) {
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, idx, false);
+ if (ds) {
+ *best_idx = idx;
+ return ds;
+ }
+ }
+
+ return NULL;
+}
+
+static void
+ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ struct nfs_pgio_mirror *pgm;
+ struct nfs4_ff_layout_mirror *mirror;
+ struct nfs4_pnfs_ds *ds;
+ int ds_idx;
+
+ /* Use full layout for now */
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ req->wb_context,
+ 0,
+ NFS4_MAX_UINT64,
+ IOMODE_READ,
+ GFP_KERNEL);
+ /* If no lseg, fall back to read through mds */
+ if (pgio->pg_lseg == NULL)
+ goto out_mds;
+
+ ds = ff_layout_choose_best_ds_for_read(pgio, &ds_idx);
+ if (!ds)
+ goto out_mds;
+ mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
+
+ pgio->pg_mirror_idx = ds_idx;
+
+ /* read always uses only one mirror - idx 0 for pgio layer */
+ pgm = &pgio->pg_mirrors[0];
+ pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
+
+ return;
+out_mds:
+ pnfs_put_lseg(pgio->pg_lseg);
+ pgio->pg_lseg = NULL;
+ nfs_pageio_reset_read_mds(pgio);
+}
+
+static void
+ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ struct nfs4_ff_layout_mirror *mirror;
+ struct nfs_pgio_mirror *pgm;
+ struct nfs_commit_info cinfo;
+ struct nfs4_pnfs_ds *ds;
+ int i;
+ int status;
+
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ req->wb_context,
+ 0,
+ NFS4_MAX_UINT64,
+ IOMODE_RW,
+ GFP_NOFS);
+ /* If no lseg, fall back to write through mds */
+ if (pgio->pg_lseg == NULL)
+ goto out_mds;
+
+ nfs_init_cinfo(&cinfo, pgio->pg_inode, pgio->pg_dreq);
+ status = ff_layout_alloc_commit_info(pgio->pg_lseg, &cinfo, GFP_NOFS);
+ if (status < 0)
+ goto out_mds;
+
+ /* Use a direct mapping of ds_idx to pgio mirror_idx */
+ if (WARN_ON_ONCE(pgio->pg_mirror_count !=
+ FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg)))
+ goto out_mds;
+
+ for (i = 0; i < pgio->pg_mirror_count; i++) {
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, i, true);
+ if (!ds)
+ goto out_mds;
+ pgm = &pgio->pg_mirrors[i];
+ mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
+ pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
+ }
+
+ return;
+
+out_mds:
+ pnfs_put_lseg(pgio->pg_lseg);
+ pgio->pg_lseg = NULL;
+ nfs_pageio_reset_write_mds(pgio);
+}
+
+static unsigned int
+ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ req->wb_context,
+ 0,
+ NFS4_MAX_UINT64,
+ IOMODE_RW,
+ GFP_NOFS);
+ if (pgio->pg_lseg)
+ return FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg);
+
+ /* no lseg means that pnfs is not in use, so no mirroring here */
+ pnfs_put_lseg(pgio->pg_lseg);
+ pgio->pg_lseg = NULL;
+ nfs_pageio_reset_write_mds(pgio);
+ return 1;
+}
+
+static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
+ .pg_init = ff_layout_pg_init_read,
+ .pg_test = pnfs_generic_pg_test,
+ .pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
+};
+
+static const struct nfs_pageio_ops ff_layout_pg_write_ops = {
+ .pg_init = ff_layout_pg_init_write,
+ .pg_test = pnfs_generic_pg_test,
+ .pg_doio = pnfs_generic_pg_writepages,
+ .pg_get_mirror_count = ff_layout_pg_get_mirror_count_write,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
+};
+
+static void ff_layout_reset_write(struct nfs_pgio_header *hdr, bool retry_pnfs)
+{
+ struct rpc_task *task = &hdr->task;
+
+ pnfs_layoutcommit_inode(hdr->inode, false);
+
+ if (retry_pnfs) {
+ dprintk("%s Reset task %5u for i/o through pNFS "
+ "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
+ hdr->task.tk_pid,
+ hdr->inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(hdr->inode),
+ hdr->args.count,
+ (unsigned long long)hdr->args.offset);
+
+ if (!hdr->dreq) {
+ struct nfs_open_context *ctx;
+
+ ctx = nfs_list_entry(hdr->pages.next)->wb_context;
+ set_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
+ hdr->completion_ops->error_cleanup(&hdr->pages);
+ } else {
+ nfs_direct_set_resched_writes(hdr->dreq);
+ }
+ return;
+ }
+
+ if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
+ dprintk("%s Reset task %5u for i/o through MDS "
+ "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
+ hdr->task.tk_pid,
+ hdr->inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(hdr->inode),
+ hdr->args.count,
+ (unsigned long long)hdr->args.offset);
+
+ task->tk_status = pnfs_write_done_resend_to_mds(hdr);
+ }
+}
+
+static void ff_layout_reset_read(struct nfs_pgio_header *hdr)
+{
+ struct rpc_task *task = &hdr->task;
+
+ pnfs_layoutcommit_inode(hdr->inode, false);
+
+ if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
+ dprintk("%s Reset task %5u for i/o through MDS "
+ "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
+ hdr->task.tk_pid,
+ hdr->inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(hdr->inode),
+ hdr->args.count,
+ (unsigned long long)hdr->args.offset);
+
+ task->tk_status = pnfs_read_done_resend_to_mds(hdr);
+ }
+}
+
+static int ff_layout_async_handle_error_v4(struct rpc_task *task,
+ struct nfs4_state *state,
+ struct nfs_client *clp,
+ struct pnfs_layout_segment *lseg,
+ int idx)
+{
+ struct pnfs_layout_hdr *lo = lseg->pls_layout;
+ struct inode *inode = lo->plh_inode;
+ struct nfs_server *mds_server = NFS_SERVER(inode);
+
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs_client *mds_client = mds_server->nfs_client;
+ struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
+
+ if (task->tk_status >= 0)
+ return 0;
+
+ switch (task->tk_status) {
+ /* MDS state errors */
+ case -NFS4ERR_DELEG_REVOKED:
+ case -NFS4ERR_ADMIN_REVOKED:
+ case -NFS4ERR_BAD_STATEID:
+ if (state == NULL)
+ break;
+ nfs_remove_bad_delegation(state->inode); /* fall through */
+ case -NFS4ERR_OPENMODE:
+ if (state == NULL)
+ break;
+ if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
+ goto out_bad_stateid;
+ goto wait_on_recovery;
+ case -NFS4ERR_EXPIRED:
+ if (state != NULL) {
+ if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
+ goto out_bad_stateid;
+ }
+ nfs4_schedule_lease_recovery(mds_client);
+ goto wait_on_recovery;
+ /* DS session errors */
+ case -NFS4ERR_BADSESSION:
+ case -NFS4ERR_BADSLOT:
+ case -NFS4ERR_BAD_HIGH_SLOT:
+ case -NFS4ERR_DEADSESSION:
+ case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+ case -NFS4ERR_SEQ_FALSE_RETRY:
+ case -NFS4ERR_SEQ_MISORDERED:
+ dprintk("%s ERROR %d, Reset session. Exchangeid "
+ "flags 0x%x\n", __func__, task->tk_status,
+ clp->cl_exchange_flags);
+ nfs4_schedule_session_recovery(clp->cl_session, task->tk_status);
+ break;
+ case -NFS4ERR_DELAY:
+ case -NFS4ERR_GRACE:
+ rpc_delay(task, FF_LAYOUT_POLL_RETRY_MAX);
+ break;
+ case -NFS4ERR_RETRY_UNCACHED_REP:
+ break;
+ /* Invalidate Layout errors */
+ case -NFS4ERR_PNFS_NO_LAYOUT:
+ case -ESTALE: /* mapped NFS4ERR_STALE */
+ case -EBADHANDLE: /* mapped NFS4ERR_BADHANDLE */
+ case -EISDIR: /* mapped NFS4ERR_ISDIR */
+ case -NFS4ERR_FHEXPIRED:
+ case -NFS4ERR_WRONG_TYPE:
+ dprintk("%s Invalid layout error %d\n", __func__,
+ task->tk_status);
+ /*
+ * Destroy layout so new i/o will get a new layout.
+ * Layout will not be destroyed until all current lseg
+ * references are put. Mark layout as invalid to resend failed
+ * i/o and all i/o waiting on the slot table to the MDS until
+ * layout is destroyed and a new valid layout is obtained.
+ */
+ pnfs_destroy_layout(NFS_I(inode));
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ goto reset;
+ /* RPC connection errors */
+ case -ECONNREFUSED:
+ case -EHOSTDOWN:
+ case -EHOSTUNREACH:
+ case -ENETUNREACH:
+ case -EIO:
+ case -ETIMEDOUT:
+ case -EPIPE:
+ dprintk("%s DS connection error %d\n", __func__,
+ task->tk_status);
+ nfs4_mark_deviceid_unavailable(devid);
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ /* fall through */
+ default:
+ if (ff_layout_has_available_ds(lseg))
+ return -NFS4ERR_RESET_TO_PNFS;
+reset:
+ dprintk("%s Retry through MDS. Error %d\n", __func__,
+ task->tk_status);
+ return -NFS4ERR_RESET_TO_MDS;
+ }
+out:
+ task->tk_status = 0;
+ return -EAGAIN;
+out_bad_stateid:
+ task->tk_status = -EIO;
+ return 0;
+wait_on_recovery:
+ rpc_sleep_on(&mds_client->cl_rpcwaitq, task, NULL);
+ if (test_bit(NFS4CLNT_MANAGER_RUNNING, &mds_client->cl_state) == 0)
+ rpc_wake_up_queued_task(&mds_client->cl_rpcwaitq, task);
+ goto out;
+}
+
+/* Retry all errors through either pNFS or MDS except for -EJUKEBOX */
+static int ff_layout_async_handle_error_v3(struct rpc_task *task,
+ struct pnfs_layout_segment *lseg,
+ int idx)
+{
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+
+ if (task->tk_status >= 0)
+ return 0;
+
+ if (task->tk_status != -EJUKEBOX) {
+ dprintk("%s DS connection error %d\n", __func__,
+ task->tk_status);
+ nfs4_mark_deviceid_unavailable(devid);
+ if (ff_layout_has_available_ds(lseg))
+ return -NFS4ERR_RESET_TO_PNFS;
+ else
+ return -NFS4ERR_RESET_TO_MDS;
+ }
+
+ /* Only -EJUKEBOX reaches this point: count the delay and retry */
+ nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY);
+ task->tk_status = 0;
+ rpc_restart_call(task);
+ rpc_delay(task, NFS_JUKEBOX_RETRY_TIME);
+ return -EAGAIN;
+}
+
+static int ff_layout_async_handle_error(struct rpc_task *task,
+ struct nfs4_state *state,
+ struct nfs_client *clp,
+ struct pnfs_layout_segment *lseg,
+ int idx)
+{
+ int vers = clp->cl_nfs_mod->rpc_vers->number;
+
+ switch (vers) {
+ case 3:
+ return ff_layout_async_handle_error_v3(task, lseg, idx);
+ case 4:
+ return ff_layout_async_handle_error_v4(task, state, clp,
+ lseg, idx);
+ default:
+ /* should never happen */
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+}
+
+static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
+ int idx, u64 offset, u64 length,
+ u32 status, int opnum)
+{
+ struct nfs4_ff_layout_mirror *mirror;
+ int err;
+
+ mirror = FF_LAYOUT_COMP(lseg, idx);
+ err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
+ mirror, offset, length, status, opnum,
+ GFP_NOIO);
+ dprintk("%s: err %d op %d status %u\n", __func__, err, opnum, status);
+}
+
+/* NFS_PROTO call done callback routines */
+
+static int ff_layout_read_done_cb(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ struct inode *inode;
+ int err;
+
+ trace_nfs4_pnfs_read(hdr, task->tk_status);
+ if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
+ hdr->res.op_status = NFS4ERR_NXIO;
+ if (task->tk_status < 0 && hdr->res.op_status)
+ ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
+ hdr->args.offset, hdr->args.count,
+ hdr->res.op_status, OP_READ);
+ err = ff_layout_async_handle_error(task, hdr->args.context->state,
+ hdr->ds_clp, hdr->lseg,
+ hdr->pgio_mirror_idx);
+
+ switch (err) {
+ case -NFS4ERR_RESET_TO_PNFS:
+ set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &hdr->lseg->pls_layout->plh_flags);
+ pnfs_read_resend_pnfs(hdr);
+ return task->tk_status;
+ case -NFS4ERR_RESET_TO_MDS:
+ inode = hdr->lseg->pls_layout->plh_inode;
+ pnfs_error_mark_layout_for_return(inode, hdr->lseg);
+ ff_layout_reset_read(hdr);
+ return task->tk_status;
+ case -EAGAIN:
+ rpc_restart_call_prepare(task);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+/*
+ * We reference the rpc_cred of the first WRITE that triggers the need for
+ * a LAYOUTCOMMIT, and use it to send the layoutcommit compound.
+ * rfc5661 is not clear about which credential should be used.
+ *
+ * A flexfile client should treat a DS reply of FILE_SYNC as DATA_SYNC, so,
+ * following http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751,
+ * we always send layoutcommit after DS writes.
+ */
+static void
+ff_layout_set_layoutcommit(struct nfs_pgio_header *hdr)
+{
+ pnfs_set_layoutcommit(hdr);
+ dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
+ (unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
+}
+
+static bool
+ff_layout_reset_to_mds(struct pnfs_layout_segment *lseg, int idx)
+{
+ /* No mirroring for now */
+ struct nfs4_deviceid_node *node = FF_LAYOUT_DEVID_NODE(lseg, idx);
+
+ return ff_layout_test_devid_unavailable(node);
+}
+
+static int ff_layout_read_prepare_common(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
+ rpc_exit(task, -EIO);
+ return -EIO;
+ }
+ if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
+ dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+ if (ff_layout_has_available_ds(hdr->lseg))
+ pnfs_read_resend_pnfs(hdr);
+ else
+ ff_layout_reset_read(hdr);
+ rpc_exit(task, 0);
+ return -EAGAIN;
+ }
+ hdr->pgio_done_cb = ff_layout_read_done_cb;
+
+ return 0;
+}
+
+/*
+ * Call ops for the async read/write cases
+ * In the case of dense layouts, the offset needs to be reset to its
+ * original value.
+ */
+static void ff_layout_read_prepare_v3(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_read_prepare_common(task, hdr))
+ return;
+
+ rpc_call_start(task);
+}
+
+static int ff_layout_setup_sequence(struct nfs_client *ds_clp,
+ struct nfs4_sequence_args *args,
+ struct nfs4_sequence_res *res,
+ struct rpc_task *task)
+{
+ if (ds_clp->cl_session)
+ return nfs41_setup_sequence(ds_clp->cl_session,
+ args,
+ res,
+ task);
+ return nfs40_setup_sequence(ds_clp->cl_slot_tbl,
+ args,
+ res,
+ task);
+}
+
+static void ff_layout_read_prepare_v4(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_read_prepare_common(task, hdr))
+ return;
+
+ if (ff_layout_setup_sequence(hdr->ds_clp,
+ &hdr->args.seq_args,
+ &hdr->res.seq_res,
+ task))
+ return;
+
+ if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
+ hdr->args.lock_context, FMODE_READ) == -EIO)
+ rpc_exit(task, -EIO); /* lost lock, terminate I/O */
+}
+
+static void ff_layout_read_call_done(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
+
+ if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
+ task->tk_status == 0) {
+ nfs4_sequence_done(task, &hdr->res.seq_res);
+ return;
+ }
+
+ /* Note this may cause RPC to be resent */
+ hdr->mds_ops->rpc_call_done(task, hdr);
+}
+
+static void ff_layout_read_count_stats(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ rpc_count_iostats_metrics(task,
+ &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_READ]);
+}
+
+static int ff_layout_write_done_cb(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ struct inode *inode;
+ int err;
+
+ trace_nfs4_pnfs_write(hdr, task->tk_status);
+ if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
+ hdr->res.op_status = NFS4ERR_NXIO;
+ if (task->tk_status < 0 && hdr->res.op_status)
+ ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
+ hdr->args.offset, hdr->args.count,
+ hdr->res.op_status, OP_WRITE);
+ err = ff_layout_async_handle_error(task, hdr->args.context->state,
+ hdr->ds_clp, hdr->lseg,
+ hdr->pgio_mirror_idx);
+
+ switch (err) {
+ case -NFS4ERR_RESET_TO_PNFS:
+ case -NFS4ERR_RESET_TO_MDS:
+ inode = hdr->lseg->pls_layout->plh_inode;
+ pnfs_error_mark_layout_for_return(inode, hdr->lseg);
+ if (err == -NFS4ERR_RESET_TO_PNFS) {
+ pnfs_set_retry_layoutget(hdr->lseg->pls_layout);
+ ff_layout_reset_write(hdr, true);
+ } else {
+ pnfs_clear_retry_layoutget(hdr->lseg->pls_layout);
+ ff_layout_reset_write(hdr, false);
+ }
+ return task->tk_status;
+ case -EAGAIN:
+ rpc_restart_call_prepare(task);
+ return -EAGAIN;
+ }
+
+ if (hdr->res.verf->committed == NFS_FILE_SYNC ||
+ hdr->res.verf->committed == NFS_DATA_SYNC)
+ ff_layout_set_layoutcommit(hdr);
+
+ return 0;
+}
+
+static int ff_layout_commit_done_cb(struct rpc_task *task,
+ struct nfs_commit_data *data)
+{
+ struct inode *inode;
+ int err;
+
+ trace_nfs4_pnfs_commit_ds(data, task->tk_status);
+ if (task->tk_status == -ETIMEDOUT && !data->res.op_status)
+ data->res.op_status = NFS4ERR_NXIO;
+ if (task->tk_status < 0 && data->res.op_status)
+ ff_layout_io_track_ds_error(data->lseg, data->ds_commit_index,
+ data->args.offset, data->args.count,
+ data->res.op_status, OP_COMMIT);
+ err = ff_layout_async_handle_error(task, NULL, data->ds_clp,
+ data->lseg, data->ds_commit_index);
+
+ switch (err) {
+ case -NFS4ERR_RESET_TO_PNFS:
+ case -NFS4ERR_RESET_TO_MDS:
+ inode = data->lseg->pls_layout->plh_inode;
+ pnfs_error_mark_layout_for_return(inode, data->lseg);
+ if (err == -NFS4ERR_RESET_TO_PNFS)
+ pnfs_set_retry_layoutget(data->lseg->pls_layout);
+ else
+ pnfs_clear_retry_layoutget(data->lseg->pls_layout);
+ pnfs_generic_prepare_to_resend_writes(data);
+ return -EAGAIN;
+ case -EAGAIN:
+ rpc_restart_call_prepare(task);
+ return -EAGAIN;
+ }
+
+ if (data->verf.committed == NFS_UNSTABLE)
+ pnfs_commit_set_layoutcommit(data);
+
+ return 0;
+}
+
+static int ff_layout_write_prepare_common(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
+ rpc_exit(task, -EIO);
+ return -EIO;
+ }
+
+ if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
+ bool retry_pnfs;
+
+ retry_pnfs = ff_layout_has_available_ds(hdr->lseg);
+ dprintk("%s task %u reset io to %s\n", __func__,
+ task->tk_pid, retry_pnfs ? "pNFS" : "MDS");
+ ff_layout_reset_write(hdr, retry_pnfs);
+ rpc_exit(task, 0);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+static void ff_layout_write_prepare_v3(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_write_prepare_common(task, hdr))
+ return;
+
+ rpc_call_start(task);
+}
+
+static void ff_layout_write_prepare_v4(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_write_prepare_common(task, hdr))
+ return;
+
+ if (ff_layout_setup_sequence(hdr->ds_clp,
+ &hdr->args.seq_args,
+ &hdr->res.seq_res,
+ task))
+ return;
+
+ if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
+ hdr->args.lock_context, FMODE_WRITE) == -EIO)
+ rpc_exit(task, -EIO); /* lost lock, terminate I/O */
+}
+
+static void ff_layout_write_call_done(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
+ task->tk_status == 0) {
+ nfs4_sequence_done(task, &hdr->res.seq_res);
+ return;
+ }
+
+ /* Note this may cause RPC to be resent */
+ hdr->mds_ops->rpc_call_done(task, hdr);
+}
+
+static void ff_layout_write_count_stats(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ rpc_count_iostats_metrics(task,
+ &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_WRITE]);
+}
+
+static void ff_layout_commit_prepare_v3(struct rpc_task *task, void *data)
+{
+ rpc_call_start(task);
+}
+
+static void ff_layout_commit_prepare_v4(struct rpc_task *task, void *data)
+{
+ struct nfs_commit_data *wdata = data;
+
+ ff_layout_setup_sequence(wdata->ds_clp,
+ &wdata->args.seq_args,
+ &wdata->res.seq_res,
+ task);
+}
+
+static void ff_layout_commit_count_stats(struct rpc_task *task, void *data)
+{
+ struct nfs_commit_data *cdata = data;
+
+ rpc_count_iostats_metrics(task,
+ &NFS_CLIENT(cdata->inode)->cl_metrics[NFSPROC4_CLNT_COMMIT]);
+}
+
+static const struct rpc_call_ops ff_layout_read_call_ops_v3 = {
+ .rpc_call_prepare = ff_layout_read_prepare_v3,
+ .rpc_call_done = ff_layout_read_call_done,
+ .rpc_count_stats = ff_layout_read_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_read_call_ops_v4 = {
+ .rpc_call_prepare = ff_layout_read_prepare_v4,
+ .rpc_call_done = ff_layout_read_call_done,
+ .rpc_count_stats = ff_layout_read_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_write_call_ops_v3 = {
+ .rpc_call_prepare = ff_layout_write_prepare_v3,
+ .rpc_call_done = ff_layout_write_call_done,
+ .rpc_count_stats = ff_layout_write_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_write_call_ops_v4 = {
+ .rpc_call_prepare = ff_layout_write_prepare_v4,
+ .rpc_call_done = ff_layout_write_call_done,
+ .rpc_count_stats = ff_layout_write_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_commit_call_ops_v3 = {
+ .rpc_call_prepare = ff_layout_commit_prepare_v3,
+ .rpc_call_done = pnfs_generic_write_commit_done,
+ .rpc_count_stats = ff_layout_commit_count_stats,
+ .rpc_release = pnfs_generic_commit_release,
+};
+
+static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
+ .rpc_call_prepare = ff_layout_commit_prepare_v4,
+ .rpc_call_done = pnfs_generic_write_commit_done,
+ .rpc_count_stats = ff_layout_commit_count_stats,
+ .rpc_release = pnfs_generic_commit_release,
+};
+
+static enum pnfs_try_status
+ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
+{
+ struct pnfs_layout_segment *lseg = hdr->lseg;
+ struct nfs4_pnfs_ds *ds;
+ struct rpc_clnt *ds_clnt;
+ struct rpc_cred *ds_cred;
+ loff_t offset = hdr->args.offset;
+ u32 idx = hdr->pgio_mirror_idx;
+ int vers;
+ struct nfs_fh *fh;
+
+ dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
+ __func__, hdr->inode->i_ino,
+ hdr->args.pgbase, (size_t)hdr->args.count, offset);
+
+ ds = nfs4_ff_layout_prepare_ds(lseg, idx, false);
+ if (!ds)
+ goto out_failed;
+
+ ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
+ hdr->inode);
+ if (IS_ERR(ds_clnt))
+ goto out_failed;
+
+ ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
+ if (IS_ERR(ds_cred))
+ goto out_failed;
+
+ vers = nfs4_ff_layout_ds_version(lseg, idx);
+
+ dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
+ ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count), vers);
+
+ atomic_inc(&ds->ds_clp->cl_count);
+ hdr->ds_clp = ds->ds_clp;
+ fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
+ if (fh)
+ hdr->args.fh = fh;
+
+ /*
+ * Note that if we ever decide to split across DSes,
+ * then we may need to handle dense-like offsets.
+ */
+ hdr->args.offset = offset;
+ hdr->mds_offset = offset;
+
+ /* Perform an asynchronous read to ds */
+ nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ vers == 3 ? &ff_layout_read_call_ops_v3 :
+ &ff_layout_read_call_ops_v4,
+ 0, RPC_TASK_SOFTCONN);
+
+ return PNFS_ATTEMPTED;
+
+out_failed:
+ if (ff_layout_has_available_ds(lseg))
+ return PNFS_TRY_AGAIN;
+ return PNFS_NOT_ATTEMPTED;
+}
+
+/* Perform async writes. */
+static enum pnfs_try_status
+ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+{
+ struct pnfs_layout_segment *lseg = hdr->lseg;
+ struct nfs4_pnfs_ds *ds;
+ struct rpc_clnt *ds_clnt;
+ struct rpc_cred *ds_cred;
+ loff_t offset = hdr->args.offset;
+ int vers;
+ struct nfs_fh *fh;
+ int idx = hdr->pgio_mirror_idx;
+
+ ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
+ if (!ds)
+ return PNFS_NOT_ATTEMPTED;
+
+ ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
+ hdr->inode);
+ if (IS_ERR(ds_clnt))
+ return PNFS_NOT_ATTEMPTED;
+
+ ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
+ if (IS_ERR(ds_cred))
+ return PNFS_NOT_ATTEMPTED;
+
+ vers = nfs4_ff_layout_ds_version(lseg, idx);
+
+ dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d vers %d\n",
+ __func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
+ offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count),
+ vers);
+
+ hdr->pgio_done_cb = ff_layout_write_done_cb;
+ atomic_inc(&ds->ds_clp->cl_count);
+ hdr->ds_clp = ds->ds_clp;
+ hdr->ds_commit_idx = idx;
+ fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
+ if (fh)
+ hdr->args.fh = fh;
+
+ /*
+ * Note that if we ever decide to split across DSes,
+ * then we may need to handle dense-like offsets.
+ */
+ hdr->args.offset = offset;
+
+ /* Perform an asynchronous write */
+ nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ vers == 3 ? &ff_layout_write_call_ops_v3 :
+ &ff_layout_write_call_ops_v4,
+ sync, RPC_TASK_SOFTCONN);
+ return PNFS_ATTEMPTED;
+}
+
+static void
+ff_layout_mark_request_commit(struct nfs_page *req,
+ struct pnfs_layout_segment *lseg,
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx)
+{
+ struct list_head *list;
+ struct pnfs_commit_bucket *buckets;
+
+ spin_lock(cinfo->lock);
+ buckets = cinfo->ds->buckets;
+ list = &buckets[ds_commit_idx].written;
+ if (list_empty(list)) {
+ /* Non-empty buckets hold a reference on the lseg. That ref
+ * is normally transferred to the COMMIT call and released
+ * there. It could also be released if the last req is pulled
+ * off due to a rewrite, in which case it will be done in
+ * pnfs_generic_clear_request_commit
+ */
+ WARN_ON_ONCE(buckets[ds_commit_idx].wlseg != NULL);
+ buckets[ds_commit_idx].wlseg = pnfs_get_lseg(lseg);
+ }
+ set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
+ cinfo->ds->nwritten++;
+
+ /* Open-coded nfs_request_add_commit_list(): we need to add req to the
+ * list without dropping the cinfo lock.
+ */
+ set_bit(PG_CLEAN, &(req)->wb_flags);
+ nfs_list_add_request(req, list);
+ cinfo->mds->ncommit++;
+ spin_unlock(cinfo->lock);
+ if (!cinfo->dreq) {
+ inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+ inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
+ BDI_RECLAIMABLE);
+ __mark_inode_dirty(req->wb_context->dentry->d_inode,
+ I_DIRTY_DATASYNC);
+ }
+}
+
+static u32 calc_ds_index_from_commit(struct pnfs_layout_segment *lseg, u32 i)
+{
+ return i;
+}
+
+static struct nfs_fh *
+select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
+{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+
+ /* FIXME: Assume that there is only one NFS version available
+ * for the DS.
+ */
+ return &flseg->mirror_array[i]->fh_versions[0];
+}
+
+static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
+{
+ struct pnfs_layout_segment *lseg = data->lseg;
+ struct nfs4_pnfs_ds *ds;
+ struct rpc_clnt *ds_clnt;
+ struct rpc_cred *ds_cred;
+ u32 idx;
+ int vers;
+ struct nfs_fh *fh;
+
+ idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
+ ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
+ if (!ds)
+ goto out_err;
+
+ ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
+ data->inode);
+ if (IS_ERR(ds_clnt))
+ goto out_err;
+
+ ds_cred = ff_layout_get_ds_cred(lseg, idx, data->cred);
+ if (IS_ERR(ds_cred))
+ goto out_err;
+
+ vers = nfs4_ff_layout_ds_version(lseg, idx);
+
+ dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
+ data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count),
+ vers);
+ data->commit_done_cb = ff_layout_commit_done_cb;
+ data->cred = ds_cred;
+ atomic_inc(&ds->ds_clp->cl_count);
+ data->ds_clp = ds->ds_clp;
+ fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
+ if (fh)
+ data->args.fh = fh;
+ return nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
+ vers == 3 ? &ff_layout_commit_call_ops_v3 :
+ &ff_layout_commit_call_ops_v4,
+ how, RPC_TASK_SOFTCONN);
+out_err:
+ pnfs_generic_prepare_to_resend_writes(data);
+ pnfs_generic_commit_release(data);
+ return -EAGAIN;
+}
+
+static int
+ff_layout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
+ int how, struct nfs_commit_info *cinfo)
+{
+ return pnfs_generic_commit_pagelist(inode, mds_pages, how, cinfo,
+ ff_layout_initiate_commit);
+}
+
+static struct pnfs_ds_commit_info *
+ff_layout_get_ds_info(struct inode *inode)
+{
+ struct pnfs_layout_hdr *layout = NFS_I(inode)->layout;
+
+ if (layout == NULL)
+ return NULL;
+ else
+ return &FF_LAYOUT_FROM_HDR(layout)->commit_info;
+}
+
+static void
+ff_layout_free_deveiceid_node(struct nfs4_deviceid_node *d)
+{
+ nfs4_ff_layout_free_deviceid(container_of(d, struct nfs4_ff_layout_ds,
+ id_node));
+}
+
+static int ff_layout_encode_ioerr(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ struct pnfs_layout_hdr *hdr = &flo->generic_hdr;
+ __be32 *start;
+ int count = 0, ret = 0;
+
+ start = xdr_reserve_space(xdr, 4);
+ if (unlikely(!start))
+ return -E2BIG;
+
+ /* This assumes we always return _ALL_ layouts */
+ spin_lock(&hdr->plh_inode->i_lock);
+ ret = ff_layout_encode_ds_ioerr(flo, xdr, &count, &args->range);
+ spin_unlock(&hdr->plh_inode->i_lock);
+
+ *start = cpu_to_be32(count);
+
+ return ret;
+}
+
+/* report nothing for now */
+static void ff_layout_encode_iostats(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 4);
+ if (likely(p))
+ *p = cpu_to_be32(0);
+}
+
+static struct nfs4_deviceid_node *
+ff_layout_alloc_deviceid_node(struct nfs_server *server,
+ struct pnfs_device *pdev, gfp_t gfp_flags)
+{
+ struct nfs4_ff_layout_ds *dsaddr;
+
+ dsaddr = nfs4_ff_alloc_deviceid_node(server, pdev, gfp_flags);
+ if (!dsaddr)
+ return NULL;
+ return &dsaddr->id_node;
+}
+
+static void
+ff_layout_encode_layoutreturn(struct pnfs_layout_hdr *lo,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ struct nfs4_flexfile_layout *flo = FF_LAYOUT_FROM_HDR(lo);
+ __be32 *start;
+
+ dprintk("%s: Begin\n", __func__);
+ start = xdr_reserve_space(xdr, 4);
+ BUG_ON(!start);
+
+ if (ff_layout_encode_ioerr(flo, xdr, args))
+ goto out;
+
+ ff_layout_encode_iostats(flo, xdr, args);
+out:
+ *start = cpu_to_be32((xdr->p - start - 1) * 4);
+ dprintk("%s: Return\n", __func__);
+}
+
+static struct pnfs_layoutdriver_type flexfilelayout_type = {
+ .id = LAYOUT_FLEX_FILES,
+ .name = "LAYOUT_FLEX_FILES",
+ .owner = THIS_MODULE,
+ .alloc_layout_hdr = ff_layout_alloc_layout_hdr,
+ .free_layout_hdr = ff_layout_free_layout_hdr,
+ .alloc_lseg = ff_layout_alloc_lseg,
+ .free_lseg = ff_layout_free_lseg,
+ .pg_read_ops = &ff_layout_pg_read_ops,
+ .pg_write_ops = &ff_layout_pg_write_ops,
+ .get_ds_info = ff_layout_get_ds_info,
+ .free_deviceid_node = ff_layout_free_deveiceid_node,
+ .mark_request_commit = ff_layout_mark_request_commit,
+ .clear_request_commit = pnfs_generic_clear_request_commit,
+ .scan_commit_lists = pnfs_generic_scan_commit_lists,
+ .recover_commit_reqs = pnfs_generic_recover_commit_reqs,
+ .commit_pagelist = ff_layout_commit_pagelist,
+ .read_pagelist = ff_layout_read_pagelist,
+ .write_pagelist = ff_layout_write_pagelist,
+ .alloc_deviceid_node = ff_layout_alloc_deviceid_node,
+ .encode_layoutreturn = ff_layout_encode_layoutreturn,
+};
+
+static int __init nfs4flexfilelayout_init(void)
+{
+ printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Registering...\n",
+ __func__);
+ return pnfs_register_layoutdriver(&flexfilelayout_type);
+}
+
+static void __exit nfs4flexfilelayout_exit(void)
+{
+ printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Unregistering...\n",
+ __func__);
+ pnfs_unregister_layoutdriver(&flexfilelayout_type);
+}
+
+MODULE_ALIAS("nfs-layouttype4-4");
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("The NFSv4 flexfile layout driver");
+
+module_init(nfs4flexfilelayout_init);
+module_exit(nfs4flexfilelayout_exit);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
new file mode 100644
index 0000000..712fc55
--- /dev/null
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -0,0 +1,158 @@
+/*
+ * NFSv4 flexfile layout driver data structures.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <[email protected]>
+ */
+
+#ifndef FS_NFS_NFS4FLEXFILELAYOUT_H
+#define FS_NFS_NFS4FLEXFILELAYOUT_H
+
+#include "../pnfs.h"
+
+/* XXX: Reject insanely large mirror counts for now, to avoid OOM caused
+ * by corrupted replies (network errors etc.). */
+#define NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT 4096
+
+struct nfs4_ff_ds_version {
+ u32 version;
+ u32 minor_version;
+ u32 rsize;
+ u32 wsize;
+ bool tightly_coupled;
+};
+
+/* chained in global deviceid hlist */
+struct nfs4_ff_layout_ds {
+ struct nfs4_deviceid_node id_node;
+ u32 ds_versions_cnt;
+ struct nfs4_ff_ds_version *ds_versions;
+ struct nfs4_pnfs_ds *ds;
+};
+
+struct nfs4_ff_layout_ds_err {
+ struct list_head list; /* linked in mirror error_list */
+ u64 offset;
+ u64 length;
+ int status;
+ enum nfs_opnum4 opnum;
+ nfs4_stateid stateid;
+ struct nfs4_deviceid deviceid;
+};
+
+struct nfs4_ff_layout_mirror {
+ u32 ds_count;
+ u32 efficiency;
+ struct nfs4_ff_layout_ds *mirror_ds;
+ u32 fh_versions_cnt;
+ struct nfs_fh *fh_versions;
+ nfs4_stateid stateid;
+ union {
+ struct { /* same as struct unx_cred */
+ u32 uid; /* -1 iff AUTH_NONE */
+ u32 gid; /* -1 iff AUTH_NONE */
+ u32 gids[16];
+ };
+ };
+ struct rpc_cred *cred;
+ spinlock_t lock;
+};
+
+struct nfs4_ff_layout_segment {
+ struct pnfs_layout_segment generic_hdr;
+ u64 stripe_unit;
+ u32 mirror_array_cnt;
+ struct nfs4_ff_layout_mirror **mirror_array;
+};
+
+struct nfs4_flexfile_layout {
+ struct pnfs_layout_hdr generic_hdr;
+ struct pnfs_ds_commit_info commit_info;
+ struct list_head error_list; /* nfs4_ff_layout_ds_err */
+};
+
+static inline struct nfs4_flexfile_layout *
+FF_LAYOUT_FROM_HDR(struct pnfs_layout_hdr *lo)
+{
+ return container_of(lo, struct nfs4_flexfile_layout, generic_hdr);
+}
+
+static inline struct nfs4_ff_layout_segment *
+FF_LAYOUT_LSEG(struct pnfs_layout_segment *lseg)
+{
+ return container_of(lseg,
+ struct nfs4_ff_layout_segment,
+ generic_hdr);
+}
+
+static inline struct nfs4_deviceid_node *
+FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
+{
+ if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt ||
+ FF_LAYOUT_LSEG(lseg)->mirror_array[idx] == NULL ||
+ FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds == NULL)
+ return NULL;
+ return &FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds->id_node;
+}
+
+static inline struct nfs4_ff_layout_ds *
+FF_LAYOUT_MIRROR_DS(struct nfs4_deviceid_node *node)
+{
+ return container_of(node, struct nfs4_ff_layout_ds, id_node);
+}
+
+static inline struct nfs4_ff_layout_mirror *
+FF_LAYOUT_COMP(struct pnfs_layout_segment *lseg, u32 idx)
+{
+ if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt)
+ return NULL;
+ return FF_LAYOUT_LSEG(lseg)->mirror_array[idx];
+}
+
+static inline u32
+FF_LAYOUT_MIRROR_COUNT(struct pnfs_layout_segment *lseg)
+{
+ return FF_LAYOUT_LSEG(lseg)->mirror_array_cnt;
+}
+
+static inline bool
+ff_layout_test_devid_unavailable(struct nfs4_deviceid_node *node)
+{
+ return nfs4_test_deviceid_unavailable(node);
+}
+
+static inline int
+nfs4_ff_layout_ds_version(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+ return FF_LAYOUT_COMP(lseg, ds_idx)->mirror_ds->ds_versions[0].version;
+}
+
+struct nfs4_ff_layout_ds *
+nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
+ gfp_t gfp_flags);
+void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
+void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
+int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
+ struct nfs4_ff_layout_mirror *mirror, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ gfp_t gfp_flags);
+int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr, int *count,
+ const struct pnfs_layout_range *range);
+struct nfs_fh *
+nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx);
+
+struct nfs4_pnfs_ds *
+nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ bool fail_return);
+
+struct rpc_clnt *
+nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg,
+ u32 ds_idx,
+ struct nfs_client *ds_clp,
+ struct inode *inode);
+struct rpc_cred *ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg,
+ u32 ds_idx, struct rpc_cred *mdscred);
+bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg);
+#endif /* FS_NFS_NFS4FLEXFILELAYOUT_H */
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
new file mode 100644
index 0000000..5dae5c2
--- /dev/null
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -0,0 +1,552 @@
+/*
+ * Device operations for the pnfs nfs4 flexfile layout driver.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <[email protected]>
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/vmalloc.h>
+#include <linux/module.h>
+#include <linux/sunrpc/addr.h>
+
+#include "../internal.h"
+#include "../nfs4session.h"
+#include "flexfilelayout.h"
+
+#define NFSDBG_FACILITY NFSDBG_PNFS_LD
+
+static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
+static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
+
+void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
+{
+ if (mirror_ds)
+ nfs4_put_deviceid_node(&mirror_ds->id_node);
+}
+
+void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
+{
+ nfs4_print_deviceid(&mirror_ds->id_node.deviceid);
+ nfs4_pnfs_ds_put(mirror_ds->ds);
+ kfree(mirror_ds);
+}
+
+/* Decode opaque device data and construct new_ds using it */
+struct nfs4_ff_layout_ds *
+nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
+ gfp_t gfp_flags)
+{
+ struct xdr_stream stream;
+ struct xdr_buf buf;
+ struct page *scratch;
+ struct list_head dsaddrs;
+ struct nfs4_pnfs_ds_addr *da;
+ struct nfs4_ff_layout_ds *new_ds = NULL;
+ struct nfs4_ff_ds_version *ds_versions = NULL;
+ u32 mp_count;
+ u32 version_count;
+ __be32 *p;
+ int i, ret = -ENOMEM;
+
+ /* set up xdr stream */
+ scratch = alloc_page(gfp_flags);
+ if (!scratch)
+ goto out_err;
+
+ new_ds = kzalloc(sizeof(struct nfs4_ff_layout_ds), gfp_flags);
+ if (!new_ds)
+ goto out_scratch;
+
+ nfs4_init_deviceid_node(&new_ds->id_node,
+ server,
+ &pdev->dev_id);
+ INIT_LIST_HEAD(&dsaddrs);
+
+ xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen);
+ xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+
+ /* multipath count */
+ p = xdr_inline_decode(&stream, 4);
+ if (unlikely(!p))
+ goto out_err_drain_dsaddrs;
+ mp_count = be32_to_cpup(p);
+ dprintk("%s: multipath ds count %u\n", __func__, mp_count);
+
+ for (i = 0; i < mp_count; i++) {
+ /* multipath ds */
+ da = nfs4_decode_mp_ds_addr(server->nfs_client->cl_net,
+ &stream, gfp_flags);
+ if (da)
+ list_add_tail(&da->da_node, &dsaddrs);
+ }
+ if (list_empty(&dsaddrs)) {
+ dprintk("%s: no suitable DS addresses found\n",
+ __func__);
+ ret = -ENOMEDIUM;
+ goto out_err_drain_dsaddrs;
+ }
+
+ /* version count */
+ p = xdr_inline_decode(&stream, 4);
+ if (unlikely(!p))
+ goto out_err_drain_dsaddrs;
+ version_count = be32_to_cpup(p);
+ dprintk("%s: version count %u\n", __func__, version_count);
+
+ ds_versions = kcalloc(version_count, sizeof(struct nfs4_ff_ds_version),
+ gfp_flags);
+ if (!ds_versions)
+ goto out_err_drain_dsaddrs;
+
+ for (i = 0; i < version_count; i++) {
+ /* 20 = version(4) + minor_version(4) + rsize(4) + wsize(4) +
+ * tightly_coupled(4) */
+ p = xdr_inline_decode(&stream, 20);
+ if (unlikely(!p))
+ goto out_err_drain_dsaddrs;
+ ds_versions[i].version = be32_to_cpup(p++);
+ ds_versions[i].minor_version = be32_to_cpup(p++);
+ ds_versions[i].rsize = nfs_block_size(be32_to_cpup(p++), NULL);
+ ds_versions[i].wsize = nfs_block_size(be32_to_cpup(p++), NULL);
+ ds_versions[i].tightly_coupled = be32_to_cpup(p);
+
+ if (ds_versions[i].rsize > NFS_MAX_FILE_IO_SIZE)
+ ds_versions[i].rsize = NFS_MAX_FILE_IO_SIZE;
+ if (ds_versions[i].wsize > NFS_MAX_FILE_IO_SIZE)
+ ds_versions[i].wsize = NFS_MAX_FILE_IO_SIZE;
+
+ if (ds_versions[i].version != 3 || ds_versions[i].minor_version != 0) {
+ dprintk("%s: [%d] unsupported ds version %u-%u\n", __func__,
+ i, ds_versions[i].version,
+ ds_versions[i].minor_version);
+ ret = -EPROTONOSUPPORT;
+ goto out_err_drain_dsaddrs;
+ }
+
+ dprintk("%s: [%d] vers %u minor_ver %u rsize %u wsize %u coupled %d\n",
+ __func__, i, ds_versions[i].version,
+ ds_versions[i].minor_version,
+ ds_versions[i].rsize,
+ ds_versions[i].wsize,
+ ds_versions[i].tightly_coupled);
+ }
+
+ new_ds->ds_versions = ds_versions;
+ new_ds->ds_versions_cnt = version_count;
+
+ new_ds->ds = nfs4_pnfs_ds_add(&dsaddrs, gfp_flags);
+ if (!new_ds->ds)
+ goto out_err_drain_dsaddrs;
+
+ /* If DS was already in cache, free ds addrs */
+ while (!list_empty(&dsaddrs)) {
+ da = list_first_entry(&dsaddrs,
+ struct nfs4_pnfs_ds_addr,
+ da_node);
+ list_del_init(&da->da_node);
+ kfree(da->da_remotestr);
+ kfree(da);
+ }
+
+ __free_page(scratch);
+ return new_ds;
+
+out_err_drain_dsaddrs:
+ while (!list_empty(&dsaddrs)) {
+ da = list_first_entry(&dsaddrs, struct nfs4_pnfs_ds_addr,
+ da_node);
+ list_del_init(&da->da_node);
+ kfree(da->da_remotestr);
+ kfree(da);
+ }
+
+ kfree(ds_versions);
+out_scratch:
+ __free_page(scratch);
+out_err:
+ kfree(new_ds);
+
+ dprintk("%s ERROR: returning %d\n", __func__, ret);
+ return NULL;
+}
+
+static u64
+end_offset(u64 start, u64 len)
+{
+ u64 end;
+
+ end = start + len;
+ return end >= start ? end : NFS4_MAX_UINT64;
+}
+
+static void extend_ds_error(struct nfs4_ff_layout_ds_err *err,
+ u64 offset, u64 length)
+{
+ u64 end;
+
+ end = max_t(u64, end_offset(err->offset, err->length),
+ end_offset(offset, length));
+ err->offset = min_t(u64, err->offset, offset);
+ err->length = end - err->offset;
+}
+
+static bool ds_error_can_merge(struct nfs4_ff_layout_ds_err *err, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ nfs4_stateid *stateid,
+ struct nfs4_deviceid *deviceid)
+{
+ return err->status == status && err->opnum == opnum &&
+ nfs4_stateid_match(&err->stateid, stateid) &&
+ !memcmp(&err->deviceid, deviceid, sizeof(*deviceid)) &&
+ end_offset(err->offset, err->length) >= offset &&
+ err->offset <= end_offset(offset, length);
+}
+
+static bool merge_ds_error(struct nfs4_ff_layout_ds_err *old,
+ struct nfs4_ff_layout_ds_err *new)
+{
+ if (!ds_error_can_merge(old, new->offset, new->length, new->status,
+ new->opnum, &new->stateid, &new->deviceid))
+ return false;
+
+ extend_ds_error(old, new->offset, new->length);
+ return true;
+}
+
+static bool
+ff_layout_add_ds_error_locked(struct nfs4_flexfile_layout *flo,
+ struct nfs4_ff_layout_ds_err *dserr)
+{
+ struct nfs4_ff_layout_ds_err *err;
+
+ list_for_each_entry(err, &flo->error_list, list) {
+ if (merge_ds_error(err, dserr)) {
+ return true;
+ }
+ }
+
+ list_add(&dserr->list, &flo->error_list);
+ return false;
+}
+
+static bool
+ff_layout_update_ds_error(struct nfs4_flexfile_layout *flo, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ nfs4_stateid *stateid, struct nfs4_deviceid *deviceid)
+{
+ bool found = false;
+ struct nfs4_ff_layout_ds_err *err;
+
+ list_for_each_entry(err, &flo->error_list, list) {
+ if (ds_error_can_merge(err, offset, length, status, opnum,
+ stateid, deviceid)) {
+ found = true;
+ extend_ds_error(err, offset, length);
+ break;
+ }
+ }
+
+ return found;
+}
+
+int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
+ struct nfs4_ff_layout_mirror *mirror, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ gfp_t gfp_flags)
+{
+ struct nfs4_ff_layout_ds_err *dserr;
+ bool needfree;
+
+ if (status == 0)
+ return 0;
+
+ if (mirror->mirror_ds == NULL)
+ return -EINVAL;
+
+ spin_lock(&flo->generic_hdr.plh_inode->i_lock);
+ if (ff_layout_update_ds_error(flo, offset, length, status, opnum,
+ &mirror->stateid,
+ &mirror->mirror_ds->id_node.deviceid)) {
+ spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
+ return 0;
+ }
+ spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
+ dserr = kmalloc(sizeof(*dserr), gfp_flags);
+ if (!dserr)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&dserr->list);
+ dserr->offset = offset;
+ dserr->length = length;
+ dserr->status = status;
+ dserr->opnum = opnum;
+ nfs4_stateid_copy(&dserr->stateid, &mirror->stateid);
+ memcpy(&dserr->deviceid, &mirror->mirror_ds->id_node.deviceid,
+ NFS4_DEVICEID4_SIZE);
+
+ spin_lock(&flo->generic_hdr.plh_inode->i_lock);
+ needfree = ff_layout_add_ds_error_locked(flo, dserr);
+ spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
+ if (needfree)
+ kfree(dserr);
+
+ return 0;
+}
+
+/* currently we only support AUTH_NONE and AUTH_SYS */
+static rpc_authflavor_t
+nfs4_ff_layout_choose_authflavor(struct nfs4_ff_layout_mirror *mirror)
+{
+ if (mirror->uid == (u32)-1)
+ return RPC_AUTH_NULL;
+ return RPC_AUTH_UNIX;
+}
+
+/* fetch cred for NFSv3 DS */
+static int ff_layout_update_mirror_cred(struct nfs4_ff_layout_mirror *mirror,
+ struct nfs4_pnfs_ds *ds)
+{
+ if (ds && !mirror->cred && mirror->mirror_ds->ds_versions[0].version == 3) {
+ struct rpc_auth *auth = ds->ds_clp->cl_rpcclient->cl_auth;
+ struct rpc_cred *cred;
+ struct auth_cred acred = {
+ .uid = make_kuid(&init_user_ns, mirror->uid),
+ .gid = make_kgid(&init_user_ns, mirror->gid),
+ };
+
+ /* AUTH_NULL ignores acred */
+ cred = auth->au_ops->lookup_cred(auth, &acred, 0);
+ if (IS_ERR(cred)) {
+ dprintk("%s: lookup_cred failed with %ld\n",
+ __func__, PTR_ERR(cred));
+ return PTR_ERR(cred);
+ } else {
+ mirror->cred = cred;
+ }
+ }
+ return 0;
+}
+
+struct nfs_fh *
+nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, mirror_idx);
+ struct nfs_fh *fh = NULL;
+ struct nfs4_deviceid_node *devid;
+
+ if (mirror == NULL || mirror->mirror_ds == NULL ||
+ mirror->mirror_ds->ds == NULL) {
+ printk(KERN_ERR "NFS: %s: No data server for mirror offset index %d\n",
+ __func__, mirror_idx);
+ if (mirror && mirror->mirror_ds) {
+ devid = &mirror->mirror_ds->id_node;
+ pnfs_generic_mark_devid_invalid(devid);
+ }
+ goto out;
+ }
+
+ /* FIXME: For now assume there is only 1 version available for the DS */
+ fh = &mirror->fh_versions[0];
+out:
+ return fh;
+}
+
+/* Upon return, either ds is connected, or ds is NULL */
+struct nfs4_pnfs_ds *
+nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ bool fail_return)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct nfs4_pnfs_ds *ds = NULL;
+ struct nfs4_deviceid_node *devid;
+ struct inode *ino = lseg->pls_layout->plh_inode;
+ struct nfs_server *s = NFS_SERVER(ino);
+ unsigned int max_payload;
+ rpc_authflavor_t flavor;
+
+ if (mirror == NULL || mirror->mirror_ds == NULL ||
+ mirror->mirror_ds->ds == NULL) {
+ printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
+ __func__, ds_idx);
+ if (mirror && mirror->mirror_ds) {
+ devid = &mirror->mirror_ds->id_node;
+ pnfs_generic_mark_devid_invalid(devid);
+ }
+ goto out;
+ }
+
+ ds = mirror->mirror_ds->ds;
+ devid = &mirror->mirror_ds->id_node;
+
+ /* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
+ smp_rmb();
+ if (ds->ds_clp)
+ goto out_test_devid;
+
+ flavor = nfs4_ff_layout_choose_authflavor(mirror);
+
+ /* FIXME: For now we assume the server sent only one version of NFS
+ * to use for the DS.
+ */
+ nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
+ dataserver_retrans,
+ mirror->mirror_ds->ds_versions[0].version,
+ mirror->mirror_ds->ds_versions[0].minor_version,
+ flavor);
+
+ /* connect success, check rsize/wsize limit */
+ if (ds->ds_clp) {
+ max_payload =
+ nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
+ NULL);
+ if (mirror->mirror_ds->ds_versions[0].rsize > max_payload)
+ mirror->mirror_ds->ds_versions[0].rsize = max_payload;
+ if (mirror->mirror_ds->ds_versions[0].wsize > max_payload)
+ mirror->mirror_ds->ds_versions[0].wsize = max_payload;
+ } else {
+ ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
+ mirror, lseg->pls_range.offset,
+ lseg->pls_range.length, NFS4ERR_NXIO,
+ OP_ILLEGAL, GFP_NOIO);
+ if (fail_return) {
+ pnfs_error_mark_layout_for_return(ino, lseg);
+ if (ff_layout_has_available_ds(lseg))
+ pnfs_set_retry_layoutget(lseg->pls_layout);
+ else
+ pnfs_clear_retry_layoutget(lseg->pls_layout);
+
+ } else {
+ if (ff_layout_has_available_ds(lseg))
+ set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &lseg->pls_layout->plh_flags);
+ else {
+ pnfs_error_mark_layout_for_return(ino, lseg);
+ pnfs_clear_retry_layoutget(lseg->pls_layout);
+ }
+ }
+ }
+
+out_test_devid:
+ if (ff_layout_test_devid_unavailable(devid))
+ ds = NULL;
+out:
+ if (ff_layout_update_mirror_cred(mirror, ds))
+ ds = NULL;
+ return ds;
+}
+
+struct rpc_cred *
+ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ struct rpc_cred *mdscred)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct rpc_cred *cred = ERR_PTR(-EINVAL);
+
+ if (!nfs4_ff_layout_prepare_ds(lseg, ds_idx, true))
+ goto out;
+
+ if (mirror && mirror->cred)
+ cred = mirror->cred;
+ else
+ cred = mdscred;
+out:
+ return cred;
+}
+
+/**
+* Find or create a DS rpc client with the MDS server rpc client auth flavor
+* in the nfs_client cl_ds_clients list.
+*/
+struct rpc_clnt *
+nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ struct nfs_client *ds_clp, struct inode *inode)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+
+ switch (mirror->mirror_ds->ds_versions[0].version) {
+ case 3:
+ /* For NFSv3 DS, flavor is set when creating DS connections */
+ return ds_clp->cl_rpcclient;
+ case 4:
+ return nfs4_find_or_create_ds_client(ds_clp, inode);
+ default:
+ BUG();
+ }
+}
+
+static bool is_range_intersecting(u64 offset1, u64 length1,
+ u64 offset2, u64 length2)
+{
+ u64 end1 = end_offset(offset1, length1);
+ u64 end2 = end_offset(offset2, length2);
+
+ return (end1 == NFS4_MAX_UINT64 || end1 > offset2) &&
+ (end2 == NFS4_MAX_UINT64 || end2 > offset1);
+}
+
+/* called with inode i_lock held */
+int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr, int *count,
+ const struct pnfs_layout_range *range)
+{
+ struct nfs4_ff_layout_ds_err *err, *n;
+ __be32 *p;
+
+ list_for_each_entry_safe(err, n, &flo->error_list, list) {
+ if (!is_range_intersecting(err->offset, err->length,
+ range->offset, range->length))
+ continue;
+ /* offset(8) + length(8) + stateid(NFS4_STATEID_SIZE)
+ * + deviceid(NFS4_DEVICEID4_SIZE) + status(4) + opnum(4)
+ */
+ p = xdr_reserve_space(xdr,
+ 24 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ p = xdr_encode_hyper(p, err->offset);
+ p = xdr_encode_hyper(p, err->length);
+ p = xdr_encode_opaque_fixed(p, &err->stateid,
+ NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, &err->deviceid,
+ NFS4_DEVICEID4_SIZE);
+ *p++ = cpu_to_be32(err->status);
+ *p++ = cpu_to_be32(err->opnum);
+ *count += 1;
+ dprintk("%s: offset %llu length %llu status %d op %d count %d\n",
+ __func__, err->offset, err->length, err->status,
+ err->opnum, *count);
+ list_del(&err->list);
+ kfree(err);
+ }
+
+ return 0;
+}
+
+bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg)
+{
+ struct nfs4_ff_layout_mirror *mirror;
+ struct nfs4_deviceid_node *devid;
+ int idx;
+
+ for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
+ mirror = FF_LAYOUT_COMP(lseg, idx);
+ if (mirror && mirror->mirror_ds) {
+ devid = &mirror->mirror_ds->id_node;
+ if (!ff_layout_test_devid_unavailable(devid))
+ return true;
+ }
+ }
+
+ return false;
+}
+
+module_param(dataserver_retrans, uint, 0644);
+MODULE_PARM_DESC(dataserver_retrans, "The number of times the NFSv4.1 client "
+ "retries a request before it attempts further "
+ "recovery action.");
+module_param(dataserver_timeo, uint, 0644);
+MODULE_PARM_DESC(dataserver_timeo, "The time (in tenths of a second) the "
+ "NFSv4.1 client waits for a response from a "
+ "data server before it retries an NFS request.");
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 022b761..de7c91c 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -516,6 +516,7 @@ enum pnfs_layouttype {
LAYOUT_NFSV4_1_FILES = 1,
LAYOUT_OSD2_OBJECTS = 2,
LAYOUT_BLOCK_VOLUME = 3,
+ LAYOUT_FLEX_FILES = 4,
};
/* used for both layout return and recall */
--
1.9.3
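
For readers following the DS error tracking in flexfilelayoutdev.c above:
end_offset(), ds_error_can_merge() and extend_ds_error() together coalesce
error records whose byte ranges overlap or touch. The following is a minimal
user-space sketch of just that range arithmetic, not the kernel code. The
names struct err_range, can_merge() and extend() are hypothetical stand-ins,
and the status/opnum/stateid/deviceid comparisons done by the real
ds_error_can_merge() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nfs4_ff_layout_ds_err, reduced to the
 * two fields the range arithmetic touches. */
struct err_range {
	uint64_t offset;
	uint64_t length;
};

/* End of [start, start + len), saturating at UINT64_MAX on overflow. */
static uint64_t end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : UINT64_MAX;
}

/* True if the two ranges overlap or touch, so one record can cover both. */
static bool can_merge(const struct err_range *err,
		      uint64_t offset, uint64_t length)
{
	return end_offset(err->offset, err->length) >= offset &&
	       err->offset <= end_offset(offset, length);
}

/* Grow err so that it covers the union of both ranges. */
static void extend(struct err_range *err, uint64_t offset, uint64_t length)
{
	uint64_t end_a = end_offset(err->offset, err->length);
	uint64_t end_b = end_offset(offset, length);
	uint64_t end = end_a > end_b ? end_a : end_b;

	assert(can_merge(err, offset, length));
	if (offset < err->offset)
		err->offset = offset;
	err->length = end - err->offset;
}
```

With a record covering offset 4096, length 4096, an error at offset 8192,
length 4096 touches its end, so extend() leaves a single record at offset
4096 with length 8192. A range that would wrap past the 64-bit limit is
treated as running to the maximum offset rather than wrapping around.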
On Tue, Dec 23, 2014 at 11:12:39PM -0800, Tom Haynes wrote:
> Hi,
>
> This patchset introduces the Flexfile Layout Module for the
> client.
>
> It corresponds to draft 4
> (http://tools.ietf.org/id/draft-ietf-nfsv4-flex-files-02.txt)
Make that http://tools.ietf.org/id/draft-ietf-nfsv4-flex-files-04.txt !
> of the Parallel NFS (pNFS) Flexible File Layout
> (https://datatracker.ietf.org/doc/draft-ietf-nfsv4-flex-files/).
>
> This version fixes the following review comments:
>
> 1) XDR should be for Draft 4 and not Draft 2.
> 2) Can you use get_nfs_version() / put_nfs_version() here rather than
> exposing nfs_v3 to the entire client? If this *has* to be
> global then please put it in nfs3_fs.h.
> 3) Can we consolidate "nfs/flexfiles: send layoutreturn before freeing lseg"
> and "nfs/flexfiles: defer sending layoutreturn in pnfs_put_lseg"
> to remove some temporary hard to read code?
>
> Thanks,
> Tom
>
> Peng Tao (35):
> nfs41: pull data server cache from file layout to generic pnfs
> nfs41: pull nfs4_ds_connect from file layout to generic pnfs
> nfs41: pull decode_ds_addr from file layout to generic pnfs
> nfs41: allow LD to choose DS connection auth flavor
> nfs41: move file layout macros to generic pnfs
> nfsv3: introduce nfs3_set_ds_client
> nfs41: allow LD to choose DS connection version/minor_version
> nfs41: create NFSv3 DS connection if specified
> nfs: allow different protocol in nfs_initiate_commit
> nfs4: pass slot table to nfs40_setup_sequence
> nfs4: export nfs4_sequence_done
> nfs: allow to specify cred in nfs_initiate_pgio
> nfs: set hostname when creating nfsv3 ds connection
> nfs/flexclient: export pnfs_layoutcommit_inode
> nfs41: close a small race window when adding new layout to global list
> nfs41: serialize first layoutget of a file
> nfs: save server READ/WRITE/COMMIT status
> nfs41: pass iomode through layoutreturn args
> nfs41: make a helper function to send layoutreturn
> nfs41: add a helper to mark layout for return
> nfs41: don't use a layout if it is marked for returning
> nfs41: send layoutreturn in last put_lseg
> nfs41: clear NFS_LAYOUT_RETURN if layoutreturn is sent or failed to
> send
> nfs/filelayout: use pnfs_error_mark_layout_for_return
> nfs41: add a debug warning if we destroy an unempty layout
> nfs: only reset desc->pg_mirror_idx when mirroring is supported
> nfs: add nfs_pgio_current_mirror helper
> pnfs: allow LD to ask to resend read through pnfs
> nfs41: add range to layoutreturn args
> nfs41: allow async version layoutreturn
> nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
> nfs/flexfiles: send layoutreturn before freeing lseg
> nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
> nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
> nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
>
> Tom Haynes (4):
> pnfs: Prepare for flexfiles by pulling out common code
> pnfs: Do not grab the commit_info lock twice when rescheduling writes
> pnfs: Add nfs_rpc_ops in calls to nfs_initiate_pgio
> pnfs/flexfiles: Add the FlexFile Layout Driver
>
> Trond Myklebust (1):
> NFSv4.1/NFSv3: Add pNFS callbacks for nfs3_(read|write|commit)_done()
>
> Weston Andros Adamson (9):
> sunrpc: add rpc_count_iostats_idx
> nfs: introduce pg_cleanup op for pgio descriptors
> pnfs: release lseg in pnfs_generic_pg_cleanup
> nfs: handle overlapping reqs in lock_and_join
> nfs: rename pgio header ds_idx to ds_commit_idx
> pnfs: pass ds_commit_idx through the commit path
> nfs: add mirroring support to pgio layer
> nfs: mirroring support for direct io
> pnfs: fail comparison when bucket verifier not set
>
> fs/nfs/Kconfig | 5 +
> fs/nfs/Makefile | 3 +-
> fs/nfs/blocklayout/blocklayout.c | 2 +
> fs/nfs/direct.c | 108 +-
> fs/nfs/filelayout/filelayout.c | 315 +-----
> fs/nfs/filelayout/filelayout.h | 40 -
> fs/nfs/filelayout/filelayoutdev.c | 469 +--------
> fs/nfs/flexfilelayout/Makefile | 5 +
> fs/nfs/flexfilelayout/flexfilelayout.c | 1600 +++++++++++++++++++++++++++++
> fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
> fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
> fs/nfs/internal.h | 31 +-
> fs/nfs/nfs3_fs.h | 3 +-
> fs/nfs/nfs3client.c | 41 +
> fs/nfs/nfs3proc.c | 9 +
> fs/nfs/nfs3super.c | 2 +-
> fs/nfs/nfs3xdr.c | 3 +
> fs/nfs/nfs4_fs.h | 6 +
> fs/nfs/nfs4client.c | 7 +-
> fs/nfs/nfs4proc.c | 45 +-
> fs/nfs/nfs4xdr.c | 9 +-
> fs/nfs/objlayout/objio_osd.c | 5 +-
> fs/nfs/pagelist.c | 294 +++++-
> fs/nfs/pnfs.c | 407 ++++++--
> fs/nfs/pnfs.h | 119 ++-
> fs/nfs/pnfs_dev.c | 522 ++++++++++
> fs/nfs/pnfs_nfsio.c | 283 +++++
> fs/nfs/read.c | 33 +-
> fs/nfs/write.c | 49 +-
> include/linux/nfs4.h | 1 +
> include/linux/nfs_page.h | 22 +-
> include/linux/nfs_xdr.h | 6 +-
> include/linux/sunrpc/metrics.h | 2 +
> net/sunrpc/stats.c | 26 +-
> 34 files changed, 4182 insertions(+), 1000 deletions(-)
> create mode 100644 fs/nfs/flexfilelayout/Makefile
> create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
> create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
> create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
> create mode 100644 fs/nfs/pnfs_nfsio.c
>
> --
> 1.9.3
>
Hey Tom and Peng,
On 12/24/2014 02:12 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> It can be reused by flexfiles layout client.
>
> Reviewed-by: Jeff Layton <[email protected]>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/filelayout/filelayoutdev.c | 78 +++------------------------------------
> fs/nfs/pnfs.h | 3 ++
> 2 files changed, 8 insertions(+), 73 deletions(-)
>
> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
> index fbfbb70..eb2e93b 100644
> --- a/fs/nfs/filelayout/filelayoutdev.c
> +++ b/fs/nfs/filelayout/filelayoutdev.c
> @@ -42,51 +42,6 @@
> static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
> static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
>
> -/*
> - * Create an rpc connection to the nfs4_pnfs_ds data server
> - * Currently only supports IPv4 and IPv6 addresses
> - */
> -static int
> -nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
> -{
> - struct nfs_client *clp = ERR_PTR(-EIO);
> - struct nfs4_pnfs_ds_addr *da;
> - int status = 0;
> -
> - dprintk("--> %s DS %s au_flavor %d\n", __func__, ds->ds_remotestr,
> - mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
> -
> - list_for_each_entry(da, &ds->ds_addrs, da_node) {
> - dprintk("%s: DS %s: trying address %s\n",
> - __func__, ds->ds_remotestr, da->da_remotestr);
> -
> - clp = nfs4_set_ds_client(mds_srv->nfs_client,
> - (struct sockaddr *)&da->da_addr,
> - da->da_addrlen, IPPROTO_TCP,
> - dataserver_timeo, dataserver_retrans);
> - if (!IS_ERR(clp))
> - break;
> - }
> -
> - if (IS_ERR(clp)) {
> - status = PTR_ERR(clp);
> - goto out;
> - }
> -
> - status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
> - if (status)
> - goto out_put;
> -
> - smp_wmb();
> - ds->ds_clp = clp;
> - dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
> -out:
> - return status;
> -out_put:
> - nfs_put_client(clp);
> - goto out;
> -}
> -
> void
> nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
> {
> @@ -450,22 +405,7 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
> return flseg->fh_array[i];
> }
>
> -static void nfs4_wait_ds_connect(struct nfs4_pnfs_ds *ds)
> -{
> - might_sleep();
> - wait_on_bit_action(&ds->ds_state, NFS4DS_CONNECTING,
> - nfs_wait_bit_killable, TASK_KILLABLE);
> -}
> -
> -static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
> -{
> - smp_mb__before_atomic();
> - clear_bit(NFS4DS_CONNECTING, &ds->ds_state);
> - smp_mb__after_atomic();
> - wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
> -}
> -
> -
> +/* Upon return, either ds is connected, or ds is NULL */
> struct nfs4_pnfs_ds *
> nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> {
> @@ -473,6 +413,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
> struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
> struct nfs4_pnfs_ds *ret = ds;
> + struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
>
> if (ds == NULL) {
> printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
> @@ -484,18 +425,9 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> if (ds->ds_clp)
> goto out_test_devid;
>
> - if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
> - struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
> - int err;
> -
> - err = nfs4_ds_connect(s, ds);
> - if (err)
> - nfs4_mark_deviceid_unavailable(devid);
> - nfs4_clear_ds_conn_bit(ds);
> - } else {
> - /* Either ds is connected, or ds is NULL */
> - nfs4_wait_ds_connect(ds);
> - }
> + nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
> + dataserver_retrans);
> +
When I compile this patch I get:
ERROR: "nfs4_pnfs_ds_connect" [fs/nfs/filelayout/nfs_layout_nfsv41_files.ko] undefined!
It looks like this function doesn't exist until patch 5.
Anna
> out_test_devid:
> if (filelayout_test_devid_unavailable(devid))
> ret = NULL;
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index d0b8e0c..a213c2d 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -295,6 +295,9 @@ void nfs4_deviceid_purge_client(const struct nfs_client *);
> void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
> struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
> gfp_t gfp_flags);
> +void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
> + struct nfs4_deviceid_node *devid, unsigned int timeo,
> + unsigned int retrans);
>
> /* pnfs_nfsio.c */
> void pnfs_generic_clear_request_commit(struct nfs_page *req,
>
On Mon, Jan 05, 2015 at 10:51:46AM -0500, Anna Schumaker wrote:
> Hey Tom and Peng,
>
> On 12/24/2014 02:12 AM, Tom Haynes wrote:
> > From: Peng Tao <[email protected]>
> >
> > It can be reused by flexfiles layout client.
> >
> > Reviewed-by: Jeff Layton <[email protected]>
> > Signed-off-by: Peng Tao <[email protected]>
> > Signed-off-by: Tom Haynes <[email protected]>
> > ---
> > fs/nfs/filelayout/filelayoutdev.c | 78 +++------------------------------------
> > fs/nfs/pnfs.h | 3 ++
> > 2 files changed, 8 insertions(+), 73 deletions(-)
> >
> > diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
> > index fbfbb70..eb2e93b 100644
> > --- a/fs/nfs/filelayout/filelayoutdev.c
> > +++ b/fs/nfs/filelayout/filelayoutdev.c
> > @@ -42,51 +42,6 @@
> > static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
> > static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
> >
> > -/*
> > - * Create an rpc connection to the nfs4_pnfs_ds data server
> > - * Currently only supports IPv4 and IPv6 addresses
> > - */
> > -static int
> > -nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
> > -{
> > - struct nfs_client *clp = ERR_PTR(-EIO);
> > - struct nfs4_pnfs_ds_addr *da;
> > - int status = 0;
> > -
> > - dprintk("--> %s DS %s au_flavor %d\n", __func__, ds->ds_remotestr,
> > - mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
> > -
> > - list_for_each_entry(da, &ds->ds_addrs, da_node) {
> > - dprintk("%s: DS %s: trying address %s\n",
> > - __func__, ds->ds_remotestr, da->da_remotestr);
> > -
> > - clp = nfs4_set_ds_client(mds_srv->nfs_client,
> > - (struct sockaddr *)&da->da_addr,
> > - da->da_addrlen, IPPROTO_TCP,
> > - dataserver_timeo, dataserver_retrans);
> > - if (!IS_ERR(clp))
> > - break;
> > - }
> > -
> > - if (IS_ERR(clp)) {
> > - status = PTR_ERR(clp);
> > - goto out;
> > - }
> > -
> > - status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
> > - if (status)
> > - goto out_put;
> > -
> > - smp_wmb();
> > - ds->ds_clp = clp;
> > - dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
> > -out:
> > - return status;
> > -out_put:
> > - nfs_put_client(clp);
> > - goto out;
> > -}
> > -
> > void
> > nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
> > {
> > @@ -450,22 +405,7 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
> > return flseg->fh_array[i];
> > }
> >
> > -static void nfs4_wait_ds_connect(struct nfs4_pnfs_ds *ds)
> > -{
> > - might_sleep();
> > - wait_on_bit_action(&ds->ds_state, NFS4DS_CONNECTING,
> > - nfs_wait_bit_killable, TASK_KILLABLE);
> > -}
> > -
> > -static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
> > -{
> > - smp_mb__before_atomic();
> > - clear_bit(NFS4DS_CONNECTING, &ds->ds_state);
> > - smp_mb__after_atomic();
> > - wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
> > -}
> > -
> > -
> > +/* Upon return, either ds is connected, or ds is NULL */
> > struct nfs4_pnfs_ds *
> > nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> > {
> > @@ -473,6 +413,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> > struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
> > struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
> > struct nfs4_pnfs_ds *ret = ds;
> > + struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
> >
> > if (ds == NULL) {
> > printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
> > @@ -484,18 +425,9 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> > if (ds->ds_clp)
> > goto out_test_devid;
> >
> > - if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
> > - struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
> > - int err;
> > -
> > - err = nfs4_ds_connect(s, ds);
> > - if (err)
> > - nfs4_mark_deviceid_unavailable(devid);
> > - nfs4_clear_ds_conn_bit(ds);
> > - } else {
> > - /* Either ds is connected, or ds is NULL */
> > - nfs4_wait_ds_connect(ds);
> > - }
> > + nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
> > + dataserver_retrans);
> > +
>
> When I compile this patch I get:
> ERROR: "nfs4_pnfs_ds_connect" [fs/nfs/filelayout/nfs_layout_nfsv41_files.ko] undefined!
>
> It looks like this function doesn't exist until patch 5.
Okay, I build each patch in succession precisely to catch this
type of issue, and it still slipped through.
I'll fix it up, thanks for catching it!
>
> Anna
>
> > out_test_devid:
> > if (filelayout_test_devid_unavailable(devid))
> > ret = NULL;
> > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> > index d0b8e0c..a213c2d 100644
> > --- a/fs/nfs/pnfs.h
> > +++ b/fs/nfs/pnfs.h
> > @@ -295,6 +295,9 @@ void nfs4_deviceid_purge_client(const struct nfs_client *);
> > void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
> > struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
> > gfp_t gfp_flags);
> > +void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
> > + struct nfs4_deviceid_node *devid, unsigned int timeo,
> > + unsigned int retrans);
> >
> > /* pnfs_nfsio.c */
> > void pnfs_generic_clear_request_commit(struct nfs_page *req,
> >
>
Hey,
On 12/24/2014 02:12 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> The flexfiles layout wants to create DS connection over NFSv3.
> Add nfs3_set_ds_client to allow that to happen.
>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/internal.h | 4 ++++
> fs/nfs/nfs3_fs.h | 3 ++-
> fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
> fs/nfs/nfs3super.c | 2 +-
> 4 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 7d7c36f..7332ba1 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
> rpc_authflavor_t au_flavor);
> extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
> struct inode *);
> +extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> + const struct sockaddr *ds_addr, int ds_addrlen,
> + int ds_proto, unsigned int ds_timeo,
> + unsigned int ds_retrans, rpc_authflavor_t au_flavor);
> #ifdef CONFIG_PROC_FS
> extern int __init nfs_fs_proc_init(void);
> extern void nfs_fs_proc_exit(void);
> diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
> index 333ae40..fc9cd85 100644
> --- a/fs/nfs/nfs3_fs.h
> +++ b/fs/nfs/nfs3_fs.h
> @@ -29,6 +29,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
> struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subversion *);
> struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
> struct nfs_fattr *, rpc_authflavor_t);
> -
> +/* nfs3super.c */
> +extern struct nfs_subversion nfs_v3;
nit: Can we keep the blank line between nfs3client.c and nfs3super.c sections?
Thanks,
Anna
>
> #endif /* __LINUX_FS_NFS_NFS3_FS_H */
> diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
> index 8c1b437..52e2344 100644
> --- a/fs/nfs/nfs3client.c
> +++ b/fs/nfs/nfs3client.c
> @@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
> nfs_init_server_aclclient(server);
> return server;
> }
> +
> +/*
> + * Set up a pNFS Data Server client over NFSv3.
> + *
> + * Return any existing nfs_client that matches server address,port,version
> + * and minorversion.
> + *
> + * For a new nfs_client, use a soft mount (default), a low retrans and a
> + * low timeout interval so that if a connection is lost, we retry through
> + * the MDS.
> + */
> +struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> + const struct sockaddr *ds_addr, int ds_addrlen,
> + int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
> + rpc_authflavor_t au_flavor)
> +{
> + struct nfs_client_initdata cl_init = {
> + .addr = ds_addr,
> + .addrlen = ds_addrlen,
> + .nfs_mod = &nfs_v3,
> + .proto = ds_proto,
> + .net = mds_clp->cl_net,
> + };
> + struct rpc_timeout ds_timeout;
> + struct nfs_client *clp;
> +
> + /* Use the MDS nfs_client cl_ipaddr. */
> + nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
> + clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
> + au_flavor);
> +
> + return clp;
> +}
> +EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
> diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
> index 6af29c2..5c4394e 100644
> --- a/fs/nfs/nfs3super.c
> +++ b/fs/nfs/nfs3super.c
> @@ -7,7 +7,7 @@
> #include "nfs3_fs.h"
> #include "nfs.h"
>
> -static struct nfs_subversion nfs_v3 = {
> +struct nfs_subversion nfs_v3 = {
> .owner = THIS_MODULE,
> .nfs_fs = &nfs_fs_type,
> .rpc_vers = &nfs_version3,
>
On 01/05/2015 11:17 AM, Anna Schumaker wrote:
> Hey,
>
> On 12/24/2014 02:12 AM, Tom Haynes wrote:
>> From: Peng Tao <[email protected]>
>>
>> The flexfiles layout wants to create DS connection over NFSv3.
>> Add nfs3_set_ds_client to allow that to happen.
>>
>> Signed-off-by: Peng Tao <[email protected]>
>> Signed-off-by: Tom Haynes <[email protected]>
>> ---
>> fs/nfs/internal.h | 4 ++++
>> fs/nfs/nfs3_fs.h | 3 ++-
>> fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
>> fs/nfs/nfs3super.c | 2 +-
>> 4 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index 7d7c36f..7332ba1 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
>> rpc_authflavor_t au_flavor);
>> extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
>> struct inode *);
>> +extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
>> + const struct sockaddr *ds_addr, int ds_addrlen,
>> + int ds_proto, unsigned int ds_timeo,
>> + unsigned int ds_retrans, rpc_authflavor_t au_flavor);
>> #ifdef CONFIG_PROC_FS
>> extern int __init nfs_fs_proc_init(void);
>> extern void nfs_fs_proc_exit(void);
>> diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
>> index 333ae40..fc9cd85 100644
>> --- a/fs/nfs/nfs3_fs.h
>> +++ b/fs/nfs/nfs3_fs.h
>> @@ -29,6 +29,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
>> struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subversion *);
>> struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
>> struct nfs_fattr *, rpc_authflavor_t);
>> -
>> +/* nfs3super.c */
>> +extern struct nfs_subversion nfs_v3;
>
> nit: Can we keep the blank line between nfs3client.c and nfs3super.c sections?
>
> Thanks,
> Anna
>
>>
>> #endif /* __LINUX_FS_NFS_NFS3_FS_H */
>> diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
>> index 8c1b437..52e2344 100644
>> --- a/fs/nfs/nfs3client.c
>> +++ b/fs/nfs/nfs3client.c
>> @@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
>> nfs_init_server_aclclient(server);
>> return server;
>> }
>> +
>> +/*
>> + * Set up a pNFS Data Server client over NFSv3.
>> + *
>> + * Return any existing nfs_client that matches server address,port,version
>> + * and minorversion.
>> + *
>> + * For a new nfs_client, use a soft mount (default), a low retrans and a
>> + * low timeout interval so that if a connection is lost, we retry through
>> + * the MDS.
>> + */
>> +struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
>> + const struct sockaddr *ds_addr, int ds_addrlen,
>> + int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
>> + rpc_authflavor_t au_flavor)
>> +{
>> + struct nfs_client_initdata cl_init = {
>> + .addr = ds_addr,
>> + .addrlen = ds_addrlen,
>> + .nfs_mod = &nfs_v3,
>> + .proto = ds_proto,
>> + .net = mds_clp->cl_net,
>> + };
>> + struct rpc_timeout ds_timeout;
>> + struct nfs_client *clp;
>> +
>> + /* Use the MDS nfs_client cl_ipaddr. */
>> + nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
>> + clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
>> + au_flavor);
>> +
>> + return clp;
>> +}
>> +EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
Compiling with:
CONFIG_NFS_V4_2=n
CONFIG_NFS_V4_1=n
CONFIG_NFS_V4=n
CONFIG_NFS_V3=y
CONFIG_NFS_V2=y
Gives me:
fs/nfs/nfs3client.c: In function 'nfs3_set_ds_client':
fs/nfs/nfs3client.c:95:53: error: 'struct nfs_client' has no member named 'cl_ipaddr'
clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
^
scripts/Makefile.build:257: recipe for target 'fs/nfs/nfs3client.o' failed
make[2]: *** [fs/nfs/nfs3client.o] Error 1
scripts/Makefile.build:402: recipe for target 'fs/nfs' failed
make[1]: *** [fs/nfs] Error 2
Makefile:938: recipe for target 'fs' failed
make: *** [fs] Error 2
make: *** Waiting for unfinished jobs....
I would prefer a fix that doesn't make the NFS v3 module depend on CONFIG_NFS_V4* options, but I'll try not to grumble if that isn't possible here.
Thanks,
Anna
>> diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
>> index 6af29c2..5c4394e 100644
>> --- a/fs/nfs/nfs3super.c
>> +++ b/fs/nfs/nfs3super.c
>> @@ -7,7 +7,7 @@
>> #include "nfs3_fs.h"
>> #include "nfs.h"
>>
>> -static struct nfs_subversion nfs_v3 = {
>> +struct nfs_subversion nfs_v3 = {
>> .owner = THIS_MODULE,
>> .nfs_fs = &nfs_fs_type,
>> .rpc_vers = &nfs_version3,
>>
>
On Mon, Jan 05, 2015 at 11:25:57AM -0500, Anna Schumaker wrote:
> On 01/05/2015 11:17 AM, Anna Schumaker wrote:
> > Hey,
> >
> > On 12/24/2014 02:12 AM, Tom Haynes wrote:
> >> From: Peng Tao <[email protected]>
> >>
> >> The flexfiles layout wants to create DS connection over NFSv3.
> >> Add nfs3_set_ds_client to allow that to happen.
> >>
> >> Signed-off-by: Peng Tao <[email protected]>
> >> Signed-off-by: Tom Haynes <[email protected]>
> >> ---
> >> fs/nfs/internal.h | 4 ++++
> >> fs/nfs/nfs3_fs.h | 3 ++-
> >> fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
> >> fs/nfs/nfs3super.c | 2 +-
> >> 4 files changed, 41 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> >> index 7d7c36f..7332ba1 100644
> >> --- a/fs/nfs/internal.h
> >> +++ b/fs/nfs/internal.h
> >> @@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
> >> rpc_authflavor_t au_flavor);
> >> extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
> >> struct inode *);
> >> +extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> >> + const struct sockaddr *ds_addr, int ds_addrlen,
> >> + int ds_proto, unsigned int ds_timeo,
> >> + unsigned int ds_retrans, rpc_authflavor_t au_flavor);
> >> #ifdef CONFIG_PROC_FS
> >> extern int __init nfs_fs_proc_init(void);
> >> extern void nfs_fs_proc_exit(void);
> >> diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
> >> index 333ae40..fc9cd85 100644
> >> --- a/fs/nfs/nfs3_fs.h
> >> +++ b/fs/nfs/nfs3_fs.h
> >> @@ -29,6 +29,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
> >> struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subversion *);
> >> struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
> >> struct nfs_fattr *, rpc_authflavor_t);
> >> -
> >> +/* nfs3super.c */
> >> +extern struct nfs_subversion nfs_v3;
> >
> > nit: Can we keep the blank line between nfs3client.c and nfs3super.c sections?
> >
> > Thanks,
> > Anna
> >
> >>
> >> #endif /* __LINUX_FS_NFS_NFS3_FS_H */
> >> diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
> >> index 8c1b437..52e2344 100644
> >> --- a/fs/nfs/nfs3client.c
> >> +++ b/fs/nfs/nfs3client.c
> >> @@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
> >> nfs_init_server_aclclient(server);
> >> return server;
> >> }
> >> +
> >> +/*
> >> + * Set up a pNFS Data Server client over NFSv3.
> >> + *
> >> + * Return any existing nfs_client that matches server address,port,version
> >> + * and minorversion.
> >> + *
> >> + * For a new nfs_client, use a soft mount (default), a low retrans and a
> >> + * low timeout interval so that if a connection is lost, we retry through
> >> + * the MDS.
> >> + */
> >> +struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> >> + const struct sockaddr *ds_addr, int ds_addrlen,
> >> + int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
> >> + rpc_authflavor_t au_flavor)
> >> +{
> >> + struct nfs_client_initdata cl_init = {
> >> + .addr = ds_addr,
> >> + .addrlen = ds_addrlen,
> >> + .nfs_mod = &nfs_v3,
> >> + .proto = ds_proto,
> >> + .net = mds_clp->cl_net,
> >> + };
> >> + struct rpc_timeout ds_timeout;
> >> + struct nfs_client *clp;
> >> +
> >> + /* Use the MDS nfs_client cl_ipaddr. */
> >> + nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
> >> + clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
> >> + au_flavor);
> >> +
> >> + return clp;
> >> +}
> >> +EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
>
> Compiling with:
>
> CONFIG_NFS_V4_2=n
> CONFIG_NFS_V4_1=n
> CONFIG_NFS_V4=n
> CONFIG_NFS_V3=y
> CONFIG_NFS_V2=y
>
> Gives me:
>
> fs/nfs/nfs3client.c: In function 'nfs3_set_ds_client':
> fs/nfs/nfs3client.c:95:53: error: 'struct nfs_client' has no member named 'cl_ipaddr'
> clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
> ^
> scripts/Makefile.build:257: recipe for target 'fs/nfs/nfs3client.o' failed
> make[2]: *** [fs/nfs/nfs3client.o] Error 1
> scripts/Makefile.build:402: recipe for target 'fs/nfs' failed
> make[1]: *** [fs/nfs] Error 2
> Makefile:938: recipe for target 'fs' failed
> make: *** [fs] Error 2
> make: *** Waiting for unfinished jobs....
>
>
> I would prefer a fix that doesn't make the NFS v3 module depend on CONFIG_NFS_V4* options, but I'll try not to grumble if that isn't possible here.
Does that mean you do not want to see this:
+
+#if IS_ENABLED(CONFIG_NFS_V4)
+/*
+ * Set up a pNFS Data Server client over NFSv3.
+ *
+ * Return any existing nfs_client that matches server address,port,version
+ * and minorversion.
+ *
+ * For a new nfs_client, use a soft mount (default), a low retrans and a
+ * low timeout interval so that if a connection is lost, we retry through
+ * the MDS.
+ */
+struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr, int ds_addrlen,
+ int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor)
+{
+ struct nfs_client_initdata cl_init = {
+ .addr = ds_addr,
+ .addrlen = ds_addrlen,
+ .nfs_mod = &nfs_v3,
+ .proto = ds_proto,
+ .net = mds_clp->cl_net,
+ };
+ struct rpc_timeout ds_timeout;
+ struct nfs_client *clp;
+
+ /* Use the MDS nfs_client cl_ipaddr. */
+ nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
+ clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
+ au_flavor);
+
+ return clp;
+}
+EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
+#endif
? :-)
>
> Thanks,
> Anna
>
> >> diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
> >> index 6af29c2..5c4394e 100644
> >> --- a/fs/nfs/nfs3super.c
> >> +++ b/fs/nfs/nfs3super.c
> >> @@ -7,7 +7,7 @@
> >> #include "nfs3_fs.h"
> >> #include "nfs.h"
> >>
> >> -static struct nfs_subversion nfs_v3 = {
> >> +struct nfs_subversion nfs_v3 = {
> >> .owner = THIS_MODULE,
> >> .nfs_fs = &nfs_fs_type,
> >> .rpc_vers = &nfs_version3,
> >>
> >
>
On 01/05/2015 07:02 PM, Tom Haynes wrote:
> On Mon, Jan 05, 2015 at 11:25:57AM -0500, Anna Schumaker wrote:
>> On 01/05/2015 11:17 AM, Anna Schumaker wrote:
>>> Hey,
>>>
>>> On 12/24/2014 02:12 AM, Tom Haynes wrote:
>>>> From: Peng Tao <[email protected]>
>>>>
>>>> The flexfiles layout wants to create DS connection over NFSv3.
>>>> Add nfs3_set_ds_client to allow that to happen.
>>>>
>>>> Signed-off-by: Peng Tao <[email protected]>
>>>> Signed-off-by: Tom Haynes <[email protected]>
>>>> ---
>>>> fs/nfs/internal.h | 4 ++++
>>>> fs/nfs/nfs3_fs.h | 3 ++-
>>>> fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
>>>> fs/nfs/nfs3super.c | 2 +-
>>>> 4 files changed, 41 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>>> index 7d7c36f..7332ba1 100644
>>>> --- a/fs/nfs/internal.h
>>>> +++ b/fs/nfs/internal.h
>>>> @@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
>>>> rpc_authflavor_t au_flavor);
>>>> extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
>>>> struct inode *);
>>>> +extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
>>>> + const struct sockaddr *ds_addr, int ds_addrlen,
>>>> + int ds_proto, unsigned int ds_timeo,
>>>> + unsigned int ds_retrans, rpc_authflavor_t au_flavor);
>>>> #ifdef CONFIG_PROC_FS
>>>> extern int __init nfs_fs_proc_init(void);
>>>> extern void nfs_fs_proc_exit(void);
>>>> diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
>>>> index 333ae40..fc9cd85 100644
>>>> --- a/fs/nfs/nfs3_fs.h
>>>> +++ b/fs/nfs/nfs3_fs.h
>>>> @@ -29,6 +29,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
>>>> struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subversion *);
>>>> struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
>>>> struct nfs_fattr *, rpc_authflavor_t);
>>>> -
>>>> +/* nfs3super.c */
>>>> +extern struct nfs_subversion nfs_v3;
>>>
>>> nit: Can we keep the blank line between nfs3client.c and nfs3super.c sections?
>>>
>>> Thanks,
>>> Anna
>>>
>>>>
>>>> #endif /* __LINUX_FS_NFS_NFS3_FS_H */
>>>> diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
>>>> index 8c1b437..52e2344 100644
>>>> --- a/fs/nfs/nfs3client.c
>>>> +++ b/fs/nfs/nfs3client.c
>>>> @@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
>>>> nfs_init_server_aclclient(server);
>>>> return server;
>>>> }
>>>> +
>>>> +/*
>>>> + * Set up a pNFS Data Server client over NFSv3.
>>>> + *
>>>> + * Return any existing nfs_client that matches server address,port,version
>>>> + * and minorversion.
>>>> + *
>>>> + * For a new nfs_client, use a soft mount (default), a low retrans and a
>>>> + * low timeout interval so that if a connection is lost, we retry through
>>>> + * the MDS.
>>>> + */
>>>> +struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
>>>> + const struct sockaddr *ds_addr, int ds_addrlen,
>>>> + int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
>>>> + rpc_authflavor_t au_flavor)
>>>> +{
>>>> + struct nfs_client_initdata cl_init = {
>>>> + .addr = ds_addr,
>>>> + .addrlen = ds_addrlen,
>>>> + .nfs_mod = &nfs_v3,
>>>> + .proto = ds_proto,
>>>> + .net = mds_clp->cl_net,
>>>> + };
>>>> + struct rpc_timeout ds_timeout;
>>>> + struct nfs_client *clp;
>>>> +
>>>> + /* Use the MDS nfs_client cl_ipaddr. */
>>>> + nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
>>>> + clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
>>>> + au_flavor);
>>>> +
>>>> + return clp;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
>>
>> Compiling with:
>>
>> CONFIG_NFS_V4_2=n
>> CONFIG_NFS_V4_1=n
>> CONFIG_NFS_V4=n
>> CONFIG_NFS_V3=y
>> CONFIG_NFS_V2=y
>>
>> Gives me:
>>
>> fs/nfs/nfs3client.c: In function 'nfs3_set_ds_client':
>> fs/nfs/nfs3client.c:95:53: error: 'struct nfs_client' has no member named 'cl_ipaddr'
>> clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
>> ^
>> scripts/Makefile.build:257: recipe for target 'fs/nfs/nfs3client.o' failed
>> make[2]: *** [fs/nfs/nfs3client.o] Error 1
>> scripts/Makefile.build:402: recipe for target 'fs/nfs' failed
>> make[1]: *** [fs/nfs] Error 2
>> Makefile:938: recipe for target 'fs' failed
>> make: *** [fs] Error 2
>> make: *** Waiting for unfinished jobs....
>>
>>
>> I would prefer a fix that doesn't make the NFS v3 module depend on CONFIG_NFS_V4* options, but I'll try not to grumble if that isn't possible here.
>
> Does that mean you do not want to see this:
>
> +
> +#if IS_ENABLED(CONFIG_NFS_V4)
> +/*
> + * Set up a pNFS Data Server client over NFSv3.
> + *
> + * Return any existing nfs_client that matches server address,port,version
> + * and minorversion.
> + *
> + * For a new nfs_client, use a soft mount (default), a low retrans and a
> + * low timeout interval so that if a connection is lost, we retry through
> + * the MDS.
> + */
> +struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> + const struct sockaddr *ds_addr, int ds_addrlen,
> + int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
> + rpc_authflavor_t au_flavor)
> +{
> + struct nfs_client_initdata cl_init = {
> + .addr = ds_addr,
> + .addrlen = ds_addrlen,
> + .nfs_mod = &nfs_v3,
> + .proto = ds_proto,
> + .net = mds_clp->cl_net,
> + };
> + struct rpc_timeout ds_timeout;
> + struct nfs_client *clp;
> +
> + /* Use the MDS nfs_client cl_ipaddr. */
> + nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
> + clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
> + au_flavor);
> +
> + return clp;
> +}
> +EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
> +#endif
>
> ? :-)
Correct! Maybe struct nfs_client could instead be changed so that it also has cl_ipaddr when v3 is enabled? :)
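
A rough userspace sketch of that idea, not kernel code: the build error suggests cl_ipaddr only exists when CONFIG_NFS_V4 is set, so the guard around the field would be widened to cover v3 as well. MOCK_NFS_V3, MOCK_NFS_V4, struct mock_nfs_client and both helpers are made-up names for illustration only.

```c
#include <stdio.h>

/* Stand-ins for the Kconfig symbols; hypothetical names, set by hand here. */
#define MOCK_NFS_V3 1
#define MOCK_NFS_V4 0

struct mock_nfs_client {
	int cl_proto;
#if MOCK_NFS_V3 || MOCK_NFS_V4
	/* Present whenever either protocol version is built. */
	char cl_ipaddr[48];
#endif
};

/* Copy an address string into the client, truncating if necessary. */
static void mock_set_ipaddr(struct mock_nfs_client *clp, const char *addr)
{
#if MOCK_NFS_V3 || MOCK_NFS_V4
	snprintf(clp->cl_ipaddr, sizeof(clp->cl_ipaddr), "%s", addr);
#else
	(void)clp;
	(void)addr;
#endif
}

/* Accessor a v3 DS setup path could call instead of touching the field. */
static const char *mock_client_ipaddr(const struct mock_nfs_client *clp)
{
#if MOCK_NFS_V3 || MOCK_NFS_V4
	return clp->cl_ipaddr;
#else
	(void)clp;
	return "";
#endif
}
```

With the guard widened, a v3-only configuration still compiles, and the v3 code carries no check on the v4 options. The cost is 48 extra bytes per client in v3-only builds.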
Anna
>
>>
>> Thanks,
>> Anna
>>
>>>> diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
>>>> index 6af29c2..5c4394e 100644
>>>> --- a/fs/nfs/nfs3super.c
>>>> +++ b/fs/nfs/nfs3super.c
>>>> @@ -7,7 +7,7 @@
>>>> #include "nfs3_fs.h"
>>>> #include "nfs.h"
>>>>
>>>> -static struct nfs_subversion nfs_v3 = {
>>>> +struct nfs_subversion nfs_v3 = {
>>>> .owner = THIS_MODULE,
>>>> .nfs_fs = &nfs_fs_type,
>>>> .rpc_vers = &nfs_version3,
>>>>
>>>
>>
>
On Mon, Jan 05, 2015 at 11:17:55AM -0500, Anna Schumaker wrote:
> Hey,
>
> On 12/24/2014 02:12 AM, Tom Haynes wrote:
> > From: Peng Tao <[email protected]>
> >
> > The flexfiles layout wants to create DS connection over NFSv3.
> > Add nfs3_set_ds_client to allow that to happen.
> >
> > Signed-off-by: Peng Tao <[email protected]>
> > Signed-off-by: Tom Haynes <[email protected]>
> > ---
> > fs/nfs/internal.h | 4 ++++
> > fs/nfs/nfs3_fs.h | 3 ++-
> > fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
> > fs/nfs/nfs3super.c | 2 +-
> > 4 files changed, 41 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > index 7d7c36f..7332ba1 100644
> > --- a/fs/nfs/internal.h
> > +++ b/fs/nfs/internal.h
> > @@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
> > rpc_authflavor_t au_flavor);
> > extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
> > struct inode *);
> > +extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> > + const struct sockaddr *ds_addr, int ds_addrlen,
> > + int ds_proto, unsigned int ds_timeo,
> > + unsigned int ds_retrans, rpc_authflavor_t au_flavor);
> > #ifdef CONFIG_PROC_FS
> > extern int __init nfs_fs_proc_init(void);
> > extern void nfs_fs_proc_exit(void);
> > diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
> > index 333ae40..fc9cd85 100644
> > --- a/fs/nfs/nfs3_fs.h
> > +++ b/fs/nfs/nfs3_fs.h
> > @@ -29,6 +29,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
> > struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subversion *);
> > struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
> > struct nfs_fattr *, rpc_authflavor_t);
> > -
> > +/* nfs3super.c */
> > +extern struct nfs_subversion nfs_v3;
>
> nit: Can we keep the blank line between nfs3client.c and nfs3super.c sections?
Yes, I will restore it in the next revision.
>
> Thanks,
> Anna
>
> >
> > #endif /* __LINUX_FS_NFS_NFS3_FS_H */
> > diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
> > index 8c1b437..52e2344 100644
> > --- a/fs/nfs/nfs3client.c
> > +++ b/fs/nfs/nfs3client.c
> > @@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
> > nfs_init_server_aclclient(server);
> > return server;
> > }
> > +
> > +/*
> > + * Set up a pNFS Data Server client over NFSv3.
> > + *
> > + * Return any existing nfs_client that matches server address,port,version
> > + * and minorversion.
> > + *
> > + * For a new nfs_client, use a soft mount (default), a low retrans and a
> > + * low timeout interval so that if a connection is lost, we retry through
> > + * the MDS.
> > + */
> > +struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
> > + const struct sockaddr *ds_addr, int ds_addrlen,
> > + int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
> > + rpc_authflavor_t au_flavor)
> > +{
> > + struct nfs_client_initdata cl_init = {
> > + .addr = ds_addr,
> > + .addrlen = ds_addrlen,
> > + .nfs_mod = &nfs_v3,
> > + .proto = ds_proto,
> > + .net = mds_clp->cl_net,
> > + };
> > + struct rpc_timeout ds_timeout;
> > + struct nfs_client *clp;
> > +
> > + /* Use the MDS nfs_client cl_ipaddr. */
> > + nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
> > + clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
> > + au_flavor);
> > +
> > + return clp;
> > +}
> > +EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
> > diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
> > index 6af29c2..5c4394e 100644
> > --- a/fs/nfs/nfs3super.c
> > +++ b/fs/nfs/nfs3super.c
> > @@ -7,7 +7,7 @@
> > #include "nfs3_fs.h"
> > #include "nfs.h"
> >
> > -static struct nfs_subversion nfs_v3 = {
> > +struct nfs_subversion nfs_v3 = {
> > .owner = THIS_MODULE,
> > .nfs_fs = &nfs_fs_type,
> > .rpc_vers = &nfs_version3,
> >
>
Hey Tom and Peng,
On 12/24/2014 02:12 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/pnfs_dev.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
> index 56f5c16..655333d 100644
> --- a/fs/nfs/pnfs_dev.c
> +++ b/fs/nfs/pnfs_dev.c
> @@ -615,7 +615,44 @@ static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
> wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
> }
>
> -static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
> +static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
> + struct nfs4_pnfs_ds *ds,
> + unsigned int timeo,
> + unsigned int retrans,
> + rpc_authflavor_t au_flavor)
> +{
> + struct nfs_client *clp = ERR_PTR(-EIO);
> + struct nfs4_pnfs_ds_addr *da;
> + int status = 0;
> +
> + dprintk("--> %s DS %s au_flavor %d\n", __func__,
> + ds->ds_remotestr, au_flavor);
> +
> + list_for_each_entry(da, &ds->ds_addrs, da_node) {
> + dprintk("%s: DS %s: trying address %s\n",
> + __func__, ds->ds_remotestr, da->da_remotestr);
> +
> + clp = nfs3_set_ds_client(mds_srv->nfs_client,
> + (struct sockaddr *)&da->da_addr,
> + da->da_addrlen, IPPROTO_TCP,
> + timeo, retrans, au_flavor);
> + if (!IS_ERR(clp))
> + break;
> + }
> +
> + if (IS_ERR(clp)) {
> + status = PTR_ERR(clp);
> + goto out;
> + }
> +
> + smp_wmb();
> + ds->ds_clp = clp;
> + dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
> +out:
> + return status;
> +}
> +
> +static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
> struct nfs4_pnfs_ds *ds,
> unsigned int timeo,
> unsigned int retrans,
> @@ -674,8 +711,19 @@ void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
> if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
> int err = 0;
>
> - err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans,
> - minor_version, au_flavor);
> + if (version == 3) {
> + err = _nfs4_pnfs_v3_ds_connect(mds_srv, ds, timeo,
> + retrans, au_flavor);
> + } else if (version == 4) {
> + err = _nfs4_pnfs_v4_ds_connect(mds_srv, ds, timeo,
> + retrans, minor_version,
> + au_flavor);
Is it possible to do this with NFS-version specific function pointers, similar to how we have the rpc_ops array?
Thanks,
Anna
> + } else {
> + dprintk("%s: unsupported DS version %d\n", __func__,
> + version);
> + err = -EPROTONOSUPPORT;
> + }
> +
> if (err)
> nfs4_mark_deviceid_unavailable(devid);
> nfs4_clear_ds_conn_bit(ds);
>
On Mon, Jan 05, 2015 at 11:57:17AM -0500, Anna Schumaker wrote:
> Hey Tom and Peng,
>
> On 12/24/2014 02:12 AM, Tom Haynes wrote:
> > From: Peng Tao <[email protected]>
> >
> > Signed-off-by: Peng Tao <[email protected]>
> > Signed-off-by: Tom Haynes <[email protected]>
> > ---
> > fs/nfs/pnfs_dev.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 51 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
> > index 56f5c16..655333d 100644
> > --- a/fs/nfs/pnfs_dev.c
> > +++ b/fs/nfs/pnfs_dev.c
> > @@ -615,7 +615,44 @@ static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
> > wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
> > }
> >
> > -static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
> > +static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
> > + struct nfs4_pnfs_ds *ds,
> > + unsigned int timeo,
> > + unsigned int retrans,
> > + rpc_authflavor_t au_flavor)
> > +{
> > + struct nfs_client *clp = ERR_PTR(-EIO);
> > + struct nfs4_pnfs_ds_addr *da;
> > + int status = 0;
> > +
> > + dprintk("--> %s DS %s au_flavor %d\n", __func__,
> > + ds->ds_remotestr, au_flavor);
> > +
> > + list_for_each_entry(da, &ds->ds_addrs, da_node) {
> > + dprintk("%s: DS %s: trying address %s\n",
> > + __func__, ds->ds_remotestr, da->da_remotestr);
> > +
> > + clp = nfs3_set_ds_client(mds_srv->nfs_client,
> > + (struct sockaddr *)&da->da_addr,
> > + da->da_addrlen, IPPROTO_TCP,
> > + timeo, retrans, au_flavor);
> > + if (!IS_ERR(clp))
> > + break;
> > + }
> > +
> > + if (IS_ERR(clp)) {
> > + status = PTR_ERR(clp);
> > + goto out;
> > + }
> > +
> > + smp_wmb();
> > + ds->ds_clp = clp;
> > + dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
> > +out:
> > + return status;
> > +}
> > +
> > +static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
> > struct nfs4_pnfs_ds *ds,
> > unsigned int timeo,
> > unsigned int retrans,
> > @@ -674,8 +711,19 @@ void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
> > if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
> > int err = 0;
> >
> > - err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans,
> > - minor_version, au_flavor);
> > + if (version == 3) {
> > + err = _nfs4_pnfs_v3_ds_connect(mds_srv, ds, timeo,
> > + retrans, au_flavor);
> > + } else if (version == 4) {
> > + err = _nfs4_pnfs_v4_ds_connect(mds_srv, ds, timeo,
> > + retrans, minor_version,
> > + au_flavor);
>
> Is it possible to do this with NFS-version specific function pointers, similar to how we have the rpc_ops array?
>
> Thanks,
> Anna
Hi Anna,
I think that nfs4_pnfs_ds_connect() is the function that abstracts
away calling the version-specific code. To use function pointers,
the callers would have to set them up. And while
nfs4_fl_prepare_ds() will always end up in _nfs4_pnfs_v4_ds_connect(),
nfs4_ff_layout_prepare_ds() could need either one. In other words, I
think the current approach hides that complexity from the callers.
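
For illustration only, here is a userspace sketch of one possible middle ground, which is not what the patch does: keep a single entry point for the callers, but have it look the connect routine up in a table indexed by NFS major version. Every name below (struct mock_ds, the mock_* functions, the table) is hypothetical and only models the shape of the dispatch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for per-DS state; not the kernel's struct. */
struct mock_ds {
	int connected_version;	/* 0 until a connect routine succeeds */
};

typedef int (*mock_ds_connect_fn)(struct mock_ds *ds,
				  unsigned int minor_version);

static int mock_v3_ds_connect(struct mock_ds *ds, unsigned int minor_version)
{
	(void)minor_version;	/* NFSv3 has no minor version */
	ds->connected_version = 3;
	return 0;
}

static int mock_v4_ds_connect(struct mock_ds *ds, unsigned int minor_version)
{
	(void)minor_version;	/* a real routine would pass this on */
	ds->connected_version = 4;
	return 0;
}

/* Indexed by NFS major version; unsupported versions stay NULL. */
static const mock_ds_connect_fn mock_ds_connect_ops[] = {
	[3] = mock_v3_ds_connect,
	[4] = mock_v4_ds_connect,
};

#define MOCK_NR_OPS \
	(sizeof(mock_ds_connect_ops) / sizeof(mock_ds_connect_ops[0]))

/* Single entry point: callers pass a version and never see the table. */
static int mock_pnfs_ds_connect(struct mock_ds *ds, unsigned int version,
				unsigned int minor_version)
{
	if (version >= MOCK_NR_OPS || mock_ds_connect_ops[version] == NULL)
		return -EPROTONOSUPPORT;
	return mock_ds_connect_ops[version](ds, minor_version);
}
```

The callers still pass only a version number, so the complexity stays hidden from them either way. The table just replaces the if/else chain inside the dispatcher.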
Thanks,
Tom
>
> > + } else {
> > + dprintk("%s: unsupported DS version %d\n", __func__,
> > + version);
> > + err = -EPROTONOSUPPORT;
> > + }
> > +
> > if (err)
> > nfs4_mark_deviceid_unavailable(devid);
> > nfs4_clear_ds_conn_bit(ds);
> >
>
Hey Tom,
On 12/24/2014 02:12 AM, Tom Haynes wrote:
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/filelayout/filelayout.c | 4 ++--
> fs/nfs/internal.h | 1 +
> fs/nfs/pagelist.c | 6 ++++--
> fs/nfs/read.c | 3 ++-
> fs/nfs/write.c | 3 ++-
> include/linux/nfs_page.h | 1 +
> 6 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
> index bc36ed3..25c4896 100644
> --- a/fs/nfs/filelayout/filelayout.c
> +++ b/fs/nfs/filelayout/filelayout.c
> @@ -501,7 +501,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
> hdr->mds_offset = offset;
>
> /* Perform an asynchronous read to ds */
> - nfs_initiate_pgio(ds_clnt, hdr,
> + nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
> &filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
> return PNFS_ATTEMPTED;
> }
> @@ -542,7 +542,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
> hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
>
> /* Perform an asynchronous write */
> - nfs_initiate_pgio(ds_clnt, hdr,
> + nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
> &filelayout_write_call_ops, sync,
> RPC_TASK_SOFTCONN);
> return PNFS_ATTEMPTED;
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 5543850..1d15ffa 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -251,6 +251,7 @@ void nfs_pgio_header_free(struct nfs_pgio_header *);
> void nfs_pgio_data_destroy(struct nfs_pgio_header *);
> int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
> int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
> + const struct nfs_rpc_ops *,
> const struct rpc_call_ops *, int, int);
> void nfs_free_request(struct nfs_page *req);
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 2b5e769..35a2626 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -597,6 +597,7 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
> }
>
> int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
> + const struct nfs_rpc_ops *rpc_ops,
> const struct rpc_call_ops *call_ops, int how, int flags)
> {
> struct rpc_task *task;
> @@ -616,7 +617,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
> };
> int ret = 0;
>
> - hdr->rw_ops->rw_initiate(hdr, &msg, &task_setup_data, how);
> + hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
>
> dprintk("NFS: %5u initiated pgio call "
> "(req %s/%llu, %u bytes @ offset %llu)\n",
> @@ -792,7 +793,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
> ret = nfs_generic_pgio(desc, hdr);
> if (ret == 0)
> ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
> - hdr, desc->pg_rpc_callops,
> + hdr, NFS_PROTO(hdr->inode),
> + desc->pg_rpc_callops,
> desc->pg_ioflags, 0);
> return ret;
> }
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index c91a479..092ab49 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -168,13 +168,14 @@ out:
>
> static void nfs_initiate_read(struct nfs_pgio_header *hdr,
> struct rpc_message *msg,
> + const struct nfs_rpc_ops *rpc_ops,
> struct rpc_task_setup *task_setup_data, int how)
> {
> struct inode *inode = hdr->inode;
> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
>
> task_setup_data->flags |= swap_flags;
> - NFS_PROTO(inode)->read_setup(hdr, msg);
> + rpc_ops->read_setup(hdr, msg);
> }
>
> static void
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index af3af68..e5ed21c 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1240,13 +1240,14 @@ static int flush_task_priority(int how)
>
> static void nfs_initiate_write(struct nfs_pgio_header *hdr,
> struct rpc_message *msg,
> + const struct nfs_rpc_ops *rpc_ops,
> struct rpc_task_setup *task_setup_data, int how)
> {
> struct inode *inode = hdr->inode;
> int priority = flush_task_priority(how);
>
> task_setup_data->priority = priority;
> - NFS_PROTO(inode)->write_setup(hdr, msg);
> + rpc_ops->write_setup(hdr, msg);
>
> nfs4_state_protect_write(NFS_SERVER(inode)->nfs_client,
> &task_setup_data->rpc_client, msg, hdr);
When I compile without NFS v4 I get:
fs/nfs/write.c: In function 'nfs_initiate_write':
fs/nfs/write.c:1246:16: error: unused variable 'inode' [-Werror=unused-variable]
struct inode *inode = hdr->inode;
The call to nfs4_state_protect_write() is getting optimized out?
Anna
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index 6c3e06e..4c3aa80 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -69,6 +69,7 @@ struct nfs_rw_ops {
> struct inode *);
> void (*rw_result)(struct rpc_task *, struct nfs_pgio_header *);
> void (*rw_initiate)(struct nfs_pgio_header *, struct rpc_message *,
> + const struct nfs_rpc_ops *,
> struct rpc_task_setup *, int);
> };
>
>
On Mon, Jan 05, 2015 at 01:10:04PM -0500, Anna Schumaker wrote:
> Hey Tom,
>
> On 12/24/2014 02:12 AM, Tom Haynes wrote:
> > Signed-off-by: Tom Haynes <[email protected]>
> > ---
> > fs/nfs/filelayout/filelayout.c | 4 ++--
> > fs/nfs/internal.h | 1 +
> > fs/nfs/pagelist.c | 6 ++++--
> > fs/nfs/read.c | 3 ++-
> > fs/nfs/write.c | 3 ++-
> > include/linux/nfs_page.h | 1 +
> > 6 files changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
> > index bc36ed3..25c4896 100644
> > --- a/fs/nfs/filelayout/filelayout.c
> > +++ b/fs/nfs/filelayout/filelayout.c
> > @@ -501,7 +501,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
> > hdr->mds_offset = offset;
> >
> > /* Perform an asynchronous read to ds */
> > - nfs_initiate_pgio(ds_clnt, hdr,
> > + nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
> > &filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
> > return PNFS_ATTEMPTED;
> > }
> > @@ -542,7 +542,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
> > hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
> >
> > /* Perform an asynchronous write */
> > - nfs_initiate_pgio(ds_clnt, hdr,
> > + nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
> > &filelayout_write_call_ops, sync,
> > RPC_TASK_SOFTCONN);
> > return PNFS_ATTEMPTED;
> > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > index 5543850..1d15ffa 100644
> > --- a/fs/nfs/internal.h
> > +++ b/fs/nfs/internal.h
> > @@ -251,6 +251,7 @@ void nfs_pgio_header_free(struct nfs_pgio_header *);
> > void nfs_pgio_data_destroy(struct nfs_pgio_header *);
> > int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
> > int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
> > + const struct nfs_rpc_ops *,
> > const struct rpc_call_ops *, int, int);
> > void nfs_free_request(struct nfs_page *req);
> >
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index 2b5e769..35a2626 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -597,6 +597,7 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
> > }
> >
> > int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
> > + const struct nfs_rpc_ops *rpc_ops,
> > const struct rpc_call_ops *call_ops, int how, int flags)
> > {
> > struct rpc_task *task;
> > @@ -616,7 +617,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
> > };
> > int ret = 0;
> >
> > - hdr->rw_ops->rw_initiate(hdr, &msg, &task_setup_data, how);
> > + hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
> >
> > dprintk("NFS: %5u initiated pgio call "
> > "(req %s/%llu, %u bytes @ offset %llu)\n",
> > @@ -792,7 +793,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
> > ret = nfs_generic_pgio(desc, hdr);
> > if (ret == 0)
> > ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
> > - hdr, desc->pg_rpc_callops,
> > + hdr, NFS_PROTO(hdr->inode),
> > + desc->pg_rpc_callops,
> > desc->pg_ioflags, 0);
> > return ret;
> > }
> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> > index c91a479..092ab49 100644
> > --- a/fs/nfs/read.c
> > +++ b/fs/nfs/read.c
> > @@ -168,13 +168,14 @@ out:
> >
> > static void nfs_initiate_read(struct nfs_pgio_header *hdr,
> > struct rpc_message *msg,
> > + const struct nfs_rpc_ops *rpc_ops,
> > struct rpc_task_setup *task_setup_data, int how)
> > {
> > struct inode *inode = hdr->inode;
> > int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
> >
> > task_setup_data->flags |= swap_flags;
> > - NFS_PROTO(inode)->read_setup(hdr, msg);
> > + rpc_ops->read_setup(hdr, msg);
> > }
> >
> > static void
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index af3af68..e5ed21c 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1240,13 +1240,14 @@ static int flush_task_priority(int how)
> >
> > static void nfs_initiate_write(struct nfs_pgio_header *hdr,
> > struct rpc_message *msg,
> > + const struct nfs_rpc_ops *rpc_ops,
> > struct rpc_task_setup *task_setup_data, int how)
> > {
> > struct inode *inode = hdr->inode;
> > int priority = flush_task_priority(how);
> >
> > task_setup_data->priority = priority;
> > - NFS_PROTO(inode)->write_setup(hdr, msg);
> > + rpc_ops->write_setup(hdr, msg);
> >
> > nfs4_state_protect_write(NFS_SERVER(inode)->nfs_client,
> > &task_setup_data->rpc_client, msg, hdr);
>
> When I compile without NFS v4 I get:
>
> fs/nfs/write.c: In function 'nfs_initiate_write':
> fs/nfs/write.c:1246:16: error: unused variable 'inode' [-Werror=unused-variable]
> struct inode *inode = hdr->inode;
>
> The call to nfs4_state_protect_write() is getting optimized out?
>
> Anna
Hi Anna,
If it is getting optimized out, then the issue is a compiler bug.
I.e., if it optimized out the call, then it should optimize out the
variable and not whine about it.
If you tell it not to optimize, does the warning go away?
Tom
>
> > diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> > index 6c3e06e..4c3aa80 100644
> > --- a/include/linux/nfs_page.h
> > +++ b/include/linux/nfs_page.h
> > @@ -69,6 +69,7 @@ struct nfs_rw_ops {
> > struct inode *);
> > void (*rw_result)(struct rpc_task *, struct nfs_pgio_header *);
> > void (*rw_initiate)(struct nfs_pgio_header *, struct rpc_message *,
> > + const struct nfs_rpc_ops *,
> > struct rpc_task_setup *, int);
> > };
> >
> >
>
On 01/05/2015 01:26 PM, Tom Haynes wrote:
> On Mon, Jan 05, 2015 at 01:10:04PM -0500, Anna Schumaker wrote:
>> Hey Tom,
>>
>> On 12/24/2014 02:12 AM, Tom Haynes wrote:
>>> Signed-off-by: Tom Haynes <[email protected]>
>>> ---
>>> fs/nfs/filelayout/filelayout.c | 4 ++--
>>> fs/nfs/internal.h | 1 +
>>> fs/nfs/pagelist.c | 6 ++++--
>>> fs/nfs/read.c | 3 ++-
>>> fs/nfs/write.c | 3 ++-
>>> include/linux/nfs_page.h | 1 +
>>> 6 files changed, 12 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
>>> index bc36ed3..25c4896 100644
>>> --- a/fs/nfs/filelayout/filelayout.c
>>> +++ b/fs/nfs/filelayout/filelayout.c
>>> @@ -501,7 +501,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
>>> hdr->mds_offset = offset;
>>>
>>> /* Perform an asynchronous read to ds */
>>> - nfs_initiate_pgio(ds_clnt, hdr,
>>> + nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
>>> &filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
>>> return PNFS_ATTEMPTED;
>>> }
>>> @@ -542,7 +542,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
>>> hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
>>>
>>> /* Perform an asynchronous write */
>>> - nfs_initiate_pgio(ds_clnt, hdr,
>>> + nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
>>> &filelayout_write_call_ops, sync,
>>> RPC_TASK_SOFTCONN);
>>> return PNFS_ATTEMPTED;
>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>> index 5543850..1d15ffa 100644
>>> --- a/fs/nfs/internal.h
>>> +++ b/fs/nfs/internal.h
>>> @@ -251,6 +251,7 @@ void nfs_pgio_header_free(struct nfs_pgio_header *);
>>> void nfs_pgio_data_destroy(struct nfs_pgio_header *);
>>> int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
>>> int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
>>> + const struct nfs_rpc_ops *,
>>> const struct rpc_call_ops *, int, int);
>>> void nfs_free_request(struct nfs_page *req);
>>>
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index 2b5e769..35a2626 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -597,6 +597,7 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
>>> }
>>>
>>> int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
>>> + const struct nfs_rpc_ops *rpc_ops,
>>> const struct rpc_call_ops *call_ops, int how, int flags)
>>> {
>>> struct rpc_task *task;
>>> @@ -616,7 +617,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
>>> };
>>> int ret = 0;
>>>
>>> - hdr->rw_ops->rw_initiate(hdr, &msg, &task_setup_data, how);
>>> + hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
>>>
>>> dprintk("NFS: %5u initiated pgio call "
>>> "(req %s/%llu, %u bytes @ offset %llu)\n",
>>> @@ -792,7 +793,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>>> ret = nfs_generic_pgio(desc, hdr);
>>> if (ret == 0)
>>> ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
>>> - hdr, desc->pg_rpc_callops,
>>> + hdr, NFS_PROTO(hdr->inode),
>>> + desc->pg_rpc_callops,
>>> desc->pg_ioflags, 0);
>>> return ret;
>>> }
>>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>>> index c91a479..092ab49 100644
>>> --- a/fs/nfs/read.c
>>> +++ b/fs/nfs/read.c
>>> @@ -168,13 +168,14 @@ out:
>>>
>>> static void nfs_initiate_read(struct nfs_pgio_header *hdr,
>>> struct rpc_message *msg,
>>> + const struct nfs_rpc_ops *rpc_ops,
>>> struct rpc_task_setup *task_setup_data, int how)
>>> {
>>> struct inode *inode = hdr->inode;
>>> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
>>>
>>> task_setup_data->flags |= swap_flags;
>>> - NFS_PROTO(inode)->read_setup(hdr, msg);
>>> + rpc_ops->read_setup(hdr, msg);
>>> }
>>>
>>> static void
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index af3af68..e5ed21c 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -1240,13 +1240,14 @@ static int flush_task_priority(int how)
>>>
>>> static void nfs_initiate_write(struct nfs_pgio_header *hdr,
>>> struct rpc_message *msg,
>>> + const struct nfs_rpc_ops *rpc_ops,
>>> struct rpc_task_setup *task_setup_data, int how)
>>> {
>>> struct inode *inode = hdr->inode;
>>> int priority = flush_task_priority(how);
>>>
>>> task_setup_data->priority = priority;
>>> - NFS_PROTO(inode)->write_setup(hdr, msg);
>>> + rpc_ops->write_setup(hdr, msg);
>>>
>>> nfs4_state_protect_write(NFS_SERVER(inode)->nfs_client,
>>> &task_setup_data->rpc_client, msg, hdr);
>>
>> When I compile without NFS v4 I get:
>>
>> fs/nfs/write.c: In function 'nfs_initiate_write':
>> fs/nfs/write.c:1246:16: error: unused variable 'inode' [-Werror=unused-variable]
>> struct inode *inode = hdr->inode;
>>
>> The call to nfs4_state_protect_write() is getting optimized out?
>>
>> Anna
>
> Hi Anna,
>
> If it is getting optimized out, then the issue is a compiler bug.
>
> I.e., if it optimized out the call, then it should optimize out the
> variable and not whine about it.
>
> If you tell it not to optimize, does the warning go away?
I'm not sure what flag would control that, but my new guess is that it's something related to expanding the macros at the bottom of nfs4_fs.h. I'm working on patches to move calls to nfs4_state_protect() into the v4 commit_setup() and write_setup() functions.
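A small userspace sketch of that guess, with made-up names. It assumes the stub used when v4 is compiled out is a function-like macro that expands to an empty statement. In that case the preprocessor drops the arguments before the compiler ever sees them, so a local that is only referenced in the macro call really is unused. An inline stub keeps the arguments visible.

```c
#include <assert.h>

/* Macro stub (assumed shape): the arguments vanish during preprocessing. */
#define protect_write_macro(client, msg) do { } while (0)

/* Inline stub: the arguments are still evaluated and count as used. */
static inline void protect_write_inline(int client, int msg)
{
	(void)client;
	(void)msg;
}

static int lookups;

/* Side effect lets us observe whether the argument was evaluated. */
static int lookup_client(void)
{
	return ++lookups;
}

static int demo_macro(void)
{
	lookups = 0;
	protect_write_macro(lookup_client(), 0);	/* expands to nothing */
	return lookups;
}

static int demo_inline(void)
{
	lookups = 0;
	protect_write_inline(lookup_client(), 0);	/* argument is evaluated */
	return lookups;
}
```

With the macro stub the lookup never runs, which is the same mechanism that would leave 'inode' unreferenced in nfs_initiate_write().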
Anna
>
> Tom
>
>>
>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>> index 6c3e06e..4c3aa80 100644
>>> --- a/include/linux/nfs_page.h
>>> +++ b/include/linux/nfs_page.h
>>> @@ -69,6 +69,7 @@ struct nfs_rw_ops {
>>> struct inode *);
>>> void (*rw_result)(struct rpc_task *, struct nfs_pgio_header *);
>>> void (*rw_initiate)(struct nfs_pgio_header *, struct rpc_message *,
>>> + const struct nfs_rpc_ops *,
>>> struct rpc_task_setup *, int);
>>> };
>>>
>>>
>>
Hey Peng and Tom,
On 12/24/2014 02:12 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/nfs4_fs.h | 2 ++
> fs/nfs/nfs4proc.c | 4 ++--
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index 90c4ffe..b3c771e 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -447,6 +447,8 @@ extern int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
> struct nfs4_sequence_args *args,
> struct nfs4_sequence_res *res,
> struct rpc_task *task);
> +extern int nfs4_sequence_done(struct rpc_task *task,
> + struct nfs4_sequence_res *res);
>
> extern void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp);
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 9b1a481..4883a42 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -694,8 +694,7 @@ out_retry:
> }
> EXPORT_SYMBOL_GPL(nfs41_sequence_done);
>
> -static int nfs4_sequence_done(struct rpc_task *task,
> - struct nfs4_sequence_res *res)
> +int nfs4_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
> {
> if (res->sr_slot == NULL)
> return 1;
When I compile with:
CONFIG_NFS_V4_2=n
CONFIG_NFS_V4_1=n
CONFIG_NFS_V4=y
CONFIG_NFS_V3=y
CONFIG_NFS_V2=y
I get:
fs/nfs/nfs4proc.c:826:12: error: static declaration of 'nfs4_sequence_done' follows non-static declaration
static int nfs4_sequence_done(struct rpc_task *task,
^
In file included from fs/nfs/nfs4proc.c:59:0:
fs/nfs/nfs4_fs.h:450:12: note: previous declaration of 'nfs4_sequence_done' was here
extern int nfs4_sequence_done(struct rpc_task *task,
^
scripts/Makefile.build:257: recipe for target 'fs/nfs/nfs4proc.o' failed
make[2]: *** [fs/nfs/nfs4proc.o] Error 1
scripts/Makefile.build:402: recipe for target 'fs/nfs' failed
make[1]: *** [fs/nfs] Error 2
Makefile:938: recipe for target 'fs' failed
make: *** [fs] Error 2
Thanks,
Anna
> @@ -703,6 +702,7 @@ static int nfs4_sequence_done(struct rpc_task *task,
> return nfs40_sequence_done(task, res);
> return nfs41_sequence_done(task, res);
> }
> +EXPORT_SYMBOL_GPL(nfs4_sequence_done);
>
> int nfs41_setup_sequence(struct nfs4_session *session,
> struct nfs4_sequence_args *args,
>
On Mon, Jan 05, 2015 at 03:11:14PM -0500, Anna Schumaker wrote:
> Hey Peng and Tom,
>
> On 12/24/2014 02:12 AM, Tom Haynes wrote:
> > From: Peng Tao <[email protected]>
> >
> > Signed-off-by: Peng Tao <[email protected]>
> > Signed-off-by: Tom Haynes <[email protected]>
> > ---
> > fs/nfs/nfs4_fs.h | 2 ++
> > fs/nfs/nfs4proc.c | 4 ++--
> > 2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> > index 90c4ffe..b3c771e 100644
> > --- a/fs/nfs/nfs4_fs.h
> > +++ b/fs/nfs/nfs4_fs.h
> > @@ -447,6 +447,8 @@ extern int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
> > struct nfs4_sequence_args *args,
> > struct nfs4_sequence_res *res,
> > struct rpc_task *task);
> > +extern int nfs4_sequence_done(struct rpc_task *task,
> > + struct nfs4_sequence_res *res);
> >
> > extern void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp);
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 9b1a481..4883a42 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -694,8 +694,7 @@ out_retry:
> > }
> > EXPORT_SYMBOL_GPL(nfs41_sequence_done);
> >
> > -static int nfs4_sequence_done(struct rpc_task *task,
> > - struct nfs4_sequence_res *res)
> > +int nfs4_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
> > {
> > if (res->sr_slot == NULL)
> > return 1;
>
> When I compile with:
> CONFIG_NFS_V4_2=n
> CONFIG_NFS_V4_1=n
> CONFIG_NFS_V4=y
> CONFIG_NFS_V3=y
> CONFIG_NFS_V2=y
>
> I get:
> fs/nfs/nfs4proc.c:826:12: error: static declaration of 'nfs4_sequence_done' follows non-static declaration
> static int nfs4_sequence_done(struct rpc_task *task,
> ^
> In file included from fs/nfs/nfs4proc.c:59:0:
> fs/nfs/nfs4_fs.h:450:12: note: previous declaration of 'nfs4_sequence_done' was here
> extern int nfs4_sequence_done(struct rpc_task *task,
> ^
> scripts/Makefile.build:257: recipe for target 'fs/nfs/nfs4proc.o' failed
> make[2]: *** [fs/nfs/nfs4proc.o] Error 1
> scripts/Makefile.build:402: recipe for target 'fs/nfs' failed
> make[1]: *** [fs/nfs] Error 2
> Makefile:938: recipe for target 'fs' failed
> make: *** [fs] Error 2
Right, I see the issue. I'll fix this up.
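Roughly, the shape of the problem as a userspace sketch with hypothetical names. It assumes the second definition sits in the branch that is built when sessions support is configured out; the patch only dropped "static" from the other one. Both branches have to match the shared prototype.

```c
#include <assert.h>

/* Hypothetical config switch standing in for CONFIG_NFS_V4_1. */
#ifndef DEMO_V4_1
#define DEMO_V4_1 0
#endif

/* Shared prototype, as the header now declares it. */
extern int demo_sequence_done(int has_slot);

#if DEMO_V4_1
/* Sessions build: already non-static after the patch. */
int demo_sequence_done(int has_slot)
{
	return has_slot ? 41 : 1;
}
#else
/*
 * Non-sessions build: this definition must drop "static" as well,
 * otherwise it conflicts with the extern prototype above.
 */
int demo_sequence_done(int has_slot)
{
	return has_slot ? 40 : 1;
}
#endif
```

Putting "static" back on the second definition reproduces the "static declaration follows non-static declaration" error.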
>
> Thanks,
> Anna
>
> > @@ -703,6 +702,7 @@ static int nfs4_sequence_done(struct rpc_task *task,
> > return nfs40_sequence_done(task, res);
> > return nfs41_sequence_done(task, res);
> > }
> > +EXPORT_SYMBOL_GPL(nfs4_sequence_done);
> >
> > int nfs41_setup_sequence(struct nfs4_session *session,
> > struct nfs4_sequence_args *args,
> >
>
Hi again,
On 12/24/2014 02:12 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/pnfs.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 2d25670..fa00b56 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1288,7 +1288,6 @@ pnfs_update_layout(struct inode *ino,
> struct nfs_client *clp = server->nfs_client;
> struct pnfs_layout_hdr *lo;
> struct pnfs_layout_segment *lseg = NULL;
> - bool first;
>
> if (!pnfs_enabled_sb(NFS_SERVER(ino)))
> goto out;
> @@ -1321,16 +1320,15 @@ pnfs_update_layout(struct inode *ino,
> if (pnfs_layoutgets_blocked(lo, 0))
> goto out_unlock;
> atomic_inc(&lo->plh_outstanding);
> -
> - first = list_empty(&lo->plh_layouts) ? true : false;
> spin_unlock(&ino->i_lock);
>
> - if (first) {
> + if (list_empty(&lo->plh_layouts)) {
> /* The lo must be on the clp list if there is any
> * chance of a CB_LAYOUTRECALL(FILE) coming in.
> */
> spin_lock(&clp->cl_lock);
> - list_add_tail(&lo->plh_layouts, &server->layouts);
> + if (list_empty(&lo->plh_layouts))
> + list_add_tail(&lo->plh_layouts, &server->layouts);
> spin_unlock(&clp->cl_lock);
> }
Do we really need to call list_empty() twice? Would there be a serious performance drawback if we removed the outer if condition and always called list_empty() under the cl_lock?
Thanks,
Anna
>
>
On 01/05/2015 04:20 PM, Trond Myklebust wrote:
>
> On Jan 5, 2015 12:59 PM, "Anna Schumaker" <[email protected]> wrote:
>>
>> Hi again,
>>
>> On 12/24/2014 02:12 AM, Tom Haynes wrote:
>> > From: Peng Tao <[email protected]>
>> >
>> > Signed-off-by: Peng Tao <[email protected]>
>> > Signed-off-by: Tom Haynes <[email protected]>
>> > ---
>> > fs/nfs/pnfs.c | 8 +++-----
>> > 1 file changed, 3 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> > index 2d25670..fa00b56 100644
>> > --- a/fs/nfs/pnfs.c
>> > +++ b/fs/nfs/pnfs.c
>> > @@ -1288,7 +1288,6 @@ pnfs_update_layout(struct inode *ino,
>> > struct nfs_client *clp = server->nfs_client;
>> > struct pnfs_layout_hdr *lo;
>> > struct pnfs_layout_segment *lseg = NULL;
>> > - bool first;
>> >
>> > if (!pnfs_enabled_sb(NFS_SERVER(ino)))
>> > goto out;
>> > @@ -1321,16 +1320,15 @@ pnfs_update_layout(struct inode *ino,
>> > if (pnfs_layoutgets_blocked(lo, 0))
>> > goto out_unlock;
>> > atomic_inc(&lo->plh_outstanding);
>> > -
>> > - first = list_empty(&lo->plh_layouts) ? true : false;
>> > spin_unlock(&ino->i_lock);
>> >
>> > - if (first) {
>> > + if (list_empty(&lo->plh_layouts)) {
>> > /* The lo must be on the clp list if there is any
>> > * chance of a CB_LAYOUTRECALL(FILE) coming in.
>> > */
>> > spin_lock(&clp->cl_lock);
>> > - list_add_tail(&lo->plh_layouts, &server->layouts);
>> > + if (list_empty(&lo->plh_layouts))
>> > + list_add_tail(&lo->plh_layouts, &server->layouts);
>> > spin_unlock(&clp->cl_lock);
>> > }
>>
>> Do we really need to call list_empty() twice? Would there be a serious performance drawback if we removed the outer if condition and always called list_empty() under the cl_lock?
>
> What is the problem with that? It avoids unnecessary contention on a per-server global lock.
I was thinking about the case where the plh_layouts list becomes empty after the outer if. I took a closer look at the code and that only happens when the layout is being freed, so it shouldn't be an issue.
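For reference, the check-then-recheck pattern under discussion can be sketched in userspace as below. This is an illustration under stated assumptions, not the pnfs code: the atomic flag stands in for !list_empty(&lo->plh_layouts), the mutex stands in for cl_lock, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct layout_hdr {
	atomic_bool on_list;	/* stands in for !list_empty(&lo->plh_layouts) */
};

static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;	/* "cl_lock" */
static int server_list_len;	/* protected by server_lock */

/* Returns true if this call added the header to the server list. */
static bool add_layout_once(struct layout_hdr *lo)
{
	bool added = false;

	/* Cheap unlocked check: the common case skips the shared lock. */
	if (atomic_load(&lo->on_list))
		return false;

	pthread_mutex_lock(&server_lock);
	/* Re-check under the lock: another thread may have won the race. */
	if (!atomic_load(&lo->on_list)) {
		server_list_len++;
		atomic_store(&lo->on_list, true);
		added = true;
	}
	pthread_mutex_unlock(&server_lock);
	return added;
}
```

The outer check keeps already-listed layouts off the per-server lock; the inner check is what closes the race between two first-time callers.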
Anna
>
> Please keep it,
>
>>
>> >
>> >
>>
>
Hey Peng,
Can you add this to NFS v2 as well? Then we'll be covered if we ever want to re-use this variable somewhere else! :)
Thanks,
Anna
On 12/24/2014 02:13 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> Flexfiles layout would want to use them to report DS IO status.
>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/nfs3xdr.c | 3 +++
> fs/nfs/nfs4xdr.c | 3 +++
> include/linux/nfs_xdr.h | 2 ++
> 3 files changed, 8 insertions(+)
>
> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> index 8f4cbe7..2a932fd 100644
> --- a/fs/nfs/nfs3xdr.c
> +++ b/fs/nfs/nfs3xdr.c
> @@ -1636,6 +1636,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr,
> error = decode_post_op_attr(xdr, result->fattr);
> if (unlikely(error))
> goto out;
> + result->op_status = status;
> if (status != NFS3_OK)
> goto out_status;
> error = decode_read3resok(xdr, result);
> @@ -1708,6 +1709,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr,
> error = decode_wcc_data(xdr, result->fattr);
> if (unlikely(error))
> goto out;
> + result->op_status = status;
> if (status != NFS3_OK)
> goto out_status;
> error = decode_write3resok(xdr, result);
> @@ -2323,6 +2325,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req,
> error = decode_wcc_data(xdr, result->fattr);
> if (unlikely(error))
> goto out;
> + result->op_status = status;
> if (status != NFS3_OK)
> goto out_status;
> error = decode_writeverf3(xdr, &result->verf->verifier);
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index cb4376b..7d8d7a4 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -6567,6 +6567,7 @@ static int nfs4_xdr_dec_read(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> int status;
>
> status = decode_compound_hdr(xdr, &hdr);
> + res->op_status = hdr.status;
> if (status)
> goto out;
> status = decode_sequence(xdr, &res->seq_res, rqstp);
> @@ -6592,6 +6593,7 @@ static int nfs4_xdr_dec_write(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> int status;
>
> status = decode_compound_hdr(xdr, &hdr);
> + res->op_status = hdr.status;
> if (status)
> goto out;
> status = decode_sequence(xdr, &res->seq_res, rqstp);
> @@ -6621,6 +6623,7 @@ static int nfs4_xdr_dec_commit(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> int status;
>
> status = decode_compound_hdr(xdr, &hdr);
> + res->op_status = hdr.status;
> if (status)
> goto out;
> status = decode_sequence(xdr, &res->seq_res, rqstp);
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 467c84e..962f461 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -513,6 +513,7 @@ struct nfs_pgio_res {
> struct nfs4_sequence_res seq_res;
> struct nfs_fattr * fattr;
> __u32 count;
> + __u32 op_status;
> int eof; /* used by read */
> struct nfs_writeverf * verf; /* used by write */
> const struct nfs_server *server; /* used by write */
> @@ -532,6 +533,7 @@ struct nfs_commitargs {
>
> struct nfs_commitres {
> struct nfs4_sequence_res seq_res;
> + __u32 op_status;
> struct nfs_fattr *fattr;
> struct nfs_writeverf *verf;
> const struct nfs_server *server;
>
On Tue, Jan 6, 2015 at 5:41 AM, Anna Schumaker
<[email protected]> wrote:
> Hey Peng,
>
> Can you add this to NFS v2 as well? Then we'll be covered if we ever want to re-use this variable somewhere else! :)
>
Yup, sure. I'll add it in the next version.
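The idea is the same for any version: record the raw server status in the result before branching on it, so the error path still has it. A minimal userspace sketch, with hypothetical names and a made-up two-word reply format rather than the real XDR decoders:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OK 0u

struct demo_pgio_res {
	uint32_t count;
	uint32_t op_status;	/* raw server status, saved for later reporting */
};

/* Decode a (status, count) reply; returns 0 on success, -1 on server error. */
static int demo_decode_res(const uint32_t reply[2], struct demo_pgio_res *res)
{
	uint32_t status = reply[0];

	res->op_status = status;	/* record before taking the error path */
	if (status != DEMO_OK)
		return -1;
	res->count = reply[1];
	return 0;
}
```

A layout driver could then read op_status after a failed READ, WRITE or COMMIT and report it for the data server involved.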
Thanks,
Tao
> Thanks,
> Anna
>
> On 12/24/2014 02:13 AM, Tom Haynes wrote:
>> From: Peng Tao <[email protected]>
>>
>> Flexfiles layout would want to use them to report DS IO status.
>>
>> Signed-off-by: Peng Tao <[email protected]>
>> Signed-off-by: Tom Haynes <[email protected]>
>> ---
>> fs/nfs/nfs3xdr.c | 3 +++
>> fs/nfs/nfs4xdr.c | 3 +++
>> include/linux/nfs_xdr.h | 2 ++
>> 3 files changed, 8 insertions(+)
>>
>> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
>> index 8f4cbe7..2a932fd 100644
>> --- a/fs/nfs/nfs3xdr.c
>> +++ b/fs/nfs/nfs3xdr.c
>> @@ -1636,6 +1636,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr,
>> error = decode_post_op_attr(xdr, result->fattr);
>> if (unlikely(error))
>> goto out;
>> + result->op_status = status;
>> if (status != NFS3_OK)
>> goto out_status;
>> error = decode_read3resok(xdr, result);
>> @@ -1708,6 +1709,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr,
>> error = decode_wcc_data(xdr, result->fattr);
>> if (unlikely(error))
>> goto out;
>> + result->op_status = status;
>> if (status != NFS3_OK)
>> goto out_status;
>> error = decode_write3resok(xdr, result);
>> @@ -2323,6 +2325,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req,
>> error = decode_wcc_data(xdr, result->fattr);
>> if (unlikely(error))
>> goto out;
>> + result->op_status = status;
>> if (status != NFS3_OK)
>> goto out_status;
>> error = decode_writeverf3(xdr, &result->verf->verifier);
>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
>> index cb4376b..7d8d7a4 100644
>> --- a/fs/nfs/nfs4xdr.c
>> +++ b/fs/nfs/nfs4xdr.c
>> @@ -6567,6 +6567,7 @@ static int nfs4_xdr_dec_read(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
>> int status;
>>
>> status = decode_compound_hdr(xdr, &hdr);
>> + res->op_status = hdr.status;
>> if (status)
>> goto out;
>> status = decode_sequence(xdr, &res->seq_res, rqstp);
>> @@ -6592,6 +6593,7 @@ static int nfs4_xdr_dec_write(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
>> int status;
>>
>> status = decode_compound_hdr(xdr, &hdr);
>> + res->op_status = hdr.status;
>> if (status)
>> goto out;
>> status = decode_sequence(xdr, &res->seq_res, rqstp);
>> @@ -6621,6 +6623,7 @@ static int nfs4_xdr_dec_commit(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
>> int status;
>>
>> status = decode_compound_hdr(xdr, &hdr);
>> + res->op_status = hdr.status;
>> if (status)
>> goto out;
>> status = decode_sequence(xdr, &res->seq_res, rqstp);
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 467c84e..962f461 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -513,6 +513,7 @@ struct nfs_pgio_res {
>> struct nfs4_sequence_res seq_res;
>> struct nfs_fattr * fattr;
>> __u32 count;
>> + __u32 op_status;
>> int eof; /* used by read */
>> struct nfs_writeverf * verf; /* used by write */
>> const struct nfs_server *server; /* used by write */
>> @@ -532,6 +533,7 @@ struct nfs_commitargs {
>>
>> struct nfs_commitres {
>> struct nfs4_sequence_res seq_res;
>> + __u32 op_status;
>> struct nfs_fattr *fattr;
>> struct nfs_writeverf *verf;
>> const struct nfs_server *server;
>>
>
Hey Dros and Tom,
I see you're adding some new FIXMEs and TODOs in the comments. Is there a plan for addressing them eventually?
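For context, the FIXMEs in the direct I/O completion paths only count bytes from the first mirror. A userspace sketch of that accounting follows; the struct and function names are hypothetical, the limit of 16 is the one named in the commit message, and the motivation (not counting the same data once per mirror) is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_MIRRORS 16	/* the arbitrary limit from the commit message */

struct demo_hdr {
	unsigned int mirror_idx;	/* which mirror this completion is for */
	size_t good_bytes;		/* bytes this mirror completed */
};

/* Sum completed bytes, counting only completions for mirror 0. */
static size_t demo_count_bytes(const struct demo_hdr *hdrs, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		assert(hdrs[i].mirror_idx < DEMO_MAX_MIRRORS);
		if (hdrs[i].mirror_idx == 0)
			total += hdrs[i].good_bytes;
	}
	return total;
}
```

The open question behind the FIXME is what to report when mirror 0 succeeds but another mirror completes fewer bytes.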
Thanks,
Anna
On 12/24/2014 02:13 AM, Tom Haynes wrote:
> From: Weston Andros Adamson <[email protected]>
>
> This patch adds mirrored write support to the pgio layer. The default
> is to use one mirror, but pgio callers may define callbacks to change
> this to any value up to the (arbitrarily selected) limit of 16.
>
> The basic idea is to break out members of nfs_pageio_descriptor that cannot
> be shared between mirrored DSes and put them in a new structure.
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> ---
> fs/nfs/direct.c | 17 ++-
> fs/nfs/internal.h | 1 +
> fs/nfs/objlayout/objio_osd.c | 3 +-
> fs/nfs/pagelist.c | 270 +++++++++++++++++++++++++++++++++++--------
> fs/nfs/pnfs.c | 26 +++--
> fs/nfs/read.c | 30 ++++-
> fs/nfs/write.c | 10 +-
> include/linux/nfs_page.h | 20 +++-
> include/linux/nfs_xdr.h | 1 +
> 9 files changed, 311 insertions(+), 67 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 1ee41d7..0178d4f 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -360,8 +360,14 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
> spin_lock(&dreq->lock);
> if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
> dreq->error = hdr->error;
> - else
> - dreq->count += hdr->good_bytes;
> + else {
> + /*
> + * FIXME: right now this only accounts for bytes written
> + * to the first mirror
> + */
> + if (hdr->pgio_mirror_idx == 0)
> + dreq->count += hdr->good_bytes;
> + }
> spin_unlock(&dreq->lock);
>
> while (!list_empty(&hdr->pages)) {
> @@ -724,7 +730,12 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
> dreq->error = hdr->error;
> }
> if (dreq->error == 0) {
> - dreq->count += hdr->good_bytes;
> + /*
> + * FIXME: right now this only accounts for bytes written
> + * to the first mirror
> + */
> + if (hdr->pgio_mirror_idx == 0)
> + dreq->count += hdr->good_bytes;
> if (nfs_write_need_commit(hdr)) {
> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
> request_commit = true;
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 05f9a87..ef1c703 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -469,6 +469,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
> struct nfs_direct_req *dreq);
> int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
> bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
>
> #ifdef CONFIG_MIGRATION
> extern int nfs_migrate_page(struct address_space *,
> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
> index d007780..9a5f2ee 100644
> --- a/fs/nfs/objlayout/objio_osd.c
> +++ b/fs/nfs/objlayout/objio_osd.c
> @@ -537,11 +537,12 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
> static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
> struct nfs_page *prev, struct nfs_page *req)
> {
> + struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
> unsigned int size;
>
> size = pnfs_generic_pg_test(pgio, prev, req);
>
> - if (!size || pgio->pg_count + req->wb_bytes >
> + if (!size || mirror->pg_count + req->wb_bytes >
> (unsigned long)pgio->pg_layout_private)
> return 0;
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 1c03187..eec12b7 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -46,17 +46,22 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
> struct nfs_pgio_header *hdr,
> void (*release)(struct nfs_pgio_header *hdr))
> {
> - hdr->req = nfs_list_entry(desc->pg_list.next);
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> +
> + hdr->req = nfs_list_entry(mirror->pg_list.next);
> hdr->inode = desc->pg_inode;
> hdr->cred = hdr->req->wb_context->cred;
> hdr->io_start = req_offset(hdr->req);
> - hdr->good_bytes = desc->pg_count;
> + hdr->good_bytes = mirror->pg_count;
> hdr->dreq = desc->pg_dreq;
> hdr->layout_private = desc->pg_layout_private;
> hdr->release = release;
> hdr->completion_ops = desc->pg_completion_ops;
> if (hdr->completion_ops->init_hdr)
> hdr->completion_ops->init_hdr(hdr);
> +
> + hdr->pgio_mirror_idx = desc->pg_mirror_idx;
> }
> EXPORT_SYMBOL_GPL(nfs_pgheader_init);
>
> @@ -480,7 +485,10 @@ nfs_wait_on_request(struct nfs_page *req)
> size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
> struct nfs_page *prev, struct nfs_page *req)
> {
> - if (desc->pg_count > desc->pg_bsize) {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> +
> + if (mirror->pg_count > mirror->pg_bsize) {
> /* should never happen */
> WARN_ON_ONCE(1);
> return 0;
> @@ -490,11 +498,11 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
> * Limit the request size so that we can still allocate a page array
> * for it without upsetting the slab allocator.
> */
> - if (((desc->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
> + if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
> sizeof(struct page) > PAGE_SIZE)
> return 0;
>
> - return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
> + return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
> }
> EXPORT_SYMBOL_GPL(nfs_generic_pg_test);
>
> @@ -651,10 +659,18 @@ EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
> static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
> struct nfs_pgio_header *hdr)
> {
> + struct nfs_pgio_mirror *mirror;
> + u32 midx;
> +
> set_bit(NFS_IOHDR_REDO, &hdr->flags);
> nfs_pgio_data_destroy(hdr);
> hdr->completion_ops->completion(hdr);
> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
> + /* TODO: Make sure it's right to clean up all mirrors here
> + * and not just hdr->pgio_mirror_idx */
> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
> + mirror = &desc->pg_mirrors[midx];
> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
> + }
> return -ENOMEM;
> }
>
> @@ -671,6 +687,17 @@ static void nfs_pgio_release(void *calldata)
> hdr->completion_ops->completion(hdr);
> }
>
> +static void nfs_pageio_mirror_init(struct nfs_pgio_mirror *mirror,
> + unsigned int bsize)
> +{
> + INIT_LIST_HEAD(&mirror->pg_list);
> + mirror->pg_bytes_written = 0;
> + mirror->pg_count = 0;
> + mirror->pg_bsize = bsize;
> + mirror->pg_base = 0;
> + mirror->pg_recoalesce = 0;
> +}
> +
> /**
> * nfs_pageio_init - initialise a page io descriptor
> * @desc: pointer to descriptor
> @@ -687,13 +714,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> size_t bsize,
> int io_flags)
> {
> - INIT_LIST_HEAD(&desc->pg_list);
> - desc->pg_bytes_written = 0;
> - desc->pg_count = 0;
> - desc->pg_bsize = bsize;
> - desc->pg_base = 0;
> + struct nfs_pgio_mirror *new;
> + int i;
> +
> desc->pg_moreio = 0;
> - desc->pg_recoalesce = 0;
> desc->pg_inode = inode;
> desc->pg_ops = pg_ops;
> desc->pg_completion_ops = compl_ops;
> @@ -703,6 +727,26 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> desc->pg_lseg = NULL;
> desc->pg_dreq = NULL;
> desc->pg_layout_private = NULL;
> + desc->pg_bsize = bsize;
> +
> + desc->pg_mirror_count = 1;
> + desc->pg_mirror_idx = 0;
> +
> + if (pg_ops->pg_get_mirror_count) {
> + /* until we have a request, we don't have an lseg and no
> + * idea how many mirrors there will be */
> + new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
> + sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
> + desc->pg_mirrors_dynamic = new;
> + desc->pg_mirrors = new;
> +
> + for (i = 0; i < NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX; i++)
> + nfs_pageio_mirror_init(&desc->pg_mirrors[i], bsize);
> + } else {
> + desc->pg_mirrors_dynamic = NULL;
> + desc->pg_mirrors = desc->pg_mirrors_static;
> + nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
> + }
> }
> EXPORT_SYMBOL_GPL(nfs_pageio_init);
>
> @@ -738,14 +782,16 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
> int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
> struct nfs_pgio_header *hdr)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> struct nfs_page *req;
> struct page **pages,
> *last_page;
> - struct list_head *head = &desc->pg_list;
> + struct list_head *head = &mirror->pg_list;
> struct nfs_commit_info cinfo;
> unsigned int pagecount, pageused;
>
> - pagecount = nfs_page_array_len(desc->pg_base, desc->pg_count);
> + pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
> if (!nfs_pgarray_set(&hdr->page_array, pagecount))
> return nfs_pgio_error(desc, hdr);
>
> @@ -773,7 +819,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
> desc->pg_ioflags &= ~FLUSH_COND_STABLE;
>
> /* Set up the argument struct */
> - nfs_pgio_rpcsetup(hdr, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
> + nfs_pgio_rpcsetup(hdr, mirror->pg_count, 0, desc->pg_ioflags, &cinfo);
> desc->pg_rpc_callops = &nfs_pgio_common_ops;
> return 0;
> }
> @@ -781,12 +827,17 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
>
> static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
> {
> + struct nfs_pgio_mirror *mirror;
> struct nfs_pgio_header *hdr;
> int ret;
>
> + mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
> if (!hdr) {
> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
> + /* TODO: make sure this is right with mirroring - or
> + * should it back out all mirrors? */
> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
> return -ENOMEM;
> }
> nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
> @@ -801,6 +852,49 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
> return ret;
> }
>
> +/*
> + * nfs_pageio_setup_mirroring - determine if mirroring is to be used
> + * by calling the pg_get_mirror_count op
> + */
> +static int nfs_pageio_setup_mirroring(struct nfs_pageio_descriptor *pgio,
> + struct nfs_page *req)
> +{
> + int mirror_count = 1;
> +
> + if (!pgio->pg_ops->pg_get_mirror_count)
> + return 0;
> +
> + mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
> +
> + if (!mirror_count || mirror_count > NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX)
> + return -EINVAL;
> +
> + if (WARN_ON_ONCE(!pgio->pg_mirrors_dynamic))
> + return -EINVAL;
> +
> + pgio->pg_mirror_count = mirror_count;
> +
> + return 0;
> +}
> +
> +/*
> + * nfs_pageio_stop_mirroring - stop using mirroring (set mirror count to 1)
> + */
> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio)
> +{
> + pgio->pg_mirror_count = 1;
> + pgio->pg_mirror_idx = 0;
> +}
> +
> +static void nfs_pageio_cleanup_mirroring(struct nfs_pageio_descriptor *pgio)
> +{
> + pgio->pg_mirror_count = 1;
> + pgio->pg_mirror_idx = 0;
> + pgio->pg_mirrors = pgio->pg_mirrors_static;
> + kfree(pgio->pg_mirrors_dynamic);
> + pgio->pg_mirrors_dynamic = NULL;
> +}
> +
> static bool nfs_match_open_context(const struct nfs_open_context *ctx1,
> const struct nfs_open_context *ctx2)
> {
> @@ -867,19 +961,22 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
> static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
> struct nfs_page *req)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> struct nfs_page *prev = NULL;
> - if (desc->pg_count != 0) {
> - prev = nfs_list_entry(desc->pg_list.prev);
> +
> + if (mirror->pg_count != 0) {
> + prev = nfs_list_entry(mirror->pg_list.prev);
> } else {
> if (desc->pg_ops->pg_init)
> desc->pg_ops->pg_init(desc, req);
> - desc->pg_base = req->wb_pgbase;
> + mirror->pg_base = req->wb_pgbase;
> }
> if (!nfs_can_coalesce_requests(prev, req, desc))
> return 0;
> nfs_list_remove_request(req);
> - nfs_list_add_request(req, &desc->pg_list);
> - desc->pg_count += req->wb_bytes;
> + nfs_list_add_request(req, &mirror->pg_list);
> + mirror->pg_count += req->wb_bytes;
> return 1;
> }
>
> @@ -888,16 +985,19 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
> */
> static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
> {
> - if (!list_empty(&desc->pg_list)) {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> +
> + if (!list_empty(&mirror->pg_list)) {
> int error = desc->pg_ops->pg_doio(desc);
> if (error < 0)
> desc->pg_error = error;
> else
> - desc->pg_bytes_written += desc->pg_count;
> + mirror->pg_bytes_written += mirror->pg_count;
> }
> - if (list_empty(&desc->pg_list)) {
> - desc->pg_count = 0;
> - desc->pg_base = 0;
> + if (list_empty(&mirror->pg_list)) {
> + mirror->pg_count = 0;
> + mirror->pg_base = 0;
> }
> }
>
> @@ -915,10 +1015,14 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> struct nfs_page *req)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> struct nfs_page *subreq;
> unsigned int bytes_left = 0;
> unsigned int offset, pgbase;
>
> + WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
> +
> nfs_page_group_lock(req, false);
>
> subreq = req;
> @@ -938,7 +1042,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> nfs_pageio_doio(desc);
> if (desc->pg_error < 0)
> return 0;
> - if (desc->pg_recoalesce)
> + if (mirror->pg_recoalesce)
> return 0;
> /* retry add_request for this subreq */
> nfs_page_group_lock(req, false);
> @@ -976,14 +1080,16 @@ err_ptr:
>
> static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> LIST_HEAD(head);
>
> do {
> - list_splice_init(&desc->pg_list, &head);
> - desc->pg_bytes_written -= desc->pg_count;
> - desc->pg_count = 0;
> - desc->pg_base = 0;
> - desc->pg_recoalesce = 0;
> + list_splice_init(&mirror->pg_list, &head);
> + mirror->pg_bytes_written -= mirror->pg_count;
> + mirror->pg_count = 0;
> + mirror->pg_base = 0;
> + mirror->pg_recoalesce = 0;
> +
> desc->pg_moreio = 0;
>
> while (!list_empty(&head)) {
> @@ -997,11 +1103,11 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
> return 0;
> break;
> }
> - } while (desc->pg_recoalesce);
> + } while (mirror->pg_recoalesce);
> return 1;
> }
>
> -int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> +static int nfs_pageio_add_request_mirror(struct nfs_pageio_descriptor *desc,
> struct nfs_page *req)
> {
> int ret;
> @@ -1014,9 +1120,78 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> break;
> ret = nfs_do_recoalesce(desc);
> } while (ret);
> +
> return ret;
> }
>
> +int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> + struct nfs_page *req)
> +{
> + u32 midx;
> + unsigned int pgbase, offset, bytes;
> + struct nfs_page *dupreq, *lastreq;
> +
> + pgbase = req->wb_pgbase;
> + offset = req->wb_offset;
> + bytes = req->wb_bytes;
> +
> + nfs_pageio_setup_mirroring(desc, req);
> +
> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
> + if (midx) {
> + nfs_page_group_lock(req, false);
> +
> + /* find the last request */
> + for (lastreq = req->wb_head;
> + lastreq->wb_this_page != req->wb_head;
> + lastreq = lastreq->wb_this_page)
> + ;
> +
> + dupreq = nfs_create_request(req->wb_context,
> + req->wb_page, lastreq, pgbase, bytes);
> +
> + if (IS_ERR(dupreq)) {
> + nfs_page_group_unlock(req);
> + return 0;
> + }
> +
> + nfs_lock_request(dupreq);
> + nfs_page_group_unlock(req);
> + dupreq->wb_offset = offset;
> + dupreq->wb_index = req->wb_index;
> + } else
> + dupreq = req;
> +
> + desc->pg_mirror_idx = midx;
> + if (!nfs_pageio_add_request_mirror(desc, dupreq))
> + return 0;
> + }
> +
> + return 1;
> +}
> +
> +/*
> + * nfs_pageio_complete_mirror - Complete I/O on the current mirror of an
> + * nfs_pageio_descriptor
> + * @desc: pointer to io descriptor
> + */
> +static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
> + u32 mirror_idx)
> +{
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
> + u32 restore_idx = desc->pg_mirror_idx;
> +
> + desc->pg_mirror_idx = mirror_idx;
> + for (;;) {
> + nfs_pageio_doio(desc);
> + if (!mirror->pg_recoalesce)
> + break;
> + if (!nfs_do_recoalesce(desc))
> + break;
> + }
> + desc->pg_mirror_idx = restore_idx;
> +}
> +
> /*
> * nfs_pageio_resend - Transfer requests to new descriptor and resend
> * @hdr - the pgio header to move request from
> @@ -1055,16 +1230,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_resend);
> */
> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
> {
> - for (;;) {
> - nfs_pageio_doio(desc);
> - if (!desc->pg_recoalesce)
> - break;
> - if (!nfs_do_recoalesce(desc))
> - break;
> - }
> + u32 midx;
> +
> + for (midx = 0; midx < desc->pg_mirror_count; midx++)
> + nfs_pageio_complete_mirror(desc, midx);
>
> if (desc->pg_ops->pg_cleanup)
> desc->pg_ops->pg_cleanup(desc);
> + nfs_pageio_cleanup_mirroring(desc);
> }
>
> /**
> @@ -1080,10 +1253,17 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
> */
> void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
> {
> - if (!list_empty(&desc->pg_list)) {
> - struct nfs_page *prev = nfs_list_entry(desc->pg_list.prev);
> - if (index != prev->wb_index + 1)
> - nfs_pageio_complete(desc);
> + struct nfs_pgio_mirror *mirror;
> + struct nfs_page *prev;
> + u32 midx;
> +
> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
> + mirror = &desc->pg_mirrors[midx];
> + if (!list_empty(&mirror->pg_list)) {
> + prev = nfs_list_entry(mirror->pg_list.prev);
> + if (index != prev->wb_index + 1)
> + nfs_pageio_complete_mirror(desc, midx);
> + }
> }
> }
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 2da2e77..5f7c422 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1646,8 +1646,8 @@ EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
> * of bytes (maximum @req->wb_bytes) that can be coalesced.
> */
> size_t
> -pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> - struct nfs_page *req)
> +pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
> + struct nfs_page *prev, struct nfs_page *req)
> {
> unsigned int size;
> u64 seg_end, req_start, seg_left;
> @@ -1729,10 +1729,12 @@ static void
> pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
> struct nfs_pgio_header *hdr)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
> nfs_pageio_reset_write_mds(desc);
> - desc->pg_recoalesce = 1;
> + mirror->pg_recoalesce = 1;
> }
> nfs_pgio_data_destroy(hdr);
> }
> @@ -1781,12 +1783,14 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
> int
> pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> struct nfs_pgio_header *hdr;
> int ret;
>
> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
> if (!hdr) {
> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
> return -ENOMEM;
> }
> nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
> @@ -1795,6 +1799,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
> ret = nfs_generic_pgio(desc, hdr);
> if (!ret)
> pnfs_do_write(desc, hdr, desc->pg_ioflags);
> +
> return ret;
> }
> EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
> @@ -1839,10 +1844,13 @@ static void
> pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
> struct nfs_pgio_header *hdr)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> +
> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
> nfs_pageio_reset_read_mds(desc);
> - desc->pg_recoalesce = 1;
> + mirror->pg_recoalesce = 1;
> }
> nfs_pgio_data_destroy(hdr);
> }
> @@ -1893,12 +1901,14 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
> int
> pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
> {
> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
> +
> struct nfs_pgio_header *hdr;
> int ret;
>
> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
> if (!hdr) {
> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
> return -ENOMEM;
> }
> nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 092ab49..568ecf0 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -70,8 +70,15 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
>
> void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
> {
> + struct nfs_pgio_mirror *mirror;
> +
> pgio->pg_ops = &nfs_pgio_rw_ops;
> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
> +
> + /* read path should never have more than one mirror */
> + WARN_ON_ONCE(pgio->pg_mirror_count != 1);
> +
> + mirror = &pgio->pg_mirrors[0];
> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
> }
> EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
>
> @@ -81,6 +88,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> struct nfs_page *new;
> unsigned int len;
> struct nfs_pageio_descriptor pgio;
> + struct nfs_pgio_mirror *pgm;
>
> len = nfs_page_length(page);
> if (len == 0)
> @@ -97,7 +105,13 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> &nfs_async_read_completion_ops);
> nfs_pageio_add_request(&pgio, new);
> nfs_pageio_complete(&pgio);
> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
> +
> + /* It doesn't make sense to do mirrored reads! */
> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
> +
> + pgm = &pgio.pg_mirrors[0];
> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
> +
> return 0;
> }
>
> @@ -352,6 +366,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
> struct list_head *pages, unsigned nr_pages)
> {
> struct nfs_pageio_descriptor pgio;
> + struct nfs_pgio_mirror *pgm;
> struct nfs_readdesc desc = {
> .pgio = &pgio,
> };
> @@ -387,10 +402,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
> &nfs_async_read_completion_ops);
>
> ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
> -
> nfs_pageio_complete(&pgio);
> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
> - npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> +
> + /* It doesn't make sense to do mirrored reads! */
> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
> +
> + pgm = &pgio.pg_mirrors[0];
> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
> + npages = (pgm->pg_bytes_written + PAGE_CACHE_SIZE - 1) >>
> + PAGE_CACHE_SHIFT;
> nfs_add_stats(inode, NFSIOS_READPAGES, npages);
> read_complete:
> put_nfs_open_context(desc.ctx);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index db802d9..2f6ee8e 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -906,7 +906,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
> if (nfs_write_need_commit(hdr)) {
> memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
> nfs_mark_request_commit(req, hdr->lseg, &cinfo,
> - 0);
> + hdr->pgio_mirror_idx);
> goto next;
> }
> remove_req:
> @@ -1305,8 +1305,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_write);
>
> void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
> {
> + struct nfs_pgio_mirror *mirror;
> +
> pgio->pg_ops = &nfs_pgio_rw_ops;
> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
> +
> + nfs_pageio_stop_mirroring(pgio);
> +
> + mirror = &pgio->pg_mirrors[0];
> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
> }
> EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
>
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index 479c566..3eb072d 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -58,6 +58,8 @@ struct nfs_pageio_ops {
> size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
> struct nfs_page *);
> int (*pg_doio)(struct nfs_pageio_descriptor *);
> + unsigned int (*pg_get_mirror_count)(struct nfs_pageio_descriptor *,
> + struct nfs_page *);
> void (*pg_cleanup)(struct nfs_pageio_descriptor *);
> };
>
> @@ -74,15 +76,17 @@ struct nfs_rw_ops {
> struct rpc_task_setup *, int);
> };
>
> -struct nfs_pageio_descriptor {
> +struct nfs_pgio_mirror {
> struct list_head pg_list;
> unsigned long pg_bytes_written;
> size_t pg_count;
> size_t pg_bsize;
> unsigned int pg_base;
> - unsigned char pg_moreio : 1,
> - pg_recoalesce : 1;
> + unsigned char pg_recoalesce : 1;
> +};
>
> +struct nfs_pageio_descriptor {
> + unsigned char pg_moreio : 1;
> struct inode *pg_inode;
> const struct nfs_pageio_ops *pg_ops;
> const struct nfs_rw_ops *pg_rw_ops;
> @@ -93,8 +97,18 @@ struct nfs_pageio_descriptor {
> struct pnfs_layout_segment *pg_lseg;
> struct nfs_direct_req *pg_dreq;
> void *pg_layout_private;
> + unsigned int pg_bsize; /* default bsize for mirrors */
> +
> + u32 pg_mirror_count;
> + struct nfs_pgio_mirror *pg_mirrors;
> + struct nfs_pgio_mirror pg_mirrors_static[1];
> + struct nfs_pgio_mirror *pg_mirrors_dynamic;
> + u32 pg_mirror_idx; /* current mirror */
> };
>
> +/* arbitrarily selected limit to number of mirrors */
> +#define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
> +
> #define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
>
> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 5bc99f0..6400a1e 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1329,6 +1329,7 @@ struct nfs_pgio_header {
> struct nfs_page_array page_array;
> struct nfs_client *ds_clp; /* pNFS data server */
> int ds_commit_idx; /* ds index if ds_clp is set */
> + int pgio_mirror_idx;/* mirror index in pgio layer */
> };
>
> struct nfs_mds_commit_info {
>
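For readers following the thread, the core idea of the quoted patch (state that cannot be shared between mirrored DSes moves into a per-mirror structure, and each request is queued once per mirror) can be modeled in a few lines of userspace C. This is only a sketch under assumptions: the names below (pgio_mirror, pgio_desc, pgio_add_request, pgio_complete) are simplified stand-ins rather than the kernel definitions, and request coalescing is reduced to a byte counter.

```c
#include <assert.h>
#include <stddef.h>

/* Same arbitrary limit as NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX in the patch. */
#define MIRROR_MAX 16

/* Hypothetical stand-in for struct nfs_pgio_mirror: per-mirror state only. */
struct pgio_mirror {
	size_t count;         /* bytes currently queued (cf. pg_count) */
	size_t bsize;         /* largest I/O for this mirror (cf. pg_bsize) */
	size_t bytes_written; /* bytes handed off so far (cf. pg_bytes_written) */
};

/* Hypothetical stand-in for struct nfs_pageio_descriptor: shared state. */
struct pgio_desc {
	unsigned int mirror_count; /* mirrors in use, 1..MIRROR_MAX */
	unsigned int mirror_idx;   /* mirror currently being worked on */
	struct pgio_mirror mirrors[MIRROR_MAX];
};

void pgio_init(struct pgio_desc *d, unsigned int mirror_count, size_t bsize)
{
	unsigned int i;

	assert(mirror_count >= 1 && mirror_count <= MIRROR_MAX);
	d->mirror_count = mirror_count;
	d->mirror_idx = 0;
	for (i = 0; i < MIRROR_MAX; i++) {
		d->mirrors[i].count = 0;
		d->mirrors[i].bsize = bsize;
		d->mirrors[i].bytes_written = 0;
	}
}

/* Hand off whatever is queued on the current mirror. */
void pgio_flush_mirror(struct pgio_desc *d)
{
	struct pgio_mirror *m = &d->mirrors[d->mirror_idx];

	m->bytes_written += m->count;
	m->count = 0;
}

/* Queue the same request on every mirror; returns 1 on success, 0 on failure. */
int pgio_add_request(struct pgio_desc *d, size_t bytes)
{
	unsigned int midx;

	for (midx = 0; midx < d->mirror_count; midx++) {
		struct pgio_mirror *m = &d->mirrors[midx];

		d->mirror_idx = midx;
		if (bytes > m->bsize)
			return 0; /* can never fit on this mirror */
		if (m->count + bytes > m->bsize)
			pgio_flush_mirror(d); /* make room first */
		m->count += bytes;
	}
	return 1;
}

/* Flush every mirror, then fall back to mirror 0. */
void pgio_complete(struct pgio_desc *d)
{
	unsigned int midx;

	for (midx = 0; midx < d->mirror_count; midx++) {
		d->mirror_idx = midx;
		pgio_flush_mirror(d);
	}
	d->mirror_idx = 0;
}
```

With two mirrors and a 4096-byte limit, two 3000-byte requests followed by pgio_complete() leave each of the two active mirrors with 6000 bytes written, while the unused slots stay at zero. The shared descriptor only tracks which mirror is current, which is roughly the role pg_mirror_idx plays in the patch.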
These issues are addressed, and the comments removed, in subsequent patches
from the same series.
Instead of having one huge patch that implements all of mirroring, I chose to split
it out into smaller patches. These notes were useful in making sure that the issues
were addressed, and should be a useful guide to anyone bisecting the series.
-dros
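As an illustration of the interim behaviour that the FIXME in fs/nfs/direct.c describes, here is a small userspace model of the accounting: every mirror completes with its own header, but only the header for mirror 0 contributes to the request's byte count, so a mirrored write is not counted once per mirror. The struct and function names (io_header, direct_req, direct_account) are hypothetical stand-ins, and the sketch makes no claim about how the later patches in the series resolve the FIXME.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields of nfs_pgio_header used here. */
struct io_header {
	int mirror_idx;    /* cf. hdr->pgio_mirror_idx */
	size_t good_bytes; /* cf. hdr->good_bytes */
};

/* Hypothetical stand-in for the fields of nfs_direct_req used here. */
struct direct_req {
	size_t count; /* bytes reported back to the caller */
	int error;    /* 0 while no error has been recorded */
};

/* Account one completed header; only mirror 0 adds to the total. */
void direct_account(struct direct_req *dreq, const struct io_header *hdr)
{
	if (dreq->error != 0)
		return;
	if (hdr->mirror_idx == 0)
		dreq->count += hdr->good_bytes;
}
```

If an 8192-byte write completes on two mirrors, direct_account() is called twice, but dreq->count ends up at 8192 rather than 16384.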
> On Jan 6, 2015, at 1:11 PM, Anna Schumaker <[email protected]> wrote:
>
> Hey Dros and Tom,
>
> I see you're adding some new FIXMEs and TODOs in the comments. Is there a plan for addressing these eventually?
>
> Thanks,
> Anna
>
> On 12/24/2014 02:13 AM, Tom Haynes wrote:
>> From: Weston Andros Adamson <[email protected]>
>>
>> This patch adds mirrored write support to the pgio layer. The default
>> is to use one mirror, but pgio callers may define callbacks to change
>> this to any value up to the (arbitrarily selected) limit of 16.
>>
>> The basic idea is to break out members of nfs_pageio_descriptor that cannot
>> be shared between mirrored DSes and put them in a new structure.
>>
>> Signed-off-by: Weston Andros Adamson <[email protected]>
>> ---
>> fs/nfs/direct.c | 17 ++-
>> fs/nfs/internal.h | 1 +
>> fs/nfs/objlayout/objio_osd.c | 3 +-
>> fs/nfs/pagelist.c | 270 +++++++++++++++++++++++++++++++++++--------
>> fs/nfs/pnfs.c | 26 +++--
>> fs/nfs/read.c | 30 ++++-
>> fs/nfs/write.c | 10 +-
>> include/linux/nfs_page.h | 20 +++-
>> include/linux/nfs_xdr.h | 1 +
>> 9 files changed, 311 insertions(+), 67 deletions(-)
>>
>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>> index 1ee41d7..0178d4f 100644
>> --- a/fs/nfs/direct.c
>> +++ b/fs/nfs/direct.c
>> @@ -360,8 +360,14 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
>> spin_lock(&dreq->lock);
>> if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
>> dreq->error = hdr->error;
>> - else
>> - dreq->count += hdr->good_bytes;
>> + else {
>> + /*
>> + * FIXME: right now this only accounts for bytes written
>> + * to the first mirror
>> + */
>> + if (hdr->pgio_mirror_idx == 0)
>> + dreq->count += hdr->good_bytes;
>> + }
>> spin_unlock(&dreq->lock);
>>
>> while (!list_empty(&hdr->pages)) {
>> @@ -724,7 +730,12 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>> dreq->error = hdr->error;
>> }
>> if (dreq->error == 0) {
>> - dreq->count += hdr->good_bytes;
>> + /*
>> + * FIXME: right now this only accounts for bytes written
>> + * to the first mirror
>> + */
>> + if (hdr->pgio_mirror_idx == 0)
>> + dreq->count += hdr->good_bytes;
>> if (nfs_write_need_commit(hdr)) {
>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>> request_commit = true;
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index 05f9a87..ef1c703 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -469,6 +469,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
>> struct nfs_direct_req *dreq);
>> int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
>> bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
>> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
>>
>> #ifdef CONFIG_MIGRATION
>> extern int nfs_migrate_page(struct address_space *,
>> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
>> index d007780..9a5f2ee 100644
>> --- a/fs/nfs/objlayout/objio_osd.c
>> +++ b/fs/nfs/objlayout/objio_osd.c
>> @@ -537,11 +537,12 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
>> static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
>> struct nfs_page *prev, struct nfs_page *req)
>> {
>> + struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
>> unsigned int size;
>>
>> size = pnfs_generic_pg_test(pgio, prev, req);
>>
>> - if (!size || pgio->pg_count + req->wb_bytes >
>> + if (!size || mirror->pg_count + req->wb_bytes >
>> (unsigned long)pgio->pg_layout_private)
>> return 0;
>>
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index 1c03187..eec12b7 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -46,17 +46,22 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
>> struct nfs_pgio_header *hdr,
>> void (*release)(struct nfs_pgio_header *hdr))
>> {
>> - hdr->req = nfs_list_entry(desc->pg_list.next);
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> +
>> + hdr->req = nfs_list_entry(mirror->pg_list.next);
>> hdr->inode = desc->pg_inode;
>> hdr->cred = hdr->req->wb_context->cred;
>> hdr->io_start = req_offset(hdr->req);
>> - hdr->good_bytes = desc->pg_count;
>> + hdr->good_bytes = mirror->pg_count;
>> hdr->dreq = desc->pg_dreq;
>> hdr->layout_private = desc->pg_layout_private;
>> hdr->release = release;
>> hdr->completion_ops = desc->pg_completion_ops;
>> if (hdr->completion_ops->init_hdr)
>> hdr->completion_ops->init_hdr(hdr);
>> +
>> + hdr->pgio_mirror_idx = desc->pg_mirror_idx;
>> }
>> EXPORT_SYMBOL_GPL(nfs_pgheader_init);
>>
>> @@ -480,7 +485,10 @@ nfs_wait_on_request(struct nfs_page *req)
>> size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>> struct nfs_page *prev, struct nfs_page *req)
>> {
>> - if (desc->pg_count > desc->pg_bsize) {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> +
>> + if (mirror->pg_count > mirror->pg_bsize) {
>> /* should never happen */
>> WARN_ON_ONCE(1);
>> return 0;
>> @@ -490,11 +498,11 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>> * Limit the request size so that we can still allocate a page array
>> * for it without upsetting the slab allocator.
>> */
>> - if (((desc->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
>> + if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
>> sizeof(struct page) > PAGE_SIZE)
>> return 0;
>>
>> - return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
>> + return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
>> }
>> EXPORT_SYMBOL_GPL(nfs_generic_pg_test);
>>
>> @@ -651,10 +659,18 @@ EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
>> static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
>> struct nfs_pgio_header *hdr)
>> {
>> + struct nfs_pgio_mirror *mirror;
>> + u32 midx;
>> +
>> set_bit(NFS_IOHDR_REDO, &hdr->flags);
>> nfs_pgio_data_destroy(hdr);
>> hdr->completion_ops->completion(hdr);
>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>> + /* TODO: Make sure it's right to clean up all mirrors here
>> + * and not just hdr->pgio_mirror_idx */
>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>> + mirror = &desc->pg_mirrors[midx];
>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>> + }
>> return -ENOMEM;
>> }
>>
>> @@ -671,6 +687,17 @@ static void nfs_pgio_release(void *calldata)
>> hdr->completion_ops->completion(hdr);
>> }
>>
>> +static void nfs_pageio_mirror_init(struct nfs_pgio_mirror *mirror,
>> + unsigned int bsize)
>> +{
>> + INIT_LIST_HEAD(&mirror->pg_list);
>> + mirror->pg_bytes_written = 0;
>> + mirror->pg_count = 0;
>> + mirror->pg_bsize = bsize;
>> + mirror->pg_base = 0;
>> + mirror->pg_recoalesce = 0;
>> +}
>> +
>> /**
>> * nfs_pageio_init - initialise a page io descriptor
>> * @desc: pointer to descriptor
>> @@ -687,13 +714,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>> size_t bsize,
>> int io_flags)
>> {
>> - INIT_LIST_HEAD(&desc->pg_list);
>> - desc->pg_bytes_written = 0;
>> - desc->pg_count = 0;
>> - desc->pg_bsize = bsize;
>> - desc->pg_base = 0;
>> + struct nfs_pgio_mirror *new;
>> + int i;
>> +
>> desc->pg_moreio = 0;
>> - desc->pg_recoalesce = 0;
>> desc->pg_inode = inode;
>> desc->pg_ops = pg_ops;
>> desc->pg_completion_ops = compl_ops;
>> @@ -703,6 +727,26 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>> desc->pg_lseg = NULL;
>> desc->pg_dreq = NULL;
>> desc->pg_layout_private = NULL;
>> + desc->pg_bsize = bsize;
>> +
>> + desc->pg_mirror_count = 1;
>> + desc->pg_mirror_idx = 0;
>> +
>> + if (pg_ops->pg_get_mirror_count) {
>> + /* until we have a request, we don't have an lseg and no
>> + * idea how many mirrors there will be */
>> + new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
>> + sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
>> + desc->pg_mirrors_dynamic = new;
>> + desc->pg_mirrors = new;
>> +
>> + for (i = 0; i < NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX; i++)
>> + nfs_pageio_mirror_init(&desc->pg_mirrors[i], bsize);
>> + } else {
>> + desc->pg_mirrors_dynamic = NULL;
>> + desc->pg_mirrors = desc->pg_mirrors_static;
>> + nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
>> + }
>> }
>> EXPORT_SYMBOL_GPL(nfs_pageio_init);
>>
>> @@ -738,14 +782,16 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
>> int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>> struct nfs_pgio_header *hdr)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> struct nfs_page *req;
>> struct page **pages,
>> *last_page;
>> - struct list_head *head = &desc->pg_list;
>> + struct list_head *head = &mirror->pg_list;
>> struct nfs_commit_info cinfo;
>> unsigned int pagecount, pageused;
>>
>> - pagecount = nfs_page_array_len(desc->pg_base, desc->pg_count);
>> + pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
>> if (!nfs_pgarray_set(&hdr->page_array, pagecount))
>> return nfs_pgio_error(desc, hdr);
>>
>> @@ -773,7 +819,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>> desc->pg_ioflags &= ~FLUSH_COND_STABLE;
>>
>> /* Set up the argument struct */
>> - nfs_pgio_rpcsetup(hdr, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
>> + nfs_pgio_rpcsetup(hdr, mirror->pg_count, 0, desc->pg_ioflags, &cinfo);
>> desc->pg_rpc_callops = &nfs_pgio_common_ops;
>> return 0;
>> }
>> @@ -781,12 +827,17 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
>>
>> static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>> {
>> + struct nfs_pgio_mirror *mirror;
>> struct nfs_pgio_header *hdr;
>> int ret;
>>
>> + mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>> if (!hdr) {
>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>> + /* TODO: make sure this is right with mirroring - or
>> + * should it back out all mirrors? */
>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>> return -ENOMEM;
>> }
>> nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
>> @@ -801,6 +852,49 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>> return ret;
>> }
>>
>> +/*
>> + * nfs_pageio_setup_mirroring - determine if mirroring is to be used
>> + * by calling the pg_get_mirror_count op
>> + */
>> +static int nfs_pageio_setup_mirroring(struct nfs_pageio_descriptor *pgio,
>> + struct nfs_page *req)
>> +{
>> + int mirror_count = 1;
>> +
>> + if (!pgio->pg_ops->pg_get_mirror_count)
>> + return 0;
>> +
>> + mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
>> +
>> + if (!mirror_count || mirror_count > NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX)
>> + return -EINVAL;
>> +
>> + if (WARN_ON_ONCE(!pgio->pg_mirrors_dynamic))
>> + return -EINVAL;
>> +
>> + pgio->pg_mirror_count = mirror_count;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * nfs_pageio_stop_mirroring - stop using mirroring (set mirror count to 1)
>> + */
>> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio)
>> +{
>> + pgio->pg_mirror_count = 1;
>> + pgio->pg_mirror_idx = 0;
>> +}
>> +
>> +static void nfs_pageio_cleanup_mirroring(struct nfs_pageio_descriptor *pgio)
>> +{
>> + pgio->pg_mirror_count = 1;
>> + pgio->pg_mirror_idx = 0;
>> + pgio->pg_mirrors = pgio->pg_mirrors_static;
>> + kfree(pgio->pg_mirrors_dynamic);
>> + pgio->pg_mirrors_dynamic = NULL;
>> +}
>> +
>> static bool nfs_match_open_context(const struct nfs_open_context *ctx1,
>> const struct nfs_open_context *ctx2)
>> {
>> @@ -867,19 +961,22 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
>> static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>> struct nfs_page *req)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> struct nfs_page *prev = NULL;
>> - if (desc->pg_count != 0) {
>> - prev = nfs_list_entry(desc->pg_list.prev);
>> +
>> + if (mirror->pg_count != 0) {
>> + prev = nfs_list_entry(mirror->pg_list.prev);
>> } else {
>> if (desc->pg_ops->pg_init)
>> desc->pg_ops->pg_init(desc, req);
>> - desc->pg_base = req->wb_pgbase;
>> + mirror->pg_base = req->wb_pgbase;
>> }
>> if (!nfs_can_coalesce_requests(prev, req, desc))
>> return 0;
>> nfs_list_remove_request(req);
>> - nfs_list_add_request(req, &desc->pg_list);
>> - desc->pg_count += req->wb_bytes;
>> + nfs_list_add_request(req, &mirror->pg_list);
>> + mirror->pg_count += req->wb_bytes;
>> return 1;
>> }
>>
>> @@ -888,16 +985,19 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>> */
>> static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>> {
>> - if (!list_empty(&desc->pg_list)) {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> +
>> + if (!list_empty(&mirror->pg_list)) {
>> int error = desc->pg_ops->pg_doio(desc);
>> if (error < 0)
>> desc->pg_error = error;
>> else
>> - desc->pg_bytes_written += desc->pg_count;
>> + mirror->pg_bytes_written += mirror->pg_count;
>> }
>> - if (list_empty(&desc->pg_list)) {
>> - desc->pg_count = 0;
>> - desc->pg_base = 0;
>> + if (list_empty(&mirror->pg_list)) {
>> + mirror->pg_count = 0;
>> + mirror->pg_base = 0;
>> }
>> }
>>
>> @@ -915,10 +1015,14 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> struct nfs_page *req)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> struct nfs_page *subreq;
>> unsigned int bytes_left = 0;
>> unsigned int offset, pgbase;
>>
>> + WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
>> +
>> nfs_page_group_lock(req, false);
>>
>> subreq = req;
>> @@ -938,7 +1042,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> nfs_pageio_doio(desc);
>> if (desc->pg_error < 0)
>> return 0;
>> - if (desc->pg_recoalesce)
>> + if (mirror->pg_recoalesce)
>> return 0;
>> /* retry add_request for this subreq */
>> nfs_page_group_lock(req, false);
>> @@ -976,14 +1080,16 @@ err_ptr:
>>
>> static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> LIST_HEAD(head);
>>
>> do {
>> - list_splice_init(&desc->pg_list, &head);
>> - desc->pg_bytes_written -= desc->pg_count;
>> - desc->pg_count = 0;
>> - desc->pg_base = 0;
>> - desc->pg_recoalesce = 0;
>> + list_splice_init(&mirror->pg_list, &head);
>> + mirror->pg_bytes_written -= mirror->pg_count;
>> + mirror->pg_count = 0;
>> + mirror->pg_base = 0;
>> + mirror->pg_recoalesce = 0;
>> +
>> desc->pg_moreio = 0;
>>
>> while (!list_empty(&head)) {
>> @@ -997,11 +1103,11 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
>> return 0;
>> break;
>> }
>> - } while (desc->pg_recoalesce);
>> + } while (mirror->pg_recoalesce);
>> return 1;
>> }
>>
>> -int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> +static int nfs_pageio_add_request_mirror(struct nfs_pageio_descriptor *desc,
>> struct nfs_page *req)
>> {
>> int ret;
>> @@ -1014,9 +1120,78 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> break;
>> ret = nfs_do_recoalesce(desc);
>> } while (ret);
>> +
>> return ret;
>> }
>>
>> +int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> + struct nfs_page *req)
>> +{
>> + u32 midx;
>> + unsigned int pgbase, offset, bytes;
>> + struct nfs_page *dupreq, *lastreq;
>> +
>> + pgbase = req->wb_pgbase;
>> + offset = req->wb_offset;
>> + bytes = req->wb_bytes;
>> +
>> + nfs_pageio_setup_mirroring(desc, req);
>> +
>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>> + if (midx) {
>> + nfs_page_group_lock(req, false);
>> +
>> + /* find the last request */
>> + for (lastreq = req->wb_head;
>> + lastreq->wb_this_page != req->wb_head;
>> + lastreq = lastreq->wb_this_page)
>> + ;
>> +
>> + dupreq = nfs_create_request(req->wb_context,
>> + req->wb_page, lastreq, pgbase, bytes);
>> +
>> + if (IS_ERR(dupreq)) {
>> + nfs_page_group_unlock(req);
>> + return 0;
>> + }
>> +
>> + nfs_lock_request(dupreq);
>> + nfs_page_group_unlock(req);
>> + dupreq->wb_offset = offset;
>> + dupreq->wb_index = req->wb_index;
>> + } else
>> + dupreq = req;
>> +
>> + desc->pg_mirror_idx = midx;
>> + if (!nfs_pageio_add_request_mirror(desc, dupreq))
>> + return 0;
>> + }
>> +
>> + return 1;
>> +}
>> +
>> +/*
>> + * nfs_pageio_complete_mirror - Complete I/O on the current mirror of an
>> + * nfs_pageio_descriptor
>> + * @desc: pointer to io descriptor
>> + */
>> +static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
>> + u32 mirror_idx)
>> +{
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
>> + u32 restore_idx = desc->pg_mirror_idx;
>> +
>> + desc->pg_mirror_idx = mirror_idx;
>> + for (;;) {
>> + nfs_pageio_doio(desc);
>> + if (!mirror->pg_recoalesce)
>> + break;
>> + if (!nfs_do_recoalesce(desc))
>> + break;
>> + }
>> + desc->pg_mirror_idx = restore_idx;
>> +}
>> +
>> /*
>> * nfs_pageio_resend - Transfer requests to new descriptor and resend
>> * @hdr - the pgio header to move request from
>> @@ -1055,16 +1230,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_resend);
>> */
>> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>> {
>> - for (;;) {
>> - nfs_pageio_doio(desc);
>> - if (!desc->pg_recoalesce)
>> - break;
>> - if (!nfs_do_recoalesce(desc))
>> - break;
>> - }
>> + u32 midx;
>> +
>> + for (midx = 0; midx < desc->pg_mirror_count; midx++)
>> + nfs_pageio_complete_mirror(desc, midx);
>>
>> if (desc->pg_ops->pg_cleanup)
>> desc->pg_ops->pg_cleanup(desc);
>> + nfs_pageio_cleanup_mirroring(desc);
>> }
>>
>> /**
>> @@ -1080,10 +1253,17 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>> */
>> void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>> {
>> - if (!list_empty(&desc->pg_list)) {
>> - struct nfs_page *prev = nfs_list_entry(desc->pg_list.prev);
>> - if (index != prev->wb_index + 1)
>> - nfs_pageio_complete(desc);
>> + struct nfs_pgio_mirror *mirror;
>> + struct nfs_page *prev;
>> + u32 midx;
>> +
>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>> + mirror = &desc->pg_mirrors[midx];
>> + if (!list_empty(&mirror->pg_list)) {
>> + prev = nfs_list_entry(mirror->pg_list.prev);
>> + if (index != prev->wb_index + 1)
>> + nfs_pageio_complete_mirror(desc, midx);
>> + }
>> }
>> }
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 2da2e77..5f7c422 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1646,8 +1646,8 @@ EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
>> * of bytes (maximum @req->wb_bytes) that can be coalesced.
>> */
>> size_t
>> -pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>> - struct nfs_page *req)
>> +pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
>> + struct nfs_page *prev, struct nfs_page *req)
>> {
>> unsigned int size;
>> u64 seg_end, req_start, seg_left;
>> @@ -1729,10 +1729,12 @@ static void
>> pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
>> struct nfs_pgio_header *hdr)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
>> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
>> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
>> nfs_pageio_reset_write_mds(desc);
>> - desc->pg_recoalesce = 1;
>> + mirror->pg_recoalesce = 1;
>> }
>> nfs_pgio_data_destroy(hdr);
>> }
>> @@ -1781,12 +1783,14 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
>> int
>> pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> struct nfs_pgio_header *hdr;
>> int ret;
>>
>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>> if (!hdr) {
>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>> return -ENOMEM;
>> }
>> nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
>> @@ -1795,6 +1799,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>> ret = nfs_generic_pgio(desc, hdr);
>> if (!ret)
>> pnfs_do_write(desc, hdr, desc->pg_ioflags);
>> +
>> return ret;
>> }
>> EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
>> @@ -1839,10 +1844,13 @@ static void
>> pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
>> struct nfs_pgio_header *hdr)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> +
>> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
>> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
>> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
>> nfs_pageio_reset_read_mds(desc);
>> - desc->pg_recoalesce = 1;
>> + mirror->pg_recoalesce = 1;
>> }
>> nfs_pgio_data_destroy(hdr);
>> }
>> @@ -1893,12 +1901,14 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
>> int
>> pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
>> {
>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>> +
>> struct nfs_pgio_header *hdr;
>> int ret;
>>
>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>> if (!hdr) {
>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>> return -ENOMEM;
>> }
>> nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> index 092ab49..568ecf0 100644
>> --- a/fs/nfs/read.c
>> +++ b/fs/nfs/read.c
>> @@ -70,8 +70,15 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
>>
>> void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
>> {
>> + struct nfs_pgio_mirror *mirror;
>> +
>> pgio->pg_ops = &nfs_pgio_rw_ops;
>> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
>> +
>> + /* read path should never have more than one mirror */
>> + WARN_ON_ONCE(pgio->pg_mirror_count != 1);
>> +
>> + mirror = &pgio->pg_mirrors[0];
>> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
>> }
>> EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
>>
>> @@ -81,6 +88,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>> struct nfs_page *new;
>> unsigned int len;
>> struct nfs_pageio_descriptor pgio;
>> + struct nfs_pgio_mirror *pgm;
>>
>> len = nfs_page_length(page);
>> if (len == 0)
>> @@ -97,7 +105,13 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>> &nfs_async_read_completion_ops);
>> nfs_pageio_add_request(&pgio, new);
>> nfs_pageio_complete(&pgio);
>> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
>> +
>> + /* It doesn't make sense to do mirrored reads! */
>> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
>> +
>> + pgm = &pgio.pg_mirrors[0];
>> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
>> +
>> return 0;
>> }
>>
>> @@ -352,6 +366,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>> struct list_head *pages, unsigned nr_pages)
>> {
>> struct nfs_pageio_descriptor pgio;
>> + struct nfs_pgio_mirror *pgm;
>> struct nfs_readdesc desc = {
>> .pgio = &pgio,
>> };
>> @@ -387,10 +402,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>> &nfs_async_read_completion_ops);
>>
>> ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
>> -
>> nfs_pageio_complete(&pgio);
>> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
>> - npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
>> +
>> + /* It doesn't make sense to do mirrored reads! */
>> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
>> +
>> + pgm = &pgio.pg_mirrors[0];
>> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
>> + npages = (pgm->pg_bytes_written + PAGE_CACHE_SIZE - 1) >>
>> + PAGE_CACHE_SHIFT;
>> nfs_add_stats(inode, NFSIOS_READPAGES, npages);
>> read_complete:
>> put_nfs_open_context(desc.ctx);
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index db802d9..2f6ee8e 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -906,7 +906,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>> if (nfs_write_need_commit(hdr)) {
>> memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
>> nfs_mark_request_commit(req, hdr->lseg, &cinfo,
>> - 0);
>> + hdr->pgio_mirror_idx);
>> goto next;
>> }
>> remove_req:
>> @@ -1305,8 +1305,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_write);
>>
>> void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
>> {
>> + struct nfs_pgio_mirror *mirror;
>> +
>> pgio->pg_ops = &nfs_pgio_rw_ops;
>> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
>> +
>> + nfs_pageio_stop_mirroring(pgio);
>> +
>> + mirror = &pgio->pg_mirrors[0];
>> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
>> }
>> EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
>>
>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>> index 479c566..3eb072d 100644
>> --- a/include/linux/nfs_page.h
>> +++ b/include/linux/nfs_page.h
>> @@ -58,6 +58,8 @@ struct nfs_pageio_ops {
>> size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
>> struct nfs_page *);
>> int (*pg_doio)(struct nfs_pageio_descriptor *);
>> + unsigned int (*pg_get_mirror_count)(struct nfs_pageio_descriptor *,
>> + struct nfs_page *);
>> void (*pg_cleanup)(struct nfs_pageio_descriptor *);
>> };
>>
>> @@ -74,15 +76,17 @@ struct nfs_rw_ops {
>> struct rpc_task_setup *, int);
>> };
>>
>> -struct nfs_pageio_descriptor {
>> +struct nfs_pgio_mirror {
>> struct list_head pg_list;
>> unsigned long pg_bytes_written;
>> size_t pg_count;
>> size_t pg_bsize;
>> unsigned int pg_base;
>> - unsigned char pg_moreio : 1,
>> - pg_recoalesce : 1;
>> + unsigned char pg_recoalesce : 1;
>> +};
>>
>> +struct nfs_pageio_descriptor {
>> + unsigned char pg_moreio : 1;
>> struct inode *pg_inode;
>> const struct nfs_pageio_ops *pg_ops;
>> const struct nfs_rw_ops *pg_rw_ops;
>> @@ -93,8 +97,18 @@ struct nfs_pageio_descriptor {
>> struct pnfs_layout_segment *pg_lseg;
>> struct nfs_direct_req *pg_dreq;
>> void *pg_layout_private;
>> + unsigned int pg_bsize; /* default bsize for mirrors */
>> +
>> + u32 pg_mirror_count;
>> + struct nfs_pgio_mirror *pg_mirrors;
>> + struct nfs_pgio_mirror pg_mirrors_static[1];
>> + struct nfs_pgio_mirror *pg_mirrors_dynamic;
>> + u32 pg_mirror_idx; /* current mirror */
>> };
>>
>> +/* arbitrarily selected limit to number of mirrors */
>> +#define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
>> +
>> #define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
>>
>> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 5bc99f0..6400a1e 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1329,6 +1329,7 @@ struct nfs_pgio_header {
>> struct nfs_page_array page_array;
>> struct nfs_client *ds_clp; /* pNFS data server */
>> int ds_commit_idx; /* ds index if ds_clp is set */
>> + int pgio_mirror_idx;/* mirror index in pgio layer */
>> };
>>
>> struct nfs_mds_commit_info {
>>
>
On 01/06/2015 01:27 PM, Weston Andros Adamson wrote:
> These issues are addressed and the comments are removed in subsequent patches
> from the same series.
>
> Instead of having one huge patch that implements all of mirroring, I chose to split
> it out into smaller patches. These notes were useful in making sure that the issues
> were addressed and should be useful as a guide to someone bisecting, etc.
Got it. I'm still working my way through these patches, so I haven't seen the ones that remove the comments yet.
Thanks!
Anna
>
> -dros
>
>
>> On Jan 6, 2015, at 1:11 PM, Anna Schumaker <[email protected]> wrote:
>>
>> Hey Dros and Tom,
>>
>> I see you're adding some new FIXMEs and TODOs in the comments. Is there a plan for addressing these eventually?
>>
>> Thanks,
>> Anna
>>
>> On 12/24/2014 02:13 AM, Tom Haynes wrote:
>>> From: Weston Andros Adamson <[email protected]>
>>>
>>> This patch adds mirrored write support to the pgio layer. The default
>>> is to use one mirror, but pgio callers may define callbacks to change
>>> this to any value up to the (arbitrarily selected) limit of 16.
>>>
>>> The basic idea is to break out members of nfs_pageio_descriptor that cannot
>>> be shared between mirrored DSes and put them in a new structure.
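
The split described above can be illustrated with a small user-space sketch. This is not the kernel code: the names pgio_desc, pgio_mirror, desc_init, desc_set_mirror_count, desc_cur, desc_add and desc_complete are hypothetical stand-ins, and the model keeps only a byte counter per mirror instead of a request list. It assumes, as the patch does, a fixed upper bound of 16 mirrors and a "current mirror" index stored in the descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limit, mirroring NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX in the patch. */
#define MIRROR_MAX 16

/* Per-mirror state: the fields that cannot be shared between mirrors. */
struct pgio_mirror {
	size_t count;                /* bytes currently queued on this mirror */
	size_t bsize;                /* maximum I/O size for this mirror */
	unsigned long bytes_written; /* bytes already handed off for I/O */
};

/* Shared state, the mirror array, and a cursor naming the current mirror. */
struct pgio_desc {
	unsigned int mirror_count;
	unsigned int mirror_idx;
	struct pgio_mirror mirrors[MIRROR_MAX];
};

/* Start with a single mirror, which is the non-mirrored default. */
static void desc_init(struct pgio_desc *d, size_t bsize)
{
	unsigned int i;

	d->mirror_count = 1;
	d->mirror_idx = 0;
	for (i = 0; i < MIRROR_MAX; i++) {
		d->mirrors[i].count = 0;
		d->mirrors[i].bsize = bsize;
		d->mirrors[i].bytes_written = 0;
	}
}

/* Reject a count of zero or one above the limit; return 0 on success. */
static int desc_set_mirror_count(struct pgio_desc *d, unsigned int n)
{
	if (n == 0 || n > MIRROR_MAX)
		return -1;
	d->mirror_count = n;
	return 0;
}

/* Every helper reaches per-mirror state through the current index. */
static struct pgio_mirror *desc_cur(struct pgio_desc *d)
{
	assert(d->mirror_idx < d->mirror_count);
	return &d->mirrors[d->mirror_idx];
}

/* Queue the same number of bytes on each active mirror in turn. */
static void desc_add(struct pgio_desc *d, size_t bytes)
{
	unsigned int midx;

	for (midx = 0; midx < d->mirror_count; midx++) {
		d->mirror_idx = midx;
		desc_cur(d)->count += bytes;
	}
	d->mirror_idx = 0;
}

/* Flush each active mirror independently and reset its queue. */
static void desc_complete(struct pgio_desc *d)
{
	unsigned int midx;

	for (midx = 0; midx < d->mirror_count; midx++) {
		struct pgio_mirror *m = &d->mirrors[midx];

		m->bytes_written += m->count;
		m->count = 0;
	}
}
```

In this model a caller would initialise the descriptor, set the mirror count once it is known, add requests, and then complete; each mirror ends up with its own bytes_written total, which is why the counters move out of the shared descriptor.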
>>>
>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>> ---
>>> fs/nfs/direct.c | 17 ++-
>>> fs/nfs/internal.h | 1 +
>>> fs/nfs/objlayout/objio_osd.c | 3 +-
>>> fs/nfs/pagelist.c | 270 +++++++++++++++++++++++++++++++++++--------
>>> fs/nfs/pnfs.c | 26 +++--
>>> fs/nfs/read.c | 30 ++++-
>>> fs/nfs/write.c | 10 +-
>>> include/linux/nfs_page.h | 20 +++-
>>> include/linux/nfs_xdr.h | 1 +
>>> 9 files changed, 311 insertions(+), 67 deletions(-)
>>>
>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>> index 1ee41d7..0178d4f 100644
>>> --- a/fs/nfs/direct.c
>>> +++ b/fs/nfs/direct.c
>>> @@ -360,8 +360,14 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
>>> spin_lock(&dreq->lock);
>>> if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
>>> dreq->error = hdr->error;
>>> - else
>>> - dreq->count += hdr->good_bytes;
>>> + else {
>>> + /*
>>> + * FIXME: right now this only accounts for bytes written
>>> + * to the first mirror
>>> + */
>>> + if (hdr->pgio_mirror_idx == 0)
>>> + dreq->count += hdr->good_bytes;
>>> + }
>>> spin_unlock(&dreq->lock);
>>>
>>> while (!list_empty(&hdr->pages)) {
>>> @@ -724,7 +730,12 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>> dreq->error = hdr->error;
>>> }
>>> if (dreq->error == 0) {
>>> - dreq->count += hdr->good_bytes;
>>> + /*
>>> + * FIXME: right now this only accounts for bytes written
>>> + * to the first mirror
>>> + */
>>> + if (hdr->pgio_mirror_idx == 0)
>>> + dreq->count += hdr->good_bytes;
>>> if (nfs_write_need_commit(hdr)) {
>>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>>> request_commit = true;
>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>> index 05f9a87..ef1c703 100644
>>> --- a/fs/nfs/internal.h
>>> +++ b/fs/nfs/internal.h
>>> @@ -469,6 +469,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
>>> struct nfs_direct_req *dreq);
>>> int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
>>> bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
>>> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
>>>
>>> #ifdef CONFIG_MIGRATION
>>> extern int nfs_migrate_page(struct address_space *,
>>> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
>>> index d007780..9a5f2ee 100644
>>> --- a/fs/nfs/objlayout/objio_osd.c
>>> +++ b/fs/nfs/objlayout/objio_osd.c
>>> @@ -537,11 +537,12 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
>>> static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
>>> struct nfs_page *prev, struct nfs_page *req)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
>>> unsigned int size;
>>>
>>> size = pnfs_generic_pg_test(pgio, prev, req);
>>>
>>> - if (!size || pgio->pg_count + req->wb_bytes >
>>> + if (!size || mirror->pg_count + req->wb_bytes >
>>> (unsigned long)pgio->pg_layout_private)
>>> return 0;
>>>
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index 1c03187..eec12b7 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -46,17 +46,22 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
>>> struct nfs_pgio_header *hdr,
>>> void (*release)(struct nfs_pgio_header *hdr))
>>> {
>>> - hdr->req = nfs_list_entry(desc->pg_list.next);
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> +
>>> + hdr->req = nfs_list_entry(mirror->pg_list.next);
>>> hdr->inode = desc->pg_inode;
>>> hdr->cred = hdr->req->wb_context->cred;
>>> hdr->io_start = req_offset(hdr->req);
>>> - hdr->good_bytes = desc->pg_count;
>>> + hdr->good_bytes = mirror->pg_count;
>>> hdr->dreq = desc->pg_dreq;
>>> hdr->layout_private = desc->pg_layout_private;
>>> hdr->release = release;
>>> hdr->completion_ops = desc->pg_completion_ops;
>>> if (hdr->completion_ops->init_hdr)
>>> hdr->completion_ops->init_hdr(hdr);
>>> +
>>> + hdr->pgio_mirror_idx = desc->pg_mirror_idx;
>>> }
>>> EXPORT_SYMBOL_GPL(nfs_pgheader_init);
>>>
>>> @@ -480,7 +485,10 @@ nfs_wait_on_request(struct nfs_page *req)
>>> size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>>> struct nfs_page *prev, struct nfs_page *req)
>>> {
>>> - if (desc->pg_count > desc->pg_bsize) {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> +
>>> + if (mirror->pg_count > mirror->pg_bsize) {
>>> /* should never happen */
>>> WARN_ON_ONCE(1);
>>> return 0;
>>> @@ -490,11 +498,11 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>>> * Limit the request size so that we can still allocate a page array
>>> * for it without upsetting the slab allocator.
>>> */
>>> - if (((desc->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
>>> + if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
>>> sizeof(struct page) > PAGE_SIZE)
>>> return 0;
>>>
>>> - return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
>>> + return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
>>> }
>>> EXPORT_SYMBOL_GPL(nfs_generic_pg_test);
>>>
>>> @@ -651,10 +659,18 @@ EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
>>> static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
>>> struct nfs_pgio_header *hdr)
>>> {
>>> + struct nfs_pgio_mirror *mirror;
>>> + u32 midx;
>>> +
>>> set_bit(NFS_IOHDR_REDO, &hdr->flags);
>>> nfs_pgio_data_destroy(hdr);
>>> hdr->completion_ops->completion(hdr);
>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>> + /* TODO: Make sure it's right to clean up all mirrors here
>>> + * and not just hdr->pgio_mirror_idx */
>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>>> + mirror = &desc->pg_mirrors[midx];
>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>> + }
>>> return -ENOMEM;
>>> }
>>>
>>> @@ -671,6 +687,17 @@ static void nfs_pgio_release(void *calldata)
>>> hdr->completion_ops->completion(hdr);
>>> }
>>>
>>> +static void nfs_pageio_mirror_init(struct nfs_pgio_mirror *mirror,
>>> + unsigned int bsize)
>>> +{
>>> + INIT_LIST_HEAD(&mirror->pg_list);
>>> + mirror->pg_bytes_written = 0;
>>> + mirror->pg_count = 0;
>>> + mirror->pg_bsize = bsize;
>>> + mirror->pg_base = 0;
>>> + mirror->pg_recoalesce = 0;
>>> +}
>>> +
>>> /**
>>> * nfs_pageio_init - initialise a page io descriptor
>>> * @desc: pointer to descriptor
>>> @@ -687,13 +714,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>> size_t bsize,
>>> int io_flags)
>>> {
>>> - INIT_LIST_HEAD(&desc->pg_list);
>>> - desc->pg_bytes_written = 0;
>>> - desc->pg_count = 0;
>>> - desc->pg_bsize = bsize;
>>> - desc->pg_base = 0;
>>> + struct nfs_pgio_mirror *new;
>>> + int i;
>>> +
>>> desc->pg_moreio = 0;
>>> - desc->pg_recoalesce = 0;
>>> desc->pg_inode = inode;
>>> desc->pg_ops = pg_ops;
>>> desc->pg_completion_ops = compl_ops;
>>> @@ -703,6 +727,26 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>> desc->pg_lseg = NULL;
>>> desc->pg_dreq = NULL;
>>> desc->pg_layout_private = NULL;
>>> + desc->pg_bsize = bsize;
>>> +
>>> + desc->pg_mirror_count = 1;
>>> + desc->pg_mirror_idx = 0;
>>> +
>>> + if (pg_ops->pg_get_mirror_count) {
>>> + /* until we have a request, we don't have an lseg and no
>>> + * idea how many mirrors there will be */
>>> + new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
>>> + sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
>>> + desc->pg_mirrors_dynamic = new;
>>> + desc->pg_mirrors = new;
>>> +
>>> + for (i = 0; i < NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX; i++)
>>> + nfs_pageio_mirror_init(&desc->pg_mirrors[i], bsize);
>>> + } else {
>>> + desc->pg_mirrors_dynamic = NULL;
>>> + desc->pg_mirrors = desc->pg_mirrors_static;
>>> + nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
>>> + }
>>> }
>>> EXPORT_SYMBOL_GPL(nfs_pageio_init);
>>>
>>> @@ -738,14 +782,16 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
>>> int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>>> struct nfs_pgio_header *hdr)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> struct nfs_page *req;
>>> struct page **pages,
>>> *last_page;
>>> - struct list_head *head = &desc->pg_list;
>>> + struct list_head *head = &mirror->pg_list;
>>> struct nfs_commit_info cinfo;
>>> unsigned int pagecount, pageused;
>>>
>>> - pagecount = nfs_page_array_len(desc->pg_base, desc->pg_count);
>>> + pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
>>> if (!nfs_pgarray_set(&hdr->page_array, pagecount))
>>> return nfs_pgio_error(desc, hdr);
>>>
>>> @@ -773,7 +819,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>>> desc->pg_ioflags &= ~FLUSH_COND_STABLE;
>>>
>>> /* Set up the argument struct */
>>> - nfs_pgio_rpcsetup(hdr, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
>>> + nfs_pgio_rpcsetup(hdr, mirror->pg_count, 0, desc->pg_ioflags, &cinfo);
>>> desc->pg_rpc_callops = &nfs_pgio_common_ops;
>>> return 0;
>>> }
>>> @@ -781,12 +827,17 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
>>>
>>> static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>>> {
>>> + struct nfs_pgio_mirror *mirror;
>>> struct nfs_pgio_header *hdr;
>>> int ret;
>>>
>>> + mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>>> if (!hdr) {
>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>> + /* TODO: make sure this is right with mirroring - or
>>> + * should it back out all mirrors? */
>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>> return -ENOMEM;
>>> }
>>> nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
>>> @@ -801,6 +852,49 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>>> return ret;
>>> }
>>>
>>> +/*
>>> + * nfs_pageio_setup_mirroring - determine if mirroring is to be used
>>> + * by calling the pg_get_mirror_count op
>>> + */
>>> +static int nfs_pageio_setup_mirroring(struct nfs_pageio_descriptor *pgio,
>>> + struct nfs_page *req)
>>> +{
>>> + int mirror_count = 1;
>>> +
>>> + if (!pgio->pg_ops->pg_get_mirror_count)
>>> + return 0;
>>> +
>>> + mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
>>> +
>>> + if (!mirror_count || mirror_count > NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX)
>>> + return -EINVAL;
>>> +
>>> + if (WARN_ON_ONCE(!pgio->pg_mirrors_dynamic))
>>> + return -EINVAL;
>>> +
>>> + pgio->pg_mirror_count = mirror_count;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/*
>>> + * nfs_pageio_stop_mirroring - stop using mirroring (set mirror count to 1)
>>> + */
>>> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio)
>>> +{
>>> + pgio->pg_mirror_count = 1;
>>> + pgio->pg_mirror_idx = 0;
>>> +}
>>> +
>>> +static void nfs_pageio_cleanup_mirroring(struct nfs_pageio_descriptor *pgio)
>>> +{
>>> + pgio->pg_mirror_count = 1;
>>> + pgio->pg_mirror_idx = 0;
>>> + pgio->pg_mirrors = pgio->pg_mirrors_static;
>>> + kfree(pgio->pg_mirrors_dynamic);
>>> + pgio->pg_mirrors_dynamic = NULL;
>>> +}
>>> +
>>> static bool nfs_match_open_context(const struct nfs_open_context *ctx1,
>>> const struct nfs_open_context *ctx2)
>>> {
>>> @@ -867,19 +961,22 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
>>> static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>>> struct nfs_page *req)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> struct nfs_page *prev = NULL;
>>> - if (desc->pg_count != 0) {
>>> - prev = nfs_list_entry(desc->pg_list.prev);
>>> +
>>> + if (mirror->pg_count != 0) {
>>> + prev = nfs_list_entry(mirror->pg_list.prev);
>>> } else {
>>> if (desc->pg_ops->pg_init)
>>> desc->pg_ops->pg_init(desc, req);
>>> - desc->pg_base = req->wb_pgbase;
>>> + mirror->pg_base = req->wb_pgbase;
>>> }
>>> if (!nfs_can_coalesce_requests(prev, req, desc))
>>> return 0;
>>> nfs_list_remove_request(req);
>>> - nfs_list_add_request(req, &desc->pg_list);
>>> - desc->pg_count += req->wb_bytes;
>>> + nfs_list_add_request(req, &mirror->pg_list);
>>> + mirror->pg_count += req->wb_bytes;
>>> return 1;
>>> }
>>>
>>> @@ -888,16 +985,19 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>>> */
>>> static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>> {
>>> - if (!list_empty(&desc->pg_list)) {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> +
>>> + if (!list_empty(&mirror->pg_list)) {
>>> int error = desc->pg_ops->pg_doio(desc);
>>> if (error < 0)
>>> desc->pg_error = error;
>>> else
>>> - desc->pg_bytes_written += desc->pg_count;
>>> + mirror->pg_bytes_written += mirror->pg_count;
>>> }
>>> - if (list_empty(&desc->pg_list)) {
>>> - desc->pg_count = 0;
>>> - desc->pg_base = 0;
>>> + if (list_empty(&mirror->pg_list)) {
>>> + mirror->pg_count = 0;
>>> + mirror->pg_base = 0;
>>> }
>>> }
>>>
>>> @@ -915,10 +1015,14 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> struct nfs_page *req)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> struct nfs_page *subreq;
>>> unsigned int bytes_left = 0;
>>> unsigned int offset, pgbase;
>>>
>>> + WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
>>> +
>>> nfs_page_group_lock(req, false);
>>>
>>> subreq = req;
>>> @@ -938,7 +1042,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> nfs_pageio_doio(desc);
>>> if (desc->pg_error < 0)
>>> return 0;
>>> - if (desc->pg_recoalesce)
>>> + if (mirror->pg_recoalesce)
>>> return 0;
>>> /* retry add_request for this subreq */
>>> nfs_page_group_lock(req, false);
>>> @@ -976,14 +1080,16 @@ err_ptr:
>>>
>>> static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> LIST_HEAD(head);
>>>
>>> do {
>>> - list_splice_init(&desc->pg_list, &head);
>>> - desc->pg_bytes_written -= desc->pg_count;
>>> - desc->pg_count = 0;
>>> - desc->pg_base = 0;
>>> - desc->pg_recoalesce = 0;
>>> + list_splice_init(&mirror->pg_list, &head);
>>> + mirror->pg_bytes_written -= mirror->pg_count;
>>> + mirror->pg_count = 0;
>>> + mirror->pg_base = 0;
>>> + mirror->pg_recoalesce = 0;
>>> +
>>> desc->pg_moreio = 0;
>>>
>>> while (!list_empty(&head)) {
>>> @@ -997,11 +1103,11 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
>>> return 0;
>>> break;
>>> }
>>> - } while (desc->pg_recoalesce);
>>> + } while (mirror->pg_recoalesce);
>>> return 1;
>>> }
>>>
>>> -int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> +static int nfs_pageio_add_request_mirror(struct nfs_pageio_descriptor *desc,
>>> struct nfs_page *req)
>>> {
>>> int ret;
>>> @@ -1014,9 +1120,78 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> break;
>>> ret = nfs_do_recoalesce(desc);
>>> } while (ret);
>>> +
>>> return ret;
>>> }
>>>
>>> +int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> + struct nfs_page *req)
>>> +{
>>> + u32 midx;
>>> + unsigned int pgbase, offset, bytes;
>>> + struct nfs_page *dupreq, *lastreq;
>>> +
>>> + pgbase = req->wb_pgbase;
>>> + offset = req->wb_offset;
>>> + bytes = req->wb_bytes;
>>> +
>>> + nfs_pageio_setup_mirroring(desc, req);
>>> +
>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>>> + if (midx) {
>>> + nfs_page_group_lock(req, false);
>>> +
>>> + /* find the last request */
>>> + for (lastreq = req->wb_head;
>>> + lastreq->wb_this_page != req->wb_head;
>>> + lastreq = lastreq->wb_this_page)
>>> + ;
>>> +
>>> + dupreq = nfs_create_request(req->wb_context,
>>> + req->wb_page, lastreq, pgbase, bytes);
>>> +
>>> + if (IS_ERR(dupreq)) {
>>> + nfs_page_group_unlock(req);
>>> + return 0;
>>> + }
>>> +
>>> + nfs_lock_request(dupreq);
>>> + nfs_page_group_unlock(req);
>>> + dupreq->wb_offset = offset;
>>> + dupreq->wb_index = req->wb_index;
>>> + } else
>>> + dupreq = req;
>>> +
>>> + desc->pg_mirror_idx = midx;
>>> + if (!nfs_pageio_add_request_mirror(desc, dupreq))
>>> + return 0;
>>> + }
>>> +
>>> + return 1;
>>> +}
>>> +
>>> +/*
>>> + * nfs_pageio_complete_mirror - Complete I/O on the current mirror of an
>>> + * nfs_pageio_descriptor
>>> + * @desc: pointer to io descriptor
>>> + */
>>> +static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
>>> + u32 mirror_idx)
>>> +{
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
>>> + u32 restore_idx = desc->pg_mirror_idx;
>>> +
>>> + desc->pg_mirror_idx = mirror_idx;
>>> + for (;;) {
>>> + nfs_pageio_doio(desc);
>>> + if (!mirror->pg_recoalesce)
>>> + break;
>>> + if (!nfs_do_recoalesce(desc))
>>> + break;
>>> + }
>>> + desc->pg_mirror_idx = restore_idx;
>>> +}
>>> +
>>> /*
>>> * nfs_pageio_resend - Transfer requests to new descriptor and resend
>>> * @hdr - the pgio header to move request from
>>> @@ -1055,16 +1230,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_resend);
>>> */
>>> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>>> {
>>> - for (;;) {
>>> - nfs_pageio_doio(desc);
>>> - if (!desc->pg_recoalesce)
>>> - break;
>>> - if (!nfs_do_recoalesce(desc))
>>> - break;
>>> - }
>>> + u32 midx;
>>> +
>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++)
>>> + nfs_pageio_complete_mirror(desc, midx);
>>>
>>> if (desc->pg_ops->pg_cleanup)
>>> desc->pg_ops->pg_cleanup(desc);
>>> + nfs_pageio_cleanup_mirroring(desc);
>>> }
>>>
>>> /**
>>> @@ -1080,10 +1253,17 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>>> */
>>> void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>>> {
>>> - if (!list_empty(&desc->pg_list)) {
>>> - struct nfs_page *prev = nfs_list_entry(desc->pg_list.prev);
>>> - if (index != prev->wb_index + 1)
>>> - nfs_pageio_complete(desc);
>>> + struct nfs_pgio_mirror *mirror;
>>> + struct nfs_page *prev;
>>> + u32 midx;
>>> +
>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>>> + mirror = &desc->pg_mirrors[midx];
>>> + if (!list_empty(&mirror->pg_list)) {
>>> + prev = nfs_list_entry(mirror->pg_list.prev);
>>> + if (index != prev->wb_index + 1)
>>> + nfs_pageio_complete_mirror(desc, midx);
>>> + }
>>> }
>>> }
>>>
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 2da2e77..5f7c422 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -1646,8 +1646,8 @@ EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
>>> * of bytes (maximum @req->wb_bytes) that can be coalesced.
>>> */
>>> size_t
>>> -pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>>> - struct nfs_page *req)
>>> +pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
>>> + struct nfs_page *prev, struct nfs_page *req)
>>> {
>>> unsigned int size;
>>> u64 seg_end, req_start, seg_left;
>>> @@ -1729,10 +1729,12 @@ static void
>>> pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
>>> struct nfs_pgio_header *hdr)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
>>> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
>>> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
>>> nfs_pageio_reset_write_mds(desc);
>>> - desc->pg_recoalesce = 1;
>>> + mirror->pg_recoalesce = 1;
>>> }
>>> nfs_pgio_data_destroy(hdr);
>>> }
>>> @@ -1781,12 +1783,14 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
>>> int
>>> pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> struct nfs_pgio_header *hdr;
>>> int ret;
>>>
>>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>>> if (!hdr) {
>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>> return -ENOMEM;
>>> }
>>> nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
>>> @@ -1795,6 +1799,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>>> ret = nfs_generic_pgio(desc, hdr);
>>> if (!ret)
>>> pnfs_do_write(desc, hdr, desc->pg_ioflags);
>>> +
>>> return ret;
>>> }
>>> EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
>>> @@ -1839,10 +1844,13 @@ static void
>>> pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
>>> struct nfs_pgio_header *hdr)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> +
>>> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
>>> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
>>> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
>>> nfs_pageio_reset_read_mds(desc);
>>> - desc->pg_recoalesce = 1;
>>> + mirror->pg_recoalesce = 1;
>>> }
>>> nfs_pgio_data_destroy(hdr);
>>> }
>>> @@ -1893,12 +1901,14 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
>>> int
>>> pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
>>> {
>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>> +
>>> struct nfs_pgio_header *hdr;
>>> int ret;
>>>
>>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>>> if (!hdr) {
>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>> return -ENOMEM;
>>> }
>>> nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
>>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>>> index 092ab49..568ecf0 100644
>>> --- a/fs/nfs/read.c
>>> +++ b/fs/nfs/read.c
>>> @@ -70,8 +70,15 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
>>>
>>> void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
>>> {
>>> + struct nfs_pgio_mirror *mirror;
>>> +
>>> pgio->pg_ops = &nfs_pgio_rw_ops;
>>> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
>>> +
>>> + /* read path should never have more than one mirror */
>>> + WARN_ON_ONCE(pgio->pg_mirror_count != 1);
>>> +
>>> + mirror = &pgio->pg_mirrors[0];
>>> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
>>> }
>>> EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
>>>
>>> @@ -81,6 +88,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>>> struct nfs_page *new;
>>> unsigned int len;
>>> struct nfs_pageio_descriptor pgio;
>>> + struct nfs_pgio_mirror *pgm;
>>>
>>> len = nfs_page_length(page);
>>> if (len == 0)
>>> @@ -97,7 +105,13 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>>> &nfs_async_read_completion_ops);
>>> nfs_pageio_add_request(&pgio, new);
>>> nfs_pageio_complete(&pgio);
>>> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
>>> +
>>> + /* It doesn't make sense to do mirrored reads! */
>>> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
>>> +
>>> + pgm = &pgio.pg_mirrors[0];
>>> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
>>> +
>>> return 0;
>>> }
>>>
>>> @@ -352,6 +366,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>>> struct list_head *pages, unsigned nr_pages)
>>> {
>>> struct nfs_pageio_descriptor pgio;
>>> + struct nfs_pgio_mirror *pgm;
>>> struct nfs_readdesc desc = {
>>> .pgio = &pgio,
>>> };
>>> @@ -387,10 +402,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>>> &nfs_async_read_completion_ops);
>>>
>>> ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
>>> -
>>> nfs_pageio_complete(&pgio);
>>> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
>>> - npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
>>> +
>>> + /* It doesn't make sense to do mirrored reads! */
>>> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
>>> +
>>> + pgm = &pgio.pg_mirrors[0];
>>> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
>>> + npages = (pgm->pg_bytes_written + PAGE_CACHE_SIZE - 1) >>
>>> + PAGE_CACHE_SHIFT;
>>> nfs_add_stats(inode, NFSIOS_READPAGES, npages);
>>> read_complete:
>>> put_nfs_open_context(desc.ctx);
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index db802d9..2f6ee8e 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -906,7 +906,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>>> if (nfs_write_need_commit(hdr)) {
>>> memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
>>> nfs_mark_request_commit(req, hdr->lseg, &cinfo,
>>> - 0);
>>> + hdr->pgio_mirror_idx);
>>> goto next;
>>> }
>>> remove_req:
>>> @@ -1305,8 +1305,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_write);
>>>
>>> void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
>>> {
>>> + struct nfs_pgio_mirror *mirror;
>>> +
>>> pgio->pg_ops = &nfs_pgio_rw_ops;
>>> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
>>> +
>>> + nfs_pageio_stop_mirroring(pgio);
>>> +
>>> + mirror = &pgio->pg_mirrors[0];
>>> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
>>> }
>>> EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
>>>
>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>> index 479c566..3eb072d 100644
>>> --- a/include/linux/nfs_page.h
>>> +++ b/include/linux/nfs_page.h
>>> @@ -58,6 +58,8 @@ struct nfs_pageio_ops {
>>> size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
>>> struct nfs_page *);
>>> int (*pg_doio)(struct nfs_pageio_descriptor *);
>>> + unsigned int (*pg_get_mirror_count)(struct nfs_pageio_descriptor *,
>>> + struct nfs_page *);
>>> void (*pg_cleanup)(struct nfs_pageio_descriptor *);
>>> };
>>>
>>> @@ -74,15 +76,17 @@ struct nfs_rw_ops {
>>> struct rpc_task_setup *, int);
>>> };
>>>
>>> -struct nfs_pageio_descriptor {
>>> +struct nfs_pgio_mirror {
>>> struct list_head pg_list;
>>> unsigned long pg_bytes_written;
>>> size_t pg_count;
>>> size_t pg_bsize;
>>> unsigned int pg_base;
>>> - unsigned char pg_moreio : 1,
>>> - pg_recoalesce : 1;
>>> + unsigned char pg_recoalesce : 1;
>>> +};
>>>
>>> +struct nfs_pageio_descriptor {
>>> + unsigned char pg_moreio : 1;
>>> struct inode *pg_inode;
>>> const struct nfs_pageio_ops *pg_ops;
>>> const struct nfs_rw_ops *pg_rw_ops;
>>> @@ -93,8 +97,18 @@ struct nfs_pageio_descriptor {
>>> struct pnfs_layout_segment *pg_lseg;
>>> struct nfs_direct_req *pg_dreq;
>>> void *pg_layout_private;
>>> + unsigned int pg_bsize; /* default bsize for mirrors */
>>> +
>>> + u32 pg_mirror_count;
>>> + struct nfs_pgio_mirror *pg_mirrors;
>>> + struct nfs_pgio_mirror pg_mirrors_static[1];
>>> + struct nfs_pgio_mirror *pg_mirrors_dynamic;
>>> + u32 pg_mirror_idx; /* current mirror */
>>> };
>>>
>>> +/* arbitrarily selected limit to number of mirrors */
>>> +#define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
>>> +
>>> #define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
>>>
>>> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>> index 5bc99f0..6400a1e 100644
>>> --- a/include/linux/nfs_xdr.h
>>> +++ b/include/linux/nfs_xdr.h
>>> @@ -1329,6 +1329,7 @@ struct nfs_pgio_header {
>>> struct nfs_page_array page_array;
>>> struct nfs_client *ds_clp; /* pNFS data server */
>>> int ds_commit_idx; /* ds index if ds_clp is set */
>>> + int pgio_mirror_idx;/* mirror index in pgio layer */
>>> };
>>>
>>> struct nfs_mds_commit_info {
>>>
>>
>
> On Jan 6, 2015, at 1:32 PM, Anna Schumaker <[email protected]> wrote:
>
> On 01/06/2015 01:27 PM, Weston Andros Adamson wrote:
>> These issues are addressed and the comments are removed in subsequent patches
>> from the same series.
>>
>> Instead of having one huge patch that implements all of mirroring, I chose to split
>> it out into smaller patches. These notes were useful in making sure that the issues
>> were addressed and should be useful as a guide to someone bisecting, etc.
>
> Got it. I'm still working my way through these patches, so I haven't seen the ones that remove the comments yet.
>
> Thanks!
>>
Thanks for reviewing!
-dros
>>
>>
>>> On Jan 6, 2015, at 1:11 PM, Anna Schumaker <[email protected]> wrote:
>>>
>>> Hey Dros and Tom,
>>>
>>> I see you're adding some new FIXMEs and TODOs in the comments. Is there a plan for addressing these eventually?
>>>
>>> Thanks,
>>> Anna
>>>
>>> On 12/24/2014 02:13 AM, Tom Haynes wrote:
>>>> From: Weston Andros Adamson <[email protected]>
>>>>
>>>> This patch adds mirrored write support to the pgio layer. The default
>>>> is to use one mirror, but pgio callers may define callbacks to change
>>>> this to any value up to the (arbitrarily selected) limit of 16.
>>>>
>>>> The basic idea is to break out members of nfs_pageio_descriptor that cannot
>>>> be shared between mirrored DSes and put them in a new structure.
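
The split described above can be pictured with a small user-space sketch. This is not the kernel code from the patch: every name prefixed with sketch_ or SKETCH_ is hypothetical, the list of queued requests is reduced to a byte counter, and the mirror array is embedded rather than allocated. Unlike the patch, which ignores the return value of nfs_pageio_setup_mirroring(), the sketch refuses the request when the callback reports an out-of-range count.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIRROR_MAX 16	/* same arbitrary limit as the patch */

/* Per-mirror state: what cannot be shared between mirrored DSes. */
struct sketch_mirror {
	size_t count;			/* bytes currently queued */
	size_t bsize;			/* block size for this mirror */
	unsigned long bytes_written;
};

struct sketch_desc;
typedef unsigned int (*sketch_get_mirror_count_fn)(const struct sketch_desc *);

/* Shared descriptor: one copy, plus an array of per-mirror state. */
struct sketch_desc {
	sketch_get_mirror_count_fn get_mirror_count;	/* optional callback */
	unsigned int mirror_count;
	unsigned int mirror_idx;			/* current mirror */
	struct sketch_mirror mirrors[SKETCH_MIRROR_MAX];
};

static void sketch_init(struct sketch_desc *d,
			sketch_get_mirror_count_fn fn, size_t bsize)
{
	unsigned int i;

	d->get_mirror_count = fn;
	d->mirror_count = 1;	/* default: a single mirror */
	d->mirror_idx = 0;
	for (i = 0; i < SKETCH_MIRROR_MAX; i++) {
		d->mirrors[i].count = 0;
		d->mirrors[i].bsize = bsize;
		d->mirrors[i].bytes_written = 0;
	}
}

/* Ask the callback, if any, how many mirrors to use; -1 if out of range. */
static int sketch_setup_mirroring(struct sketch_desc *d)
{
	unsigned int n;

	if (!d->get_mirror_count)
		return 0;
	n = d->get_mirror_count(d);
	if (n == 0 || n > SKETCH_MIRROR_MAX)
		return -1;
	d->mirror_count = n;
	return 0;
}

/* Queue the same byte range once per mirror; returns 1 on success. */
static int sketch_add_request(struct sketch_desc *d, size_t bytes)
{
	unsigned int midx;

	if (sketch_setup_mirroring(d) < 0)
		return 0;
	for (midx = 0; midx < d->mirror_count; midx++) {
		d->mirror_idx = midx;
		assert(d->mirror_idx < d->mirror_count);
		d->mirrors[midx].count += bytes;
	}
	return 1;
}

/* Example callback: a layout that always wants three mirrors. */
static unsigned int sketch_three_mirrors(const struct sketch_desc *d)
{
	(void)d;
	return 3;
}
```

With sketch_three_mirrors installed, one call to sketch_add_request() bumps the counters of mirrors 0 through 2 and leaves the rest untouched; with no callback, only mirror 0 is used, matching the default of one mirror.
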
>>>>
>>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>>> ---
>>>> fs/nfs/direct.c | 17 ++-
>>>> fs/nfs/internal.h | 1 +
>>>> fs/nfs/objlayout/objio_osd.c | 3 +-
>>>> fs/nfs/pagelist.c | 270 +++++++++++++++++++++++++++++++++++--------
>>>> fs/nfs/pnfs.c | 26 +++--
>>>> fs/nfs/read.c | 30 ++++-
>>>> fs/nfs/write.c | 10 +-
>>>> include/linux/nfs_page.h | 20 +++-
>>>> include/linux/nfs_xdr.h | 1 +
>>>> 9 files changed, 311 insertions(+), 67 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>>> index 1ee41d7..0178d4f 100644
>>>> --- a/fs/nfs/direct.c
>>>> +++ b/fs/nfs/direct.c
>>>> @@ -360,8 +360,14 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
>>>> spin_lock(&dreq->lock);
>>>> if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
>>>> dreq->error = hdr->error;
>>>> - else
>>>> - dreq->count += hdr->good_bytes;
>>>> + else {
>>>> + /*
>>>> + * FIXME: right now this only accounts for bytes written
>>>> + * to the first mirror
>>>> + */
>>>> + if (hdr->pgio_mirror_idx == 0)
>>>> + dreq->count += hdr->good_bytes;
>>>> + }
>>>> spin_unlock(&dreq->lock);
>>>>
>>>> while (!list_empty(&hdr->pages)) {
>>>> @@ -724,7 +730,12 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>>> dreq->error = hdr->error;
>>>> }
>>>> if (dreq->error == 0) {
>>>> - dreq->count += hdr->good_bytes;
>>>> + /*
>>>> + * FIXME: right now this only accounts for bytes written
>>>> + * to the first mirror
>>>> + */
>>>> + if (hdr->pgio_mirror_idx == 0)
>>>> + dreq->count += hdr->good_bytes;
>>>> if (nfs_write_need_commit(hdr)) {
>>>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>>>> request_commit = true;
>>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>>> index 05f9a87..ef1c703 100644
>>>> --- a/fs/nfs/internal.h
>>>> +++ b/fs/nfs/internal.h
>>>> @@ -469,6 +469,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
>>>> struct nfs_direct_req *dreq);
>>>> int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
>>>> bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
>>>> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
>>>>
>>>> #ifdef CONFIG_MIGRATION
>>>> extern int nfs_migrate_page(struct address_space *,
>>>> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
>>>> index d007780..9a5f2ee 100644
>>>> --- a/fs/nfs/objlayout/objio_osd.c
>>>> +++ b/fs/nfs/objlayout/objio_osd.c
>>>> @@ -537,11 +537,12 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
>>>> static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
>>>> struct nfs_page *prev, struct nfs_page *req)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
>>>> unsigned int size;
>>>>
>>>> size = pnfs_generic_pg_test(pgio, prev, req);
>>>>
>>>> - if (!size || pgio->pg_count + req->wb_bytes >
>>>> + if (!size || mirror->pg_count + req->wb_bytes >
>>>> (unsigned long)pgio->pg_layout_private)
>>>> return 0;
>>>>
>>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>>> index 1c03187..eec12b7 100644
>>>> --- a/fs/nfs/pagelist.c
>>>> +++ b/fs/nfs/pagelist.c
>>>> @@ -46,17 +46,22 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_pgio_header *hdr,
>>>> void (*release)(struct nfs_pgio_header *hdr))
>>>> {
>>>> - hdr->req = nfs_list_entry(desc->pg_list.next);
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> +
>>>> + hdr->req = nfs_list_entry(mirror->pg_list.next);
>>>> hdr->inode = desc->pg_inode;
>>>> hdr->cred = hdr->req->wb_context->cred;
>>>> hdr->io_start = req_offset(hdr->req);
>>>> - hdr->good_bytes = desc->pg_count;
>>>> + hdr->good_bytes = mirror->pg_count;
>>>> hdr->dreq = desc->pg_dreq;
>>>> hdr->layout_private = desc->pg_layout_private;
>>>> hdr->release = release;
>>>> hdr->completion_ops = desc->pg_completion_ops;
>>>> if (hdr->completion_ops->init_hdr)
>>>> hdr->completion_ops->init_hdr(hdr);
>>>> +
>>>> + hdr->pgio_mirror_idx = desc->pg_mirror_idx;
>>>> }
>>>> EXPORT_SYMBOL_GPL(nfs_pgheader_init);
>>>>
>>>> @@ -480,7 +485,10 @@ nfs_wait_on_request(struct nfs_page *req)
>>>> size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_page *prev, struct nfs_page *req)
>>>> {
>>>> - if (desc->pg_count > desc->pg_bsize) {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> +
>>>> + if (mirror->pg_count > mirror->pg_bsize) {
>>>> /* should never happen */
>>>> WARN_ON_ONCE(1);
>>>> return 0;
>>>> @@ -490,11 +498,11 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>>>> * Limit the request size so that we can still allocate a page array
>>>> * for it without upsetting the slab allocator.
>>>> */
>>>> - if (((desc->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
>>>> + if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
>>>> sizeof(struct page) > PAGE_SIZE)
>>>> return 0;
>>>>
>>>> - return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
>>>> + return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
>>>> }
>>>> EXPORT_SYMBOL_GPL(nfs_generic_pg_test);
>>>>
>>>> @@ -651,10 +659,18 @@ EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
>>>> static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_pgio_header *hdr)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror;
>>>> + u32 midx;
>>>> +
>>>> set_bit(NFS_IOHDR_REDO, &hdr->flags);
>>>> nfs_pgio_data_destroy(hdr);
>>>> hdr->completion_ops->completion(hdr);
>>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>>> + /* TODO: Make sure it's right to clean up all mirrors here
>>>> + * and not just hdr->pgio_mirror_idx */
>>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>>>> + mirror = &desc->pg_mirrors[midx];
>>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>>> + }
>>>> return -ENOMEM;
>>>> }
>>>>
>>>> @@ -671,6 +687,17 @@ static void nfs_pgio_release(void *calldata)
>>>> hdr->completion_ops->completion(hdr);
>>>> }
>>>>
>>>> +static void nfs_pageio_mirror_init(struct nfs_pgio_mirror *mirror,
>>>> + unsigned int bsize)
>>>> +{
>>>> + INIT_LIST_HEAD(&mirror->pg_list);
>>>> + mirror->pg_bytes_written = 0;
>>>> + mirror->pg_count = 0;
>>>> + mirror->pg_bsize = bsize;
>>>> + mirror->pg_base = 0;
>>>> + mirror->pg_recoalesce = 0;
>>>> +}
>>>> +
>>>> /**
>>>> * nfs_pageio_init - initialise a page io descriptor
>>>> * @desc: pointer to descriptor
>>>> @@ -687,13 +714,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>>> size_t bsize,
>>>> int io_flags)
>>>> {
>>>> - INIT_LIST_HEAD(&desc->pg_list);
>>>> - desc->pg_bytes_written = 0;
>>>> - desc->pg_count = 0;
>>>> - desc->pg_bsize = bsize;
>>>> - desc->pg_base = 0;
>>>> + struct nfs_pgio_mirror *new;
>>>> + int i;
>>>> +
>>>> desc->pg_moreio = 0;
>>>> - desc->pg_recoalesce = 0;
>>>> desc->pg_inode = inode;
>>>> desc->pg_ops = pg_ops;
>>>> desc->pg_completion_ops = compl_ops;
>>>> @@ -703,6 +727,26 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>>> desc->pg_lseg = NULL;
>>>> desc->pg_dreq = NULL;
>>>> desc->pg_layout_private = NULL;
>>>> + desc->pg_bsize = bsize;
>>>> +
>>>> + desc->pg_mirror_count = 1;
>>>> + desc->pg_mirror_idx = 0;
>>>> +
>>>> + if (pg_ops->pg_get_mirror_count) {
>>>> + /* until we have a request, we don't have an lseg and no
>>>> + * idea how many mirrors there will be */
>>>> + new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
>>>> + sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
>>>> + desc->pg_mirrors_dynamic = new;
>>>> + desc->pg_mirrors = new;
>>>> +
>>>> + for (i = 0; i < NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX; i++)
>>>> + nfs_pageio_mirror_init(&desc->pg_mirrors[i], bsize);
>>>> + } else {
>>>> + desc->pg_mirrors_dynamic = NULL;
>>>> + desc->pg_mirrors = desc->pg_mirrors_static;
>>>> + nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
>>>> + }
>>>> }
>>>> EXPORT_SYMBOL_GPL(nfs_pageio_init);
>>>>
>>>> @@ -738,14 +782,16 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
>>>> int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_pgio_header *hdr)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> struct nfs_page *req;
>>>> struct page **pages,
>>>> *last_page;
>>>> - struct list_head *head = &desc->pg_list;
>>>> + struct list_head *head = &mirror->pg_list;
>>>> struct nfs_commit_info cinfo;
>>>> unsigned int pagecount, pageused;
>>>>
>>>> - pagecount = nfs_page_array_len(desc->pg_base, desc->pg_count);
>>>> + pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
>>>> if (!nfs_pgarray_set(&hdr->page_array, pagecount))
>>>> return nfs_pgio_error(desc, hdr);
>>>>
>>>> @@ -773,7 +819,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>>>> desc->pg_ioflags &= ~FLUSH_COND_STABLE;
>>>>
>>>> /* Set up the argument struct */
>>>> - nfs_pgio_rpcsetup(hdr, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
>>>> + nfs_pgio_rpcsetup(hdr, mirror->pg_count, 0, desc->pg_ioflags, &cinfo);
>>>> desc->pg_rpc_callops = &nfs_pgio_common_ops;
>>>> return 0;
>>>> }
>>>> @@ -781,12 +827,17 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
>>>>
>>>> static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror;
>>>> struct nfs_pgio_header *hdr;
>>>> int ret;
>>>>
>>>> + mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>>>> if (!hdr) {
>>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>>> + /* TODO: make sure this is right with mirroring - or
>>>> + * should it back out all mirrors? */
>>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>>> return -ENOMEM;
>>>> }
>>>> nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
>>>> @@ -801,6 +852,49 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>>>> return ret;
>>>> }
>>>>
>>>> +/*
>>>> + * nfs_pageio_setup_mirroring - determine if mirroring is to be used
>>>> + * by calling the pg_get_mirror_count op
>>>> + */
>>>> +static int nfs_pageio_setup_mirroring(struct nfs_pageio_descriptor *pgio,
>>>> + struct nfs_page *req)
>>>> +{
>>>> + int mirror_count = 1;
>>>> +
>>>> + if (!pgio->pg_ops->pg_get_mirror_count)
>>>> + return 0;
>>>> +
>>>> + mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
>>>> +
>>>> + if (!mirror_count || mirror_count > NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX)
>>>> + return -EINVAL;
>>>> +
>>>> + if (WARN_ON_ONCE(!pgio->pg_mirrors_dynamic))
>>>> + return -EINVAL;
>>>> +
>>>> + pgio->pg_mirror_count = mirror_count;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_pageio_stop_mirroring - stop using mirroring (set mirror count to 1)
>>>> + */
>>>> +void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio)
>>>> +{
>>>> + pgio->pg_mirror_count = 1;
>>>> + pgio->pg_mirror_idx = 0;
>>>> +}
>>>> +
>>>> +static void nfs_pageio_cleanup_mirroring(struct nfs_pageio_descriptor *pgio)
>>>> +{
>>>> + pgio->pg_mirror_count = 1;
>>>> + pgio->pg_mirror_idx = 0;
>>>> + pgio->pg_mirrors = pgio->pg_mirrors_static;
>>>> + kfree(pgio->pg_mirrors_dynamic);
>>>> + pgio->pg_mirrors_dynamic = NULL;
>>>> +}
>>>> +
>>>> static bool nfs_match_open_context(const struct nfs_open_context *ctx1,
>>>> const struct nfs_open_context *ctx2)
>>>> {
>>>> @@ -867,19 +961,22 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
>>>> static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_page *req)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> struct nfs_page *prev = NULL;
>>>> - if (desc->pg_count != 0) {
>>>> - prev = nfs_list_entry(desc->pg_list.prev);
>>>> +
>>>> + if (mirror->pg_count != 0) {
>>>> + prev = nfs_list_entry(mirror->pg_list.prev);
>>>> } else {
>>>> if (desc->pg_ops->pg_init)
>>>> desc->pg_ops->pg_init(desc, req);
>>>> - desc->pg_base = req->wb_pgbase;
>>>> + mirror->pg_base = req->wb_pgbase;
>>>> }
>>>> if (!nfs_can_coalesce_requests(prev, req, desc))
>>>> return 0;
>>>> nfs_list_remove_request(req);
>>>> - nfs_list_add_request(req, &desc->pg_list);
>>>> - desc->pg_count += req->wb_bytes;
>>>> + nfs_list_add_request(req, &mirror->pg_list);
>>>> + mirror->pg_count += req->wb_bytes;
>>>> return 1;
>>>> }
>>>>
>>>> @@ -888,16 +985,19 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>>>> */
>>>> static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>>> {
>>>> - if (!list_empty(&desc->pg_list)) {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> +
>>>> + if (!list_empty(&mirror->pg_list)) {
>>>> int error = desc->pg_ops->pg_doio(desc);
>>>> if (error < 0)
>>>> desc->pg_error = error;
>>>> else
>>>> - desc->pg_bytes_written += desc->pg_count;
>>>> + mirror->pg_bytes_written += mirror->pg_count;
>>>> }
>>>> - if (list_empty(&desc->pg_list)) {
>>>> - desc->pg_count = 0;
>>>> - desc->pg_base = 0;
>>>> + if (list_empty(&mirror->pg_list)) {
>>>> + mirror->pg_count = 0;
>>>> + mirror->pg_base = 0;
>>>> }
>>>> }
>>>>
>>>> @@ -915,10 +1015,14 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>>> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_page *req)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> struct nfs_page *subreq;
>>>> unsigned int bytes_left = 0;
>>>> unsigned int offset, pgbase;
>>>>
>>>> + WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
>>>> +
>>>> nfs_page_group_lock(req, false);
>>>>
>>>> subreq = req;
>>>> @@ -938,7 +1042,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>>> nfs_pageio_doio(desc);
>>>> if (desc->pg_error < 0)
>>>> return 0;
>>>> - if (desc->pg_recoalesce)
>>>> + if (mirror->pg_recoalesce)
>>>> return 0;
>>>> /* retry add_request for this subreq */
>>>> nfs_page_group_lock(req, false);
>>>> @@ -976,14 +1080,16 @@ err_ptr:
>>>>
>>>> static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> LIST_HEAD(head);
>>>>
>>>> do {
>>>> - list_splice_init(&desc->pg_list, &head);
>>>> - desc->pg_bytes_written -= desc->pg_count;
>>>> - desc->pg_count = 0;
>>>> - desc->pg_base = 0;
>>>> - desc->pg_recoalesce = 0;
>>>> + list_splice_init(&mirror->pg_list, &head);
>>>> + mirror->pg_bytes_written -= mirror->pg_count;
>>>> + mirror->pg_count = 0;
>>>> + mirror->pg_base = 0;
>>>> + mirror->pg_recoalesce = 0;
>>>> +
>>>> desc->pg_moreio = 0;
>>>>
>>>> while (!list_empty(&head)) {
>>>> @@ -997,11 +1103,11 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
>>>> return 0;
>>>> break;
>>>> }
>>>> - } while (desc->pg_recoalesce);
>>>> + } while (mirror->pg_recoalesce);
>>>> return 1;
>>>> }
>>>>
>>>> -int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>>> +static int nfs_pageio_add_request_mirror(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_page *req)
>>>> {
>>>> int ret;
>>>> @@ -1014,9 +1120,78 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>>> break;
>>>> ret = nfs_do_recoalesce(desc);
>>>> } while (ret);
>>>> +
>>>> return ret;
>>>> }
>>>>
>>>> +int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>>> + struct nfs_page *req)
>>>> +{
>>>> + u32 midx;
>>>> + unsigned int pgbase, offset, bytes;
>>>> + struct nfs_page *dupreq, *lastreq;
>>>> +
>>>> + pgbase = req->wb_pgbase;
>>>> + offset = req->wb_offset;
>>>> + bytes = req->wb_bytes;
>>>> +
>>>> + nfs_pageio_setup_mirroring(desc, req);
>>>> +
>>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>>>> + if (midx) {
>>>> + nfs_page_group_lock(req, false);
>>>> +
>>>> + /* find the last request */
>>>> + for (lastreq = req->wb_head;
>>>> + lastreq->wb_this_page != req->wb_head;
>>>> + lastreq = lastreq->wb_this_page)
>>>> + ;
>>>> +
>>>> + dupreq = nfs_create_request(req->wb_context,
>>>> + req->wb_page, lastreq, pgbase, bytes);
>>>> +
>>>> + if (IS_ERR(dupreq)) {
>>>> + nfs_page_group_unlock(req);
>>>> + return 0;
>>>> + }
>>>> +
>>>> + nfs_lock_request(dupreq);
>>>> + nfs_page_group_unlock(req);
>>>> + dupreq->wb_offset = offset;
>>>> + dupreq->wb_index = req->wb_index;
>>>> + } else
>>>> + dupreq = req;
>>>> +
>>>> + desc->pg_mirror_idx = midx;
>>>> + if (!nfs_pageio_add_request_mirror(desc, dupreq))
>>>> + return 0;
>>>> + }
>>>> +
>>>> + return 1;
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_pageio_complete_mirror - Complete I/O on the current mirror of an
>>>> + * nfs_pageio_descriptor
>>>> + * @desc: pointer to io descriptor
>>>> + */
>>>> +static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
>>>> + u32 mirror_idx)
>>>> +{
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
>>>> + u32 restore_idx = desc->pg_mirror_idx;
>>>> +
>>>> + desc->pg_mirror_idx = mirror_idx;
>>>> + for (;;) {
>>>> + nfs_pageio_doio(desc);
>>>> + if (!mirror->pg_recoalesce)
>>>> + break;
>>>> + if (!nfs_do_recoalesce(desc))
>>>> + break;
>>>> + }
>>>> + desc->pg_mirror_idx = restore_idx;
>>>> +}
>>>> +
>>>> /*
>>>> * nfs_pageio_resend - Transfer requests to new descriptor and resend
>>>> * @hdr - the pgio header to move request from
>>>> @@ -1055,16 +1230,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_resend);
>>>> */
>>>> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>>>> {
>>>> - for (;;) {
>>>> - nfs_pageio_doio(desc);
>>>> - if (!desc->pg_recoalesce)
>>>> - break;
>>>> - if (!nfs_do_recoalesce(desc))
>>>> - break;
>>>> - }
>>>> + u32 midx;
>>>> +
>>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++)
>>>> + nfs_pageio_complete_mirror(desc, midx);
>>>>
>>>> if (desc->pg_ops->pg_cleanup)
>>>> desc->pg_ops->pg_cleanup(desc);
>>>> + nfs_pageio_cleanup_mirroring(desc);
>>>> }
>>>>
>>>> /**
>>>> @@ -1080,10 +1253,17 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>>>> */
>>>> void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>>>> {
>>>> - if (!list_empty(&desc->pg_list)) {
>>>> - struct nfs_page *prev = nfs_list_entry(desc->pg_list.prev);
>>>> - if (index != prev->wb_index + 1)
>>>> - nfs_pageio_complete(desc);
>>>> + struct nfs_pgio_mirror *mirror;
>>>> + struct nfs_page *prev;
>>>> + u32 midx;
>>>> +
>>>> + for (midx = 0; midx < desc->pg_mirror_count; midx++) {
>>>> + mirror = &desc->pg_mirrors[midx];
>>>> + if (!list_empty(&mirror->pg_list)) {
>>>> + prev = nfs_list_entry(mirror->pg_list.prev);
>>>> + if (index != prev->wb_index + 1)
>>>> + nfs_pageio_complete_mirror(desc, midx);
>>>> + }
>>>> }
>>>> }
>>>>
>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>> index 2da2e77..5f7c422 100644
>>>> --- a/fs/nfs/pnfs.c
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -1646,8 +1646,8 @@ EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
>>>> * of bytes (maximum @req->wb_bytes) that can be coalesced.
>>>> */
>>>> size_t
>>>> -pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>>>> - struct nfs_page *req)
>>>> +pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
>>>> + struct nfs_page *prev, struct nfs_page *req)
>>>> {
>>>> unsigned int size;
>>>> u64 seg_end, req_start, seg_left;
>>>> @@ -1729,10 +1729,12 @@ static void
>>>> pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_pgio_header *hdr)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
>>>> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
>>>> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
>>>> nfs_pageio_reset_write_mds(desc);
>>>> - desc->pg_recoalesce = 1;
>>>> + mirror->pg_recoalesce = 1;
>>>> }
>>>> nfs_pgio_data_destroy(hdr);
>>>> }
>>>> @@ -1781,12 +1783,14 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
>>>> int
>>>> pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> struct nfs_pgio_header *hdr;
>>>> int ret;
>>>>
>>>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>>>> if (!hdr) {
>>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>>> return -ENOMEM;
>>>> }
>>>> nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
>>>> @@ -1795,6 +1799,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>>>> ret = nfs_generic_pgio(desc, hdr);
>>>> if (!ret)
>>>> pnfs_do_write(desc, hdr, desc->pg_ioflags);
>>>> +
>>>> return ret;
>>>> }
>>>> EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
>>>> @@ -1839,10 +1844,13 @@ static void
>>>> pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_pgio_header *hdr)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> +
>>>> if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
>>>> - list_splice_tail_init(&hdr->pages, &desc->pg_list);
>>>> + list_splice_tail_init(&hdr->pages, &mirror->pg_list);
>>>> nfs_pageio_reset_read_mds(desc);
>>>> - desc->pg_recoalesce = 1;
>>>> + mirror->pg_recoalesce = 1;
>>>> }
>>>> nfs_pgio_data_destroy(hdr);
>>>> }
>>>> @@ -1893,12 +1901,14 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
>>>> int
>>>> pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
>>>> +
>>>> struct nfs_pgio_header *hdr;
>>>> int ret;
>>>>
>>>> hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
>>>> if (!hdr) {
>>>> - desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>>>> + desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
>>>> return -ENOMEM;
>>>> }
>>>> nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
>>>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>>>> index 092ab49..568ecf0 100644
>>>> --- a/fs/nfs/read.c
>>>> +++ b/fs/nfs/read.c
>>>> @@ -70,8 +70,15 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
>>>>
>>>> void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror;
>>>> +
>>>> pgio->pg_ops = &nfs_pgio_rw_ops;
>>>> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
>>>> +
>>>> + /* read path should never have more than one mirror */
>>>> + WARN_ON_ONCE(pgio->pg_mirror_count != 1);
>>>> +
>>>> + mirror = &pgio->pg_mirrors[0];
>>>> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
>>>> }
>>>> EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
>>>>
>>>> @@ -81,6 +88,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>>>> struct nfs_page *new;
>>>> unsigned int len;
>>>> struct nfs_pageio_descriptor pgio;
>>>> + struct nfs_pgio_mirror *pgm;
>>>>
>>>> len = nfs_page_length(page);
>>>> if (len == 0)
>>>> @@ -97,7 +105,13 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>>>> &nfs_async_read_completion_ops);
>>>> nfs_pageio_add_request(&pgio, new);
>>>> nfs_pageio_complete(&pgio);
>>>> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
>>>> +
>>>> + /* It doesn't make sense to do mirrored reads! */
>>>> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
>>>> +
>>>> + pgm = &pgio.pg_mirrors[0];
>>>> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
>>>> +
>>>> return 0;
>>>> }
>>>>
>>>> @@ -352,6 +366,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>>>> struct list_head *pages, unsigned nr_pages)
>>>> {
>>>> struct nfs_pageio_descriptor pgio;
>>>> + struct nfs_pgio_mirror *pgm;
>>>> struct nfs_readdesc desc = {
>>>> .pgio = &pgio,
>>>> };
>>>> @@ -387,10 +402,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>>>> &nfs_async_read_completion_ops);
>>>>
>>>> ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
>>>> -
>>>> nfs_pageio_complete(&pgio);
>>>> - NFS_I(inode)->read_io += pgio.pg_bytes_written;
>>>> - npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
>>>> +
>>>> + /* It doesn't make sense to do mirrored reads! */
>>>> + WARN_ON_ONCE(pgio.pg_mirror_count != 1);
>>>> +
>>>> + pgm = &pgio.pg_mirrors[0];
>>>> + NFS_I(inode)->read_io += pgm->pg_bytes_written;
>>>> + npages = (pgm->pg_bytes_written + PAGE_CACHE_SIZE - 1) >>
>>>> + PAGE_CACHE_SHIFT;
>>>> nfs_add_stats(inode, NFSIOS_READPAGES, npages);
>>>> read_complete:
>>>> put_nfs_open_context(desc.ctx);
>>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>>> index db802d9..2f6ee8e 100644
>>>> --- a/fs/nfs/write.c
>>>> +++ b/fs/nfs/write.c
>>>> @@ -906,7 +906,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>>>> if (nfs_write_need_commit(hdr)) {
>>>> memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
>>>> nfs_mark_request_commit(req, hdr->lseg, &cinfo,
>>>> - 0);
>>>> + hdr->pgio_mirror_idx);
>>>> goto next;
>>>> }
>>>> remove_req:
>>>> @@ -1305,8 +1305,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_write);
>>>>
>>>> void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
>>>> {
>>>> + struct nfs_pgio_mirror *mirror;
>>>> +
>>>> pgio->pg_ops = &nfs_pgio_rw_ops;
>>>> - pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
>>>> +
>>>> + nfs_pageio_stop_mirroring(pgio);
>>>> +
>>>> + mirror = &pgio->pg_mirrors[0];
>>>> + mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
>>>> }
>>>> EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
>>>>
>>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>>> index 479c566..3eb072d 100644
>>>> --- a/include/linux/nfs_page.h
>>>> +++ b/include/linux/nfs_page.h
>>>> @@ -58,6 +58,8 @@ struct nfs_pageio_ops {
>>>> size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
>>>> struct nfs_page *);
>>>> int (*pg_doio)(struct nfs_pageio_descriptor *);
>>>> + unsigned int (*pg_get_mirror_count)(struct nfs_pageio_descriptor *,
>>>> + struct nfs_page *);
>>>> void (*pg_cleanup)(struct nfs_pageio_descriptor *);
>>>> };
>>>>
>>>> @@ -74,15 +76,17 @@ struct nfs_rw_ops {
>>>> struct rpc_task_setup *, int);
>>>> };
>>>>
>>>> -struct nfs_pageio_descriptor {
>>>> +struct nfs_pgio_mirror {
>>>> struct list_head pg_list;
>>>> unsigned long pg_bytes_written;
>>>> size_t pg_count;
>>>> size_t pg_bsize;
>>>> unsigned int pg_base;
>>>> - unsigned char pg_moreio : 1,
>>>> - pg_recoalesce : 1;
>>>> + unsigned char pg_recoalesce : 1;
>>>> +};
>>>>
>>>> +struct nfs_pageio_descriptor {
>>>> + unsigned char pg_moreio : 1;
>>>> struct inode *pg_inode;
>>>> const struct nfs_pageio_ops *pg_ops;
>>>> const struct nfs_rw_ops *pg_rw_ops;
>>>> @@ -93,8 +97,18 @@ struct nfs_pageio_descriptor {
>>>> struct pnfs_layout_segment *pg_lseg;
>>>> struct nfs_direct_req *pg_dreq;
>>>> void *pg_layout_private;
>>>> + unsigned int pg_bsize; /* default bsize for mirrors */
>>>> +
>>>> + u32 pg_mirror_count;
>>>> + struct nfs_pgio_mirror *pg_mirrors;
>>>> + struct nfs_pgio_mirror pg_mirrors_static[1];
>>>> + struct nfs_pgio_mirror *pg_mirrors_dynamic;
>>>> + u32 pg_mirror_idx; /* current mirror */
>>>> };
>>>>
>>>> +/* arbitrarily selected limit to number of mirrors */
>>>> +#define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
>>>> +
>>>> #define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
>>>>
>>>> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
>>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>>> index 5bc99f0..6400a1e 100644
>>>> --- a/include/linux/nfs_xdr.h
>>>> +++ b/include/linux/nfs_xdr.h
>>>> @@ -1329,6 +1329,7 @@ struct nfs_pgio_header {
>>>> struct nfs_page_array page_array;
>>>> struct nfs_client *ds_clp; /* pNFS data server */
>>>> int ds_commit_idx; /* ds index if ds_clp is set */
>>>> + int pgio_mirror_idx;/* mirror index in pgio layer */
>>>> };
>>>>
>>>> struct nfs_mds_commit_info {
>>>>
>>>
>>
>
Hey Tao and Tom,
On 12/24/2014 02:13 AM, Tom Haynes wrote:
> From: Peng Tao <[email protected]>
>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 11 +++++++++--
> fs/nfs/pnfs.c | 12 +++++++-----
> fs/nfs/pnfs.h | 2 +-
> 3 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index bf5ef58..53df457 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7800,7 +7800,7 @@ static const struct rpc_call_ops nfs4_layoutreturn_call_ops = {
> .rpc_release = nfs4_layoutreturn_release,
> };
>
> -int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
> +int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
> {
> struct rpc_task *task;
> struct rpc_message msg = {
> @@ -7814,16 +7814,23 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
> .rpc_message = &msg,
> .callback_ops = &nfs4_layoutreturn_call_ops,
> .callback_data = lrp,
> + .flags = RPC_TASK_ASYNC,
> };
> - int status;
> + int status = 0;
>
> dprintk("--> %s\n", __func__);
> nfs4_init_sequence(&lrp->args.seq_args, &lrp->res.seq_res, 1);
> task = rpc_run_task(&task_setup_data);
> if (IS_ERR(task))
> return PTR_ERR(task);
> + if (sync == false)
> + goto out;
> + status = nfs4_wait_for_completion_rpc_task(task);
> + if (status != 0)
> + goto out;
Is there any way to share this code with nfs4_proc_layoutcommit?
Thanks,
Anna
> status = task->tk_status;
> trace_nfs4_layoutreturn(lrp->args.inode, status);
> +out:
> dprintk("<-- %s status=%d\n", __func__, status);
> rpc_put_task(task);
> return status;
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 63beace..e889b97 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -52,7 +52,8 @@ static LIST_HEAD(pnfs_modules_tbl);
>
> static int
> pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
> - enum pnfs_iomode iomode, u64 offset, u64 length);
> + enum pnfs_iomode iomode, u64 offset, u64 length,
> + bool sync);
>
> /* Return the registered pnfs layout driver module matching given id */
> static struct pnfs_layoutdriver_type *
> @@ -393,7 +394,7 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
> pnfs_free_lseg(lseg);
> if (need_return)
> pnfs_send_layoutreturn(lo, stateid, iomode, 0,
> - NFS4_MAX_UINT64);
> + NFS4_MAX_UINT64, true);
> else
> pnfs_put_layout_hdr(lo);
> }
> @@ -898,7 +899,8 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
>
> static int
> pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
> - enum pnfs_iomode iomode, u64 offset, u64 length)
> + enum pnfs_iomode iomode, u64 offset, u64 length,
> + bool sync)
> {
> struct inode *ino = lo->plh_inode;
> struct nfs4_layoutreturn *lrp;
> @@ -924,7 +926,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
> lrp->clp = NFS_SERVER(ino)->nfs_client;
> lrp->cred = lo->plh_lc_cred;
>
> - status = nfs4_proc_layoutreturn(lrp);
> + status = nfs4_proc_layoutreturn(lrp, sync);
> out:
> if (status) {
> spin_lock(&ino->i_lock);
> @@ -991,7 +993,7 @@ _pnfs_return_layout(struct inode *ino)
> pnfs_free_lseg_list(&tmp_list);
>
> status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
> - NFS4_MAX_UINT64);
> + NFS4_MAX_UINT64, true);
> out:
> dprintk("<-- %s status: %d\n", __func__, status);
> return status;
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 26e7cd8..7a33c50 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -219,7 +219,7 @@ extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
> struct pnfs_device *dev,
> struct rpc_cred *cred);
> extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags);
> -extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
> +extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
>
> /* pnfs.c */
> void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
>
On Tue, Jan 06, 2015 at 01:59:20PM -0500, Anna Schumaker wrote:
> Hey Tao and Tom,
>
> On 12/24/2014 02:13 AM, Tom Haynes wrote:
> > From: Peng Tao <[email protected]>
> >
> > Signed-off-by: Peng Tao <[email protected]>
> > Signed-off-by: Tom Haynes <[email protected]>
> > ---
> > fs/nfs/nfs4proc.c | 11 +++++++++--
> > fs/nfs/pnfs.c | 12 +++++++-----
> > fs/nfs/pnfs.h | 2 +-
> > 3 files changed, 17 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index bf5ef58..53df457 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -7800,7 +7800,7 @@ static const struct rpc_call_ops nfs4_layoutreturn_call_ops = {
> > .rpc_release = nfs4_layoutreturn_release,
> > };
> >
> > -int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
> > +int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
> > {
> > struct rpc_task *task;
> > struct rpc_message msg = {
> > @@ -7814,16 +7814,23 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
> > .rpc_message = &msg,
> > .callback_ops = &nfs4_layoutreturn_call_ops,
> > .callback_data = lrp,
> > + .flags = RPC_TASK_ASYNC,
> > };
> > - int status;
> > + int status = 0;
> >
> > dprintk("--> %s\n", __func__);
> > nfs4_init_sequence(&lrp->args.seq_args, &lrp->res.seq_res, 1);
> > task = rpc_run_task(&task_setup_data);
> > if (IS_ERR(task))
> > return PTR_ERR(task);
> > + if (sync == false)
> > + goto out;
> > + status = nfs4_wait_for_completion_rpc_task(task);
> > + if (status != 0)
> > + goto out;
>
> Is there any way to share this code with nfs4_proc_layoutcommit?
Yes, except for the difference between trace_nfs4_layoutcommit()
and trace_nfs4_layoutreturn(). I'm not sure how to pass those into a
common function, and I'm not sure merging this code would save
much.
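One option might be to pass the trace call in as a function pointer.
Below is a minimal user-space sketch of that idea, not kernel code and
not part of this patch: every sketch_* name is hypothetical, struct
sketch_task only stands in for struct rpc_task, and sketch_wait() stands
in for nfs4_wait_for_completion_rpc_task(). The real tracepoints also
take an inode, which is left out here for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rpc_task: only what the sketch needs. */
struct sketch_task {
	int tk_status;   /* final RPC status, valid once the task completed */
	int wait_result; /* what the simulated wait returns (0 = completed) */
};

/* Per-operation trace hook (assumption: the real ones also take an inode). */
typedef void (*sketch_trace_fn)(int status);

static int sketch_return_traced; /* times the layoutreturn hook ran */
static int sketch_commit_traced; /* times the layoutcommit hook ran */

static void sketch_trace_layoutreturn(int status)
{
	(void)status;
	sketch_return_traced++;
}

static void sketch_trace_layoutcommit(int status)
{
	(void)status;
	sketch_commit_traced++;
}

/* Stand-in for nfs4_wait_for_completion_rpc_task(). */
static int sketch_wait(struct sketch_task *task)
{
	return task->wait_result;
}

/*
 * Common tail: async callers get 0 straight away; sync callers wait,
 * pick up the task status and hand it to their own trace hook.
 */
static int sketch_layout_op_finish(struct sketch_task *task, int sync,
				   sketch_trace_fn trace)
{
	int status = 0;

	assert(task != NULL && trace != NULL);
	if (!sync)
		goto out;
	status = sketch_wait(task);
	if (status != 0)
		goto out;
	status = task->tk_status;
	trace(status);
out:
	return status;
}
```

With something like this, the layoutreturn path would call
sketch_layout_op_finish(task, sync, sketch_trace_layoutreturn) and the
layoutcommit path would pass sketch_trace_layoutcommit instead. Whether
the indirection is worth it for so few shared lines is a judgment call.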
>
> Thanks,
> Anna
>
> > status = task->tk_status;
> > trace_nfs4_layoutreturn(lrp->args.inode, status);
> > +out:
> > dprintk("<-- %s status=%d\n", __func__, status);
> > rpc_put_task(task);
> > return status;
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index 63beace..e889b97 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -52,7 +52,8 @@ static LIST_HEAD(pnfs_modules_tbl);
> >
> > static int
> > pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
> > - enum pnfs_iomode iomode, u64 offset, u64 length);
> > + enum pnfs_iomode iomode, u64 offset, u64 length,
> > + bool sync);
> >
> > /* Return the registered pnfs layout driver module matching given id */
> > static struct pnfs_layoutdriver_type *
> > @@ -393,7 +394,7 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
> > pnfs_free_lseg(lseg);
> > if (need_return)
> > pnfs_send_layoutreturn(lo, stateid, iomode, 0,
> > - NFS4_MAX_UINT64);
> > + NFS4_MAX_UINT64, true);
> > else
> > pnfs_put_layout_hdr(lo);
> > }
> > @@ -898,7 +899,8 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
> >
> > static int
> > pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
> > - enum pnfs_iomode iomode, u64 offset, u64 length)
> > + enum pnfs_iomode iomode, u64 offset, u64 length,
> > + bool sync)
> > {
> > struct inode *ino = lo->plh_inode;
> > struct nfs4_layoutreturn *lrp;
> > @@ -924,7 +926,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
> > lrp->clp = NFS_SERVER(ino)->nfs_client;
> > lrp->cred = lo->plh_lc_cred;
> >
> > - status = nfs4_proc_layoutreturn(lrp);
> > + status = nfs4_proc_layoutreturn(lrp, sync);
> > out:
> > if (status) {
> > spin_lock(&ino->i_lock);
> > @@ -991,7 +993,7 @@ _pnfs_return_layout(struct inode *ino)
> > pnfs_free_lseg_list(&tmp_list);
> >
> > status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
> > - NFS4_MAX_UINT64);
> > + NFS4_MAX_UINT64, true);
> > out:
> > dprintk("<-- %s status: %d\n", __func__, status);
> > return status;
> > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> > index 26e7cd8..7a33c50 100644
> > --- a/fs/nfs/pnfs.h
> > +++ b/fs/nfs/pnfs.h
> > @@ -219,7 +219,7 @@ extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
> > struct pnfs_device *dev,
> > struct rpc_cred *cred);
> > extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags);
> > -extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
> > +extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
> >
> > /* pnfs.c */
> > void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
> >
>
Hey Tom,
On 12/24/2014 02:13 AM, Tom Haynes wrote:
> The flexfile layout is a new layout that extends the
> file layout. It is currently being drafted as a specification at
> https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> Signed-off-by: Tom Haynes <[email protected]>
> Signed-off-by: Tao Peng <[email protected]>
> ---
> fs/nfs/Kconfig | 5 +
> fs/nfs/Makefile | 1 +
> fs/nfs/flexfilelayout/Makefile | 5 +
> fs/nfs/flexfilelayout/flexfilelayout.c | 1600 +++++++++++++++++++++++++++++
> fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
> fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
> include/linux/nfs4.h | 1 +
> 7 files changed, 2322 insertions(+)
> create mode 100644 fs/nfs/flexfilelayout/Makefile
> create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
> create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
> create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
>
> diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
> index 3dece03..c7abc10 100644
> --- a/fs/nfs/Kconfig
> +++ b/fs/nfs/Kconfig
> @@ -128,6 +128,11 @@ config PNFS_OBJLAYOUT
> depends on NFS_V4_1 && SCSI_OSD_ULD
> default NFS_V4
>
> +config PNFS_FLEXFILE_LAYOUT
> + tristate
> + depends on NFS_V4_1 && NFS_V3
> + default m
> +
> config NFS_V4_1_IMPLEMENTATION_ID_DOMAIN
> string "NFSv4.1 Implementation ID Domain"
> depends on NFS_V4_1
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 7973c4e3..3c97bd9 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -33,3 +33,4 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
> obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
> obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayout/
> obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
> +obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/
> diff --git a/fs/nfs/flexfilelayout/Makefile b/fs/nfs/flexfilelayout/Makefile
> new file mode 100644
> index 0000000..1d2c9f6
> --- /dev/null
> +++ b/fs/nfs/flexfilelayout/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for the pNFS Flexfile Layout Driver kernel module
> +#
> +obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += nfs_layout_flexfiles.o
> +nfs_layout_flexfiles-y := flexfilelayout.o flexfilelayoutdev.o
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> new file mode 100644
> index 0000000..fddd3e6
> --- /dev/null
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -0,0 +1,1600 @@
> +/*
> + * Module for pnfs flexfile layout driver.
> + *
> + * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
> + *
> + * Tao Peng <[email protected]>
> + */
> +
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_page.h>
> +#include <linux/module.h>
> +
> +#include <linux/sunrpc/metrics.h>
> +
> +#include "flexfilelayout.h"
> +#include "../nfs4session.h"
> +#include "../internal.h"
> +#include "../delegation.h"
> +#include "../nfs4trace.h"
> +#include "../iostat.h"
> +#include "../nfs.h"
> +
> +#define NFSDBG_FACILITY NFSDBG_PNFS_LD
> +
> +#define FF_LAYOUT_POLL_RETRY_MAX (15*HZ)
> +
> +static struct pnfs_layout_hdr *
> +ff_layout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
> +{
> + struct nfs4_flexfile_layout *ffl;
> +
> + ffl = kzalloc(sizeof(*ffl), gfp_flags);
> + if (ffl != NULL)
> + INIT_LIST_HEAD(&ffl->error_list);
> + return ffl != NULL ? &ffl->generic_hdr : NULL;
> +}
> +
> +static void
> +ff_layout_free_layout_hdr(struct pnfs_layout_hdr *lo)
> +{
> + struct nfs4_ff_layout_ds_err *err, *n;
> +
> + list_for_each_entry_safe(err, n, &FF_LAYOUT_FROM_HDR(lo)->error_list,
> + list) {
> + list_del(&err->list);
> + kfree(err);
> + }
> + kfree(FF_LAYOUT_FROM_HDR(lo));
> +}
> +
> +static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
> +{
> + __be32 *p;
> +
> + p = xdr_inline_decode(xdr, NFS4_STATEID_SIZE);
> + if (unlikely(p == NULL))
> + return -ENOBUFS;
> + memcpy(stateid, p, NFS4_STATEID_SIZE);
> + dprintk("%s: stateid id= [%x%x%x%x]\n", __func__,
> + p[0], p[1], p[2], p[3]);
> + return 0;
> +}
> +
> +static int decode_deviceid(struct xdr_stream *xdr, struct nfs4_deviceid *devid)
> +{
> + __be32 *p;
> +
> + p = xdr_inline_decode(xdr, NFS4_DEVICEID4_SIZE);
> + if (unlikely(!p))
> + return -ENOBUFS;
> + memcpy(devid, p, NFS4_DEVICEID4_SIZE);
> + nfs4_print_deviceid(devid);
> + return 0;
> +}
> +
> +static int decode_nfs_fh(struct xdr_stream *xdr, struct nfs_fh *fh)
> +{
> + __be32 *p;
> +
> + p = xdr_inline_decode(xdr, 4);
> + if (unlikely(!p))
> + return -ENOBUFS;
> + fh->size = be32_to_cpup(p++);
> + if (fh->size > NFS_MAXFHSIZE) {
> + printk(KERN_ERR "NFS flexfiles: Too big fh received %d\n",
> + fh->size);
> + return -EOVERFLOW;
> + }
> + /* fh.data */
> + p = xdr_inline_decode(xdr, fh->size);
> + if (unlikely(!p))
> + return -ENOBUFS;
> + memcpy(&fh->data, p, fh->size);
> + dprintk("%s: fh len %d\n", __func__, fh->size);
> +
> + return 0;
> +}
> +
> +/*
> + * we only handle AUTH_NONE and AUTH_UNIX for now.
> + *
> + * For AUTH_UNIX, we want to parse
> + * struct authsys_parms {
> + * unsigned int stamp;
> + * string machinename<255>;
> + * unsigned int uid;
> + * unsigned int gid;
> + * unsigned int gids<16>;
> + * };
> + */
> +static int
> +ff_layout_parse_auth(struct xdr_stream *xdr,
> + struct nfs4_ff_layout_mirror *mirror)
> +{
> + __be32 *p;
> + int flavor, len, gid_it = 0;
> +
> + /* authflavor(4) + opaque_length(4)*/
> + p = xdr_inline_decode(xdr, 8);
> + if (unlikely(!p))
> + return -ENOBUFS;
> + flavor = be32_to_cpup(p++);
> + len = be32_to_cpup(p++);
> + if (flavor < RPC_AUTH_NULL || flavor >= RPC_AUTH_MAXFLAVOR ||
> + len < 0)
> + return -EINVAL;
> +
> + dprintk("%s: flavor %u len %u\n", __func__, flavor, len);
> +
> + if (flavor == RPC_AUTH_NULL && len == 0)
> + goto out_fill;
> +
> + /* opaque body */
> + p = xdr_inline_decode(xdr, len);
> + if (unlikely(!p))
> + return -ENOBUFS;
> +
> + if (flavor == RPC_AUTH_NULL) {
> + mirror->uid = -1;
> + mirror->gid = -1;
> + } else if (flavor == RPC_AUTH_UNIX) {
> + int len2;
> +
> + p++; /* stamp */
> + len2 = be32_to_cpup(p++); /* machinename length */
> + dprintk("%s: machinename length %u\n", __func__, len2);
> + if (len2 < 0 || len2 >= len || len2 > 255)
> + return -EINVAL;
> + p += XDR_QUADLEN(len2); /* machinename */
> +
> + mirror->uid = be32_to_cpup(p++);
> + mirror->gid = be32_to_cpup(p++);
> +
> + len2 = be32_to_cpup(p++); /* gid array length */
> + dprintk("%s: gid array length %u\n", __func__, len2);
> + if (len2 > 16)
> + return -EINVAL;
> + for (; gid_it < len2; gid_it++)
> + mirror->gids[gid_it] = be32_to_cpup(p++);
> + } else {
> + return -EPROTONOSUPPORT;
> + }
> +
> +out_fill:
> + /* filling the rest of gids */
> + for (; gid_it < 16; gid_it++)
> + mirror->gids[gid_it] = -1;
> +
> + return 0;
> +}
> +
> +static void ff_layout_free_mirror_array(struct nfs4_ff_layout_segment *fls)
> +{
> + int i;
> +
> + if (fls->mirror_array) {
> + for (i = 0; i < fls->mirror_array_cnt; i++) {
> + /* normally mirror_ds is freed in
> + * .free_deviceid_node but we still do it here
> + * for .alloc_lseg error path */
> + if (fls->mirror_array[i]) {
> + kfree(fls->mirror_array[i]->fh_versions);
> + nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
> + kfree(fls->mirror_array[i]);
> + }
> + }
> + kfree(fls->mirror_array);
> + fls->mirror_array = NULL;
> + }
> +}
> +
> +static int ff_layout_check_layout(struct nfs4_layoutget_res *lgr)
> +{
> + int ret = 0;
> +
> + dprintk("--> %s\n", __func__);
> +
> + /* FIXME: remove this check when layout segment support is added */
> + if (lgr->range.offset != 0 ||
> + lgr->range.length != NFS4_MAX_UINT64) {
> + dprintk("%s Only whole file layouts supported. Use MDS i/o\n",
> + __func__);
> + ret = -EINVAL;
> + }
> +
> + dprintk("<-- %s returns %d\n", __func__, ret);
> + return ret;
> +}
> +
> +static void _ff_layout_free_lseg(struct nfs4_ff_layout_segment *fls)
> +{
> + if (fls) {
> + ff_layout_free_mirror_array(fls);
> + kfree(fls);
> + }
> +}
> +
> +static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
> +{
> + struct nfs4_ff_layout_mirror *tmp;
> + int i, j;
> +
> + for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
> + for (j = i + 1; j < fls->mirror_array_cnt; j++)
> + if (fls->mirror_array[i]->efficiency <
> + fls->mirror_array[j]->efficiency) {
> + tmp = fls->mirror_array[i];
> + fls->mirror_array[i] = fls->mirror_array[j];
> + fls->mirror_array[j] = tmp;
> + }
> + }
> +}
> +
> +static struct pnfs_layout_segment *
> +ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
> + struct nfs4_layoutget_res *lgr,
> + gfp_t gfp_flags)
> +{
> + struct pnfs_layout_segment *ret;
> + struct nfs4_ff_layout_segment *fls = NULL;
> + struct xdr_stream stream;
> + struct xdr_buf buf;
> + struct page *scratch;
> + u64 stripe_unit;
> + u32 mirror_array_cnt;
> + __be32 *p;
> + int i, rc;
> +
> + dprintk("--> %s\n", __func__);
> + scratch = alloc_page(gfp_flags);
> + if (!scratch)
> + return ERR_PTR(-ENOMEM);
> +
> + xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages,
> + lgr->layoutp->len);
> + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
> +
> + /* stripe unit and mirror_array_cnt */
> + rc = -EIO;
> + p = xdr_inline_decode(&stream, 8 + 4);
> + if (!p)
> + goto out_err_free;
> +
> + p = xdr_decode_hyper(p, &stripe_unit);
> + mirror_array_cnt = be32_to_cpup(p++);
> + dprintk("%s: stripe_unit=%llu mirror_array_cnt=%u\n", __func__,
> + stripe_unit, mirror_array_cnt);
> +
> + if (mirror_array_cnt > NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT ||
> + mirror_array_cnt == 0)
> + goto out_err_free;
> +
> + rc = -ENOMEM;
> + fls = kzalloc(sizeof(*fls), gfp_flags);
> + if (!fls)
> + goto out_err_free;
> +
> + fls->mirror_array_cnt = mirror_array_cnt;
> + fls->stripe_unit = stripe_unit;
> + fls->mirror_array = kcalloc(fls->mirror_array_cnt,
> + sizeof(fls->mirror_array[0]), gfp_flags);
> + if (fls->mirror_array == NULL)
> + goto out_err_free;
> +
> + for (i = 0; i < fls->mirror_array_cnt; i++) {
> + struct nfs4_deviceid devid;
> + struct nfs4_deviceid_node *idnode;
> + u32 ds_count;
> + u32 fh_count;
> + int j;
> +
> + rc = -EIO;
> + p = xdr_inline_decode(&stream, 4);
> + if (!p)
> + goto out_err_free;
> + ds_count = be32_to_cpup(p);
> +
> + /* FIXME: allow for striping? */
> + if (ds_count != 1)
> + goto out_err_free;
> +
> + fls->mirror_array[i] =
> + kzalloc(sizeof(struct nfs4_ff_layout_mirror),
> + gfp_flags);
> + if (fls->mirror_array[i] == NULL) {
> + rc = -ENOMEM;
> + goto out_err_free;
> + }
> +
> + spin_lock_init(&fls->mirror_array[i]->lock);
> + fls->mirror_array[i]->ds_count = ds_count;
> +
> + /* deviceid */
> + rc = decode_deviceid(&stream, &devid);
> + if (rc)
> + goto out_err_free;
> +
> + idnode = nfs4_find_get_deviceid(NFS_SERVER(lh->plh_inode),
> + &devid, lh->plh_lc_cred,
> + gfp_flags);
> + /*
> + * upon success, mirror_ds is allocated by previous
> + * getdeviceinfo, or newly by .alloc_deviceid_node
> + * nfs4_find_get_deviceid failure is indeed getdeviceinfo failure
> + */
> + if (idnode)
> + fls->mirror_array[i]->mirror_ds =
> + FF_LAYOUT_MIRROR_DS(idnode);
> + else
> + goto out_err_free;
> +
> + /* efficiency */
> + rc = -EIO;
> + p = xdr_inline_decode(&stream, 4);
> + if (!p)
> + goto out_err_free;
> + fls->mirror_array[i]->efficiency = be32_to_cpup(p);
> +
> + /* stateid */
> + rc = decode_stateid(&stream, &fls->mirror_array[i]->stateid);
> + if (rc)
> + goto out_err_free;
> +
> + /* fh */
> + p = xdr_inline_decode(&stream, 4);
> + if (!p)
> + goto out_err_free;
> + fh_count = be32_to_cpup(p);
> +
> + fls->mirror_array[i]->fh_versions =
> + kzalloc(fh_count * sizeof(struct nfs_fh),
> + gfp_flags);
> + if (fls->mirror_array[i]->fh_versions == NULL) {
> + rc = -ENOMEM;
> + goto out_err_free;
> + }
> +
> + for (j = 0; j < fh_count; j++) {
> + rc = decode_nfs_fh(&stream,
> + &fls->mirror_array[i]->fh_versions[j]);
> + if (rc)
> + goto out_err_free;
> + }
> +
> + fls->mirror_array[i]->fh_versions_cnt = fh_count;
> +
> + /* opaque_auth */
> + rc = ff_layout_parse_auth(&stream, fls->mirror_array[i]);
> + if (rc)
> + goto out_err_free;
> +
> + dprintk("%s: uid %d gid %d\n", __func__,
> + fls->mirror_array[i]->uid,
> + fls->mirror_array[i]->gid);
> + }
> +
> + ff_layout_sort_mirrors(fls);
> + rc = ff_layout_check_layout(lgr);
> + if (rc)
> + goto out_err_free;
> +
> + ret = &fls->generic_hdr;
> + dprintk("<-- %s (success)\n", __func__);
> +out_free_page:
> + __free_page(scratch);
> + return ret;
> +out_err_free:
> + _ff_layout_free_lseg(fls);
> + ret = ERR_PTR(rc);
> + dprintk("<-- %s (%d)\n", __func__, rc);
> + goto out_free_page;
> +}
> +
> +static bool ff_layout_has_rw_segments(struct pnfs_layout_hdr *layout)
> +{
> + struct pnfs_layout_segment *lseg;
> +
> + list_for_each_entry(lseg, &layout->plh_segs, pls_list)
> + if (lseg->pls_range.iomode == IOMODE_RW)
> + return true;
> +
> + return false;
> +}
> +
> +static void
> +ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
> +{
> + struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
> + int i;
> +
> + dprintk("--> %s\n", __func__);
> +
> + for (i = 0; i < fls->mirror_array_cnt; i++) {
> + if (fls->mirror_array[i]) {
> + nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
> + fls->mirror_array[i]->mirror_ds = NULL;
> + if (fls->mirror_array[i]->cred) {
> + put_rpccred(fls->mirror_array[i]->cred);
> + fls->mirror_array[i]->cred = NULL;
> + }
> + }
> + }
> +
> + if (lseg->pls_range.iomode == IOMODE_RW) {
> + struct nfs4_flexfile_layout *ffl;
> + struct inode *inode;
> +
> + ffl = FF_LAYOUT_FROM_HDR(lseg->pls_layout);
> + inode = ffl->generic_hdr.plh_inode;
> + spin_lock(&inode->i_lock);
> + if (!ff_layout_has_rw_segments(lseg->pls_layout)) {
> + ffl->commit_info.nbuckets = 0;
> + kfree(ffl->commit_info.buckets);
> + ffl->commit_info.buckets = NULL;
> + }
> + spin_unlock(&inode->i_lock);
> + }
> + _ff_layout_free_lseg(fls);
> +}
> +
> +/* Return 1 until we have multiple lsegs support */
> +static int
> +ff_layout_get_lseg_count(struct nfs4_ff_layout_segment *fls)
> +{
> + return 1;
> +}
> +
> +static int
> +ff_layout_alloc_commit_info(struct pnfs_layout_segment *lseg,
> + struct nfs_commit_info *cinfo,
> + gfp_t gfp_flags)
> +{
> + struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
> + struct pnfs_commit_bucket *buckets;
> + int size;
> +
> + if (cinfo->ds->nbuckets != 0) {
> + /* This assumes there is only one RW lseg per file.
> + * To support multiple lseg per file, we need to
> + * change struct pnfs_commit_bucket to allow dynamic
> + * increasing nbuckets.
> + */
> + return 0;
> + }
> +
> + size = ff_layout_get_lseg_count(fls) * FF_LAYOUT_MIRROR_COUNT(lseg);
> +
> + buckets = kcalloc(size, sizeof(struct pnfs_commit_bucket),
> + gfp_flags);
> + if (!buckets)
> + return -ENOMEM;
> + else {
> + int i;
> +
> + spin_lock(cinfo->lock);
> + if (cinfo->ds->nbuckets != 0)
> + kfree(buckets);
> + else {
> + cinfo->ds->buckets = buckets;
> + cinfo->ds->nbuckets = size;
> + for (i = 0; i < size; i++) {
> + INIT_LIST_HEAD(&buckets[i].written);
> + INIT_LIST_HEAD(&buckets[i].committing);
> + /* mark direct verifier as unset */
> + buckets[i].direct_verf.committed =
> + NFS_INVALID_STABLE_HOW;
> + }
> + }
> + spin_unlock(cinfo->lock);
> + return 0;
> + }
> +}
> +
> +static struct nfs4_pnfs_ds *
> +ff_layout_choose_best_ds_for_read(struct nfs_pageio_descriptor *pgio,
> + int *best_idx)
> +{
> + struct nfs4_ff_layout_segment *fls;
> + struct nfs4_pnfs_ds *ds;
> + int idx;
> +
> + fls = FF_LAYOUT_LSEG(pgio->pg_lseg);
> + /* mirrors are sorted by efficiency */
> + for (idx = 0; idx < fls->mirror_array_cnt; idx++) {
> + ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, idx, false);
> + if (ds) {
> + *best_idx = idx;
> + return ds;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static void
> +ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
> + struct nfs_page *req)
> +{
> + struct nfs_pgio_mirror *pgm;
> + struct nfs4_ff_layout_mirror *mirror;
> + struct nfs4_pnfs_ds *ds;
> + int ds_idx;
> +
> + /* Use full layout for now */
> + if (!pgio->pg_lseg)
> + pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
> + req->wb_context,
> + 0,
> + NFS4_MAX_UINT64,
> + IOMODE_READ,
> + GFP_KERNEL);
> + /* If no lseg, fall back to read through mds */
> + if (pgio->pg_lseg == NULL)
> + goto out_mds;
> +
> + ds = ff_layout_choose_best_ds_for_read(pgio, &ds_idx);
> + if (!ds)
> + goto out_mds;
> + mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
> +
> + pgio->pg_mirror_idx = ds_idx;
> +
> + /* read always uses only one mirror - idx 0 for pgio layer */
> + pgm = &pgio->pg_mirrors[0];
> + pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
> +
> + return;
> +out_mds:
> + pnfs_put_lseg(pgio->pg_lseg);
> + pgio->pg_lseg = NULL;
> + nfs_pageio_reset_read_mds(pgio);
> +}
> +
> +static void
> +ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
> + struct nfs_page *req)
> +{
> + struct nfs4_ff_layout_mirror *mirror;
> + struct nfs_pgio_mirror *pgm;
> + struct nfs_commit_info cinfo;
> + struct nfs4_pnfs_ds *ds;
> + int i;
> + int status;
> +
> + if (!pgio->pg_lseg)
> + pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
> + req->wb_context,
> + 0,
> + NFS4_MAX_UINT64,
> + IOMODE_RW,
> + GFP_NOFS);
> + /* If no lseg, fall back to write through mds */
> + if (pgio->pg_lseg == NULL)
> + goto out_mds;
> +
> + nfs_init_cinfo(&cinfo, pgio->pg_inode, pgio->pg_dreq);
> + status = ff_layout_alloc_commit_info(pgio->pg_lseg, &cinfo, GFP_NOFS);
> + if (status < 0)
> + goto out_mds;
> +
> + /* Use a direct mapping of ds_idx to pgio mirror_idx */
> + if (WARN_ON_ONCE(pgio->pg_mirror_count !=
> + FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg)))
> + goto out_mds;
> +
> + for (i = 0; i < pgio->pg_mirror_count; i++) {
> + ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, i, true);
> + if (!ds)
> + goto out_mds;
> + pgm = &pgio->pg_mirrors[i];
> + mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
> + pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
> + }
> +
> + return;
> +
> +out_mds:
> + pnfs_put_lseg(pgio->pg_lseg);
> + pgio->pg_lseg = NULL;
> + nfs_pageio_reset_write_mds(pgio);
> +}
> +
> +static unsigned int
> +ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
> + struct nfs_page *req)
> +{
> + if (!pgio->pg_lseg)
> + pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
> + req->wb_context,
> + 0,
> + NFS4_MAX_UINT64,
> + IOMODE_RW,
> + GFP_NOFS);
> + if (pgio->pg_lseg)
> + return FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg);
> +
> + /* no lseg means that pnfs is not in use, so no mirroring here */
> + pnfs_put_lseg(pgio->pg_lseg);
> + pgio->pg_lseg = NULL;
> + nfs_pageio_reset_write_mds(pgio);
> + return 1;
> +}
> +
> +static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
> + .pg_init = ff_layout_pg_init_read,
> + .pg_test = pnfs_generic_pg_test,
> + .pg_doio = pnfs_generic_pg_readpages,
> + .pg_cleanup = pnfs_generic_pg_cleanup,
> +};
> +
> +static const struct nfs_pageio_ops ff_layout_pg_write_ops = {
> + .pg_init = ff_layout_pg_init_write,
> + .pg_test = pnfs_generic_pg_test,
> + .pg_doio = pnfs_generic_pg_writepages,
> + .pg_get_mirror_count = ff_layout_pg_get_mirror_count_write,
> + .pg_cleanup = pnfs_generic_pg_cleanup,
> +};
> +
> +static void ff_layout_reset_write(struct nfs_pgio_header *hdr, bool retry_pnfs)
> +{
> + struct rpc_task *task = &hdr->task;
> +
> + pnfs_layoutcommit_inode(hdr->inode, false);
> +
> + if (retry_pnfs) {
> + dprintk("%s Reset task %5u for i/o through pNFS "
> + "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
> + hdr->task.tk_pid,
> + hdr->inode->i_sb->s_id,
> + (unsigned long long)NFS_FILEID(hdr->inode),
> + hdr->args.count,
> + (unsigned long long)hdr->args.offset);
> +
> + if (!hdr->dreq) {
> + struct nfs_open_context *ctx;
> +
> + ctx = nfs_list_entry(hdr->pages.next)->wb_context;
> + set_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
> + hdr->completion_ops->error_cleanup(&hdr->pages);
> + } else {
> + nfs_direct_set_resched_writes(hdr->dreq);
> + }
> + return;
> + }
> +
> + if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
> + dprintk("%s Reset task %5u for i/o through MDS "
> + "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
> + hdr->task.tk_pid,
> + hdr->inode->i_sb->s_id,
> + (unsigned long long)NFS_FILEID(hdr->inode),
> + hdr->args.count,
> + (unsigned long long)hdr->args.offset);
> +
> + task->tk_status = pnfs_write_done_resend_to_mds(hdr);
> + }
> +}
> +
> +static void ff_layout_reset_read(struct nfs_pgio_header *hdr)
> +{
> + struct rpc_task *task = &hdr->task;
> +
> + pnfs_layoutcommit_inode(hdr->inode, false);
> +
> + if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
> + dprintk("%s Reset task %5u for i/o through MDS "
> + "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
> + hdr->task.tk_pid,
> + hdr->inode->i_sb->s_id,
> + (unsigned long long)NFS_FILEID(hdr->inode),
> + hdr->args.count,
> + (unsigned long long)hdr->args.offset);
> +
> + task->tk_status = pnfs_read_done_resend_to_mds(hdr);
> + }
> +}
> +
> +static int ff_layout_async_handle_error_v4(struct rpc_task *task,
> + struct nfs4_state *state,
> + struct nfs_client *clp,
> + struct pnfs_layout_segment *lseg,
> + int idx)
> +{
> + struct pnfs_layout_hdr *lo = lseg->pls_layout;
> + struct inode *inode = lo->plh_inode;
> + struct nfs_server *mds_server = NFS_SERVER(inode);
> +
> + struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
> + struct nfs_client *mds_client = mds_server->nfs_client;
> + struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
> +
> + if (task->tk_status >= 0)
> + return 0;
> +
> + switch (task->tk_status) {
> + /* MDS state errors */
> + case -NFS4ERR_DELEG_REVOKED:
> + case -NFS4ERR_ADMIN_REVOKED:
> + case -NFS4ERR_BAD_STATEID:
> + if (state == NULL)
> + break;
> + nfs_remove_bad_delegation(state->inode);
> + case -NFS4ERR_OPENMODE:
> + if (state == NULL)
> + break;
> + if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
> + goto out_bad_stateid;
> + goto wait_on_recovery;
> + case -NFS4ERR_EXPIRED:
> + if (state != NULL) {
> + if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
> + goto out_bad_stateid;
> + }
> + nfs4_schedule_lease_recovery(mds_client);
> + goto wait_on_recovery;
> + /* DS session errors */
> + case -NFS4ERR_BADSESSION:
> + case -NFS4ERR_BADSLOT:
> + case -NFS4ERR_BAD_HIGH_SLOT:
> + case -NFS4ERR_DEADSESSION:
> + case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
> + case -NFS4ERR_SEQ_FALSE_RETRY:
> + case -NFS4ERR_SEQ_MISORDERED:
> + dprintk("%s ERROR %d, Reset session. Exchangeid "
> + "flags 0x%x\n", __func__, task->tk_status,
> + clp->cl_exchange_flags);
> + nfs4_schedule_session_recovery(clp->cl_session, task->tk_status);
> + break;
> + case -NFS4ERR_DELAY:
> + case -NFS4ERR_GRACE:
> + rpc_delay(task, FF_LAYOUT_POLL_RETRY_MAX);
> + break;
> + case -NFS4ERR_RETRY_UNCACHED_REP:
> + break;
> + /* Invalidate Layout errors */
> + case -NFS4ERR_PNFS_NO_LAYOUT:
> + case -ESTALE: /* mapped NFS4ERR_STALE */
> + case -EBADHANDLE: /* mapped NFS4ERR_BADHANDLE */
> + case -EISDIR: /* mapped NFS4ERR_ISDIR */
> + case -NFS4ERR_FHEXPIRED:
> + case -NFS4ERR_WRONG_TYPE:
> + dprintk("%s Invalid layout error %d\n", __func__,
> + task->tk_status);
> + /*
> + * Destroy layout so new i/o will get a new layout.
> + * Layout will not be destroyed until all current lseg
> + * references are put. Mark layout as invalid to resend failed
> + * i/o and all i/o waiting on the slot table to the MDS until
> + * layout is destroyed and a new valid layout is obtained.
> + */
> + pnfs_destroy_layout(NFS_I(inode));
> + rpc_wake_up(&tbl->slot_tbl_waitq);
> + goto reset;
> + /* RPC connection errors */
> + case -ECONNREFUSED:
> + case -EHOSTDOWN:
> + case -EHOSTUNREACH:
> + case -ENETUNREACH:
> + case -EIO:
> + case -ETIMEDOUT:
> + case -EPIPE:
> + dprintk("%s DS connection error %d\n", __func__,
> + task->tk_status);
> + nfs4_mark_deviceid_unavailable(devid);
> + rpc_wake_up(&tbl->slot_tbl_waitq);
> + /* fall through */
> + default:
> + if (ff_layout_has_available_ds(lseg))
> + return -NFS4ERR_RESET_TO_PNFS;
> +reset:
> + dprintk("%s Retry through MDS. Error %d\n", __func__,
> + task->tk_status);
> + return -NFS4ERR_RESET_TO_MDS;
> + }
> +out:
> + task->tk_status = 0;
> + return -EAGAIN;
> +out_bad_stateid:
> + task->tk_status = -EIO;
> + return 0;
> +wait_on_recovery:
> + rpc_sleep_on(&mds_client->cl_rpcwaitq, task, NULL);
> + if (test_bit(NFS4CLNT_MANAGER_RUNNING, &mds_client->cl_state) == 0)
> + rpc_wake_up_queued_task(&mds_client->cl_rpcwaitq, task);
> + goto out;
> +}
> +
> +/* Retry all errors through either pNFS or MDS except for -EJUKEBOX */
> +static int ff_layout_async_handle_error_v3(struct rpc_task *task,
> + struct pnfs_layout_segment *lseg,
> + int idx)
> +{
> + struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
> +
> + if (task->tk_status >= 0)
> + return 0;
> +
> + if (task->tk_status != -EJUKEBOX) {
> + dprintk("%s DS connection error %d\n", __func__,
> + task->tk_status);
> + nfs4_mark_deviceid_unavailable(devid);
> + if (ff_layout_has_available_ds(lseg))
> + return -NFS4ERR_RESET_TO_PNFS;
> + else
> + return -NFS4ERR_RESET_TO_MDS;
> + }
> +
> + if (task->tk_status == -EJUKEBOX)
> + nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY);
> + task->tk_status = 0;
> + rpc_restart_call(task);
> + rpc_delay(task, NFS_JUKEBOX_RETRY_TIME);
> + return -EAGAIN;
> +}
> +
> +static int ff_layout_async_handle_error(struct rpc_task *task,
> + struct nfs4_state *state,
> + struct nfs_client *clp,
> + struct pnfs_layout_segment *lseg,
> + int idx)
> +{
> + int vers = clp->cl_nfs_mod->rpc_vers->number;
> +
> + switch (vers) {
> + case 3:
> + return ff_layout_async_handle_error_v3(task, lseg, idx);
> + case 4:
> + return ff_layout_async_handle_error_v4(task, state, clp,
> + lseg, idx);
> + default:
> + /* should never happen */
> + WARN_ON_ONCE(1);
> + return 0;
> + }
> +}
> +
> +static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
> + int idx, u64 offset, u64 length,
> + u32 status, int opnum)
> +{
> + struct nfs4_ff_layout_mirror *mirror;
> + int err;
> +
> + mirror = FF_LAYOUT_COMP(lseg, idx);
> + err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
> + mirror, offset, length, status, opnum,
> + GFP_NOIO);
> + dprintk("%s: err %d op %d status %u\n", __func__, err, opnum, status);
> +}
> +
> +/* NFS_PROTO call done callback routines */
> +
> +static int ff_layout_read_done_cb(struct rpc_task *task,
> + struct nfs_pgio_header *hdr)
> +{
> + struct inode *inode;
> + int err;
> +
> + trace_nfs4_pnfs_read(hdr, task->tk_status);
> + if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
> + hdr->res.op_status = NFS4ERR_NXIO;
> + if (task->tk_status < 0 && hdr->res.op_status)
> + ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
> + hdr->args.offset, hdr->args.count,
> + hdr->res.op_status, OP_READ);
> + err = ff_layout_async_handle_error(task, hdr->args.context->state,
> + hdr->ds_clp, hdr->lseg,
> + hdr->pgio_mirror_idx);
> +
> + switch (err) {
> + case -NFS4ERR_RESET_TO_PNFS:
> + set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
> + &hdr->lseg->pls_layout->plh_flags);
> + pnfs_read_resend_pnfs(hdr);
> + return task->tk_status;
> + case -NFS4ERR_RESET_TO_MDS:
> + inode = hdr->lseg->pls_layout->plh_inode;
> + pnfs_error_mark_layout_for_return(inode, hdr->lseg);
> + ff_layout_reset_read(hdr);
> + return task->tk_status;
> + case -EAGAIN:
> + rpc_restart_call_prepare(task);
> + return -EAGAIN;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * We reference the rpc_cred of the first WRITE that triggers the need for
> + * a LAYOUTCOMMIT, and use it to send the layoutcommit compound.
> + * rfc5661 is not clear about which credential should be used.
> + *
> + * Flexlayout client should treat DS replied FILE_SYNC as DATA_SYNC, so
> + * to follow http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
> + * we always send layoutcommit after DS writes.
> + */
> +static void
> +ff_layout_set_layoutcommit(struct nfs_pgio_header *hdr)
> +{
> + pnfs_set_layoutcommit(hdr);
> + dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
> + (unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
> +}
> +
> +static bool
> +ff_layout_reset_to_mds(struct pnfs_layout_segment *lseg, int idx)
> +{
> + /* No mirroring for now */
> + struct nfs4_deviceid_node *node = FF_LAYOUT_DEVID_NODE(lseg, idx);
> +
> + return ff_layout_test_devid_unavailable(node);
> +}
> +
> +static int ff_layout_read_prepare_common(struct rpc_task *task,
> + struct nfs_pgio_header *hdr)
> +{
> + if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
> + rpc_exit(task, -EIO);
> + return -EIO;
> + }
> + if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
> + dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
> + if (ff_layout_has_available_ds(hdr->lseg))
> + pnfs_read_resend_pnfs(hdr);
> + else
> + ff_layout_reset_read(hdr);
> + rpc_exit(task, 0);
> + return -EAGAIN;
> + }
> + hdr->pgio_done_cb = ff_layout_read_done_cb;
> +
> + return 0;
> +}
> +
> +/*
> + * Call ops for the async read/write cases
> + * In the case of dense layouts, the offset needs to be reset to its
> + * original value.
> + */
> +static void ff_layout_read_prepare_v3(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + if (ff_layout_read_prepare_common(task, hdr))
> + return;
> +
> + rpc_call_start(task);
> +}
> +
> +static int ff_layout_setup_sequence(struct nfs_client *ds_clp,
> + struct nfs4_sequence_args *args,
> + struct nfs4_sequence_res *res,
> + struct rpc_task *task)
> +{
> + if (ds_clp->cl_session)
> + return nfs41_setup_sequence(ds_clp->cl_session,
> + args,
> + res,
> + task);
> + return nfs40_setup_sequence(ds_clp->cl_slot_tbl,
> + args,
> + res,
> + task);
I'm not quite seeing how we would end up calling the NFS v4.0 function here.
> +}
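To make the question above concrete, here is a standalone model of the dispatch as I read it. The mock_* types and names are hypothetical stand-ins, not the kernel structures; the only thing carried over from the patch is the test on cl_session. If ds_clp always has a session by the time we get here, the second return is dead code. It is reachable only if a DS client can be set up without a session (for example an NFSv4.0 DS, assuming the layout driver allows that).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; illustration only. */
struct mock_session { int id; };
struct mock_client {
	struct mock_session *cl_session;	/* NULL when no v4.1 session exists */
};

enum mock_seq_path { MOCK_SEQ_V41, MOCK_SEQ_V40 };

/* Mirrors the branch in ff_layout_setup_sequence(): a session selects v4.1. */
static enum mock_seq_path mock_setup_sequence(const struct mock_client *clp)
{
	if (clp->cl_session)
		return MOCK_SEQ_V41;
	return MOCK_SEQ_V40;
}
```

So the v4.0 call only makes sense if some path creates a DS nfs_client with cl_session left NULL. If that is intended, a short comment saying so would help.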
> +
> +static void ff_layout_read_prepare_v4(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + if (ff_layout_read_prepare_common(task, hdr))
> + return;
> +
> + if (ff_layout_setup_sequence(hdr->ds_clp,
> + &hdr->args.seq_args,
> + &hdr->res.seq_res,
> + task))
> + return;
> +
> + if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
> + hdr->args.lock_context, FMODE_READ) == -EIO)
> + rpc_exit(task, -EIO); /* lost lock, terminate I/O */
> +}
> +
> +static void ff_layout_read_call_done(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
> +
> + if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
> + task->tk_status == 0) {
> + nfs4_sequence_done(task, &hdr->res.seq_res);
> + return;
> + }
> +
> + /* Note this may cause RPC to be resent */
> + hdr->mds_ops->rpc_call_done(task, hdr);
> +}
> +
> +static void ff_layout_read_count_stats(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + rpc_count_iostats_metrics(task,
> + &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_READ]);
> +}
> +
> +static int ff_layout_write_done_cb(struct rpc_task *task,
> + struct nfs_pgio_header *hdr)
> +{
> + struct inode *inode;
> + int err;
> +
> + trace_nfs4_pnfs_write(hdr, task->tk_status);
> + if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
> + hdr->res.op_status = NFS4ERR_NXIO;
> + if (task->tk_status < 0 && hdr->res.op_status)
> + ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
> + hdr->args.offset, hdr->args.count,
> + hdr->res.op_status, OP_WRITE);
> + err = ff_layout_async_handle_error(task, hdr->args.context->state,
> + hdr->ds_clp, hdr->lseg,
> + hdr->pgio_mirror_idx);
> +
> + switch (err) {
> + case -NFS4ERR_RESET_TO_PNFS:
> + case -NFS4ERR_RESET_TO_MDS:
> + inode = hdr->lseg->pls_layout->plh_inode;
> + pnfs_error_mark_layout_for_return(inode, hdr->lseg);
> + if (err == -NFS4ERR_RESET_TO_PNFS) {
> + pnfs_set_retry_layoutget(hdr->lseg->pls_layout);
> + ff_layout_reset_write(hdr, true);
> + } else {
> + pnfs_clear_retry_layoutget(hdr->lseg->pls_layout);
> + ff_layout_reset_write(hdr, false);
> + }
> + return task->tk_status;
> + case -EAGAIN:
> + rpc_restart_call_prepare(task);
> + return -EAGAIN;
> + }
> +
> + if (hdr->res.verf->committed == NFS_FILE_SYNC ||
> + hdr->res.verf->committed == NFS_DATA_SYNC)
> + ff_layout_set_layoutcommit(hdr);
> +
> + return 0;
> +}
> +
> +static int ff_layout_commit_done_cb(struct rpc_task *task,
> + struct nfs_commit_data *data)
> +{
> + struct inode *inode;
> + int err;
> +
> + trace_nfs4_pnfs_commit_ds(data, task->tk_status);
> + if (task->tk_status == -ETIMEDOUT && !data->res.op_status)
> + data->res.op_status = NFS4ERR_NXIO;
> + if (task->tk_status < 0 && data->res.op_status)
> + ff_layout_io_track_ds_error(data->lseg, data->ds_commit_index,
> + data->args.offset, data->args.count,
> + data->res.op_status, OP_COMMIT);
> + err = ff_layout_async_handle_error(task, NULL, data->ds_clp,
> + data->lseg, data->ds_commit_index);
> +
> + switch (err) {
> + case -NFS4ERR_RESET_TO_PNFS:
> + case -NFS4ERR_RESET_TO_MDS:
> + inode = data->lseg->pls_layout->plh_inode;
> + pnfs_error_mark_layout_for_return(inode, data->lseg);
> + if (err == -NFS4ERR_RESET_TO_PNFS)
> + pnfs_set_retry_layoutget(data->lseg->pls_layout);
> + else
> + pnfs_clear_retry_layoutget(data->lseg->pls_layout);
> + pnfs_generic_prepare_to_resend_writes(data);
> + return -EAGAIN;
> + case -EAGAIN:
> + rpc_restart_call_prepare(task);
> + return -EAGAIN;
> + }
> +
> + if (data->verf.committed == NFS_UNSTABLE)
> + pnfs_commit_set_layoutcommit(data);
> +
> + return 0;
> +}
> +
> +static int ff_layout_write_prepare_common(struct rpc_task *task,
> + struct nfs_pgio_header *hdr)
> +{
> + if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
> + rpc_exit(task, -EIO);
> + return -EIO;
> + }
> +
> + if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
> + bool retry_pnfs;
> +
> + retry_pnfs = ff_layout_has_available_ds(hdr->lseg);
> + dprintk("%s task %u reset io to %s\n", __func__,
> + task->tk_pid, retry_pnfs ? "pNFS" : "MDS");
> + ff_layout_reset_write(hdr, retry_pnfs);
> + rpc_exit(task, 0);
> + return -EAGAIN;
> + }
> +
> + return 0;
> +}
> +
> +static void ff_layout_write_prepare_v3(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + if (ff_layout_write_prepare_common(task, hdr))
> + return;
> +
> + rpc_call_start(task);
> +}
> +
> +static void ff_layout_write_prepare_v4(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + if (ff_layout_write_prepare_common(task, hdr))
> + return;
> +
> + if (ff_layout_setup_sequence(hdr->ds_clp,
> + &hdr->args.seq_args,
> + &hdr->res.seq_res,
> + task))
> + return;
> +
> + if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
> + hdr->args.lock_context, FMODE_WRITE) == -EIO)
> + rpc_exit(task, -EIO); /* lost lock, terminate I/O */
> +}
> +
> +static void ff_layout_write_call_done(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
> + task->tk_status == 0) {
> + nfs4_sequence_done(task, &hdr->res.seq_res);
> + return;
> + }
> +
> + /* Note this may cause RPC to be resent */
> + hdr->mds_ops->rpc_call_done(task, hdr);
> +}
> +
> +static void ff_layout_write_count_stats(struct rpc_task *task, void *data)
> +{
> + struct nfs_pgio_header *hdr = data;
> +
> + rpc_count_iostats_metrics(task,
> + &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_WRITE]);
> +}
> +
> +static void ff_layout_commit_prepare_v3(struct rpc_task *task, void *data)
> +{
> + rpc_call_start(task);
> +}
> +
> +static void ff_layout_commit_prepare_v4(struct rpc_task *task, void *data)
> +{
> + struct nfs_commit_data *wdata = data;
> +
> + ff_layout_setup_sequence(wdata->ds_clp,
> + &wdata->args.seq_args,
> + &wdata->res.seq_res,
> + task);
> +}
> +
> +static void ff_layout_commit_count_stats(struct rpc_task *task, void *data)
> +{
> + struct nfs_commit_data *cdata = data;
> +
> + rpc_count_iostats_metrics(task,
> + &NFS_CLIENT(cdata->inode)->cl_metrics[NFSPROC4_CLNT_COMMIT]);
> +}
> +
> +static const struct rpc_call_ops ff_layout_read_call_ops_v3 = {
> + .rpc_call_prepare = ff_layout_read_prepare_v3,
> + .rpc_call_done = ff_layout_read_call_done,
> + .rpc_count_stats = ff_layout_read_count_stats,
> + .rpc_release = pnfs_generic_rw_release,
> +};
> +
> +static const struct rpc_call_ops ff_layout_read_call_ops_v4 = {
> + .rpc_call_prepare = ff_layout_read_prepare_v4,
> + .rpc_call_done = ff_layout_read_call_done,
> + .rpc_count_stats = ff_layout_read_count_stats,
> + .rpc_release = pnfs_generic_rw_release,
> +};
> +
> +static const struct rpc_call_ops ff_layout_write_call_ops_v3 = {
> + .rpc_call_prepare = ff_layout_write_prepare_v3,
> + .rpc_call_done = ff_layout_write_call_done,
> + .rpc_count_stats = ff_layout_write_count_stats,
> + .rpc_release = pnfs_generic_rw_release,
> +};
> +
> +static const struct rpc_call_ops ff_layout_write_call_ops_v4 = {
> + .rpc_call_prepare = ff_layout_write_prepare_v4,
> + .rpc_call_done = ff_layout_write_call_done,
> + .rpc_count_stats = ff_layout_write_count_stats,
> + .rpc_release = pnfs_generic_rw_release,
> +};
> +
> +static const struct rpc_call_ops ff_layout_commit_call_ops_v3 = {
> + .rpc_call_prepare = ff_layout_commit_prepare_v3,
> + .rpc_call_done = pnfs_generic_write_commit_done,
> + .rpc_count_stats = ff_layout_commit_count_stats,
> + .rpc_release = pnfs_generic_commit_release,
> +};
> +
> +static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
> + .rpc_call_prepare = ff_layout_commit_prepare_v4,
> + .rpc_call_done = pnfs_generic_write_commit_done,
> + .rpc_count_stats = ff_layout_commit_count_stats,
> + .rpc_release = pnfs_generic_commit_release,
> +};
> +
> +static enum pnfs_try_status
> +ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
> +{
> + struct pnfs_layout_segment *lseg = hdr->lseg;
> + struct nfs4_pnfs_ds *ds;
> + struct rpc_clnt *ds_clnt;
> + struct rpc_cred *ds_cred;
> + loff_t offset = hdr->args.offset;
> + u32 idx = hdr->pgio_mirror_idx;
> + int vers;
> + struct nfs_fh *fh;
> +
> + dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
> + __func__, hdr->inode->i_ino,
> + hdr->args.pgbase, (size_t)hdr->args.count, offset);
> +
> + ds = nfs4_ff_layout_prepare_ds(lseg, idx, false);
> + if (!ds)
> + goto out_failed;
> +
> + ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
> + hdr->inode);
> + if (IS_ERR(ds_clnt))
> + goto out_failed;
> +
> + ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
> + if (IS_ERR(ds_cred))
> + goto out_failed;
> +
> + vers = nfs4_ff_layout_ds_version(lseg, idx);
> +
> + dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
> + ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count), vers);
> +
> + atomic_inc(&ds->ds_clp->cl_count);
> + hdr->ds_clp = ds->ds_clp;
> + fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
> + if (fh)
> + hdr->args.fh = fh;
> +
> + /*
> + * Note that if we ever decide to split across DSes,
> + * then we may need to handle dense-like offsets.
> + */
> + hdr->args.offset = offset;
> + hdr->mds_offset = offset;
> +
> + /* Perform an asynchronous read to ds */
> + nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
> + vers == 3 ? &ff_layout_read_call_ops_v3 :
> + &ff_layout_read_call_ops_v4,
> + 0, RPC_TASK_SOFTCONN);
> +
> + return PNFS_ATTEMPTED;
> +
> +out_failed:
> + if (ff_layout_has_available_ds(lseg))
> + return PNFS_TRY_AGAIN;
> + return PNFS_NOT_ATTEMPTED;
> +}
> +
> +/* Perform async writes. */
> +static enum pnfs_try_status
> +ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
> +{
> + struct pnfs_layout_segment *lseg = hdr->lseg;
> + struct nfs4_pnfs_ds *ds;
> + struct rpc_clnt *ds_clnt;
> + struct rpc_cred *ds_cred;
> + loff_t offset = hdr->args.offset;
> + int vers;
> + struct nfs_fh *fh;
> + int idx = hdr->pgio_mirror_idx;
> +
> + ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
> + if (!ds)
> + return PNFS_NOT_ATTEMPTED;
> +
> + ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
> + hdr->inode);
> + if (IS_ERR(ds_clnt))
> + return PNFS_NOT_ATTEMPTED;
> +
> + ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
> + if (IS_ERR(ds_cred))
> + return PNFS_NOT_ATTEMPTED;
> +
> + vers = nfs4_ff_layout_ds_version(lseg, idx);
> +
> + dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d vers %d\n",
> + __func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
> + offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count),
> + vers);
> +
> + hdr->pgio_done_cb = ff_layout_write_done_cb;
> + atomic_inc(&ds->ds_clp->cl_count);
> + hdr->ds_clp = ds->ds_clp;
> + hdr->ds_commit_idx = idx;
> + fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
> + if (fh)
> + hdr->args.fh = fh;
> +
> + /*
> + * Note that if we ever decide to split across DSes,
> + * then we may need to handle dense-like offsets.
> + */
> + hdr->args.offset = offset;
> +
> + /* Perform an asynchronous write */
> + nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
> + vers == 3 ? &ff_layout_write_call_ops_v3 :
> + &ff_layout_write_call_ops_v4,
> + sync, RPC_TASK_SOFTCONN);
> + return PNFS_ATTEMPTED;
> +}
> +
> +static void
> +ff_layout_mark_request_commit(struct nfs_page *req,
> + struct pnfs_layout_segment *lseg,
> + struct nfs_commit_info *cinfo,
> + u32 ds_commit_idx)
> +{
> + struct list_head *list;
> + struct pnfs_commit_bucket *buckets;
> +
> + spin_lock(cinfo->lock);
> + buckets = cinfo->ds->buckets;
> + list = &buckets[ds_commit_idx].written;
> + if (list_empty(list)) {
> + /* Non-empty buckets hold a reference on the lseg. That ref
> + * is normally transferred to the COMMIT call and released
> + * there. It could also be released if the last req is pulled
> + * off due to a rewrite, in which case it will be done in
> + * pnfs_common_clear_request_commit
> + */
> + WARN_ON_ONCE(buckets[ds_commit_idx].wlseg != NULL);
> + buckets[ds_commit_idx].wlseg = pnfs_get_lseg(lseg);
> + }
> + set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
> + cinfo->ds->nwritten++;
> +
> + /* nfs_request_add_commit_list(). We need to add req to list without
> + * dropping cinfo lock.
> + */
> + set_bit(PG_CLEAN, &(req)->wb_flags);
> + nfs_list_add_request(req, list);
> + cinfo->mds->ncommit++;
> + spin_unlock(cinfo->lock);
> + if (!cinfo->dreq) {
> + inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
> + inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
> + BDI_RECLAIMABLE);
> + __mark_inode_dirty(req->wb_context->dentry->d_inode,
> + I_DIRTY_DATASYNC);
> + }
> +}
> +
> +static u32 calc_ds_index_from_commit(struct pnfs_layout_segment *lseg, u32 i)
> +{
> + return i;
> +}
Is calc_ds_index_from_commit() something that will be expanded on later?
> +
> +static struct nfs_fh *
> +select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
> +{
> + struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
> +
> + /* FIXME: Assume that there is only one NFS version available
> + * for the DS.
> + */
> + return &flseg->mirror_array[i]->fh_versions[0];
> +}
> +
> +static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
> +{
> + struct pnfs_layout_segment *lseg = data->lseg;
> + struct nfs4_pnfs_ds *ds;
> + struct rpc_clnt *ds_clnt;
> + struct rpc_cred *ds_cred;
> + u32 idx;
> + int vers;
> + struct nfs_fh *fh;
> +
> + idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
> + ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
> + if (!ds)
> + goto out_err;
> +
> + ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
> + data->inode);
> + if (IS_ERR(ds_clnt))
> + goto out_err;
> +
> + ds_cred = ff_layout_get_ds_cred(lseg, idx, data->cred);
> + if (IS_ERR(ds_cred))
> + goto out_err;
> +
> + vers = nfs4_ff_layout_ds_version(lseg, idx);
> +
> + dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
> + data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count),
> + vers);
> + data->commit_done_cb = ff_layout_commit_done_cb;
> + data->cred = ds_cred;
> + atomic_inc(&ds->ds_clp->cl_count);
> + data->ds_clp = ds->ds_clp;
> + fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
> + if (fh)
> + data->args.fh = fh;
> + return nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
> + vers == 3 ? &ff_layout_commit_call_ops_v3 :
> + &ff_layout_commit_call_ops_v4,
> + how, RPC_TASK_SOFTCONN);
> +out_err:
> + pnfs_generic_prepare_to_resend_writes(data);
> + pnfs_generic_commit_release(data);
> + return -EAGAIN;
> +}
> +
> +static int
> +ff_layout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
> + int how, struct nfs_commit_info *cinfo)
> +{
> + return pnfs_generic_commit_pagelist(inode, mds_pages, how, cinfo,
> + ff_layout_initiate_commit);
> +}
> +
> +static struct pnfs_ds_commit_info *
> +ff_layout_get_ds_info(struct inode *inode)
> +{
> + struct pnfs_layout_hdr *layout = NFS_I(inode)->layout;
> +
> + if (layout == NULL)
> + return NULL;
> + else
^^^^
Nit: We don't need the else here.
Thanks,
Anna
> + return &FF_LAYOUT_FROM_HDR(layout)->commit_info;
> +}
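For reference, the early-return shape suggested in the nit above could look roughly like this. It is a userspace sketch only: `sketch_layout`, `sketch_commit_info` and `sketch_get_ds_info()` are made-up stand-ins for the kernel types and for ff_layout_get_ds_info(), not code from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for pnfs_ds_commit_info and nfs4_flexfile_layout. */
struct sketch_commit_info {
	int nwritten;
};

struct sketch_layout {
	struct sketch_commit_info commit_info;
};

/* Early return on NULL, so the success path needs no else branch. */
static struct sketch_commit_info *
sketch_get_ds_info(struct sketch_layout *layout)
{
	if (layout == NULL)
		return NULL;
	return &layout->commit_info;
}
```

The behaviour is unchanged; the only difference is one less level of branching.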
> +
> +static void
> +ff_layout_free_deveiceid_node(struct nfs4_deviceid_node *d)
> +{
> + nfs4_ff_layout_free_deviceid(container_of(d, struct nfs4_ff_layout_ds,
> + id_node));
> +}
> +
> +static int ff_layout_encode_ioerr(struct nfs4_flexfile_layout *flo,
> + struct xdr_stream *xdr,
> + const struct nfs4_layoutreturn_args *args)
> +{
> + struct pnfs_layout_hdr *hdr = &flo->generic_hdr;
> + __be32 *start;
> + int count = 0, ret = 0;
> +
> + start = xdr_reserve_space(xdr, 4);
> + if (unlikely(!start))
> + return -E2BIG;
> +
> + /* This assumes we always return _ALL_ layouts */
> + spin_lock(&hdr->plh_inode->i_lock);
> + ret = ff_layout_encode_ds_ioerr(flo, xdr, &count, &args->range);
> + spin_unlock(&hdr->plh_inode->i_lock);
> +
> + *start = cpu_to_be32(count);
> +
> + return ret;
> +}
> +
> +/* report nothing for now */
> +static void ff_layout_encode_iostats(struct nfs4_flexfile_layout *flo,
> + struct xdr_stream *xdr,
> + const struct nfs4_layoutreturn_args *args)
> +{
> + __be32 *p;
> +
> + p = xdr_reserve_space(xdr, 4);
> + if (likely(p))
> + *p = cpu_to_be32(0);
> +}
> +
> +static struct nfs4_deviceid_node *
> +ff_layout_alloc_deviceid_node(struct nfs_server *server,
> + struct pnfs_device *pdev, gfp_t gfp_flags)
> +{
> + struct nfs4_ff_layout_ds *dsaddr;
> +
> + dsaddr = nfs4_ff_alloc_deviceid_node(server, pdev, gfp_flags);
> + if (!dsaddr)
> + return NULL;
> + return &dsaddr->id_node;
> +}
> +
> +static void
> +ff_layout_encode_layoutreturn(struct pnfs_layout_hdr *lo,
> + struct xdr_stream *xdr,
> + const struct nfs4_layoutreturn_args *args)
> +{
> + struct nfs4_flexfile_layout *flo = FF_LAYOUT_FROM_HDR(lo);
> + __be32 *start;
> +
> + dprintk("%s: Begin\n", __func__);
> + start = xdr_reserve_space(xdr, 4);
> + BUG_ON(!start);
> +
> + if (ff_layout_encode_ioerr(flo, xdr, args))
> + goto out;
> +
> + ff_layout_encode_iostats(flo, xdr, args);
> +out:
> + *start = cpu_to_be32((xdr->p - start - 1) * 4);
> + dprintk("%s: Return\n", __func__);
> +}
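Both ff_layout_encode_ioerr() and ff_layout_encode_layoutreturn() reserve a 4-byte slot first and fill it in once the body has been written. A minimal userspace sketch of that reserve-then-backfill pattern follows. The `sketch_*` names and the flat byte buffer are assumptions for illustration; this is not the sunrpc xdr_stream API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat output buffer; the kernel uses struct xdr_stream. */
struct sketch_stream {
	uint8_t *buf;
	size_t used;
	size_t cap;
};

/* Reserve nbytes of output, or return NULL if the buffer is full. */
static uint8_t *sketch_reserve(struct sketch_stream *s, size_t nbytes)
{
	uint8_t *p;

	if (nbytes > s->cap - s->used)
		return NULL;
	p = s->buf + s->used;
	s->used += nbytes;
	return p;
}

/* Store a 32-bit value in big-endian (network) byte order. */
static void sketch_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Reserve the length word, write the body, then backfill the length with
 * the number of body bytes actually written.
 */
static int sketch_encode_body(struct sketch_stream *s,
			      const uint32_t *words, size_t nwords)
{
	uint8_t *len_slot = sketch_reserve(s, 4);
	size_t body_start, i;

	if (!len_slot)
		return -1;
	body_start = s->used;
	for (i = 0; i < nwords; i++) {
		uint8_t *p = sketch_reserve(s, 4);

		if (!p)
			break;	/* body truncated; length still matches it */
		sketch_put_be32(p, words[i]);
	}
	sketch_put_be32(len_slot, (uint32_t)(s->used - body_start));
	return 0;
}
```

Because the length is derived from the stream position, it stays consistent with the body even when encoding stops early, which seems to be the point of the `out:` label in ff_layout_encode_layoutreturn().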
> +
> +static struct pnfs_layoutdriver_type flexfilelayout_type = {
> + .id = LAYOUT_FLEX_FILES,
> + .name = "LAYOUT_FLEX_FILES",
> + .owner = THIS_MODULE,
> + .alloc_layout_hdr = ff_layout_alloc_layout_hdr,
> + .free_layout_hdr = ff_layout_free_layout_hdr,
> + .alloc_lseg = ff_layout_alloc_lseg,
> + .free_lseg = ff_layout_free_lseg,
> + .pg_read_ops = &ff_layout_pg_read_ops,
> + .pg_write_ops = &ff_layout_pg_write_ops,
> + .get_ds_info = ff_layout_get_ds_info,
> + .free_deviceid_node = ff_layout_free_deveiceid_node,
> + .mark_request_commit = ff_layout_mark_request_commit,
> + .clear_request_commit = pnfs_generic_clear_request_commit,
> + .scan_commit_lists = pnfs_generic_scan_commit_lists,
> + .recover_commit_reqs = pnfs_generic_recover_commit_reqs,
> + .commit_pagelist = ff_layout_commit_pagelist,
> + .read_pagelist = ff_layout_read_pagelist,
> + .write_pagelist = ff_layout_write_pagelist,
> + .alloc_deviceid_node = ff_layout_alloc_deviceid_node,
> + .encode_layoutreturn = ff_layout_encode_layoutreturn,
> +};
> +
> +static int __init nfs4flexfilelayout_init(void)
> +{
> + printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Registering...\n",
> + __func__);
> + return pnfs_register_layoutdriver(&flexfilelayout_type);
> +}
> +
> +static void __exit nfs4flexfilelayout_exit(void)
> +{
> + printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Unregistering...\n",
> + __func__);
> + pnfs_unregister_layoutdriver(&flexfilelayout_type);
> +}
> +
> +MODULE_ALIAS("nfs-layouttype4-4");
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("The NFSv4 flexfile layout driver");
> +
> +module_init(nfs4flexfilelayout_init);
> +module_exit(nfs4flexfilelayout_exit);
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
> new file mode 100644
> index 0000000..712fc55
> --- /dev/null
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.h
> @@ -0,0 +1,158 @@
> +/*
> + * NFSv4 flexfile layout driver data structures.
> + *
> + * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
> + *
> + * Tao Peng <[email protected]>
> + */
> +
> +#ifndef FS_NFS_NFS4FLEXFILELAYOUT_H
> +#define FS_NFS_NFS4FLEXFILELAYOUT_H
> +
> +#include "../pnfs.h"
> +
> +/* XXX: Let's filter out insanely large mirror count for now to avoid oom
> + * due to network error etc. */
> +#define NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT 4096
> +
> +struct nfs4_ff_ds_version {
> + u32 version;
> + u32 minor_version;
> + u32 rsize;
> + u32 wsize;
> + bool tightly_coupled;
> +};
> +
> +/* chained in global deviceid hlist */
> +struct nfs4_ff_layout_ds {
> + struct nfs4_deviceid_node id_node;
> + u32 ds_versions_cnt;
> + struct nfs4_ff_ds_version *ds_versions;
> + struct nfs4_pnfs_ds *ds;
> +};
> +
> +struct nfs4_ff_layout_ds_err {
> + struct list_head list; /* linked in mirror error_list */
> + u64 offset;
> + u64 length;
> + int status;
> + enum nfs_opnum4 opnum;
> + nfs4_stateid stateid;
> + struct nfs4_deviceid deviceid;
> +};
> +
> +struct nfs4_ff_layout_mirror {
> + u32 ds_count;
> + u32 efficiency;
> + struct nfs4_ff_layout_ds *mirror_ds;
> + u32 fh_versions_cnt;
> + struct nfs_fh *fh_versions;
> + nfs4_stateid stateid;
> + union {
> + struct { /* same as struct unx_cred */
> + u32 uid; /* -1 iff AUTH_NONE */
> + u32 gid; /* -1 iff AUTH_NONE */
> + u32 gids[16];
> + };
> + };
> + struct rpc_cred *cred;
> + spinlock_t lock;
> +};
> +
> +struct nfs4_ff_layout_segment {
> + struct pnfs_layout_segment generic_hdr;
> + u64 stripe_unit;
> + u32 mirror_array_cnt;
> + struct nfs4_ff_layout_mirror **mirror_array;
> +};
> +
> +struct nfs4_flexfile_layout {
> + struct pnfs_layout_hdr generic_hdr;
> + struct pnfs_ds_commit_info commit_info;
> + struct list_head error_list; /* nfs4_ff_layout_ds_err */
> +};
> +
> +static inline struct nfs4_flexfile_layout *
> +FF_LAYOUT_FROM_HDR(struct pnfs_layout_hdr *lo)
> +{
> + return container_of(lo, struct nfs4_flexfile_layout, generic_hdr);
> +}
> +
> +static inline struct nfs4_ff_layout_segment *
> +FF_LAYOUT_LSEG(struct pnfs_layout_segment *lseg)
> +{
> + return container_of(lseg,
> + struct nfs4_ff_layout_segment,
> + generic_hdr);
> +}
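FF_LAYOUT_FROM_HDR() and FF_LAYOUT_LSEG() both rely on container_of() to get from the embedded generic header back to the flexfile-specific structure. A userspace sketch of the idea, with a simplified `sketch_container_of` macro and made-up struct names (the kernel macro adds type checking that is omitted here):

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic header, embedded in the driver-specific struct. */
struct sketch_hdr {
	int refcount;
};

struct sketch_ff_layout {
	int private_state;
	struct sketch_hdr generic_hdr;
};

/* Recover the enclosing structure from a pointer to its embedded header. */
static struct sketch_ff_layout *sketch_layout_from_hdr(struct sketch_hdr *hdr)
{
	return sketch_container_of(hdr, struct sketch_ff_layout, generic_hdr);
}
```

The header does not have to be the first member; the macro subtracts the member offset, so the conversion works wherever the header sits in the struct.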
> +
> +static inline struct nfs4_deviceid_node *
> +FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
> +{
> + if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt ||
> + FF_LAYOUT_LSEG(lseg)->mirror_array[idx] == NULL ||
> + FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds == NULL)
> + return NULL;
> + return &FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds->id_node;
> +}
> +
> +static inline struct nfs4_ff_layout_ds *
> +FF_LAYOUT_MIRROR_DS(struct nfs4_deviceid_node *node)
> +{
> + return container_of(node, struct nfs4_ff_layout_ds, id_node);
> +}
> +
> +static inline struct nfs4_ff_layout_mirror *
> +FF_LAYOUT_COMP(struct pnfs_layout_segment *lseg, u32 idx)
> +{
> + if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt)
> + return NULL;
> + return FF_LAYOUT_LSEG(lseg)->mirror_array[idx];
> +}
> +
> +static inline u32
> +FF_LAYOUT_MIRROR_COUNT(struct pnfs_layout_segment *lseg)
> +{
> + return FF_LAYOUT_LSEG(lseg)->mirror_array_cnt;
> +}
> +
> +static inline bool
> +ff_layout_test_devid_unavailable(struct nfs4_deviceid_node *node)
> +{
> + return nfs4_test_deviceid_unavailable(node);
> +}
> +
> +static inline int
> +nfs4_ff_layout_ds_version(struct pnfs_layout_segment *lseg, u32 ds_idx)
> +{
> + return FF_LAYOUT_COMP(lseg, ds_idx)->mirror_ds->ds_versions[0].version;
> +}
> +
> +struct nfs4_ff_layout_ds *
> +nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
> + gfp_t gfp_flags);
> +void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
> +void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
> +int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
> + struct nfs4_ff_layout_mirror *mirror, u64 offset,
> + u64 length, int status, enum nfs_opnum4 opnum,
> + gfp_t gfp_flags);
> +int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
> + struct xdr_stream *xdr, int *count,
> + const struct pnfs_layout_range *range);
> +struct nfs_fh *
> +nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx);
> +
> +struct nfs4_pnfs_ds *
> +nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
> + bool fail_return);
> +
> +struct rpc_clnt *
> +nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg,
> + u32 ds_idx,
> + struct nfs_client *ds_clp,
> + struct inode *inode);
> +struct rpc_cred *ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg,
> + u32 ds_idx, struct rpc_cred *mdscred);
> +bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg);
> +#endif /* FS_NFS_NFS4FLEXFILELAYOUT_H */
> diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> new file mode 100644
> index 0000000..5dae5c2
> --- /dev/null
> +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> @@ -0,0 +1,552 @@
> +/*
> + * Device operations for the pnfs nfs4 file layout driver.
> + *
> + * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
> + *
> + * Tao Peng <[email protected]>
> + */
> +
> +#include <linux/nfs_fs.h>
> +#include <linux/vmalloc.h>
> +#include <linux/module.h>
> +#include <linux/sunrpc/addr.h>
> +
> +#include "../internal.h"
> +#include "../nfs4session.h"
> +#include "flexfilelayout.h"
> +
> +#define NFSDBG_FACILITY NFSDBG_PNFS_LD
> +
> +static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
> +static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
> +
> +void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
> +{
> + if (mirror_ds)
> + nfs4_put_deviceid_node(&mirror_ds->id_node);
> +}
> +
> +void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
> +{
> + nfs4_print_deviceid(&mirror_ds->id_node.deviceid);
> + nfs4_pnfs_ds_put(mirror_ds->ds);
> + kfree(mirror_ds);
> +}
> +
> +/* Decode opaque device data and construct new_ds using it */
> +struct nfs4_ff_layout_ds *
> +nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
> + gfp_t gfp_flags)
> +{
> + struct xdr_stream stream;
> + struct xdr_buf buf;
> + struct page *scratch;
> + struct list_head dsaddrs;
> + struct nfs4_pnfs_ds_addr *da;
> + struct nfs4_ff_layout_ds *new_ds = NULL;
> + struct nfs4_ff_ds_version *ds_versions = NULL;
> + u32 mp_count;
> + u32 version_count;
> + __be32 *p;
> + int i, ret = -ENOMEM;
> +
> + /* set up xdr stream */
> + scratch = alloc_page(gfp_flags);
> + if (!scratch)
> + goto out_err;
> +
> + new_ds = kzalloc(sizeof(struct nfs4_ff_layout_ds), gfp_flags);
> + if (!new_ds)
> + goto out_scratch;
> +
> + nfs4_init_deviceid_node(&new_ds->id_node,
> + server,
> + &pdev->dev_id);
> + INIT_LIST_HEAD(&dsaddrs);
> +
> + xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen);
> + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
> +
> + /* multipath count */
> + p = xdr_inline_decode(&stream, 4);
> + if (unlikely(!p))
> + goto out_err_drain_dsaddrs;
> + mp_count = be32_to_cpup(p);
> + dprintk("%s: multipath ds count %d\n", __func__, mp_count);
> +
> + for (i = 0; i < mp_count; i++) {
> + /* multipath ds */
> + da = nfs4_decode_mp_ds_addr(server->nfs_client->cl_net,
> + &stream, gfp_flags);
> + if (da)
> + list_add_tail(&da->da_node, &dsaddrs);
> + }
> + if (list_empty(&dsaddrs)) {
> + dprintk("%s: no suitable DS addresses found\n",
> + __func__);
> + ret = -ENOMEDIUM;
> + goto out_err_drain_dsaddrs;
> + }
> +
> + /* version count */
> + p = xdr_inline_decode(&stream, 4);
> + if (unlikely(!p))
> + goto out_err_drain_dsaddrs;
> + version_count = be32_to_cpup(p);
> + dprintk("%s: version count %d\n", __func__, version_count);
> +
> + ds_versions = kzalloc(version_count * sizeof(struct nfs4_ff_ds_version),
> + gfp_flags);
> + if (!ds_versions)
> + goto out_err_drain_dsaddrs;
> +
> + for (i = 0; i < version_count; i++) {
> + /* 20 = version(4) + minor_version(4) + rsize(4) + wsize(4) +
> + * tightly_coupled(4) */
> + p = xdr_inline_decode(&stream, 20);
> + if (unlikely(!p))
> + goto out_err_drain_dsaddrs;
> + ds_versions[i].version = be32_to_cpup(p++);
> + ds_versions[i].minor_version = be32_to_cpup(p++);
> + ds_versions[i].rsize = nfs_block_size(be32_to_cpup(p++), NULL);
> + ds_versions[i].wsize = nfs_block_size(be32_to_cpup(p++), NULL);
> + ds_versions[i].tightly_coupled = be32_to_cpup(p);
> +
> + if (ds_versions[i].rsize > NFS_MAX_FILE_IO_SIZE)
> + ds_versions[i].rsize = NFS_MAX_FILE_IO_SIZE;
> + if (ds_versions[i].wsize > NFS_MAX_FILE_IO_SIZE)
> + ds_versions[i].wsize = NFS_MAX_FILE_IO_SIZE;
> +
> + if (ds_versions[i].version != 3 || ds_versions[i].minor_version != 0) {
> + dprintk("%s: [%d] unsupported ds version %d-%d\n", __func__,
> + i, ds_versions[i].version,
> + ds_versions[i].minor_version);
> + ret = -EPROTONOSUPPORT;
> + goto out_err_drain_dsaddrs;
> + }
> +
> + dprintk("%s: [%d] vers %u minor_ver %u rsize %u wsize %u coupled %d\n",
> + __func__, i, ds_versions[i].version,
> + ds_versions[i].minor_version,
> + ds_versions[i].rsize,
> + ds_versions[i].wsize,
> + ds_versions[i].tightly_coupled);
> + }
> +
> + new_ds->ds_versions = ds_versions;
> + new_ds->ds_versions_cnt = version_count;
> +
> + new_ds->ds = nfs4_pnfs_ds_add(&dsaddrs, gfp_flags);
> + if (!new_ds->ds)
> + goto out_err_drain_dsaddrs;
> +
> + /* If DS was already in cache, free ds addrs */
> + while (!list_empty(&dsaddrs)) {
> + da = list_first_entry(&dsaddrs,
> + struct nfs4_pnfs_ds_addr,
> + da_node);
> + list_del_init(&da->da_node);
> + kfree(da->da_remotestr);
> + kfree(da);
> + }
> +
> + __free_page(scratch);
> + return new_ds;
> +
> +out_err_drain_dsaddrs:
> + while (!list_empty(&dsaddrs)) {
> + da = list_first_entry(&dsaddrs, struct nfs4_pnfs_ds_addr,
> + da_node);
> + list_del_init(&da->da_node);
> + kfree(da->da_remotestr);
> + kfree(da);
> + }
> +
> + kfree(ds_versions);
> +out_scratch:
> + __free_page(scratch);
> +out_err:
> + kfree(new_ds);
> +
> + dprintk("%s ERROR: returning %d\n", __func__, ret);
> + return NULL;
> +}
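The per-version record decoded in the loop above is five big-endian 32-bit words (20 bytes). A rough userspace sketch of that decode step, using a plain byte buffer instead of an xdr_stream. The `sketch_*` names are hypothetical, and the nfs_block_size() rounding and NFS_MAX_FILE_IO_SIZE clamp from the patch are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct nfs4_ff_ds_version. */
struct sketch_ds_version {
	uint32_t version;
	uint32_t minor_version;
	uint32_t rsize;
	uint32_t wsize;
	bool tightly_coupled;
};

/* Read one big-endian 32-bit word. */
static uint32_t sketch_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode one 20-byte record. Returns 0 on success, -1 if the buffer is
 * too short, -2 if the version is anything other than NFSv3 (the only
 * data server version the loop above accepts).
 */
static int sketch_decode_version(const uint8_t *buf, size_t len,
				 struct sketch_ds_version *out)
{
	if (len < 20)
		return -1;
	out->version = sketch_get_be32(buf);
	out->minor_version = sketch_get_be32(buf + 4);
	out->rsize = sketch_get_be32(buf + 8);
	out->wsize = sketch_get_be32(buf + 12);
	out->tightly_coupled = sketch_get_be32(buf + 16) != 0;
	if (out->version != 3 || out->minor_version != 0)
		return -2;
	return 0;
}
```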
> +
> +static u64
> +end_offset(u64 start, u64 len)
> +{
> + u64 end;
> +
> + end = start + len;
> + return end >= start ? end : NFS4_MAX_UINT64;
> +}
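end_offset() relies on unsigned wraparound to detect overflow and then clamps to the maximum. A small userspace sketch of the same check, where `SKETCH_MAX_UINT64` is a stand-in I am assuming for NFS4_MAX_UINT64:

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's NFS4_MAX_UINT64. */
#define SKETCH_MAX_UINT64 UINT64_MAX

/* Return start + len, clamped to the maximum if the sum wraps around. */
static uint64_t sketch_end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;	/* unsigned overflow wraps, well defined */

	return end >= start ? end : SKETCH_MAX_UINT64;
}
```

A length of all ones (the usual "to end of file" value) therefore yields the maximum end offset for any start.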
> +
> +static void extend_ds_error(struct nfs4_ff_layout_ds_err *err,
> + u64 offset, u64 length)
> +{
> + u64 end;
> +
> + end = max_t(u64, end_offset(err->offset, err->length),
> + end_offset(offset, length));
> + err->offset = min_t(u64, err->offset, offset);
> + err->length = end - err->offset;
> +}
> +
> +static bool ds_error_can_merge(struct nfs4_ff_layout_ds_err *err, u64 offset,
> + u64 length, int status, enum nfs_opnum4 opnum,
> + nfs4_stateid *stateid,
> + struct nfs4_deviceid *deviceid)
> +{
> + return err->status == status && err->opnum == opnum &&
> + nfs4_stateid_match(&err->stateid, stateid) &&
> + !memcmp(&err->deviceid, deviceid, sizeof(*deviceid)) &&
> + end_offset(err->offset, err->length) >= offset &&
> + err->offset <= end_offset(offset, length);
> +}
> +
> +static bool merge_ds_error(struct nfs4_ff_layout_ds_err *old,
> + struct nfs4_ff_layout_ds_err *new)
> +{
> + if (!ds_error_can_merge(old, new->offset, new->length, new->status,
> + new->opnum, &new->stateid, &new->deviceid))
> + return false;
> +
> + extend_ds_error(old, new->offset, new->length);
> + return true;
> +}
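To make the merge rule concrete: two error ranges are merged when they overlap or touch, and the surviving entry is widened to cover both. Below is a userspace sketch of just the range arithmetic from ds_error_can_merge() and extend_ds_error(). The status, opnum, stateid and deviceid comparisons are left out, and all `sketch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_range {
	uint64_t offset;
	uint64_t length;
};

/* start + len, clamped to UINT64_MAX on wraparound. */
static uint64_t sketch_end(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : UINT64_MAX;
}

/* True when [offset, offset + length) overlaps or is adjacent to *r. */
static bool sketch_can_merge(const struct sketch_range *r,
			     uint64_t offset, uint64_t length)
{
	return sketch_end(r->offset, r->length) >= offset &&
	       r->offset <= sketch_end(offset, length);
}

/* Widen *r to cover the new range; returns false if they cannot merge. */
static bool sketch_merge(struct sketch_range *r,
			 uint64_t offset, uint64_t length)
{
	uint64_t end, new_end;

	if (!sketch_can_merge(r, offset, length))
		return false;
	end = sketch_end(r->offset, r->length);
	new_end = sketch_end(offset, length);
	if (new_end > end)
		end = new_end;
	if (offset < r->offset)
		r->offset = offset;
	r->length = end - r->offset;
	return true;
}
```

Note the `>=` and `<=` comparisons: a range that starts exactly where the old one ends still merges, so back-to-back failed I/Os collapse into one error record.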
> +
> +static bool
> +ff_layout_add_ds_error_locked(struct nfs4_flexfile_layout *flo,
> + struct nfs4_ff_layout_ds_err *dserr)
> +{
> + struct nfs4_ff_layout_ds_err *err;
> +
> + list_for_each_entry(err, &flo->error_list, list) {
> + if (merge_ds_error(err, dserr)) {
> + return true;
> + }
> + }
> +
> + list_add(&dserr->list, &flo->error_list);
> + return false;
> +}
> +
> +static bool
> +ff_layout_update_ds_error(struct nfs4_flexfile_layout *flo, u64 offset,
> + u64 length, int status, enum nfs_opnum4 opnum,
> + nfs4_stateid *stateid, struct nfs4_deviceid *deviceid)
> +{
> + bool found = false;
> + struct nfs4_ff_layout_ds_err *err;
> +
> + list_for_each_entry(err, &flo->error_list, list) {
> + if (ds_error_can_merge(err, offset, length, status, opnum,
> + stateid, deviceid)) {
> + found = true;
> + extend_ds_error(err, offset, length);
> + break;
> + }
> + }
> +
> + return found;
> +}
> +
> +int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
> + struct nfs4_ff_layout_mirror *mirror, u64 offset,
> + u64 length, int status, enum nfs_opnum4 opnum,
> + gfp_t gfp_flags)
> +{
> + struct nfs4_ff_layout_ds_err *dserr;
> + bool needfree;
> +
> + if (status == 0)
> + return 0;
> +
> + if (mirror->mirror_ds == NULL)
> + return -EINVAL;
> +
> + spin_lock(&flo->generic_hdr.plh_inode->i_lock);
> + if (ff_layout_update_ds_error(flo, offset, length, status, opnum,
> + &mirror->stateid,
> + &mirror->mirror_ds->id_node.deviceid)) {
> + spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
> + return 0;
> + }
> + spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
> + dserr = kmalloc(sizeof(*dserr), gfp_flags);
> + if (!dserr)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&dserr->list);
> + dserr->offset = offset;
> + dserr->length = length;
> + dserr->status = status;
> + dserr->opnum = opnum;
> + nfs4_stateid_copy(&dserr->stateid, &mirror->stateid);
> + memcpy(&dserr->deviceid, &mirror->mirror_ds->id_node.deviceid,
> + NFS4_DEVICEID4_SIZE);
> +
> + spin_lock(&flo->generic_hdr.plh_inode->i_lock);
> + needfree = ff_layout_add_ds_error_locked(flo, dserr);
> + spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
> + if (needfree)
> + kfree(dserr);
> +
> + return 0;
> +}
> +
> +/* currently we only support AUTH_NONE and AUTH_SYS */
> +static rpc_authflavor_t
> +nfs4_ff_layout_choose_authflavor(struct nfs4_ff_layout_mirror *mirror)
> +{
> + if (mirror->uid == (u32)-1)
> + return RPC_AUTH_NULL;
> + return RPC_AUTH_UNIX;
> +}
> +
> +/* fetch cred for NFSv3 DS */
> +static int ff_layout_update_mirror_cred(struct nfs4_ff_layout_mirror *mirror,
> + struct nfs4_pnfs_ds *ds)
> +{
> + if (ds && !mirror->cred && mirror->mirror_ds->ds_versions[0].version == 3) {
> + struct rpc_auth *auth = ds->ds_clp->cl_rpcclient->cl_auth;
> + struct rpc_cred *cred;
> + struct auth_cred acred = {
> + .uid = make_kuid(&init_user_ns, mirror->uid),
> + .gid = make_kgid(&init_user_ns, mirror->gid),
> + };
> +
> + /* AUTH_NULL ignores acred */
> + cred = auth->au_ops->lookup_cred(auth, &acred, 0);
> + if (IS_ERR(cred)) {
> + dprintk("%s: lookup_cred failed with %ld\n",
> + __func__, PTR_ERR(cred));
> + return PTR_ERR(cred);
> + } else {
> + mirror->cred = cred;
> + }
> + }
> + return 0;
> +}
> +
> +struct nfs_fh *
> +nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx)
> +{
> + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, mirror_idx);
> + struct nfs_fh *fh = NULL;
> + struct nfs4_deviceid_node *devid;
> +
> + if (mirror == NULL || mirror->mirror_ds == NULL ||
> + mirror->mirror_ds->ds == NULL) {
> + printk(KERN_ERR "NFS: %s: No data server for mirror offset index %d\n",
> + __func__, mirror_idx);
> + if (mirror && mirror->mirror_ds) {
> + devid = &mirror->mirror_ds->id_node;
> + pnfs_generic_mark_devid_invalid(devid);
> + }
> + goto out;
> + }
> +
> + /* FIXME: For now assume there is only 1 version available for the DS */
> + fh = &mirror->fh_versions[0];
> +out:
> + return fh;
> +}
> +
> +/* Upon return, either ds is connected, or ds is NULL */
> +struct nfs4_pnfs_ds *
> +nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
> + bool fail_return)
> +{
> + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
> + struct nfs4_pnfs_ds *ds = NULL;
> + struct nfs4_deviceid_node *devid;
> + struct inode *ino = lseg->pls_layout->plh_inode;
> + struct nfs_server *s = NFS_SERVER(ino);
> + unsigned int max_payload;
> + rpc_authflavor_t flavor;
> +
> + if (mirror == NULL || mirror->mirror_ds == NULL ||
> + mirror->mirror_ds->ds == NULL) {
> + printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
> + __func__, ds_idx);
> + if (mirror && mirror->mirror_ds) {
> + devid = &mirror->mirror_ds->id_node;
> + pnfs_generic_mark_devid_invalid(devid);
> + }
> + goto out;
> + }
> +
> + ds = mirror->mirror_ds->ds;
> + devid = &mirror->mirror_ds->id_node;
> +
> + /* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
> + smp_rmb();
> + if (ds->ds_clp)
> + goto out_test_devid;
> +
> + flavor = nfs4_ff_layout_choose_authflavor(mirror);
> +
> + /* FIXME: For now we assume the server sent only one version of NFS
> + * to use for the DS.
> + */
> + nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
> + dataserver_retrans,
> + mirror->mirror_ds->ds_versions[0].version,
> + mirror->mirror_ds->ds_versions[0].minor_version,
> + flavor);
> +
> + /* connect success, check rsize/wsize limit */
> + if (ds->ds_clp) {
> + max_payload =
> + nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
> + NULL);
> + if (mirror->mirror_ds->ds_versions[0].rsize > max_payload)
> + mirror->mirror_ds->ds_versions[0].rsize = max_payload;
> + if (mirror->mirror_ds->ds_versions[0].wsize > max_payload)
> + mirror->mirror_ds->ds_versions[0].wsize = max_payload;
> + } else {
> + ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
> + mirror, lseg->pls_range.offset,
> + lseg->pls_range.length, NFS4ERR_NXIO,
> + OP_ILLEGAL, GFP_NOIO);
> + if (fail_return) {
> + pnfs_error_mark_layout_for_return(ino, lseg);
> + if (ff_layout_has_available_ds(lseg))
> + pnfs_set_retry_layoutget(lseg->pls_layout);
> + else
> + pnfs_clear_retry_layoutget(lseg->pls_layout);
> +
> + } else {
> + if (ff_layout_has_available_ds(lseg))
> + set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
> + &lseg->pls_layout->plh_flags);
> + else {
> + pnfs_error_mark_layout_for_return(ino, lseg);
> + pnfs_clear_retry_layoutget(lseg->pls_layout);
> + }
> + }
> + }
> +
> +out_test_devid:
> + if (ff_layout_test_devid_unavailable(devid))
> + ds = NULL;
> +out:
> + if (ff_layout_update_mirror_cred(mirror, ds))
> + ds = NULL;
> + return ds;
> +}
> +
> +struct rpc_cred *
> +ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg, u32 ds_idx,
> + struct rpc_cred *mdscred)
> +{
> + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
> + struct rpc_cred *cred = ERR_PTR(-EINVAL);
> +
> + if (!nfs4_ff_layout_prepare_ds(lseg, ds_idx, true))
> + goto out;
> +
> + if (mirror && mirror->cred)
> + cred = mirror->cred;
> + else
> + cred = mdscred;
> +out:
> + return cred;
> +}
> +
> +/**
> +* Find or create a DS rpc client with the MDS server rpc client auth flavor
> +* in the nfs_client cl_ds_clients list.
> +*/
> +struct rpc_clnt *
> +nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg, u32 ds_idx,
> + struct nfs_client *ds_clp, struct inode *inode)
> +{
> + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
> +
> + switch (mirror->mirror_ds->ds_versions[0].version) {
> + case 3:
> + /* For NFSv3 DS, flavor is set when creating DS connections */
> + return ds_clp->cl_rpcclient;
> + case 4:
> + return nfs4_find_or_create_ds_client(ds_clp, inode);
> + default:
> + BUG();
> + }
> +}
> +
> +static bool is_range_intersecting(u64 offset1, u64 length1,
> + u64 offset2, u64 length2)
> +{
> + u64 end1 = end_offset(offset1, length1);
> + u64 end2 = end_offset(offset2, length2);
> +
> + return (end1 == NFS4_MAX_UINT64 || end1 > offset2) &&
> + (end2 == NFS4_MAX_UINT64 || end2 > offset1);
> +}
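Unlike the merge test, is_range_intersecting() uses strict comparisons, so ranges that merely touch do not intersect, and an end offset of NFS4_MAX_UINT64 is treated as "extends to the end". A userspace sketch of the same predicate, with UINT64_MAX assumed as the stand-in for NFS4_MAX_UINT64 and hypothetical `sketch_*` names:

```c
#include <stdbool.h>
#include <stdint.h>

/* start + len, clamped to UINT64_MAX on wraparound. */
static uint64_t sketch_range_end(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : UINT64_MAX;
}

/* True when the two half-open ranges share at least one byte. */
static bool sketch_ranges_intersect(uint64_t offset1, uint64_t length1,
				    uint64_t offset2, uint64_t length2)
{
	uint64_t end1 = sketch_range_end(offset1, length1);
	uint64_t end2 = sketch_range_end(offset2, length2);

	return (end1 == UINT64_MAX || end1 > offset2) &&
	       (end2 == UINT64_MAX || end2 > offset1);
}
```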
> +
> +/* called with inode i_lock held */
> +int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
> + struct xdr_stream *xdr, int *count,
> + const struct pnfs_layout_range *range)
> +{
> + struct nfs4_ff_layout_ds_err *err, *n;
> + __be32 *p;
> +
> + list_for_each_entry_safe(err, n, &flo->error_list, list) {
> + if (!is_range_intersecting(err->offset, err->length,
> + range->offset, range->length))
> + continue;
> + /* offset(8) + length(8) + stateid(NFS4_STATEID_SIZE)
> + * + deviceid(NFS4_DEVICEID4_SIZE) + status(4) + opnum(4)
> + */
> + p = xdr_reserve_space(xdr,
> + 24 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
> + if (unlikely(!p))
> + return -ENOBUFS;
> + p = xdr_encode_hyper(p, err->offset);
> + p = xdr_encode_hyper(p, err->length);
> + p = xdr_encode_opaque_fixed(p, &err->stateid,
> + NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, &err->deviceid,
> + NFS4_DEVICEID4_SIZE);
> + *p++ = cpu_to_be32(err->status);
> + *p++ = cpu_to_be32(err->opnum);
> + *count += 1;
> + dprintk("%s: offset %llu length %llu status %d op %d count %d\n",
> + __func__, err->offset, err->length, err->status,
> + err->opnum, *count);
> + list_del(&err->list);
> + kfree(err);
> + }
> +
> + return 0;
> +}
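One thing to watch in a loop that both reports and frees each entry is ordering: every field has to be read before the entry is released. A userspace sketch of that pattern with a hand-rolled singly linked list (the kernel code uses list_for_each_entry_safe(); `sketch_err` and `sketch_drain()` are made up for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_err {
	uint64_t offset;
	uint64_t length;
	struct sketch_err *next;
};

/*
 * Free every entry on the list and add up the lengths seen.
 * Returns the number of entries freed.
 */
static int sketch_drain(struct sketch_err **head, uint64_t *total_length)
{
	struct sketch_err *err = *head, *n;
	int count = 0;

	while (err) {
		n = err->next;			/* save successor before freeing */
		*total_length += err->length;	/* last read of err's fields */
		free(err);			/* err must not be touched after this */
		count++;
		err = n;
	}
	*head = NULL;
	return count;
}
```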
> +
> +bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg)
> +{
> + struct nfs4_ff_layout_mirror *mirror;
> + struct nfs4_deviceid_node *devid;
> + int idx;
> +
> + for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
> + mirror = FF_LAYOUT_COMP(lseg, idx);
> + if (mirror && mirror->mirror_ds) {
> + devid = &mirror->mirror_ds->id_node;
> + if (!ff_layout_test_devid_unavailable(devid))
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> +module_param(dataserver_retrans, uint, 0644);
> +MODULE_PARM_DESC(dataserver_retrans, "The number of times the NFSv4.1 client "
> + "retries a request before it attempts further "
> + "recovery action.");
> +module_param(dataserver_timeo, uint, 0644);
> +MODULE_PARM_DESC(dataserver_timeo, "The time (in tenths of a second) the "
> + "NFSv4.1 client waits for a response from a "
> + "data server before it retries an NFS request.");
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 022b761..de7c91c 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -516,6 +516,7 @@ enum pnfs_layouttype {
> LAYOUT_NFSV4_1_FILES = 1,
> LAYOUT_OSD2_OBJECTS = 2,
> LAYOUT_BLOCK_VOLUME = 3,
> + LAYOUT_FLEX_FILES = 4,
> };
>
> /* used for both layout return and recall */
>
On Tue, Jan 06, 2015 at 02:59:57PM -0500, Anna Schumaker wrote:
> Hey Tom,
>
> On 12/24/2014 02:13 AM, Tom Haynes wrote:
> > The flexfile layout is a new layout that extends the
> > file layout. It is currently being drafted as a specification at
> > https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
> >
> > Signed-off-by: Weston Andros Adamson <[email protected]>
> > Signed-off-by: Tom Haynes <[email protected]>
> > Signed-off-by: Tao Peng <[email protected]>
> > ---
> > fs/nfs/Kconfig | 5 +
> > fs/nfs/Makefile | 1 +
> > fs/nfs/flexfilelayout/Makefile | 5 +
> > fs/nfs/flexfilelayout/flexfilelayout.c | 1600 +++++++++++++++++++++++++++++
> > fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
> > fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
> > include/linux/nfs4.h | 1 +
> > 7 files changed, 2322 insertions(+)
> > create mode 100644 fs/nfs/flexfilelayout/Makefile
> > create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
> > create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
> > create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
> >
> > diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
> > index 3dece03..c7abc10 100644
> > --- a/fs/nfs/Kconfig
> > +++ b/fs/nfs/Kconfig
> > @@ -128,6 +128,11 @@ config PNFS_OBJLAYOUT
> > depends on NFS_V4_1 && SCSI_OSD_ULD
> > default NFS_V4
> >
> > +config PNFS_FLEXFILE_LAYOUT
> > + tristate
> > + depends on NFS_V4_1 && NFS_V3
> > + default m
> > +
> > config NFS_V4_1_IMPLEMENTATION_ID_DOMAIN
> > string "NFSv4.1 Implementation ID Domain"
> > depends on NFS_V4_1
> > diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> > index 7973c4e3..3c97bd9 100644
> > --- a/fs/nfs/Makefile
> > +++ b/fs/nfs/Makefile
> > @@ -33,3 +33,4 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
> > obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
> > obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayout/
> > obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
> > +obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/
> > diff --git a/fs/nfs/flexfilelayout/Makefile b/fs/nfs/flexfilelayout/Makefile
> > new file mode 100644
> > index 0000000..1d2c9f6
> > --- /dev/null
> > +++ b/fs/nfs/flexfilelayout/Makefile
> > @@ -0,0 +1,5 @@
> > +#
> > +# Makefile for the pNFS Flexfile Layout Driver kernel module
> > +#
> > +obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += nfs_layout_flexfiles.o
> > +nfs_layout_flexfiles-y := flexfilelayout.o flexfilelayoutdev.o
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> > new file mode 100644
> > index 0000000..fddd3e6
> > --- /dev/null
> > +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> > @@ -0,0 +1,1600 @@
> > +/*
> > + * Module for pnfs flexfile layout driver.
> > + *
> > + * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
> > + *
> > + * Tao Peng <[email protected]>
> > + */
> > +
> > +#include <linux/nfs_fs.h>
> > +#include <linux/nfs_page.h>
> > +#include <linux/module.h>
> > +
> > +#include <linux/sunrpc/metrics.h>
> > +
> > +#include "flexfilelayout.h"
> > +#include "../nfs4session.h"
> > +#include "../internal.h"
> > +#include "../delegation.h"
> > +#include "../nfs4trace.h"
> > +#include "../iostat.h"
> > +#include "../nfs.h"
> > +
> > +#define NFSDBG_FACILITY NFSDBG_PNFS_LD
> > +
> > +#define FF_LAYOUT_POLL_RETRY_MAX (15*HZ)
> > +
> > +static struct pnfs_layout_hdr *
> > +ff_layout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
> > +{
> > + struct nfs4_flexfile_layout *ffl;
> > +
> > + ffl = kzalloc(sizeof(*ffl), gfp_flags);
> > + if (ffl == NULL)
> > + return NULL;
> > + INIT_LIST_HEAD(&ffl->error_list);
> > + return &ffl->generic_hdr;
> > +}
> > +
> > +static void
> > +ff_layout_free_layout_hdr(struct pnfs_layout_hdr *lo)
> > +{
> > + struct nfs4_ff_layout_ds_err *err, *n;
> > +
> > + list_for_each_entry_safe(err, n, &FF_LAYOUT_FROM_HDR(lo)->error_list,
> > + list) {
> > + list_del(&err->list);
> > + kfree(err);
> > + }
> > + kfree(FF_LAYOUT_FROM_HDR(lo));
> > +}
> > +
> > +static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
> > +{
> > + __be32 *p;
> > +
> > + p = xdr_inline_decode(xdr, NFS4_STATEID_SIZE);
> > + if (unlikely(p == NULL))
> > + return -ENOBUFS;
> > + memcpy(stateid, p, NFS4_STATEID_SIZE);
> > + dprintk("%s: stateid id= [%x%x%x%x]\n", __func__,
> > + p[0], p[1], p[2], p[3]);
> > + return 0;
> > +}
> > +
> > +static int decode_deviceid(struct xdr_stream *xdr, struct nfs4_deviceid *devid)
> > +{
> > + __be32 *p;
> > +
> > + p = xdr_inline_decode(xdr, NFS4_DEVICEID4_SIZE);
> > + if (unlikely(!p))
> > + return -ENOBUFS;
> > + memcpy(devid, p, NFS4_DEVICEID4_SIZE);
> > + nfs4_print_deviceid(devid);
> > + return 0;
> > +}
> > +
> > +static int decode_nfs_fh(struct xdr_stream *xdr, struct nfs_fh *fh)
> > +{
> > + __be32 *p;
> > +
> > + p = xdr_inline_decode(xdr, 4);
> > + if (unlikely(!p))
> > + return -ENOBUFS;
> > + fh->size = be32_to_cpup(p++);
> > + if (fh->size > sizeof(struct nfs_fh)) {
> > + printk(KERN_ERR "NFS flexfiles: Too big fh received %d\n",
> > + fh->size);
> > + return -EOVERFLOW;
> > + }
> > + /* fh.data */
> > + p = xdr_inline_decode(xdr, fh->size);
> > + if (unlikely(!p))
> > + return -ENOBUFS;
> > + memcpy(&fh->data, p, fh->size);
> > + dprintk("%s: fh len %d\n", __func__, fh->size);
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * we only handle AUTH_NONE and AUTH_UNIX for now.
> > + *
> > + * For AUTH_UNIX, we want to parse
> > + * struct authsys_parms {
> > + * unsigned int stamp;
> > + * string machinename<255>;
> > + * unsigned int uid;
> > + * unsigned int gid;
> > + * unsigned int gids<16>;
> > + * };
> > + */
> > +static int
> > +ff_layout_parse_auth(struct xdr_stream *xdr,
> > + struct nfs4_ff_layout_mirror *mirror)
> > +{
> > + __be32 *p;
> > + int flavor, len, gid_it = 0;
> > +
> > + /* authflavor(4) + opaque_length(4)*/
> > + p = xdr_inline_decode(xdr, 8);
> > + if (unlikely(!p))
> > + return -ENOBUFS;
> > + flavor = be32_to_cpup(p++);
> > + len = be32_to_cpup(p++);
> > + if (flavor < RPC_AUTH_NULL || flavor >= RPC_AUTH_MAXFLAVOR ||
> > + len < 0)
> > + return -EINVAL;
> > +
> > + dprintk("%s: flavor %u len %u\n", __func__, flavor, len);
> > +
> > + if (flavor == RPC_AUTH_NULL && len == 0)
> > + goto out_fill;
> > +
> > + /* opaque body */
> > + p = xdr_inline_decode(xdr, len);
> > + if (unlikely(!p))
> > + return -ENOBUFS;
> > +
> > + if (flavor == RPC_AUTH_NULL) {
> > + mirror->uid = -1;
> > + mirror->gid = -1;
> > + } else if (flavor == RPC_AUTH_UNIX) {
> > + int len2;
> > +
> > + p++; /* stamp */
> > + len2 = be32_to_cpup(p++); /* machinename length */
> > + dprintk("%s: machinename length %u\n", __func__, len2);
> > + if (len2 < 0 || len2 >= len || len2 > 255)
> > + return -EINVAL;
> > + p += XDR_QUADLEN(len2); /* machinename */
> > +
> > + mirror->uid = be32_to_cpup(p++);
> > + mirror->gid = be32_to_cpup(p++);
> > +
> > + len2 = be32_to_cpup(p++); /* gid array length */
> > + dprintk("%s: gid array length %u\n", __func__, len2);
> > + if (len2 > 16)
> > + return -EINVAL;
> > + for (; gid_it < len2; gid_it++)
> > + mirror->gids[gid_it] = be32_to_cpup(p++);
> > + } else {
> > + return -EPROTONOSUPPORT;
> > + }
> > +
> > +out_fill:
> > + /* filling the rest of gids */
> > + for (; gid_it < 16; gid_it++)
> > + mirror->gids[gid_it] = -1;
> > +
> > + return 0;
> > +}
> > +
> > +static void ff_layout_free_mirror_array(struct nfs4_ff_layout_segment *fls)
> > +{
> > + int i;
> > +
> > + if (fls->mirror_array) {
> > + for (i = 0; i < fls->mirror_array_cnt; i++) {
> > + /* normally mirror_ds is freed in
> > + * .free_deviceid_node but we still do it here
> > + * for .alloc_lseg error path */
> > + if (fls->mirror_array[i]) {
> > + kfree(fls->mirror_array[i]->fh_versions);
> > + nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
> > + kfree(fls->mirror_array[i]);
> > + }
> > + }
> > + kfree(fls->mirror_array);
> > + fls->mirror_array = NULL;
> > + }
> > +}
> > +
> > +static int ff_layout_check_layout(struct nfs4_layoutget_res *lgr)
> > +{
> > + int ret = 0;
> > +
> > + dprintk("--> %s\n", __func__);
> > +
> > + /* FIXME: remove this check when layout segment support is added */
> > + if (lgr->range.offset != 0 ||
> > + lgr->range.length != NFS4_MAX_UINT64) {
> > + dprintk("%s Only whole file layouts supported. Use MDS i/o\n",
> > + __func__);
> > + ret = -EINVAL;
> > + }
> > +
> > + dprintk("--> %s returns %d\n", __func__, ret);
> > + return ret;
> > +}
> > +
> > +static void _ff_layout_free_lseg(struct nfs4_ff_layout_segment *fls)
> > +{
> > + if (fls) {
> > + ff_layout_free_mirror_array(fls);
> > + kfree(fls);
> > + }
> > +}
> > +
> > +static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
> > +{
> > + struct nfs4_ff_layout_mirror *tmp;
> > + int i, j;
> > +
> > + for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
> > + for (j = i + 1; j < fls->mirror_array_cnt; j++)
> > + if (fls->mirror_array[i]->efficiency <
> > + fls->mirror_array[j]->efficiency) {
> > + tmp = fls->mirror_array[i];
> > + fls->mirror_array[i] = fls->mirror_array[j];
> > + fls->mirror_array[j] = tmp;
> > + }
> > + }
> > +}
> > +
> > +static struct pnfs_layout_segment *
> > +ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
> > + struct nfs4_layoutget_res *lgr,
> > + gfp_t gfp_flags)
> > +{
> > + struct pnfs_layout_segment *ret;
> > + struct nfs4_ff_layout_segment *fls = NULL;
> > + struct xdr_stream stream;
> > + struct xdr_buf buf;
> > + struct page *scratch;
> > + u64 stripe_unit;
> > + u32 mirror_array_cnt;
> > + __be32 *p;
> > + int i, rc;
> > +
> > + dprintk("--> %s\n", __func__);
> > + scratch = alloc_page(gfp_flags);
> > + if (!scratch)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages,
> > + lgr->layoutp->len);
> > + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
> > +
> > + /* stripe unit and mirror_array_cnt */
> > + rc = -EIO;
> > + p = xdr_inline_decode(&stream, 8 + 4);
> > + if (!p)
> > + goto out_err_free;
> > +
> > + p = xdr_decode_hyper(p, &stripe_unit);
> > + mirror_array_cnt = be32_to_cpup(p++);
> > + dprintk("%s: stripe_unit=%llu mirror_array_cnt=%u\n", __func__,
> > + stripe_unit, mirror_array_cnt);
> > +
> > + if (mirror_array_cnt > NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT ||
> > + mirror_array_cnt == 0)
> > + goto out_err_free;
> > +
> > + rc = -ENOMEM;
> > + fls = kzalloc(sizeof(*fls), gfp_flags);
> > + if (!fls)
> > + goto out_err_free;
> > +
> > + fls->mirror_array_cnt = mirror_array_cnt;
> > + fls->stripe_unit = stripe_unit;
> > + fls->mirror_array = kcalloc(fls->mirror_array_cnt,
> > + sizeof(fls->mirror_array[0]), gfp_flags);
> > + if (fls->mirror_array == NULL)
> > + goto out_err_free;
> > +
> > + for (i = 0; i < fls->mirror_array_cnt; i++) {
> > + struct nfs4_deviceid devid;
> > + struct nfs4_deviceid_node *idnode;
> > + u32 ds_count;
> > + u32 fh_count;
> > + int j;
> > +
> > + rc = -EIO;
> > + p = xdr_inline_decode(&stream, 4);
> > + if (!p)
> > + goto out_err_free;
> > + ds_count = be32_to_cpup(p);
> > +
> > + /* FIXME: allow for striping? */
> > + if (ds_count != 1)
> > + goto out_err_free;
> > +
> > + fls->mirror_array[i] =
> > + kzalloc(sizeof(struct nfs4_ff_layout_mirror),
> > + gfp_flags);
> > + if (fls->mirror_array[i] == NULL) {
> > + rc = -ENOMEM;
> > + goto out_err_free;
> > + }
> > +
> > + spin_lock_init(&fls->mirror_array[i]->lock);
> > + fls->mirror_array[i]->ds_count = ds_count;
> > +
> > + /* deviceid */
> > + rc = decode_deviceid(&stream, &devid);
> > + if (rc)
> > + goto out_err_free;
> > +
> > + idnode = nfs4_find_get_deviceid(NFS_SERVER(lh->plh_inode),
> > + &devid, lh->plh_lc_cred,
> > + gfp_flags);
> > + /*
> > + * upon success, mirror_ds is allocated by previous
> > + * getdeviceinfo, or newly by .alloc_deviceid_node
> > + * nfs4_find_get_deviceid failure is indeed getdeviceinfo failure
> > + */
> > + if (idnode)
> > + fls->mirror_array[i]->mirror_ds =
> > + FF_LAYOUT_MIRROR_DS(idnode);
> > + else
> > + goto out_err_free;
> > +
> > + /* efficiency */
> > + rc = -EIO;
> > + p = xdr_inline_decode(&stream, 4);
> > + if (!p)
> > + goto out_err_free;
> > + fls->mirror_array[i]->efficiency = be32_to_cpup(p);
> > +
> > + /* stateid */
> > + rc = decode_stateid(&stream, &fls->mirror_array[i]->stateid);
> > + if (rc)
> > + goto out_err_free;
> > +
> > + /* fh */
> > + p = xdr_inline_decode(&stream, 4);
> > + if (!p)
> > + goto out_err_free;
> > + fh_count = be32_to_cpup(p);
> > +
> > + fls->mirror_array[i]->fh_versions =
> > + kzalloc(fh_count * sizeof(struct nfs_fh),
> > + gfp_flags);
> > + if (fls->mirror_array[i]->fh_versions == NULL) {
> > + rc = -ENOMEM;
> > + goto out_err_free;
> > + }
> > +
> > + for (j = 0; j < fh_count; j++) {
> > + rc = decode_nfs_fh(&stream,
> > + &fls->mirror_array[i]->fh_versions[j]);
> > + if (rc)
> > + goto out_err_free;
> > + }
> > +
> > + fls->mirror_array[i]->fh_versions_cnt = fh_count;
> > +
> > + /* opaque_auth */
> > + rc = ff_layout_parse_auth(&stream, fls->mirror_array[i]);
> > + if (rc)
> > + goto out_err_free;
> > +
> > + dprintk("%s: uid %d gid %d\n", __func__,
> > + fls->mirror_array[i]->uid,
> > + fls->mirror_array[i]->gid);
> > + }
> > +
> > + ff_layout_sort_mirrors(fls);
> > + rc = ff_layout_check_layout(lgr);
> > + if (rc)
> > + goto out_err_free;
> > +
> > + ret = &fls->generic_hdr;
> > + dprintk("<-- %s (success)\n", __func__);
> > +out_free_page:
> > + __free_page(scratch);
> > + return ret;
> > +out_err_free:
> > + _ff_layout_free_lseg(fls);
> > + ret = ERR_PTR(rc);
> > + dprintk("<-- %s (%d)\n", __func__, rc);
> > + goto out_free_page;
> > +}
> > +
> > +static bool ff_layout_has_rw_segments(struct pnfs_layout_hdr *layout)
> > +{
> > + struct pnfs_layout_segment *lseg;
> > +
> > + list_for_each_entry(lseg, &layout->plh_segs, pls_list)
> > + if (lseg->pls_range.iomode == IOMODE_RW)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static void
> > +ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
> > +{
> > + struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
> > + int i;
> > +
> > + dprintk("--> %s\n", __func__);
> > +
> > + for (i = 0; i < fls->mirror_array_cnt; i++) {
> > + if (fls->mirror_array[i]) {
> > + nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
> > + fls->mirror_array[i]->mirror_ds = NULL;
> > + if (fls->mirror_array[i]->cred) {
> > + put_rpccred(fls->mirror_array[i]->cred);
> > + fls->mirror_array[i]->cred = NULL;
> > + }
> > + }
> > + }
> > +
> > + if (lseg->pls_range.iomode == IOMODE_RW) {
> > + struct nfs4_flexfile_layout *ffl;
> > + struct inode *inode;
> > +
> > + ffl = FF_LAYOUT_FROM_HDR(lseg->pls_layout);
> > + inode = ffl->generic_hdr.plh_inode;
> > + spin_lock(&inode->i_lock);
> > + if (!ff_layout_has_rw_segments(lseg->pls_layout)) {
> > + ffl->commit_info.nbuckets = 0;
> > + kfree(ffl->commit_info.buckets);
> > + ffl->commit_info.buckets = NULL;
> > + }
> > + spin_unlock(&inode->i_lock);
> > + }
> > + _ff_layout_free_lseg(fls);
> > +}
> > +
> > +/* Return 1 until we have multiple lsegs support */
> > +static int
> > +ff_layout_get_lseg_count(struct nfs4_ff_layout_segment *fls)
> > +{
> > + return 1;
> > +}
> > +
> > +static int
> > +ff_layout_alloc_commit_info(struct pnfs_layout_segment *lseg,
> > + struct nfs_commit_info *cinfo,
> > + gfp_t gfp_flags)
> > +{
> > + struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
> > + struct pnfs_commit_bucket *buckets;
> > + int size;
> > +
> > + if (cinfo->ds->nbuckets != 0) {
> > + /* This assumes there is only one RW lseg per file.
> > + * To support multiple lseg per file, we need to
> > + * change struct pnfs_commit_bucket to allow dynamic
> > + * increasing nbuckets.
> > + */
> > + return 0;
> > + }
> > +
> > + size = ff_layout_get_lseg_count(fls) * FF_LAYOUT_MIRROR_COUNT(lseg);
> > +
> > + buckets = kcalloc(size, sizeof(struct pnfs_commit_bucket),
> > + gfp_flags);
> > + if (!buckets)
> > + return -ENOMEM;
> > + else {
> > + int i;
> > +
> > + spin_lock(cinfo->lock);
> > + if (cinfo->ds->nbuckets != 0)
> > + kfree(buckets);
> > + else {
> > + cinfo->ds->buckets = buckets;
> > + cinfo->ds->nbuckets = size;
> > + for (i = 0; i < size; i++) {
> > + INIT_LIST_HEAD(&buckets[i].written);
> > + INIT_LIST_HEAD(&buckets[i].committing);
> > + /* mark direct verifier as unset */
> > + buckets[i].direct_verf.committed =
> > + NFS_INVALID_STABLE_HOW;
> > + }
> > + }
> > + spin_unlock(cinfo->lock);
> > + return 0;
> > + }
> > +}
> > +
> > +static struct nfs4_pnfs_ds *
> > +ff_layout_choose_best_ds_for_read(struct nfs_pageio_descriptor *pgio,
> > + int *best_idx)
> > +{
> > + struct nfs4_ff_layout_segment *fls;
> > + struct nfs4_pnfs_ds *ds;
> > + int idx;
> > +
> > + fls = FF_LAYOUT_LSEG(pgio->pg_lseg);
> > + /* mirrors are sorted by efficiency */
> > + for (idx = 0; idx < fls->mirror_array_cnt; idx++) {
> > + ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, idx, false);
> > + if (ds) {
> > + *best_idx = idx;
> > + return ds;
> > + }
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +static void
> > +ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
> > + struct nfs_page *req)
> > +{
> > + struct nfs_pgio_mirror *pgm;
> > + struct nfs4_ff_layout_mirror *mirror;
> > + struct nfs4_pnfs_ds *ds;
> > + int ds_idx;
> > +
> > + /* Use full layout for now */
> > + if (!pgio->pg_lseg)
> > + pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
> > + req->wb_context,
> > + 0,
> > + NFS4_MAX_UINT64,
> > + IOMODE_READ,
> > + GFP_KERNEL);
> > + /* If no lseg, fall back to read through mds */
> > + if (pgio->pg_lseg == NULL)
> > + goto out_mds;
> > +
> > + ds = ff_layout_choose_best_ds_for_read(pgio, &ds_idx);
> > + if (!ds)
> > + goto out_mds;
> > + mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
> > +
> > + pgio->pg_mirror_idx = ds_idx;
> > +
> > + /* read always uses only one mirror - idx 0 for pgio layer */
> > + pgm = &pgio->pg_mirrors[0];
> > + pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
> > +
> > + return;
> > +out_mds:
> > + pnfs_put_lseg(pgio->pg_lseg);
> > + pgio->pg_lseg = NULL;
> > + nfs_pageio_reset_read_mds(pgio);
> > +}
> > +
> > +static void
> > +ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
> > + struct nfs_page *req)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror;
> > + struct nfs_pgio_mirror *pgm;
> > + struct nfs_commit_info cinfo;
> > + struct nfs4_pnfs_ds *ds;
> > + int i;
> > + int status;
> > +
> > + if (!pgio->pg_lseg)
> > + pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
> > + req->wb_context,
> > + 0,
> > + NFS4_MAX_UINT64,
> > + IOMODE_RW,
> > + GFP_NOFS);
> > + /* If no lseg, fall back to write through mds */
> > + if (pgio->pg_lseg == NULL)
> > + goto out_mds;
> > +
> > + nfs_init_cinfo(&cinfo, pgio->pg_inode, pgio->pg_dreq);
> > + status = ff_layout_alloc_commit_info(pgio->pg_lseg, &cinfo, GFP_NOFS);
> > + if (status < 0)
> > + goto out_mds;
> > +
> > + /* Use a direct mapping of ds_idx to pgio mirror_idx */
> > + if (WARN_ON_ONCE(pgio->pg_mirror_count !=
> > + FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg)))
> > + goto out_mds;
> > +
> > + for (i = 0; i < pgio->pg_mirror_count; i++) {
> > + ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, i, true);
> > + if (!ds)
> > + goto out_mds;
> > + pgm = &pgio->pg_mirrors[i];
> > + mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
> > + pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
> > + }
> > +
> > + return;
> > +
> > +out_mds:
> > + pnfs_put_lseg(pgio->pg_lseg);
> > + pgio->pg_lseg = NULL;
> > + nfs_pageio_reset_write_mds(pgio);
> > +}
> > +
> > +static unsigned int
> > +ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
> > + struct nfs_page *req)
> > +{
> > + if (!pgio->pg_lseg)
> > + pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
> > + req->wb_context,
> > + 0,
> > + NFS4_MAX_UINT64,
> > + IOMODE_RW,
> > + GFP_NOFS);
> > + if (pgio->pg_lseg)
> > + return FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg);
> > +
> > + /* no lseg means that pnfs is not in use, so no mirroring here */
> > + pnfs_put_lseg(pgio->pg_lseg);
> > + pgio->pg_lseg = NULL;
> > + nfs_pageio_reset_write_mds(pgio);
> > + return 1;
> > +}
> > +
> > +static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
> > + .pg_init = ff_layout_pg_init_read,
> > + .pg_test = pnfs_generic_pg_test,
> > + .pg_doio = pnfs_generic_pg_readpages,
> > + .pg_cleanup = pnfs_generic_pg_cleanup,
> > +};
> > +
> > +static const struct nfs_pageio_ops ff_layout_pg_write_ops = {
> > + .pg_init = ff_layout_pg_init_write,
> > + .pg_test = pnfs_generic_pg_test,
> > + .pg_doio = pnfs_generic_pg_writepages,
> > + .pg_get_mirror_count = ff_layout_pg_get_mirror_count_write,
> > + .pg_cleanup = pnfs_generic_pg_cleanup,
> > +};
> > +
> > +static void ff_layout_reset_write(struct nfs_pgio_header *hdr, bool retry_pnfs)
> > +{
> > + struct rpc_task *task = &hdr->task;
> > +
> > + pnfs_layoutcommit_inode(hdr->inode, false);
> > +
> > + if (retry_pnfs) {
> > + dprintk("%s Reset task %5u for i/o through pNFS "
> > + "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
> > + hdr->task.tk_pid,
> > + hdr->inode->i_sb->s_id,
> > + (unsigned long long)NFS_FILEID(hdr->inode),
> > + hdr->args.count,
> > + (unsigned long long)hdr->args.offset);
> > +
> > + if (!hdr->dreq) {
> > + struct nfs_open_context *ctx;
> > +
> > + ctx = nfs_list_entry(hdr->pages.next)->wb_context;
> > + set_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
> > + hdr->completion_ops->error_cleanup(&hdr->pages);
> > + } else {
> > + nfs_direct_set_resched_writes(hdr->dreq);
> > + }
> > + return;
> > + }
> > +
> > + if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
> > + dprintk("%s Reset task %5u for i/o through MDS "
> > + "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
> > + hdr->task.tk_pid,
> > + hdr->inode->i_sb->s_id,
> > + (unsigned long long)NFS_FILEID(hdr->inode),
> > + hdr->args.count,
> > + (unsigned long long)hdr->args.offset);
> > +
> > + task->tk_status = pnfs_write_done_resend_to_mds(hdr);
> > + }
> > +}
> > +
> > +static void ff_layout_reset_read(struct nfs_pgio_header *hdr)
> > +{
> > + struct rpc_task *task = &hdr->task;
> > +
> > + pnfs_layoutcommit_inode(hdr->inode, false);
> > +
> > + if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
> > + dprintk("%s Reset task %5u for i/o through MDS "
> > + "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
> > + hdr->task.tk_pid,
> > + hdr->inode->i_sb->s_id,
> > + (unsigned long long)NFS_FILEID(hdr->inode),
> > + hdr->args.count,
> > + (unsigned long long)hdr->args.offset);
> > +
> > + task->tk_status = pnfs_read_done_resend_to_mds(hdr);
> > + }
> > +}
> > +
> > +static int ff_layout_async_handle_error_v4(struct rpc_task *task,
> > + struct nfs4_state *state,
> > + struct nfs_client *clp,
> > + struct pnfs_layout_segment *lseg,
> > + int idx)
> > +{
> > + struct pnfs_layout_hdr *lo = lseg->pls_layout;
> > + struct inode *inode = lo->plh_inode;
> > + struct nfs_server *mds_server = NFS_SERVER(inode);
> > +
> > + struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
> > + struct nfs_client *mds_client = mds_server->nfs_client;
> > + struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
> > +
> > + if (task->tk_status >= 0)
> > + return 0;
> > +
> > + switch (task->tk_status) {
> > + /* MDS state errors */
> > + case -NFS4ERR_DELEG_REVOKED:
> > + case -NFS4ERR_ADMIN_REVOKED:
> > + case -NFS4ERR_BAD_STATEID:
> > + if (state == NULL)
> > + break;
> > + nfs_remove_bad_delegation(state->inode);
> > + case -NFS4ERR_OPENMODE:
> > + if (state == NULL)
> > + break;
> > + if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
> > + goto out_bad_stateid;
> > + goto wait_on_recovery;
> > + case -NFS4ERR_EXPIRED:
> > + if (state != NULL) {
> > + if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
> > + goto out_bad_stateid;
> > + }
> > + nfs4_schedule_lease_recovery(mds_client);
> > + goto wait_on_recovery;
> > + /* DS session errors */
> > + case -NFS4ERR_BADSESSION:
> > + case -NFS4ERR_BADSLOT:
> > + case -NFS4ERR_BAD_HIGH_SLOT:
> > + case -NFS4ERR_DEADSESSION:
> > + case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
> > + case -NFS4ERR_SEQ_FALSE_RETRY:
> > + case -NFS4ERR_SEQ_MISORDERED:
> > + dprintk("%s ERROR %d, Reset session. Exchangeid "
> > + "flags 0x%x\n", __func__, task->tk_status,
> > + clp->cl_exchange_flags);
> > + nfs4_schedule_session_recovery(clp->cl_session, task->tk_status);
> > + break;
> > + case -NFS4ERR_DELAY:
> > + case -NFS4ERR_GRACE:
> > + rpc_delay(task, FF_LAYOUT_POLL_RETRY_MAX);
> > + break;
> > + case -NFS4ERR_RETRY_UNCACHED_REP:
> > + break;
> > + /* Invalidate Layout errors */
> > + case -NFS4ERR_PNFS_NO_LAYOUT:
> > + case -ESTALE: /* mapped NFS4ERR_STALE */
> > + case -EBADHANDLE: /* mapped NFS4ERR_BADHANDLE */
> > + case -EISDIR: /* mapped NFS4ERR_ISDIR */
> > + case -NFS4ERR_FHEXPIRED:
> > + case -NFS4ERR_WRONG_TYPE:
> > + dprintk("%s Invalid layout error %d\n", __func__,
> > + task->tk_status);
> > + /*
> > + * Destroy layout so new i/o will get a new layout.
> > + * Layout will not be destroyed until all current lseg
> > + * references are put. Mark layout as invalid to resend failed
> > + * i/o and all i/o waiting on the slot table to the MDS until
> > + * layout is destroyed and a new valid layout is obtained.
> > + */
> > + pnfs_destroy_layout(NFS_I(inode));
> > + rpc_wake_up(&tbl->slot_tbl_waitq);
> > + goto reset;
> > + /* RPC connection errors */
> > + case -ECONNREFUSED:
> > + case -EHOSTDOWN:
> > + case -EHOSTUNREACH:
> > + case -ENETUNREACH:
> > + case -EIO:
> > + case -ETIMEDOUT:
> > + case -EPIPE:
> > + dprintk("%s DS connection error %d\n", __func__,
> > + task->tk_status);
> > + nfs4_mark_deviceid_unavailable(devid);
> > + rpc_wake_up(&tbl->slot_tbl_waitq);
> > + /* fall through */
> > + default:
> > + if (ff_layout_has_available_ds(lseg))
> > + return -NFS4ERR_RESET_TO_PNFS;
> > +reset:
> > + dprintk("%s Retry through MDS. Error %d\n", __func__,
> > + task->tk_status);
> > + return -NFS4ERR_RESET_TO_MDS;
> > + }
> > +out:
> > + task->tk_status = 0;
> > + return -EAGAIN;
> > +out_bad_stateid:
> > + task->tk_status = -EIO;
> > + return 0;
> > +wait_on_recovery:
> > + rpc_sleep_on(&mds_client->cl_rpcwaitq, task, NULL);
> > + if (test_bit(NFS4CLNT_MANAGER_RUNNING, &mds_client->cl_state) == 0)
> > + rpc_wake_up_queued_task(&mds_client->cl_rpcwaitq, task);
> > + goto out;
> > +}
> > +
> > +/* Retry all errors through either pNFS or MDS except for -EJUKEBOX */
> > +static int ff_layout_async_handle_error_v3(struct rpc_task *task,
> > + struct pnfs_layout_segment *lseg,
> > + int idx)
> > +{
> > + struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
> > +
> > + if (task->tk_status >= 0)
> > + return 0;
> > +
> > + if (task->tk_status != -EJUKEBOX) {
> > + dprintk("%s DS connection error %d\n", __func__,
> > + task->tk_status);
> > + nfs4_mark_deviceid_unavailable(devid);
> > + if (ff_layout_has_available_ds(lseg))
> > + return -NFS4ERR_RESET_TO_PNFS;
> > + else
> > + return -NFS4ERR_RESET_TO_MDS;
> > + }
> > +
> > + if (task->tk_status == -EJUKEBOX)
> > + nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY);
> > + task->tk_status = 0;
> > + rpc_restart_call(task);
> > + rpc_delay(task, NFS_JUKEBOX_RETRY_TIME);
> > + return -EAGAIN;
> > +}
> > +
> > +static int ff_layout_async_handle_error(struct rpc_task *task,
> > + struct nfs4_state *state,
> > + struct nfs_client *clp,
> > + struct pnfs_layout_segment *lseg,
> > + int idx)
> > +{
> > + int vers = clp->cl_nfs_mod->rpc_vers->number;
> > +
> > + switch (vers) {
> > + case 3:
> > + return ff_layout_async_handle_error_v3(task, lseg, idx);
> > + case 4:
> > + return ff_layout_async_handle_error_v4(task, state, clp,
> > + lseg, idx);
> > + default:
> > + /* should never happen */
> > + WARN_ON_ONCE(1);
> > + return 0;
> > + }
> > +}
> > +
> > +static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
> > + int idx, u64 offset, u64 length,
> > + u32 status, int opnum)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror;
> > + int err;
> > +
> > + mirror = FF_LAYOUT_COMP(lseg, idx);
> > + err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
> > + mirror, offset, length, status, opnum,
> > + GFP_NOIO);
> > + dprintk("%s: err %d op %d status %u\n", __func__, err, opnum, status);
> > +}
> > +
> > +/* NFS_PROTO call done callback routines */
> > +
> > +static int ff_layout_read_done_cb(struct rpc_task *task,
> > + struct nfs_pgio_header *hdr)
> > +{
> > + struct inode *inode;
> > + int err;
> > +
> > + trace_nfs4_pnfs_read(hdr, task->tk_status);
> > + if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
> > + hdr->res.op_status = NFS4ERR_NXIO;
> > + if (task->tk_status < 0 && hdr->res.op_status)
> > + ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
> > + hdr->args.offset, hdr->args.count,
> > + hdr->res.op_status, OP_READ);
> > + err = ff_layout_async_handle_error(task, hdr->args.context->state,
> > + hdr->ds_clp, hdr->lseg,
> > + hdr->pgio_mirror_idx);
> > +
> > + switch (err) {
> > + case -NFS4ERR_RESET_TO_PNFS:
> > + set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
> > + &hdr->lseg->pls_layout->plh_flags);
> > + pnfs_read_resend_pnfs(hdr);
> > + return task->tk_status;
> > + case -NFS4ERR_RESET_TO_MDS:
> > + inode = hdr->lseg->pls_layout->plh_inode;
> > + pnfs_error_mark_layout_for_return(inode, hdr->lseg);
> > + ff_layout_reset_read(hdr);
> > + return task->tk_status;
> > + case -EAGAIN:
> > + rpc_restart_call_prepare(task);
> > + return -EAGAIN;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * We reference the rpc_cred of the first WRITE that triggers the need for
> > + * a LAYOUTCOMMIT, and use it to send the layoutcommit compound.
> > + * rfc5661 is not clear about which credential should be used.
> > + *
> > + * Flexlayout client should treat DS replied FILE_SYNC as DATA_SYNC, so
> > + * to follow http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
> > + * we always send layoutcommit after DS writes.
> > + */
> > +static void
> > +ff_layout_set_layoutcommit(struct nfs_pgio_header *hdr)
> > +{
> > + pnfs_set_layoutcommit(hdr);
> > + dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
> > + (unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
> > +}
> > +
> > +static bool
> > +ff_layout_reset_to_mds(struct pnfs_layout_segment *lseg, int idx)
> > +{
> > + /* No mirroring for now */
> > + struct nfs4_deviceid_node *node = FF_LAYOUT_DEVID_NODE(lseg, idx);
> > +
> > + return ff_layout_test_devid_unavailable(node);
> > +}
> > +
> > +static int ff_layout_read_prepare_common(struct rpc_task *task,
> > + struct nfs_pgio_header *hdr)
> > +{
> > + if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
> > + rpc_exit(task, -EIO);
> > + return -EIO;
> > + }
> > + if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
> > + dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
> > + if (ff_layout_has_available_ds(hdr->lseg))
> > + pnfs_read_resend_pnfs(hdr);
> > + else
> > + ff_layout_reset_read(hdr);
> > + rpc_exit(task, 0);
> > + return -EAGAIN;
> > + }
> > + hdr->pgio_done_cb = ff_layout_read_done_cb;
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Call ops for the async read/write cases
> > + * In the case of dense layouts, the offset needs to be reset to its
> > + * original value.
> > + */
> > +static void ff_layout_read_prepare_v3(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + if (ff_layout_read_prepare_common(task, hdr))
> > + return;
> > +
> > + rpc_call_start(task);
> > +}
> > +
> > +static int ff_layout_setup_sequence(struct nfs_client *ds_clp,
> > + struct nfs4_sequence_args *args,
> > + struct nfs4_sequence_res *res,
> > + struct rpc_task *task)
> > +{
> > + if (ds_clp->cl_session)
> > + return nfs41_setup_sequence(ds_clp->cl_session,
> > + args,
> > + res,
> > + task);
> > + return nfs40_setup_sequence(ds_clp->cl_slot_tbl,
> > + args,
> > + res,
> > + task);
>
> I'm not quite seeing how we would end up calling the NFS v4.0 function here.
If there is a session, then we call the 4.1 function, else we call the
4.0 one.
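
To spell that out, here is a rough userspace sketch of the dispatch. It is
not the kernel code: every sketch_* name is made up for illustration, and it
assumes cl_session is simply NULL when no session was set up for the DS
client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sketch_session { int id; };
struct sketch_slot_table { int id; };

struct sketch_client {
	struct sketch_session *cl_session;	/* assumed NULL when no session exists */
	struct sketch_slot_table *cl_slot_tbl;	/* slot table used on the 4.0 path */
};

enum sketch_path { SKETCH_PATH_V40 = 40, SKETCH_PATH_V41 = 41 };

/*
 * Mirrors the shape of ff_layout_setup_sequence() in the patch:
 * a client with a session takes the 4.1 path, anything else
 * falls through to the 4.0 path.
 */
static enum sketch_path sketch_setup_sequence(const struct sketch_client *clp)
{
	assert(clp != NULL);
	if (clp->cl_session)
		return SKETCH_PATH_V41;
	return SKETCH_PATH_V40;
}
```

So the 4.0 function is only reached for a DS client that has no session.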
>
>
> > +}
> > +
> > +static void ff_layout_read_prepare_v4(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + if (ff_layout_read_prepare_common(task, hdr))
> > + return;
> > +
> > + if (ff_layout_setup_sequence(hdr->ds_clp,
> > + &hdr->args.seq_args,
> > + &hdr->res.seq_res,
> > + task))
> > + return;
> > +
> > + if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
> > + hdr->args.lock_context, FMODE_READ) == -EIO)
> > + rpc_exit(task, -EIO); /* lost lock, terminate I/O */
> > +}
> > +
> > +static void ff_layout_read_call_done(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
> > +
> > + if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
> > + task->tk_status == 0) {
> > + nfs4_sequence_done(task, &hdr->res.seq_res);
> > + return;
> > + }
> > +
> > + /* Note this may cause RPC to be resent */
> > + hdr->mds_ops->rpc_call_done(task, hdr);
> > +}
> > +
> > +static void ff_layout_read_count_stats(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + rpc_count_iostats_metrics(task,
> > + &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_READ]);
> > +}
> > +
> > +static int ff_layout_write_done_cb(struct rpc_task *task,
> > + struct nfs_pgio_header *hdr)
> > +{
> > + struct inode *inode;
> > + int err;
> > +
> > + trace_nfs4_pnfs_write(hdr, task->tk_status);
> > + if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
> > + hdr->res.op_status = NFS4ERR_NXIO;
> > + if (task->tk_status < 0 && hdr->res.op_status)
> > + ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
> > + hdr->args.offset, hdr->args.count,
> > + hdr->res.op_status, OP_WRITE);
> > + err = ff_layout_async_handle_error(task, hdr->args.context->state,
> > + hdr->ds_clp, hdr->lseg,
> > + hdr->pgio_mirror_idx);
> > +
> > + switch (err) {
> > + case -NFS4ERR_RESET_TO_PNFS:
> > + case -NFS4ERR_RESET_TO_MDS:
> > + inode = hdr->lseg->pls_layout->plh_inode;
> > + pnfs_error_mark_layout_for_return(inode, hdr->lseg);
> > + if (err == -NFS4ERR_RESET_TO_PNFS) {
> > + pnfs_set_retry_layoutget(hdr->lseg->pls_layout);
> > + ff_layout_reset_write(hdr, true);
> > + } else {
> > + pnfs_clear_retry_layoutget(hdr->lseg->pls_layout);
> > + ff_layout_reset_write(hdr, false);
> > + }
> > + return task->tk_status;
> > + case -EAGAIN:
> > + rpc_restart_call_prepare(task);
> > + return -EAGAIN;
> > + }
> > +
> > + if (hdr->res.verf->committed == NFS_FILE_SYNC ||
> > + hdr->res.verf->committed == NFS_DATA_SYNC)
> > + ff_layout_set_layoutcommit(hdr);
> > +
> > + return 0;
> > +}
> > +
> > +static int ff_layout_commit_done_cb(struct rpc_task *task,
> > + struct nfs_commit_data *data)
> > +{
> > + struct inode *inode;
> > + int err;
> > +
> > + trace_nfs4_pnfs_commit_ds(data, task->tk_status);
> > + if (task->tk_status == -ETIMEDOUT && !data->res.op_status)
> > + data->res.op_status = NFS4ERR_NXIO;
> > + if (task->tk_status < 0 && data->res.op_status)
> > + ff_layout_io_track_ds_error(data->lseg, data->ds_commit_index,
> > + data->args.offset, data->args.count,
> > + data->res.op_status, OP_COMMIT);
> > + err = ff_layout_async_handle_error(task, NULL, data->ds_clp,
> > + data->lseg, data->ds_commit_index);
> > +
> > + switch (err) {
> > + case -NFS4ERR_RESET_TO_PNFS:
> > + case -NFS4ERR_RESET_TO_MDS:
> > + inode = data->lseg->pls_layout->plh_inode;
> > + pnfs_error_mark_layout_for_return(inode, data->lseg);
> > + if (err == -NFS4ERR_RESET_TO_PNFS)
> > + pnfs_set_retry_layoutget(data->lseg->pls_layout);
> > + else
> > + pnfs_clear_retry_layoutget(data->lseg->pls_layout);
> > + pnfs_generic_prepare_to_resend_writes(data);
> > + return -EAGAIN;
> > + case -EAGAIN:
> > + rpc_restart_call_prepare(task);
> > + return -EAGAIN;
> > + }
> > +
> > + if (data->verf.committed == NFS_UNSTABLE)
> > + pnfs_commit_set_layoutcommit(data);
> > +
> > + return 0;
> > +}
> > +
> > +static int ff_layout_write_prepare_common(struct rpc_task *task,
> > + struct nfs_pgio_header *hdr)
> > +{
> > + if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
> > + rpc_exit(task, -EIO);
> > + return -EIO;
> > + }
> > +
> > + if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
> > + bool retry_pnfs;
> > +
> > + retry_pnfs = ff_layout_has_available_ds(hdr->lseg);
> > + dprintk("%s task %u reset io to %s\n", __func__,
> > + task->tk_pid, retry_pnfs ? "pNFS" : "MDS");
> > + ff_layout_reset_write(hdr, retry_pnfs);
> > + rpc_exit(task, 0);
> > + return -EAGAIN;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void ff_layout_write_prepare_v3(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + if (ff_layout_write_prepare_common(task, hdr))
> > + return;
> > +
> > + rpc_call_start(task);
> > +}
> > +
> > +static void ff_layout_write_prepare_v4(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + if (ff_layout_write_prepare_common(task, hdr))
> > + return;
> > +
> > + if (ff_layout_setup_sequence(hdr->ds_clp,
> > + &hdr->args.seq_args,
> > + &hdr->res.seq_res,
> > + task))
> > + return;
> > +
> > + if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
> > + hdr->args.lock_context, FMODE_WRITE) == -EIO)
> > + rpc_exit(task, -EIO); /* lost lock, terminate I/O */
> > +}
> > +
> > +static void ff_layout_write_call_done(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
> > + task->tk_status == 0) {
> > + nfs4_sequence_done(task, &hdr->res.seq_res);
> > + return;
> > + }
> > +
> > + /* Note this may cause RPC to be resent */
> > + hdr->mds_ops->rpc_call_done(task, hdr);
> > +}
> > +
> > +static void ff_layout_write_count_stats(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_pgio_header *hdr = data;
> > +
> > + rpc_count_iostats_metrics(task,
> > + &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_WRITE]);
> > +}
> > +
> > +static void ff_layout_commit_prepare_v3(struct rpc_task *task, void *data)
> > +{
> > + rpc_call_start(task);
> > +}
> > +
> > +static void ff_layout_commit_prepare_v4(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_commit_data *wdata = data;
> > +
> > + ff_layout_setup_sequence(wdata->ds_clp,
> > + &wdata->args.seq_args,
> > + &wdata->res.seq_res,
> > + task);
> > +}
> > +
> > +static void ff_layout_commit_count_stats(struct rpc_task *task, void *data)
> > +{
> > + struct nfs_commit_data *cdata = data;
> > +
> > + rpc_count_iostats_metrics(task,
> > + &NFS_CLIENT(cdata->inode)->cl_metrics[NFSPROC4_CLNT_COMMIT]);
> > +}
> > +
> > +static const struct rpc_call_ops ff_layout_read_call_ops_v3 = {
> > + .rpc_call_prepare = ff_layout_read_prepare_v3,
> > + .rpc_call_done = ff_layout_read_call_done,
> > + .rpc_count_stats = ff_layout_read_count_stats,
> > + .rpc_release = pnfs_generic_rw_release,
> > +};
> > +
> > +static const struct rpc_call_ops ff_layout_read_call_ops_v4 = {
> > + .rpc_call_prepare = ff_layout_read_prepare_v4,
> > + .rpc_call_done = ff_layout_read_call_done,
> > + .rpc_count_stats = ff_layout_read_count_stats,
> > + .rpc_release = pnfs_generic_rw_release,
> > +};
> > +
> > +static const struct rpc_call_ops ff_layout_write_call_ops_v3 = {
> > + .rpc_call_prepare = ff_layout_write_prepare_v3,
> > + .rpc_call_done = ff_layout_write_call_done,
> > + .rpc_count_stats = ff_layout_write_count_stats,
> > + .rpc_release = pnfs_generic_rw_release,
> > +};
> > +
> > +static const struct rpc_call_ops ff_layout_write_call_ops_v4 = {
> > + .rpc_call_prepare = ff_layout_write_prepare_v4,
> > + .rpc_call_done = ff_layout_write_call_done,
> > + .rpc_count_stats = ff_layout_write_count_stats,
> > + .rpc_release = pnfs_generic_rw_release,
> > +};
> > +
> > +static const struct rpc_call_ops ff_layout_commit_call_ops_v3 = {
> > + .rpc_call_prepare = ff_layout_commit_prepare_v3,
> > + .rpc_call_done = pnfs_generic_write_commit_done,
> > + .rpc_count_stats = ff_layout_commit_count_stats,
> > + .rpc_release = pnfs_generic_commit_release,
> > +};
> > +
> > +static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
> > + .rpc_call_prepare = ff_layout_commit_prepare_v4,
> > + .rpc_call_done = pnfs_generic_write_commit_done,
> > + .rpc_count_stats = ff_layout_commit_count_stats,
> > + .rpc_release = pnfs_generic_commit_release,
> > +};
> > +
> > +static enum pnfs_try_status
> > +ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
> > +{
> > + struct pnfs_layout_segment *lseg = hdr->lseg;
> > + struct nfs4_pnfs_ds *ds;
> > + struct rpc_clnt *ds_clnt;
> > + struct rpc_cred *ds_cred;
> > + loff_t offset = hdr->args.offset;
> > + u32 idx = hdr->pgio_mirror_idx;
> > + int vers;
> > + struct nfs_fh *fh;
> > +
> > + dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
> > + __func__, hdr->inode->i_ino,
> > + hdr->args.pgbase, (size_t)hdr->args.count, offset);
> > +
> > + ds = nfs4_ff_layout_prepare_ds(lseg, idx, false);
> > + if (!ds)
> > + goto out_failed;
> > +
> > + ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
> > + hdr->inode);
> > + if (IS_ERR(ds_clnt))
> > + goto out_failed;
> > +
> > + ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
> > + if (IS_ERR(ds_cred))
> > + goto out_failed;
> > +
> > + vers = nfs4_ff_layout_ds_version(lseg, idx);
> > +
> > + dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
> > + ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count), vers);
> > +
> > + atomic_inc(&ds->ds_clp->cl_count);
> > + hdr->ds_clp = ds->ds_clp;
> > + fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
> > + if (fh)
> > + hdr->args.fh = fh;
> > +
> > + /*
> > + * Note that if we ever decide to split across DSes,
> > + * then we may need to handle dense-like offsets.
> > + */
> > + hdr->args.offset = offset;
> > + hdr->mds_offset = offset;
> > +
> > + /* Perform an asynchronous read to ds */
> > + nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
> > + vers == 3 ? &ff_layout_read_call_ops_v3 :
> > + &ff_layout_read_call_ops_v4,
> > + 0, RPC_TASK_SOFTCONN);
> > +
> > + return PNFS_ATTEMPTED;
> > +
> > +out_failed:
> > + if (ff_layout_has_available_ds(lseg))
> > + return PNFS_TRY_AGAIN;
> > + return PNFS_NOT_ATTEMPTED;
> > +}
> > +
> > +/* Perform async writes. */
> > +static enum pnfs_try_status
> > +ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
> > +{
> > + struct pnfs_layout_segment *lseg = hdr->lseg;
> > + struct nfs4_pnfs_ds *ds;
> > + struct rpc_clnt *ds_clnt;
> > + struct rpc_cred *ds_cred;
> > + loff_t offset = hdr->args.offset;
> > + int vers;
> > + struct nfs_fh *fh;
> > + int idx = hdr->pgio_mirror_idx;
> > +
> > + ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
> > + if (!ds)
> > + return PNFS_NOT_ATTEMPTED;
> > +
> > + ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
> > + hdr->inode);
> > + if (IS_ERR(ds_clnt))
> > + return PNFS_NOT_ATTEMPTED;
> > +
> > + ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
> > + if (IS_ERR(ds_cred))
> > + return PNFS_NOT_ATTEMPTED;
> > +
> > + vers = nfs4_ff_layout_ds_version(lseg, idx);
> > +
> > + dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d vers %d\n",
> > + __func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
> > + offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count),
> > + vers);
> > +
> > + hdr->pgio_done_cb = ff_layout_write_done_cb;
> > + atomic_inc(&ds->ds_clp->cl_count);
> > + hdr->ds_clp = ds->ds_clp;
> > + hdr->ds_commit_idx = idx;
> > + fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
> > + if (fh)
> > + hdr->args.fh = fh;
> > +
> > + /*
> > + * Note that if we ever decide to split across DSes,
> > + * then we may need to handle dense-like offsets.
> > + */
> > + hdr->args.offset = offset;
> > +
> > + /* Perform an asynchronous write */
> > + nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
> > + vers == 3 ? &ff_layout_write_call_ops_v3 :
> > + &ff_layout_write_call_ops_v4,
> > + sync, RPC_TASK_SOFTCONN);
> > + return PNFS_ATTEMPTED;
> > +}
> > +
> > +static void
> > +ff_layout_mark_request_commit(struct nfs_page *req,
> > + struct pnfs_layout_segment *lseg,
> > + struct nfs_commit_info *cinfo,
> > + u32 ds_commit_idx)
> > +{
> > + struct list_head *list;
> > + struct pnfs_commit_bucket *buckets;
> > +
> > + spin_lock(cinfo->lock);
> > + buckets = cinfo->ds->buckets;
> > + list = &buckets[ds_commit_idx].written;
> > + if (list_empty(list)) {
> > + /* Non-empty buckets hold a reference on the lseg. That ref
> > + * is normally transferred to the COMMIT call and released
> > + * there. It could also be released if the last req is pulled
> > + * off due to a rewrite, in which case it will be done in
> > + * pnfs_common_clear_request_commit
> > + */
> > + WARN_ON_ONCE(buckets[ds_commit_idx].wlseg != NULL);
> > + buckets[ds_commit_idx].wlseg = pnfs_get_lseg(lseg);
> > + }
> > + set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
> > + cinfo->ds->nwritten++;
> > +
> > + /* nfs_request_add_commit_list(). We need to add req to list without
> > + * dropping cinfo lock.
> > + */
> > + set_bit(PG_CLEAN, &(req)->wb_flags);
> > + nfs_list_add_request(req, list);
> > + cinfo->mds->ncommit++;
> > + spin_unlock(cinfo->lock);
> > + if (!cinfo->dreq) {
> > + inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
> > + inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
> > + BDI_RECLAIMABLE);
> > + __mark_inode_dirty(req->wb_context->dentry->d_inode,
> > + I_DIRTY_DATASYNC);
> > + }
> > +}
> > +
> > +static u32 calc_ds_index_from_commit(struct pnfs_layout_segment *lseg, u32 i)
> > +{
> > + return i;
> > +}
>
> Is calc_ds_index_from_commit() something that will be expanded on later?
Ah, it took me a bit, but this is a copy of the file layout's
calc_ds_index_from_commit(). Since we only support SPARSE striping,
the commit index needs no translation, so below we could simply write:
idx = data->ds_commit_index;
I kept the helper only to preserve the same flow as the file layout.
I am more than willing to change this now that we know we will not
support DENSE layouts.
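As a stand-alone illustration of the point above (not part of the patch),
here is a minimal sketch of the two shapes. The sketch_* types and
commit_ds_index() are hypothetical stand-ins introduced only for this
example; they are not the kernel structures or an existing helper.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct sketch_lseg {
	uint32_t mirror_array_cnt;
};

struct sketch_commit_data {
	struct sketch_lseg *lseg;
	uint32_t ds_commit_index;
};

/* Current shape: an identity helper kept to mirror the file layout flow. */
static uint32_t calc_ds_index_from_commit(struct sketch_lseg *lseg, uint32_t i)
{
	(void)lseg;	/* unused: SPARSE-only means no index translation */
	return i;
}

/* Simplified shape: read the index straight from the commit data. */
static uint32_t commit_ds_index(const struct sketch_commit_data *data)
{
	return data->ds_commit_index;
}
```

Both functions return the same index for any input, which is why the
helper could be dropped without changing behavior.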
>
> > +
> > +static struct nfs_fh *
> > +select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
> > +{
> > + struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
> > +
> > + /* FIXME: Assume that there is only one NFS version available
> > + * for the DS.
> > + */
> > + return &flseg->mirror_array[i]->fh_versions[0];
> > +}
> > +
> > +static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
> > +{
> > + struct pnfs_layout_segment *lseg = data->lseg;
> > + struct nfs4_pnfs_ds *ds;
> > + struct rpc_clnt *ds_clnt;
> > + struct rpc_cred *ds_cred;
> > + u32 idx;
> > + int vers;
> > + struct nfs_fh *fh;
> > +
> > + idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
> > + ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
> > + if (!ds)
> > + goto out_err;
> > +
> > + ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
> > + data->inode);
> > + if (IS_ERR(ds_clnt))
> > + goto out_err;
> > +
> > + ds_cred = ff_layout_get_ds_cred(lseg, idx, data->cred);
> > + if (IS_ERR(ds_cred))
> > + goto out_err;
> > +
> > + vers = nfs4_ff_layout_ds_version(lseg, idx);
> > +
> > + dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
> > + data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count),
> > + vers);
> > + data->commit_done_cb = ff_layout_commit_done_cb;
> > + data->cred = ds_cred;
> > + atomic_inc(&ds->ds_clp->cl_count);
> > + data->ds_clp = ds->ds_clp;
> > + fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
> > + if (fh)
> > + data->args.fh = fh;
> > + return nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
> > + vers == 3 ? &ff_layout_commit_call_ops_v3 :
> > + &ff_layout_commit_call_ops_v4,
> > + how, RPC_TASK_SOFTCONN);
> > +out_err:
> > + pnfs_generic_prepare_to_resend_writes(data);
> > + pnfs_generic_commit_release(data);
> > + return -EAGAIN;
> > +}
> > +
> > +static int
> > +ff_layout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
> > + int how, struct nfs_commit_info *cinfo)
> > +{
> > + return pnfs_generic_commit_pagelist(inode, mds_pages, how, cinfo,
> > + ff_layout_initiate_commit);
> > +}
> > +
> > +static struct pnfs_ds_commit_info *
> > +ff_layout_get_ds_info(struct inode *inode)
> > +{
> > + struct pnfs_layout_hdr *layout = NFS_I(inode)->layout;
> > +
> > + if (layout == NULL)
> > + return NULL;
> > + else
> ^^^^
> Nit: We don't need the else here.
Agreed.
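For reference, a minimal stand-alone sketch of the early-return shape
without the else. The sketch_* types are hypothetical stand-ins for
struct inode, struct pnfs_layout_hdr and struct pnfs_ds_commit_info;
the real function would keep using NFS_I() and FF_LAYOUT_FROM_HDR().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct sketch_commit_info {
	int nwritten;
};

struct sketch_layout {
	struct sketch_commit_info commit_info;
};

struct sketch_inode {
	struct sketch_layout *layout;
};

static struct sketch_commit_info *
sketch_get_ds_info(struct sketch_inode *inode)
{
	struct sketch_layout *layout = inode->layout;

	/* Early return makes the else branch unnecessary. */
	if (layout == NULL)
		return NULL;
	return &layout->commit_info;
}
```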
>
> Thanks,
> Anna
>
> > + return &FF_LAYOUT_FROM_HDR(layout)->commit_info;
> > +}
> > +
> > +static void
> > +ff_layout_free_deveiceid_node(struct nfs4_deviceid_node *d)
> > +{
> > + nfs4_ff_layout_free_deviceid(container_of(d, struct nfs4_ff_layout_ds,
> > + id_node));
> > +}
> > +
> > +static int ff_layout_encode_ioerr(struct nfs4_flexfile_layout *flo,
> > + struct xdr_stream *xdr,
> > + const struct nfs4_layoutreturn_args *args)
> > +{
> > + struct pnfs_layout_hdr *hdr = &flo->generic_hdr;
> > + __be32 *start;
> > + int count = 0, ret = 0;
> > +
> > + start = xdr_reserve_space(xdr, 4);
> > + if (unlikely(!start))
> > + return -E2BIG;
> > +
> > + /* This assumes we always return _ALL_ layouts */
> > + spin_lock(&hdr->plh_inode->i_lock);
> > + ret = ff_layout_encode_ds_ioerr(flo, xdr, &count, &args->range);
> > + spin_unlock(&hdr->plh_inode->i_lock);
> > +
> > + *start = cpu_to_be32(count);
> > +
> > + return ret;
> > +}
> > +
> > +/* report nothing for now */
> > +static void ff_layout_encode_iostats(struct nfs4_flexfile_layout *flo,
> > + struct xdr_stream *xdr,
> > + const struct nfs4_layoutreturn_args *args)
> > +{
> > + __be32 *p;
> > +
> > + p = xdr_reserve_space(xdr, 4);
> > + if (likely(p))
> > + *p = cpu_to_be32(0);
> > +}
> > +
> > +static struct nfs4_deviceid_node *
> > +ff_layout_alloc_deviceid_node(struct nfs_server *server,
> > + struct pnfs_device *pdev, gfp_t gfp_flags)
> > +{
> > + struct nfs4_ff_layout_ds *dsaddr;
> > +
> > + dsaddr = nfs4_ff_alloc_deviceid_node(server, pdev, gfp_flags);
> > + if (!dsaddr)
> > + return NULL;
> > + return &dsaddr->id_node;
> > +}
> > +
> > +static void
> > +ff_layout_encode_layoutreturn(struct pnfs_layout_hdr *lo,
> > + struct xdr_stream *xdr,
> > + const struct nfs4_layoutreturn_args *args)
> > +{
> > + struct nfs4_flexfile_layout *flo = FF_LAYOUT_FROM_HDR(lo);
> > + __be32 *start;
> > +
> > + dprintk("%s: Begin\n", __func__);
> > + start = xdr_reserve_space(xdr, 4);
> > + BUG_ON(!start);
> > +
> > + if (ff_layout_encode_ioerr(flo, xdr, args))
> > + goto out;
> > +
> > + ff_layout_encode_iostats(flo, xdr, args);
> > +out:
> > + *start = cpu_to_be32((xdr->p - start - 1) * 4);
> > + dprintk("%s: Return\n", __func__);
> > +}
> > +
> > +static struct pnfs_layoutdriver_type flexfilelayout_type = {
> > + .id = LAYOUT_FLEX_FILES,
> > + .name = "LAYOUT_FLEX_FILES",
> > + .owner = THIS_MODULE,
> > + .alloc_layout_hdr = ff_layout_alloc_layout_hdr,
> > + .free_layout_hdr = ff_layout_free_layout_hdr,
> > + .alloc_lseg = ff_layout_alloc_lseg,
> > + .free_lseg = ff_layout_free_lseg,
> > + .pg_read_ops = &ff_layout_pg_read_ops,
> > + .pg_write_ops = &ff_layout_pg_write_ops,
> > + .get_ds_info = ff_layout_get_ds_info,
> > + .free_deviceid_node = ff_layout_free_deveiceid_node,
> > + .mark_request_commit = ff_layout_mark_request_commit,
> > + .clear_request_commit = pnfs_generic_clear_request_commit,
> > + .scan_commit_lists = pnfs_generic_scan_commit_lists,
> > + .recover_commit_reqs = pnfs_generic_recover_commit_reqs,
> > + .commit_pagelist = ff_layout_commit_pagelist,
> > + .read_pagelist = ff_layout_read_pagelist,
> > + .write_pagelist = ff_layout_write_pagelist,
> > + .alloc_deviceid_node = ff_layout_alloc_deviceid_node,
> > + .encode_layoutreturn = ff_layout_encode_layoutreturn,
> > +};
> > +
> > +static int __init nfs4flexfilelayout_init(void)
> > +{
> > + printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Registering...\n",
> > + __func__);
> > + return pnfs_register_layoutdriver(&flexfilelayout_type);
> > +}
> > +
> > +static void __exit nfs4flexfilelayout_exit(void)
> > +{
> > + printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Unregistering...\n",
> > + __func__);
> > + pnfs_unregister_layoutdriver(&flexfilelayout_type);
> > +}
> > +
> > +MODULE_ALIAS("nfs-layouttype4-4");
> > +
> > +MODULE_LICENSE("GPL");
> > +MODULE_DESCRIPTION("The NFSv4 flexfile layout driver");
> > +
> > +module_init(nfs4flexfilelayout_init);
> > +module_exit(nfs4flexfilelayout_exit);
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
> > new file mode 100644
> > index 0000000..712fc55
> > --- /dev/null
> > +++ b/fs/nfs/flexfilelayout/flexfilelayout.h
> > @@ -0,0 +1,158 @@
> > +/*
> > + * NFSv4 flexfile layout driver data structures.
> > + *
> > + * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
> > + *
> > + * Tao Peng <[email protected]>
> > + */
> > +
> > +#ifndef FS_NFS_NFS4FLEXFILELAYOUT_H
> > +#define FS_NFS_NFS4FLEXFILELAYOUT_H
> > +
> > +#include "../pnfs.h"
> > +
> > +/* XXX: Let's filter out insanely large mirror count for now to avoid oom
> > + * due to network error etc. */
> > +#define NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT 4096
> > +
> > +struct nfs4_ff_ds_version {
> > + u32 version;
> > + u32 minor_version;
> > + u32 rsize;
> > + u32 wsize;
> > + bool tightly_coupled;
> > +};
> > +
> > +/* chained in global deviceid hlist */
> > +struct nfs4_ff_layout_ds {
> > + struct nfs4_deviceid_node id_node;
> > + u32 ds_versions_cnt;
> > + struct nfs4_ff_ds_version *ds_versions;
> > + struct nfs4_pnfs_ds *ds;
> > +};
> > +
> > +struct nfs4_ff_layout_ds_err {
> > + struct list_head list; /* linked in mirror error_list */
> > + u64 offset;
> > + u64 length;
> > + int status;
> > + enum nfs_opnum4 opnum;
> > + nfs4_stateid stateid;
> > + struct nfs4_deviceid deviceid;
> > +};
> > +
> > +struct nfs4_ff_layout_mirror {
> > + u32 ds_count;
> > + u32 efficiency;
> > + struct nfs4_ff_layout_ds *mirror_ds;
> > + u32 fh_versions_cnt;
> > + struct nfs_fh *fh_versions;
> > + nfs4_stateid stateid;
> > + union {
> > + struct { /* same as struct unx_cred */
> > + u32 uid; /* -1 iff AUTH_NONE */
> > + u32 gid; /* -1 iff AUTH_NONE */
> > + u32 gids[16];
> > + };
> > + };
> > + struct rpc_cred *cred;
> > + spinlock_t lock;
> > +};
> > +
> > +struct nfs4_ff_layout_segment {
> > + struct pnfs_layout_segment generic_hdr;
> > + u64 stripe_unit;
> > + u32 mirror_array_cnt;
> > + struct nfs4_ff_layout_mirror **mirror_array;
> > +};
> > +
> > +struct nfs4_flexfile_layout {
> > + struct pnfs_layout_hdr generic_hdr;
> > + struct pnfs_ds_commit_info commit_info;
> > + struct list_head error_list; /* nfs4_ff_layout_ds_err */
> > +};
> > +
> > +static inline struct nfs4_flexfile_layout *
> > +FF_LAYOUT_FROM_HDR(struct pnfs_layout_hdr *lo)
> > +{
> > + return container_of(lo, struct nfs4_flexfile_layout, generic_hdr);
> > +}
> > +
> > +static inline struct nfs4_ff_layout_segment *
> > +FF_LAYOUT_LSEG(struct pnfs_layout_segment *lseg)
> > +{
> > + return container_of(lseg,
> > + struct nfs4_ff_layout_segment,
> > + generic_hdr);
> > +}
> > +
> > +static inline struct nfs4_deviceid_node *
> > +FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
> > +{
> > + if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt ||
> > + FF_LAYOUT_LSEG(lseg)->mirror_array[idx] == NULL ||
> > + FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds == NULL)
> > + return NULL;
> > + return &FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds->id_node;
> > +}
> > +
> > +static inline struct nfs4_ff_layout_ds *
> > +FF_LAYOUT_MIRROR_DS(struct nfs4_deviceid_node *node)
> > +{
> > + return container_of(node, struct nfs4_ff_layout_ds, id_node);
> > +}
> > +
> > +static inline struct nfs4_ff_layout_mirror *
> > +FF_LAYOUT_COMP(struct pnfs_layout_segment *lseg, u32 idx)
> > +{
> > + if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt)
> > + return NULL;
> > + return FF_LAYOUT_LSEG(lseg)->mirror_array[idx];
> > +}
> > +
> > +static inline u32
> > +FF_LAYOUT_MIRROR_COUNT(struct pnfs_layout_segment *lseg)
> > +{
> > + return FF_LAYOUT_LSEG(lseg)->mirror_array_cnt;
> > +}
> > +
> > +static inline bool
> > +ff_layout_test_devid_unavailable(struct nfs4_deviceid_node *node)
> > +{
> > + return nfs4_test_deviceid_unavailable(node);
> > +}
> > +
> > +static inline int
> > +nfs4_ff_layout_ds_version(struct pnfs_layout_segment *lseg, u32 ds_idx)
> > +{
> > + return FF_LAYOUT_COMP(lseg, ds_idx)->mirror_ds->ds_versions[0].version;
> > +}
> > +
> > +struct nfs4_ff_layout_ds *
> > +nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
> > + gfp_t gfp_flags);
> > +void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
> > +void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
> > +int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
> > + struct nfs4_ff_layout_mirror *mirror, u64 offset,
> > + u64 length, int status, enum nfs_opnum4 opnum,
> > + gfp_t gfp_flags);
> > +int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
> > + struct xdr_stream *xdr, int *count,
> > + const struct pnfs_layout_range *range);
> > +struct nfs_fh *
> > +nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx);
> > +
> > +struct nfs4_pnfs_ds *
> > +nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
> > + bool fail_return);
> > +
> > +struct rpc_clnt *
> > +nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg,
> > + u32 ds_idx,
> > + struct nfs_client *ds_clp,
> > + struct inode *inode);
> > +struct rpc_cred *ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg,
> > + u32 ds_idx, struct rpc_cred *mdscred);
> > +bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg);
> > +#endif /* FS_NFS_NFS4FLEXFILELAYOUT_H */
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> > new file mode 100644
> > index 0000000..5dae5c2
> > --- /dev/null
> > +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> > @@ -0,0 +1,552 @@
> > +/*
> > + * Device operations for the pnfs nfs4 flexfile layout driver.
> > + *
> > + * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
> > + *
> > + * Tao Peng <[email protected]>
> > + */
> > +
> > +#include <linux/nfs_fs.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/module.h>
> > +#include <linux/sunrpc/addr.h>
> > +
> > +#include "../internal.h"
> > +#include "../nfs4session.h"
> > +#include "flexfilelayout.h"
> > +
> > +#define NFSDBG_FACILITY NFSDBG_PNFS_LD
> > +
> > +static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
> > +static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
> > +
> > +void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
> > +{
> > + if (mirror_ds)
> > + nfs4_put_deviceid_node(&mirror_ds->id_node);
> > +}
> > +
> > +void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
> > +{
> > + nfs4_print_deviceid(&mirror_ds->id_node.deviceid);
> > + nfs4_pnfs_ds_put(mirror_ds->ds);
> > + kfree(mirror_ds);
> > +}
> > +
> > +/* Decode opaque device data and construct new_ds using it */
> > +struct nfs4_ff_layout_ds *
> > +nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
> > + gfp_t gfp_flags)
> > +{
> > + struct xdr_stream stream;
> > + struct xdr_buf buf;
> > + struct page *scratch;
> > + struct list_head dsaddrs;
> > + struct nfs4_pnfs_ds_addr *da;
> > + struct nfs4_ff_layout_ds *new_ds = NULL;
> > + struct nfs4_ff_ds_version *ds_versions = NULL;
> > + u32 mp_count;
> > + u32 version_count;
> > + __be32 *p;
> > + int i, ret = -ENOMEM;
> > +
> > + /* set up xdr stream */
> > + scratch = alloc_page(gfp_flags);
> > + if (!scratch)
> > + goto out_err;
> > +
> > + new_ds = kzalloc(sizeof(struct nfs4_ff_layout_ds), gfp_flags);
> > + if (!new_ds)
> > + goto out_scratch;
> > +
> > + nfs4_init_deviceid_node(&new_ds->id_node,
> > + server,
> > + &pdev->dev_id);
> > + INIT_LIST_HEAD(&dsaddrs);
> > +
> > + xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen);
> > + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
> > +
> > + /* multipath count */
> > + p = xdr_inline_decode(&stream, 4);
> > + if (unlikely(!p))
> > + goto out_err_drain_dsaddrs;
> > + mp_count = be32_to_cpup(p);
> > + dprintk("%s: multipath ds count %d\n", __func__, mp_count);
> > +
> > + for (i = 0; i < mp_count; i++) {
> > + /* multipath ds */
> > + da = nfs4_decode_mp_ds_addr(server->nfs_client->cl_net,
> > + &stream, gfp_flags);
> > + if (da)
> > + list_add_tail(&da->da_node, &dsaddrs);
> > + }
> > + if (list_empty(&dsaddrs)) {
> > + dprintk("%s: no suitable DS addresses found\n",
> > + __func__);
> > + ret = -ENOMEDIUM;
> > + goto out_err_drain_dsaddrs;
> > + }
> > +
> > + /* version count */
> > + p = xdr_inline_decode(&stream, 4);
> > + if (unlikely(!p))
> > + goto out_err_drain_dsaddrs;
> > + version_count = be32_to_cpup(p);
> > + dprintk("%s: version count %d\n", __func__, version_count);
> > +
> > + ds_versions = kzalloc(version_count * sizeof(struct nfs4_ff_ds_version),
> > + gfp_flags);
> > + if (!ds_versions)
> > + goto out_scratch;
> > +
> > + for (i = 0; i < version_count; i++) {
> > + /* 20 = version(4) + minor_version(4) + rsize(4) + wsize(4) +
> > + * tightly_coupled(4) */
> > + p = xdr_inline_decode(&stream, 20);
> > + if (unlikely(!p))
> > + goto out_err_drain_dsaddrs;
> > + ds_versions[i].version = be32_to_cpup(p++);
> > + ds_versions[i].minor_version = be32_to_cpup(p++);
> > + ds_versions[i].rsize = nfs_block_size(be32_to_cpup(p++), NULL);
> > + ds_versions[i].wsize = nfs_block_size(be32_to_cpup(p++), NULL);
> > + ds_versions[i].tightly_coupled = be32_to_cpup(p);
> > +
> > + if (ds_versions[i].rsize > NFS_MAX_FILE_IO_SIZE)
> > + ds_versions[i].rsize = NFS_MAX_FILE_IO_SIZE;
> > + if (ds_versions[i].wsize > NFS_MAX_FILE_IO_SIZE)
> > + ds_versions[i].wsize = NFS_MAX_FILE_IO_SIZE;
> > +
> > + if (ds_versions[i].version != 3 || ds_versions[i].minor_version != 0) {
> > + dprintk("%s: [%d] unsupported ds version %d-%d\n", __func__,
> > + i, ds_versions[i].version,
> > + ds_versions[i].minor_version);
> > + ret = -EPROTONOSUPPORT;
> > + goto out_err_drain_dsaddrs;
> > + }
> > +
> > + dprintk("%s: [%d] vers %u minor_ver %u rsize %u wsize %u coupled %d\n",
> > + __func__, i, ds_versions[i].version,
> > + ds_versions[i].minor_version,
> > + ds_versions[i].rsize,
> > + ds_versions[i].wsize,
> > + ds_versions[i].tightly_coupled);
> > + }
> > +
> > + new_ds->ds_versions = ds_versions;
> > + new_ds->ds_versions_cnt = version_count;
> > +
> > + new_ds->ds = nfs4_pnfs_ds_add(&dsaddrs, gfp_flags);
> > + if (!new_ds->ds)
> > + goto out_err_drain_dsaddrs;
> > +
> > + /* If DS was already in cache, free ds addrs */
> > + while (!list_empty(&dsaddrs)) {
> > + da = list_first_entry(&dsaddrs,
> > + struct nfs4_pnfs_ds_addr,
> > + da_node);
> > + list_del_init(&da->da_node);
> > + kfree(da->da_remotestr);
> > + kfree(da);
> > + }
> > +
> > + __free_page(scratch);
> > + return new_ds;
> > +
> > +out_err_drain_dsaddrs:
> > + while (!list_empty(&dsaddrs)) {
> > + da = list_first_entry(&dsaddrs, struct nfs4_pnfs_ds_addr,
> > + da_node);
> > + list_del_init(&da->da_node);
> > + kfree(da->da_remotestr);
> > + kfree(da);
> > + }
> > +
> > + kfree(ds_versions);
> > +out_scratch:
> > + __free_page(scratch);
> > +out_err:
> > + kfree(new_ds);
> > +
> > + dprintk("%s ERROR: returning %d\n", __func__, ret);
> > + return NULL;
> > +}
> > +
> > +static u64
> > +end_offset(u64 start, u64 len)
> > +{
> > + u64 end;
> > +
> > + end = start + len;
> > + return end >= start ? end : NFS4_MAX_UINT64;
> > +}
> > +
> > +static void extend_ds_error(struct nfs4_ff_layout_ds_err *err,
> > + u64 offset, u64 length)
> > +{
> > + u64 end;
> > +
> > + end = max_t(u64, end_offset(err->offset, err->length),
> > + end_offset(offset, length));
> > + err->offset = min_t(u64, err->offset, offset);
> > + err->length = end - err->offset;
> > +}
> > +
> > +static bool ds_error_can_merge(struct nfs4_ff_layout_ds_err *err, u64 offset,
> > + u64 length, int status, enum nfs_opnum4 opnum,
> > + nfs4_stateid *stateid,
> > + struct nfs4_deviceid *deviceid)
> > +{
> > + return err->status == status && err->opnum == opnum &&
> > + nfs4_stateid_match(&err->stateid, stateid) &&
> > + !memcmp(&err->deviceid, deviceid, sizeof(*deviceid)) &&
> > + end_offset(err->offset, err->length) >= offset &&
> > + err->offset <= end_offset(offset, length);
> > +}
> > +
> > +static bool merge_ds_error(struct nfs4_ff_layout_ds_err *old,
> > + struct nfs4_ff_layout_ds_err *new)
> > +{
> > + if (!ds_error_can_merge(old, new->offset, new->length, new->status,
> > + new->opnum, &new->stateid, &new->deviceid))
> > + return false;
> > +
> > + extend_ds_error(old, new->offset, new->length);
> > + return true;
> > +}
> > +
> > +static bool
> > +ff_layout_add_ds_error_locked(struct nfs4_flexfile_layout *flo,
> > + struct nfs4_ff_layout_ds_err *dserr)
> > +{
> > + struct nfs4_ff_layout_ds_err *err;
> > +
> > + list_for_each_entry(err, &flo->error_list, list) {
> > + if (merge_ds_error(err, dserr)) {
> > + return true;
> > + }
> > + }
> > +
> > + list_add(&dserr->list, &flo->error_list);
> > + return false;
> > +}
> > +
> > +static bool
> > +ff_layout_update_ds_error(struct nfs4_flexfile_layout *flo, u64 offset,
> > + u64 length, int status, enum nfs_opnum4 opnum,
> > + nfs4_stateid *stateid, struct nfs4_deviceid *deviceid)
> > +{
> > + bool found = false;
> > + struct nfs4_ff_layout_ds_err *err;
> > +
> > + list_for_each_entry(err, &flo->error_list, list) {
> > + if (ds_error_can_merge(err, offset, length, status, opnum,
> > + stateid, deviceid)) {
> > + found = true;
> > + extend_ds_error(err, offset, length);
> > + break;
> > + }
> > + }
> > +
> > + return found;
> > +}
> > +
> > +int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
> > + struct nfs4_ff_layout_mirror *mirror, u64 offset,
> > + u64 length, int status, enum nfs_opnum4 opnum,
> > + gfp_t gfp_flags)
> > +{
> > + struct nfs4_ff_layout_ds_err *dserr;
> > + bool needfree;
> > +
> > + if (status == 0)
> > + return 0;
> > +
> > + if (mirror->mirror_ds == NULL)
> > + return -EINVAL;
> > +
> > + spin_lock(&flo->generic_hdr.plh_inode->i_lock);
> > + if (ff_layout_update_ds_error(flo, offset, length, status, opnum,
> > + &mirror->stateid,
> > + &mirror->mirror_ds->id_node.deviceid)) {
> > + spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
> > + return 0;
> > + }
> > + spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
> > + dserr = kmalloc(sizeof(*dserr), gfp_flags);
> > + if (!dserr)
> > + return -ENOMEM;
> > +
> > + INIT_LIST_HEAD(&dserr->list);
> > + dserr->offset = offset;
> > + dserr->length = length;
> > + dserr->status = status;
> > + dserr->opnum = opnum;
> > + nfs4_stateid_copy(&dserr->stateid, &mirror->stateid);
> > + memcpy(&dserr->deviceid, &mirror->mirror_ds->id_node.deviceid,
> > + NFS4_DEVICEID4_SIZE);
> > +
> > + spin_lock(&flo->generic_hdr.plh_inode->i_lock);
> > + needfree = ff_layout_add_ds_error_locked(flo, dserr);
> > + spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
> > + if (needfree)
> > + kfree(dserr);
> > +
> > + return 0;
> > +}
> > +
> > +/* currently we only support AUTH_NONE and AUTH_SYS */
> > +static rpc_authflavor_t
> > +nfs4_ff_layout_choose_authflavor(struct nfs4_ff_layout_mirror *mirror)
> > +{
> > + if (mirror->uid == (u32)-1)
> > + return RPC_AUTH_NULL;
> > + return RPC_AUTH_UNIX;
> > +}
> > +
> > +/* fetch cred for NFSv3 DS */
> > +static int ff_layout_update_mirror_cred(struct nfs4_ff_layout_mirror *mirror,
> > + struct nfs4_pnfs_ds *ds)
> > +{
> > + if (ds && !mirror->cred && mirror->mirror_ds->ds_versions[0].version == 3) {
> > + struct rpc_auth *auth = ds->ds_clp->cl_rpcclient->cl_auth;
> > + struct rpc_cred *cred;
> > + struct auth_cred acred = {
> > + .uid = make_kuid(&init_user_ns, mirror->uid),
> > + .gid = make_kgid(&init_user_ns, mirror->gid),
> > + };
> > +
> > + /* AUTH_NULL ignores acred */
> > + cred = auth->au_ops->lookup_cred(auth, &acred, 0);
> > + if (IS_ERR(cred)) {
> > + dprintk("%s: lookup_cred failed with %ld\n",
> > + __func__, PTR_ERR(cred));
> > + return PTR_ERR(cred);
> > + } else {
> > + mirror->cred = cred;
> > + }
> > + }
> > + return 0;
> > +}
> > +
> > +struct nfs_fh *
> > +nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, mirror_idx);
> > + struct nfs_fh *fh = NULL;
> > + struct nfs4_deviceid_node *devid;
> > +
> > + if (mirror == NULL || mirror->mirror_ds == NULL ||
> > + mirror->mirror_ds->ds == NULL) {
> > + printk(KERN_ERR "NFS: %s: No data server for mirror offset index %u\n",
> > + __func__, mirror_idx);
> > + if (mirror && mirror->mirror_ds) {
> > + devid = &mirror->mirror_ds->id_node;
> > + pnfs_generic_mark_devid_invalid(devid);
> > + }
> > + goto out;
> > + }
> > +
> > + /* FIXME: For now assume there is only 1 version available for the DS */
> > + fh = &mirror->fh_versions[0];
> > +out:
> > + return fh;
> > +}
> > +
> > +/* Upon return, either ds is connected, or ds is NULL */
> > +struct nfs4_pnfs_ds *
> > +nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
> > + bool fail_return)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
> > + struct nfs4_pnfs_ds *ds = NULL;
> > + struct nfs4_deviceid_node *devid;
> > + struct inode *ino = lseg->pls_layout->plh_inode;
> > + struct nfs_server *s = NFS_SERVER(ino);
> > + unsigned int max_payload;
> > + rpc_authflavor_t flavor;
> > +
> > + if (mirror == NULL || mirror->mirror_ds == NULL ||
> > + mirror->mirror_ds->ds == NULL) {
> > + printk(KERN_ERR "NFS: %s: No data server for offset index %u\n",
> > + __func__, ds_idx);
> > + if (mirror && mirror->mirror_ds) {
> > + devid = &mirror->mirror_ds->id_node;
> > + pnfs_generic_mark_devid_invalid(devid);
> > + }
> > + goto out;
> > + }
> > +
> > + ds = mirror->mirror_ds->ds;
> > + devid = &mirror->mirror_ds->id_node;
> > +
> > + /* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
> > + smp_rmb();
> > + if (ds->ds_clp)
> > + goto out_test_devid;
> > +
> > + flavor = nfs4_ff_layout_choose_authflavor(mirror);
> > +
> > + /* FIXME: For now we assume the server sent only one version of NFS
> > + * to use for the DS.
> > + */
> > + nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
> > + dataserver_retrans,
> > + mirror->mirror_ds->ds_versions[0].version,
> > + mirror->mirror_ds->ds_versions[0].minor_version,
> > + flavor);
> > +
> > + /* connect success, check rsize/wsize limit */
> > + if (ds->ds_clp) {
> > + max_payload =
> > + nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
> > + NULL);
> > + if (mirror->mirror_ds->ds_versions[0].rsize > max_payload)
> > + mirror->mirror_ds->ds_versions[0].rsize = max_payload;
> > + if (mirror->mirror_ds->ds_versions[0].wsize > max_payload)
> > + mirror->mirror_ds->ds_versions[0].wsize = max_payload;
> > + } else {
> > + ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
> > + mirror, lseg->pls_range.offset,
> > + lseg->pls_range.length, NFS4ERR_NXIO,
> > + OP_ILLEGAL, GFP_NOIO);
> > + if (fail_return) {
> > + pnfs_error_mark_layout_for_return(ino, lseg);
> > + if (ff_layout_has_available_ds(lseg))
> > + pnfs_set_retry_layoutget(lseg->pls_layout);
> > + else
> > + pnfs_clear_retry_layoutget(lseg->pls_layout);
> > +
> > + } else {
> > + if (ff_layout_has_available_ds(lseg))
> > + set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
> > + &lseg->pls_layout->plh_flags);
> > + else {
> > + pnfs_error_mark_layout_for_return(ino, lseg);
> > + pnfs_clear_retry_layoutget(lseg->pls_layout);
> > + }
> > + }
> > + }
> > +
> > +out_test_devid:
> > + if (ff_layout_test_devid_unavailable(devid))
> > + ds = NULL;
> > +out:
> > + if (ff_layout_update_mirror_cred(mirror, ds))
> > + ds = NULL;
> > + return ds;
> > +}
> > +
> > +struct rpc_cred *
> > +ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg, u32 ds_idx,
> > + struct rpc_cred *mdscred)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
> > + struct rpc_cred *cred = ERR_PTR(-EINVAL);
> > +
> > + if (!nfs4_ff_layout_prepare_ds(lseg, ds_idx, true))
> > + goto out;
> > +
> > + if (mirror && mirror->cred)
> > + cred = mirror->cred;
> > + else
> > + cred = mdscred;
> > +out:
> > + return cred;
> > +}
> > +
> > +/**
> > +* Find or create a DS rpc client with the MDS server rpc client auth flavor
> > +* in the nfs_client cl_ds_clients list.
> > +*/
> > +struct rpc_clnt *
> > +nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg, u32 ds_idx,
> > + struct nfs_client *ds_clp, struct inode *inode)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
> > +
> > + switch (mirror->mirror_ds->ds_versions[0].version) {
> > + case 3:
> > + /* For NFSv3 DS, flavor is set when creating DS connections */
> > + return ds_clp->cl_rpcclient;
> > + case 4:
> > + return nfs4_find_or_create_ds_client(ds_clp, inode);
> > + default:
> > + BUG();
> > + }
> > +}
> > +
> > +static bool is_range_intersecting(u64 offset1, u64 length1,
> > + u64 offset2, u64 length2)
> > +{
> > + u64 end1 = end_offset(offset1, length1);
> > + u64 end2 = end_offset(offset2, length2);
> > +
> > + return (end1 == NFS4_MAX_UINT64 || end1 > offset2) &&
> > + (end2 == NFS4_MAX_UINT64 || end2 > offset1);
> > +}
> > +
> > +/* called with inode i_lock held */
> > +int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
> > + struct xdr_stream *xdr, int *count,
> > + const struct pnfs_layout_range *range)
> > +{
> > + struct nfs4_ff_layout_ds_err *err, *n;
> > + __be32 *p;
> > +
> > + list_for_each_entry_safe(err, n, &flo->error_list, list) {
> > + if (!is_range_intersecting(err->offset, err->length,
> > + range->offset, range->length))
> > + continue;
> > + /* offset(8) + length(8) + stateid(NFS4_STATEID_SIZE)
> > + * + deviceid(NFS4_DEVICEID4_SIZE) + status(4) + opnum(4)
> > + */
> > + p = xdr_reserve_space(xdr,
> > + 24 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
> > + if (unlikely(!p))
> > + return -ENOBUFS;
> > + p = xdr_encode_hyper(p, err->offset);
> > + p = xdr_encode_hyper(p, err->length);
> > + p = xdr_encode_opaque_fixed(p, &err->stateid,
> > + NFS4_STATEID_SIZE);
> > + p = xdr_encode_opaque_fixed(p, &err->deviceid,
> > + NFS4_DEVICEID4_SIZE);
> > + *p++ = cpu_to_be32(err->status);
> > + *p++ = cpu_to_be32(err->opnum);
> > + *count += 1;
> > + /* log before freeing: err must not be dereferenced after kfree() */
> > + dprintk("%s: offset %llu length %llu status %d op %d count %d\n",
> > + __func__, err->offset, err->length, err->status,
> > + err->opnum, *count);
> > + list_del(&err->list);
> > + kfree(err);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg)
> > +{
> > + struct nfs4_ff_layout_mirror *mirror;
> > + struct nfs4_deviceid_node *devid;
> > + int idx;
> > +
> > + for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
> > + mirror = FF_LAYOUT_COMP(lseg, idx);
> > + if (mirror && mirror->mirror_ds) {
> > + devid = &mirror->mirror_ds->id_node;
> > + if (!ff_layout_test_devid_unavailable(devid))
> > + return true;
> > + }
> > + }
> > +
> > + return false;
> > +}
> > +
> > +module_param(dataserver_retrans, uint, 0644);
> > +MODULE_PARM_DESC(dataserver_retrans, "The number of times the NFSv4.1 client "
> > + "retries a request before it attempts further "
> > + "recovery action.");
> > +module_param(dataserver_timeo, uint, 0644);
> > +MODULE_PARM_DESC(dataserver_timeo, "The time (in tenths of a second) the "
> > + "NFSv4.1 client waits for a response from a "
> > + "data server before it retries an NFS request.");
> > diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> > index 022b761..de7c91c 100644
> > --- a/include/linux/nfs4.h
> > +++ b/include/linux/nfs4.h
> > @@ -516,6 +516,7 @@ enum pnfs_layouttype {
> > LAYOUT_NFSV4_1_FILES = 1,
> > LAYOUT_OSD2_OBJECTS = 2,
> > LAYOUT_BLOCK_VOLUME = 3,
> > + LAYOUT_FLEX_FILES = 4,
> > };
> >
> > /* used for both layout return and recall */
> >
>
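For reference, the overlap test used by is_range_intersecting() in the quoted
hunk can be tried out in userspace. The following is only a minimal sketch: the
body of end_offset() is not part of the quoted hunk, so the saturating version
here is an assumption, and MAX_U64 is a stand-in for NFS4_MAX_UINT64. A range
whose end saturates is treated as extending to the end of the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_U64 UINT64_MAX	/* stand-in for NFS4_MAX_UINT64 */

/*
 * Assumed helper (not shown in the quoted hunk): end of the range,
 * saturating at MAX_U64 when offset + length wraps around.
 */
static uint64_t end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;	/* unsigned wraparound is well defined */

	return end >= start ? end : MAX_U64;
}

/* Same predicate as in the patch, using fixed-width userspace types. */
static bool is_range_intersecting(uint64_t offset1, uint64_t length1,
				  uint64_t offset2, uint64_t length2)
{
	uint64_t end1 = end_offset(offset1, length1);
	uint64_t end2 = end_offset(offset2, length2);

	return (end1 == MAX_U64 || end1 > offset2) &&
	       (end2 == MAX_U64 || end2 > offset1);
}

/* A few illustrative cases; ranges are half-open, [offset, offset + length). */
static void range_examples(void)
{
	assert(is_range_intersecting(0, 10, 5, 10));	/* partial overlap */
	assert(!is_range_intersecting(0, 10, 10, 10));	/* adjacent only */
	assert(is_range_intersecting(100, MAX_U64, 200, 10));	/* open-ended */
	assert(!is_range_intersecting(100, MAX_U64, 0, 50));	/* ends before start */
}
```

Adjacent ranges do not count as intersecting, so an error recorded for
[0, 10) would not be encoded for a layoutreturn covering [10, 20).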