Hi,
This patchset introduces the Flexfile Layout Module for the
client.
It corresponds to draft 4
(http://tools.ietf.org/id/draft-ietf-nfsv4-flex-files-04.txt)
of the Parallel NFS (pNFS) Flexible File Layout
(https://datatracker.ietf.org/doc/draft-ietf-nfsv4-flex-files/).
These changes from v3 are based off of additional review comments
from Christoph and Anna. And I updated the Documentation for
the Layout Types.
The [flexfiles] branch of
git://git.linux-nfs.org/projects/loghyr/linux-nfs.git
also has this code.
Thanks,
Tom
Peng Tao (35):
nfs41: pull data server cache from file layout to generic pnfs
nfs41: pull decode_ds_addr from file layout to generic pnfs
nfs41: pull nfs4_ds_connect from file layout to generic pnfs
nfs41: allow LD to choose DS connection auth flavor
nfs41: move file layout macros to generic pnfs
nfsv3: introduce nfs3_set_ds_client
nfs41: allow LD to choose DS connection version/minor_version
nfs41: create NFSv3 DS connection if specified
nfs: allow different protocol in nfs_initiate_commit
nfs4: pass slot table to nfs40_setup_sequence
nfs4: export nfs4_sequence_done
nfs: allow to specify cred in nfs_initiate_pgio
nfs: set hostname when creating nfsv3 ds connection
nfs/flexclient: export pnfs_layoutcommit_inode
nfs41: close a small race window when adding new layout to global list
nfs41: serialize first layoutget of a file
nfs: save server READ/WRITE/COMMIT status
nfs41: pass iomode through layoutreturn args
nfs41: make a helper function to send layoutreturn
nfs41: add a helper to mark layout for return
nfs41: don't use a layout if it is marked for returning
nfs41: send layoutreturn in last put_lseg
nfs41: clear NFS_LAYOUT_RETURN if layoutreturn is sent or failed to
send
nfs/filelayout: use pnfs_error_mark_layout_for_return
nfs41: add a debug warning if we destroy an unempty layout
nfs: only reset desc->pg_mirror_idx when mirroring is supported
nfs: add nfs_pgio_current_mirror helper
pnfs: allow LD to ask to resend read through pnfs
nfs41: add range to layoutreturn args
nfs41: allow async version layoutreturn
nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
nfs/flexfiles: send layoutreturn before freeing lseg
nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
Tom Haynes (5):
pnfs: Prepare for flexfiles by pulling out common code
pnfs: Do not grab the commit_info lock twice when rescheduling writes
pnfs: Add nfs_rpc_ops in calls to nfs_initiate_pgio
pnfs/flexfiles: Add the FlexFile Layout Driver
pnfs: Update documentation on the Layout Drivers
Trond Myklebust (1):
NFSv4.1/NFSv3: Add pNFS callbacks for nfs3_(read|write|commit)_done()
Weston Andros Adamson (9):
sunrpc: add rpc_count_iostats_idx
nfs: introduce pg_cleanup op for pgio descriptors
pnfs: release lseg in pnfs_generic_pg_cleanup
nfs: handle overlapping reqs in lock_and_join
nfs: rename pgio header ds_idx to ds_commit_idx
pnfs: pass ds_commit_idx through the commit path
nfs: add mirroring support to pgio layer
nfs: mirroring support for direct io
pnfs: fail comparison when bucket verifier not set
Documentation/filesystems/nfs/pnfs.txt | 13 +-
fs/nfs/Kconfig | 5 +
fs/nfs/Makefile | 3 +-
fs/nfs/blocklayout/blocklayout.c | 2 +
fs/nfs/direct.c | 108 +-
fs/nfs/filelayout/filelayout.c | 315 +-----
fs/nfs/filelayout/filelayout.h | 40 -
fs/nfs/filelayout/filelayoutdev.c | 469 +--------
fs/nfs/flexfilelayout/Makefile | 5 +
fs/nfs/flexfilelayout/flexfilelayout.c | 1603 +++++++++++++++++++++++++++++
fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
fs/nfs/internal.h | 31 +-
fs/nfs/nfs2xdr.c | 10 +-
fs/nfs/nfs3_fs.h | 2 +
fs/nfs/nfs3client.c | 41 +
fs/nfs/nfs3proc.c | 9 +
fs/nfs/nfs3super.c | 2 +-
fs/nfs/nfs3xdr.c | 3 +
fs/nfs/nfs4_fs.h | 6 +
fs/nfs/nfs4client.c | 7 +-
fs/nfs/nfs4proc.c | 48 +-
fs/nfs/nfs4xdr.c | 9 +-
fs/nfs/objlayout/objio_osd.c | 5 +-
fs/nfs/pagelist.c | 294 +++++-
fs/nfs/pnfs.c | 419 ++++++--
fs/nfs/pnfs.h | 120 ++-
fs/nfs/pnfs_nfs.c | 808 +++++++++++++++
fs/nfs/read.c | 33 +-
fs/nfs/write.c | 52 +-
include/linux/nfs4.h | 1 +
include/linux/nfs_fs_sb.h | 9 +-
include/linux/nfs_page.h | 22 +-
include/linux/nfs_xdr.h | 6 +-
include/linux/sunrpc/metrics.h | 2 +
net/sunrpc/stats.c | 26 +-
36 files changed, 4221 insertions(+), 1017 deletions(-)
create mode 100644 fs/nfs/flexfilelayout/Makefile
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
create mode 100644 fs/nfs/pnfs_nfs.c
--
1.9.3
The flexfilelayout driver will share some common code
with the filelayout driver. This set of changes refactors
that common code out to avoid any module depenencies.
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/Makefile | 2 +-
fs/nfs/filelayout/filelayout.c | 291 ++------------------------------------
fs/nfs/filelayout/filelayout.h | 11 --
fs/nfs/filelayout/filelayoutdev.c | 2 +-
fs/nfs/pnfs.h | 23 +++
fs/nfs/pnfs_nfs.c | 291 ++++++++++++++++++++++++++++++++++++++
6 files changed, 330 insertions(+), 290 deletions(-)
create mode 100644 fs/nfs/pnfs_nfs.c
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 04cb830..23abffa 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -27,7 +27,7 @@ nfsv4-y := nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o nfs4super.o nfs4file.o
dns_resolve.o nfs4trace.o
nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
nfsv4-$(CONFIG_SYSCTL) += nfs4sysctl.o
-nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o
+nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o pnfs_nfs.o
nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 7afb52f..bc36ed3 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -118,13 +118,6 @@ static void filelayout_reset_read(struct nfs_pgio_header *hdr)
}
}
-static void filelayout_fenceme(struct inode *inode, struct pnfs_layout_hdr *lo)
-{
- if (!test_and_clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
- return;
- pnfs_return_layout(inode);
-}
-
static int filelayout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
@@ -339,16 +332,6 @@ static void filelayout_read_count_stats(struct rpc_task *task, void *data)
rpc_count_iostats(task, NFS_SERVER(hdr->inode)->client->cl_metrics);
}
-static void filelayout_read_release(void *data)
-{
- struct nfs_pgio_header *hdr = data;
- struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
-
- filelayout_fenceme(lo->plh_inode, lo);
- nfs_put_client(hdr->ds_clp);
- hdr->mds_ops->rpc_release(data);
-}
-
static int filelayout_write_done_cb(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
@@ -371,17 +354,6 @@ static int filelayout_write_done_cb(struct rpc_task *task,
return 0;
}
-/* Fake up some data that will cause nfs_commit_release to retry the writes. */
-static void prepare_to_resend_writes(struct nfs_commit_data *data)
-{
- struct nfs_page *first = nfs_list_entry(data->pages.next);
-
- data->task.tk_status = 0;
- memcpy(&data->verf.verifier, &first->wb_verf,
- sizeof(data->verf.verifier));
- data->verf.verifier.data[0]++; /* ensure verifier mismatch */
-}
-
static int filelayout_commit_done_cb(struct rpc_task *task,
struct nfs_commit_data *data)
{
@@ -393,7 +365,7 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
switch (err) {
case -NFS4ERR_RESET_TO_MDS:
- prepare_to_resend_writes(data);
+ pnfs_generic_prepare_to_resend_writes(data);
return -EAGAIN;
case -EAGAIN:
rpc_restart_call_prepare(task);
@@ -451,16 +423,6 @@ static void filelayout_write_count_stats(struct rpc_task *task, void *data)
rpc_count_iostats(task, NFS_SERVER(hdr->inode)->client->cl_metrics);
}
-static void filelayout_write_release(void *data)
-{
- struct nfs_pgio_header *hdr = data;
- struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
-
- filelayout_fenceme(lo->plh_inode, lo);
- nfs_put_client(hdr->ds_clp);
- hdr->mds_ops->rpc_release(data);
-}
-
static void filelayout_commit_prepare(struct rpc_task *task, void *data)
{
struct nfs_commit_data *wdata = data;
@@ -471,14 +433,6 @@ static void filelayout_commit_prepare(struct rpc_task *task, void *data)
task);
}
-static void filelayout_write_commit_done(struct rpc_task *task, void *data)
-{
- struct nfs_commit_data *wdata = data;
-
- /* Note this may cause RPC to be resent */
- wdata->mds_ops->rpc_call_done(task, data);
-}
-
static void filelayout_commit_count_stats(struct rpc_task *task, void *data)
{
struct nfs_commit_data *cdata = data;
@@ -486,35 +440,25 @@ static void filelayout_commit_count_stats(struct rpc_task *task, void *data)
rpc_count_iostats(task, NFS_SERVER(cdata->inode)->client->cl_metrics);
}
-static void filelayout_commit_release(void *calldata)
-{
- struct nfs_commit_data *data = calldata;
-
- data->completion_ops->completion(data);
- pnfs_put_lseg(data->lseg);
- nfs_put_client(data->ds_clp);
- nfs_commitdata_release(data);
-}
-
static const struct rpc_call_ops filelayout_read_call_ops = {
.rpc_call_prepare = filelayout_read_prepare,
.rpc_call_done = filelayout_read_call_done,
.rpc_count_stats = filelayout_read_count_stats,
- .rpc_release = filelayout_read_release,
+ .rpc_release = pnfs_generic_rw_release,
};
static const struct rpc_call_ops filelayout_write_call_ops = {
.rpc_call_prepare = filelayout_write_prepare,
.rpc_call_done = filelayout_write_call_done,
.rpc_count_stats = filelayout_write_count_stats,
- .rpc_release = filelayout_write_release,
+ .rpc_release = pnfs_generic_rw_release,
};
static const struct rpc_call_ops filelayout_commit_call_ops = {
.rpc_call_prepare = filelayout_commit_prepare,
- .rpc_call_done = filelayout_write_commit_done,
+ .rpc_call_done = pnfs_generic_write_commit_done,
.rpc_count_stats = filelayout_commit_count_stats,
- .rpc_release = filelayout_commit_release,
+ .rpc_release = pnfs_generic_commit_release,
};
static enum pnfs_try_status
@@ -1004,33 +948,6 @@ static u32 select_bucket_index(struct nfs4_filelayout_segment *fl, u32 j)
return j;
}
-/* The generic layer is about to remove the req from the commit list.
- * If this will make the bucket empty, it will need to put the lseg reference.
- * Note this is must be called holding the inode (/cinfo) lock
- */
-static void
-filelayout_clear_request_commit(struct nfs_page *req,
- struct nfs_commit_info *cinfo)
-{
- struct pnfs_layout_segment *freeme = NULL;
-
- if (!test_and_clear_bit(PG_COMMIT_TO_DS, &req->wb_flags))
- goto out;
- cinfo->ds->nwritten--;
- if (list_is_singular(&req->wb_list)) {
- struct pnfs_commit_bucket *bucket;
-
- bucket = list_first_entry(&req->wb_list,
- struct pnfs_commit_bucket,
- written);
- freeme = bucket->wlseg;
- bucket->wlseg = NULL;
- }
-out:
- nfs_request_remove_commit_list(req, cinfo);
- pnfs_put_lseg_locked(freeme);
-}
-
static void
filelayout_mark_request_commit(struct nfs_page *req,
struct pnfs_layout_segment *lseg,
@@ -1064,7 +981,7 @@ filelayout_mark_request_commit(struct nfs_page *req,
* is normally transferred to the COMMIT call and released
* there. It could also be released if the last req is pulled
* off due to a rewrite, in which case it will be done in
- * filelayout_clear_request_commit
+ * pnfs_generic_clear_request_commit
*/
buckets[i].wlseg = pnfs_get_lseg(lseg);
}
@@ -1142,97 +1059,11 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
&filelayout_commit_call_ops, how,
RPC_TASK_SOFTCONN);
out_err:
- prepare_to_resend_writes(data);
- filelayout_commit_release(data);
+ pnfs_generic_prepare_to_resend_writes(data);
+ pnfs_generic_commit_release(data);
return -EAGAIN;
}
-static int
-transfer_commit_list(struct list_head *src, struct list_head *dst,
- struct nfs_commit_info *cinfo, int max)
-{
- struct nfs_page *req, *tmp;
- int ret = 0;
-
- list_for_each_entry_safe(req, tmp, src, wb_list) {
- if (!nfs_lock_request(req))
- continue;
- kref_get(&req->wb_kref);
- if (cond_resched_lock(cinfo->lock))
- list_safe_reset_next(req, tmp, wb_list);
- nfs_request_remove_commit_list(req, cinfo);
- clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
- nfs_list_add_request(req, dst);
- ret++;
- if ((ret == max) && !cinfo->dreq)
- break;
- }
- return ret;
-}
-
-/* Note called with cinfo->lock held. */
-static int
-filelayout_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
- struct nfs_commit_info *cinfo,
- int max)
-{
- struct list_head *src = &bucket->written;
- struct list_head *dst = &bucket->committing;
- int ret;
-
- ret = transfer_commit_list(src, dst, cinfo, max);
- if (ret) {
- cinfo->ds->nwritten -= ret;
- cinfo->ds->ncommitting += ret;
- bucket->clseg = bucket->wlseg;
- if (list_empty(src))
- bucket->wlseg = NULL;
- else
- pnfs_get_lseg(bucket->clseg);
- }
- return ret;
-}
-
-/* Move reqs from written to committing lists, returning count of number moved.
- * Note called with cinfo->lock held.
- */
-static int filelayout_scan_commit_lists(struct nfs_commit_info *cinfo,
- int max)
-{
- int i, rv = 0, cnt;
-
- for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
- cnt = filelayout_scan_ds_commit_list(&cinfo->ds->buckets[i],
- cinfo, max);
- max -= cnt;
- rv += cnt;
- }
- return rv;
-}
-
-/* Pull everything off the committing lists and dump into @dst */
-static void filelayout_recover_commit_reqs(struct list_head *dst,
- struct nfs_commit_info *cinfo)
-{
- struct pnfs_commit_bucket *b;
- struct pnfs_layout_segment *freeme;
- int i;
-
-restart:
- spin_lock(cinfo->lock);
- for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
- if (transfer_commit_list(&b->written, dst, cinfo, 0)) {
- freeme = b->wlseg;
- b->wlseg = NULL;
- spin_unlock(cinfo->lock);
- pnfs_put_lseg(freeme);
- goto restart;
- }
- }
- cinfo->ds->nwritten = 0;
- spin_unlock(cinfo->lock);
-}
-
/* filelayout_search_commit_reqs - Search lists in @cinfo for the head reqest
* for @page
* @cinfo - commit info for current inode
@@ -1263,108 +1094,14 @@ filelayout_search_commit_reqs(struct nfs_commit_info *cinfo, struct page *page)
return NULL;
}
-static void filelayout_retry_commit(struct nfs_commit_info *cinfo, int idx)
-{
- struct pnfs_ds_commit_info *fl_cinfo = cinfo->ds;
- struct pnfs_commit_bucket *bucket;
- struct pnfs_layout_segment *freeme;
- int i;
-
- for (i = idx; i < fl_cinfo->nbuckets; i++) {
- bucket = &fl_cinfo->buckets[i];
- if (list_empty(&bucket->committing))
- continue;
- nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
- spin_lock(cinfo->lock);
- freeme = bucket->clseg;
- bucket->clseg = NULL;
- spin_unlock(cinfo->lock);
- pnfs_put_lseg(freeme);
- }
-}
-
-static unsigned int
-alloc_ds_commits(struct nfs_commit_info *cinfo, struct list_head *list)
-{
- struct pnfs_ds_commit_info *fl_cinfo;
- struct pnfs_commit_bucket *bucket;
- struct nfs_commit_data *data;
- int i;
- unsigned int nreq = 0;
-
- fl_cinfo = cinfo->ds;
- bucket = fl_cinfo->buckets;
- for (i = 0; i < fl_cinfo->nbuckets; i++, bucket++) {
- if (list_empty(&bucket->committing))
- continue;
- data = nfs_commitdata_alloc();
- if (!data)
- break;
- data->ds_commit_index = i;
- spin_lock(cinfo->lock);
- data->lseg = bucket->clseg;
- bucket->clseg = NULL;
- spin_unlock(cinfo->lock);
- list_add(&data->pages, list);
- nreq++;
- }
-
- /* Clean up on error */
- filelayout_retry_commit(cinfo, i);
- /* Caller will clean up entries put on list */
- return nreq;
-}
-
-/* This follows nfs_commit_list pretty closely */
static int
filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
int how, struct nfs_commit_info *cinfo)
{
- struct nfs_commit_data *data, *tmp;
- LIST_HEAD(list);
- unsigned int nreq = 0;
-
- if (!list_empty(mds_pages)) {
- data = nfs_commitdata_alloc();
- if (data != NULL) {
- data->lseg = NULL;
- list_add(&data->pages, &list);
- nreq++;
- } else {
- nfs_retry_commit(mds_pages, NULL, cinfo);
- filelayout_retry_commit(cinfo, 0);
- cinfo->completion_ops->error_cleanup(NFS_I(inode));
- return -ENOMEM;
- }
- }
-
- nreq += alloc_ds_commits(cinfo, &list);
-
- if (nreq == 0) {
- cinfo->completion_ops->error_cleanup(NFS_I(inode));
- goto out;
- }
-
- atomic_add(nreq, &cinfo->mds->rpcs_out);
-
- list_for_each_entry_safe(data, tmp, &list, pages) {
- list_del_init(&data->pages);
- if (!data->lseg) {
- nfs_init_commit(data, mds_pages, NULL, cinfo);
- nfs_initiate_commit(NFS_CLIENT(inode), data,
- data->mds_ops, how, 0);
- } else {
- struct pnfs_commit_bucket *buckets;
-
- buckets = cinfo->ds->buckets;
- nfs_init_commit(data, &buckets[data->ds_commit_index].committing, data->lseg, cinfo);
- filelayout_initiate_commit(data, how);
- }
- }
-out:
- cinfo->ds->ncommitting = 0;
- return PNFS_ATTEMPTED;
+ return pnfs_generic_commit_pagelist(inode, mds_pages, how, cinfo,
+ filelayout_initiate_commit);
}
+
static struct nfs4_deviceid_node *
filelayout_alloc_deviceid_node(struct nfs_server *server,
struct pnfs_device *pdev, gfp_t gfp_flags)
@@ -1421,9 +1158,9 @@ static struct pnfs_layoutdriver_type filelayout_type = {
.pg_write_ops = &filelayout_pg_write_ops,
.get_ds_info = &filelayout_get_ds_info,
.mark_request_commit = filelayout_mark_request_commit,
- .clear_request_commit = filelayout_clear_request_commit,
- .scan_commit_lists = filelayout_scan_commit_lists,
- .recover_commit_reqs = filelayout_recover_commit_reqs,
+ .clear_request_commit = pnfs_generic_clear_request_commit,
+ .scan_commit_lists = pnfs_generic_scan_commit_lists,
+ .recover_commit_reqs = pnfs_generic_recover_commit_reqs,
.search_commit_reqs = filelayout_search_commit_reqs,
.commit_pagelist = filelayout_commit_pagelist,
.read_pagelist = filelayout_read_pagelist,
diff --git a/fs/nfs/filelayout/filelayout.h b/fs/nfs/filelayout/filelayout.h
index 7c9f800..a5ce9b4 100644
--- a/fs/nfs/filelayout/filelayout.h
+++ b/fs/nfs/filelayout/filelayout.h
@@ -119,17 +119,6 @@ FILELAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg)
return &FILELAYOUT_LSEG(lseg)->dsaddr->id_node;
}
-static inline void
-filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
-{
- u32 *p = (u32 *)&node->deviceid;
-
- printk(KERN_WARNING "NFS: Deviceid [%x%x%x%x] marked out of use.\n",
- p[0], p[1], p[2], p[3]);
-
- set_bit(NFS_DEVICEID_INVALID, &node->flags);
-}
-
static inline bool
filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
{
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index bfecac7..d21080a 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -708,7 +708,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
if (ds == NULL) {
printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
__func__, ds_idx);
- filelayout_mark_devid_invalid(devid);
+ pnfs_generic_mark_devid_invalid(devid);
goto out;
}
smp_rmb();
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9ae5b76..f176634 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -275,6 +275,23 @@ void nfs4_mark_deviceid_unavailable(struct nfs4_deviceid_node *node);
bool nfs4_test_deviceid_unavailable(struct nfs4_deviceid_node *node);
void nfs4_deviceid_purge_client(const struct nfs_client *);
+/* pnfs_nfs.c */
+void pnfs_generic_clear_request_commit(struct nfs_page *req,
+ struct nfs_commit_info *cinfo);
+void pnfs_generic_commit_release(void *calldata);
+void pnfs_generic_prepare_to_resend_writes(struct nfs_commit_data *data);
+void pnfs_generic_rw_release(void *data);
+void pnfs_generic_recover_commit_reqs(struct list_head *dst,
+ struct nfs_commit_info *cinfo);
+int pnfs_generic_commit_pagelist(struct inode *inode,
+ struct list_head *mds_pages,
+ int how,
+ struct nfs_commit_info *cinfo,
+ int (*initiate_commit)(struct nfs_commit_data *data,
+ int how));
+int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo, int max);
+void pnfs_generic_write_commit_done(struct rpc_task *task, void *data);
+
static inline struct nfs4_deviceid_node *
nfs4_get_deviceid(struct nfs4_deviceid_node *d)
{
@@ -317,6 +334,12 @@ pnfs_get_ds_info(struct inode *inode)
return ld->get_ds_info(inode);
}
+static inline void
+pnfs_generic_mark_devid_invalid(struct nfs4_deviceid_node *node)
+{
+ set_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
static inline bool
pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
struct nfs_commit_info *cinfo)
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
new file mode 100644
index 0000000..e5f841c
--- /dev/null
+++ b/fs/nfs/pnfs_nfs.c
@@ -0,0 +1,291 @@
+/*
+ * Common NFS I/O operations for the pnfs file based
+ * layout drivers.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tom Haynes <[email protected]>
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
+
+#include "internal.h"
+#include "pnfs.h"
+
+static void pnfs_generic_fenceme(struct inode *inode,
+ struct pnfs_layout_hdr *lo)
+{
+ if (!test_and_clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
+ return;
+ pnfs_return_layout(inode);
+}
+
+void pnfs_generic_rw_release(void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+ struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
+
+ pnfs_generic_fenceme(lo->plh_inode, lo);
+ nfs_put_client(hdr->ds_clp);
+ hdr->mds_ops->rpc_release(data);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_rw_release);
+
+/* Fake up some data that will cause nfs_commit_release to retry the writes. */
+void pnfs_generic_prepare_to_resend_writes(struct nfs_commit_data *data)
+{
+ struct nfs_page *first = nfs_list_entry(data->pages.next);
+
+ data->task.tk_status = 0;
+ memcpy(&data->verf.verifier, &first->wb_verf,
+ sizeof(data->verf.verifier));
+ data->verf.verifier.data[0]++; /* ensure verifier mismatch */
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_prepare_to_resend_writes);
+
+void pnfs_generic_write_commit_done(struct rpc_task *task, void *data)
+{
+ struct nfs_commit_data *wdata = data;
+
+ /* Note this may cause RPC to be resent */
+ wdata->mds_ops->rpc_call_done(task, data);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_write_commit_done);
+
+void pnfs_generic_commit_release(void *calldata)
+{
+ struct nfs_commit_data *data = calldata;
+
+ data->completion_ops->completion(data);
+ pnfs_put_lseg(data->lseg);
+ nfs_put_client(data->ds_clp);
+ nfs_commitdata_release(data);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_commit_release);
+
+/* The generic layer is about to remove the req from the commit list.
+ * If this will make the bucket empty, it will need to put the lseg reference.
+ * Note this is must be called holding the inode (/cinfo) lock
+ */
+void
+pnfs_generic_clear_request_commit(struct nfs_page *req,
+ struct nfs_commit_info *cinfo)
+{
+ struct pnfs_layout_segment *freeme = NULL;
+
+ if (!test_and_clear_bit(PG_COMMIT_TO_DS, &req->wb_flags))
+ goto out;
+ cinfo->ds->nwritten--;
+ if (list_is_singular(&req->wb_list)) {
+ struct pnfs_commit_bucket *bucket;
+
+ bucket = list_first_entry(&req->wb_list,
+ struct pnfs_commit_bucket,
+ written);
+ freeme = bucket->wlseg;
+ bucket->wlseg = NULL;
+ }
+out:
+ nfs_request_remove_commit_list(req, cinfo);
+ pnfs_put_lseg_locked(freeme);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_clear_request_commit);
+
+static int
+pnfs_generic_transfer_commit_list(struct list_head *src, struct list_head *dst,
+ struct nfs_commit_info *cinfo, int max)
+{
+ struct nfs_page *req, *tmp;
+ int ret = 0;
+
+ list_for_each_entry_safe(req, tmp, src, wb_list) {
+ if (!nfs_lock_request(req))
+ continue;
+ kref_get(&req->wb_kref);
+ if (cond_resched_lock(cinfo->lock))
+ list_safe_reset_next(req, tmp, wb_list);
+ nfs_request_remove_commit_list(req, cinfo);
+ clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
+ nfs_list_add_request(req, dst);
+ ret++;
+ if ((ret == max) && !cinfo->dreq)
+ break;
+ }
+ return ret;
+}
+
+/* Note called with cinfo->lock held. */
+static int
+pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
+ struct nfs_commit_info *cinfo,
+ int max)
+{
+ struct list_head *src = &bucket->written;
+ struct list_head *dst = &bucket->committing;
+ int ret;
+
+ ret = pnfs_generic_transfer_commit_list(src, dst, cinfo, max);
+ if (ret) {
+ cinfo->ds->nwritten -= ret;
+ cinfo->ds->ncommitting += ret;
+ bucket->clseg = bucket->wlseg;
+ if (list_empty(src))
+ bucket->wlseg = NULL;
+ else
+ pnfs_get_lseg(bucket->clseg);
+ }
+ return ret;
+}
+
+/* Move reqs from written to committing lists, returning count of number moved.
+ * Note called with cinfo->lock held.
+ */
+int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
+ int max)
+{
+ int i, rv = 0, cnt;
+
+ for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
+ cnt = pnfs_generic_scan_ds_commit_list(&cinfo->ds->buckets[i],
+ cinfo, max);
+ max -= cnt;
+ rv += cnt;
+ }
+ return rv;
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_scan_commit_lists);
+
+/* Pull everything off the committing lists and dump into @dst */
+void pnfs_generic_recover_commit_reqs(struct list_head *dst,
+ struct nfs_commit_info *cinfo)
+{
+ struct pnfs_commit_bucket *b;
+ struct pnfs_layout_segment *freeme;
+ int i;
+
+restart:
+ spin_lock(cinfo->lock);
+ for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
+ if (pnfs_generic_transfer_commit_list(&b->written, dst,
+ cinfo, 0)) {
+ freeme = b->wlseg;
+ b->wlseg = NULL;
+ spin_unlock(cinfo->lock);
+ pnfs_put_lseg(freeme);
+ goto restart;
+ }
+ }
+ cinfo->ds->nwritten = 0;
+ spin_unlock(cinfo->lock);
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_recover_commit_reqs);
+
+static void pnfs_generic_retry_commit(struct nfs_commit_info *cinfo, int idx)
+{
+ struct pnfs_ds_commit_info *fl_cinfo = cinfo->ds;
+ struct pnfs_commit_bucket *bucket;
+ struct pnfs_layout_segment *freeme;
+ int i;
+
+ for (i = idx; i < fl_cinfo->nbuckets; i++) {
+ bucket = &fl_cinfo->buckets[i];
+ if (list_empty(&bucket->committing))
+ continue;
+ nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
+ spin_lock(cinfo->lock);
+ freeme = bucket->clseg;
+ bucket->clseg = NULL;
+ spin_unlock(cinfo->lock);
+ pnfs_put_lseg(freeme);
+ }
+}
+
+static unsigned int
+pnfs_generic_alloc_ds_commits(struct nfs_commit_info *cinfo,
+ struct list_head *list)
+{
+ struct pnfs_ds_commit_info *fl_cinfo;
+ struct pnfs_commit_bucket *bucket;
+ struct nfs_commit_data *data;
+ int i;
+ unsigned int nreq = 0;
+
+ fl_cinfo = cinfo->ds;
+ bucket = fl_cinfo->buckets;
+ for (i = 0; i < fl_cinfo->nbuckets; i++, bucket++) {
+ if (list_empty(&bucket->committing))
+ continue;
+ data = nfs_commitdata_alloc();
+ if (!data)
+ break;
+ data->ds_commit_index = i;
+ spin_lock(cinfo->lock);
+ data->lseg = bucket->clseg;
+ bucket->clseg = NULL;
+ spin_unlock(cinfo->lock);
+ list_add(&data->pages, list);
+ nreq++;
+ }
+
+ /* Clean up on error */
+ pnfs_generic_retry_commit(cinfo, i);
+ return nreq;
+}
+
+/* This follows nfs_commit_list pretty closely */
+int
+pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
+ int how, struct nfs_commit_info *cinfo,
+ int (*initiate_commit)(struct nfs_commit_data *data,
+ int how))
+{
+ struct nfs_commit_data *data, *tmp;
+ LIST_HEAD(list);
+ unsigned int nreq = 0;
+
+ if (!list_empty(mds_pages)) {
+ data = nfs_commitdata_alloc();
+ if (data != NULL) {
+ data->lseg = NULL;
+ list_add(&data->pages, &list);
+ nreq++;
+ } else {
+ nfs_retry_commit(mds_pages, NULL, cinfo);
+ pnfs_generic_retry_commit(cinfo, 0);
+ cinfo->completion_ops->error_cleanup(NFS_I(inode));
+ return -ENOMEM;
+ }
+ }
+
+ nreq += pnfs_generic_alloc_ds_commits(cinfo, &list);
+
+ if (nreq == 0) {
+ cinfo->completion_ops->error_cleanup(NFS_I(inode));
+ goto out;
+ }
+
+ atomic_add(nreq, &cinfo->mds->rpcs_out);
+
+ list_for_each_entry_safe(data, tmp, &list, pages) {
+ list_del_init(&data->pages);
+ if (!data->lseg) {
+ nfs_init_commit(data, mds_pages, NULL, cinfo);
+ nfs_initiate_commit(NFS_CLIENT(inode), data,
+ data->mds_ops, how, 0);
+ } else {
+ struct pnfs_commit_bucket *buckets;
+
+ buckets = cinfo->ds->buckets;
+ nfs_init_commit(data,
+ &buckets[data->ds_commit_index].committing,
+ data->lseg,
+ cinfo);
+ initiate_commit(data, how);
+ }
+ }
+out:
+ cinfo->ds->ncommitting = 0;
+ return PNFS_ATTEMPTED;
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_commit_pagelist);
--
1.9.3
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/direct.c | 19 +++++++++++++++----
fs/nfs/pnfs.h | 15 ---------------
fs/nfs/pnfs_nfs.c | 15 ++++++++-------
3 files changed, 23 insertions(+), 26 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 10bf072..e84f764 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -573,6 +573,20 @@ out:
return result;
}
+static void
+nfs_direct_write_scan_commit_list(struct inode *inode,
+ struct list_head *list,
+ struct nfs_commit_info *cinfo)
+{
+ spin_lock(cinfo->lock);
+#ifdef CONFIG_NFS_V4_1
+ if (cinfo->ds != NULL && cinfo->ds->nwritten != 0)
+ NFS_SERVER(inode)->pnfs_curr_ld->recover_commit_reqs(list, cinfo);
+#endif
+ nfs_scan_commit_list(&cinfo->mds->list, list, cinfo, 0);
+ spin_unlock(cinfo->lock);
+}
+
static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
{
struct nfs_pageio_descriptor desc;
@@ -582,10 +596,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
LIST_HEAD(failed);
nfs_init_cinfo_from_dreq(&cinfo, dreq);
- pnfs_recover_commit_reqs(dreq->inode, &reqs, &cinfo);
- spin_lock(cinfo.lock);
- nfs_scan_commit_list(&cinfo.mds->list, &reqs, &cinfo, 0);
- spin_unlock(cinfo.lock);
+ nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
dreq->count = 0;
get_dreq(dreq);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f176634..e94f605 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -375,15 +375,6 @@ pnfs_scan_commit_lists(struct inode *inode, struct nfs_commit_info *cinfo,
return NFS_SERVER(inode)->pnfs_curr_ld->scan_commit_lists(cinfo, max);
}
-static inline void
-pnfs_recover_commit_reqs(struct inode *inode, struct list_head *list,
- struct nfs_commit_info *cinfo)
-{
- if (cinfo->ds == NULL || cinfo->ds->nwritten == 0)
- return;
- NFS_SERVER(inode)->pnfs_curr_ld->recover_commit_reqs(list, cinfo);
-}
-
static inline struct nfs_page *
pnfs_search_commit_reqs(struct inode *inode, struct nfs_commit_info *cinfo,
struct page *page)
@@ -554,12 +545,6 @@ pnfs_scan_commit_lists(struct inode *inode, struct nfs_commit_info *cinfo,
return 0;
}
-static inline void
-pnfs_recover_commit_reqs(struct inode *inode, struct list_head *list,
- struct nfs_commit_info *cinfo)
-{
-}
-
static inline struct nfs_page *
pnfs_search_commit_reqs(struct inode *inode, struct nfs_commit_info *cinfo,
struct page *page)
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index e5f841c..fd2a2f0 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -66,7 +66,7 @@ EXPORT_SYMBOL_GPL(pnfs_generic_commit_release);
/* The generic layer is about to remove the req from the commit list.
* If this will make the bucket empty, it will need to put the lseg reference.
- * Note this is must be called holding the inode (/cinfo) lock
+ * Note this must be called holding the inode (/cinfo) lock
*/
void
pnfs_generic_clear_request_commit(struct nfs_page *req,
@@ -115,7 +115,6 @@ pnfs_generic_transfer_commit_list(struct list_head *src, struct list_head *dst,
return ret;
}
-/* Note called with cinfo->lock held. */
static int
pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
struct nfs_commit_info *cinfo,
@@ -125,6 +124,7 @@ pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
struct list_head *dst = &bucket->committing;
int ret;
+ lockdep_assert_held(cinfo->lock);
ret = pnfs_generic_transfer_commit_list(src, dst, cinfo, max);
if (ret) {
cinfo->ds->nwritten -= ret;
@@ -138,14 +138,15 @@ pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
return ret;
}
-/* Move reqs from written to committing lists, returning count of number moved.
- * Note called with cinfo->lock held.
+/* Move reqs from written to committing lists, returning count
+ * of number moved.
*/
int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
int max)
{
int i, rv = 0, cnt;
+ lockdep_assert_held(cinfo->lock);
for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
cnt = pnfs_generic_scan_ds_commit_list(&cinfo->ds->buckets[i],
cinfo, max);
@@ -156,7 +157,7 @@ int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
}
EXPORT_SYMBOL_GPL(pnfs_generic_scan_commit_lists);
-/* Pull everything off the committing lists and dump into @dst */
+/* Pull everything off the committing lists and dump into @dst. */
void pnfs_generic_recover_commit_reqs(struct list_head *dst,
struct nfs_commit_info *cinfo)
{
@@ -164,8 +165,8 @@ void pnfs_generic_recover_commit_reqs(struct list_head *dst,
struct pnfs_layout_segment *freeme;
int i;
+ lockdep_assert_held(cinfo->lock);
restart:
- spin_lock(cinfo->lock);
for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
if (pnfs_generic_transfer_commit_list(&b->written, dst,
cinfo, 0)) {
@@ -173,11 +174,11 @@ restart:
b->wlseg = NULL;
spin_unlock(cinfo->lock);
pnfs_put_lseg(freeme);
+ spin_lock(cinfo->lock);
goto restart;
}
}
cinfo->ds->nwritten = 0;
- spin_unlock(cinfo->lock);
}
EXPORT_SYMBOL_GPL(pnfs_generic_recover_commit_reqs);
--
1.9.3
From: Peng Tao <[email protected]>
Also pull nfs4_pnfs_ds_addr and nfs4_pnfs_ds to generic pnfs.
They can all be reused by flexfile layout as well.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.h | 19 ---
fs/nfs/filelayout/filelayoutdev.c | 235 +-----------------------------------
fs/nfs/pnfs.h | 21 ++++
fs/nfs/pnfs_nfs.c | 242 ++++++++++++++++++++++++++++++++++++++
4 files changed, 265 insertions(+), 252 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.h b/fs/nfs/filelayout/filelayout.h
index a5ce9b4..f97eea6 100644
--- a/fs/nfs/filelayout/filelayout.h
+++ b/fs/nfs/filelayout/filelayout.h
@@ -56,24 +56,6 @@ enum stripetype4 {
STRIPE_DENSE = 2
};
-/* Individual ip address */
-struct nfs4_pnfs_ds_addr {
- struct sockaddr_storage da_addr;
- size_t da_addrlen;
- struct list_head da_node; /* nfs4_pnfs_dev_hlist dev_dslist */
- char *da_remotestr; /* human readable addr+port */
-};
-
-struct nfs4_pnfs_ds {
- struct list_head ds_node; /* nfs4_pnfs_dev_hlist dev_dslist */
- char *ds_remotestr; /* comma sep list of addrs */
- struct list_head ds_addrs;
- struct nfs_client *ds_clp;
- atomic_t ds_count;
- unsigned long ds_state;
-#define NFS4DS_CONNECTING 0 /* ds is establishing connection */
-};
-
struct nfs4_file_layout_dsaddr {
struct nfs4_deviceid_node id_node;
u32 stripe_count;
@@ -131,7 +113,6 @@ filelayout_test_devid_unavailable(struct nfs4_deviceid_node *node);
extern struct nfs_fh *
nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
-extern void print_ds(struct nfs4_pnfs_ds *ds);
u32 nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset);
u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j);
struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index d21080a..fbfbb70 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -43,114 +43,6 @@ static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
/*
- * Data server cache
- *
- * Data servers can be mapped to different device ids.
- * nfs4_pnfs_ds reference counting
- * - set to 1 on allocation
- * - incremented when a device id maps a data server already in the cache.
- * - decremented when deviceid is removed from the cache.
- */
-static DEFINE_SPINLOCK(nfs4_ds_cache_lock);
-static LIST_HEAD(nfs4_data_server_cache);
-
-/* Debug routines */
-void
-print_ds(struct nfs4_pnfs_ds *ds)
-{
- if (ds == NULL) {
- printk("%s NULL device\n", __func__);
- return;
- }
- printk(" ds %s\n"
- " ref count %d\n"
- " client %p\n"
- " cl_exchange_flags %x\n",
- ds->ds_remotestr,
- atomic_read(&ds->ds_count), ds->ds_clp,
- ds->ds_clp ? ds->ds_clp->cl_exchange_flags : 0);
-}
-
-static bool
-same_sockaddr(struct sockaddr *addr1, struct sockaddr *addr2)
-{
- struct sockaddr_in *a, *b;
- struct sockaddr_in6 *a6, *b6;
-
- if (addr1->sa_family != addr2->sa_family)
- return false;
-
- switch (addr1->sa_family) {
- case AF_INET:
- a = (struct sockaddr_in *)addr1;
- b = (struct sockaddr_in *)addr2;
-
- if (a->sin_addr.s_addr == b->sin_addr.s_addr &&
- a->sin_port == b->sin_port)
- return true;
- break;
-
- case AF_INET6:
- a6 = (struct sockaddr_in6 *)addr1;
- b6 = (struct sockaddr_in6 *)addr2;
-
- /* LINKLOCAL addresses must have matching scope_id */
- if (ipv6_addr_src_scope(&a6->sin6_addr) ==
- IPV6_ADDR_SCOPE_LINKLOCAL &&
- a6->sin6_scope_id != b6->sin6_scope_id)
- return false;
-
- if (ipv6_addr_equal(&a6->sin6_addr, &b6->sin6_addr) &&
- a6->sin6_port == b6->sin6_port)
- return true;
- break;
-
- default:
- dprintk("%s: unhandled address family: %u\n",
- __func__, addr1->sa_family);
- return false;
- }
-
- return false;
-}
-
-static bool
-_same_data_server_addrs_locked(const struct list_head *dsaddrs1,
- const struct list_head *dsaddrs2)
-{
- struct nfs4_pnfs_ds_addr *da1, *da2;
-
- /* step through both lists, comparing as we go */
- for (da1 = list_first_entry(dsaddrs1, typeof(*da1), da_node),
- da2 = list_first_entry(dsaddrs2, typeof(*da2), da_node);
- da1 != NULL && da2 != NULL;
- da1 = list_entry(da1->da_node.next, typeof(*da1), da_node),
- da2 = list_entry(da2->da_node.next, typeof(*da2), da_node)) {
- if (!same_sockaddr((struct sockaddr *)&da1->da_addr,
- (struct sockaddr *)&da2->da_addr))
- return false;
- }
- if (da1 == NULL && da2 == NULL)
- return true;
-
- return false;
-}
-
-/*
- * Lookup DS by addresses. nfs4_ds_cache_lock is held
- */
-static struct nfs4_pnfs_ds *
-_data_server_lookup_locked(const struct list_head *dsaddrs)
-{
- struct nfs4_pnfs_ds *ds;
-
- list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
- if (_same_data_server_addrs_locked(&ds->ds_addrs, dsaddrs))
- return ds;
- return NULL;
-}
-
-/*
* Create an rpc connection to the nfs4_pnfs_ds data server
* Currently only supports IPv4 and IPv6 addresses
*/
@@ -195,30 +87,6 @@ out_put:
goto out;
}
-static void
-destroy_ds(struct nfs4_pnfs_ds *ds)
-{
- struct nfs4_pnfs_ds_addr *da;
-
- dprintk("--> %s\n", __func__);
- ifdebug(FACILITY)
- print_ds(ds);
-
- nfs_put_client(ds->ds_clp);
-
- while (!list_empty(&ds->ds_addrs)) {
- da = list_first_entry(&ds->ds_addrs,
- struct nfs4_pnfs_ds_addr,
- da_node);
- list_del_init(&da->da_node);
- kfree(da->da_remotestr);
- kfree(da);
- }
-
- kfree(ds->ds_remotestr);
- kfree(ds);
-}
-
void
nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
{
@@ -229,113 +97,14 @@ nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
for (i = 0; i < dsaddr->ds_num; i++) {
ds = dsaddr->ds_list[i];
- if (ds != NULL) {
- if (atomic_dec_and_lock(&ds->ds_count,
- &nfs4_ds_cache_lock)) {
- list_del_init(&ds->ds_node);
- spin_unlock(&nfs4_ds_cache_lock);
- destroy_ds(ds);
- }
- }
+ if (ds != NULL)
+ nfs4_pnfs_ds_put(ds);
}
kfree(dsaddr->stripe_indices);
kfree(dsaddr);
}
/*
- * Create a string with a human readable address and port to avoid
- * complicated setup around many dprinks.
- */
-static char *
-nfs4_pnfs_remotestr(struct list_head *dsaddrs, gfp_t gfp_flags)
-{
- struct nfs4_pnfs_ds_addr *da;
- char *remotestr;
- size_t len;
- char *p;
-
- len = 3; /* '{', '}' and eol */
- list_for_each_entry(da, dsaddrs, da_node) {
- len += strlen(da->da_remotestr) + 1; /* string plus comma */
- }
-
- remotestr = kzalloc(len, gfp_flags);
- if (!remotestr)
- return NULL;
-
- p = remotestr;
- *(p++) = '{';
- len--;
- list_for_each_entry(da, dsaddrs, da_node) {
- size_t ll = strlen(da->da_remotestr);
-
- if (ll > len)
- goto out_err;
-
- memcpy(p, da->da_remotestr, ll);
- p += ll;
- len -= ll;
-
- if (len < 1)
- goto out_err;
- (*p++) = ',';
- len--;
- }
- if (len < 2)
- goto out_err;
- *(p++) = '}';
- *p = '\0';
- return remotestr;
-out_err:
- kfree(remotestr);
- return NULL;
-}
-
-static struct nfs4_pnfs_ds *
-nfs4_pnfs_ds_add(struct list_head *dsaddrs, gfp_t gfp_flags)
-{
- struct nfs4_pnfs_ds *tmp_ds, *ds = NULL;
- char *remotestr;
-
- if (list_empty(dsaddrs)) {
- dprintk("%s: no addresses defined\n", __func__);
- goto out;
- }
-
- ds = kzalloc(sizeof(*ds), gfp_flags);
- if (!ds)
- goto out;
-
- /* this is only used for debugging, so it's ok if its NULL */
- remotestr = nfs4_pnfs_remotestr(dsaddrs, gfp_flags);
-
- spin_lock(&nfs4_ds_cache_lock);
- tmp_ds = _data_server_lookup_locked(dsaddrs);
- if (tmp_ds == NULL) {
- INIT_LIST_HEAD(&ds->ds_addrs);
- list_splice_init(dsaddrs, &ds->ds_addrs);
- ds->ds_remotestr = remotestr;
- atomic_set(&ds->ds_count, 1);
- INIT_LIST_HEAD(&ds->ds_node);
- ds->ds_clp = NULL;
- list_add(&ds->ds_node, &nfs4_data_server_cache);
- dprintk("%s add new data server %s\n", __func__,
- ds->ds_remotestr);
- } else {
- kfree(remotestr);
- kfree(ds);
- atomic_inc(&tmp_ds->ds_count);
- dprintk("%s data server %s found, inc'ed ds_count to %d\n",
- __func__, tmp_ds->ds_remotestr,
- atomic_read(&tmp_ds->ds_count));
- ds = tmp_ds;
- }
- spin_unlock(&nfs4_ds_cache_lock);
-out:
- return ds;
-}
-
-/*
* Currently only supports ipv4, ipv6 and one multi-path address.
*/
static struct nfs4_pnfs_ds_addr *
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index e94f605..b0168f1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -40,6 +40,24 @@ enum {
NFS_LSEG_LAYOUTCOMMIT, /* layoutcommit bit set for layoutcommit */
};
+/* Individual ip address */
+struct nfs4_pnfs_ds_addr {
+ struct sockaddr_storage da_addr;
+ size_t da_addrlen;
+ struct list_head da_node; /* nfs4_pnfs_dev_hlist dev_dslist */
+ char *da_remotestr; /* human readable addr+port */
+};
+
+struct nfs4_pnfs_ds {
+ struct list_head ds_node; /* nfs4_pnfs_dev_hlist dev_dslist */
+ char *ds_remotestr; /* comma sep list of addrs */
+ struct list_head ds_addrs;
+ struct nfs_client *ds_clp;
+ atomic_t ds_count;
+ unsigned long ds_state;
+#define NFS4DS_CONNECTING 0 /* ds is establishing connection */
+};
+
struct pnfs_layout_segment {
struct list_head pls_list;
struct list_head pls_lc_list;
@@ -291,6 +309,9 @@ int pnfs_generic_commit_pagelist(struct inode *inode,
int how));
int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo, int max);
void pnfs_generic_write_commit_done(struct rpc_task *task, void *data);
+void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
+struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
+ gfp_t gfp_flags);
static inline struct nfs4_deviceid_node *
nfs4_get_deviceid(struct nfs4_deviceid_node *d)
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index fd2a2f0..3bb2b74 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -13,6 +13,8 @@
#include "internal.h"
#include "pnfs.h"
+#define NFSDBG_FACILITY NFSDBG_PNFS
+
static void pnfs_generic_fenceme(struct inode *inode,
struct pnfs_layout_hdr *lo)
{
@@ -290,3 +292,243 @@ out:
return PNFS_ATTEMPTED;
}
EXPORT_SYMBOL_GPL(pnfs_generic_commit_pagelist);
+
+/*
+ * Data server cache
+ *
+ * Data servers can be mapped to different device ids.
+ * nfs4_pnfs_ds reference counting
+ * - set to 1 on allocation
+ * - incremented when a device id maps a data server already in the cache.
+ * - decremented when deviceid is removed from the cache.
+ */
+static DEFINE_SPINLOCK(nfs4_ds_cache_lock);
+static LIST_HEAD(nfs4_data_server_cache);
+
+/* Debug routines */
+static void
+print_ds(struct nfs4_pnfs_ds *ds)
+{
+ if (ds == NULL) {
+ printk(KERN_WARNING "%s NULL device\n", __func__);
+ return;
+ }
+ printk(KERN_WARNING " ds %s\n"
+ " ref count %d\n"
+ " client %p\n"
+ " cl_exchange_flags %x\n",
+ ds->ds_remotestr,
+ atomic_read(&ds->ds_count), ds->ds_clp,
+ ds->ds_clp ? ds->ds_clp->cl_exchange_flags : 0);
+}
+
+static bool
+same_sockaddr(struct sockaddr *addr1, struct sockaddr *addr2)
+{
+ struct sockaddr_in *a, *b;
+ struct sockaddr_in6 *a6, *b6;
+
+ if (addr1->sa_family != addr2->sa_family)
+ return false;
+
+ switch (addr1->sa_family) {
+ case AF_INET:
+ a = (struct sockaddr_in *)addr1;
+ b = (struct sockaddr_in *)addr2;
+
+ if (a->sin_addr.s_addr == b->sin_addr.s_addr &&
+ a->sin_port == b->sin_port)
+ return true;
+ break;
+
+ case AF_INET6:
+ a6 = (struct sockaddr_in6 *)addr1;
+ b6 = (struct sockaddr_in6 *)addr2;
+
+ /* LINKLOCAL addresses must have matching scope_id */
+ if (ipv6_addr_src_scope(&a6->sin6_addr) ==
+ IPV6_ADDR_SCOPE_LINKLOCAL &&
+ a6->sin6_scope_id != b6->sin6_scope_id)
+ return false;
+
+ if (ipv6_addr_equal(&a6->sin6_addr, &b6->sin6_addr) &&
+ a6->sin6_port == b6->sin6_port)
+ return true;
+ break;
+
+ default:
+ dprintk("%s: unhandled address family: %u\n",
+ __func__, addr1->sa_family);
+ return false;
+ }
+
+ return false;
+}
+
+static bool
+_same_data_server_addrs_locked(const struct list_head *dsaddrs1,
+ const struct list_head *dsaddrs2)
+{
+ struct nfs4_pnfs_ds_addr *da1, *da2;
+
+ /* step through both lists, comparing as we go */
+ for (da1 = list_first_entry(dsaddrs1, typeof(*da1), da_node),
+ da2 = list_first_entry(dsaddrs2, typeof(*da2), da_node);
+ da1 != NULL && da2 != NULL;
+ da1 = list_entry(da1->da_node.next, typeof(*da1), da_node),
+ da2 = list_entry(da2->da_node.next, typeof(*da2), da_node)) {
+ if (!same_sockaddr((struct sockaddr *)&da1->da_addr,
+ (struct sockaddr *)&da2->da_addr))
+ return false;
+ }
+ if (da1 == NULL && da2 == NULL)
+ return true;
+
+ return false;
+}
+
+/*
+ * Lookup DS by addresses. nfs4_ds_cache_lock is held
+ */
+static struct nfs4_pnfs_ds *
+_data_server_lookup_locked(const struct list_head *dsaddrs)
+{
+ struct nfs4_pnfs_ds *ds;
+
+ list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
+ if (_same_data_server_addrs_locked(&ds->ds_addrs, dsaddrs))
+ return ds;
+ return NULL;
+}
+
+static void destroy_ds(struct nfs4_pnfs_ds *ds)
+{
+ struct nfs4_pnfs_ds_addr *da;
+
+ dprintk("--> %s\n", __func__);
+ ifdebug(FACILITY)
+ print_ds(ds);
+
+ nfs_put_client(ds->ds_clp);
+
+ while (!list_empty(&ds->ds_addrs)) {
+ da = list_first_entry(&ds->ds_addrs,
+ struct nfs4_pnfs_ds_addr,
+ da_node);
+ list_del_init(&da->da_node);
+ kfree(da->da_remotestr);
+ kfree(da);
+ }
+
+ kfree(ds->ds_remotestr);
+ kfree(ds);
+}
+
+void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds)
+{
+ if (atomic_dec_and_lock(&ds->ds_count,
+ &nfs4_ds_cache_lock)) {
+ list_del_init(&ds->ds_node);
+ spin_unlock(&nfs4_ds_cache_lock);
+ destroy_ds(ds);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_put);
+
+/*
+ * Create a string with a human readable address and port to avoid
+ * complicated setup around many dprinks.
+ */
+static char *
+nfs4_pnfs_remotestr(struct list_head *dsaddrs, gfp_t gfp_flags)
+{
+ struct nfs4_pnfs_ds_addr *da;
+ char *remotestr;
+ size_t len;
+ char *p;
+
+ len = 3; /* '{', '}' and eol */
+ list_for_each_entry(da, dsaddrs, da_node) {
+ len += strlen(da->da_remotestr) + 1; /* string plus comma */
+ }
+
+ remotestr = kzalloc(len, gfp_flags);
+ if (!remotestr)
+ return NULL;
+
+ p = remotestr;
+ *(p++) = '{';
+ len--;
+ list_for_each_entry(da, dsaddrs, da_node) {
+ size_t ll = strlen(da->da_remotestr);
+
+ if (ll > len)
+ goto out_err;
+
+ memcpy(p, da->da_remotestr, ll);
+ p += ll;
+ len -= ll;
+
+ if (len < 1)
+ goto out_err;
+ (*p++) = ',';
+ len--;
+ }
+ if (len < 2)
+ goto out_err;
+ *(p++) = '}';
+ *p = '\0';
+ return remotestr;
+out_err:
+ kfree(remotestr);
+ return NULL;
+}
+
+/*
+ * Given a list of multipath struct nfs4_pnfs_ds_addr, add it to ds cache if
+ * uncached and return cached struct nfs4_pnfs_ds.
+ */
+struct nfs4_pnfs_ds *
+nfs4_pnfs_ds_add(struct list_head *dsaddrs, gfp_t gfp_flags)
+{
+ struct nfs4_pnfs_ds *tmp_ds, *ds = NULL;
+ char *remotestr;
+
+ if (list_empty(dsaddrs)) {
+ dprintk("%s: no addresses defined\n", __func__);
+ goto out;
+ }
+
+ ds = kzalloc(sizeof(*ds), gfp_flags);
+ if (!ds)
+ goto out;
+
+ /* this is only used for debugging, so it's ok if its NULL */
+ remotestr = nfs4_pnfs_remotestr(dsaddrs, gfp_flags);
+
+ spin_lock(&nfs4_ds_cache_lock);
+ tmp_ds = _data_server_lookup_locked(dsaddrs);
+ if (tmp_ds == NULL) {
+ INIT_LIST_HEAD(&ds->ds_addrs);
+ list_splice_init(dsaddrs, &ds->ds_addrs);
+ ds->ds_remotestr = remotestr;
+ atomic_set(&ds->ds_count, 1);
+ INIT_LIST_HEAD(&ds->ds_node);
+ ds->ds_clp = NULL;
+ list_add(&ds->ds_node, &nfs4_data_server_cache);
+ dprintk("%s add new data server %s\n", __func__,
+ ds->ds_remotestr);
+ } else {
+ kfree(remotestr);
+ kfree(ds);
+ atomic_inc(&tmp_ds->ds_count);
+ dprintk("%s data server %s found, inc'ed ds_count to %d\n",
+ __func__, tmp_ds->ds_remotestr,
+ atomic_read(&tmp_ds->ds_count));
+ ds = tmp_ds;
+ }
+ spin_unlock(&nfs4_ds_cache_lock);
+out:
+ return ds;
+}
+EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_add);
--
1.9.3
From: Peng Tao <[email protected]>
It can be reused by flexfile layout.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 152 +-------------------------------------
fs/nfs/pnfs.h | 3 +
fs/nfs/pnfs_nfs.c | 149 +++++++++++++++++++++++++++++++++++++
3 files changed, 154 insertions(+), 150 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index fbfbb70..c7f6041 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -31,7 +31,6 @@
#include <linux/nfs_fs.h>
#include <linux/vmalloc.h>
#include <linux/module.h>
-#include <linux/sunrpc/addr.h>
#include "../internal.h"
#include "../nfs4session.h"
@@ -104,153 +103,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
kfree(dsaddr);
}
-/*
- * Currently only supports ipv4, ipv6 and one multi-path address.
- */
-static struct nfs4_pnfs_ds_addr *
-decode_ds_addr(struct net *net, struct xdr_stream *streamp, gfp_t gfp_flags)
-{
- struct nfs4_pnfs_ds_addr *da = NULL;
- char *buf, *portstr;
- __be16 port;
- int nlen, rlen;
- int tmp[2];
- __be32 *p;
- char *netid, *match_netid;
- size_t len, match_netid_len;
- char *startsep = "";
- char *endsep = "";
-
-
- /* r_netid */
- p = xdr_inline_decode(streamp, 4);
- if (unlikely(!p))
- goto out_err;
- nlen = be32_to_cpup(p++);
-
- p = xdr_inline_decode(streamp, nlen);
- if (unlikely(!p))
- goto out_err;
-
- netid = kmalloc(nlen+1, gfp_flags);
- if (unlikely(!netid))
- goto out_err;
-
- netid[nlen] = '\0';
- memcpy(netid, p, nlen);
-
- /* r_addr: ip/ip6addr with port in dec octets - see RFC 5665 */
- p = xdr_inline_decode(streamp, 4);
- if (unlikely(!p))
- goto out_free_netid;
- rlen = be32_to_cpup(p);
-
- p = xdr_inline_decode(streamp, rlen);
- if (unlikely(!p))
- goto out_free_netid;
-
- /* port is ".ABC.DEF", 8 chars max */
- if (rlen > INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN + 8) {
- dprintk("%s: Invalid address, length %d\n", __func__,
- rlen);
- goto out_free_netid;
- }
- buf = kmalloc(rlen + 1, gfp_flags);
- if (!buf) {
- dprintk("%s: Not enough memory\n", __func__);
- goto out_free_netid;
- }
- buf[rlen] = '\0';
- memcpy(buf, p, rlen);
-
- /* replace port '.' with '-' */
- portstr = strrchr(buf, '.');
- if (!portstr) {
- dprintk("%s: Failed finding expected dot in port\n",
- __func__);
- goto out_free_buf;
- }
- *portstr = '-';
-
- /* find '.' between address and port */
- portstr = strrchr(buf, '.');
- if (!portstr) {
- dprintk("%s: Failed finding expected dot between address and "
- "port\n", __func__);
- goto out_free_buf;
- }
- *portstr = '\0';
-
- da = kzalloc(sizeof(*da), gfp_flags);
- if (unlikely(!da))
- goto out_free_buf;
-
- INIT_LIST_HEAD(&da->da_node);
-
- if (!rpc_pton(net, buf, portstr-buf, (struct sockaddr *)&da->da_addr,
- sizeof(da->da_addr))) {
- dprintk("%s: error parsing address %s\n", __func__, buf);
- goto out_free_da;
- }
-
- portstr++;
- sscanf(portstr, "%d-%d", &tmp[0], &tmp[1]);
- port = htons((tmp[0] << 8) | (tmp[1]));
-
- switch (da->da_addr.ss_family) {
- case AF_INET:
- ((struct sockaddr_in *)&da->da_addr)->sin_port = port;
- da->da_addrlen = sizeof(struct sockaddr_in);
- match_netid = "tcp";
- match_netid_len = 3;
- break;
-
- case AF_INET6:
- ((struct sockaddr_in6 *)&da->da_addr)->sin6_port = port;
- da->da_addrlen = sizeof(struct sockaddr_in6);
- match_netid = "tcp6";
- match_netid_len = 4;
- startsep = "[";
- endsep = "]";
- break;
-
- default:
- dprintk("%s: unsupported address family: %u\n",
- __func__, da->da_addr.ss_family);
- goto out_free_da;
- }
-
- if (nlen != match_netid_len || strncmp(netid, match_netid, nlen)) {
- dprintk("%s: ERROR: r_netid \"%s\" != \"%s\"\n",
- __func__, netid, match_netid);
- goto out_free_da;
- }
-
- /* save human readable address */
- len = strlen(startsep) + strlen(buf) + strlen(endsep) + 7;
- da->da_remotestr = kzalloc(len, gfp_flags);
-
- /* NULL is ok, only used for dprintk */
- if (da->da_remotestr)
- snprintf(da->da_remotestr, len, "%s%s%s:%u", startsep,
- buf, endsep, ntohs(port));
-
- dprintk("%s: Parsed DS addr %s\n", __func__, da->da_remotestr);
- kfree(buf);
- kfree(netid);
- return da;
-
-out_free_da:
- kfree(da);
-out_free_buf:
- dprintk("%s: Error parsing DS addr: %s\n", __func__, buf);
- kfree(buf);
-out_free_netid:
- kfree(netid);
-out_err:
- return NULL;
-}
-
/* Decode opaque device data and return the result */
struct nfs4_file_layout_dsaddr *
nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
@@ -353,8 +205,8 @@ nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
mp_count = be32_to_cpup(p); /* multipath count */
for (j = 0; j < mp_count; j++) {
- da = decode_ds_addr(server->nfs_client->cl_net,
- &stream, gfp_flags);
+ da = nfs4_decode_mp_ds_addr(server->nfs_client->cl_net,
+ &stream, gfp_flags);
if (da)
list_add_tail(&da->da_node, &dsaddrs);
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index b0168f1..403d7bb 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -312,6 +312,9 @@ void pnfs_generic_write_commit_done(struct rpc_task *task, void *data);
void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
+struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
+ struct xdr_stream *xdr,
+ gfp_t gfp_flags);
static inline struct nfs4_deviceid_node *
nfs4_get_deviceid(struct nfs4_deviceid_node *d)
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 3bb2b74..81ec449 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -9,6 +9,7 @@
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
+#include <linux/sunrpc/addr.h>
#include "internal.h"
#include "pnfs.h"
@@ -532,3 +533,151 @@ out:
return ds;
}
EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_add);
+
+/*
+ * Currently only supports ipv4, ipv6 and one multi-path address.
+ */
+struct nfs4_pnfs_ds_addr *
+nfs4_decode_mp_ds_addr(struct net *net, struct xdr_stream *xdr, gfp_t gfp_flags)
+{
+ struct nfs4_pnfs_ds_addr *da = NULL;
+ char *buf, *portstr;
+ __be16 port;
+ int nlen, rlen;
+ int tmp[2];
+ __be32 *p;
+ char *netid, *match_netid;
+ size_t len, match_netid_len;
+ char *startsep = "";
+ char *endsep = "";
+
+
+ /* r_netid */
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_err;
+ nlen = be32_to_cpup(p++);
+
+ p = xdr_inline_decode(xdr, nlen);
+ if (unlikely(!p))
+ goto out_err;
+
+ netid = kmalloc(nlen+1, gfp_flags);
+ if (unlikely(!netid))
+ goto out_err;
+
+ netid[nlen] = '\0';
+ memcpy(netid, p, nlen);
+
+ /* r_addr: ip/ip6addr with port in dec octets - see RFC 5665 */
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_free_netid;
+ rlen = be32_to_cpup(p);
+
+ p = xdr_inline_decode(xdr, rlen);
+ if (unlikely(!p))
+ goto out_free_netid;
+
+ /* port is ".ABC.DEF", 8 chars max */
+ if (rlen > INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN + 8) {
+ dprintk("%s: Invalid address, length %d\n", __func__,
+ rlen);
+ goto out_free_netid;
+ }
+ buf = kmalloc(rlen + 1, gfp_flags);
+ if (!buf) {
+ dprintk("%s: Not enough memory\n", __func__);
+ goto out_free_netid;
+ }
+ buf[rlen] = '\0';
+ memcpy(buf, p, rlen);
+
+ /* replace port '.' with '-' */
+ portstr = strrchr(buf, '.');
+ if (!portstr) {
+ dprintk("%s: Failed finding expected dot in port\n",
+ __func__);
+ goto out_free_buf;
+ }
+ *portstr = '-';
+
+ /* find '.' between address and port */
+ portstr = strrchr(buf, '.');
+ if (!portstr) {
+ dprintk("%s: Failed finding expected dot between address and "
+ "port\n", __func__);
+ goto out_free_buf;
+ }
+ *portstr = '\0';
+
+ da = kzalloc(sizeof(*da), gfp_flags);
+ if (unlikely(!da))
+ goto out_free_buf;
+
+ INIT_LIST_HEAD(&da->da_node);
+
+ if (!rpc_pton(net, buf, portstr-buf, (struct sockaddr *)&da->da_addr,
+ sizeof(da->da_addr))) {
+ dprintk("%s: error parsing address %s\n", __func__, buf);
+ goto out_free_da;
+ }
+
+ portstr++;
+ sscanf(portstr, "%d-%d", &tmp[0], &tmp[1]);
+ port = htons((tmp[0] << 8) | (tmp[1]));
+
+ switch (da->da_addr.ss_family) {
+ case AF_INET:
+ ((struct sockaddr_in *)&da->da_addr)->sin_port = port;
+ da->da_addrlen = sizeof(struct sockaddr_in);
+ match_netid = "tcp";
+ match_netid_len = 3;
+ break;
+
+ case AF_INET6:
+ ((struct sockaddr_in6 *)&da->da_addr)->sin6_port = port;
+ da->da_addrlen = sizeof(struct sockaddr_in6);
+ match_netid = "tcp6";
+ match_netid_len = 4;
+ startsep = "[";
+ endsep = "]";
+ break;
+
+ default:
+ dprintk("%s: unsupported address family: %u\n",
+ __func__, da->da_addr.ss_family);
+ goto out_free_da;
+ }
+
+ if (nlen != match_netid_len || strncmp(netid, match_netid, nlen)) {
+ dprintk("%s: ERROR: r_netid \"%s\" != \"%s\"\n",
+ __func__, netid, match_netid);
+ goto out_free_da;
+ }
+
+ /* save human readable address */
+ len = strlen(startsep) + strlen(buf) + strlen(endsep) + 7;
+ da->da_remotestr = kzalloc(len, gfp_flags);
+
+ /* NULL is ok, only used for dprintk */
+ if (da->da_remotestr)
+ snprintf(da->da_remotestr, len, "%s%s%s:%u", startsep,
+ buf, endsep, ntohs(port));
+
+ dprintk("%s: Parsed DS addr %s\n", __func__, da->da_remotestr);
+ kfree(buf);
+ kfree(netid);
+ return da;
+
+out_free_da:
+ kfree(da);
+out_free_buf:
+ dprintk("%s: Error parsing DS addr: %s\n", __func__, buf);
+ kfree(buf);
+out_free_netid:
+ kfree(netid);
+out_err:
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(nfs4_decode_mp_ds_addr);
--
1.9.3
From: Peng Tao <[email protected]>
It can be reused by flexfiles layout client.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 78 +++----------------------------------
fs/nfs/pnfs.h | 3 ++
fs/nfs/pnfs_nfs.c | 81 +++++++++++++++++++++++++++++++++++++++
3 files changed, 89 insertions(+), 73 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index c7f6041..27bdd8c 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -41,51 +41,6 @@
static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
-/*
- * Create an rpc connection to the nfs4_pnfs_ds data server
- * Currently only supports IPv4 and IPv6 addresses
- */
-static int
-nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
-{
- struct nfs_client *clp = ERR_PTR(-EIO);
- struct nfs4_pnfs_ds_addr *da;
- int status = 0;
-
- dprintk("--> %s DS %s au_flavor %d\n", __func__, ds->ds_remotestr,
- mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
-
- list_for_each_entry(da, &ds->ds_addrs, da_node) {
- dprintk("%s: DS %s: trying address %s\n",
- __func__, ds->ds_remotestr, da->da_remotestr);
-
- clp = nfs4_set_ds_client(mds_srv->nfs_client,
- (struct sockaddr *)&da->da_addr,
- da->da_addrlen, IPPROTO_TCP,
- dataserver_timeo, dataserver_retrans);
- if (!IS_ERR(clp))
- break;
- }
-
- if (IS_ERR(clp)) {
- status = PTR_ERR(clp);
- goto out;
- }
-
- status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
- if (status)
- goto out_put;
-
- smp_wmb();
- ds->ds_clp = clp;
- dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
-out:
- return status;
-out_put:
- nfs_put_client(clp);
- goto out;
-}
-
void
nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
{
@@ -302,22 +257,7 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
return flseg->fh_array[i];
}
-static void nfs4_wait_ds_connect(struct nfs4_pnfs_ds *ds)
-{
- might_sleep();
- wait_on_bit_action(&ds->ds_state, NFS4DS_CONNECTING,
- nfs_wait_bit_killable, TASK_KILLABLE);
-}
-
-static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
-{
- smp_mb__before_atomic();
- clear_bit(NFS4DS_CONNECTING, &ds->ds_state);
- smp_mb__after_atomic();
- wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
-}
-
-
+/* Upon return, either ds is connected, or ds is NULL */
struct nfs4_pnfs_ds *
nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
{
@@ -325,6 +265,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
struct nfs4_pnfs_ds *ret = ds;
+ struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
if (ds == NULL) {
printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
@@ -336,18 +277,9 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
if (ds->ds_clp)
goto out_test_devid;
- if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
- struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
- int err;
-
- err = nfs4_ds_connect(s, ds);
- if (err)
- nfs4_mark_deviceid_unavailable(devid);
- nfs4_clear_ds_conn_bit(ds);
- } else {
- /* Either ds is connected, or ds is NULL */
- nfs4_wait_ds_connect(ds);
- }
+ nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
+ dataserver_retrans);
+
out_test_devid:
if (filelayout_test_devid_unavailable(devid))
ret = NULL;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 403d7bb..9a8937c 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -312,6 +312,9 @@ void pnfs_generic_write_commit_done(struct rpc_task *task, void *data);
void nfs4_pnfs_ds_put(struct nfs4_pnfs_ds *ds);
struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
+void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
+ struct nfs4_deviceid_node *devid, unsigned int timeo,
+ unsigned int retrans);
struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
struct xdr_stream *xdr,
gfp_t gfp_flags);
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 81ec449..5a92e76 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -11,6 +11,7 @@
#include <linux/nfs_page.h>
#include <linux/sunrpc/addr.h>
+#include "nfs4session.h"
#include "internal.h"
#include "pnfs.h"
@@ -534,6 +535,86 @@ out:
}
EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_add);
+static void nfs4_wait_ds_connect(struct nfs4_pnfs_ds *ds)
+{
+ might_sleep();
+ wait_on_bit(&ds->ds_state, NFS4DS_CONNECTING,
+ TASK_KILLABLE);
+}
+
+static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
+{
+ smp_mb__before_atomic();
+ clear_bit(NFS4DS_CONNECTING, &ds->ds_state);
+ smp_mb__after_atomic();
+ wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
+}
+
+static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
+ struct nfs4_pnfs_ds *ds,
+ unsigned int timeo,
+ unsigned int retrans)
+{
+ struct nfs_client *clp = ERR_PTR(-EIO);
+ struct nfs4_pnfs_ds_addr *da;
+ int status = 0;
+
+ dprintk("--> %s DS %s au_flavor %d\n", __func__, ds->ds_remotestr,
+ mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
+
+ list_for_each_entry(da, &ds->ds_addrs, da_node) {
+ dprintk("%s: DS %s: trying address %s\n",
+ __func__, ds->ds_remotestr, da->da_remotestr);
+
+ clp = nfs4_set_ds_client(mds_srv->nfs_client,
+ (struct sockaddr *)&da->da_addr,
+ da->da_addrlen, IPPROTO_TCP,
+ timeo, retrans);
+ if (!IS_ERR(clp))
+ break;
+ }
+
+ if (IS_ERR(clp)) {
+ status = PTR_ERR(clp);
+ goto out;
+ }
+
+ status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
+ if (status)
+ goto out_put;
+
+ smp_wmb();
+ ds->ds_clp = clp;
+ dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
+out:
+ return status;
+out_put:
+ nfs_put_client(clp);
+ goto out;
+}
+
+/*
+ * Create an rpc connection to the nfs4_pnfs_ds data server.
+ * Currently only supports IPv4 and IPv6 addresses.
+ * If connection fails, make devid unavailable.
+ */
+void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
+ struct nfs4_deviceid_node *devid, unsigned int timeo,
+ unsigned int retrans)
+{
+ if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
+ int err = 0;
+
+ err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans);
+ if (err)
+ nfs4_mark_deviceid_unavailable(devid);
+ nfs4_clear_ds_conn_bit(ds);
+ } else {
+ nfs4_wait_ds_connect(ds);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs4_pnfs_ds_connect);
+
/*
* Currently only supports ipv4, ipv6 and one multi-path address.
*/
--
1.9.3
From: Peng Tao <[email protected]>
flexfile layout may use different auth flavor as specified by MDS.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 3 ++-
fs/nfs/internal.h | 3 ++-
fs/nfs/nfs4client.c | 5 +++--
fs/nfs/pnfs.h | 2 +-
fs/nfs/pnfs_nfs.c | 10 ++++++----
5 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index 27bdd8c..5e4b0ce 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -278,7 +278,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
goto out_test_devid;
nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
- dataserver_retrans);
+ dataserver_retrans,
+ s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
out_test_devid:
if (filelayout_test_devid_unavailable(devid))
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index efaa31c..7d7c36f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -189,7 +189,8 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr,
int ds_addrlen, int ds_proto,
unsigned int ds_timeo,
- unsigned int ds_retrans);
+ unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor);
extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
struct inode *);
#ifdef CONFIG_PROC_FS
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 0331125..61d552d 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -837,7 +837,8 @@ error:
*/
struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr, int ds_addrlen,
- int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans)
+ int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor)
{
struct nfs_client_initdata cl_init = {
.addr = ds_addr,
@@ -862,7 +863,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
*/
nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
- mds_clp->cl_rpcclient->cl_auth->au_flavor);
+ au_flavor);
dprintk("<-- %s %p\n", __func__, clp);
return clp;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9a8937c..2ea9e9a 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -314,7 +314,7 @@ struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans);
+ unsigned int retrans, rpc_authflavor_t au_flavor);
struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
struct xdr_stream *xdr,
gfp_t gfp_flags);
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 5a92e76..106ee08 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -553,7 +553,8 @@ static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
struct nfs4_pnfs_ds *ds,
unsigned int timeo,
- unsigned int retrans)
+ unsigned int retrans,
+ rpc_authflavor_t au_flavor)
{
struct nfs_client *clp = ERR_PTR(-EIO);
struct nfs4_pnfs_ds_addr *da;
@@ -569,7 +570,7 @@ static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
clp = nfs4_set_ds_client(mds_srv->nfs_client,
(struct sockaddr *)&da->da_addr,
da->da_addrlen, IPPROTO_TCP,
- timeo, retrans);
+ timeo, retrans, au_flavor);
if (!IS_ERR(clp))
break;
}
@@ -600,12 +601,13 @@ out_put:
*/
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans)
+ unsigned int retrans, rpc_authflavor_t au_flavor)
{
if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
int err = 0;
- err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans);
+ err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo,
+ retrans, au_flavor);
if (err)
nfs4_mark_deviceid_unavailable(devid);
nfs4_clear_ds_conn_bit(ds);
--
1.9.3
From: Peng Tao <[email protected]>
They can be reused by flexfile layout as well.
Also add a code such that if read fails on one DS and
there are other DSes available to use, don't resend
through MDS but through pNFS so that client can read
from other DSes.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.h | 10 ----------
fs/nfs/pnfs.h | 11 +++++++++++
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.h b/fs/nfs/filelayout/filelayout.h
index f97eea6..2896cb8 100644
--- a/fs/nfs/filelayout/filelayout.h
+++ b/fs/nfs/filelayout/filelayout.h
@@ -33,13 +33,6 @@
#include "../pnfs.h"
/*
- * Default data server connection timeout and retrans vaules.
- * Set by module paramters dataserver_timeo and dataserver_retrans.
- */
-#define NFS4_DEF_DS_TIMEO 600 /* in tenths of a second */
-#define NFS4_DEF_DS_RETRANS 5
-
-/*
* Field testing shows we need to support up to 4096 stripe indices.
* We store each index as a u8 (u32 on the wire) to keep the memory footprint
* reasonable. This in turn means we support a maximum of 256
@@ -48,9 +41,6 @@
#define NFS4_PNFS_MAX_STRIPE_CNT 4096
#define NFS4_PNFS_MAX_MULTI_CNT 256 /* 256 fit into a u8 stripe_index */
-/* error codes for internal use */
-#define NFS4ERR_RESET_TO_MDS 12001
-
enum stripetype4 {
STRIPE_SPARSE = 1,
STRIPE_DENSE = 2
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 2ea9e9a..aef89b3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -77,6 +77,17 @@ enum pnfs_try_status {
#define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
+/*
+ * Default data server connection timeout and retrans vaules.
+ * Set by module parameters dataserver_timeo and dataserver_retrans.
+ */
+#define NFS4_DEF_DS_TIMEO 600 /* in tenths of a second */
+#define NFS4_DEF_DS_RETRANS 5
+
+/* error codes for internal use */
+#define NFS4ERR_RESET_TO_MDS 12001
+#define NFS4ERR_RESET_TO_PNFS 12002
+
enum {
NFS_LAYOUT_RO_FAILED = 0, /* get ro layout failed stop trying */
NFS_LAYOUT_RW_FAILED, /* get rw layout failed stop trying */
--
1.9.3
From: Peng Tao <[email protected]>
The flexfiles layout wants to create DS connection over NFSv3.
Add nfs3_set_ds_client to allow that to happen.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/internal.h | 4 ++++
fs/nfs/nfs3_fs.h | 2 ++
fs/nfs/nfs3client.c | 34 ++++++++++++++++++++++++++++++++++
fs/nfs/nfs3super.c | 2 +-
include/linux/nfs_fs_sb.h | 9 +++++----
5 files changed, 46 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 7d7c36f..7332ba1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -193,6 +193,10 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
rpc_authflavor_t au_flavor);
extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
struct inode *);
+extern struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr, int ds_addrlen,
+ int ds_proto, unsigned int ds_timeo,
+ unsigned int ds_retrans, rpc_authflavor_t au_flavor);
#ifdef CONFIG_PROC_FS
extern int __init nfs_fs_proc_init(void);
extern void nfs_fs_proc_exit(void);
diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
index 333ae40..e134d65 100644
--- a/fs/nfs/nfs3_fs.h
+++ b/fs/nfs/nfs3_fs.h
@@ -30,5 +30,7 @@ struct nfs_server *nfs3_create_server(struct nfs_mount_info *, struct nfs_subver
struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
struct nfs_fattr *, rpc_authflavor_t);
+/* nfs3super.c */
+extern struct nfs_subversion nfs_v3;
#endif /* __LINUX_FS_NFS_NFS3_FS_H */
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 8c1b437..52e2344 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -64,3 +64,37 @@ struct nfs_server *nfs3_clone_server(struct nfs_server *source,
nfs_init_server_aclclient(server);
return server;
}
+
+/*
+ * Set up a pNFS Data Server client over NFSv3.
+ *
+ * Return any existing nfs_client that matches server address,port,version
+ * and minorversion.
+ *
+ * For a new nfs_client, use a soft mount (default), a low retrans and a
+ * low timeout interval so that if a connection is lost, we retry through
+ * the MDS.
+ */
+struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr, int ds_addrlen,
+ int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
+ rpc_authflavor_t au_flavor)
+{
+ struct nfs_client_initdata cl_init = {
+ .addr = ds_addr,
+ .addrlen = ds_addrlen,
+ .nfs_mod = &nfs_v3,
+ .proto = ds_proto,
+ .net = mds_clp->cl_net,
+ };
+ struct rpc_timeout ds_timeout;
+ struct nfs_client *clp;
+
+ /* Use the MDS nfs_client cl_ipaddr. */
+ nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
+ clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
+ au_flavor);
+
+ return clp;
+}
+EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
diff --git a/fs/nfs/nfs3super.c b/fs/nfs/nfs3super.c
index 6af29c2..5c4394e 100644
--- a/fs/nfs/nfs3super.c
+++ b/fs/nfs/nfs3super.c
@@ -7,7 +7,7 @@
#include "nfs3_fs.h"
#include "nfs.h"
-static struct nfs_subversion nfs_v3 = {
+struct nfs_subversion nfs_v3 = {
.owner = THIS_MODULE,
.nfs_fs = &nfs_fs_type,
.rpc_vers = &nfs_version3,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 1e37fbb..38a3c5e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -74,10 +74,6 @@ struct nfs_client {
/* idmapper */
struct idmap * cl_idmap;
- /* Our own IP address, as a null-terminated string.
- * This is used to generate the mv0 callback address.
- */
- char cl_ipaddr[48];
u32 cl_cb_ident; /* v4.0 callback identifier */
const struct nfs4_minor_version_ops *cl_mvops;
unsigned long cl_mig_gen;
@@ -105,6 +101,11 @@ struct nfs_client {
#define NFS_SP4_MACH_CRED_COMMIT 6 /* COMMIT */
#endif /* CONFIG_NFS_V4 */
+ /* Our own IP address, as a null-terminated string.
+ * This is used to generate the mv0 callback address.
+ */
+ char cl_ipaddr[48];
+
#ifdef CONFIG_NFS_FSCACHE
struct fscache_cookie *fscache; /* client index cache cookie */
#endif
--
1.9.3
From: Peng Tao <[email protected]>
flexfile layout may need to set such when making DS connections.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 3 ++-
fs/nfs/internal.h | 1 +
fs/nfs/nfs4client.c | 4 ++--
fs/nfs/pnfs.h | 3 ++-
fs/nfs/pnfs_nfs.c | 11 +++++++----
5 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index 5e4b0ce..4f372e2 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -278,7 +278,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
goto out_test_devid;
nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
- dataserver_retrans,
+ dataserver_retrans, 4,
+ s->nfs_client->cl_minorversion,
s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
out_test_devid:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 7332ba1..5543850 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -190,6 +190,7 @@ extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
int ds_addrlen, int ds_proto,
unsigned int ds_timeo,
unsigned int ds_retrans,
+ u32 minor_version,
rpc_authflavor_t au_flavor);
extern struct rpc_clnt *nfs4_find_or_create_ds_client(struct nfs_client *,
struct inode *);
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 61d552d..d303627 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -838,14 +838,14 @@ error:
struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr, int ds_addrlen,
int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans,
- rpc_authflavor_t au_flavor)
+ u32 minor_version, rpc_authflavor_t au_flavor)
{
struct nfs_client_initdata cl_init = {
.addr = ds_addr,
.addrlen = ds_addrlen,
.nfs_mod = &nfs_v4,
.proto = ds_proto,
- .minorversion = mds_clp->cl_minorversion,
+ .minorversion = minor_version,
.net = mds_clp->cl_net,
};
struct rpc_timeout ds_timeout;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index aef89b3..70ffec1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -325,7 +325,8 @@ struct nfs4_pnfs_ds *nfs4_pnfs_ds_add(struct list_head *dsaddrs,
gfp_t gfp_flags);
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans, rpc_authflavor_t au_flavor);
+ unsigned int retrans, u32 versoin, u32 minor_version,
+ rpc_authflavor_t au_flavor);
struct nfs4_pnfs_ds_addr *nfs4_decode_mp_ds_addr(struct net *net,
struct xdr_stream *xdr,
gfp_t gfp_flags);
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 106ee08..ad211a4 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -554,6 +554,7 @@ static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
struct nfs4_pnfs_ds *ds,
unsigned int timeo,
unsigned int retrans,
+ u32 minor_version,
rpc_authflavor_t au_flavor)
{
struct nfs_client *clp = ERR_PTR(-EIO);
@@ -570,7 +571,8 @@ static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
clp = nfs4_set_ds_client(mds_srv->nfs_client,
(struct sockaddr *)&da->da_addr,
da->da_addrlen, IPPROTO_TCP,
- timeo, retrans, au_flavor);
+ timeo, retrans, minor_version,
+ au_flavor);
if (!IS_ERR(clp))
break;
}
@@ -601,13 +603,14 @@ out_put:
*/
void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
struct nfs4_deviceid_node *devid, unsigned int timeo,
- unsigned int retrans, rpc_authflavor_t au_flavor)
+ unsigned int retrans, u32 version,
+ u32 minor_version, rpc_authflavor_t au_flavor)
{
if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
int err = 0;
- err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo,
- retrans, au_flavor);
+ err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans,
+ minor_version, au_flavor);
if (err)
nfs4_mark_deviceid_unavailable(devid);
nfs4_clear_ds_conn_bit(ds);
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs_nfs.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index ad211a4..5c6cb00 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -550,7 +550,44 @@ static void nfs4_clear_ds_conn_bit(struct nfs4_pnfs_ds *ds)
wake_up_bit(&ds->ds_state, NFS4DS_CONNECTING);
}
-static int _nfs4_pnfs_ds_connect(struct nfs_server *mds_srv,
+static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
+ struct nfs4_pnfs_ds *ds,
+ unsigned int timeo,
+ unsigned int retrans,
+ rpc_authflavor_t au_flavor)
+{
+ struct nfs_client *clp = ERR_PTR(-EIO);
+ struct nfs4_pnfs_ds_addr *da;
+ int status = 0;
+
+ dprintk("--> %s DS %s au_flavor %d\n", __func__,
+ ds->ds_remotestr, au_flavor);
+
+ list_for_each_entry(da, &ds->ds_addrs, da_node) {
+ dprintk("%s: DS %s: trying address %s\n",
+ __func__, ds->ds_remotestr, da->da_remotestr);
+
+ clp = nfs3_set_ds_client(mds_srv->nfs_client,
+ (struct sockaddr *)&da->da_addr,
+ da->da_addrlen, IPPROTO_TCP,
+ timeo, retrans, au_flavor);
+ if (!IS_ERR(clp))
+ break;
+ }
+
+ if (IS_ERR(clp)) {
+ status = PTR_ERR(clp);
+ goto out;
+ }
+
+ smp_wmb();
+ ds->ds_clp = clp;
+ dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
+out:
+ return status;
+}
+
+static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
struct nfs4_pnfs_ds *ds,
unsigned int timeo,
unsigned int retrans,
@@ -609,8 +646,19 @@ void nfs4_pnfs_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds,
if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) {
int err = 0;
- err = _nfs4_pnfs_ds_connect(mds_srv, ds, timeo, retrans,
- minor_version, au_flavor);
+ if (version == 3) {
+ err = _nfs4_pnfs_v3_ds_connect(mds_srv, ds, timeo,
+ retrans, au_flavor);
+ } else if (version == 4) {
+ err = _nfs4_pnfs_v4_ds_connect(mds_srv, ds, timeo,
+ retrans, minor_version,
+ au_flavor);
+ } else {
+ dprintk("%s: unsupported DS version %d\n", __func__,
+ version);
+ err = -EPROTONOSUPPORT;
+ }
+
if (err)
nfs4_mark_deviceid_unavailable(devid);
nfs4_clear_ds_conn_bit(ds);
--
1.9.3
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 4 ++--
fs/nfs/internal.h | 1 +
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/read.c | 3 ++-
fs/nfs/write.c | 6 +++---
include/linux/nfs_page.h | 1 +
6 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index bc36ed3..25c4896 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -501,7 +501,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr,
+ nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
&filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
}
@@ -542,7 +542,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr,
+ nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
&filelayout_write_call_ops, sync,
RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5543850..1d15ffa 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -251,6 +251,7 @@ void nfs_pgio_header_free(struct nfs_pgio_header *);
void nfs_pgio_data_destroy(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
+ const struct nfs_rpc_ops *,
const struct rpc_call_ops *, int, int);
void nfs_free_request(struct nfs_page *req);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 2b5e769..35a2626 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -597,6 +597,7 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
}
int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
@@ -616,7 +617,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
};
int ret = 0;
- hdr->rw_ops->rw_initiate(hdr, &msg, &task_setup_data, how);
+ hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
dprintk("NFS: %5u initiated pgio call "
"(req %s/%llu, %u bytes @ offset %llu)\n",
@@ -792,7 +793,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0)
ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
- hdr, desc->pg_rpc_callops,
+ hdr, NFS_PROTO(hdr->inode),
+ desc->pg_rpc_callops,
desc->pg_ioflags, 0);
return ret;
}
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index c91a479..092ab49 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -168,13 +168,14 @@ out:
static void nfs_initiate_read(struct nfs_pgio_header *hdr,
struct rpc_message *msg,
+ const struct nfs_rpc_ops *rpc_ops,
struct rpc_task_setup *task_setup_data, int how)
{
struct inode *inode = hdr->inode;
int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
task_setup_data->flags |= swap_flags;
- NFS_PROTO(inode)->read_setup(hdr, msg);
+ rpc_ops->read_setup(hdr, msg);
}
static void
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index af3af68..54d4857 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1240,15 +1240,15 @@ static int flush_task_priority(int how)
static void nfs_initiate_write(struct nfs_pgio_header *hdr,
struct rpc_message *msg,
+ const struct nfs_rpc_ops *rpc_ops,
struct rpc_task_setup *task_setup_data, int how)
{
- struct inode *inode = hdr->inode;
int priority = flush_task_priority(how);
task_setup_data->priority = priority;
- NFS_PROTO(inode)->write_setup(hdr, msg);
+ rpc_ops->write_setup(hdr, msg);
- nfs4_state_protect_write(NFS_SERVER(inode)->nfs_client,
+ nfs4_state_protect_write(NFS_SERVER(hdr->inode)->nfs_client,
&task_setup_data->rpc_client, msg, hdr);
}
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 6c3e06e..4c3aa80 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -69,6 +69,7 @@ struct nfs_rw_ops {
struct inode *);
void (*rw_result)(struct rpc_task *, struct nfs_pgio_header *);
void (*rw_initiate)(struct nfs_pgio_header *, struct rpc_message *,
+ const struct nfs_rpc_ops *,
struct rpc_task_setup *, int);
};
--
1.9.3
From: Peng Tao <[email protected]>
pnfs flexfile layout client may want to use NFSv3 ops rather
than the default MDS v4 ops.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 2 +-
fs/nfs/internal.h | 1 +
fs/nfs/pnfs_nfs.c | 1 +
fs/nfs/write.c | 7 ++++---
4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 25c4896..e5a3c5b 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -1055,7 +1055,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
if (fh)
data->args.fh = fh;
- return nfs_initiate_commit(ds_clnt, data,
+ return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
RPC_TASK_SOFTCONN);
out_err:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 1d15ffa..98dee83 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -436,6 +436,7 @@ extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
extern void nfs_commit_prepare(struct rpc_task *task, void *calldata);
extern int nfs_initiate_commit(struct rpc_clnt *clnt,
struct nfs_commit_data *data,
+ const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
int how, int flags);
extern void nfs_init_commit(struct nfs_commit_data *data,
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 5c6cb00..e6a4587 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -277,6 +277,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
if (!data->lseg) {
nfs_init_commit(data, mds_pages, NULL, cinfo);
nfs_initiate_commit(NFS_CLIENT(inode), data,
+ NFS_PROTO(data->inode),
data->mds_ops, how, 0);
} else {
struct pnfs_commit_bucket *buckets;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 54d4857..8800bd3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1465,6 +1465,7 @@ void nfs_commitdata_release(struct nfs_commit_data *data)
EXPORT_SYMBOL_GPL(nfs_commitdata_release);
int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
+ const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
int how, int flags)
{
@@ -1486,7 +1487,7 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
.priority = priority,
};
/* Set up the initial task struct. */
- NFS_PROTO(data->inode)->commit_setup(data, &msg);
+ nfs_ops->commit_setup(data, &msg);
dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
@@ -1589,8 +1590,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
/* Set up the argument struct */
nfs_init_commit(data, head, NULL, cinfo);
atomic_inc(&cinfo->mds->rpcs_out);
- return nfs_initiate_commit(NFS_CLIENT(inode), data, data->mds_ops,
- how, 0);
+ return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
+ data->mds_ops, how, 0);
out_bad:
nfs_retry_commit(head, NULL, cinfo);
cinfo->completion_ops->error_cleanup(NFS_I(inode));
--
1.9.3
From: Peng Tao <[email protected]>
flexclient needs this as there is no nfs_server to DS connection.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4_fs.h | 4 ++++
fs/nfs/nfs4proc.c | 24 +++++++++++++-----------
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index a081787..90c4ffe 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -443,6 +443,10 @@ extern void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid);
extern void nfs_increment_lock_seqid(int status, struct nfs_seqid *seqid);
extern void nfs_release_seqid(struct nfs_seqid *seqid);
extern void nfs_free_seqid(struct nfs_seqid *seqid);
+extern int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
+ struct nfs4_sequence_args *args,
+ struct nfs4_sequence_res *res,
+ struct rpc_task *task);
extern void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e7f8d5f..9b1a481 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -495,12 +495,11 @@ static void nfs4_set_sequence_privileged(struct nfs4_sequence_args *args)
args->sa_privileged = 1;
}
-static int nfs40_setup_sequence(const struct nfs_server *server,
- struct nfs4_sequence_args *args,
- struct nfs4_sequence_res *res,
- struct rpc_task *task)
+int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
+ struct nfs4_sequence_args *args,
+ struct nfs4_sequence_res *res,
+ struct rpc_task *task)
{
- struct nfs4_slot_table *tbl = server->nfs_client->cl_slot_tbl;
struct nfs4_slot *slot;
/* slot already allocated? */
@@ -535,6 +534,7 @@ out_sleep:
spin_unlock(&tbl->slot_tbl_lock);
return -EAGAIN;
}
+EXPORT_SYMBOL_GPL(nfs40_setup_sequence);
static int nfs40_sequence_done(struct rpc_task *task,
struct nfs4_sequence_res *res)
@@ -777,7 +777,8 @@ static int nfs4_setup_sequence(const struct nfs_server *server,
int ret = 0;
if (!session)
- return nfs40_setup_sequence(server, args, res, task);
+ return nfs40_setup_sequence(server->nfs_client->cl_slot_tbl,
+ args, res, task);
dprintk("--> %s clp %p session %p sr_slot %u\n",
__func__, session->clp, session, res->sr_slot ?
@@ -818,7 +819,8 @@ static int nfs4_setup_sequence(const struct nfs_server *server,
struct nfs4_sequence_res *res,
struct rpc_task *task)
{
- return nfs40_setup_sequence(server, args, res, task);
+ return nfs40_setup_sequence(server->nfs_client->cl_slot_tbl,
+ args, res, task);
}
static int nfs4_sequence_done(struct rpc_task *task,
@@ -1681,8 +1683,8 @@ static void nfs4_open_confirm_prepare(struct rpc_task *task, void *calldata)
{
struct nfs4_opendata *data = calldata;
- nfs40_setup_sequence(data->o_arg.server, &data->c_arg.seq_args,
- &data->c_res.seq_res, task);
+ nfs40_setup_sequence(data->o_arg.server->nfs_client->cl_slot_tbl,
+ &data->c_arg.seq_args, &data->c_res.seq_res, task);
}
static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
@@ -5965,8 +5967,8 @@ static void nfs4_release_lockowner_prepare(struct rpc_task *task, void *calldata
{
struct nfs_release_lockowner_data *data = calldata;
struct nfs_server *server = data->server;
- nfs40_setup_sequence(server, &data->args.seq_args,
- &data->res.seq_res, task);
+ nfs40_setup_sequence(server->nfs_client->cl_slot_tbl,
+ &data->args.seq_args, &data->res.seq_res, task);
data->args.lock_owner.clientid = server->nfs_client->cl_clientid;
data->timestamp = jiffies;
}
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4_fs.h | 2 ++
fs/nfs/nfs4proc.c | 9 +++++----
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 90c4ffe..b3c771e 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -447,6 +447,8 @@ extern int nfs40_setup_sequence(struct nfs4_slot_table *tbl,
struct nfs4_sequence_args *args,
struct nfs4_sequence_res *res,
struct rpc_task *task);
+extern int nfs4_sequence_done(struct rpc_task *task,
+ struct nfs4_sequence_res *res);
extern void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9b1a481..1dff02a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -694,8 +694,7 @@ out_retry:
}
EXPORT_SYMBOL_GPL(nfs41_sequence_done);
-static int nfs4_sequence_done(struct rpc_task *task,
- struct nfs4_sequence_res *res)
+int nfs4_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
{
if (res->sr_slot == NULL)
return 1;
@@ -703,6 +702,7 @@ static int nfs4_sequence_done(struct rpc_task *task,
return nfs40_sequence_done(task, res);
return nfs41_sequence_done(task, res);
}
+EXPORT_SYMBOL_GPL(nfs4_sequence_done);
int nfs41_setup_sequence(struct nfs4_session *session,
struct nfs4_sequence_args *args,
@@ -823,11 +823,12 @@ static int nfs4_setup_sequence(const struct nfs_server *server,
args, res, task);
}
-static int nfs4_sequence_done(struct rpc_task *task,
- struct nfs4_sequence_res *res)
+int nfs4_sequence_done(struct rpc_task *task,
+ struct nfs4_sequence_res *res)
{
return nfs40_sequence_done(task, res);
}
+EXPORT_SYMBOL_GPL(nfs4_sequence_done);
#endif /* !CONFIG_NFS_V4_1 */
--
1.9.3
From: Peng Tao <[email protected]>
so that flexfile layout client can pass in DS credential instead of
using user cred, which will be done in the next patch.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 11 ++++++-----
fs/nfs/internal.h | 6 +++---
fs/nfs/pagelist.c | 8 +++++---
3 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index e5a3c5b..bfa8547 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -501,8 +501,9 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
- &filelayout_read_call_ops, 0, RPC_TASK_SOFTCONN);
+ nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
+ 0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
}
@@ -542,9 +543,9 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, NFS_PROTO(hdr->inode),
- &filelayout_write_call_ops, sync,
- RPC_TASK_SOFTCONN);
+ nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
+ sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 98dee83..e9305e9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -250,9 +250,9 @@ struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
void nfs_pgio_data_destroy(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
- const struct nfs_rpc_ops *,
- const struct rpc_call_ops *, int, int);
+int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
+ struct rpc_cred *cred, const struct nfs_rpc_ops *rpc_ops,
+ const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
static inline void nfs_iocounter_init(struct nfs_io_counter *c)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 35a2626..c4d1758 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -597,14 +597,14 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
}
int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct nfs_rpc_ops *rpc_ops,
+ struct rpc_cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
struct rpc_message msg = {
.rpc_argp = &hdr->args,
.rpc_resp = &hdr->res,
- .rpc_cred = hdr->cred,
+ .rpc_cred = cred,
};
struct rpc_task_setup task_setup_data = {
.rpc_client = clnt,
@@ -793,7 +793,9 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0)
ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
- hdr, NFS_PROTO(hdr->inode),
+ hdr,
+ hdr->cred,
+ NFS_PROTO(hdr->inode),
desc->pg_rpc_callops,
desc->pg_ioflags, 0);
return ret;
--
1.9.3
From: Trond Myklebust <[email protected]>
Enable pNFS callbacks to allow flex files to work correctly with a
NFSv3-enabled data server.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs3proc.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 524f9f8..78e557c 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -800,6 +800,9 @@ static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
{
struct inode *inode = hdr->inode;
+ if (hdr->pgio_done_cb != NULL)
+ return hdr->pgio_done_cb(task, hdr);
+
if (nfs3_async_handle_jukebox(task, inode))
return -EAGAIN;
@@ -825,6 +828,9 @@ static int nfs3_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
{
struct inode *inode = hdr->inode;
+ if (hdr->pgio_done_cb != NULL)
+ return hdr->pgio_done_cb(task, hdr);
+
if (nfs3_async_handle_jukebox(task, inode))
return -EAGAIN;
if (task->tk_status >= 0)
@@ -845,6 +851,9 @@ static void nfs3_proc_commit_rpc_prepare(struct rpc_task *task, struct nfs_commi
static int nfs3_commit_done(struct rpc_task *task, struct nfs_commit_data *data)
{
+ if (data->commit_done_cb != NULL)
+ return data->commit_done_cb(task, data);
+
if (nfs3_async_handle_jukebox(task, data->inode))
return -EAGAIN;
nfs_refresh_inode(data->inode, data->res.fattr);
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Add a call to tally stats for a task under a different statsidx than
what's contained in the task structure.
This is needed to properly account for pnfs reads/writes when the
DS nfs version != the MDS version.
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
include/linux/sunrpc/metrics.h | 2 ++
net/sunrpc/stats.c | 26 +++++++++++++++++++-------
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
index eecb5a7..89f2ca1 100644
--- a/include/linux/sunrpc/metrics.h
+++ b/include/linux/sunrpc/metrics.h
@@ -79,6 +79,8 @@ struct rpc_clnt;
struct rpc_iostats * rpc_alloc_iostats(struct rpc_clnt *);
void rpc_count_iostats(const struct rpc_task *,
struct rpc_iostats *);
+void rpc_count_iostats_metrics(const struct rpc_task *,
+ struct rpc_iostats *);
void rpc_print_iostats(struct seq_file *, struct rpc_clnt *);
void rpc_free_iostats(struct rpc_iostats *);
diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 9711a15..2ecb994 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -140,22 +140,20 @@ void rpc_free_iostats(struct rpc_iostats *stats)
EXPORT_SYMBOL_GPL(rpc_free_iostats);
/**
- * rpc_count_iostats - tally up per-task stats
+ * rpc_count_iostats_metrics - tally up per-task stats
* @task: completed rpc_task
- * @stats: array of stat structures
+ * @op_metrics: stat structure for OP that will accumulate stats from @task
*/
-void rpc_count_iostats(const struct rpc_task *task, struct rpc_iostats *stats)
+void rpc_count_iostats_metrics(const struct rpc_task *task,
+ struct rpc_iostats *op_metrics)
{
struct rpc_rqst *req = task->tk_rqstp;
- struct rpc_iostats *op_metrics;
ktime_t delta, now;
- if (!stats || !req)
+ if (!op_metrics || !req)
return;
now = ktime_get();
- op_metrics = &stats[task->tk_msg.rpc_proc->p_statidx];
-
spin_lock(&op_metrics->om_lock);
op_metrics->om_ops++;
@@ -175,6 +173,20 @@ void rpc_count_iostats(const struct rpc_task *task, struct rpc_iostats *stats)
spin_unlock(&op_metrics->om_lock);
}
+EXPORT_SYMBOL_GPL(rpc_count_iostats_metrics);
+
+/**
+ * rpc_count_iostats - tally up per-task stats
+ * @task: completed rpc_task
+ * @stats: array of stat structures
+ *
+ * Uses the statidx from @task
+ */
+void rpc_count_iostats(const struct rpc_task *task, struct rpc_iostats *stats)
+{
+ rpc_count_iostats_metrics(task,
+ &stats[task->tk_msg.rpc_proc->p_statidx]);
+}
EXPORT_SYMBOL_GPL(rpc_count_iostats);
static void _print_name(struct seq_file *seq, unsigned int op,
--
1.9.3
From: Peng Tao <[email protected]>
lockd assumes hostname exists otherwise kernel oops.
It can be reproduced by following steps:
1. mount flexfile MDS
2. write some files
3. mount DS via nfsv3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8134f332>] strlen+0x2/0x20
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nfsd(F) nfs_layout_flexfiles(F) rpcsec_gss_krb5(F) auth_rpcgss(F) nfsv4(F) dns_resolver(F) nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) ebtable_nat(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) bnep(F) snd_ens1371(F) snd_rawmidi(F) snd_ac97_codec(F) btusb(F) ac97_bus(F) snd_seq(F) snd_seq_device(F) snd_pcm(F) ppdev(F) bluetooth(F) 6lowpan_iphc(F) rfkill(F) vmw_balloon(F) snd_timer(F) snd(F) soundcore(F) gameport(F) i2c_piix4(F) e1000(F) vmw_vmci(F) parport_pc(F) parport(F) shpchp(F) uinput(F) xfs(F) libcrc32c(F) vmwgfx(F) ttm(F) drm(F) mptspi(F) scsi_transport_spi(F) mptscsih(F) mptbase(F) i2c_core(F)
CPU: 0 PID: 10397 Comm: mount.nfs Tainted: GF 3.14.7-100.pd_client.001.fc16.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff880008942600 ti: ffff880007990000 task.ti: ffff880007990000
RIP: 0010:[<ffffffff8134f332>] [<ffffffff8134f332>] strlen+0x2/0x20
RSP: 0018:ffff880007991aa0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880038d39c20 RCX: 0000000000000004
RDX: 0000000000000006 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff880007991b38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000014600 R11: 0000000000000400 R12: ffffffff81cc8580
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000004
FS: 00007f90cd2ef880(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001710000 CR4: 00000000001407f0
Stack:
ffffffffa045f52c ffff880001782230 ffff880004141e28 0006880007991ac8
ffffffff816dc14b ffff880000000000 ffff880038d39c20 0000000000000010
0000000481cc0006 0000000000000000 ffffffffa0410be8 000000000000c014
Call Trace:
[<ffffffffa045f52c>] ? nlmclnt_lookup_host+0x4c/0x2c0 [lockd]
[<ffffffff816dc14b>] ? _raw_spin_unlock_bh+0x1b/0x20
[<ffffffffa0410be8>] ? svc_destroy+0xb8/0x140 [sunrpc]
[<ffffffffa045c323>] nlmclnt_init+0x53/0xc0 [lockd]
[<ffffffffa047d2dc>] ? nfs_get_client+0x1cc/0x340 [nfs]
[<ffffffffa047c2e7>] nfs_start_lockd+0xa7/0xd0 [nfs]
[<ffffffffa047df71>] nfs_create_server+0x181/0x5c0 [nfs]
[<ffffffffa04460f3>] nfs3_create_server+0x13/0x30 [nfsv3]
[<ffffffffa048a0bc>] nfs_try_mount+0x21c/0x300 [nfs]
[<ffffffff811ca32d>] ? __kmalloc_track_caller+0x1ad/0x240
[<ffffffffa048b677>] ? nfs_fs_mount+0xc37/0xd80 [nfs]
[<ffffffffa048ad05>] nfs_fs_mount+0x2c5/0xd80 [nfs]
[<ffffffffa048a830>] ? nfs_clone_super+0x140/0x140 [nfs]
[<ffffffffa048a240>] ? nfs_clone_sb_security+0x40/0x40 [nfs]
[<ffffffff811e7e43>] mount_fs+0x43/0x1b0
[<ffffffff81193100>] ? __alloc_percpu+0x10/0x20
[<ffffffff812026e6>] vfs_kern_mount+0x76/0x120
[<ffffffff81204917>] do_mount+0x237/0xa80
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs3client.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 52e2344..9e9fa34 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -1,5 +1,6 @@
#include <linux/nfs_fs.h>
#include <linux/nfs_mount.h>
+#include <linux/sunrpc/addr.h>
#include "internal.h"
#include "nfs3_fs.h"
@@ -89,6 +90,12 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_client *mds_clp,
};
struct rpc_timeout ds_timeout;
struct nfs_client *clp;
+ char buf[INET6_ADDRSTRLEN + 1];
+
+ /* fake a hostname because lockd wants it */
+ if (rpc_ntop(ds_addr, buf, sizeof(buf)) <= 0)
+ return ERR_PTR(-EINVAL);
+ cl_init.hostname = buf;
/* Use the MDS nfs_client cl_ipaddr. */
nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
--
1.9.3
From: Peng Tao <[email protected]>
flexfiles needs to start layoutcommit when necessary
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/pnfs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0a5dda4..2d25670 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1966,6 +1966,7 @@ clear_layoutcommitting:
pnfs_clear_layoutcommitting(inode);
goto out;
}
+EXPORT_SYMBOL_GPL(pnfs_layoutcommit_inode);
struct nfs4_threshold *pnfs_mdsthreshold_alloc(void)
{
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2d25670..fa00b56 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1288,7 +1288,6 @@ pnfs_update_layout(struct inode *ino,
struct nfs_client *clp = server->nfs_client;
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg = NULL;
- bool first;
if (!pnfs_enabled_sb(NFS_SERVER(ino)))
goto out;
@@ -1321,16 +1320,15 @@ pnfs_update_layout(struct inode *ino,
if (pnfs_layoutgets_blocked(lo, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
-
- first = list_empty(&lo->plh_layouts) ? true : false;
spin_unlock(&ino->i_lock);
- if (first) {
+ if (list_empty(&lo->plh_layouts)) {
/* The lo must be on the clp list if there is any
* chance of a CB_LAYOUTRECALL(FILE) coming in.
*/
spin_lock(&clp->cl_lock);
- list_add_tail(&lo->plh_layouts, &server->layouts);
+ if (list_empty(&lo->plh_layouts))
+ list_add_tail(&lo->plh_layouts, &server->layouts);
spin_unlock(&clp->cl_lock);
}
--
1.9.3
From: Peng Tao <[email protected]>
Per RFC 5661 Errata 3208:
| A client MAY always forget its layout state and associated
| layout stateid at any time (See also section 12.5.5.1).
| In such case, the client MUST use a non-layout stateid for the next
| LAYOUTGET operation. This will signal the server that the client has
| no more layouts on the file and its respective layout state can be
| released before issuing a new layout in response to LAYOUTGET.
In order to make such a signal unique to server, client needs to serialize
all layoutgets using non-layout stateid. We implement this by serializing
layoutgets when client has no layout segments at hand.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 35 +++++++++++++++++++++++++++++++----
fs/nfs/pnfs.h | 1 +
2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index fa00b56..7e1bac1 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1288,6 +1288,7 @@ pnfs_update_layout(struct inode *ino,
struct nfs_client *clp = server->nfs_client;
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg = NULL;
+ bool first;
if (!pnfs_enabled_sb(NFS_SERVER(ino)))
goto out;
@@ -1295,6 +1296,8 @@ pnfs_update_layout(struct inode *ino,
if (pnfs_within_mdsthreshold(ctx, ino, iomode))
goto out;
+lookup_again:
+ first = false;
spin_lock(&ino->i_lock);
lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
if (lo == NULL) {
@@ -1312,10 +1315,27 @@ pnfs_update_layout(struct inode *ino,
if (pnfs_layout_io_test_failed(lo, iomode))
goto out_unlock;
- /* Check to see if the layout for the given range already exists */
- lseg = pnfs_find_lseg(lo, &arg);
- if (lseg)
- goto out_unlock;
+ first = list_empty(&lo->plh_segs);
+ if (first) {
+ /* The first layoutget for the file. Need to serialize per
+ * RFC 5661 Errata 3208.
+ */
+ if (test_and_set_bit(NFS_LAYOUT_FIRST_LAYOUTGET,
+ &lo->plh_flags)) {
+ spin_unlock(&ino->i_lock);
+ wait_on_bit(&lo->plh_flags, NFS_LAYOUT_FIRST_LAYOUTGET,
+ TASK_UNINTERRUPTIBLE);
+ pnfs_put_layout_hdr(lo);
+ goto lookup_again;
+ }
+ } else {
+ /* Check to see if the layout for the given range
+ * already exists
+ */
+ lseg = pnfs_find_lseg(lo, &arg);
+ if (lseg)
+ goto out_unlock;
+ }
if (pnfs_layoutgets_blocked(lo, 0))
goto out_unlock;
@@ -1343,6 +1363,13 @@ pnfs_update_layout(struct inode *ino,
lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
atomic_dec(&lo->plh_outstanding);
out_put_layout_hdr:
+ if (first) {
+ unsigned long *bitlock = &lo->plh_flags;
+
+ clear_bit_unlock(NFS_LAYOUT_FIRST_LAYOUTGET, bitlock);
+ smp_mb__after_atomic();
+ wake_up_bit(bitlock, NFS_LAYOUT_FIRST_LAYOUTGET);
+ }
pnfs_put_layout_hdr(lo);
out:
dprintk("%s: inode %s/%llu pNFS layout segment %s for "
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 70ffec1..a9a7b26 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -95,6 +95,7 @@ enum {
NFS_LAYOUT_ROC, /* some lseg had roc bit set */
NFS_LAYOUT_RETURN, /* Return this layout ASAP */
NFS_LAYOUT_INVALID_STID, /* layout stateid id is invalid */
+ NFS_LAYOUT_FIRST_LAYOUTGET, /* Serialize first layoutget */
};
enum layoutdriver_policy_flags {
--
1.9.3
From: Peng Tao <[email protected]>
Flexfiles layout would want to use them to report DS IO status.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs2xdr.c | 10 +++++++---
fs/nfs/nfs3xdr.c | 3 +++
fs/nfs/nfs4xdr.c | 3 +++
include/linux/nfs_xdr.h | 2 ++
4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 5f61b83..b4e03ed 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -481,7 +481,8 @@ out_overflow:
* void;
* };
*/
-static int decode_attrstat(struct xdr_stream *xdr, struct nfs_fattr *result)
+static int decode_attrstat(struct xdr_stream *xdr, struct nfs_fattr *result,
+ __u32 *op_status)
{
enum nfs_stat status;
int error;
@@ -489,6 +490,8 @@ static int decode_attrstat(struct xdr_stream *xdr, struct nfs_fattr *result)
error = decode_stat(xdr, &status);
if (unlikely(error))
goto out;
+ if (op_status)
+ *op_status = status;
if (status != NFS_OK)
goto out_default;
error = decode_fattr(xdr, result);
@@ -808,7 +811,7 @@ out_default:
static int nfs2_xdr_dec_attrstat(struct rpc_rqst *req, struct xdr_stream *xdr,
struct nfs_fattr *result)
{
- return decode_attrstat(xdr, result);
+ return decode_attrstat(xdr, result, NULL);
}
static int nfs2_xdr_dec_diropres(struct rpc_rqst *req, struct xdr_stream *xdr,
@@ -865,6 +868,7 @@ static int nfs2_xdr_dec_readres(struct rpc_rqst *req, struct xdr_stream *xdr,
error = decode_stat(xdr, &status);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS_OK)
goto out_default;
error = decode_fattr(xdr, result->fattr);
@@ -882,7 +886,7 @@ static int nfs2_xdr_dec_writeres(struct rpc_rqst *req, struct xdr_stream *xdr,
{
/* All NFSv2 writes are "file sync" writes */
result->verf->committed = NFS_FILE_SYNC;
- return decode_attrstat(xdr, result->fattr);
+ return decode_attrstat(xdr, result->fattr, &result->op_status);
}
/**
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 8f4cbe7..2a932fd 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -1636,6 +1636,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr,
error = decode_post_op_attr(xdr, result->fattr);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS3_OK)
goto out_status;
error = decode_read3resok(xdr, result);
@@ -1708,6 +1709,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr,
error = decode_wcc_data(xdr, result->fattr);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS3_OK)
goto out_status;
error = decode_write3resok(xdr, result);
@@ -2323,6 +2325,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req,
error = decode_wcc_data(xdr, result->fattr);
if (unlikely(error))
goto out;
+ result->op_status = status;
if (status != NFS3_OK)
goto out_status;
error = decode_writeverf3(xdr, &result->verf->verifier);
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index cb4376b..7d8d7a4 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6567,6 +6567,7 @@ static int nfs4_xdr_dec_read(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
int status;
status = decode_compound_hdr(xdr, &hdr);
+ res->op_status = hdr.status;
if (status)
goto out;
status = decode_sequence(xdr, &res->seq_res, rqstp);
@@ -6592,6 +6593,7 @@ static int nfs4_xdr_dec_write(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
int status;
status = decode_compound_hdr(xdr, &hdr);
+ res->op_status = hdr.status;
if (status)
goto out;
status = decode_sequence(xdr, &res->seq_res, rqstp);
@@ -6621,6 +6623,7 @@ static int nfs4_xdr_dec_commit(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
int status;
status = decode_compound_hdr(xdr, &hdr);
+ res->op_status = hdr.status;
if (status)
goto out;
status = decode_sequence(xdr, &res->seq_res, rqstp);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 467c84e..962f461 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -513,6 +513,7 @@ struct nfs_pgio_res {
struct nfs4_sequence_res seq_res;
struct nfs_fattr * fattr;
__u32 count;
+ __u32 op_status;
int eof; /* used by read */
struct nfs_writeverf * verf; /* used by write */
const struct nfs_server *server; /* used by write */
@@ -532,6 +533,7 @@ struct nfs_commitargs {
struct nfs_commitres {
struct nfs4_sequence_res seq_res;
+ __u32 op_status;
struct nfs_fattr *fattr;
struct nfs_writeverf *verf;
const struct nfs_server *server;
--
1.9.3
From: Peng Tao <[email protected]>
So that it is possible to return a specific iomode layouts.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4xdr.c | 2 +-
fs/nfs/pnfs.c | 1 +
include/linux/nfs_xdr.h | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 7d8d7a4..3c3ff63 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2012,7 +2012,7 @@ encode_layoutreturn(struct xdr_stream *xdr,
p = reserve_space(xdr, 16);
*p++ = cpu_to_be32(0); /* reclaim. always 0 for now */
*p++ = cpu_to_be32(args->layout_type);
- *p++ = cpu_to_be32(IOMODE_ANY);
+ *p++ = cpu_to_be32(args->iomode);
*p = cpu_to_be32(RETURN_FILE);
p = reserve_space(xdr, 16);
p = xdr_encode_hyper(p, 0);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7e1bac1..1b544c1 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -914,6 +914,7 @@ _pnfs_return_layout(struct inode *ino)
lrp->args.stateid = stateid;
lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
lrp->args.inode = ino;
+ lrp->args.iomode = IOMODE_ANY;
lrp->args.layout = lo;
lrp->clp = NFS_SERVER(ino)->nfs_client;
lrp->cred = lo->plh_lc_cred;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 962f461..4fd7793 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -293,6 +293,7 @@ struct nfs4_layoutreturn_args {
struct nfs4_sequence_args seq_args;
struct pnfs_layout_hdr *layout;
struct inode *inode;
+ enum pnfs_iomode iomode;
nfs4_stateid stateid;
__u32 layout_type;
};
--
1.9.3
From: Peng Tao <[email protected]>
It allows to specify different iomode to return.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 53 +++++++++++++++++++++++++++++++++--------------------
1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 1b544c1..1b97209 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -845,6 +845,38 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
}
}
+static int
+pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
+ enum pnfs_iomode iomode)
+{
+ struct inode *ino = lo->plh_inode;
+ struct nfs4_layoutreturn *lrp;
+ int status = 0;
+
+ lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+ if (unlikely(lrp == NULL)) {
+ status = -ENOMEM;
+ spin_lock(&ino->i_lock);
+ lo->plh_block_lgets--;
+ spin_unlock(&ino->i_lock);
+ pnfs_put_layout_hdr(lo);
+ goto out;
+ }
+
+ lrp->args.stateid = stateid;
+ lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
+ lrp->args.inode = ino;
+ lrp->args.iomode = iomode;
+ lrp->args.layout = lo;
+ lrp->clp = NFS_SERVER(ino)->nfs_client;
+ lrp->cred = lo->plh_lc_cred;
+
+ status = nfs4_proc_layoutreturn(lrp);
+out:
+ dprintk("<-- %s status: %d\n", __func__, status);
+ return status;
+}
+
/*
* Initiates a LAYOUTRETURN(FILE), and removes the pnfs_layout_hdr
* when the layout segment list is empty.
@@ -859,7 +891,6 @@ _pnfs_return_layout(struct inode *ino)
struct pnfs_layout_hdr *lo = NULL;
struct nfs_inode *nfsi = NFS_I(ino);
LIST_HEAD(tmp_list);
- struct nfs4_layoutreturn *lrp;
nfs4_stateid stateid;
int status = 0, empty;
@@ -901,25 +932,7 @@ _pnfs_return_layout(struct inode *ino)
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&tmp_list);
- lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
- if (unlikely(lrp == NULL)) {
- status = -ENOMEM;
- spin_lock(&ino->i_lock);
- lo->plh_block_lgets--;
- spin_unlock(&ino->i_lock);
- pnfs_put_layout_hdr(lo);
- goto out;
- }
-
- lrp->args.stateid = stateid;
- lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
- lrp->args.inode = ino;
- lrp->args.iomode = IOMODE_ANY;
- lrp->args.layout = lo;
- lrp->clp = NFS_SERVER(ino)->nfs_client;
- lrp->cred = lo->plh_lc_cred;
-
- status = nfs4_proc_layoutreturn(lrp);
+ status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
--
1.9.3
From: Peng Tao <[email protected]>
It marks all matching layout segments as NFS_LSEG_LAYOUTRETURN,
which is an indicator for pnfs_put_lseg() to send layoutreturn,
and also prevents pnfs_update_layout() from using the returning
segments. Once it is set, it never gets cleared.
It also sets proper io failure bit so that pnfs path can be retried
after PNFS_LAYOUTGET_RETRY_TIMEOUT second.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/pnfs.h | 4 ++++
2 files changed, 59 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 1b97209..0bd149b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1479,6 +1479,61 @@ out_forget_reply:
goto out;
}
+static void
+pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
+ struct list_head *tmp_list,
+ struct pnfs_layout_range *return_range)
+{
+ struct pnfs_layout_segment *lseg, *next;
+
+ dprintk("%s:Begin lo %p\n", __func__, lo);
+
+ if (list_empty(&lo->plh_segs))
+ return;
+
+ list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list)
+ if (should_free_lseg(&lseg->pls_range, return_range)) {
+ dprintk("%s: marking lseg %p iomode %d "
+ "offset %llu length %llu\n", __func__,
+ lseg, lseg->pls_range.iomode,
+ lseg->pls_range.offset,
+ lseg->pls_range.length);
+ set_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags);
+ mark_lseg_invalid(lseg, tmp_list);
+ }
+}
+
+void pnfs_error_mark_layout_for_return(struct inode *inode,
+ struct pnfs_layout_segment *lseg)
+{
+ struct pnfs_layout_hdr *lo = NFS_I(inode)->layout;
+ int iomode = pnfs_iomode_to_fail_bit(lseg->pls_range.iomode);
+ struct pnfs_layout_range range = {
+ .iomode = lseg->pls_range.iomode,
+ .offset = 0,
+ .length = NFS4_MAX_UINT64,
+ };
+ LIST_HEAD(free_me);
+
+ spin_lock(&inode->i_lock);
+ /* set failure bit so that pnfs path will be retried later */
+ pnfs_layout_set_fail_bit(lo, iomode);
+ set_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ if (lo->plh_return_iomode == 0)
+ lo->plh_return_iomode = range.iomode;
+ else if (lo->plh_return_iomode != range.iomode)
+ lo->plh_return_iomode = IOMODE_ANY;
+ /*
+ * mark all matching lsegs so that we are sure to have no live
+ * segments at hand when sending layoutreturn. See pnfs_put_lseg()
+ * for how it works.
+ */
+ pnfs_mark_matching_lsegs_return(lo, &free_me, &range);
+ spin_unlock(&inode->i_lock);
+ pnfs_free_lseg_list(&free_me);
+}
+EXPORT_SYMBOL_GPL(pnfs_error_mark_layout_for_return);
+
void
pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
{
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a9a7b26..28856f1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -38,6 +38,7 @@ enum {
NFS_LSEG_VALID = 0, /* cleared when lseg is recalled/returned */
NFS_LSEG_ROC, /* roc bit received from server */
NFS_LSEG_LAYOUTCOMMIT, /* layoutcommit bit set for layoutcommit */
+ NFS_LSEG_LAYOUTRETURN, /* layoutreturn bit set for layoutreturn */
};
/* Individual ip address */
@@ -184,6 +185,7 @@ struct pnfs_layout_hdr {
u32 plh_barrier; /* ignore lower seqids */
unsigned long plh_retry_timestamp;
unsigned long plh_flags;
+ enum pnfs_iomode plh_return_iomode;
loff_t plh_lwb; /* last write byte for layoutcommit */
struct rpc_cred *plh_lc_cred; /* layoutcommit cred */
struct inode *plh_inode;
@@ -274,6 +276,8 @@ void nfs4_deviceid_mark_client_invalid(struct nfs_client *clp);
int pnfs_read_done_resend_to_mds(struct nfs_pgio_header *);
int pnfs_write_done_resend_to_mds(struct nfs_pgio_header *);
struct nfs4_threshold *pnfs_mdsthreshold_alloc(void);
+void pnfs_error_mark_layout_for_return(struct inode *inode,
+ struct pnfs_layout_segment *lseg);
/* nfs4_deviceid_flags */
enum {
--
1.9.3
From: Peng Tao <[email protected]>
And if we are to return the same type of layouts, don't bother
sending more layoutgets.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
fs/nfs/pnfs.c | 23 ++++++++++++++++++-----
fs/nfs/pnfs.h | 1 +
3 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1dff02a..617ca2e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7531,6 +7531,7 @@ nfs4_layoutget_prepare(struct rpc_task *task, void *calldata)
return;
if (pnfs_choose_layoutget_stateid(&lgp->args.stateid,
NFS_I(lgp->args.inode)->layout,
+ &lgp->args.range,
lgp->args.ctx->state)) {
rpc_exit(task, NFS4_OK);
}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0bd149b..853b544 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -740,25 +740,37 @@ pnfs_layout_stateid_blocked(const struct pnfs_layout_hdr *lo,
return !pnfs_seqid_is_newer(seqid, lo->plh_barrier);
}
+static bool
+pnfs_layout_returning(const struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range)
+{
+ return test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags) &&
+ (lo->plh_return_iomode == IOMODE_ANY ||
+ lo->plh_return_iomode == range->iomode);
+}
+
/* lget is set to 1 if called from inside send_layoutget call chain */
static bool
-pnfs_layoutgets_blocked(const struct pnfs_layout_hdr *lo, int lget)
+pnfs_layoutgets_blocked(const struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range, int lget)
{
return lo->plh_block_lgets ||
test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
(list_empty(&lo->plh_segs) &&
- (atomic_read(&lo->plh_outstanding) > lget));
+ (atomic_read(&lo->plh_outstanding) > lget)) ||
+ pnfs_layout_returning(lo, range);
}
int
pnfs_choose_layoutget_stateid(nfs4_stateid *dst, struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range,
struct nfs4_state *open_state)
{
int status = 0;
dprintk("--> %s\n", __func__);
spin_lock(&lo->plh_inode->i_lock);
- if (pnfs_layoutgets_blocked(lo, 1)) {
+ if (pnfs_layoutgets_blocked(lo, range, 1)) {
status = -EAGAIN;
} else if (!nfs4_valid_open_stateid(open_state)) {
status = -EBADF;
@@ -1192,6 +1204,7 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo,
list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags) &&
+ !test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags) &&
pnfs_lseg_range_match(&lseg->pls_range, range)) {
ret = pnfs_get_lseg(lseg);
break;
@@ -1351,7 +1364,7 @@ lookup_again:
goto out_unlock;
}
- if (pnfs_layoutgets_blocked(lo, 0))
+ if (pnfs_layoutgets_blocked(lo, &arg, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
spin_unlock(&ino->i_lock);
@@ -1432,7 +1445,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
goto out_forget_reply;
}
- if (pnfs_layoutgets_blocked(lo, 1)) {
+ if (pnfs_layoutgets_blocked(lo, &lgp->args.range, 1)) {
dprintk("%s forget reply due to state\n", __func__);
goto out_forget_reply;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 28856f1..7ffadd8 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -249,6 +249,7 @@ void pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
bool update_barrier);
int pnfs_choose_layoutget_stateid(nfs4_stateid *dst,
struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_range *range,
struct nfs4_state *open_state);
int pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
struct list_head *tmp_list,
--
1.9.3
From: Peng Tao <[email protected]>
If current lseg is the last lseg marked with NFS_LSEG_LAYOUTRETURN,
send layoutreturn.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 853b544..e9acfcf 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -50,6 +50,10 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
*/
static LIST_HEAD(pnfs_modules_tbl);
+static int
+pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
+ enum pnfs_iomode iomode);
+
/* Return the registered pnfs layout driver module matching given id */
static struct pnfs_layoutdriver_type *
find_pnfs_driver_locked(u32 id)
@@ -337,6 +341,29 @@ pnfs_layout_remove_lseg(struct pnfs_layout_hdr *lo,
rpc_wake_up(&NFS_SERVER(inode)->roc_rpcwaitq);
}
+/* Return true if layoutreturn is needed */
+static bool
+pnfs_layout_need_return(struct pnfs_layout_hdr *lo,
+ struct pnfs_layout_segment *lseg,
+ nfs4_stateid *stateid, enum pnfs_iomode *iomode)
+{
+ struct pnfs_layout_segment *s;
+
+ if (!test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags))
+ return false;
+
+ list_for_each_entry(s, &lo->plh_segs, pls_list)
+ if (test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags))
+ return false;
+
+ *stateid = lo->plh_stateid;
+ *iomode = lo->plh_return_iomode;
+ /* decreased in pnfs_send_layoutreturn() */
+ lo->plh_block_lgets++;
+ lo->plh_return_iomode = 0;
+ return true;
+}
+
void
pnfs_put_lseg(struct pnfs_layout_segment *lseg)
{
@@ -352,11 +379,20 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
lo = lseg->pls_layout;
inode = lo->plh_inode;
if (atomic_dec_and_lock(&lseg->pls_refcount, &inode->i_lock)) {
+ bool need_return;
+ nfs4_stateid stateid;
+ enum pnfs_iomode iomode;
+
pnfs_get_layout_hdr(lo);
pnfs_layout_remove_lseg(lo, lseg);
+ need_return = pnfs_layout_need_return(lo, lseg,
+ &stateid, &iomode);
spin_unlock(&inode->i_lock);
pnfs_free_lseg(lseg);
- pnfs_put_layout_hdr(lo);
+ if (need_return)
+ pnfs_send_layoutreturn(lo, stateid, iomode);
+ else
+ pnfs_put_layout_hdr(lo);
}
}
EXPORT_SYMBOL_GPL(pnfs_put_lseg);
--
1.9.3
From: Peng Tao <[email protected]>
So that pnfs path is not disabled for ever.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
fs/nfs/pnfs.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 617ca2e..ac50af1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7787,6 +7787,7 @@ static void nfs4_layoutreturn_release(void *calldata)
spin_lock(&lo->plh_inode->i_lock);
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
+ clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
lo->plh_block_lgets--;
spin_unlock(&lo->plh_inode->i_lock);
pnfs_put_layout_hdr(lrp->args.layout);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e9acfcf..63992c8 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -921,6 +921,11 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = nfs4_proc_layoutreturn(lrp);
out:
+ if (status) {
+ spin_lock(&ino->i_lock);
+ clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ spin_unlock(&ino->i_lock);
+ }
dprintk("<-- %s status: %d\n", __func__, status);
return status;
}
--
1.9.3
From: Peng Tao <[email protected]>
Instead of calling layoutreturn directly, call pnfs_error_mark_layout_for_return
to mark layouts for return and let generic code return layout when
layout segments are freed.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
Conflicts:
fs/nfs/filelayout/filelayout.c
---
fs/nfs/filelayout/filelayout.c | 2 +-
fs/nfs/pnfs_nfs.c | 10 ----------
2 files changed, 1 insertion(+), 11 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index bfa8547..5d2eadc 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -200,7 +200,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
dprintk("%s DS connection error %d\n", __func__,
task->tk_status);
nfs4_mark_deviceid_unavailable(devid);
- set_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ pnfs_error_mark_layout_for_return(inode, lseg);
rpc_wake_up(&tbl->slot_tbl_waitq);
/* fall through */
default:
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index e6a4587..9822e86 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -17,20 +17,10 @@
#define NFSDBG_FACILITY NFSDBG_PNFS
-static void pnfs_generic_fenceme(struct inode *inode,
- struct pnfs_layout_hdr *lo)
-{
- if (!test_and_clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
- return;
- pnfs_return_layout(inode);
-}
-
void pnfs_generic_rw_release(void *data)
{
struct nfs_pgio_header *hdr = data;
- struct pnfs_layout_hdr *lo = hdr->lseg->pls_layout;
- pnfs_generic_fenceme(lo->plh_inode, lo);
nfs_put_client(hdr->ds_clp);
hdr->mds_ops->rpc_release(data);
}
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Add a new operation to nfs_pageio_ops that is called on nfs_pageio_complete.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/pagelist.c | 5 ++++-
include/linux/nfs_page.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index c4d1758..1c03187 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1050,7 +1050,7 @@ int nfs_pageio_resend(struct nfs_pageio_descriptor *desc,
EXPORT_SYMBOL_GPL(nfs_pageio_resend);
/**
- * nfs_pageio_complete - Complete I/O on an nfs_pageio_descriptor
+ * nfs_pageio_complete - Complete I/O then cleanup an nfs_pageio_descriptor
* @desc: pointer to io descriptor
*/
void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
@@ -1062,6 +1062,9 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
if (!nfs_do_recoalesce(desc))
break;
}
+
+ if (desc->pg_ops->pg_cleanup)
+ desc->pg_ops->pg_cleanup(desc);
}
/**
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 4c3aa80..479c566 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -58,6 +58,7 @@ struct nfs_pageio_ops {
size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
struct nfs_page *);
int (*pg_doio)(struct nfs_pageio_descriptor *);
+ void (*pg_cleanup)(struct nfs_pageio_descriptor *);
};
struct nfs_rw_ops {
--
1.9.3
From: Weston Andros Adamson <[email protected]>
This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/blocklayout/blocklayout.c | 2 ++
fs/nfs/filelayout/filelayout.c | 2 ++
fs/nfs/objlayout/objio_osd.c | 2 ++
fs/nfs/pnfs.c | 32 ++++++++++++++------------------
fs/nfs/pnfs.h | 1 +
5 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 77fec6a..1cac3c1 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -860,12 +860,14 @@ static const struct nfs_pageio_ops bl_pg_read_ops = {
.pg_init = bl_pg_init_read,
.pg_test = bl_pg_test_read,
.pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static const struct nfs_pageio_ops bl_pg_write_ops = {
.pg_init = bl_pg_init_write,
.pg_test = bl_pg_test_write,
.pg_doio = pnfs_generic_pg_writepages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static struct pnfs_layoutdriver_type blocklayout_type = {
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 5d2eadc..2af32fc 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -933,12 +933,14 @@ static const struct nfs_pageio_ops filelayout_pg_read_ops = {
.pg_init = filelayout_pg_init_read,
.pg_test = filelayout_pg_test,
.pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static const struct nfs_pageio_ops filelayout_pg_write_ops = {
.pg_init = filelayout_pg_init_write,
.pg_test = filelayout_pg_test,
.pg_doio = pnfs_generic_pg_writepages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static u32 select_bucket_index(struct nfs4_filelayout_segment *fl, u32 j)
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 9e5bc42..d007780 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -607,12 +607,14 @@ static const struct nfs_pageio_ops objio_pg_read_ops = {
.pg_init = objio_init_read,
.pg_test = objio_pg_test,
.pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static const struct nfs_pageio_ops objio_pg_write_ops = {
.pg_init = objio_init_write,
.pg_test = objio_pg_test,
.pg_doio = pnfs_generic_pg_writepages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
};
static struct pnfs_layoutdriver_type objlayout_type = {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 63992c8..2da2e77 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1631,6 +1631,16 @@ pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_init_write);
+void
+pnfs_generic_pg_cleanup(struct nfs_pageio_descriptor *desc)
+{
+ if (desc->pg_lseg) {
+ pnfs_put_lseg(desc->pg_lseg);
+ desc->pg_lseg = NULL;
+ }
+}
+EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
+
/*
* Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
* of bytes (maximum @req->wb_bytes) that can be coalesced.
@@ -1756,11 +1766,9 @@ pnfs_do_write(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- desc->pg_lseg = NULL;
trypnfs = pnfs_try_to_write_data(hdr, call_ops, lseg, how);
if (trypnfs == PNFS_NOT_ATTEMPTED)
pnfs_write_through_mds(desc, hdr);
- pnfs_put_lseg(lseg);
}
static void pnfs_writehdr_free(struct nfs_pgio_header *hdr)
@@ -1779,17 +1787,13 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
desc->pg_completion_ops->error_cleanup(&desc->pg_list);
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
+
hdr->lseg = pnfs_get_lseg(desc->pg_lseg);
ret = nfs_generic_pgio(desc, hdr);
- if (ret != 0) {
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
- } else
+ if (!ret)
pnfs_do_write(desc, hdr, desc->pg_ioflags);
return ret;
}
@@ -1874,11 +1878,9 @@ pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- desc->pg_lseg = NULL;
trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
if (trypnfs == PNFS_NOT_ATTEMPTED)
pnfs_read_through_mds(desc, hdr);
- pnfs_put_lseg(lseg);
}
static void pnfs_readhdr_free(struct nfs_pgio_header *hdr)
@@ -1897,18 +1899,12 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
desc->pg_completion_ops->error_cleanup(&desc->pg_list);
- ret = -ENOMEM;
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
- return ret;
+ return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
hdr->lseg = pnfs_get_lseg(desc->pg_lseg);
ret = nfs_generic_pgio(desc, hdr);
- if (ret != 0) {
- pnfs_put_lseg(desc->pg_lseg);
- desc->pg_lseg = NULL;
- } else
+ if (!ret)
pnfs_do_read(desc, hdr);
return ret;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7ffadd8..c146ed8 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -230,6 +230,7 @@ void pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *, struct nfs_page *
int pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc);
void pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
struct nfs_page *req, u64 wb_size);
+void pnfs_generic_pg_cleanup(struct nfs_pageio_descriptor *);
int pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc);
size_t pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req);
--
1.9.3
From: Weston Andros Adamson <[email protected]>
This is needed for mirrored DS support, where multuple requests
cover the same range.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/write.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8800bd3..e997457 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -473,13 +473,18 @@ try_again:
do {
/*
* Subrequests are always contiguous, non overlapping
- * and in order. If not, it's a programming error.
+ * and in order - but may be repeated (mirrored writes).
*/
- WARN_ON_ONCE(subreq->wb_offset !=
- (head->wb_offset + total_bytes));
-
- /* keep track of how many bytes this group covers */
- total_bytes += subreq->wb_bytes;
+ if (subreq->wb_offset == (head->wb_offset + total_bytes)) {
+ /* keep track of how many bytes this group covers */
+ total_bytes += subreq->wb_bytes;
+ } else if (WARN_ON_ONCE(subreq->wb_offset < head->wb_offset ||
+ ((subreq->wb_offset + subreq->wb_bytes) >
+ (head->wb_offset + total_bytes)))) {
+ nfs_page_group_unlock(head);
+ spin_unlock(&inode->i_lock);
+ return ERR_PTR(-EIO);
+ }
if (!nfs_lock_request(subreq)) {
/* releases page group bit lock and
--
1.9.3
From: Weston Andros Adamson <[email protected]>
'ds_commit_idx' is a better name - it is used to select the right
commit bucket for pnfs.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 14 ++++++--------
fs/nfs/filelayout/filelayout.c | 4 ++--
include/linux/nfs_xdr.h | 2 +-
3 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e84f764..d7c2d43 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -112,22 +112,22 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
* nfs_direct_select_verf - select the right verifier
* @dreq - direct request possibly spanning multiple servers
* @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
- * @ds_idx - index of data server in data server list, only valid if ds_clp set
+ * @commit_idx - commit bucket index for the DS
*
* returns the correct verifier to use given the role of the server
*/
static struct nfs_writeverf *
nfs_direct_select_verf(struct nfs_direct_req *dreq,
struct nfs_client *ds_clp,
- int ds_idx)
+ int commit_idx)
{
struct nfs_writeverf *verfp = &dreq->verf;
#ifdef CONFIG_NFS_V4_1
if (ds_clp) {
/* pNFS is in use, use the DS verf */
- if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
- verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
+ if (commit_idx >= 0 && commit_idx < dreq->ds_cinfo.nbuckets)
+ verfp = &dreq->ds_cinfo.buckets[commit_idx].direct_verf;
else
WARN_ON_ONCE(1);
}
@@ -148,8 +148,7 @@ static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
{
struct nfs_writeverf *verfp;
- verfp = nfs_direct_select_verf(dreq, hdr->ds_clp,
- hdr->ds_idx);
+ verfp = nfs_direct_select_verf(dreq, hdr->ds_clp, hdr->ds_commit_idx);
WARN_ON_ONCE(verfp->committed >= 0);
memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
WARN_ON_ONCE(verfp->committed < 0);
@@ -169,8 +168,7 @@ static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
{
struct nfs_writeverf *verfp;
- verfp = nfs_direct_select_verf(dreq, hdr->ds_clp,
- hdr->ds_idx);
+ verfp = nfs_direct_select_verf(dreq, hdr->ds_clp, hdr->ds_commit_idx);
if (verfp->committed < 0) {
nfs_direct_set_hdr_verf(dreq, hdr);
return 0;
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 2af32fc..520cbc5 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -492,7 +492,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
/* No multipath support. Use first DS */
atomic_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- hdr->ds_idx = idx;
+ hdr->ds_commit_idx = idx;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
hdr->args.fh = fh;
@@ -536,7 +536,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->pgio_done_cb = filelayout_write_done_cb;
atomic_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- hdr->ds_idx = idx;
+ hdr->ds_commit_idx = idx;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
hdr->args.fh = fh;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 4fd7793..5bc99f0 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1328,7 +1328,7 @@ struct nfs_pgio_header {
__u64 mds_offset; /* Filelayout dense stripe */
struct nfs_page_array page_array;
struct nfs_client *ds_clp; /* pNFS data server */
- int ds_idx; /* ds index if ds_clp is set */
+ int ds_commit_idx; /* ds index if ds_clp is set */
};
struct nfs_mds_commit_info {
--
1.9.3
From: Weston Andros Adamson <[email protected]>
Pass ds_commit_idx through the nfs commit path. It's used to select
the commit bucket when using pnfs and is ignored when not using pnfs.
Several functions had to be changed: nfs_retry_commit,
nfs_mark_request_commit, pnfs_mark_request_commit and the pnfs layout
driver .mark_request_commit functions.
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/direct.c | 5 +++--
fs/nfs/filelayout/filelayout.c | 3 ++-
fs/nfs/internal.h | 6 ++++--
fs/nfs/pnfs.h | 9 +++++----
fs/nfs/pnfs_nfs.c | 4 ++--
fs/nfs/write.c | 14 ++++++++------
6 files changed, 24 insertions(+), 17 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index d7c2d43..1ee41d7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -649,7 +649,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
nfs_list_remove_request(req);
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) {
/* Note the rewrite will go through mds */
- nfs_mark_request_commit(req, NULL, &cinfo);
+ nfs_mark_request_commit(req, NULL, &cinfo, 0);
} else
nfs_release_request(req);
nfs_unlock_and_release_request(req);
@@ -748,7 +748,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
nfs_list_remove_request(req);
if (request_commit) {
kref_get(&req->wb_kref);
- nfs_mark_request_commit(req, hdr->lseg, &cinfo);
+ nfs_mark_request_commit(req, hdr->lseg, &cinfo,
+ hdr->ds_commit_idx);
}
nfs_unlock_and_release_request(req);
}
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 520cbc5..3c97694 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -954,7 +954,8 @@ static u32 select_bucket_index(struct nfs4_filelayout_segment *fl, u32 j)
static void
filelayout_mark_request_commit(struct nfs_page *req,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx)
{
struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e9305e9..05f9a87 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -450,13 +450,15 @@ int nfs_scan_commit(struct inode *inode, struct list_head *dst,
struct nfs_commit_info *cinfo);
void nfs_mark_request_commit(struct nfs_page *req,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo);
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx);
int nfs_write_need_commit(struct nfs_pgio_header *);
int nfs_generic_commit_list(struct inode *inode, struct list_head *head,
int how, struct nfs_commit_info *cinfo);
void nfs_retry_commit(struct list_head *page_list,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo);
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx);
void nfs_commitdata_release(struct nfs_commit_data *data);
void nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
struct nfs_commit_info *cinfo);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index c146ed8..3d4a83b 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -137,7 +137,8 @@ struct pnfs_layoutdriver_type {
struct pnfs_ds_commit_info *(*get_ds_info) (struct inode *inode);
void (*mark_request_commit) (struct nfs_page *req,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo);
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx);
void (*clear_request_commit) (struct nfs_page *req,
struct nfs_commit_info *cinfo);
int (*scan_commit_lists) (struct nfs_commit_info *cinfo,
@@ -388,14 +389,14 @@ pnfs_generic_mark_devid_invalid(struct nfs4_deviceid_node *node)
static inline bool
pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo, u32 ds_commit_idx)
{
struct inode *inode = req->wb_context->dentry->d_inode;
struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
if (lseg == NULL || ld->mark_request_commit == NULL)
return false;
- ld->mark_request_commit(req, lseg, cinfo);
+ ld->mark_request_commit(req, lseg, cinfo, ds_commit_idx);
return true;
}
@@ -573,7 +574,7 @@ pnfs_get_ds_info(struct inode *inode)
static inline bool
pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo, u32 ds_commit_idx)
{
return false;
}
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 9822e86..6c37e02 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -187,7 +187,7 @@ static void pnfs_generic_retry_commit(struct nfs_commit_info *cinfo, int idx)
bucket = &fl_cinfo->buckets[i];
if (list_empty(&bucket->committing))
continue;
- nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
+ nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo, i);
spin_lock(cinfo->lock);
freeme = bucket->clseg;
bucket->clseg = NULL;
@@ -246,7 +246,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
list_add(&data->pages, &list);
nreq++;
} else {
- nfs_retry_commit(mds_pages, NULL, cinfo);
+ nfs_retry_commit(mds_pages, NULL, cinfo, 0);
pnfs_generic_retry_commit(cinfo, 0);
cinfo->completion_ops->error_cleanup(NFS_I(inode));
return -ENOMEM;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e997457..2bee165 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -847,9 +847,9 @@ EXPORT_SYMBOL_GPL(nfs_init_cinfo);
*/
void
nfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo, u32 ds_commit_idx)
{
- if (pnfs_mark_request_commit(req, lseg, cinfo))
+ if (pnfs_mark_request_commit(req, lseg, cinfo, ds_commit_idx))
return;
nfs_request_add_commit_list(req, &cinfo->mds->list, cinfo);
}
@@ -905,7 +905,8 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
}
if (nfs_write_need_commit(hdr)) {
memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
- nfs_mark_request_commit(req, hdr->lseg, &cinfo);
+ nfs_mark_request_commit(req, hdr->lseg, &cinfo,
+ 0);
goto next;
}
remove_req:
@@ -1560,14 +1561,15 @@ EXPORT_SYMBOL_GPL(nfs_init_commit);
void nfs_retry_commit(struct list_head *page_list,
struct pnfs_layout_segment *lseg,
- struct nfs_commit_info *cinfo)
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx)
{
struct nfs_page *req;
while (!list_empty(page_list)) {
req = nfs_list_entry(page_list->next);
nfs_list_remove_request(req);
- nfs_mark_request_commit(req, lseg, cinfo);
+ nfs_mark_request_commit(req, lseg, cinfo, ds_commit_idx);
if (!cinfo->dreq) {
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
dec_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
@@ -1598,7 +1600,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how, 0);
out_bad:
- nfs_retry_commit(head, NULL, cinfo);
+ nfs_retry_commit(head, NULL, cinfo, 0);
cinfo->completion_ops->error_cleanup(NFS_I(inode));
return -ENOMEM;
}
--
1.9.3
From: Weston Andros Adamson <[email protected]>
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.
The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 17 ++-
fs/nfs/internal.h | 1 +
fs/nfs/objlayout/objio_osd.c | 3 +-
fs/nfs/pagelist.c | 270 +++++++++++++++++++++++++++++++++++--------
fs/nfs/pnfs.c | 26 +++--
fs/nfs/read.c | 30 ++++-
fs/nfs/write.c | 10 +-
include/linux/nfs_page.h | 20 +++-
include/linux/nfs_xdr.h | 1 +
9 files changed, 311 insertions(+), 67 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1ee41d7..0178d4f 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -360,8 +360,14 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
spin_lock(&dreq->lock);
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
dreq->error = hdr->error;
- else
- dreq->count += hdr->good_bytes;
+ else {
+ /*
+ * FIXME: right now this only accounts for bytes written
+ * to the first mirror
+ */
+ if (hdr->pgio_mirror_idx == 0)
+ dreq->count += hdr->good_bytes;
+ }
spin_unlock(&dreq->lock);
while (!list_empty(&hdr->pages)) {
@@ -724,7 +730,12 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
dreq->error = hdr->error;
}
if (dreq->error == 0) {
- dreq->count += hdr->good_bytes;
+ /*
+ * FIXME: right now this only accounts for bytes written
+ * to the first mirror
+ */
+ if (hdr->pgio_mirror_idx == 0)
+ dreq->count += hdr->good_bytes;
if (nfs_write_need_commit(hdr)) {
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
request_commit = true;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 05f9a87..ef1c703 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -469,6 +469,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
struct nfs_direct_req *dreq);
int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
+void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index d007780..9a5f2ee 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -537,11 +537,12 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req)
{
+ struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
unsigned int size;
size = pnfs_generic_pg_test(pgio, prev, req);
- if (!size || pgio->pg_count + req->wb_bytes >
+ if (!size || mirror->pg_count + req->wb_bytes >
(unsigned long)pgio->pg_layout_private)
return 0;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 1c03187..eec12b7 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -46,17 +46,22 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr,
void (*release)(struct nfs_pgio_header *hdr))
{
- hdr->req = nfs_list_entry(desc->pg_list.next);
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
+ hdr->req = nfs_list_entry(mirror->pg_list.next);
hdr->inode = desc->pg_inode;
hdr->cred = hdr->req->wb_context->cred;
hdr->io_start = req_offset(hdr->req);
- hdr->good_bytes = desc->pg_count;
+ hdr->good_bytes = mirror->pg_count;
hdr->dreq = desc->pg_dreq;
hdr->layout_private = desc->pg_layout_private;
hdr->release = release;
hdr->completion_ops = desc->pg_completion_ops;
if (hdr->completion_ops->init_hdr)
hdr->completion_ops->init_hdr(hdr);
+
+ hdr->pgio_mirror_idx = desc->pg_mirror_idx;
}
EXPORT_SYMBOL_GPL(nfs_pgheader_init);
@@ -480,7 +485,10 @@ nfs_wait_on_request(struct nfs_page *req)
size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev, struct nfs_page *req)
{
- if (desc->pg_count > desc->pg_bsize) {
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
+ if (mirror->pg_count > mirror->pg_bsize) {
/* should never happen */
WARN_ON_ONCE(1);
return 0;
@@ -490,11 +498,11 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
* Limit the request size so that we can still allocate a page array
* for it without upsetting the slab allocator.
*/
- if (((desc->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
+ if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
sizeof(struct page) > PAGE_SIZE)
return 0;
- return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
+ return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
}
EXPORT_SYMBOL_GPL(nfs_generic_pg_test);
@@ -651,10 +659,18 @@ EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror;
+ u32 midx;
+
set_bit(NFS_IOHDR_REDO, &hdr->flags);
nfs_pgio_data_destroy(hdr);
hdr->completion_ops->completion(hdr);
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ /* TODO: Make sure it's right to clean up all mirrors here
+ * and not just hdr->pgio_mirror_idx */
+ for (midx = 0; midx < desc->pg_mirror_count; midx++) {
+ mirror = &desc->pg_mirrors[midx];
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
+ }
return -ENOMEM;
}
@@ -671,6 +687,17 @@ static void nfs_pgio_release(void *calldata)
hdr->completion_ops->completion(hdr);
}
+static void nfs_pageio_mirror_init(struct nfs_pgio_mirror *mirror,
+ unsigned int bsize)
+{
+ INIT_LIST_HEAD(&mirror->pg_list);
+ mirror->pg_bytes_written = 0;
+ mirror->pg_count = 0;
+ mirror->pg_bsize = bsize;
+ mirror->pg_base = 0;
+ mirror->pg_recoalesce = 0;
+}
+
/**
* nfs_pageio_init - initialise a page io descriptor
* @desc: pointer to descriptor
@@ -687,13 +714,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
size_t bsize,
int io_flags)
{
- INIT_LIST_HEAD(&desc->pg_list);
- desc->pg_bytes_written = 0;
- desc->pg_count = 0;
- desc->pg_bsize = bsize;
- desc->pg_base = 0;
+ struct nfs_pgio_mirror *new;
+ int i;
+
desc->pg_moreio = 0;
- desc->pg_recoalesce = 0;
desc->pg_inode = inode;
desc->pg_ops = pg_ops;
desc->pg_completion_ops = compl_ops;
@@ -703,6 +727,26 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
desc->pg_lseg = NULL;
desc->pg_dreq = NULL;
desc->pg_layout_private = NULL;
+ desc->pg_bsize = bsize;
+
+ desc->pg_mirror_count = 1;
+ desc->pg_mirror_idx = 0;
+
+ if (pg_ops->pg_get_mirror_count) {
+ /* until we have a request, we don't have an lseg and no
+ * idea how many mirrors there will be */
+ new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
+ sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
+ desc->pg_mirrors_dynamic = new;
+ desc->pg_mirrors = new;
+
+ for (i = 0; i < NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX; i++)
+ nfs_pageio_mirror_init(&desc->pg_mirrors[i], bsize);
+ } else {
+ desc->pg_mirrors_dynamic = NULL;
+ desc->pg_mirrors = desc->pg_mirrors_static;
+ nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
+ }
}
EXPORT_SYMBOL_GPL(nfs_pageio_init);
@@ -738,14 +782,16 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_page *req;
struct page **pages,
*last_page;
- struct list_head *head = &desc->pg_list;
+ struct list_head *head = &mirror->pg_list;
struct nfs_commit_info cinfo;
unsigned int pagecount, pageused;
- pagecount = nfs_page_array_len(desc->pg_base, desc->pg_count);
+ pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
if (!nfs_pgarray_set(&hdr->page_array, pagecount))
return nfs_pgio_error(desc, hdr);
@@ -773,7 +819,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
desc->pg_ioflags &= ~FLUSH_COND_STABLE;
/* Set up the argument struct */
- nfs_pgio_rpcsetup(hdr, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
+ nfs_pgio_rpcsetup(hdr, mirror->pg_count, 0, desc->pg_ioflags, &cinfo);
desc->pg_rpc_callops = &nfs_pgio_common_ops;
return 0;
}
@@ -781,12 +827,17 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror;
struct nfs_pgio_header *hdr;
int ret;
+ mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ /* TODO: make sure this is right with mirroring - or
+ * should it back out all mirrors? */
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
@@ -801,6 +852,49 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
return ret;
}
+/*
+ * nfs_pageio_setup_mirroring - determine if mirroring is to be used
+ * by calling the pg_get_mirror_count op
+ */
+static int nfs_pageio_setup_mirroring(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ int mirror_count = 1;
+
+ if (!pgio->pg_ops->pg_get_mirror_count)
+ return 0;
+
+ mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
+
+ if (!mirror_count || mirror_count > NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX)
+ return -EINVAL;
+
+ if (WARN_ON_ONCE(!pgio->pg_mirrors_dynamic))
+ return -EINVAL;
+
+ pgio->pg_mirror_count = mirror_count;
+
+ return 0;
+}
+
+/*
+ * nfs_pageio_stop_mirroring - stop using mirroring (set mirror count to 1)
+ */
+void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio)
+{
+ pgio->pg_mirror_count = 1;
+ pgio->pg_mirror_idx = 0;
+}
+
+static void nfs_pageio_cleanup_mirroring(struct nfs_pageio_descriptor *pgio)
+{
+ pgio->pg_mirror_count = 1;
+ pgio->pg_mirror_idx = 0;
+ pgio->pg_mirrors = pgio->pg_mirrors_static;
+ kfree(pgio->pg_mirrors_dynamic);
+ pgio->pg_mirrors_dynamic = NULL;
+}
+
static bool nfs_match_open_context(const struct nfs_open_context *ctx1,
const struct nfs_open_context *ctx2)
{
@@ -867,19 +961,22 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_page *prev = NULL;
- if (desc->pg_count != 0) {
- prev = nfs_list_entry(desc->pg_list.prev);
+
+ if (mirror->pg_count != 0) {
+ prev = nfs_list_entry(mirror->pg_list.prev);
} else {
if (desc->pg_ops->pg_init)
desc->pg_ops->pg_init(desc, req);
- desc->pg_base = req->wb_pgbase;
+ mirror->pg_base = req->wb_pgbase;
}
if (!nfs_can_coalesce_requests(prev, req, desc))
return 0;
nfs_list_remove_request(req);
- nfs_list_add_request(req, &desc->pg_list);
- desc->pg_count += req->wb_bytes;
+ nfs_list_add_request(req, &mirror->pg_list);
+ mirror->pg_count += req->wb_bytes;
return 1;
}
@@ -888,16 +985,19 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
*/
static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
{
- if (!list_empty(&desc->pg_list)) {
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
+ if (!list_empty(&mirror->pg_list)) {
int error = desc->pg_ops->pg_doio(desc);
if (error < 0)
desc->pg_error = error;
else
- desc->pg_bytes_written += desc->pg_count;
+ mirror->pg_bytes_written += mirror->pg_count;
}
- if (list_empty(&desc->pg_list)) {
- desc->pg_count = 0;
- desc->pg_base = 0;
+ if (list_empty(&mirror->pg_list)) {
+ mirror->pg_count = 0;
+ mirror->pg_base = 0;
}
}
@@ -915,10 +1015,14 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_page *subreq;
unsigned int bytes_left = 0;
unsigned int offset, pgbase;
+ WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
+
nfs_page_group_lock(req, false);
subreq = req;
@@ -938,7 +1042,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
nfs_pageio_doio(desc);
if (desc->pg_error < 0)
return 0;
- if (desc->pg_recoalesce)
+ if (mirror->pg_recoalesce)
return 0;
/* retry add_request for this subreq */
nfs_page_group_lock(req, false);
@@ -976,14 +1080,16 @@ err_ptr:
static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
LIST_HEAD(head);
do {
- list_splice_init(&desc->pg_list, &head);
- desc->pg_bytes_written -= desc->pg_count;
- desc->pg_count = 0;
- desc->pg_base = 0;
- desc->pg_recoalesce = 0;
+ list_splice_init(&mirror->pg_list, &head);
+ mirror->pg_bytes_written -= mirror->pg_count;
+ mirror->pg_count = 0;
+ mirror->pg_base = 0;
+ mirror->pg_recoalesce = 0;
+
desc->pg_moreio = 0;
while (!list_empty(&head)) {
@@ -997,11 +1103,11 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
return 0;
break;
}
- } while (desc->pg_recoalesce);
+ } while (mirror->pg_recoalesce);
return 1;
}
-int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
+static int nfs_pageio_add_request_mirror(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
int ret;
@@ -1014,9 +1120,78 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
break;
ret = nfs_do_recoalesce(desc);
} while (ret);
+
return ret;
}
+int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
+ struct nfs_page *req)
+{
+ u32 midx;
+ unsigned int pgbase, offset, bytes;
+ struct nfs_page *dupreq, *lastreq;
+
+ pgbase = req->wb_pgbase;
+ offset = req->wb_offset;
+ bytes = req->wb_bytes;
+
+ nfs_pageio_setup_mirroring(desc, req);
+
+ for (midx = 0; midx < desc->pg_mirror_count; midx++) {
+ if (midx) {
+ nfs_page_group_lock(req, false);
+
+ /* find the last request */
+ for (lastreq = req->wb_head;
+ lastreq->wb_this_page != req->wb_head;
+ lastreq = lastreq->wb_this_page)
+ ;
+
+ dupreq = nfs_create_request(req->wb_context,
+ req->wb_page, lastreq, pgbase, bytes);
+
+ if (IS_ERR(dupreq)) {
+ nfs_page_group_unlock(req);
+ return 0;
+ }
+
+ nfs_lock_request(dupreq);
+ nfs_page_group_unlock(req);
+ dupreq->wb_offset = offset;
+ dupreq->wb_index = req->wb_index;
+ } else
+ dupreq = req;
+
+ desc->pg_mirror_idx = midx;
+ if (!nfs_pageio_add_request_mirror(desc, dupreq))
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * nfs_pageio_complete_mirror - Complete I/O on the current mirror of an
+ * nfs_pageio_descriptor
+ * @desc: pointer to io descriptor
+ */
+static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
+ u32 mirror_idx)
+{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
+ u32 restore_idx = desc->pg_mirror_idx;
+
+ desc->pg_mirror_idx = mirror_idx;
+ for (;;) {
+ nfs_pageio_doio(desc);
+ if (!mirror->pg_recoalesce)
+ break;
+ if (!nfs_do_recoalesce(desc))
+ break;
+ }
+ desc->pg_mirror_idx = restore_idx;
+}
+
/*
* nfs_pageio_resend - Transfer requests to new descriptor and resend
* @hdr - the pgio header to move request from
@@ -1055,16 +1230,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_resend);
*/
void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
{
- for (;;) {
- nfs_pageio_doio(desc);
- if (!desc->pg_recoalesce)
- break;
- if (!nfs_do_recoalesce(desc))
- break;
- }
+ u32 midx;
+
+ for (midx = 0; midx < desc->pg_mirror_count; midx++)
+ nfs_pageio_complete_mirror(desc, midx);
if (desc->pg_ops->pg_cleanup)
desc->pg_ops->pg_cleanup(desc);
+ nfs_pageio_cleanup_mirroring(desc);
}
/**
@@ -1080,10 +1253,17 @@ void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
*/
void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
{
- if (!list_empty(&desc->pg_list)) {
- struct nfs_page *prev = nfs_list_entry(desc->pg_list.prev);
- if (index != prev->wb_index + 1)
- nfs_pageio_complete(desc);
+ struct nfs_pgio_mirror *mirror;
+ struct nfs_page *prev;
+ u32 midx;
+
+ for (midx = 0; midx < desc->pg_mirror_count; midx++) {
+ mirror = &desc->pg_mirrors[midx];
+ if (!list_empty(&mirror->pg_list)) {
+ prev = nfs_list_entry(mirror->pg_list.prev);
+ if (index != prev->wb_index + 1)
+ nfs_pageio_complete_mirror(desc, midx);
+ }
}
}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2da2e77..5f7c422 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1646,8 +1646,8 @@ EXPORT_SYMBOL_GPL(pnfs_generic_pg_cleanup);
* of bytes (maximum @req->wb_bytes) that can be coalesced.
*/
size_t
-pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
- struct nfs_page *req)
+pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *prev, struct nfs_page *req)
{
unsigned int size;
u64 seg_end, req_start, seg_left;
@@ -1729,10 +1729,12 @@ static void
pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
- list_splice_tail_init(&hdr->pages, &desc->pg_list);
+ list_splice_tail_init(&hdr->pages, &mirror->pg_list);
nfs_pageio_reset_write_mds(desc);
- desc->pg_recoalesce = 1;
+ mirror->pg_recoalesce = 1;
}
nfs_pgio_data_destroy(hdr);
}
@@ -1781,12 +1783,14 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
int
pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_pgio_header *hdr;
int ret;
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_writehdr_free);
@@ -1795,6 +1799,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (!ret)
pnfs_do_write(desc, hdr, desc->pg_ioflags);
+
return ret;
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
@@ -1839,10 +1844,13 @@ static void
pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
+
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
- list_splice_tail_init(&hdr->pages, &desc->pg_list);
+ list_splice_tail_init(&hdr->pages, &mirror->pg_list);
nfs_pageio_reset_read_mds(desc);
- desc->pg_recoalesce = 1;
+ mirror->pg_recoalesce = 1;
}
nfs_pgio_data_destroy(hdr);
}
@@ -1893,12 +1901,14 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
int
pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
{
+ struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+
struct nfs_pgio_header *hdr;
int ret;
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
- desc->pg_completion_ops->error_cleanup(&desc->pg_list);
+ desc->pg_completion_ops->error_cleanup(&mirror->pg_list);
return -ENOMEM;
}
nfs_pgheader_init(desc, hdr, pnfs_readhdr_free);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 092ab49..568ecf0 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -70,8 +70,15 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_read);
void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
{
+ struct nfs_pgio_mirror *mirror;
+
pgio->pg_ops = &nfs_pgio_rw_ops;
- pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
+
+ /* read path should never have more than one mirror */
+ WARN_ON_ONCE(pgio->pg_mirror_count != 1);
+
+ mirror = &pgio->pg_mirrors[0];
+ mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->rsize;
}
EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
@@ -81,6 +88,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
struct nfs_page *new;
unsigned int len;
struct nfs_pageio_descriptor pgio;
+ struct nfs_pgio_mirror *pgm;
len = nfs_page_length(page);
if (len == 0)
@@ -97,7 +105,13 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
&nfs_async_read_completion_ops);
nfs_pageio_add_request(&pgio, new);
nfs_pageio_complete(&pgio);
- NFS_I(inode)->read_io += pgio.pg_bytes_written;
+
+ /* It doesn't make sense to do mirrored reads! */
+ WARN_ON_ONCE(pgio.pg_mirror_count != 1);
+
+ pgm = &pgio.pg_mirrors[0];
+ NFS_I(inode)->read_io += pgm->pg_bytes_written;
+
return 0;
}
@@ -352,6 +366,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages)
{
struct nfs_pageio_descriptor pgio;
+ struct nfs_pgio_mirror *pgm;
struct nfs_readdesc desc = {
.pgio = &pgio,
};
@@ -387,10 +402,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
&nfs_async_read_completion_ops);
ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
-
nfs_pageio_complete(&pgio);
- NFS_I(inode)->read_io += pgio.pg_bytes_written;
- npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+
+ /* It doesn't make sense to do mirrored reads! */
+ WARN_ON_ONCE(pgio.pg_mirror_count != 1);
+
+ pgm = &pgio.pg_mirrors[0];
+ NFS_I(inode)->read_io += pgm->pg_bytes_written;
+ npages = (pgm->pg_bytes_written + PAGE_CACHE_SIZE - 1) >>
+ PAGE_CACHE_SHIFT;
nfs_add_stats(inode, NFSIOS_READPAGES, npages);
read_complete:
put_nfs_open_context(desc.ctx);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2bee165..ceacfee 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -906,7 +906,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
if (nfs_write_need_commit(hdr)) {
memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
nfs_mark_request_commit(req, hdr->lseg, &cinfo,
- 0);
+ hdr->pgio_mirror_idx);
goto next;
}
remove_req:
@@ -1304,8 +1304,14 @@ EXPORT_SYMBOL_GPL(nfs_pageio_init_write);
void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
{
+ struct nfs_pgio_mirror *mirror;
+
pgio->pg_ops = &nfs_pgio_rw_ops;
- pgio->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
+
+ nfs_pageio_stop_mirroring(pgio);
+
+ mirror = &pgio->pg_mirrors[0];
+ mirror->pg_bsize = NFS_SERVER(pgio->pg_inode)->wsize;
}
EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 479c566..3eb072d 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -58,6 +58,8 @@ struct nfs_pageio_ops {
size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
struct nfs_page *);
int (*pg_doio)(struct nfs_pageio_descriptor *);
+ unsigned int (*pg_get_mirror_count)(struct nfs_pageio_descriptor *,
+ struct nfs_page *);
void (*pg_cleanup)(struct nfs_pageio_descriptor *);
};
@@ -74,15 +76,17 @@ struct nfs_rw_ops {
struct rpc_task_setup *, int);
};
-struct nfs_pageio_descriptor {
+struct nfs_pgio_mirror {
struct list_head pg_list;
unsigned long pg_bytes_written;
size_t pg_count;
size_t pg_bsize;
unsigned int pg_base;
- unsigned char pg_moreio : 1,
- pg_recoalesce : 1;
+ unsigned char pg_recoalesce : 1;
+};
+struct nfs_pageio_descriptor {
+ unsigned char pg_moreio : 1;
struct inode *pg_inode;
const struct nfs_pageio_ops *pg_ops;
const struct nfs_rw_ops *pg_rw_ops;
@@ -93,8 +97,18 @@ struct nfs_pageio_descriptor {
struct pnfs_layout_segment *pg_lseg;
struct nfs_direct_req *pg_dreq;
void *pg_layout_private;
+ unsigned int pg_bsize; /* default bsize for mirrors */
+
+ u32 pg_mirror_count;
+ struct nfs_pgio_mirror *pg_mirrors;
+ struct nfs_pgio_mirror pg_mirrors_static[1];
+ struct nfs_pgio_mirror *pg_mirrors_dynamic;
+ u32 pg_mirror_idx; /* current mirror */
};
+/* arbitrarily selected limit to number of mirrors */
+#define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
+
#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 5bc99f0..6400a1e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1329,6 +1329,7 @@ struct nfs_pgio_header {
struct nfs_page_array page_array;
struct nfs_client *ds_clp; /* pNFS data server */
int ds_commit_idx; /* ds index if ds_clp is set */
+ int pgio_mirror_idx;/* mirror index in pgio layer */
};
struct nfs_mds_commit_info {
--
1.9.3
From: Weston Andros Adamson <[email protected]>
The current mirroring code only notices short writes to the first
mirror. This patch keeps per-mirror byte counts and only considers
a byte to be written once all mirrors report so.
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 71 +++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 57 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 0178d4f..651387b 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -66,6 +66,10 @@ static struct kmem_cache *nfs_direct_cachep;
/*
* This represents a set of asynchronous requests that we're waiting on
*/
+struct nfs_direct_mirror {
+ ssize_t count;
+};
+
struct nfs_direct_req {
struct kref kref; /* release manager */
@@ -78,6 +82,10 @@ struct nfs_direct_req {
/* completion state */
atomic_t io_count; /* i/os we're waiting for */
spinlock_t lock; /* protect completion state */
+
+ struct nfs_direct_mirror mirrors[NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX];
+ int mirror_count;
+
ssize_t count, /* bytes actually processed */
bytes_left, /* bytes left to be sent */
error; /* any reported error */
@@ -108,6 +116,29 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
return atomic_dec_and_test(&dreq->io_count);
}
+static void
+nfs_direct_good_bytes(struct nfs_direct_req *dreq, struct nfs_pgio_header *hdr)
+{
+ int i;
+ ssize_t count;
+
+ WARN_ON_ONCE(hdr->pgio_mirror_idx >= dreq->mirror_count);
+
+ dreq->mirrors[hdr->pgio_mirror_idx].count += hdr->good_bytes;
+
+ if (hdr->pgio_mirror_idx == 0)
+ dreq->count += hdr->good_bytes;
+
+ /* update the dreq->count by finding the minimum agreed count from all
+ * mirrors */
+ count = dreq->mirrors[0].count;
+
+ for (i = 1; i < dreq->mirror_count; i++)
+ count = min(count, dreq->mirrors[i].count);
+
+ dreq->count = count;
+}
+
/*
* nfs_direct_select_verf - select the right verifier
* @dreq - direct request possibly spanning multiple servers
@@ -241,6 +272,18 @@ void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
cinfo->completion_ops = &nfs_direct_commit_completion_ops;
}
+static inline void nfs_direct_setup_mirroring(struct nfs_direct_req *dreq,
+ struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ int mirror_count = 1;
+
+ if (pgio->pg_ops->pg_get_mirror_count)
+ mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req);
+
+ dreq->mirror_count = mirror_count;
+}
+
static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
{
struct nfs_direct_req *dreq;
@@ -255,6 +298,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
INIT_LIST_HEAD(&dreq->mds_cinfo.list);
dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
+ dreq->mirror_count = 1;
spin_lock_init(&dreq->lock);
return dreq;
@@ -360,14 +404,9 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
spin_lock(&dreq->lock);
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && (hdr->good_bytes == 0))
dreq->error = hdr->error;
- else {
- /*
- * FIXME: right now this only accounts for bytes written
- * to the first mirror
- */
- if (hdr->pgio_mirror_idx == 0)
- dreq->count += hdr->good_bytes;
- }
+ else
+ nfs_direct_good_bytes(dreq, hdr);
+
spin_unlock(&dreq->lock);
while (!list_empty(&hdr->pages)) {
@@ -598,17 +637,23 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
LIST_HEAD(reqs);
struct nfs_commit_info cinfo;
LIST_HEAD(failed);
+ int i;
nfs_init_cinfo_from_dreq(&cinfo, dreq);
nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
dreq->count = 0;
+ for (i = 0; i < dreq->mirror_count; i++)
+ dreq->mirrors[i].count = 0;
get_dreq(dreq);
nfs_pageio_init_write(&desc, dreq->inode, FLUSH_STABLE, false,
&nfs_direct_write_completion_ops);
desc.pg_dreq = dreq;
+ req = nfs_list_entry(reqs.next);
+ nfs_direct_setup_mirroring(dreq, &desc, req);
+
list_for_each_entry_safe(req, tmp, &reqs, wb_list) {
if (!nfs_pageio_add_request(&desc, req)) {
nfs_list_remove_request(req);
@@ -730,12 +775,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
dreq->error = hdr->error;
}
if (dreq->error == 0) {
- /*
- * FIXME: right now this only accounts for bytes written
- * to the first mirror
- */
- if (hdr->pgio_mirror_idx == 0)
- dreq->count += hdr->good_bytes;
+ nfs_direct_good_bytes(dreq, hdr);
if (nfs_write_need_commit(hdr)) {
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
request_commit = true;
@@ -841,6 +881,9 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
result = PTR_ERR(req);
break;
}
+
+ nfs_direct_setup_mirroring(dreq, &desc, req);
+
nfs_lock_request(req);
req->wb_index = pos >> PAGE_SHIFT;
req->wb_offset = pos & ~PAGE_MASK;
--
1.9.3
From: Weston Andros Adamson <[email protected]>
This skips the WARN_ON_ONCE, but doesnt change behavior (the memcmp would
fail).
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/direct.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 651387b..eb81478 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -222,7 +222,11 @@ static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
verfp = nfs_direct_select_verf(dreq, data->ds_clp,
data->ds_commit_index);
- WARN_ON_ONCE(verfp->committed < 0);
+
+ /* verifier not set so always fail */
+ if (verfp->committed < 0)
+ return 1;
+
return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
}
--
1.9.3
From: Peng Tao <[email protected]>
So that we can detect the case if some layout segments are still
pinned which is surely a bug that we need to fix.
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/pnfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5f7c422..e123cfc 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -242,6 +242,8 @@ pnfs_put_layout_hdr(struct pnfs_layout_hdr *lo)
struct inode *inode = lo->plh_inode;
if (atomic_dec_and_lock(&lo->plh_refcount, &inode->i_lock)) {
+ if (!list_empty(&lo->plh_segs))
+ WARN_ONCE(1, "NFS: BUG unfreed layout segments.\n");
pnfs_detach_layout_hdr(lo);
spin_unlock(&inode->i_lock);
pnfs_free_layout_hdr(lo);
--
1.9.3
From: Peng Tao <[email protected]>
so that we don't reset desc->pg_mirror_idx for read unnecessarily.
Remove WARN_ON_ONCE from __nfs_pageio_add_request to allow LD to
set pg_mirror_idx for read where pg_mirror_count is always 1.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/internal.h | 7 +++++++
fs/nfs/pagelist.c | 8 ++++----
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index ef1c703..5be06bc 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -6,6 +6,7 @@
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/crc32.h>
+#include <linux/nfs_page.h>
#define NFS_MS_MASK (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_SYNCHRONOUS)
@@ -261,6 +262,12 @@ static inline void nfs_iocounter_init(struct nfs_io_counter *c)
atomic_set(&c->io_count, 0);
}
+static inline bool nfs_pgio_has_mirroring(struct nfs_pageio_descriptor *desc)
+{
+ WARN_ON_ONCE(desc->pg_mirror_count < 1);
+ return desc->pg_mirror_count > 1;
+}
+
/* nfs2xdr.c */
extern struct rpc_procinfo nfs_procedures[];
extern int nfs2_decode_dirent(struct xdr_stream *,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index eec12b7..f9d8c46 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1021,8 +1021,6 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
unsigned int bytes_left = 0;
unsigned int offset, pgbase;
- WARN_ON_ONCE(desc->pg_mirror_idx >= desc->pg_mirror_count);
-
nfs_page_group_lock(req, false);
subreq = req;
@@ -1162,7 +1160,8 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
} else
dupreq = req;
- desc->pg_mirror_idx = midx;
+ if (nfs_pgio_has_mirroring(desc))
+ desc->pg_mirror_idx = midx;
if (!nfs_pageio_add_request_mirror(desc, dupreq))
return 0;
}
@@ -1181,7 +1180,8 @@ static void nfs_pageio_complete_mirror(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[mirror_idx];
u32 restore_idx = desc->pg_mirror_idx;
- desc->pg_mirror_idx = mirror_idx;
+ if (nfs_pgio_has_mirroring(desc))
+ desc->pg_mirror_idx = mirror_idx;
for (;;) {
nfs_pageio_doio(desc);
if (!mirror->pg_recoalesce)
--
1.9.3
From: Peng Tao <[email protected]>
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from through out the
IO stack.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/internal.h | 2 ++
fs/nfs/objlayout/objio_osd.c | 2 +-
fs/nfs/pagelist.c | 25 +++++++++++++++++--------
fs/nfs/pnfs.c | 9 ++++-----
4 files changed, 24 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5be06bc..ffe4b7a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -255,6 +255,8 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
struct rpc_cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
+struct nfs_pgio_mirror *
+nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
static inline void nfs_iocounter_init(struct nfs_io_counter *c)
{
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 9a5f2ee..24e1d74 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -537,7 +537,7 @@ int objio_write_pagelist(struct nfs_pgio_header *hdr, int how)
static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &pgio->pg_mirrors[pgio->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(pgio);
unsigned int size;
size = pnfs_generic_pg_test(pgio, prev, req);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index f9d8c46..960c99f 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -42,11 +42,20 @@ static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
return p->pagevec != NULL;
}
+struct nfs_pgio_mirror *
+nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc)
+{
+ return nfs_pgio_has_mirroring(desc) ?
+ &desc->pg_mirrors[desc->pg_mirror_idx] :
+ &desc->pg_mirrors[0];
+}
+EXPORT_SYMBOL_GPL(nfs_pgio_current_mirror);
+
void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr,
void (*release)(struct nfs_pgio_header *hdr))
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
hdr->req = nfs_list_entry(mirror->pg_list.next);
@@ -485,7 +494,7 @@ nfs_wait_on_request(struct nfs_page *req)
size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev, struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (mirror->pg_count > mirror->pg_bsize) {
@@ -782,7 +791,7 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_page *req;
struct page **pages,
@@ -831,7 +840,7 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
struct nfs_pgio_header *hdr;
int ret;
- mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ mirror = nfs_pgio_current_mirror(desc);
hdr = nfs_pgio_header_alloc(desc->pg_rw_ops);
if (!hdr) {
@@ -961,7 +970,7 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_page *prev = NULL;
@@ -985,7 +994,7 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
*/
static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (!list_empty(&mirror->pg_list)) {
@@ -1015,7 +1024,7 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_page *subreq;
unsigned int bytes_left = 0;
@@ -1078,7 +1087,7 @@ err_ptr:
static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
LIST_HEAD(head);
do {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e123cfc..b822b17 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1731,7 +1731,7 @@ static void
pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
list_splice_tail_init(&hdr->pages, &mirror->pg_list);
@@ -1785,7 +1785,7 @@ EXPORT_SYMBOL_GPL(pnfs_writehdr_free);
int
pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_pgio_header *hdr;
int ret;
@@ -1846,8 +1846,7 @@ static void
pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
-
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
list_splice_tail_init(&hdr->pages, &mirror->pg_list);
@@ -1903,7 +1902,7 @@ EXPORT_SYMBOL_GPL(pnfs_readhdr_free);
int
pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
{
- struct nfs_pgio_mirror *mirror = &desc->pg_mirrors[desc->pg_mirror_idx];
+ struct nfs_pgio_mirror *mirror = nfs_pgio_current_mirror(desc);
struct nfs_pgio_header *hdr;
int ret;
--
1.9.3
From: Peng Tao <[email protected]>
If current IO cannot be completed due to some transient errors,
LD may want to ask generic layer to resend the request through
pnfs again.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 15 ++++++++++++++-
fs/nfs/pnfs.h | 2 ++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b822b17..685af4f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1880,15 +1880,28 @@ pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
return trypnfs;
}
+/* Resend all requests through pnfs. */
+int pnfs_read_resend_pnfs(struct nfs_pgio_header *hdr)
+{
+ struct nfs_pageio_descriptor pgio;
+
+ nfs_pageio_init_read(&pgio, hdr->inode, false, hdr->completion_ops);
+ return nfs_pageio_resend(&pgio, hdr);
+}
+EXPORT_SYMBOL_GPL(pnfs_read_resend_pnfs);
+
static void
pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
{
const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
+ int err = 0;
trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
- if (trypnfs == PNFS_NOT_ATTEMPTED)
+ if (trypnfs == PNFS_TRY_AGAIN)
+ err = pnfs_read_resend_pnfs(hdr);
+ if (trypnfs == PNFS_NOT_ATTEMPTED || err)
pnfs_read_through_mds(desc, hdr);
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 3d4a83b..6d2b544 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -72,6 +72,7 @@ struct pnfs_layout_segment {
enum pnfs_try_status {
PNFS_ATTEMPTED = 0,
PNFS_NOT_ATTEMPTED = 1,
+ PNFS_TRY_AGAIN = 2,
};
#ifdef CONFIG_NFS_V4_1
@@ -268,6 +269,7 @@ int _pnfs_return_layout(struct inode *);
int pnfs_commit_and_return_layout(struct inode *);
void pnfs_ld_write_done(struct nfs_pgio_header *);
void pnfs_ld_read_done(struct nfs_pgio_header *);
+int pnfs_read_resend_pnfs(struct nfs_pgio_header *);
struct pnfs_layout_segment *pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
loff_t pos,
--
1.9.3
From: Peng Tao <[email protected]>
So that callers can specify which range to return.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4xdr.c | 6 +++---
fs/nfs/pnfs.c | 4 +++-
include/linux/nfs_xdr.h | 2 +-
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 3c3ff63..56d4c91 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2012,11 +2012,11 @@ encode_layoutreturn(struct xdr_stream *xdr,
p = reserve_space(xdr, 16);
*p++ = cpu_to_be32(0); /* reclaim. always 0 for now */
*p++ = cpu_to_be32(args->layout_type);
- *p++ = cpu_to_be32(args->iomode);
+ *p++ = cpu_to_be32(args->range.iomode);
*p = cpu_to_be32(RETURN_FILE);
p = reserve_space(xdr, 16);
- p = xdr_encode_hyper(p, 0);
- p = xdr_encode_hyper(p, NFS4_MAX_UINT64);
+ p = xdr_encode_hyper(p, args->range.offset);
+ p = xdr_encode_hyper(p, args->range.length);
spin_lock(&args->inode->i_lock);
encode_nfs4_stateid(xdr, &args->stateid);
spin_unlock(&args->inode->i_lock);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 685af4f..9549b89 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -916,7 +916,9 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
lrp->args.stateid = stateid;
lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
lrp->args.inode = ino;
- lrp->args.iomode = iomode;
+ lrp->args.range.iomode = iomode;
+ lrp->args.range.offset = 0;
+ lrp->args.range.length = NFS4_MAX_UINT64;
lrp->args.layout = lo;
lrp->clp = NFS_SERVER(ino)->nfs_client;
lrp->cred = lo->plh_lc_cred;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 6400a1e..3637923 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -293,7 +293,7 @@ struct nfs4_layoutreturn_args {
struct nfs4_sequence_args seq_args;
struct pnfs_layout_hdr *layout;
struct inode *inode;
- enum pnfs_iomode iomode;
+ struct pnfs_layout_range range;
nfs4_stateid stateid;
__u32 layout_type;
};
--
1.9.3
From: Peng Tao <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 11 +++++++++--
fs/nfs/pnfs.c | 11 ++++++-----
fs/nfs/pnfs.h | 2 +-
3 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ac50af1..c20c5d6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7801,7 +7801,7 @@ static const struct rpc_call_ops nfs4_layoutreturn_call_ops = {
.rpc_release = nfs4_layoutreturn_release,
};
-int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
+int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
{
struct rpc_task *task;
struct rpc_message msg = {
@@ -7815,16 +7815,23 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
.rpc_message = &msg,
.callback_ops = &nfs4_layoutreturn_call_ops,
.callback_data = lrp,
+ .flags = RPC_TASK_ASYNC,
};
- int status;
+ int status = 0;
dprintk("--> %s\n", __func__);
nfs4_init_sequence(&lrp->args.seq_args, &lrp->res.seq_res, 1);
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
+ if (sync == false)
+ goto out;
+ status = nfs4_wait_for_completion_rpc_task(task);
+ if (status != 0)
+ goto out;
status = task->tk_status;
trace_nfs4_layoutreturn(lrp->args.inode, status);
+out:
dprintk("<-- %s status=%d\n", __func__, status);
rpc_put_task(task);
return status;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 9549b89..0a0e209 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -52,7 +52,7 @@ static LIST_HEAD(pnfs_modules_tbl);
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
- enum pnfs_iomode iomode);
+ enum pnfs_iomode iomode, bool sync);
/* Return the registered pnfs layout driver module matching given id */
static struct pnfs_layoutdriver_type *
@@ -392,7 +392,8 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
spin_unlock(&inode->i_lock);
pnfs_free_lseg(lseg);
if (need_return)
- pnfs_send_layoutreturn(lo, stateid, iomode);
+ pnfs_send_layoutreturn(lo, stateid, iomode,
+ true);
else
pnfs_put_layout_hdr(lo);
}
@@ -897,7 +898,7 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
- enum pnfs_iomode iomode)
+ enum pnfs_iomode iomode, bool sync)
{
struct inode *ino = lo->plh_inode;
struct nfs4_layoutreturn *lrp;
@@ -923,7 +924,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
lrp->clp = NFS_SERVER(ino)->nfs_client;
lrp->cred = lo->plh_lc_cred;
- status = nfs4_proc_layoutreturn(lrp);
+ status = nfs4_proc_layoutreturn(lrp, sync);
out:
if (status) {
spin_lock(&ino->i_lock);
@@ -989,7 +990,7 @@ _pnfs_return_layout(struct inode *ino)
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&tmp_list);
- status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY);
+ status = pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, true);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 6d2b544..1da6641 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -219,7 +219,7 @@ extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
struct pnfs_device *dev,
struct rpc_cred *cred);
extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags);
-extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
+extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
/* pnfs.c */
void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
--
1.9.3
From: Peng Tao <[email protected]>
When it is set, generic pnfs would try to send layoutreturn right
before last close/delegation_return regard less NFS_LAYOUT_ROC is
set or not. LD can then make sure layoutreturn is always sent
rather than being omitted.
The difference against NFS_LAYOUT_RETURN is that
NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so
LD can set it and expect generic layer to try pnfs path at the
same time.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/nfs4proc.c | 2 ++
fs/nfs/pnfs.c | 40 +++++++++++++++++++++++++++++++++-------
fs/nfs/pnfs.h | 1 +
3 files changed, 36 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c20c5d6..91733bb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7788,6 +7788,8 @@ static void nfs4_layoutreturn_release(void *calldata)
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE, &lo->plh_flags);
+ rpc_wake_up(&NFS_SERVER(lo->plh_inode)->roc_rpcwaitq);
lo->plh_block_lgets--;
spin_unlock(&lo->plh_inode->i_lock);
pnfs_put_layout_hdr(lrp->args.layout);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0a0e209..d3c2ca7 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -909,6 +909,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = -ENOMEM;
spin_lock(&ino->i_lock);
lo->plh_block_lgets--;
+ rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
spin_unlock(&ino->i_lock);
pnfs_put_layout_hdr(lo);
goto out;
@@ -926,11 +927,6 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = nfs4_proc_layoutreturn(lrp, sync);
out:
- if (status) {
- spin_lock(&ino->i_lock);
- clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
- spin_unlock(&ino->i_lock);
- }
dprintk("<-- %s status: %d\n", __func__, status);
return status;
}
@@ -1028,8 +1024,9 @@ bool pnfs_roc(struct inode *ino)
{
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg, *tmp;
+ nfs4_stateid stateid;
LIST_HEAD(tmp_list);
- bool found = false;
+ bool found = false, layoutreturn = false;
spin_lock(&ino->i_lock);
lo = NFS_I(ino)->layout;
@@ -1050,7 +1047,20 @@ bool pnfs_roc(struct inode *ino)
return true;
out_nolayout:
+ if (lo) {
+ stateid = lo->plh_stateid;
+ layoutreturn =
+ test_and_clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &lo->plh_flags);
+ if (layoutreturn) {
+ lo->plh_block_lgets++;
+ pnfs_get_layout_hdr(lo);
+ }
+ }
spin_unlock(&ino->i_lock);
+ if (layoutreturn)
+ pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
+ NFS4_MAX_UINT64, true);
return false;
}
@@ -1085,8 +1095,9 @@ bool pnfs_roc_drain(struct inode *ino, u32 *barrier, struct rpc_task *task)
struct nfs_inode *nfsi = NFS_I(ino);
struct pnfs_layout_hdr *lo;
struct pnfs_layout_segment *lseg;
+ nfs4_stateid stateid;
u32 current_seqid;
- bool found = false;
+ bool found = false, layoutreturn = false;
spin_lock(&ino->i_lock);
list_for_each_entry(lseg, &nfsi->layout->plh_segs, pls_list)
@@ -1103,7 +1114,22 @@ bool pnfs_roc_drain(struct inode *ino, u32 *barrier, struct rpc_task *task)
*/
*barrier = current_seqid + atomic_read(&lo->plh_outstanding);
out:
+ if (!found) {
+ stateid = lo->plh_stateid;
+ layoutreturn =
+ test_and_clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &lo->plh_flags);
+ if (layoutreturn) {
+ lo->plh_block_lgets++;
+ pnfs_get_layout_hdr(lo);
+ }
+ }
spin_unlock(&ino->i_lock);
+ if (layoutreturn) {
+ rpc_sleep_on(&NFS_SERVER(ino)->roc_rpcwaitq, task, NULL);
+ pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
+ NFS4_MAX_UINT64, false);
+ }
return found;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1da6641..ec81767 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -96,6 +96,7 @@ enum {
NFS_LAYOUT_BULK_RECALL, /* bulk recall affecting layout */
NFS_LAYOUT_ROC, /* some lseg had roc bit set */
NFS_LAYOUT_RETURN, /* Return this layout ASAP */
+ NFS_LAYOUT_RETURN_BEFORE_CLOSE, /* Return this layout before close */
NFS_LAYOUT_INVALID_STID, /* layout stateid id is invalid */
NFS_LAYOUT_FIRST_LAYOUTGET, /* Serialize first layoutget */
};
--
1.9.3
From: Peng Tao <[email protected]>
Otherwise we'll lose error tracking information when
encoding layoutreturn.
pnfs_put_lseg may be called from rpc callbacks. So we should not
call pnfs_send_layoutreturn directly because it can deadlock in
the rpc layer.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
---
fs/nfs/pnfs.c | 81 +++++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 56 insertions(+), 25 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d3c2ca7..108a619 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -346,8 +346,7 @@ pnfs_layout_remove_lseg(struct pnfs_layout_hdr *lo,
/* Return true if layoutreturn is needed */
static bool
pnfs_layout_need_return(struct pnfs_layout_hdr *lo,
- struct pnfs_layout_segment *lseg,
- nfs4_stateid *stateid, enum pnfs_iomode *iomode)
+ struct pnfs_layout_segment *lseg)
{
struct pnfs_layout_segment *s;
@@ -355,17 +354,54 @@ pnfs_layout_need_return(struct pnfs_layout_hdr *lo,
return false;
list_for_each_entry(s, &lo->plh_segs, pls_list)
- if (test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags))
+ if (s != lseg && test_bit(NFS_LSEG_LAYOUTRETURN, &s->pls_flags))
return false;
- *stateid = lo->plh_stateid;
- *iomode = lo->plh_return_iomode;
- /* decreased in pnfs_send_layoutreturn() */
- lo->plh_block_lgets++;
- lo->plh_return_iomode = 0;
return true;
}
+static void pnfs_layoutreturn_free_lseg(struct work_struct *work)
+{
+ struct pnfs_layout_segment *lseg;
+ struct pnfs_layout_hdr *lo;
+ struct inode *inode;
+
+ lseg = container_of(work, struct pnfs_layout_segment, pls_work);
+ WARN_ON(atomic_read(&lseg->pls_refcount));
+ lo = lseg->pls_layout;
+ inode = lo->plh_inode;
+
+ spin_lock(&inode->i_lock);
+ if (pnfs_layout_need_return(lo, lseg)) {
+ nfs4_stateid stateid;
+ enum pnfs_iomode iomode;
+
+ stateid = lo->plh_stateid;
+ iomode = lo->plh_return_iomode;
+ /* decreased in pnfs_send_layoutreturn() */
+ lo->plh_block_lgets++;
+ lo->plh_return_iomode = 0;
+ spin_unlock(&inode->i_lock);
+
+ pnfs_send_layoutreturn(lo, stateid, iomode, true);
+ spin_lock(&inode->i_lock);
+ } else
+ /* match pnfs_get_layout_hdr #2 in pnfs_put_lseg */
+ pnfs_put_layout_hdr(lo);
+ pnfs_layout_remove_lseg(lo, lseg);
+ spin_unlock(&inode->i_lock);
+ pnfs_free_lseg(lseg);
+ /* match pnfs_get_layout_hdr #1 in pnfs_put_lseg */
+ pnfs_put_layout_hdr(lo);
+}
+
+static void
+pnfs_layoutreturn_free_lseg_async(struct pnfs_layout_segment *lseg)
+{
+ INIT_WORK(&lseg->pls_work, pnfs_layoutreturn_free_lseg);
+ queue_work(nfsiod_workqueue, &lseg->pls_work);
+}
+
void
pnfs_put_lseg(struct pnfs_layout_segment *lseg)
{
@@ -381,21 +417,18 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
lo = lseg->pls_layout;
inode = lo->plh_inode;
if (atomic_dec_and_lock(&lseg->pls_refcount, &inode->i_lock)) {
- bool need_return;
- nfs4_stateid stateid;
- enum pnfs_iomode iomode;
-
pnfs_get_layout_hdr(lo);
- pnfs_layout_remove_lseg(lo, lseg);
- need_return = pnfs_layout_need_return(lo, lseg,
- &stateid, &iomode);
- spin_unlock(&inode->i_lock);
- pnfs_free_lseg(lseg);
- if (need_return)
- pnfs_send_layoutreturn(lo, stateid, iomode,
- true);
- else
+ if (pnfs_layout_need_return(lo, lseg)) {
+ spin_unlock(&inode->i_lock);
+ /* hdr reference dropped in nfs4_layoutreturn_release */
+ pnfs_get_layout_hdr(lo);
+ pnfs_layoutreturn_free_lseg_async(lseg);
+ } else {
+ pnfs_layout_remove_lseg(lo, lseg);
+ spin_unlock(&inode->i_lock);
+ pnfs_free_lseg(lseg);
pnfs_put_layout_hdr(lo);
+ }
}
}
EXPORT_SYMBOL_GPL(pnfs_put_lseg);
@@ -1059,8 +1092,7 @@ out_nolayout:
}
spin_unlock(&ino->i_lock);
if (layoutreturn)
- pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
- NFS4_MAX_UINT64, true);
+ pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, true);
return false;
}
@@ -1127,8 +1159,7 @@ out:
spin_unlock(&ino->i_lock);
if (layoutreturn) {
rpc_sleep_on(&NFS_SERVER(ino)->roc_rpcwaitq, task, NULL);
- pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, 0,
- NFS4_MAX_UINT64, false);
+ pnfs_send_layoutreturn(lo, stateid, IOMODE_ANY, false);
}
return found;
}
--
1.9.3
From: Peng Tao <[email protected]>
Use it to indicate that LD wants to retry layoutget. LD can set
it whenever it wants the common pnfs code to return and retry
pnfs path through a new layout.
The bit gets cleared when client does a new layoutget, when client
closes the file (ROC case), or when kernel needs to evict the inode
(non-ROC case).
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/pnfs.c | 3 +++
fs/nfs/pnfs.h | 18 ++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 108a619..893f6b5 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -615,6 +615,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
pnfs_get_layout_hdr(lo);
pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
+ pnfs_clear_retry_layoutget(lo);
spin_unlock(&nfsi->vfs_inode.i_lock);
pnfs_free_lseg_list(&tmp_list);
pnfs_put_layout_hdr(lo);
@@ -1066,6 +1067,7 @@ bool pnfs_roc(struct inode *ino)
if (!lo || !test_and_clear_bit(NFS_LAYOUT_ROC, &lo->plh_flags) ||
test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags))
goto out_nolayout;
+ pnfs_clear_retry_layoutget(lo);
list_for_each_entry_safe(lseg, tmp, &lo->plh_segs, pls_list)
if (test_bit(NFS_LSEG_ROC, &lseg->pls_flags)) {
mark_lseg_invalid(lseg, &tmp_list);
@@ -1491,6 +1493,7 @@ lookup_again:
arg.length = PAGE_CACHE_ALIGN(arg.length);
lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
+ pnfs_clear_retry_layoutget(lo);
atomic_dec(&lo->plh_outstanding);
out_put_layout_hdr:
if (first) {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index ec81767..665692e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -99,6 +99,7 @@ enum {
NFS_LAYOUT_RETURN_BEFORE_CLOSE, /* Return this layout before close */
NFS_LAYOUT_INVALID_STID, /* layout stateid id is invalid */
NFS_LAYOUT_FIRST_LAYOUTGET, /* Serialize first layoutget */
+ NFS_LAYOUT_RETRY_LAYOUTGET, /* Retry layoutget */
};
enum layoutdriver_policy_flags {
@@ -349,6 +350,23 @@ nfs4_get_deviceid(struct nfs4_deviceid_node *d)
return d;
}
+static inline void pnfs_set_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ if (!test_and_set_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags))
+ atomic_inc(&lo->plh_refcount);
+}
+
+static inline void pnfs_clear_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ if (test_and_clear_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags))
+ atomic_dec(&lo->plh_refcount);
+}
+
+static inline bool pnfs_should_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ return test_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags);
+}
+
static inline struct pnfs_layout_segment *
pnfs_get_lseg(struct pnfs_layout_segment *lseg)
{
--
1.9.3
From: Peng Tao <[email protected]>
To allow pnfs LD to ask direct writes to be resend.
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/direct.c | 6 ++++++
fs/nfs/internal.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index eb81478..4fad6b7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -116,6 +116,12 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
return atomic_dec_and_test(&dreq->io_count);
}
+void nfs_direct_set_resched_writes(struct nfs_direct_req *dreq)
+{
+ dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
+}
+EXPORT_SYMBOL_GPL(nfs_direct_set_resched_writes);
+
static void
nfs_direct_good_bytes(struct nfs_direct_req *dreq, struct nfs_pgio_header *hdr)
{
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index ffe4b7a..44e8496 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -502,6 +502,7 @@ static inline void nfs_inode_dio_wait(struct inode *inode)
inode_dio_wait(inode);
}
extern ssize_t nfs_dreq_bytes_left(struct nfs_direct_req *dreq);
+extern void nfs_direct_set_resched_writes(struct nfs_direct_req *dreq);
/* nfs4proc.c */
extern void __nfs4_read_done_cb(struct nfs_pgio_header *);
--
1.9.3
From: Peng Tao <[email protected]>
Also take care to stop waiting if someone clears retry bit.
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs4proc.c | 4 +++-
fs/nfs/pnfs.c | 39 ++++++++++++++++++++++++++++++++++++++-
fs/nfs/pnfs.h | 5 ++++-
3 files changed, 45 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 91733bb..297c8f6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7787,7 +7787,9 @@ static void nfs4_layoutreturn_release(void *calldata)
spin_lock(&lo->plh_inode->i_lock);
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
- clear_bit(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ clear_bit_unlock(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ smp_mb__after_atomic();
+ wake_up_bit(&lo->plh_flags, NFS_LAYOUT_RETURN);
clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE, &lo->plh_flags);
rpc_wake_up(&NFS_SERVER(lo->plh_inode)->roc_rpcwaitq);
lo->plh_block_lgets--;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 893f6b5..c4c9fe6 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1398,6 +1398,26 @@ static bool pnfs_within_mdsthreshold(struct nfs_open_context *ctx,
return ret;
}
+/* stop waiting if someone clears NFS_LAYOUT_RETRY_LAYOUTGET bit. */
+static int pnfs_layoutget_retry_bit_wait(struct wait_bit_key *key)
+{
+ if (!test_bit(NFS_LAYOUT_RETRY_LAYOUTGET, key->flags))
+ return 1;
+ return nfs_wait_bit_killable(key);
+}
+
+static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
+{
+ /*
+ * send layoutcommit as it can hold up layoutreturn due to lseg
+ * reference
+ */
+ pnfs_layoutcommit_inode(lo->plh_inode, false);
+ return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
+ pnfs_layoutget_retry_bit_wait,
+ TASK_UNINTERRUPTIBLE);
+}
+
/*
* Layout segment is retreived from the server if not cached.
* The appropriate layout segment is referenced and returned to the caller.
@@ -1444,7 +1464,8 @@ lookup_again:
}
/* if LAYOUTGET already failed once we don't try again */
- if (pnfs_layout_io_test_failed(lo, iomode))
+ if (pnfs_layout_io_test_failed(lo, iomode) &&
+ !pnfs_should_retry_layoutget(lo))
goto out_unlock;
first = list_empty(&lo->plh_segs);
@@ -1469,6 +1490,22 @@ lookup_again:
goto out_unlock;
}
+ /*
+ * Because we free lsegs before sending LAYOUTRETURN, we need to wait
+ * for LAYOUTRETURN even if first is true.
+ */
+ if (!lseg && pnfs_should_retry_layoutget(lo) &&
+ test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags)) {
+ spin_unlock(&ino->i_lock);
+ dprintk("%s wait for layoutreturn\n", __func__);
+ if (pnfs_prepare_to_retry_layoutget(lo)) {
+ pnfs_put_layout_hdr(lo);
+ dprintk("%s retrying\n", __func__);
+ goto lookup_again;
+ }
+ goto out_put_layout_hdr;
+ }
+
if (pnfs_layoutgets_blocked(lo, &arg, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 665692e..5643a07 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -358,8 +358,11 @@ static inline void pnfs_set_retry_layoutget(struct pnfs_layout_hdr *lo)
static inline void pnfs_clear_retry_layoutget(struct pnfs_layout_hdr *lo)
{
- if (test_and_clear_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags))
+ if (test_and_clear_bit(NFS_LAYOUT_RETRY_LAYOUTGET, &lo->plh_flags)) {
atomic_dec(&lo->plh_refcount);
+ /* wake up waiters for LAYOUTRETURN as that is not needed */
+ wake_up_bit(&lo->plh_flags, NFS_LAYOUT_RETURN);
+ }
}
static inline bool pnfs_should_retry_layoutget(struct pnfs_layout_hdr *lo)
--
1.9.3
The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
Signed-off-by: Tao Peng <[email protected]>
---
fs/nfs/Kconfig | 5 +
fs/nfs/Makefile | 1 +
fs/nfs/flexfilelayout/Makefile | 5 +
fs/nfs/flexfilelayout/flexfilelayout.c | 1603 +++++++++++++++++++++++++++++
fs/nfs/flexfilelayout/flexfilelayout.h | 158 +++
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 552 ++++++++++
fs/nfs/nfs4proc.c | 4 +-
fs/nfs/pnfs.c | 32 +-
fs/nfs/pnfs.h | 1 +
include/linux/nfs4.h | 1 +
10 files changed, 2351 insertions(+), 11 deletions(-)
create mode 100644 fs/nfs/flexfilelayout/Makefile
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.c
create mode 100644 fs/nfs/flexfilelayout/flexfilelayout.h
create mode 100644 fs/nfs/flexfilelayout/flexfilelayoutdev.c
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 3dece03..c7abc10 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -128,6 +128,11 @@ config PNFS_OBJLAYOUT
depends on NFS_V4_1 && SCSI_OSD_ULD
default NFS_V4
+config PNFS_FLEXFILE_LAYOUT
+ tristate
+ depends on NFS_V4_1 && NFS_V3
+ default m
+
config NFS_V4_1_IMPLEMENTATION_ID_DOMAIN
string "NFSv4.1 Implementation ID Domain"
depends on NFS_V4_1
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 23abffa..1e987ac 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -33,3 +33,4 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayout/
obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
+obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/
diff --git a/fs/nfs/flexfilelayout/Makefile b/fs/nfs/flexfilelayout/Makefile
new file mode 100644
index 0000000..1d2c9f6
--- /dev/null
+++ b/fs/nfs/flexfilelayout/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the pNFS Flexfile Layout Driver kernel module
+#
+obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += nfs_layout_flexfiles.o
+nfs_layout_flexfiles-y := flexfilelayout.o flexfilelayoutdev.o
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
new file mode 100644
index 0000000..ad1e029
--- /dev/null
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -0,0 +1,1603 @@
+/*
+ * Module for pnfs flexfile layout driver.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <[email protected]>
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
+#include <linux/module.h>
+
+#include <linux/sunrpc/metrics.h>
+
+#include "flexfilelayout.h"
+#include "../nfs4session.h"
+#include "../internal.h"
+#include "../delegation.h"
+#include "../nfs4trace.h"
+#include "../iostat.h"
+#include "../nfs.h"
+
+#define NFSDBG_FACILITY NFSDBG_PNFS_LD
+
+#define FF_LAYOUT_POLL_RETRY_MAX (15*HZ)
+
+static struct pnfs_layout_hdr *
+ff_layout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
+{
+ struct nfs4_flexfile_layout *ffl;
+
+ ffl = kzalloc(sizeof(*ffl), gfp_flags);
+ INIT_LIST_HEAD(&ffl->error_list);
+ return ffl != NULL ? &ffl->generic_hdr : NULL;
+}
+
+static void
+ff_layout_free_layout_hdr(struct pnfs_layout_hdr *lo)
+{
+ struct nfs4_ff_layout_ds_err *err, *n;
+
+ list_for_each_entry_safe(err, n, &FF_LAYOUT_FROM_HDR(lo)->error_list,
+ list) {
+ list_del(&err->list);
+ kfree(err);
+ }
+ kfree(FF_LAYOUT_FROM_HDR(lo));
+}
+
+static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
+{
+ __be32 *p;
+
+ p = xdr_inline_decode(xdr, NFS4_STATEID_SIZE);
+ if (unlikely(p == NULL))
+ return -ENOBUFS;
+ memcpy(stateid, p, NFS4_STATEID_SIZE);
+ dprintk("%s: stateid id= [%x%x%x%x]\n", __func__,
+ p[0], p[1], p[2], p[3]);
+ return 0;
+}
+
+static int decode_deviceid(struct xdr_stream *xdr, struct nfs4_deviceid *devid)
+{
+ __be32 *p;
+
+ p = xdr_inline_decode(xdr, NFS4_DEVICEID4_SIZE);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ memcpy(devid, p, NFS4_DEVICEID4_SIZE);
+ nfs4_print_deviceid(devid);
+ return 0;
+}
+
+static int decode_nfs_fh(struct xdr_stream *xdr, struct nfs_fh *fh)
+{
+ __be32 *p;
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ fh->size = be32_to_cpup(p++);
+ if (fh->size > sizeof(struct nfs_fh)) {
+ printk(KERN_ERR "NFS flexfiles: Too big fh received %d\n",
+ fh->size);
+ return -EOVERFLOW;
+ }
+ /* fh.data */
+ p = xdr_inline_decode(xdr, fh->size);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ memcpy(&fh->data, p, fh->size);
+ dprintk("%s: fh len %d\n", __func__, fh->size);
+
+ return 0;
+}
+
+/*
+ * we only handle AUTH_NONE and AUTH_UNIX for now.
+ *
+ * For AUTH_UNIX, we want to parse
+ * struct authsys_parms {
+ * unsigned int stamp;
+ * string machinename<255>;
+ * unsigned int uid;
+ * unsigned int gid;
+ * unsigned int gids<16>;
+ * };
+ */
+static int
+ff_layout_parse_auth(struct xdr_stream *xdr,
+ struct nfs4_ff_layout_mirror *mirror)
+{
+ __be32 *p;
+ int flavor, len, gid_it = 0;
+
+ /* authflavor(4) + opaque_length(4)*/
+ p = xdr_inline_decode(xdr, 8);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ flavor = be32_to_cpup(p++);
+ len = be32_to_cpup(p++);
+ if (flavor < RPC_AUTH_NULL || flavor >= RPC_AUTH_MAXFLAVOR ||
+ len < 0)
+ return -EINVAL;
+
+ dprintk("%s: flavor %u len %u\n", __func__, flavor, len);
+
+ if (flavor == RPC_AUTH_NULL && len == 0)
+ goto out_fill;
+
+ /* opaque body */
+ p = xdr_inline_decode(xdr, len);
+ if (unlikely(!p))
+ return -ENOBUFS;
+
+ if (flavor == RPC_AUTH_NULL) {
+ mirror->uid = -1;
+ mirror->gid = -1;
+ } else if (flavor == RPC_AUTH_UNIX) {
+ int len2;
+
+ p++; /* stamp */
+ len2 = be32_to_cpup(p++); /* machinename length */
+ dprintk("%s: machinename length %u\n", __func__, len2);
+ if (len2 < 0 || len2 >= len || len2 > 255)
+ return -EINVAL;
+ p += XDR_QUADLEN(len2); /* machinename */
+
+ mirror->uid = be32_to_cpup(p++);
+ mirror->gid = be32_to_cpup(p++);
+
+ len2 = be32_to_cpup(p++); /* gid array length */
+ dprintk("%s: gid array length %u\n", __func__, len2);
+ if (len2 > 16)
+ return -EINVAL;
+ for (; gid_it < len2; gid_it++)
+ mirror->gids[gid_it] = be32_to_cpup(p++);
+ } else {
+ return -EPROTONOSUPPORT;
+ }
+
+out_fill:
+ /* filling the rest of gids */
+ for (; gid_it < 16; gid_it++)
+ mirror->gids[gid_it] = -1;
+
+ return 0;
+}
+
+static void ff_layout_free_mirror_array(struct nfs4_ff_layout_segment *fls)
+{
+ int i;
+
+ if (fls->mirror_array) {
+ for (i = 0; i < fls->mirror_array_cnt; i++) {
+ /* normally mirror_ds is freed in
+ * .free_deviceid_node but we still do it here
+ * for .alloc_lseg error path */
+ if (fls->mirror_array[i]) {
+ kfree(fls->mirror_array[i]->fh_versions);
+ nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
+ kfree(fls->mirror_array[i]);
+ }
+ }
+ kfree(fls->mirror_array);
+ fls->mirror_array = NULL;
+ }
+}
+
+static int ff_layout_check_layout(struct nfs4_layoutget_res *lgr)
+{
+ int ret = 0;
+
+ dprintk("--> %s\n", __func__);
+
+ /* FIXME: remove this check when layout segment support is added */
+ if (lgr->range.offset != 0 ||
+ lgr->range.length != NFS4_MAX_UINT64) {
+ dprintk("%s Only whole file layouts supported. Use MDS i/o\n",
+ __func__);
+ ret = -EINVAL;
+ }
+
+ dprintk("--> %s returns %d\n", __func__, ret);
+ return ret;
+}
+
+static void _ff_layout_free_lseg(struct nfs4_ff_layout_segment *fls)
+{
+ if (fls) {
+ ff_layout_free_mirror_array(fls);
+ kfree(fls);
+ }
+}
+
+static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
+{
+ struct nfs4_ff_layout_mirror *tmp;
+ int i, j;
+
+ for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
+ for (j = i + 1; j < fls->mirror_array_cnt; j++)
+ if (fls->mirror_array[i]->efficiency <
+ fls->mirror_array[j]->efficiency) {
+ tmp = fls->mirror_array[i];
+ fls->mirror_array[i] = fls->mirror_array[j];
+ fls->mirror_array[j] = tmp;
+ }
+ }
+}
+
+static struct pnfs_layout_segment *
+ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
+ struct nfs4_layoutget_res *lgr,
+ gfp_t gfp_flags)
+{
+ struct pnfs_layout_segment *ret;
+ struct nfs4_ff_layout_segment *fls = NULL;
+ struct xdr_stream stream;
+ struct xdr_buf buf;
+ struct page *scratch;
+ u64 stripe_unit;
+ u32 mirror_array_cnt;
+ __be32 *p;
+ int i, rc;
+
+ dprintk("--> %s\n", __func__);
+ scratch = alloc_page(gfp_flags);
+ if (!scratch)
+ return ERR_PTR(-ENOMEM);
+
+ xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages,
+ lgr->layoutp->len);
+ xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+
+ /* stripe unit and mirror_array_cnt */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 8 + 4);
+ if (!p)
+ goto out_err_free;
+
+ p = xdr_decode_hyper(p, &stripe_unit);
+ mirror_array_cnt = be32_to_cpup(p++);
+ dprintk("%s: stripe_unit=%llu mirror_array_cnt=%u\n", __func__,
+ stripe_unit, mirror_array_cnt);
+
+ if (mirror_array_cnt > NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT ||
+ mirror_array_cnt == 0)
+ goto out_err_free;
+
+ rc = -ENOMEM;
+ fls = kzalloc(sizeof(*fls), gfp_flags);
+ if (!fls)
+ goto out_err_free;
+
+ fls->mirror_array_cnt = mirror_array_cnt;
+ fls->stripe_unit = stripe_unit;
+ fls->mirror_array = kcalloc(fls->mirror_array_cnt,
+ sizeof(fls->mirror_array[0]), gfp_flags);
+ if (fls->mirror_array == NULL)
+ goto out_err_free;
+
+ for (i = 0; i < fls->mirror_array_cnt; i++) {
+ struct nfs4_deviceid devid;
+ struct nfs4_deviceid_node *idnode;
+ u32 ds_count;
+ u32 fh_count;
+ int j;
+
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ ds_count = be32_to_cpup(p);
+
+ /* FIXME: allow for striping? */
+ if (ds_count != 1)
+ goto out_err_free;
+
+ fls->mirror_array[i] =
+ kzalloc(sizeof(struct nfs4_ff_layout_mirror),
+ gfp_flags);
+ if (fls->mirror_array[i] == NULL) {
+ rc = -ENOMEM;
+ goto out_err_free;
+ }
+
+ spin_lock_init(&fls->mirror_array[i]->lock);
+ fls->mirror_array[i]->ds_count = ds_count;
+
+ /* deviceid */
+ rc = decode_deviceid(&stream, &devid);
+ if (rc)
+ goto out_err_free;
+
+ idnode = nfs4_find_get_deviceid(NFS_SERVER(lh->plh_inode),
+ &devid, lh->plh_lc_cred,
+ gfp_flags);
+ /*
+ * upon success, mirror_ds is allocated by previous
+ * getdeviceinfo, or newly by .alloc_deviceid_node
+ * nfs4_find_get_deviceid failure is indeed getdeviceinfo falure
+ */
+ if (idnode)
+ fls->mirror_array[i]->mirror_ds =
+ FF_LAYOUT_MIRROR_DS(idnode);
+ else
+ goto out_err_free;
+
+ /* efficiency */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ fls->mirror_array[i]->efficiency = be32_to_cpup(p);
+
+ /* stateid */
+ rc = decode_stateid(&stream, &fls->mirror_array[i]->stateid);
+ if (rc)
+ goto out_err_free;
+
+ /* fh */
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ fh_count = be32_to_cpup(p);
+
+ fls->mirror_array[i]->fh_versions =
+ kzalloc(fh_count * sizeof(struct nfs_fh),
+ gfp_flags);
+ if (fls->mirror_array[i]->fh_versions == NULL) {
+ rc = -ENOMEM;
+ goto out_err_free;
+ }
+
+ for (j = 0; j < fh_count; j++) {
+ rc = decode_nfs_fh(&stream,
+ &fls->mirror_array[i]->fh_versions[j]);
+ if (rc)
+ goto out_err_free;
+ }
+
+ fls->mirror_array[i]->fh_versions_cnt = fh_count;
+
+ /* opaque_auth */
+ rc = ff_layout_parse_auth(&stream, fls->mirror_array[i]);
+ if (rc)
+ goto out_err_free;
+
+ dprintk("%s: uid %d gid %d\n", __func__,
+ fls->mirror_array[i]->uid,
+ fls->mirror_array[i]->gid);
+ }
+
+ ff_layout_sort_mirrors(fls);
+ rc = ff_layout_check_layout(lgr);
+ if (rc)
+ goto out_err_free;
+
+ ret = &fls->generic_hdr;
+ dprintk("<-- %s (success)\n", __func__);
+out_free_page:
+ __free_page(scratch);
+ return ret;
+out_err_free:
+ _ff_layout_free_lseg(fls);
+ ret = ERR_PTR(rc);
+ dprintk("<-- %s (%d)\n", __func__, rc);
+ goto out_free_page;
+}
+
+static bool ff_layout_has_rw_segments(struct pnfs_layout_hdr *layout)
+{
+ struct pnfs_layout_segment *lseg;
+
+ list_for_each_entry(lseg, &layout->plh_segs, pls_list)
+ if (lseg->pls_range.iomode == IOMODE_RW)
+ return true;
+
+ return false;
+}
+
+static void
+ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
+{
+ struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
+ int i;
+
+ dprintk("--> %s\n", __func__);
+
+ for (i = 0; i < fls->mirror_array_cnt; i++) {
+ if (fls->mirror_array[i]) {
+ nfs4_ff_layout_put_deviceid(fls->mirror_array[i]->mirror_ds);
+ fls->mirror_array[i]->mirror_ds = NULL;
+ if (fls->mirror_array[i]->cred) {
+ put_rpccred(fls->mirror_array[i]->cred);
+ fls->mirror_array[i]->cred = NULL;
+ }
+ }
+ }
+
+ if (lseg->pls_range.iomode == IOMODE_RW) {
+ struct nfs4_flexfile_layout *ffl;
+ struct inode *inode;
+
+ ffl = FF_LAYOUT_FROM_HDR(lseg->pls_layout);
+ inode = ffl->generic_hdr.plh_inode;
+ spin_lock(&inode->i_lock);
+ if (!ff_layout_has_rw_segments(lseg->pls_layout)) {
+ ffl->commit_info.nbuckets = 0;
+ kfree(ffl->commit_info.buckets);
+ ffl->commit_info.buckets = NULL;
+ }
+ spin_unlock(&inode->i_lock);
+ }
+ _ff_layout_free_lseg(fls);
+}
+
+/* Return 1 until we have multiple lsegs support */
+static int
+ff_layout_get_lseg_count(struct nfs4_ff_layout_segment *fls)
+{
+ return 1;
+}
+
+static int
+ff_layout_alloc_commit_info(struct pnfs_layout_segment *lseg,
+ struct nfs_commit_info *cinfo,
+ gfp_t gfp_flags)
+{
+ struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
+ struct pnfs_commit_bucket *buckets;
+ int size;
+
+ if (cinfo->ds->nbuckets != 0) {
+ /* This assumes there is only one RW lseg per file.
+ * To support multiple lseg per file, we need to
+ * change struct pnfs_commit_bucket to allow dynamic
+ * increasing nbuckets.
+ */
+ return 0;
+ }
+
+ size = ff_layout_get_lseg_count(fls) * FF_LAYOUT_MIRROR_COUNT(lseg);
+
+ buckets = kcalloc(size, sizeof(struct pnfs_commit_bucket),
+ gfp_flags);
+ if (!buckets)
+ return -ENOMEM;
+ else {
+ int i;
+
+ spin_lock(cinfo->lock);
+ if (cinfo->ds->nbuckets != 0)
+ kfree(buckets);
+ else {
+ cinfo->ds->buckets = buckets;
+ cinfo->ds->nbuckets = size;
+ for (i = 0; i < size; i++) {
+ INIT_LIST_HEAD(&buckets[i].written);
+ INIT_LIST_HEAD(&buckets[i].committing);
+ /* mark direct verifier as unset */
+ buckets[i].direct_verf.committed =
+ NFS_INVALID_STABLE_HOW;
+ }
+ }
+ spin_unlock(cinfo->lock);
+ return 0;
+ }
+}
+
+static struct nfs4_pnfs_ds *
+ff_layout_choose_best_ds_for_read(struct nfs_pageio_descriptor *pgio,
+ int *best_idx)
+{
+ struct nfs4_ff_layout_segment *fls;
+ struct nfs4_pnfs_ds *ds;
+ int idx;
+
+ fls = FF_LAYOUT_LSEG(pgio->pg_lseg);
+ /* mirrors are sorted by efficiency */
+ for (idx = 0; idx < fls->mirror_array_cnt; idx++) {
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, idx, false);
+ if (ds) {
+ *best_idx = idx;
+ return ds;
+ }
+ }
+
+ return NULL;
+}
+
+static void
+ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ struct nfs_pgio_mirror *pgm;
+ struct nfs4_ff_layout_mirror *mirror;
+ struct nfs4_pnfs_ds *ds;
+ int ds_idx;
+
+ /* Use full layout for now */
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ req->wb_context,
+ 0,
+ NFS4_MAX_UINT64,
+ IOMODE_READ,
+ GFP_KERNEL);
+ /* If no lseg, fall back to read through mds */
+ if (pgio->pg_lseg == NULL)
+ goto out_mds;
+
+ ds = ff_layout_choose_best_ds_for_read(pgio, &ds_idx);
+ if (!ds)
+ goto out_mds;
+ mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
+
+ pgio->pg_mirror_idx = ds_idx;
+
+ /* read always uses only one mirror - idx 0 for pgio layer */
+ pgm = &pgio->pg_mirrors[0];
+ pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
+
+ return;
+out_mds:
+ pnfs_put_lseg(pgio->pg_lseg);
+ pgio->pg_lseg = NULL;
+ nfs_pageio_reset_read_mds(pgio);
+}
+
+static void
+ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ struct nfs4_ff_layout_mirror *mirror;
+ struct nfs_pgio_mirror *pgm;
+ struct nfs_commit_info cinfo;
+ struct nfs4_pnfs_ds *ds;
+ int i;
+ int status;
+
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ req->wb_context,
+ 0,
+ NFS4_MAX_UINT64,
+ IOMODE_RW,
+ GFP_NOFS);
+ /* If no lseg, fall back to write through mds */
+ if (pgio->pg_lseg == NULL)
+ goto out_mds;
+
+ nfs_init_cinfo(&cinfo, pgio->pg_inode, pgio->pg_dreq);
+ status = ff_layout_alloc_commit_info(pgio->pg_lseg, &cinfo, GFP_NOFS);
+ if (status < 0)
+ goto out_mds;
+
+ /* Use a direct mapping of ds_idx to pgio mirror_idx */
+ if (WARN_ON_ONCE(pgio->pg_mirror_count !=
+ FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg)))
+ goto out_mds;
+
+ for (i = 0; i < pgio->pg_mirror_count; i++) {
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, i, true);
+ if (!ds)
+ goto out_mds;
+ pgm = &pgio->pg_mirrors[i];
+ mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
+ pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
+ }
+
+ return;
+
+out_mds:
+ pnfs_put_lseg(pgio->pg_lseg);
+ pgio->pg_lseg = NULL;
+ nfs_pageio_reset_write_mds(pgio);
+}
+
+static unsigned int
+ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *req)
+{
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ req->wb_context,
+ 0,
+ NFS4_MAX_UINT64,
+ IOMODE_RW,
+ GFP_NOFS);
+ if (pgio->pg_lseg)
+ return FF_LAYOUT_MIRROR_COUNT(pgio->pg_lseg);
+
+ /* no lseg means that pnfs is not in use, so no mirroring here */
+ pnfs_put_lseg(pgio->pg_lseg);
+ pgio->pg_lseg = NULL;
+ nfs_pageio_reset_write_mds(pgio);
+ return 1;
+}
+
+static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
+ .pg_init = ff_layout_pg_init_read,
+ .pg_test = pnfs_generic_pg_test,
+ .pg_doio = pnfs_generic_pg_readpages,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
+};
+
+static const struct nfs_pageio_ops ff_layout_pg_write_ops = {
+ .pg_init = ff_layout_pg_init_write,
+ .pg_test = pnfs_generic_pg_test,
+ .pg_doio = pnfs_generic_pg_writepages,
+ .pg_get_mirror_count = ff_layout_pg_get_mirror_count_write,
+ .pg_cleanup = pnfs_generic_pg_cleanup,
+};
+
+static void ff_layout_reset_write(struct nfs_pgio_header *hdr, bool retry_pnfs)
+{
+ struct rpc_task *task = &hdr->task;
+
+ pnfs_layoutcommit_inode(hdr->inode, false);
+
+ if (retry_pnfs) {
+ dprintk("%s Reset task %5u for i/o through pNFS "
+ "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
+ hdr->task.tk_pid,
+ hdr->inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(hdr->inode),
+ hdr->args.count,
+ (unsigned long long)hdr->args.offset);
+
+ if (!hdr->dreq) {
+ struct nfs_open_context *ctx;
+
+ ctx = nfs_list_entry(hdr->pages.next)->wb_context;
+ set_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
+ hdr->completion_ops->error_cleanup(&hdr->pages);
+ } else {
+ nfs_direct_set_resched_writes(hdr->dreq);
+ /* fake unstable write to let common nfs resend pages */
+ hdr->verf.committed = NFS_UNSTABLE;
+ hdr->good_bytes = 0;
+ }
+ return;
+ }
+
+ if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
+ dprintk("%s Reset task %5u for i/o through MDS "
+ "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
+ hdr->task.tk_pid,
+ hdr->inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(hdr->inode),
+ hdr->args.count,
+ (unsigned long long)hdr->args.offset);
+
+ task->tk_status = pnfs_write_done_resend_to_mds(hdr);
+ }
+}
+
+static void ff_layout_reset_read(struct nfs_pgio_header *hdr)
+{
+ struct rpc_task *task = &hdr->task;
+
+ pnfs_layoutcommit_inode(hdr->inode, false);
+
+ if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
+ dprintk("%s Reset task %5u for i/o through MDS "
+ "(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
+ hdr->task.tk_pid,
+ hdr->inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(hdr->inode),
+ hdr->args.count,
+ (unsigned long long)hdr->args.offset);
+
+ task->tk_status = pnfs_read_done_resend_to_mds(hdr);
+ }
+}
+
+static int ff_layout_async_handle_error_v4(struct rpc_task *task,
+ struct nfs4_state *state,
+ struct nfs_client *clp,
+ struct pnfs_layout_segment *lseg,
+ int idx)
+{
+ struct pnfs_layout_hdr *lo = lseg->pls_layout;
+ struct inode *inode = lo->plh_inode;
+ struct nfs_server *mds_server = NFS_SERVER(inode);
+
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs_client *mds_client = mds_server->nfs_client;
+ struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
+
+ if (task->tk_status >= 0)
+ return 0;
+
+ switch (task->tk_status) {
+ /* MDS state errors */
+ case -NFS4ERR_DELEG_REVOKED:
+ case -NFS4ERR_ADMIN_REVOKED:
+ case -NFS4ERR_BAD_STATEID:
+ if (state == NULL)
+ break;
+ nfs_remove_bad_delegation(state->inode);
+ case -NFS4ERR_OPENMODE:
+ if (state == NULL)
+ break;
+ if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
+ goto out_bad_stateid;
+ goto wait_on_recovery;
+ case -NFS4ERR_EXPIRED:
+ if (state != NULL) {
+ if (nfs4_schedule_stateid_recovery(mds_server, state) < 0)
+ goto out_bad_stateid;
+ }
+ nfs4_schedule_lease_recovery(mds_client);
+ goto wait_on_recovery;
+ /* DS session errors */
+ case -NFS4ERR_BADSESSION:
+ case -NFS4ERR_BADSLOT:
+ case -NFS4ERR_BAD_HIGH_SLOT:
+ case -NFS4ERR_DEADSESSION:
+ case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+ case -NFS4ERR_SEQ_FALSE_RETRY:
+ case -NFS4ERR_SEQ_MISORDERED:
+ dprintk("%s ERROR %d, Reset session. Exchangeid "
+ "flags 0x%x\n", __func__, task->tk_status,
+ clp->cl_exchange_flags);
+ nfs4_schedule_session_recovery(clp->cl_session, task->tk_status);
+ break;
+ case -NFS4ERR_DELAY:
+ case -NFS4ERR_GRACE:
+ rpc_delay(task, FF_LAYOUT_POLL_RETRY_MAX);
+ break;
+ case -NFS4ERR_RETRY_UNCACHED_REP:
+ break;
+ /* Invalidate Layout errors */
+ case -NFS4ERR_PNFS_NO_LAYOUT:
+ case -ESTALE: /* mapped NFS4ERR_STALE */
+ case -EBADHANDLE: /* mapped NFS4ERR_BADHANDLE */
+ case -EISDIR: /* mapped NFS4ERR_ISDIR */
+ case -NFS4ERR_FHEXPIRED:
+ case -NFS4ERR_WRONG_TYPE:
+ dprintk("%s Invalid layout error %d\n", __func__,
+ task->tk_status);
+ /*
+ * Destroy layout so new i/o will get a new layout.
+ * Layout will not be destroyed until all current lseg
+ * references are put. Mark layout as invalid to resend failed
+ * i/o and all i/o waiting on the slot table to the MDS until
+ * layout is destroyed and a new valid layout is obtained.
+ */
+ pnfs_destroy_layout(NFS_I(inode));
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ goto reset;
+ /* RPC connection errors */
+ case -ECONNREFUSED:
+ case -EHOSTDOWN:
+ case -EHOSTUNREACH:
+ case -ENETUNREACH:
+ case -EIO:
+ case -ETIMEDOUT:
+ case -EPIPE:
+ dprintk("%s DS connection error %d\n", __func__,
+ task->tk_status);
+ nfs4_mark_deviceid_unavailable(devid);
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ /* fall through */
+ default:
+ if (ff_layout_has_available_ds(lseg))
+ return -NFS4ERR_RESET_TO_PNFS;
+reset:
+ dprintk("%s Retry through MDS. Error %d\n", __func__,
+ task->tk_status);
+ return -NFS4ERR_RESET_TO_MDS;
+ }
+out:
+ task->tk_status = 0;
+ return -EAGAIN;
+out_bad_stateid:
+ task->tk_status = -EIO;
+ return 0;
+wait_on_recovery:
+ rpc_sleep_on(&mds_client->cl_rpcwaitq, task, NULL);
+ if (test_bit(NFS4CLNT_MANAGER_RUNNING, &mds_client->cl_state) == 0)
+ rpc_wake_up_queued_task(&mds_client->cl_rpcwaitq, task);
+ goto out;
+}
+
+/* Retry all errors through either pNFS or MDS except for -EJUKEBOX */
+static int ff_layout_async_handle_error_v3(struct rpc_task *task,
+ struct pnfs_layout_segment *lseg,
+ int idx)
+{
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+
+ if (task->tk_status >= 0)
+ return 0;
+
+ if (task->tk_status != -EJUKEBOX) {
+ dprintk("%s DS connection error %d\n", __func__,
+ task->tk_status);
+ nfs4_mark_deviceid_unavailable(devid);
+ if (ff_layout_has_available_ds(lseg))
+ return -NFS4ERR_RESET_TO_PNFS;
+ else
+ return -NFS4ERR_RESET_TO_MDS;
+ }
+
+ if (task->tk_status == -EJUKEBOX)
+ nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY);
+ task->tk_status = 0;
+ rpc_restart_call(task);
+ rpc_delay(task, NFS_JUKEBOX_RETRY_TIME);
+ return -EAGAIN;
+}
+
+static int ff_layout_async_handle_error(struct rpc_task *task,
+ struct nfs4_state *state,
+ struct nfs_client *clp,
+ struct pnfs_layout_segment *lseg,
+ int idx)
+{
+ int vers = clp->cl_nfs_mod->rpc_vers->number;
+
+ switch (vers) {
+ case 3:
+ return ff_layout_async_handle_error_v3(task, lseg, idx);
+ case 4:
+ return ff_layout_async_handle_error_v4(task, state, clp,
+ lseg, idx);
+ default:
+ /* should never happen */
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+}
+
+static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
+ int idx, u64 offset, u64 length,
+ u32 status, int opnum)
+{
+ struct nfs4_ff_layout_mirror *mirror;
+ int err;
+
+ mirror = FF_LAYOUT_COMP(lseg, idx);
+ err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
+ mirror, offset, length, status, opnum,
+ GFP_NOIO);
+ dprintk("%s: err %d op %d status %u\n", __func__, err, opnum, status);
+}
+
+/* NFS_PROTO call done callback routines */
+
+static int ff_layout_read_done_cb(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ struct inode *inode;
+ int err;
+
+ trace_nfs4_pnfs_read(hdr, task->tk_status);
+ if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
+ hdr->res.op_status = NFS4ERR_NXIO;
+ if (task->tk_status < 0 && hdr->res.op_status)
+ ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
+ hdr->args.offset, hdr->args.count,
+ hdr->res.op_status, OP_READ);
+ err = ff_layout_async_handle_error(task, hdr->args.context->state,
+ hdr->ds_clp, hdr->lseg,
+ hdr->pgio_mirror_idx);
+
+ switch (err) {
+ case -NFS4ERR_RESET_TO_PNFS:
+ set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &hdr->lseg->pls_layout->plh_flags);
+ pnfs_read_resend_pnfs(hdr);
+ return task->tk_status;
+ case -NFS4ERR_RESET_TO_MDS:
+ inode = hdr->lseg->pls_layout->plh_inode;
+ pnfs_error_mark_layout_for_return(inode, hdr->lseg);
+ ff_layout_reset_read(hdr);
+ return task->tk_status;
+ case -EAGAIN:
+ rpc_restart_call_prepare(task);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+/*
+ * We reference the rpc_cred of the first WRITE that triggers the need for
+ * a LAYOUTCOMMIT, and use it to send the layoutcommit compound.
+ * rfc5661 is not clear about which credential should be used.
+ *
+ * Flexlayout client should treat DS replied FILE_SYNC as DATA_SYNC, so
+ * to follow http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
+ * we always send layoutcommit after DS writes.
+ */
+static void
+ff_layout_set_layoutcommit(struct nfs_pgio_header *hdr)
+{
+ pnfs_set_layoutcommit(hdr);
+ dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
+ (unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
+}
+
+static bool
+ff_layout_reset_to_mds(struct pnfs_layout_segment *lseg, int idx)
+{
+ /* No mirroring for now */
+ struct nfs4_deviceid_node *node = FF_LAYOUT_DEVID_NODE(lseg, idx);
+
+ return ff_layout_test_devid_unavailable(node);
+}
+
+static int ff_layout_read_prepare_common(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
+ rpc_exit(task, -EIO);
+ return -EIO;
+ }
+ if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
+ dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+ if (ff_layout_has_available_ds(hdr->lseg))
+ pnfs_read_resend_pnfs(hdr);
+ else
+ ff_layout_reset_read(hdr);
+ rpc_exit(task, 0);
+ return -EAGAIN;
+ }
+ hdr->pgio_done_cb = ff_layout_read_done_cb;
+
+ return 0;
+}
+
+/*
+ * Call ops for the async read/write cases
+ * In the case of dense layouts, the offset needs to be reset to its
+ * original value.
+ */
+static void ff_layout_read_prepare_v3(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_read_prepare_common(task, hdr))
+ return;
+
+ rpc_call_start(task);
+}
+
+static int ff_layout_setup_sequence(struct nfs_client *ds_clp,
+ struct nfs4_sequence_args *args,
+ struct nfs4_sequence_res *res,
+ struct rpc_task *task)
+{
+ if (ds_clp->cl_session)
+ return nfs41_setup_sequence(ds_clp->cl_session,
+ args,
+ res,
+ task);
+ return nfs40_setup_sequence(ds_clp->cl_slot_tbl,
+ args,
+ res,
+ task);
+}
+
+static void ff_layout_read_prepare_v4(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_read_prepare_common(task, hdr))
+ return;
+
+ if (ff_layout_setup_sequence(hdr->ds_clp,
+ &hdr->args.seq_args,
+ &hdr->res.seq_res,
+ task))
+ return;
+
+ if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
+ hdr->args.lock_context, FMODE_READ) == -EIO)
+ rpc_exit(task, -EIO); /* lost lock, terminate I/O */
+}
+
+static void ff_layout_read_call_done(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
+
+ if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
+ task->tk_status == 0) {
+ nfs4_sequence_done(task, &hdr->res.seq_res);
+ return;
+ }
+
+ /* Note this may cause RPC to be resent */
+ hdr->mds_ops->rpc_call_done(task, hdr);
+}
+
+static void ff_layout_read_count_stats(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ rpc_count_iostats_metrics(task,
+ &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_READ]);
+}
+
+static int ff_layout_write_done_cb(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ struct inode *inode;
+ int err;
+
+ trace_nfs4_pnfs_write(hdr, task->tk_status);
+ if (task->tk_status == -ETIMEDOUT && !hdr->res.op_status)
+ hdr->res.op_status = NFS4ERR_NXIO;
+ if (task->tk_status < 0 && hdr->res.op_status)
+ ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
+ hdr->args.offset, hdr->args.count,
+ hdr->res.op_status, OP_WRITE);
+ err = ff_layout_async_handle_error(task, hdr->args.context->state,
+ hdr->ds_clp, hdr->lseg,
+ hdr->pgio_mirror_idx);
+
+ switch (err) {
+ case -NFS4ERR_RESET_TO_PNFS:
+ case -NFS4ERR_RESET_TO_MDS:
+ inode = hdr->lseg->pls_layout->plh_inode;
+ pnfs_error_mark_layout_for_return(inode, hdr->lseg);
+ if (err == -NFS4ERR_RESET_TO_PNFS) {
+ pnfs_set_retry_layoutget(hdr->lseg->pls_layout);
+ ff_layout_reset_write(hdr, true);
+ } else {
+ pnfs_clear_retry_layoutget(hdr->lseg->pls_layout);
+ ff_layout_reset_write(hdr, false);
+ }
+ return task->tk_status;
+ case -EAGAIN:
+ rpc_restart_call_prepare(task);
+ return -EAGAIN;
+ }
+
+ if (hdr->res.verf->committed == NFS_FILE_SYNC ||
+ hdr->res.verf->committed == NFS_DATA_SYNC)
+ ff_layout_set_layoutcommit(hdr);
+
+ return 0;
+}
+
+static int ff_layout_commit_done_cb(struct rpc_task *task,
+ struct nfs_commit_data *data)
+{
+ struct inode *inode;
+ int err;
+
+ trace_nfs4_pnfs_commit_ds(data, task->tk_status);
+ if (task->tk_status == -ETIMEDOUT && !data->res.op_status)
+ data->res.op_status = NFS4ERR_NXIO;
+ if (task->tk_status < 0 && data->res.op_status)
+ ff_layout_io_track_ds_error(data->lseg, data->ds_commit_index,
+ data->args.offset, data->args.count,
+ data->res.op_status, OP_COMMIT);
+ err = ff_layout_async_handle_error(task, NULL, data->ds_clp,
+ data->lseg, data->ds_commit_index);
+
+ switch (err) {
+ case -NFS4ERR_RESET_TO_PNFS:
+ case -NFS4ERR_RESET_TO_MDS:
+ inode = data->lseg->pls_layout->plh_inode;
+ pnfs_error_mark_layout_for_return(inode, data->lseg);
+ if (err == -NFS4ERR_RESET_TO_PNFS)
+ pnfs_set_retry_layoutget(data->lseg->pls_layout);
+ else
+ pnfs_clear_retry_layoutget(data->lseg->pls_layout);
+ pnfs_generic_prepare_to_resend_writes(data);
+ return -EAGAIN;
+ case -EAGAIN:
+ rpc_restart_call_prepare(task);
+ return -EAGAIN;
+ }
+
+ if (data->verf.committed == NFS_UNSTABLE)
+ pnfs_commit_set_layoutcommit(data);
+
+ return 0;
+}
+
+static int ff_layout_write_prepare_common(struct rpc_task *task,
+ struct nfs_pgio_header *hdr)
+{
+ if (unlikely(test_bit(NFS_CONTEXT_BAD, &hdr->args.context->flags))) {
+ rpc_exit(task, -EIO);
+ return -EIO;
+ }
+
+ if (ff_layout_reset_to_mds(hdr->lseg, hdr->pgio_mirror_idx)) {
+ bool retry_pnfs;
+
+ retry_pnfs = ff_layout_has_available_ds(hdr->lseg);
+ dprintk("%s task %u reset io to %s\n", __func__,
+ task->tk_pid, retry_pnfs ? "pNFS" : "MDS");
+ ff_layout_reset_write(hdr, retry_pnfs);
+ rpc_exit(task, 0);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+static void ff_layout_write_prepare_v3(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_write_prepare_common(task, hdr))
+ return;
+
+ rpc_call_start(task);
+}
+
+static void ff_layout_write_prepare_v4(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (ff_layout_write_prepare_common(task, hdr))
+ return;
+
+ if (ff_layout_setup_sequence(hdr->ds_clp,
+ &hdr->args.seq_args,
+ &hdr->res.seq_res,
+ task))
+ return;
+
+ if (nfs4_set_rw_stateid(&hdr->args.stateid, hdr->args.context,
+ hdr->args.lock_context, FMODE_WRITE) == -EIO)
+ rpc_exit(task, -EIO); /* lost lock, terminate I/O */
+}
+
+static void ff_layout_write_call_done(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ if (test_bit(NFS_IOHDR_REDO, &hdr->flags) &&
+ task->tk_status == 0) {
+ nfs4_sequence_done(task, &hdr->res.seq_res);
+ return;
+ }
+
+ /* Note this may cause RPC to be resent */
+ hdr->mds_ops->rpc_call_done(task, hdr);
+}
+
+static void ff_layout_write_count_stats(struct rpc_task *task, void *data)
+{
+ struct nfs_pgio_header *hdr = data;
+
+ rpc_count_iostats_metrics(task,
+ &NFS_CLIENT(hdr->inode)->cl_metrics[NFSPROC4_CLNT_WRITE]);
+}
+
+static void ff_layout_commit_prepare_v3(struct rpc_task *task, void *data)
+{
+ rpc_call_start(task);
+}
+
+static void ff_layout_commit_prepare_v4(struct rpc_task *task, void *data)
+{
+ struct nfs_commit_data *wdata = data;
+
+ ff_layout_setup_sequence(wdata->ds_clp,
+ &wdata->args.seq_args,
+ &wdata->res.seq_res,
+ task);
+}
+
+static void ff_layout_commit_count_stats(struct rpc_task *task, void *data)
+{
+ struct nfs_commit_data *cdata = data;
+
+ rpc_count_iostats_metrics(task,
+ &NFS_CLIENT(cdata->inode)->cl_metrics[NFSPROC4_CLNT_COMMIT]);
+}
+
+static const struct rpc_call_ops ff_layout_read_call_ops_v3 = {
+ .rpc_call_prepare = ff_layout_read_prepare_v3,
+ .rpc_call_done = ff_layout_read_call_done,
+ .rpc_count_stats = ff_layout_read_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_read_call_ops_v4 = {
+ .rpc_call_prepare = ff_layout_read_prepare_v4,
+ .rpc_call_done = ff_layout_read_call_done,
+ .rpc_count_stats = ff_layout_read_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_write_call_ops_v3 = {
+ .rpc_call_prepare = ff_layout_write_prepare_v3,
+ .rpc_call_done = ff_layout_write_call_done,
+ .rpc_count_stats = ff_layout_write_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_write_call_ops_v4 = {
+ .rpc_call_prepare = ff_layout_write_prepare_v4,
+ .rpc_call_done = ff_layout_write_call_done,
+ .rpc_count_stats = ff_layout_write_count_stats,
+ .rpc_release = pnfs_generic_rw_release,
+};
+
+static const struct rpc_call_ops ff_layout_commit_call_ops_v3 = {
+ .rpc_call_prepare = ff_layout_commit_prepare_v3,
+ .rpc_call_done = pnfs_generic_write_commit_done,
+ .rpc_count_stats = ff_layout_commit_count_stats,
+ .rpc_release = pnfs_generic_commit_release,
+};
+
+static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
+ .rpc_call_prepare = ff_layout_commit_prepare_v4,
+ .rpc_call_done = pnfs_generic_write_commit_done,
+ .rpc_count_stats = ff_layout_commit_count_stats,
+ .rpc_release = pnfs_generic_commit_release,
+};
+
+static enum pnfs_try_status
+ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
+{
+ struct pnfs_layout_segment *lseg = hdr->lseg;
+ struct nfs4_pnfs_ds *ds;
+ struct rpc_clnt *ds_clnt;
+ struct rpc_cred *ds_cred;
+ loff_t offset = hdr->args.offset;
+ u32 idx = hdr->pgio_mirror_idx;
+ int vers;
+ struct nfs_fh *fh;
+
+ dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
+ __func__, hdr->inode->i_ino,
+ hdr->args.pgbase, (size_t)hdr->args.count, offset);
+
+ ds = nfs4_ff_layout_prepare_ds(lseg, idx, false);
+ if (!ds)
+ goto out_failed;
+
+ ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
+ hdr->inode);
+ if (IS_ERR(ds_clnt))
+ goto out_failed;
+
+ ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
+ if (IS_ERR(ds_cred))
+ goto out_failed;
+
+ vers = nfs4_ff_layout_ds_version(lseg, idx);
+
+ dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
+ ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count), vers);
+
+ atomic_inc(&ds->ds_clp->cl_count);
+ hdr->ds_clp = ds->ds_clp;
+ fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
+ if (fh)
+ hdr->args.fh = fh;
+
+ /*
+ * Note that if we ever decide to split across DSes,
+ * then we may need to handle dense-like offsets.
+ */
+ hdr->args.offset = offset;
+ hdr->mds_offset = offset;
+
+ /* Perform an asynchronous read to ds */
+ nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ vers == 3 ? &ff_layout_read_call_ops_v3 :
+ &ff_layout_read_call_ops_v4,
+ 0, RPC_TASK_SOFTCONN);
+
+ return PNFS_ATTEMPTED;
+
+out_failed:
+ if (ff_layout_has_available_ds(lseg))
+ return PNFS_TRY_AGAIN;
+ return PNFS_NOT_ATTEMPTED;
+}
+
+/* Perform async writes. */
+static enum pnfs_try_status
+ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+{
+ struct pnfs_layout_segment *lseg = hdr->lseg;
+ struct nfs4_pnfs_ds *ds;
+ struct rpc_clnt *ds_clnt;
+ struct rpc_cred *ds_cred;
+ loff_t offset = hdr->args.offset;
+ int vers;
+ struct nfs_fh *fh;
+ int idx = hdr->pgio_mirror_idx;
+
+ ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
+ if (!ds)
+ return PNFS_NOT_ATTEMPTED;
+
+ ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
+ hdr->inode);
+ if (IS_ERR(ds_clnt))
+ return PNFS_NOT_ATTEMPTED;
+
+ ds_cred = ff_layout_get_ds_cred(lseg, idx, hdr->cred);
+ if (IS_ERR(ds_cred))
+ return PNFS_NOT_ATTEMPTED;
+
+ vers = nfs4_ff_layout_ds_version(lseg, idx);
+
+ dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d vers %d\n",
+ __func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
+ offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count),
+ vers);
+
+ hdr->pgio_done_cb = ff_layout_write_done_cb;
+ atomic_inc(&ds->ds_clp->cl_count);
+ hdr->ds_clp = ds->ds_clp;
+ hdr->ds_commit_idx = idx;
+ fh = nfs4_ff_layout_select_ds_fh(lseg, idx);
+ if (fh)
+ hdr->args.fh = fh;
+
+ /*
+ * Note that if we ever decide to split across DSes,
+ * then we may need to handle dense-like offsets.
+ */
+ hdr->args.offset = offset;
+
+ /* Perform an asynchronous write */
+ nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ vers == 3 ? &ff_layout_write_call_ops_v3 :
+ &ff_layout_write_call_ops_v4,
+ sync, RPC_TASK_SOFTCONN);
+ return PNFS_ATTEMPTED;
+}
+
+static void
+ff_layout_mark_request_commit(struct nfs_page *req,
+ struct pnfs_layout_segment *lseg,
+ struct nfs_commit_info *cinfo,
+ u32 ds_commit_idx)
+{
+ struct list_head *list;
+ struct pnfs_commit_bucket *buckets;
+
+ spin_lock(cinfo->lock);
+ buckets = cinfo->ds->buckets;
+ list = &buckets[ds_commit_idx].written;
+ if (list_empty(list)) {
+ /* Non-empty buckets hold a reference on the lseg. That ref
+ * is normally transferred to the COMMIT call and released
+ * there. It could also be released if the last req is pulled
+ * off due to a rewrite, in which case it will be done in
+ * pnfs_common_clear_request_commit
+ */
+ WARN_ON_ONCE(buckets[ds_commit_idx].wlseg != NULL);
+ buckets[ds_commit_idx].wlseg = pnfs_get_lseg(lseg);
+ }
+ set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
+ cinfo->ds->nwritten++;
+
+ /* nfs_request_add_commit_list(). We need to add req to list without
+ * dropping cinfo lock.
+ */
+ set_bit(PG_CLEAN, &(req)->wb_flags);
+ nfs_list_add_request(req, list);
+ cinfo->mds->ncommit++;
+ spin_unlock(cinfo->lock);
+ if (!cinfo->dreq) {
+ inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+ inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
+ BDI_RECLAIMABLE);
+ __mark_inode_dirty(req->wb_context->dentry->d_inode,
+ I_DIRTY_DATASYNC);
+ }
+}
+
+static u32 calc_ds_index_from_commit(struct pnfs_layout_segment *lseg, u32 i)
+{
+ return i;
+}
+
+static struct nfs_fh *
+select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
+{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+
+ /* FIXME: Assume that there is only one NFS version available
+ * for the DS.
+ */
+ return &flseg->mirror_array[i]->fh_versions[0];
+}
+
+static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
+{
+ struct pnfs_layout_segment *lseg = data->lseg;
+ struct nfs4_pnfs_ds *ds;
+ struct rpc_clnt *ds_clnt;
+ struct rpc_cred *ds_cred;
+ u32 idx;
+ int vers;
+ struct nfs_fh *fh;
+
+ idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
+ ds = nfs4_ff_layout_prepare_ds(lseg, idx, true);
+ if (!ds)
+ goto out_err;
+
+ ds_clnt = nfs4_ff_find_or_create_ds_client(lseg, idx, ds->ds_clp,
+ data->inode);
+ if (IS_ERR(ds_clnt))
+ goto out_err;
+
+ ds_cred = ff_layout_get_ds_cred(lseg, idx, data->cred);
+ if (IS_ERR(ds_cred))
+ goto out_err;
+
+ vers = nfs4_ff_layout_ds_version(lseg, idx);
+
+ dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
+ data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count),
+ vers);
+ data->commit_done_cb = ff_layout_commit_done_cb;
+ data->cred = ds_cred;
+ atomic_inc(&ds->ds_clp->cl_count);
+ data->ds_clp = ds->ds_clp;
+ fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
+ if (fh)
+ data->args.fh = fh;
+ return nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
+ vers == 3 ? &ff_layout_commit_call_ops_v3 :
+ &ff_layout_commit_call_ops_v4,
+ how, RPC_TASK_SOFTCONN);
+out_err:
+ pnfs_generic_prepare_to_resend_writes(data);
+ pnfs_generic_commit_release(data);
+ return -EAGAIN;
+}
+
+static int
+ff_layout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
+ int how, struct nfs_commit_info *cinfo)
+{
+ return pnfs_generic_commit_pagelist(inode, mds_pages, how, cinfo,
+ ff_layout_initiate_commit);
+}
+
+static struct pnfs_ds_commit_info *
+ff_layout_get_ds_info(struct inode *inode)
+{
+ struct pnfs_layout_hdr *layout = NFS_I(inode)->layout;
+
+ if (layout == NULL)
+ return NULL;
+
+ return &FF_LAYOUT_FROM_HDR(layout)->commit_info;
+}
+
+static void
+ff_layout_free_deveiceid_node(struct nfs4_deviceid_node *d)
+{
+ nfs4_ff_layout_free_deviceid(container_of(d, struct nfs4_ff_layout_ds,
+ id_node));
+}
+
+static int ff_layout_encode_ioerr(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ struct pnfs_layout_hdr *hdr = &flo->generic_hdr;
+ __be32 *start;
+ int count = 0, ret = 0;
+
+ start = xdr_reserve_space(xdr, 4);
+ if (unlikely(!start))
+ return -E2BIG;
+
+ /* This assume we always return _ALL_ layouts */
+ spin_lock(&hdr->plh_inode->i_lock);
+ ret = ff_layout_encode_ds_ioerr(flo, xdr, &count, &args->range);
+ spin_unlock(&hdr->plh_inode->i_lock);
+
+ *start = cpu_to_be32(count);
+
+ return ret;
+}
+
+/* report nothing for now */
+static void ff_layout_encode_iostats(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 4);
+ if (likely(p))
+ *p = cpu_to_be32(0);
+}
+
+static struct nfs4_deviceid_node *
+ff_layout_alloc_deviceid_node(struct nfs_server *server,
+ struct pnfs_device *pdev, gfp_t gfp_flags)
+{
+ struct nfs4_ff_layout_ds *dsaddr;
+
+ dsaddr = nfs4_ff_alloc_deviceid_node(server, pdev, gfp_flags);
+ if (!dsaddr)
+ return NULL;
+ return &dsaddr->id_node;
+}
+
+static void
+ff_layout_encode_layoutreturn(struct pnfs_layout_hdr *lo,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ struct nfs4_flexfile_layout *flo = FF_LAYOUT_FROM_HDR(lo);
+ __be32 *start;
+
+ dprintk("%s: Begin\n", __func__);
+ start = xdr_reserve_space(xdr, 4);
+ BUG_ON(!start);
+
+ if (ff_layout_encode_ioerr(flo, xdr, args))
+ goto out;
+
+ ff_layout_encode_iostats(flo, xdr, args);
+out:
+ *start = cpu_to_be32((xdr->p - start - 1) * 4);
+ dprintk("%s: Return\n", __func__);
+}
+
+static struct pnfs_layoutdriver_type flexfilelayout_type = {
+ .id = LAYOUT_FLEX_FILES,
+ .name = "LAYOUT_FLEX_FILES",
+ .owner = THIS_MODULE,
+ .alloc_layout_hdr = ff_layout_alloc_layout_hdr,
+ .free_layout_hdr = ff_layout_free_layout_hdr,
+ .alloc_lseg = ff_layout_alloc_lseg,
+ .free_lseg = ff_layout_free_lseg,
+ .pg_read_ops = &ff_layout_pg_read_ops,
+ .pg_write_ops = &ff_layout_pg_write_ops,
+ .get_ds_info = ff_layout_get_ds_info,
+ .free_deviceid_node = ff_layout_free_deveiceid_node,
+ .mark_request_commit = ff_layout_mark_request_commit,
+ .clear_request_commit = pnfs_generic_clear_request_commit,
+ .scan_commit_lists = pnfs_generic_scan_commit_lists,
+ .recover_commit_reqs = pnfs_generic_recover_commit_reqs,
+ .commit_pagelist = ff_layout_commit_pagelist,
+ .read_pagelist = ff_layout_read_pagelist,
+ .write_pagelist = ff_layout_write_pagelist,
+ .alloc_deviceid_node = ff_layout_alloc_deviceid_node,
+ .encode_layoutreturn = ff_layout_encode_layoutreturn,
+};
+
+static int __init nfs4flexfilelayout_init(void)
+{
+ printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Registering...\n",
+ __func__);
+ return pnfs_register_layoutdriver(&flexfilelayout_type);
+}
+
+static void __exit nfs4flexfilelayout_exit(void)
+{
+ printk(KERN_INFO "%s: NFSv4 Flexfile Layout Driver Unregistering...\n",
+ __func__);
+ pnfs_unregister_layoutdriver(&flexfilelayout_type);
+}
+
+MODULE_ALIAS("nfs-layouttype4-4");
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("The NFSv4 flexfile layout driver");
+
+module_init(nfs4flexfilelayout_init);
+module_exit(nfs4flexfilelayout_exit);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
new file mode 100644
index 0000000..712fc55
--- /dev/null
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -0,0 +1,158 @@
+/*
+ * NFSv4 flexfile layout driver data structures.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <[email protected]>
+ */
+
+#ifndef FS_NFS_NFS4FLEXFILELAYOUT_H
+#define FS_NFS_NFS4FLEXFILELAYOUT_H
+
+#include "../pnfs.h"
+
+/* XXX: Let's filter out insanely large mirror count for now to avoid oom
+ * due to network error etc. */
+#define NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT 4096
+
+struct nfs4_ff_ds_version {
+ u32 version;
+ u32 minor_version;
+ u32 rsize;
+ u32 wsize;
+ bool tightly_coupled;
+};
+
+/* chained in global deviceid hlist */
+struct nfs4_ff_layout_ds {
+ struct nfs4_deviceid_node id_node;
+ u32 ds_versions_cnt;
+ struct nfs4_ff_ds_version *ds_versions;
+ struct nfs4_pnfs_ds *ds;
+};
+
+struct nfs4_ff_layout_ds_err {
+ struct list_head list; /* linked in mirror error_list */
+ u64 offset;
+ u64 length;
+ int status;
+ enum nfs_opnum4 opnum;
+ nfs4_stateid stateid;
+ struct nfs4_deviceid deviceid;
+};
+
+struct nfs4_ff_layout_mirror {
+ u32 ds_count;
+ u32 efficiency;
+ struct nfs4_ff_layout_ds *mirror_ds;
+ u32 fh_versions_cnt;
+ struct nfs_fh *fh_versions;
+ nfs4_stateid stateid;
+ union {
+ struct { /* same as struct unx_cred */
+ u32 uid; /* -1 iff AUTH_NONE */
+ u32 gid; /* -1 iff AUTH_NONE */
+ u32 gids[16];
+ };
+ };
+ struct rpc_cred *cred;
+ spinlock_t lock;
+};
+
+struct nfs4_ff_layout_segment {
+ struct pnfs_layout_segment generic_hdr;
+ u64 stripe_unit;
+ u32 mirror_array_cnt;
+ struct nfs4_ff_layout_mirror **mirror_array;
+};
+
+struct nfs4_flexfile_layout {
+ struct pnfs_layout_hdr generic_hdr;
+ struct pnfs_ds_commit_info commit_info;
+ struct list_head error_list; /* nfs4_ff_layout_ds_err */
+};
+
+static inline struct nfs4_flexfile_layout *
+FF_LAYOUT_FROM_HDR(struct pnfs_layout_hdr *lo)
+{
+ return container_of(lo, struct nfs4_flexfile_layout, generic_hdr);
+}
+
+static inline struct nfs4_ff_layout_segment *
+FF_LAYOUT_LSEG(struct pnfs_layout_segment *lseg)
+{
+ return container_of(lseg,
+ struct nfs4_ff_layout_segment,
+ generic_hdr);
+}
+
+static inline struct nfs4_deviceid_node *
+FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
+{
+ if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt ||
+ FF_LAYOUT_LSEG(lseg)->mirror_array[idx] == NULL ||
+ FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds == NULL)
+ return NULL;
+ return &FF_LAYOUT_LSEG(lseg)->mirror_array[idx]->mirror_ds->id_node;
+}
+
+static inline struct nfs4_ff_layout_ds *
+FF_LAYOUT_MIRROR_DS(struct nfs4_deviceid_node *node)
+{
+ return container_of(node, struct nfs4_ff_layout_ds, id_node);
+}
+
+static inline struct nfs4_ff_layout_mirror *
+FF_LAYOUT_COMP(struct pnfs_layout_segment *lseg, u32 idx)
+{
+ if (idx >= FF_LAYOUT_LSEG(lseg)->mirror_array_cnt)
+ return NULL;
+ return FF_LAYOUT_LSEG(lseg)->mirror_array[idx];
+}
+
+static inline u32
+FF_LAYOUT_MIRROR_COUNT(struct pnfs_layout_segment *lseg)
+{
+ return FF_LAYOUT_LSEG(lseg)->mirror_array_cnt;
+}
+
+static inline bool
+ff_layout_test_devid_unavailable(struct nfs4_deviceid_node *node)
+{
+ return nfs4_test_deviceid_unavailable(node);
+}
+
+static inline int
+nfs4_ff_layout_ds_version(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+ return FF_LAYOUT_COMP(lseg, ds_idx)->mirror_ds->ds_versions[0].version;
+}
+
+struct nfs4_ff_layout_ds *
+nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
+ gfp_t gfp_flags);
+void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
+void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
+int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
+ struct nfs4_ff_layout_mirror *mirror, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ gfp_t gfp_flags);
+int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr, int *count,
+ const struct pnfs_layout_range *range);
+struct nfs_fh *
+nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx);
+
+struct nfs4_pnfs_ds *
+nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ bool fail_return);
+
+struct rpc_clnt *
+nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg,
+ u32 ds_idx,
+ struct nfs_client *ds_clp,
+ struct inode *inode);
+struct rpc_cred *ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg,
+ u32 ds_idx, struct rpc_cred *mdscred);
+bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg);
+#endif /* FS_NFS_NFS4FLEXFILELAYOUT_H */
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
new file mode 100644
index 0000000..5dae5c2
--- /dev/null
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -0,0 +1,552 @@
+/*
+ * Device operations for the pnfs nfs4 file layout driver.
+ *
+ * Copyright (c) 2014, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <[email protected]>
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/vmalloc.h>
+#include <linux/module.h>
+#include <linux/sunrpc/addr.h>
+
+#include "../internal.h"
+#include "../nfs4session.h"
+#include "flexfilelayout.h"
+
+#define NFSDBG_FACILITY NFSDBG_PNFS_LD
+
+static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
+static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
+
+void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
+{
+ if (mirror_ds)
+ nfs4_put_deviceid_node(&mirror_ds->id_node);
+}
+
+void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds)
+{
+ nfs4_print_deviceid(&mirror_ds->id_node.deviceid);
+ nfs4_pnfs_ds_put(mirror_ds->ds);
+ kfree(mirror_ds);
+}
+
+/* Decode opaque device data and construct new_ds using it */
+struct nfs4_ff_layout_ds *
+nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
+ gfp_t gfp_flags)
+{
+ struct xdr_stream stream;
+ struct xdr_buf buf;
+ struct page *scratch;
+ struct list_head dsaddrs;
+ struct nfs4_pnfs_ds_addr *da;
+ struct nfs4_ff_layout_ds *new_ds = NULL;
+ struct nfs4_ff_ds_version *ds_versions = NULL;
+ u32 mp_count;
+ u32 version_count;
+ __be32 *p;
+ int i, ret = -ENOMEM;
+
+ /* set up xdr stream */
+ scratch = alloc_page(gfp_flags);
+ if (!scratch)
+ goto out_err;
+
+ new_ds = kzalloc(sizeof(struct nfs4_ff_layout_ds), gfp_flags);
+ if (!new_ds)
+ goto out_scratch;
+
+ nfs4_init_deviceid_node(&new_ds->id_node,
+ server,
+ &pdev->dev_id);
+ INIT_LIST_HEAD(&dsaddrs);
+
+ xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen);
+ xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+
+ /* multipath count */
+ p = xdr_inline_decode(&stream, 4);
+ if (unlikely(!p))
+ goto out_err_drain_dsaddrs;
+ mp_count = be32_to_cpup(p);
+ dprintk("%s: multipath ds count %d\n", __func__, mp_count);
+
+ for (i = 0; i < mp_count; i++) {
+ /* multipath ds */
+ da = nfs4_decode_mp_ds_addr(server->nfs_client->cl_net,
+ &stream, gfp_flags);
+ if (da)
+ list_add_tail(&da->da_node, &dsaddrs);
+ }
+ if (list_empty(&dsaddrs)) {
+ dprintk("%s: no suitable DS addresses found\n",
+ __func__);
+ ret = -ENOMEDIUM;
+ goto out_err_drain_dsaddrs;
+ }
+
+ /* version count */
+ p = xdr_inline_decode(&stream, 4);
+ if (unlikely(!p))
+ goto out_err_drain_dsaddrs;
+ version_count = be32_to_cpup(p);
+ dprintk("%s: version count %d\n", __func__, version_count);
+
+ ds_versions = kzalloc(version_count * sizeof(struct nfs4_ff_ds_version),
+ gfp_flags);
+ if (!ds_versions)
+ goto out_scratch;
+
+ for (i = 0; i < version_count; i++) {
+ /* 20 = version(4) + minor_version(4) + rsize(4) + wsize(4) +
+ * tightly_coupled(4) */
+ p = xdr_inline_decode(&stream, 20);
+ if (unlikely(!p))
+ goto out_err_drain_dsaddrs;
+ ds_versions[i].version = be32_to_cpup(p++);
+ ds_versions[i].minor_version = be32_to_cpup(p++);
+ ds_versions[i].rsize = nfs_block_size(be32_to_cpup(p++), NULL);
+ ds_versions[i].wsize = nfs_block_size(be32_to_cpup(p++), NULL);
+ ds_versions[i].tightly_coupled = be32_to_cpup(p);
+
+ if (ds_versions[i].rsize > NFS_MAX_FILE_IO_SIZE)
+ ds_versions[i].rsize = NFS_MAX_FILE_IO_SIZE;
+ if (ds_versions[i].wsize > NFS_MAX_FILE_IO_SIZE)
+ ds_versions[i].wsize = NFS_MAX_FILE_IO_SIZE;
+
+ if (ds_versions[i].version != 3 || ds_versions[i].minor_version != 0) {
+ dprintk("%s: [%d] unsupported ds version %d-%d\n", __func__,
+ i, ds_versions[i].version,
+ ds_versions[i].minor_version);
+ ret = -EPROTONOSUPPORT;
+ goto out_err_drain_dsaddrs;
+ }
+
+ dprintk("%s: [%d] vers %u minor_ver %u rsize %u wsize %u coupled %d\n",
+ __func__, i, ds_versions[i].version,
+ ds_versions[i].minor_version,
+ ds_versions[i].rsize,
+ ds_versions[i].wsize,
+ ds_versions[i].tightly_coupled);
+ }
+
+ new_ds->ds_versions = ds_versions;
+ new_ds->ds_versions_cnt = version_count;
+
+ new_ds->ds = nfs4_pnfs_ds_add(&dsaddrs, gfp_flags);
+ if (!new_ds->ds)
+ goto out_err_drain_dsaddrs;
+
+ /* If DS was already in cache, free ds addrs */
+ while (!list_empty(&dsaddrs)) {
+ da = list_first_entry(&dsaddrs,
+ struct nfs4_pnfs_ds_addr,
+ da_node);
+ list_del_init(&da->da_node);
+ kfree(da->da_remotestr);
+ kfree(da);
+ }
+
+ __free_page(scratch);
+ return new_ds;
+
+out_err_drain_dsaddrs:
+ while (!list_empty(&dsaddrs)) {
+ da = list_first_entry(&dsaddrs, struct nfs4_pnfs_ds_addr,
+ da_node);
+ list_del_init(&da->da_node);
+ kfree(da->da_remotestr);
+ kfree(da);
+ }
+
+ kfree(ds_versions);
+out_scratch:
+ __free_page(scratch);
+out_err:
+ kfree(new_ds);
+
+ dprintk("%s ERROR: returning %d\n", __func__, ret);
+ return NULL;
+}
+
+static u64
+end_offset(u64 start, u64 len)
+{
+ u64 end;
+
+ end = start + len;
+ return end >= start ? end : NFS4_MAX_UINT64;
+}
+
+static void extend_ds_error(struct nfs4_ff_layout_ds_err *err,
+ u64 offset, u64 length)
+{
+ u64 end;
+
+ end = max_t(u64, end_offset(err->offset, err->length),
+ end_offset(offset, length));
+ err->offset = min_t(u64, err->offset, offset);
+ err->length = end - err->offset;
+}
+
+static bool ds_error_can_merge(struct nfs4_ff_layout_ds_err *err, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ nfs4_stateid *stateid,
+ struct nfs4_deviceid *deviceid)
+{
+ return err->status == status && err->opnum == opnum &&
+ nfs4_stateid_match(&err->stateid, stateid) &&
+ !memcmp(&err->deviceid, deviceid, sizeof(*deviceid)) &&
+ end_offset(err->offset, err->length) >= offset &&
+ err->offset <= end_offset(offset, length);
+}
+
+static bool merge_ds_error(struct nfs4_ff_layout_ds_err *old,
+ struct nfs4_ff_layout_ds_err *new)
+{
+ if (!ds_error_can_merge(old, new->offset, new->length, new->status,
+ new->opnum, &new->stateid, &new->deviceid))
+ return false;
+
+ extend_ds_error(old, new->offset, new->length);
+ return true;
+}
+
+static bool
+ff_layout_add_ds_error_locked(struct nfs4_flexfile_layout *flo,
+ struct nfs4_ff_layout_ds_err *dserr)
+{
+ struct nfs4_ff_layout_ds_err *err;
+
+ list_for_each_entry(err, &flo->error_list, list) {
+ if (merge_ds_error(err, dserr)) {
+ return true;
+ }
+ }
+
+ list_add(&dserr->list, &flo->error_list);
+ return false;
+}
+
+static bool
+ff_layout_update_ds_error(struct nfs4_flexfile_layout *flo, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ nfs4_stateid *stateid, struct nfs4_deviceid *deviceid)
+{
+ bool found = false;
+ struct nfs4_ff_layout_ds_err *err;
+
+ list_for_each_entry(err, &flo->error_list, list) {
+ if (ds_error_can_merge(err, offset, length, status, opnum,
+ stateid, deviceid)) {
+ found = true;
+ extend_ds_error(err, offset, length);
+ break;
+ }
+ }
+
+ return found;
+}
+
+int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
+ struct nfs4_ff_layout_mirror *mirror, u64 offset,
+ u64 length, int status, enum nfs_opnum4 opnum,
+ gfp_t gfp_flags)
+{
+ struct nfs4_ff_layout_ds_err *dserr;
+ bool needfree;
+
+ if (status == 0)
+ return 0;
+
+ if (mirror->mirror_ds == NULL)
+ return -EINVAL;
+
+ spin_lock(&flo->generic_hdr.plh_inode->i_lock);
+ if (ff_layout_update_ds_error(flo, offset, length, status, opnum,
+ &mirror->stateid,
+ &mirror->mirror_ds->id_node.deviceid)) {
+ spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
+ return 0;
+ }
+ spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
+ dserr = kmalloc(sizeof(*dserr), gfp_flags);
+ if (!dserr)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&dserr->list);
+ dserr->offset = offset;
+ dserr->length = length;
+ dserr->status = status;
+ dserr->opnum = opnum;
+ nfs4_stateid_copy(&dserr->stateid, &mirror->stateid);
+ memcpy(&dserr->deviceid, &mirror->mirror_ds->id_node.deviceid,
+ NFS4_DEVICEID4_SIZE);
+
+ spin_lock(&flo->generic_hdr.plh_inode->i_lock);
+ needfree = ff_layout_add_ds_error_locked(flo, dserr);
+ spin_unlock(&flo->generic_hdr.plh_inode->i_lock);
+ if (needfree)
+ kfree(dserr);
+
+ return 0;
+}
+
+/* currently we only support AUTH_NONE and AUTH_SYS */
+static rpc_authflavor_t
+nfs4_ff_layout_choose_authflavor(struct nfs4_ff_layout_mirror *mirror)
+{
+ if (mirror->uid == (u32)-1)
+ return RPC_AUTH_NULL;
+ return RPC_AUTH_UNIX;
+}
+
+/* fetch cred for NFSv3 DS */
+static int ff_layout_update_mirror_cred(struct nfs4_ff_layout_mirror *mirror,
+ struct nfs4_pnfs_ds *ds)
+{
+ if (ds && !mirror->cred && mirror->mirror_ds->ds_versions[0].version == 3) {
+ struct rpc_auth *auth = ds->ds_clp->cl_rpcclient->cl_auth;
+ struct rpc_cred *cred;
+ struct auth_cred acred = {
+ .uid = make_kuid(&init_user_ns, mirror->uid),
+ .gid = make_kgid(&init_user_ns, mirror->gid),
+ };
+
+ /* AUTH_NULL ignores acred */
+ cred = auth->au_ops->lookup_cred(auth, &acred, 0);
+ if (IS_ERR(cred)) {
+ dprintk("%s: lookup_cred failed with %ld\n",
+ __func__, PTR_ERR(cred));
+ return PTR_ERR(cred);
+ } else {
+ mirror->cred = cred;
+ }
+ }
+ return 0;
+}
+
+struct nfs_fh *
+nfs4_ff_layout_select_ds_fh(struct pnfs_layout_segment *lseg, u32 mirror_idx)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, mirror_idx);
+ struct nfs_fh *fh = NULL;
+ struct nfs4_deviceid_node *devid;
+
+ if (mirror == NULL || mirror->mirror_ds == NULL ||
+ mirror->mirror_ds->ds == NULL) {
+ printk(KERN_ERR "NFS: %s: No data server for mirror offset index %d\n",
+ __func__, mirror_idx);
+ if (mirror && mirror->mirror_ds) {
+ devid = &mirror->mirror_ds->id_node;
+ pnfs_generic_mark_devid_invalid(devid);
+ }
+ goto out;
+ }
+
+ /* FIXME: For now assume there is only 1 version available for the DS */
+ fh = &mirror->fh_versions[0];
+out:
+ return fh;
+}
+
+/* Upon return, either ds is connected, or ds is NULL */
+struct nfs4_pnfs_ds *
+nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ bool fail_return)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct nfs4_pnfs_ds *ds = NULL;
+ struct nfs4_deviceid_node *devid;
+ struct inode *ino = lseg->pls_layout->plh_inode;
+ struct nfs_server *s = NFS_SERVER(ino);
+ unsigned int max_payload;
+ rpc_authflavor_t flavor;
+
+ if (mirror == NULL || mirror->mirror_ds == NULL ||
+ mirror->mirror_ds->ds == NULL) {
+ printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
+ __func__, ds_idx);
+ if (mirror && mirror->mirror_ds) {
+ devid = &mirror->mirror_ds->id_node;
+ pnfs_generic_mark_devid_invalid(devid);
+ }
+ goto out;
+ }
+
+ ds = mirror->mirror_ds->ds;
+ devid = &mirror->mirror_ds->id_node;
+
+ /* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
+ smp_rmb();
+ if (ds->ds_clp)
+ goto out_test_devid;
+
+ flavor = nfs4_ff_layout_choose_authflavor(mirror);
+
+ /* FIXME: For now we assume the server sent only one version of NFS
+ * to use for the DS.
+ */
+ nfs4_pnfs_ds_connect(s, ds, devid, dataserver_timeo,
+ dataserver_retrans,
+ mirror->mirror_ds->ds_versions[0].version,
+ mirror->mirror_ds->ds_versions[0].minor_version,
+ flavor);
+
+ /* connect success, check rsize/wsize limit */
+ if (ds->ds_clp) {
+ max_payload =
+ nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
+ NULL);
+ if (mirror->mirror_ds->ds_versions[0].rsize > max_payload)
+ mirror->mirror_ds->ds_versions[0].rsize = max_payload;
+ if (mirror->mirror_ds->ds_versions[0].wsize > max_payload)
+ mirror->mirror_ds->ds_versions[0].wsize = max_payload;
+ } else {
+ ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
+ mirror, lseg->pls_range.offset,
+ lseg->pls_range.length, NFS4ERR_NXIO,
+ OP_ILLEGAL, GFP_NOIO);
+ if (fail_return) {
+ pnfs_error_mark_layout_for_return(ino, lseg);
+ if (ff_layout_has_available_ds(lseg))
+ pnfs_set_retry_layoutget(lseg->pls_layout);
+ else
+ pnfs_clear_retry_layoutget(lseg->pls_layout);
+
+ } else {
+ if (ff_layout_has_available_ds(lseg))
+ set_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE,
+ &lseg->pls_layout->plh_flags);
+ else {
+ pnfs_error_mark_layout_for_return(ino, lseg);
+ pnfs_clear_retry_layoutget(lseg->pls_layout);
+ }
+ }
+ }
+
+out_test_devid:
+ if (ff_layout_test_devid_unavailable(devid))
+ ds = NULL;
+out:
+ if (ff_layout_update_mirror_cred(mirror, ds))
+ ds = NULL;
+ return ds;
+}
+
+struct rpc_cred *
+ff_layout_get_ds_cred(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ struct rpc_cred *mdscred)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct rpc_cred *cred = ERR_PTR(-EINVAL);
+
+ if (!nfs4_ff_layout_prepare_ds(lseg, ds_idx, true))
+ goto out;
+
+ if (mirror && mirror->cred)
+ cred = mirror->cred;
+ else
+ cred = mdscred;
+out:
+ return cred;
+}
+
+/**
+* Find or create a DS rpc client with th MDS server rpc client auth flavor
+* in the nfs_client cl_ds_clients list.
+*/
+struct rpc_clnt *
+nfs4_ff_find_or_create_ds_client(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ struct nfs_client *ds_clp, struct inode *inode)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+
+ switch (mirror->mirror_ds->ds_versions[0].version) {
+ case 3:
+ /* For NFSv3 DS, flavor is set when creating DS connections */
+ return ds_clp->cl_rpcclient;
+ case 4:
+ return nfs4_find_or_create_ds_client(ds_clp, inode);
+ default:
+ BUG();
+ }
+}
+
+static bool is_range_intersecting(u64 offset1, u64 length1,
+ u64 offset2, u64 length2)
+{
+ u64 end1 = end_offset(offset1, length1);
+ u64 end2 = end_offset(offset2, length2);
+
+ return (end1 == NFS4_MAX_UINT64 || end1 > offset2) &&
+ (end2 == NFS4_MAX_UINT64 || end2 > offset1);
+}
+
+/* called with inode i_lock held */
+int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
+ struct xdr_stream *xdr, int *count,
+ const struct pnfs_layout_range *range)
+{
+ struct nfs4_ff_layout_ds_err *err, *n;
+ __be32 *p;
+
+ list_for_each_entry_safe(err, n, &flo->error_list, list) {
+ if (!is_range_intersecting(err->offset, err->length,
+ range->offset, range->length))
+ continue;
+ /* offset(8) + length(8) + stateid(NFS4_STATEID_SIZE)
+ * + deviceid(NFS4_DEVICEID4_SIZE) + status(4) + opnum(4)
+ */
+ p = xdr_reserve_space(xdr,
+ 24 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ p = xdr_encode_hyper(p, err->offset);
+ p = xdr_encode_hyper(p, err->length);
+ p = xdr_encode_opaque_fixed(p, &err->stateid,
+ NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, &err->deviceid,
+ NFS4_DEVICEID4_SIZE);
+ *p++ = cpu_to_be32(err->status);
+ *p++ = cpu_to_be32(err->opnum);
+ *count += 1;
+ list_del(&err->list);
+ kfree(err);
+ dprintk("%s: offset %llu length %llu status %d op %d count %d\n",
+ __func__, err->offset, err->length, err->status,
+ err->opnum, *count);
+ }
+
+ return 0;
+}
+
+bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg)
+{
+ struct nfs4_ff_layout_mirror *mirror;
+ struct nfs4_deviceid_node *devid;
+ int idx;
+
+ for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
+ mirror = FF_LAYOUT_COMP(lseg, idx);
+ if (mirror && mirror->mirror_ds) {
+ devid = &mirror->mirror_ds->id_node;
+ if (!ff_layout_test_devid_unavailable(devid))
+ return true;
+ }
+ }
+
+ return false;
+}
+
+module_param(dataserver_retrans, uint, 0644);
+MODULE_PARM_DESC(dataserver_retrans, "The number of times the NFSv4.1 client "
+ "retries a request before it attempts further "
+ " recovery action.");
+module_param(dataserver_timeo, uint, 0644);
+MODULE_PARM_DESC(dataserver_timeo, "The time (in tenths of a second) the "
+ "NFSv4.1 client waits for a response from a "
+ " data server before it retries an NFS request.");
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 297c8f6..5f6d409 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7787,9 +7787,7 @@ static void nfs4_layoutreturn_release(void *calldata)
spin_lock(&lo->plh_inode->i_lock);
if (lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
- clear_bit_unlock(NFS_LAYOUT_RETURN, &lo->plh_flags);
- smp_mb__after_atomic();
- wake_up_bit(&lo->plh_flags, NFS_LAYOUT_RETURN);
+ pnfs_clear_layoutreturn_waitbit(lo);
clear_bit(NFS_LAYOUT_RETURN_BEFORE_CLOSE, &lo->plh_flags);
rpc_wake_up(&NFS_SERVER(lo->plh_inode)->roc_rpcwaitq);
lo->plh_block_lgets--;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c4c9fe6..0fb0f19 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -910,7 +910,9 @@ send_layoutget(struct pnfs_layout_hdr *lo,
pnfs_layout_io_set_failed(lo, range->iomode);
}
return NULL;
- }
+ } else
+ pnfs_layout_clear_fail_bit(lo,
+ pnfs_iomode_to_fail_bit(range->iomode));
return lseg;
}
@@ -930,6 +932,13 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
}
}
+void pnfs_clear_layoutreturn_waitbit(struct pnfs_layout_hdr *lo)
+{
+ clear_bit_unlock(NFS_LAYOUT_RETURN, &lo->plh_flags);
+ smp_mb__after_atomic();
+ wake_up_bit(&lo->plh_flags, NFS_LAYOUT_RETURN);
+}
+
static int
pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
enum pnfs_iomode iomode, bool sync)
@@ -943,6 +952,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo, nfs4_stateid stateid,
status = -ENOMEM;
spin_lock(&ino->i_lock);
lo->plh_block_lgets--;
+ pnfs_clear_layoutreturn_waitbit(lo);
rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
spin_unlock(&ino->i_lock);
pnfs_put_layout_hdr(lo);
@@ -1418,6 +1428,15 @@ static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
TASK_UNINTERRUPTIBLE);
}
+static void pnfs_clear_first_layoutget(struct pnfs_layout_hdr *lo)
+{
+ unsigned long *bitlock = &lo->plh_flags;
+
+ clear_bit_unlock(NFS_LAYOUT_FIRST_LAYOUTGET, bitlock);
+ smp_mb__after_atomic();
+ wake_up_bit(bitlock, NFS_LAYOUT_FIRST_LAYOUTGET);
+}
+
/*
* Layout segment is retreived from the server if not cached.
* The appropriate layout segment is referenced and returned to the caller.
@@ -1499,6 +1518,8 @@ lookup_again:
spin_unlock(&ino->i_lock);
dprintk("%s wait for layoutreturn\n", __func__);
if (pnfs_prepare_to_retry_layoutget(lo)) {
+ if (first)
+ pnfs_clear_first_layoutget(lo);
pnfs_put_layout_hdr(lo);
dprintk("%s retrying\n", __func__);
goto lookup_again;
@@ -1533,13 +1554,8 @@ lookup_again:
pnfs_clear_retry_layoutget(lo);
atomic_dec(&lo->plh_outstanding);
out_put_layout_hdr:
- if (first) {
- unsigned long *bitlock = &lo->plh_flags;
-
- clear_bit_unlock(NFS_LAYOUT_FIRST_LAYOUTGET, bitlock);
- smp_mb__after_atomic();
- wake_up_bit(bitlock, NFS_LAYOUT_FIRST_LAYOUTGET);
- }
+ if (first)
+ pnfs_clear_first_layoutget(lo);
pnfs_put_layout_hdr(lo);
out:
dprintk("%s: inode %s/%llu pNFS layout segment %s for "
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 5643a07..d52796e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -278,6 +278,7 @@ struct pnfs_layout_segment *pnfs_update_layout(struct inode *ino,
u64 count,
enum pnfs_iomode iomode,
gfp_t gfp_flags);
+void pnfs_clear_layoutreturn_waitbit(struct pnfs_layout_hdr *lo);
void nfs4_deviceid_mark_client_invalid(struct nfs_client *clp);
int pnfs_read_done_resend_to_mds(struct nfs_pgio_header *);
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 022b761..de7c91c 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -516,6 +516,7 @@ enum pnfs_layouttype {
LAYOUT_NFSV4_1_FILES = 1,
LAYOUT_OSD2_OBJECTS = 2,
LAYOUT_BLOCK_VOLUME = 3,
+ LAYOUT_FLEX_FILES = 4,
};
/* used for both layout return and recall */
--
1.9.3
Signed-off-by: Tom Haynes <[email protected]>
---
Documentation/filesystems/nfs/pnfs.txt | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
index adc81a3..44a9f24 100644
--- a/Documentation/filesystems/nfs/pnfs.txt
+++ b/Documentation/filesystems/nfs/pnfs.txt
@@ -57,15 +57,16 @@ bit is set, preventing any new lsegs from being added.
layout drivers
--------------
-PNFS utilizes what is called layout drivers. The STD defines 3 basic
-layout types: "files" "objects" and "blocks". For each of these types
-there is a layout-driver with a common function-vectors table which
-are called by the nfs-client pnfs-core to implement the different layout
-types.
+PNFS utilizes what is called layout drivers. The STD defines 4 basic
+layout types: "files", "objects", "blocks", and "flexfiles". For each
+of these types there is a layout-driver with a common function-vectors
+table which are called by the nfs-client pnfs-core to implement the
+different layout types.
-Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c
+Files-layout-driver code is in: fs/nfs/filelayout/.. directory
Objects-layout-deriver code is in: fs/nfs/objlayout/.. directory
Blocks-layout-deriver code is in: fs/nfs/blocklayout/.. directory
+Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory
objects-layout setup
--------------------
--
1.9.3