2012-03-21 19:46:53

by Andy Adamson

Subject: [PATCH Version 2 00/12] NFSv4.1 file layout data server quick failover

From: Andy Adamson <[email protected]>

Changes from Version 1:

Patch removed as rpc_wake_up is now fixed.
-SUNRPC-add-rpc_drain_queue-to-empty-an-rpc_waitq.patch

Patch changed; the forechannel rpc_sleep_on slot_tbl_waitq action is
removed and its functionality is moved into the rpc_call_prepare
routines.
- NFSv4.1-Check-invalid-deviceid-upon-slot-table-waitq.patch

Responded to the changes in the filelayout commit code:
Patch is simplified.
-NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid

A bug fix and a cleanup.
- NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists
- NFSv4.1 have filelayout_initiate_commit return void

Currently, when a data server connection goes down due to a network partition,
a data server failure, or an administrative action, RPC tasks in various
stages of the RPC finite state machine (FSM) need to transmit and time out
(or fail in some other way) before being redirected towards an alternative
server (MDS or another DS).
This can take a very long time if the connection goes down during a heavy
I/O load where the data server fore channel session slot_tbl_waitq and the
transport sending/pending waitqs are populated with many requests.
(see Red Hat Bugzilla 756212 "Redirecting I/O through the MDS after a data
server network partition is very slow")
The current code also keeps the client structure and the session to the failed
data server until umount.

These patches address this problem by setting data server RPC tasks to
RPC_TASK_SOFTCONN and handling the resultant connection errors as follows:

* The pNFS deviceid is marked invalid which blocks any new pNFS io using that
deviceid.
* The RPC done routines for READ, WRITE and COMMIT redirect the requests
to the new server (MDS) and send the request back through the RPC FSM.
* An rpc_action which also redirects the request to the MDS on an invalid
deviceid is registered with the data server session fore channel
slot_tbl_waitq rpc_sleep_on calls and is executed upon wake up.
* The data server session fore channel slot_tbl_waitq is drained using a
new rpc_drain_queue method.
* All data server io requests reference the data server client structure
across io calls, and the client is dereferenced upon deviceid invalidation so
that the client (and the session) is freed upon the last (failed) redirected io.
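
The reference counting described in the last bullet can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `ds_client`, `ds_client_new`, `ds_client_get` and `ds_client_put` are hypothetical names rather than the kernel's `nfs_client` interface, a plain `int` stands in for an atomic counter, and the idea that the deviceid owns exactly one reference is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the data server client record. */
struct ds_client {
	int refcount;		/* one ref for the deviceid, one per in-flight I/O */
	bool *freed_flag;	/* set on release, for demonstration only */
};

static struct ds_client *ds_client_new(bool *freed_flag)
{
	struct ds_client *clp = malloc(sizeof(*clp));

	if (!clp)
		return NULL;
	clp->refcount = 1;	/* reference owned by the deviceid (assumption) */
	clp->freed_flag = freed_flag;
	*freed_flag = false;
	return clp;
}

/* Taken when an I/O request is sent to the data server. */
static void ds_client_get(struct ds_client *clp)
{
	clp->refcount++;
}

/* Dropped when an I/O completes, or when the deviceid is invalidated. */
static void ds_client_put(struct ds_client *clp)
{
	assert(clp->refcount > 0);
	if (--clp->refcount == 0) {
		*clp->freed_flag = true;
		free(clp);
	}
}
```

With two requests in flight, dropping the deviceid's reference on invalidation leaves the record alive; it is released only when the second redirected request drops its own reference.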

Testing:
I use a pynfs file layout server with a DS to test. The pynfs server and DS
are modified to use the local host for MDS to DS communication. I add a
second IPv4 address to the single machine interface for the DS to client
communication. While a "dd" or a read/write heavy Connectathon test is
running, the DS IP address is removed from the ethernet interface, and the
client recovers io to the MDS.
I have tested READ and WRITE recovery multiple times, and have managed to
time the removal of the DS IP address during a DS COMMIT and have seen it
recover as well. :)


Comments welcome

--> Andy


Andy Adamson (12):
NFSv4.1 move nfs4_reset_read and nfs_reset_write
NFSv4.1: cleanup filelayout invalid deviceid handling
NFSv4.1 cleanup filelayout invalid layout handling
NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
NFSv4.1: mark deviceid invalid on filelayout DS connection errors
NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid
NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
NFSv4.1 wake up all tasks on un-connected DS slot table waitq
NFSv4.1 ref count nfs_client across filelayout data server io
NFSv4.1 de reference a disconnected data server client record
NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists
NFSv4.1 have filelayout_initiate_commit return void

fs/nfs/internal.h | 11 ++-
fs/nfs/nfs4filelayout.c | 204 +++++++++++++++++++++++++++++++------------
fs/nfs/nfs4filelayout.h | 28 +++++-
fs/nfs/nfs4filelayoutdev.c | 54 ++++++------
fs/nfs/nfs4proc.c | 39 +--------
fs/nfs/pnfs.h | 3 +-
fs/nfs/read.c | 6 +-
fs/nfs/write.c | 13 ++--
8 files changed, 221 insertions(+), 137 deletions(-)

--
1.7.6.4



2012-03-21 19:46:54

by Andy Adamson

Subject: [PATCH Version 2 06/12] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 1f1be26..fdec7a8 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -860,10 +860,11 @@ filelayout_choose_commit_list(struct nfs_page *req,
struct pnfs_layout_segment *lseg)
{
struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
u32 i, j;
struct list_head *list;

- if (fl->commit_through_mds)
+ if (fl->commit_through_mds || filelayout_test_devid_invalid(devid))
return &NFS_I(req->wb_context->dentry->d_inode)->commit_list;

/* Note that we are calling nfs4_fl_calc_j_index on each page
--
1.7.6.4


2012-03-21 19:46:52

by Andy Adamson

Subject: [PATCH Version 2 03/12] NFSv4.1 cleanup filelayout invalid layout handling

From: Andy Adamson <[email protected]>

The invalid layout bits should only be used to block LAYOUTGETs.

Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 26 ++++++--------------------
1 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index cb9ea7e..acafc4d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -187,10 +187,8 @@ static int filelayout_read_done_cb(struct rpc_task *task,
data->ds_clp, &reset) == -EAGAIN) {
dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
__func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset) {
- pnfs_set_lo_fail(data->lseg);
+ if (reset)
filelayout_reset_read(task, data);
- }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -268,10 +266,8 @@ static int filelayout_write_done_cb(struct rpc_task *task,
data->ds_clp, &reset) == -EAGAIN) {
dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
__func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset) {
- pnfs_set_lo_fail(data->lseg);
+ if (reset)
filelayout_reset_write(task, data);
- }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -300,10 +296,9 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
data->ds_clp, &reset) == -EAGAIN) {
dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
__func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset) {
+ if (reset)
prepare_to_resend_writes(data);
- pnfs_set_lo_fail(data->lseg);
- } else
+ else
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -396,12 +391,8 @@ filelayout_read_pagelist(struct nfs_read_data *data)
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
- if (!ds) {
- /* Either layout fh index faulty, or ds connect failed */
- set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
- set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+ if (!ds)
return PNFS_NOT_ATTEMPTED;
- }
dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);

/* No multipath support. Use first DS */
@@ -435,11 +426,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
- if (!ds) {
- set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
- set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+ if (!ds)
return PNFS_NOT_ATTEMPTED;
- }
dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
data->inode->i_ino, sync, (size_t) data->args.count, offset,
ds->ds_remotestr);
@@ -914,8 +902,6 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
- set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
prepare_to_resend_writes(data);
filelayout_commit_release(data);
return -EAGAIN;
--
1.7.6.4


2012-03-21 19:46:56

by Andy Adamson

Subject: [PATCH Version 2 11/12] NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 442ebf6..3bd7e87 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -296,9 +296,10 @@ static inline int
pnfs_scan_commit_lists(struct inode *inode, int max, spinlock_t *lock)
{
struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
+ struct pnfs_layout_hdr *lh = NFS_I(inode)->layout;
int ret;

- if (ld == NULL || ld->scan_commit_lists == NULL)
+ if (ld == NULL || ld->scan_commit_lists == NULL || lh == NULL)
return 0;
ret = ld->scan_commit_lists(inode, max, lock);
if (ret != 0)
--
1.7.6.4


2012-03-21 19:46:54

by Andy Adamson

Subject: [PATCH Version 2 05/12] NFSv4.1: mark deviceid invalid on filelayout DS connection errors

From: Andy Adamson <[email protected]>

This prevents the use of any layout for i/o that references the deviceid.
I/O is redirected through the MDS.

Redirect the unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 65 ++++++++++++++++++++++++++++++++++------------
fs/nfs/nfs4filelayout.h | 6 ++++
2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 3802937..1f1be26 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -116,7 +116,7 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
static int filelayout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
- int *reset)
+ unsigned long *reset)
{
struct nfs_server *mds_server = NFS_SERVER(state->inode);
struct nfs_client *mds_client = mds_server->nfs_client;
@@ -158,10 +158,23 @@ static int filelayout_async_handle_error(struct rpc_task *task,
break;
case -NFS4ERR_RETRY_UNCACHED_REP:
break;
+ /* RPC connection errors */
+ case -ECONNREFUSED:
+ case -EHOSTDOWN:
+ case -EHOSTUNREACH:
+ case -ENETUNREACH:
+ case -EIO:
+ case -ETIMEDOUT:
+ case -EPIPE:
+ dprintk("%s DS connection error. Retry through MDS %d\n",
+ __func__, task->tk_status);
+ set_bit(NFS4_RESET_DEVICEID, reset);
+ set_bit(NFS4_RESET_TO_MDS, reset);
+ break;
default:
- dprintk("%s DS error. Retry through MDS %d\n", __func__,
- task->tk_status);
- *reset = 1;
+ dprintk("%s Unhandled DS error. Retry through MDS %d\n",
+ __func__, task->tk_status);
+ set_bit(NFS4_RESET_TO_MDS, reset);
break;
}
out:
@@ -179,16 +192,22 @@ wait_on_recovery:
static int filelayout_read_done_cb(struct rpc_task *task,
struct nfs_read_data *data)
{
- int reset = 0;
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ unsigned long reset = 0;

dprintk("%s DS read\n", __func__);

if (filelayout_async_handle_error(task, data->args.context->state,
data->ds_clp, &reset) == -EAGAIN) {
- dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
- __func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset)
+
+ dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
+ reset, data->ds_clp, data->ds_clp->cl_session);
+
+ if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_read(task, data);
+ if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ filelayout_mark_devid_invalid(devid);
+ }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -260,14 +279,20 @@ static void filelayout_read_release(void *data)
static int filelayout_write_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
- int reset = 0;
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
data->ds_clp, &reset) == -EAGAIN) {
- dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
- __func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset)
+
+ dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
+ reset, data->ds_clp, data->ds_clp->cl_session);
+
+ if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_write(task, data);
+ if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ filelayout_mark_devid_invalid(devid);
+ }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -290,16 +315,22 @@ static void prepare_to_resend_writes(struct nfs_write_data *data)
static int filelayout_commit_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
- int reset = 0;
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
data->ds_clp, &reset) == -EAGAIN) {
- dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
- __func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset)
+
+ dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
+ reset, data->ds_clp, data->ds_clp->cl_session);
+
+ if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
prepare_to_resend_writes(data);
- else
+ if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ filelayout_mark_devid_invalid(devid);
+ } else {
rpc_restart_call_prepare(task);
+ }
return -EAGAIN;
}

diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index b54b389..08b667a 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -41,6 +41,12 @@
#define NFS4_PNFS_MAX_STRIPE_CNT 4096
#define NFS4_PNFS_MAX_MULTI_CNT 256 /* 256 fit into a u8 stripe_index */

+/* internal use */
+enum nfs4_fl_reset_state {
+ NFS4_RESET_TO_MDS = 0,
+ NFS4_RESET_DEVICEID,
+};
+
enum stripetype4 {
STRIPE_SPARSE = 1,
STRIPE_DENSE = 2
--
1.7.6.4
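
The two new branches of the error handler in the patch above can be modeled in user space as a pure function from an RPC status to a mask of reset actions. This is a sketch, not the kernel code: `fl_classify_error` and `fl_test_bit` are hypothetical helpers, the enum mirrors `nfs4_fl_reset_state` under different names, and the NFS4ERR_* cases that the real handler checks first are left out.

```c
#include <assert.h>
#include <errno.h>

/* Bit numbers, mirroring enum nfs4_fl_reset_state from the patch. */
enum fl_reset_state {
	FL_RESET_TO_MDS = 0,
	FL_RESET_DEVICEID,
};

/*
 * Hypothetical helper: map the negative errno of a failed DS RPC to a
 * mask of reset actions. Connection errors set both bits; any other
 * error reaching this point only redirects the I/O to the MDS.
 */
static unsigned long fl_classify_error(int status)
{
	unsigned long reset = 0;

	switch (status) {
	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EIO:
	case -ETIMEDOUT:
	case -EPIPE:
		reset |= 1UL << FL_RESET_DEVICEID;
		reset |= 1UL << FL_RESET_TO_MDS;
		break;
	default:
		reset |= 1UL << FL_RESET_TO_MDS;
		break;
	}
	return reset;
}

/* User-space counterpart of test_bit() for a single word. */
static int fl_test_bit(int nr, unsigned long mask)
{
	return (int)((mask >> nr) & 1UL);
}
```

A caller would test `FL_RESET_TO_MDS` first and only then `FL_RESET_DEVICEID`, which matches the nesting in the done callbacks.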


2012-03-21 19:46:52

by Andy Adamson

Subject: [PATCH Version 2 02/12] NFSv4.1: cleanup filelayout invalid deviceid handling

From: Andy Adamson <[email protected]>

Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING.

An invalid device prevents pNFS io.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 10 ----------
fs/nfs/nfs4filelayout.h | 21 +++++++++++++++++----
fs/nfs/nfs4filelayoutdev.c | 37 +++++++++++--------------------------
3 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 36a65ce..cb9ea7e 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -392,9 +392,6 @@ filelayout_read_pagelist(struct nfs_read_data *data)
__func__, data->inode->i_ino,
data->args.pgbase, (size_t)data->args.count, offset);

- if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
- return PNFS_NOT_ATTEMPTED;
-
/* Retrieve the correct rpc_client for the byte range */
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
@@ -434,16 +431,11 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
struct nfs_fh *fh;
int status;

- if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
- return PNFS_NOT_ATTEMPTED;
-
/* Retrieve the correct rpc_client for the byte range */
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
- __func__);
set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
return PNFS_NOT_ATTEMPTED;
@@ -922,8 +914,6 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
- __func__);
set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
prepare_to_resend_writes(data);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 21190bb..b54b389 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -62,12 +62,8 @@ struct nfs4_pnfs_ds {
atomic_t ds_count;
};

-/* nfs4_file_layout_dsaddr flags */
-#define NFS4_DEVICE_ID_NEG_ENTRY 0x00000001
-
struct nfs4_file_layout_dsaddr {
struct nfs4_deviceid_node id_node;
- unsigned long flags;
u32 stripe_count;
u8 *stripe_indices;
u32 ds_num;
@@ -107,6 +103,23 @@ FILELAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg)
return &FILELAYOUT_LSEG(lseg)->dsaddr->id_node;
}

+static inline void
+filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
+{
+ u32 *p = (u32 *)&node->deviceid;
+
+ printk(KERN_WARNING "NFS: Deviceid [%x%x%x%x] marked out of use.\n",
+ p[0], p[1], p[2], p[3]);
+
+ set_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
+static inline bool
+filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
+{
+ return test_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
extern struct nfs_fh *
nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);

diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index a866bbd..2b8ae96 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -791,48 +791,33 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
return flseg->fh_array[i];
}

-static void
-filelayout_mark_devid_negative(struct nfs4_file_layout_dsaddr *dsaddr,
- int err, const char *ds_remotestr)
-{
- u32 *p = (u32 *)&dsaddr->id_node.deviceid;
-
- printk(KERN_ERR "NFS: data server %s connection error %d."
- " Deviceid [%x%x%x%x] marked out of use.\n",
- ds_remotestr, err, p[0], p[1], p[2], p[3]);
-
- spin_lock(&nfs4_ds_cache_lock);
- dsaddr->flags |= NFS4_DEVICE_ID_NEG_ENTRY;
- spin_unlock(&nfs4_ds_cache_lock);
-}
-
struct nfs4_pnfs_ds *
nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
{
struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
+
+ if (filelayout_test_devid_invalid(devid))
+ return NULL;

if (ds == NULL) {
printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
__func__, ds_idx);
- return NULL;
+ goto mark_dev_invalid;
}

if (!ds->ds_clp) {
struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
int err;

- if (dsaddr->flags & NFS4_DEVICE_ID_NEG_ENTRY) {
- /* Already tried to connect, don't try again */
- dprintk("%s Deviceid marked out of use\n", __func__);
- return NULL;
- }
err = nfs4_ds_connect(s, ds);
- if (err) {
- filelayout_mark_devid_negative(dsaddr, err,
- ds->ds_remotestr);
- return NULL;
- }
+ if (err)
+ goto mark_dev_invalid;
}
return ds;
+
+mark_dev_invalid:
+ filelayout_mark_devid_invalid(devid);
+ return NULL;
}
--
1.7.6.4
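
The control flow that the patch above gives nfs4_fl_prepare_ds (test the invalid flag first, then funnel every failure through one label that marks the deviceid invalid) can be sketched outside the kernel as follows. The `ds` and `devid` structures, `ds_connect` and `prepare_ds` are simplified, hypothetical stand-ins; only the shape of the logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ds {
	bool connected;
	bool reachable;		/* hypothetical: whether a connect would succeed */
};

struct devid {
	bool invalid;
};

/* Stand-in for nfs4_ds_connect(): 0 on success, -1 on failure. */
static int ds_connect(struct ds *ds)
{
	if (!ds->reachable)
		return -1;
	ds->connected = true;
	return 0;
}

/* Returns the data server to use, or NULL when I/O must go to the MDS. */
static struct ds *prepare_ds(struct devid *devid, struct ds *ds)
{
	if (devid->invalid)
		return NULL;		/* already marked: do not retry */

	if (ds == NULL)
		goto mark_dev_invalid;

	if (!ds->connected && ds_connect(ds) != 0)
		goto mark_dev_invalid;

	return ds;

mark_dev_invalid:
	devid->invalid = true;
	return NULL;
}
```

Once a connect fails, every later call returns NULL without another connection attempt, which is the role the removed NFS4_DEVICE_ID_NEG_ENTRY flag used to play.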


2012-03-22 13:44:24

by Myklebust, Trond

Subject: Re: [PATCH Version 2 05/12] NFSv4.1: mark deviceid invalid on filelayout DS connection errors

On Thu, 2012-03-22 at 13:23 +0000, Adamson, Andy wrote:
> On Mar 21, 2012, at 4:39 PM, Myklebust, Trond wrote:
>
> > On Wed, 2012-03-21 at 15:46 -0400, [email protected] wrote:
> >> From: Andy Adamson <[email protected]>
> >>
> >> This prevents the use of any layout for i/o that references the deviceid.
> >> I/O is redirected through the MDS.
> >>
> >> Redirect the unhandled failed I/O to the MDS without marking either the
> >> layout or the deviceid invalid.
> >>
> >> Signed-off-by: Andy Adamson <[email protected]>
> >> ---
> >> fs/nfs/nfs4filelayout.c |   65 ++++++++++++++++++++++++++++++++++------------
> >> fs/nfs/nfs4filelayout.h |    6 ++++
> >> 2 files changed, 54 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> >> index 3802937..1f1be26 100644
> >> --- a/fs/nfs/nfs4filelayout.c
> >> +++ b/fs/nfs/nfs4filelayout.c
> >> @@ -116,7 +116,7 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
> >> static int filelayout_async_handle_error(struct rpc_task *task,
> >> 					 struct nfs4_state *state,
> >> 					 struct nfs_client *clp,
> >> -					 int *reset)
> >> +					 unsigned long *reset)
> >> {
> >> 	struct nfs_server *mds_server = NFS_SERVER(state->inode);
> >> 	struct nfs_client *mds_client = mds_server->nfs_client;
> >> @@ -158,10 +158,23 @@ static int filelayout_async_handle_error(struct rpc_task *task,
> >> 		break;
> >> 	case -NFS4ERR_RETRY_UNCACHED_REP:
> >> 		break;
> >> +	/* RPC connection errors */
> >> +	case -ECONNREFUSED:
> >> +	case -EHOSTDOWN:
> >> +	case -EHOSTUNREACH:
> >> +	case -ENETUNREACH:
> >> +	case -EIO:
> >> +	case -ETIMEDOUT:
> >> +	case -EPIPE:
> >> +		dprintk("%s DS connection error. Retry through MDS %d\n",
> >> +			__func__, task->tk_status);
> >> +		set_bit(NFS4_RESET_DEVICEID, reset);
> >> +		set_bit(NFS4_RESET_TO_MDS, reset);
> >> +		break;
> >> 	default:
> >> -		dprintk("%s DS error. Retry through MDS %d\n", __func__,
> >> -			task->tk_status);
> >> -		*reset = 1;
> >> +		dprintk("%s Unhandled DS error. Retry through MDS %d\n",
> >> +			__func__, task->tk_status);
> >> +		set_bit(NFS4_RESET_TO_MDS, reset);
> >> 		break;
> >> 	}
> >> out:
> >> @@ -179,16 +192,22 @@ wait_on_recovery:
> >> static int filelayout_read_done_cb(struct rpc_task *task,
> >> 				struct nfs_read_data *data)
> >> {
> >> -	int reset = 0;
> >> +	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
> >> +	unsigned long reset = 0;
> >>
> >> 	dprintk("%s DS read\n", __func__);
> >>
> >> 	if (filelayout_async_handle_error(task, data->args.context->state,
> >> 					  data->ds_clp, &reset) == -EAGAIN) {
> >> -		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
> >> -			__func__, data->ds_clp, data->ds_clp->cl_session);
> >> -		if (reset)
> >> +
> >> +		dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
> >> +			reset, data->ds_clp, data->ds_clp->cl_session);
> >> +
> >> +		if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
> >> 			filelayout_reset_read(task, data);
> >> +			if (test_bit(NFS4_RESET_DEVICEID, &reset))
> >> +				filelayout_mark_devid_invalid(devid);
> >
> > Is there any reason why we shouldn't just do the
> > filelayout_mark_devid_invalid() within filelayout_async_handle_error()
> > instead of having the caller do it?
>
> We would have to pass in the lseg argument.

No. You'd only have to pass in the device id.

> >
> > That should also enable us to get rid of the whole 'reset' argument and
> > replace it with a return value != 0 && != -EAGAIN.
>
> We would need to pass in the operation because READ/WRITE/COMMIT call different reset functions.

No. The caller would still do the reset. It's just that you would have a
special return value to indicate it.

> I chose the 'reset' argument method instead of passing in the operation because I thought it cleaner to keep the per operation logic in the per operation rpc functions instead of having a switch(operation) statement in the async handler for each group of errors.

My point is that you don't need an extra parameter to do this. You just
need a special return value.

> >
> >> +		}
> >> 		rpc_restart_call_prepare(task);
> >
> > This can probably also be done inside filelayout_async_handle_error(),
> > BTW.
>
> COMMIT does not call rpc_restart_call_prepare on reset  or invalid deviceid errors because we want the release function to fail with a verifier mismatch.

Which is a grotesque hack in itself... (the verifier hack, that is). I'm
hoping that Fred will fix that in the new code. It doesn't take much:
just flag in the struct nfs_write_data (or in his case: struct
nfs_commit_data).

> So it's up to you: I could move all the per operation logic into the async handler by passing in the operation and using it to dereferencing the tk_callback to get the lseg and other needed parameters - then moving the code from the done_cb routines into the async handler under a switch(operation) for both the default reset to mds and for the invalid deviced.

Then let's keep the rpc_restart_call_prepare in the caller, but move the
deviceid invalidation into the async_hander. Let's also replace the
reset argument with a return value...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com
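
One way to read the review's proposal (drop the `reset` out-parameter, invalidate the deviceid inside the error handler, and signal "redirect to the MDS" with a return value that is neither 0 nor -EAGAIN) is sketched below in user space. Everything here is an assumption made for illustration: `FL_RETRY_VIA_MDS`, `handle_ds_error`, `read_done` and the simplified `devid` structure are hypothetical, and only NFS4ERR_DELAY stands in for the retryable NFS4ERR_* cases.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NFS4ERR_DELAY		10008	/* value from the NFSv4 protocol */
#define FL_RETRY_VIA_MDS	1	/* hypothetical: != 0 and != -EAGAIN */

struct devid {
	bool invalid;
};

enum io_target { TARGET_DS, TARGET_MDS };

static bool is_connection_error(int status)
{
	switch (status) {
	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EIO:
	case -ETIMEDOUT:
	case -EPIPE:
		return true;
	default:
		return false;
	}
}

/*
 * Returns 0 on success, -EAGAIN when the same data server should be
 * retried, and FL_RETRY_VIA_MDS when the caller must redirect the I/O.
 * The deviceid is invalidated here rather than in each caller.
 */
static int handle_ds_error(int status, struct devid *devid)
{
	if (status >= 0)
		return 0;
	if (status == -NFS4ERR_DELAY)
		return -EAGAIN;
	if (is_connection_error(status))
		devid->invalid = true;
	return FL_RETRY_VIA_MDS;
}

/* Per-operation caller: it still chooses its own reset action. */
static enum io_target read_done(int status, struct devid *devid)
{
	if (handle_ds_error(status, devid) == FL_RETRY_VIA_MDS)
		return TARGET_MDS;	/* e.g. the READ-specific reset */
	return TARGET_DS;
}
```

The per-operation reset stays in the caller, so READ, WRITE and COMMIT can keep different reset functions without the handler needing to know which operation it serves.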

2012-03-21 19:46:54

by Andy Adamson

Subject: [PATCH Version 2 07/12] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup

From: Andy Adamson <[email protected]>

Tasks sleeping on the slot table waitq wake to the rpc_prepare_task state.
Reset the task for io through the MDS if the deviceid is invalid.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 26 +++++++++++++++++++++++++-
1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index fdec7a8..b73818f 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -241,6 +241,12 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
{
struct nfs_read_data *rdata = (struct nfs_read_data *)data;

+ if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(rdata->lseg))) {
+ dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+ filelayout_reset_read(task, rdata);
+ rpc_restart_call_prepare(task);
+ return;
+ }
rdata->read_done_cb = filelayout_read_done_cb;

if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
@@ -341,6 +347,12 @@ static void filelayout_write_prepare(struct rpc_task *task, void *data)
{
struct nfs_write_data *wdata = (struct nfs_write_data *)data;

+ if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(wdata->lseg))) {
+ dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+ filelayout_reset_write(task, wdata);
+ rpc_restart_call_prepare(task);
+ return;
+ }
if (nfs41_setup_sequence(wdata->ds_clp->cl_session,
&wdata->args.seq_args, &wdata->res.seq_res,
task))
@@ -372,6 +384,18 @@ static void filelayout_write_release(void *data)
wdata->mds_ops->rpc_release(data);
}

+static void filelayout_commit_prepare(struct rpc_task *task, void *data)
+{
+ struct nfs_write_data *wdata = (struct nfs_write_data *)data;
+
+ if (nfs41_setup_sequence(wdata->ds_clp->cl_session,
+ &wdata->args.seq_args, &wdata->res.seq_res,
+ task))
+ return;
+
+ rpc_call_start(task);
+}
+
static void filelayout_commit_release(void *data)
{
struct nfs_write_data *wdata = (struct nfs_write_data *)data;
@@ -398,7 +422,7 @@ static const struct rpc_call_ops filelayout_write_call_ops = {
};

static const struct rpc_call_ops filelayout_commit_call_ops = {
- .rpc_call_prepare = filelayout_write_prepare,
+ .rpc_call_prepare = filelayout_commit_prepare,
.rpc_call_done = filelayout_write_call_done,
.rpc_count_stats = filelayout_write_count_stats,
.rpc_release = filelayout_commit_release,
--
1.7.6.4
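
The effect of checking the deviceid in the prepare routines can be illustrated with a toy model: tasks that were asleep on a wait queue re-run their prepare step when woken, and that step redirects them once the deviceid is invalid. The `io_task` structure, `task_prepare` and `wake_all` are hypothetical and ignore the real RPC state machine; this is a sketch of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum io_target { IO_TO_DS, IO_TO_MDS };

struct devid {
	bool invalid;
};

struct io_task {
	const struct devid *devid;
	enum io_target target;
};

/* Model of a prepare callback: runs each time the task (re)starts. */
static void task_prepare(struct io_task *task)
{
	if (task->devid->invalid)
		task->target = IO_TO_MDS;	/* reset the I/O to the MDS */
}

/*
 * Model of waking every sleeper: each woken task re-runs its prepare
 * step. Returns how many tasks are now directed at the MDS.
 */
static size_t wake_all(struct io_task *tasks, size_t n)
{
	size_t redirected = 0;

	for (size_t i = 0; i < n; i++) {
		task_prepare(&tasks[i]);
		if (tasks[i].target == IO_TO_MDS)
			redirected++;
	}
	return redirected;
}
```

No per-task callback has to be registered with the wait queue in this arrangement; marking the deviceid invalid and waking the queue is enough for every sleeper to notice on its next pass through prepare.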


2012-03-25 14:59:40

by Fred Isaman

Subject: Re: [PATCH Version 2 12/12] NFSv4.1 have filelayout_initiate_commit return void

On Wed, Mar 21, 2012 at 4:47 PM, Myklebust, Trond
<[email protected]> wrote:
> On Wed, 2012-03-21 at 15:46 -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> The return is ignored.
>
> Yes, but should it be? See the discussion between Fred and myself on the
> list yesterday. I still don't see why we should report some errors and
> ignore others...
>
>

I'll come back to this in a day or two, but let me note the following
problem with nfs_commit_inode, pnfs, and error reporting.

Consider nfs_wb_page, which causes a FLUSH_SYNC nfs_commit_inode.

Pre-pnfs, nfs_commit_inode either returned an error, or sent the
COMMIT. Even if the page didn't make it into the "to be cleaned" list
due to the INT_MAX restriction, the COMMIT was sent and on non-error
return it was safe to assume that the page data was on disk.

But with pnfs (and commit to DS), let's say that the writes are
distributed between 2 data servers. On a non-error return from
nfs_commit_inode there is now no guarantee that a COMMIT will go to
the DS hosting the page. That is solely due to the INT_MAX
restriction. It of course gets even worse with the current
filelayout_commit_pagelist if an alloc fails.


Fred

2012-03-21 19:46:57

by Andy Adamson

Subject: [PATCH Version 2 12/12] NFSv4.1 have filelayout_initiate_commit return void

From: Andy Adamson <[email protected]>

The return is ignored.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index ee0159b..9df7c17 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -972,7 +972,7 @@ select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
return flseg->fh_array[i];
}

-static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
+static void filelayout_initiate_commit(struct nfs_write_data *data, int how)
{
struct pnfs_layout_segment *lseg = data->lseg;
struct nfs4_pnfs_ds *ds;
@@ -984,7 +984,7 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
if (!ds) {
prepare_to_resend_writes(data);
filelayout_commit_release(data);
- return -EAGAIN;
+ return;
}
dprintk("%s ino %lu, how %d cl_count %d\n", __func__,
data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count));
@@ -994,9 +994,9 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
if (fh)
data->args.fh = fh;
- return nfs_initiate_commit(data, ds->ds_clp->cl_rpcclient,
- &filelayout_commit_call_ops, how,
- RPC_TASK_SOFTCONN);
+ nfs_initiate_commit(data, ds->ds_clp->cl_rpcclient,
+ &filelayout_commit_call_ops, how,
+ RPC_TASK_SOFTCONN);
}

/*
--
1.7.6.4


2012-03-21 19:46:55

by Andy Adamson

Subject: [PATCH Version 2 10/12] NFSv4.1 de reference a disconnected data server client record

From: Andy Adamson <[email protected]>

When the last DS io is processed, the data server client record will be
freed.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 4 ++++
fs/nfs/nfs4filelayout.h | 1 +
fs/nfs/nfs4filelayoutdev.c | 17 +++++++++++++++++
3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 129b57f..ee0159b 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -209,10 +209,12 @@ static int filelayout_read_done_cb(struct rpc_task *task,
reset, data->ds_clp, data->ds_clp->cl_session);

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
+ struct nfs_client *clp = data->ds_clp;
filelayout_reset_read(task, data);
if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
rpc_wake_up(&tbl->slot_tbl_waitq);
+ nfs4_ds_disconnect(clp);
}
}
rpc_restart_call_prepare(task);
@@ -304,10 +306,12 @@ static int filelayout_write_done_cb(struct rpc_task *task,
reset, data->ds_clp, data->ds_clp->cl_session);

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
+ struct nfs_client *clp = data->ds_clp;
filelayout_reset_write(task, data);
if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
rpc_wake_up(&tbl->slot_tbl_waitq);
+ nfs4_ds_disconnect(clp);
}
}
rpc_restart_call_prepare(task);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 08b667a..3abf7d9 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -138,5 +138,6 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
struct nfs4_file_layout_dsaddr *
get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
+void nfs4_ds_disconnect(struct nfs_client *clp);

#endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 2b8ae96..0e54cdf 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -145,6 +145,23 @@ _data_server_lookup_locked(const struct list_head *dsaddrs)
}

/*
+ * Lookup DS by nfs_client pointer. Zero data server client pointer
+ */
+void nfs4_ds_disconnect(struct nfs_client *clp)
+{
+ struct nfs4_pnfs_ds *ds;
+
+ dprintk("%s clp %p\n", __func__, clp);
+ spin_lock(&nfs4_ds_cache_lock);
+ list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
+ if (ds->ds_clp && ds->ds_clp == clp) {
+ nfs_put_client(clp);
+ ds->ds_clp = NULL;
+ }
+ spin_unlock(&nfs4_ds_cache_lock);
+}
+
+/*
* Create an rpc connection to the nfs4_pnfs_ds data server
* Currently only supports IPv4 and IPv6 addresses
*/
--
1.7.6.4
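
A userspace sketch of the reference counting this patch relies on. It assumes the data server cache holds one reference on the client record and each in-flight I/O holds one more (which is what 09/12 arranges). The names client_model, ds_model, client_get(), client_put() and ds_disconnect_model() are hypothetical, and the locking that the patch does under nfs4_ds_cache_lock is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct client_model {
	int refcount;
	bool freed;
};

struct ds_model {
	struct client_model *clp;	/* NULL once disconnected */
};

/* Taken by each I/O before it is sent to the data server. */
static void client_get(struct client_model *c)
{
	assert(!c->freed);
	c->refcount++;
}

/* Dropping the last reference "frees" the record. */
static void client_put(struct client_model *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		c->freed = true;
}

/*
 * Find every cache entry that points at clp, clear the pointer and drop
 * the reference the cache was holding. I/O still in flight keeps the
 * record alive until its own put.
 */
static void ds_disconnect_model(struct ds_model *cache, size_t n,
				struct client_model *clp)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (cache[i].clp && cache[i].clp == clp) {
			cache[i].clp = NULL;
			client_put(clp);
		}
	}
}
```

In this model a record with two outstanding I/Os survives the disconnect with a count of 2 and is only marked freed when the second I/O does its put.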


2012-03-21 19:46:53

by Andy Adamson

Subject: [PATCH Version 2 01/12] NFSv4.1 move nfs4_reset_read and nfs_reset_write

From: Andy Adamson <[email protected]>

Only called by the file layout code

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/internal.h | 5 +++--
fs/nfs/nfs4filelayout.c | 35 +++++++++++++++++++++++++++++++++--
fs/nfs/nfs4proc.c | 39 ++++-----------------------------------
3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 2476dc6..f9ac1f0 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -344,13 +344,14 @@ extern int nfs_migrate_page(struct address_space *,

/* nfs4proc.c */
extern void __nfs4_read_done_cb(struct nfs_read_data *);
-extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
+extern int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data);
+extern int nfs4_write_done_cb(struct rpc_task *task,
+ struct nfs_write_data *data);
extern int nfs4_init_client(struct nfs_client *clp,
const struct rpc_timeout *timeparms,
const char *ip_addr,
rpc_authflavor_t authflavour,
int noresvport);
-extern void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data);
extern int _nfs4_call_sync(struct rpc_clnt *clnt,
struct nfs_server *server,
struct rpc_message *msg,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 634c0bc..36a65ce 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -82,6 +82,37 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
BUG();
}

+/* Reset the the nfs_write_data to send the write to the MDS. */
+void filelayout_reset_write(struct rpc_task *task, struct nfs_write_data *data)
+{
+ dprintk("%s Reset task for i/o through MDS\n", __func__);
+ put_lseg(data->lseg);
+ data->lseg = NULL;
+ data->ds_clp = NULL;
+ data->write_done_cb = nfs4_write_done_cb;
+ data->args.fh = NFS_FH(data->inode);
+ data->args.bitmask = data->res.server->cache_consistency_bitmask;
+ data->args.offset = data->mds_offset;
+ data->res.fattr = &data->fattr;
+ task->tk_ops = data->mds_ops;
+ rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+
+/* Reset the the nfs_read_data to send the read to the MDS. */
+void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
+{
+ dprintk("%s Reset task for i/o through MDS\n", __func__);
+ put_lseg(data->lseg);
+ data->lseg = NULL;
+ /* offsets will differ in the dense stripe case */
+ data->args.offset = data->mds_offset;
+ data->ds_clp = NULL;
+ data->args.fh = NFS_FH(data->inode);
+ data->read_done_cb = nfs4_read_done_cb;
+ task->tk_ops = data->mds_ops;
+ rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+
static int filelayout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
@@ -158,7 +189,7 @@ static int filelayout_read_done_cb(struct rpc_task *task,
__func__, data->ds_clp, data->ds_clp->cl_session);
if (reset) {
pnfs_set_lo_fail(data->lseg);
- nfs4_reset_read(task, data);
+ filelayout_reset_read(task, data);
}
rpc_restart_call_prepare(task);
return -EAGAIN;
@@ -239,7 +270,7 @@ static int filelayout_write_done_cb(struct rpc_task *task,
__func__, data->ds_clp, data->ds_clp->cl_session);
if (reset) {
pnfs_set_lo_fail(data->lseg);
- nfs4_reset_write(task, data);
+ filelayout_reset_write(task, data);
}
rpc_restart_call_prepare(task);
return -EAGAIN;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b76dd0e..f3c67db 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3284,7 +3284,7 @@ void __nfs4_read_done_cb(struct nfs_read_data *data)
nfs_invalidate_atime(data->inode);
}

-static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
+int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
{
struct nfs_server *server = NFS_SERVER(data->inode);

@@ -3298,6 +3298,7 @@ static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
renew_lease(server, data->timestamp);
return 0;
}
+EXPORT_SYMBOL_GPL(nfs4_read_done_cb);

static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
{
@@ -3329,23 +3330,7 @@ static void nfs4_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_da
rpc_call_start(task);
}

-/* Reset the the nfs_read_data to send the read to the MDS. */
-void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
-{
- dprintk("%s Reset task for i/o through\n", __func__);
- put_lseg(data->lseg);
- data->lseg = NULL;
- /* offsets will differ in the dense stripe case */
- data->args.offset = data->mds_offset;
- data->ds_clp = NULL;
- data->args.fh = NFS_FH(data->inode);
- data->read_done_cb = nfs4_read_done_cb;
- task->tk_ops = data->mds_ops;
- rpc_task_reset_client(task, NFS_CLIENT(data->inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_read);
-
-static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
+int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
{
struct inode *inode = data->inode;

@@ -3359,6 +3344,7 @@ static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data
}
return 0;
}
+EXPORT_SYMBOL_GPL(nfs4_write_done_cb);

static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
{
@@ -3368,23 +3354,6 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
nfs4_write_done_cb(task, data);
}

-/* Reset the the nfs_write_data to send the write to the MDS. */
-void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data)
-{
- dprintk("%s Reset task for i/o through\n", __func__);
- put_lseg(data->lseg);
- data->lseg = NULL;
- data->ds_clp = NULL;
- data->write_done_cb = nfs4_write_done_cb;
- data->args.fh = NFS_FH(data->inode);
- data->args.bitmask = data->res.server->cache_consistency_bitmask;
- data->args.offset = data->mds_offset;
- data->res.fattr = &data->fattr;
- task->tk_ops = data->mds_ops;
- rpc_task_reset_client(task, NFS_CLIENT(data->inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_write);
-
static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_message *msg)
{
struct nfs_server *server = NFS_SERVER(data->inode);
--
1.7.6.4


2012-03-21 20:39:17

by Myklebust, Trond

Subject: Re: [PATCH Version 2 05/12] NFSv4.1: mark deviceid invalid on filelayout DS connection errors

On Wed, 2012-03-21 at 15:46 -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> This prevents the use of any layout for i/o that references the deviceid.
> I/O is redirected through the MDS.
>
> Redirect the unhandled failed I/O to the MDS without marking either the
> layout or the deviceid invalid.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/nfs4filelayout.c | 65 ++++++++++++++++++++++++++++++++++------------
> fs/nfs/nfs4filelayout.h | 6 ++++
> 2 files changed, 54 insertions(+), 17 deletions(-)
>
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 3802937..1f1be26 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -116,7 +116,7 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
> static int filelayout_async_handle_error(struct rpc_task *task,
> struct nfs4_state *state,
> struct nfs_client *clp,
> - int *reset)
> + unsigned long *reset)
> {
> struct nfs_server *mds_server = NFS_SERVER(state->inode);
> struct nfs_client *mds_client = mds_server->nfs_client;
> @@ -158,10 +158,23 @@ static int filelayout_async_handle_error(struct rpc_task *task,
> break;
> case -NFS4ERR_RETRY_UNCACHED_REP:
> break;
> + /* RPC connection errors */
> + case -ECONNREFUSED:
> + case -EHOSTDOWN:
> + case -EHOSTUNREACH:
> + case -ENETUNREACH:
> + case -EIO:
> + case -ETIMEDOUT:
> + case -EPIPE:
> + dprintk("%s DS connection error. Retry through MDS %d\n",
> + __func__, task->tk_status);
> + set_bit(NFS4_RESET_DEVICEID, reset);
> + set_bit(NFS4_RESET_TO_MDS, reset);
> + break;
> default:
> - dprintk("%s DS error. Retry through MDS %d\n", __func__,
> - task->tk_status);
> - *reset = 1;
> + dprintk("%s Unhandled DS error. Retry through MDS %d\n",
> + __func__, task->tk_status);
> + set_bit(NFS4_RESET_TO_MDS, reset);
> break;
> }
> out:
> @@ -179,16 +192,22 @@ wait_on_recovery:
> static int filelayout_read_done_cb(struct rpc_task *task,
> struct nfs_read_data *data)
> {
> - int reset = 0;
> + struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
> + unsigned long reset = 0;
>
> dprintk("%s DS read\n", __func__);
>
> if (filelayout_async_handle_error(task, data->args.context->state,
> data->ds_clp, &reset) == -EAGAIN) {
> - dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
> - __func__, data->ds_clp, data->ds_clp->cl_session);
> - if (reset)
> +
> + dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
> + reset, data->ds_clp, data->ds_clp->cl_session);
> +
> + if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
> filelayout_reset_read(task, data);
> + if (test_bit(NFS4_RESET_DEVICEID, &reset))
> + filelayout_mark_devid_invalid(devid);

Is there any reason why we shouldn't just do the
filelayout_mark_devid_invalid() within filelayout_async_handle_error()
instead of having the caller do it?

That should also enable us to get rid of the whole 'reset' argument and
replace it with a return value != 0 && != -EAGAIN.

> + }
> rpc_restart_call_prepare(task);

This can probably also be done inside filelayout_async_handle_error(),
BTW.

> return -EAGAIN;
> }
> @@ -260,14 +279,20 @@ static void filelayout_read_release(void *data)
> static int filelayout_write_done_cb(struct rpc_task *task,
> struct nfs_write_data *data)
> {
> - int reset = 0;
> + struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
> + unsigned long reset = 0;
>
> if (filelayout_async_handle_error(task, data->args.context->state,
> data->ds_clp, &reset) == -EAGAIN) {
> - dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
> - __func__, data->ds_clp, data->ds_clp->cl_session);
> - if (reset)
> +
> + dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
> + reset, data->ds_clp, data->ds_clp->cl_session);
> +
> + if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
> filelayout_reset_write(task, data);
> + if (test_bit(NFS4_RESET_DEVICEID, &reset))
> + filelayout_mark_devid_invalid(devid);
> + }
> rpc_restart_call_prepare(task);
> return -EAGAIN;
> }
> @@ -290,16 +315,22 @@ static void prepare_to_resend_writes(struct nfs_write_data *data)
> static int filelayout_commit_done_cb(struct rpc_task *task,
> struct nfs_write_data *data)
> {
> - int reset = 0;
> + struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
> + unsigned long reset = 0;
>
> if (filelayout_async_handle_error(task, data->args.context->state,
> data->ds_clp, &reset) == -EAGAIN) {
> - dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
> - __func__, data->ds_clp, data->ds_clp->cl_session);
> - if (reset)
> +
> + dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
> + reset, data->ds_clp, data->ds_clp->cl_session);
> +
> + if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
> prepare_to_resend_writes(data);
> - else
> + if (test_bit(NFS4_RESET_DEVICEID, &reset))
> + filelayout_mark_devid_invalid(devid);
> + } else {
> rpc_restart_call_prepare(task);
> + }
> return -EAGAIN;
> }
>
> diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
> index b54b389..08b667a 100644
> --- a/fs/nfs/nfs4filelayout.h
> +++ b/fs/nfs/nfs4filelayout.h
> @@ -41,6 +41,12 @@
> #define NFS4_PNFS_MAX_STRIPE_CNT 4096
> #define NFS4_PNFS_MAX_MULTI_CNT 256 /* 256 fit into a u8 stripe_index */
>
> +/* internal use */
> +enum nfs4_fl_reset_state {
> + NFS4_RESET_TO_MDS = 0,
> + NFS4_RESET_DEVICEID,
> +};
> +
> enum stripetype4 {
> STRIPE_SPARSE = 1,
> STRIPE_DENSE = 2

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-22 13:23:53

by Adamson, Andy

Subject: Re: [PATCH Version 2 05/12] NFSv4.1: mark deviceid invalid on filelayout DS connection errors


On Mar 21, 2012, at 4:39 PM, Myklebust, Trond wrote:

> On Wed, 2012-03-21 at 15:46 -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> This prevents the use of any layout for i/o that references the deviceid.
>> I/O is redirected through the MDS.
>>
>> Redirect the unhandled failed I/O to the MDS without marking either the
>> layout or the deviceid invalid.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> fs/nfs/nfs4filelayout.c | 65 ++++++++++++++++++++++++++++++++++------------
>> fs/nfs/nfs4filelayout.h | 6 ++++
>> 2 files changed, 54 insertions(+), 17 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 3802937..1f1be26 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -116,7 +116,7 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
>> static int filelayout_async_handle_error(struct rpc_task *task,
>> struct nfs4_state *state,
>> struct nfs_client *clp,
>> - int *reset)
>> + unsigned long *reset)
>> {
>> struct nfs_server *mds_server = NFS_SERVER(state->inode);
>> struct nfs_client *mds_client = mds_server->nfs_client;
>> @@ -158,10 +158,23 @@ static int filelayout_async_handle_error(struct rpc_task *task,
>> break;
>> case -NFS4ERR_RETRY_UNCACHED_REP:
>> break;
>> + /* RPC connection errors */
>> + case -ECONNREFUSED:
>> + case -EHOSTDOWN:
>> + case -EHOSTUNREACH:
>> + case -ENETUNREACH:
>> + case -EIO:
>> + case -ETIMEDOUT:
>> + case -EPIPE:
>> + dprintk("%s DS connection error. Retry through MDS %d\n",
>> + __func__, task->tk_status);
>> + set_bit(NFS4_RESET_DEVICEID, reset);
>> + set_bit(NFS4_RESET_TO_MDS, reset);
>> + break;
>> default:
>> - dprintk("%s DS error. Retry through MDS %d\n", __func__,
>> - task->tk_status);
>> - *reset = 1;
>> + dprintk("%s Unhandled DS error. Retry through MDS %d\n",
>> + __func__, task->tk_status);
>> + set_bit(NFS4_RESET_TO_MDS, reset);
>> break;
>> }
>> out:
>> @@ -179,16 +192,22 @@ wait_on_recovery:
>> static int filelayout_read_done_cb(struct rpc_task *task,
>> struct nfs_read_data *data)
>> {
>> - int reset = 0;
>> + struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
>> + unsigned long reset = 0;
>>
>> dprintk("%s DS read\n", __func__);
>>
>> if (filelayout_async_handle_error(task, data->args.context->state,
>> data->ds_clp, &reset) == -EAGAIN) {
>> - dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
>> - __func__, data->ds_clp, data->ds_clp->cl_session);
>> - if (reset)
>> +
>> + dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
>> + reset, data->ds_clp, data->ds_clp->cl_session);
>> +
>> + if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
>> filelayout_reset_read(task, data);
>> + if (test_bit(NFS4_RESET_DEVICEID, &reset))
>> + filelayout_mark_devid_invalid(devid);
>
> Is there any reason why we shouldn't just do the
> filelayout_mark_devid_invalid() within filelayout_async_handle_error()
> instead of having the caller do it?

We would have to pass in the lseg argument.

>
> That should also enable us to get rid of the whole 'reset' argument and
> replace it with a return value != 0 && != -EAGAIN.

We would need to pass in the operation because READ/WRITE/COMMIT call different reset functions.

I chose the 'reset' argument method instead of passing in the operation because I thought it cleaner to keep the per-operation logic in the per-operation rpc functions than to have a switch(operation) statement in the async handler for each group of errors.

>
>> + }
>> rpc_restart_call_prepare(task);
>
> This can probably also be done inside filelayout_async_handle_error(),
> BTW.

COMMIT does not call rpc_restart_call_prepare on reset or invalid deviceid errors because we want the release function to fail with a verifier mismatch.

So it's up to you: I could move all the per-operation logic into the async handler by passing in the operation and using it to dereference the tk_callback to get the lseg and other needed parameters, then moving the code from the done_cb routines into the async handler under a switch(operation) for both the default reset to MDS and for the invalid deviceid.

-->Andy
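
The split described above, with a shared error classifier that only sets bits and per-operation callbacks that decide what to do with them, can be sketched in userspace C. This is a model under stated assumptions rather than the patch itself: classify_error(), rw_done_model(), commit_done_model(), struct outcome and MODEL_ERR_RETRY_SAME_DS are hypothetical names, and the flags in struct outcome merely record which kernel action would have been taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit numbers, mirroring enum nfs4_fl_reset_state in the patch. */
enum { RESET_TO_MDS = 0, RESET_DEVICEID = 1 };

/* Hypothetical stand-in for an error that is retried against the same DS. */
#define MODEL_ERR_RETRY_SAME_DS (-10008)

struct outcome {
	bool redirected;	/* request re-aimed at the MDS */
	bool devid_invalid;	/* deviceid marked invalid */
	bool restarted;		/* rpc_restart_call_prepare() equivalent ran */
};

/* Shared classifier: knows about errors, not about operations. */
static unsigned long classify_error(int status)
{
	assert(status < 0);
	switch (status) {
	case MODEL_ERR_RETRY_SAME_DS:
		return 0;
	case -ECONNREFUSED:
#ifdef EHOSTDOWN
	case -EHOSTDOWN:
#endif
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EIO:
	case -ETIMEDOUT:
	case -EPIPE:
		return (1UL << RESET_TO_MDS) | (1UL << RESET_DEVICEID);
	default:
		return 1UL << RESET_TO_MDS;
	}
}

/* READ and WRITE: redirect if asked, then always restart the call. */
static struct outcome rw_done_model(int status)
{
	unsigned long reset = classify_error(status);
	struct outcome o = { false, false, true };

	if (reset & (1UL << RESET_TO_MDS)) {
		o.redirected = true;
		o.devid_invalid = (reset & (1UL << RESET_DEVICEID)) != 0;
	}
	return o;
}

/* COMMIT: on reset, resend the writes and do not restart the call. */
static struct outcome commit_done_model(int status)
{
	unsigned long reset = classify_error(status);
	struct outcome o = { false, false, false };

	if (reset & (1UL << RESET_TO_MDS)) {
		o.redirected = true;
		o.devid_invalid = (reset & (1UL << RESET_DEVICEID)) != 0;
	} else {
		o.restarted = true;
	}
	return o;
}
```

The COMMIT callback is the only one that skips the restart on a reset, which is the asymmetry that would otherwise need a switch(operation) inside the shared handler.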

>
>> return -EAGAIN;
>> }
>> @@ -260,14 +279,20 @@ static void filelayout_read_release(void *data)
>> static int filelayout_write_done_cb(struct rpc_task *task,
>> struct nfs_write_data *data)
>> {
>> - int reset = 0;
>> + struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
>> + unsigned long reset = 0;
>>
>> if (filelayout_async_handle_error(task, data->args.context->state,
>> data->ds_clp, &reset) == -EAGAIN) {
>> - dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
>> - __func__, data->ds_clp, data->ds_clp->cl_session);
>> - if (reset)
>> +
>> + dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
>> + reset, data->ds_clp, data->ds_clp->cl_session);
>> +
>> + if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
>> filelayout_reset_write(task, data);
>> + if (test_bit(NFS4_RESET_DEVICEID, &reset))
>> + filelayout_mark_devid_invalid(devid);
>> + }
>> rpc_restart_call_prepare(task);
>> return -EAGAIN;
>> }
>> @@ -290,16 +315,22 @@ static void prepare_to_resend_writes(struct nfs_write_data *data)
>> static int filelayout_commit_done_cb(struct rpc_task *task,
>> struct nfs_write_data *data)
>> {
>> - int reset = 0;
>> + struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
>> + unsigned long reset = 0;
>>
>> if (filelayout_async_handle_error(task, data->args.context->state,
>> data->ds_clp, &reset) == -EAGAIN) {
>> - dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
>> - __func__, data->ds_clp, data->ds_clp->cl_session);
>> - if (reset)
>> +
>> + dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
>> + reset, data->ds_clp, data->ds_clp->cl_session);
>> +
>> + if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
>> prepare_to_resend_writes(data);
>> - else
>> + if (test_bit(NFS4_RESET_DEVICEID, &reset))
>> + filelayout_mark_devid_invalid(devid);
>> + } else {
>> rpc_restart_call_prepare(task);
>> + }
>> return -EAGAIN;
>> }
>>
>> diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
>> index b54b389..08b667a 100644
>> --- a/fs/nfs/nfs4filelayout.h
>> +++ b/fs/nfs/nfs4filelayout.h
>> @@ -41,6 +41,12 @@
>> #define NFS4_PNFS_MAX_STRIPE_CNT 4096
>> #define NFS4_PNFS_MAX_MULTI_CNT 256 /* 256 fit into a u8 stripe_index */
>>
>> +/* internal use */
>> +enum nfs4_fl_reset_state {
>> + NFS4_RESET_TO_MDS = 0,
>> + NFS4_RESET_DEVICEID,
>> +};
>> +
>> enum stripetype4 {
>> STRIPE_SPARSE = 1,
>> STRIPE_DENSE = 2
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>


2012-03-21 19:46:54

by Andy Adamson

Subject: [PATCH Version 2 08/12] NFSv4.1 wake up all tasks on un-connected DS slot table waitq

From: Andy Adamson <[email protected]>

The DS has a connection error (invalid deviceid). Drain the fore channel
slot table waitq.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index b73818f..ccbafdd 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -193,6 +193,7 @@ static int filelayout_read_done_cb(struct rpc_task *task,
struct nfs_read_data *data)
{
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ struct nfs4_slot_table *tbl = &data->ds_clp->cl_session->fc_slot_table;
unsigned long reset = 0;

dprintk("%s DS read\n", __func__);
@@ -205,8 +206,10 @@ static int filelayout_read_done_cb(struct rpc_task *task,

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_read(task, data);
- if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ }
}
rpc_restart_call_prepare(task);
return -EAGAIN;
@@ -286,6 +289,7 @@ static int filelayout_write_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ struct nfs4_slot_table *tbl = &data->ds_clp->cl_session->fc_slot_table;
unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
@@ -296,8 +300,10 @@ static int filelayout_write_done_cb(struct rpc_task *task,

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_write(task, data);
- if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ }
}
rpc_restart_call_prepare(task);
return -EAGAIN;
@@ -322,6 +328,7 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ struct nfs4_slot_table *tbl = &data->ds_clp->cl_session->fc_slot_table;
unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
@@ -332,8 +339,10 @@ static int filelayout_commit_done_cb(struct rpc_task *task,

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
prepare_to_resend_writes(data);
- if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
+ rpc_wake_up(&tbl->slot_tbl_waitq);
+ }
} else {
rpc_restart_call_prepare(task);
}
--
1.7.6.4
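
A rough userspace model of why waking the whole slot table waitq helps once the deviceid is invalid. It assumes, as the cover letter describes, that the rpc_call_prepare step checks the deviceid and redirects to the MDS when it is invalid. devid_model, task_model, prepare_model() and wake_all_model() are hypothetical names, and the model assumes no slot ever becomes free on the failed DS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct devid_model {
	bool invalid;
};

struct task_model {
	bool waiting;	/* parked on the slot table waitq */
	bool to_mds;	/* redirected through the MDS */
};

/*
 * Stand-in for the rpc_call_prepare step. With no free slot, a task
 * either goes back to sleep or, if the deviceid is invalid, is
 * redirected to the MDS.
 */
static void prepare_model(struct task_model *t, const struct devid_model *d)
{
	if (d->invalid)
		t->to_mds = true;
	else
		t->waiting = true;
}

/* Wake every waiter (the rpc_wake_up() idea) and rerun its prepare step. */
static size_t wake_all_model(struct task_model *tasks, size_t n,
			     const struct devid_model *d)
{
	size_t i, woken = 0;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!tasks[i].waiting)
			continue;
		tasks[i].waiting = false;
		woken++;
		prepare_model(&tasks[i], d);
	}
	return woken;
}
```

While the deviceid is still valid, a wake-up just puts the tasks back to sleep. After it is marked invalid, one wake-up empties the queue and every task leaves through the MDS path instead of waiting to time out one by one.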


2012-03-21 19:46:55

by Andy Adamson

Subject: [PATCH Version 2 09/12] NFSv4.1 ref count nfs_client across filelayout data server io

From: Andy Adamson <[email protected]>

Prepare to put a disconnected DS client record.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 22 +++++++++++++++++-----
1 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index ccbafdd..129b57f 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -88,6 +88,8 @@ void filelayout_reset_write(struct rpc_task *task, struct nfs_write_data *data)
dprintk("%s Reset task for i/o through MDS\n", __func__);
put_lseg(data->lseg);
data->lseg = NULL;
+ /* balance nfs_get_client in filelayout_write_pagelist */
+ nfs_put_client(data->ds_clp);
data->ds_clp = NULL;
data->write_done_cb = nfs4_write_done_cb;
data->args.fh = NFS_FH(data->inode);
@@ -106,6 +108,8 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
data->lseg = NULL;
/* offsets will differ in the dense stripe case */
data->args.offset = data->mds_offset;
+ /* balance nfs_get_client in filelayout_read_pagelist */
+ nfs_put_client(data->ds_clp);
data->ds_clp = NULL;
data->args.fh = NFS_FH(data->inode);
data->read_done_cb = nfs4_read_done_cb;
@@ -282,6 +286,7 @@ static void filelayout_read_release(void *data)
struct nfs_read_data *rdata = (struct nfs_read_data *)data;

put_lseg(rdata->lseg);
+ nfs_put_client(rdata->ds_clp);
rdata->mds_ops->rpc_release(data);
}

@@ -390,6 +395,7 @@ static void filelayout_write_release(void *data)
struct nfs_write_data *wdata = (struct nfs_write_data *)data;

put_lseg(wdata->lseg);
+ nfs_put_client(wdata->ds_clp);
wdata->mds_ops->rpc_release(data);
}

@@ -409,6 +415,7 @@ static void filelayout_commit_release(void *data)
{
struct nfs_write_data *wdata = (struct nfs_write_data *)data;

+ nfs_put_client(wdata->ds_clp);
nfs_commit_release_pages(wdata);
if (atomic_dec_and_test(&NFS_I(wdata->inode)->commits_outstanding))
nfs_commit_clear_lock(NFS_I(wdata->inode));
@@ -457,9 +464,11 @@ filelayout_read_pagelist(struct nfs_read_data *data)
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds)
return PNFS_NOT_ATTEMPTED;
- dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);
+ dprintk("%s USE DS: %s cl_count %d\n", __func__,
+ ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));

/* No multipath support. Use first DS */
+ atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
@@ -492,11 +501,12 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds)
return PNFS_NOT_ATTEMPTED;
- dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
- data->inode->i_ino, sync, (size_t) data->args.count, offset,
- ds->ds_remotestr);
+ dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d\n",
+ __func__, data->inode->i_ino, sync, (size_t) data->args.count,
+ offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));

data->write_done_cb = filelayout_write_done_cb;
+ atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
@@ -972,8 +982,10 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
filelayout_commit_release(data);
return -EAGAIN;
}
- dprintk("%s ino %lu, how %d\n", __func__, data->inode->i_ino, how);
+ dprintk("%s ino %lu, how %d cl_count %d\n", __func__,
+ data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count));
data->write_done_cb = filelayout_commit_done_cb;
+ atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
if (fh)
--
1.7.6.4


2012-03-21 20:47:52

by Myklebust, Trond

Subject: Re: [PATCH Version 2 12/12] NFSv4.1 have filelayout_initiate_commit return void

On Wed, 2012-03-21 at 15:46 -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> The return is ignored.

Yes, but should it be? See the discussion between Fred and myself on the
list yesterday. I still don't see why we should report some errors and
ignore others...


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-21 19:46:53

by Andy Adamson

Subject: [PATCH Version 2 04/12] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls

From: Andy Adamson <[email protected]>

With RPC_TASK_SOFTCONN set, connection errors are returned to the caller, which
allows the pNFS file layout code to quickly retry through the MDS or perhaps
another DS.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/internal.h | 6 +++---
fs/nfs/nfs4filelayout.c | 10 ++++++----
fs/nfs/read.c | 6 +++---
fs/nfs/write.c | 13 +++++++------
4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f9ac1f0..eebd7f1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -297,7 +297,7 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
struct nfs_pageio_descriptor;
/* read.c */
extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
- const struct rpc_call_ops *call_ops);
+ const struct rpc_call_ops *call_ops, int flags);
extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
extern int nfs_generic_pagein(struct nfs_pageio_descriptor *desc,
struct list_head *head);
@@ -318,12 +318,12 @@ extern void nfs_commit_free(struct nfs_write_data *p);
extern int nfs_initiate_write(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how);
+ int how, int flags);
extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
extern int nfs_initiate_commit(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how);
+ int how, int flags);
extern void nfs_init_commit(struct nfs_write_data *data,
struct list_head *head,
struct pnfs_layout_segment *lseg);
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index acafc4d..3802937 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -406,7 +406,7 @@ filelayout_read_pagelist(struct nfs_read_data *data)

/* Perform an asynchronous read to ds */
status = nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
- &filelayout_read_call_ops);
+ &filelayout_read_call_ops, RPC_TASK_SOFTCONN);
BUG_ON(status != 0);
return PNFS_ATTEMPTED;
}
@@ -445,7 +445,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)

/* Perform an asynchronous write */
status = nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
- &filelayout_write_call_ops, sync);
+ &filelayout_write_call_ops, sync,
+ RPC_TASK_SOFTCONN);
BUG_ON(status != 0);
return PNFS_ATTEMPTED;
}
@@ -913,7 +914,8 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
if (fh)
data->args.fh = fh;
return nfs_initiate_commit(data, ds->ds_clp->cl_rpcclient,
- &filelayout_commit_call_ops, how);
+ &filelayout_commit_call_ops, how,
+ RPC_TASK_SOFTCONN);
}

/*
@@ -1064,7 +1066,7 @@ filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
if (!data->lseg) {
nfs_init_commit(data, mds_pages, NULL);
nfs_initiate_commit(data, NFS_CLIENT(inode),
- data->mds_ops, how);
+ data->mds_ops, how, 0);
} else {
nfs_init_commit(data, &FILELAYOUT_LSEG(data->lseg)->commit_buckets[data->ds_commit_index].committing, data->lseg);
filelayout_initiate_commit(data, how);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index cc1f758..da7c0b1 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -171,7 +171,7 @@ static void nfs_readpage_release(struct nfs_page *req)
}

int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
- const struct rpc_call_ops *call_ops)
+ const struct rpc_call_ops *call_ops, int flags)
{
struct inode *inode = data->inode;
int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
@@ -188,7 +188,7 @@ int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
.callback_ops = call_ops,
.callback_data = data,
.workqueue = nfsiod_workqueue,
- .flags = RPC_TASK_ASYNC | swap_flags,
+ .flags = RPC_TASK_ASYNC | swap_flags | flags,
};

/* Set up the initial task struct. */
@@ -241,7 +241,7 @@ static int nfs_do_read(struct nfs_read_data *data,
{
struct inode *inode = data->args.context->dentry->d_inode;

- return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
+ return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops, 0);
}

static int
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2c68818..3b620e4 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -839,7 +839,7 @@ static int flush_task_priority(int how)
int nfs_initiate_write(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how)
+ int how, int flags)
{
struct inode *inode = data->inode;
int priority = flush_task_priority(how);
@@ -856,7 +856,7 @@ int nfs_initiate_write(struct nfs_write_data *data,
.callback_ops = call_ops,
.callback_data = data,
.workqueue = nfsiod_workqueue,
- .flags = RPC_TASK_ASYNC,
+ .flags = RPC_TASK_ASYNC | flags,
.priority = priority,
};
int ret = 0;
@@ -937,7 +937,7 @@ static int nfs_do_write(struct nfs_write_data *data,
{
struct inode *inode = data->args.context->dentry->d_inode;

- return nfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how);
+ return nfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how, 0);
}

static int nfs_do_multiple_writes(struct list_head *head,
@@ -1365,7 +1365,7 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);

int nfs_initiate_commit(struct nfs_write_data *data, struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how)
+ int how, int flags)
{
struct rpc_task *task;
int priority = flush_task_priority(how);
@@ -1381,7 +1381,7 @@ int nfs_initiate_commit(struct nfs_write_data *data, struct rpc_clnt *clnt,
.callback_ops = call_ops,
.callback_data = data,
.workqueue = nfsiod_workqueue,
- .flags = RPC_TASK_ASYNC,
+ .flags = RPC_TASK_ASYNC | flags,
.priority = priority,
};
/* Set up the initial task struct. */
@@ -1463,7 +1463,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)

/* Set up the argument struct */
nfs_init_commit(data, head, NULL);
- return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops, how);
+ return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops,
+ how, 0);
out_bad:
nfs_retry_commit(head, NULL);
nfs_commit_clear_lock(NFS_I(inode));
--
1.7.6.4
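
The plumbing in the patch above can be modelled in user space. The following is only a minimal sketch, not kernel code: the DEMO_* constants and demo_* functions are hypothetical stand-ins, and the numeric flag values are illustrative rather than taken from the SUNRPC headers. It shows the one idea the patch relies on: each initiate routine ORs the caller's extra flags into the flags it always set, so MDS callers pass 0 and keep their old behaviour while DS callers opt in to soft-connect semantics.

```c
#include <assert.h>

/* Stand-in values for illustration only; the real RPC_TASK_* definitions
 * live in the kernel's SUNRPC headers. */
#define DEMO_TASK_ASYNC    0x0001
#define DEMO_TASK_SOFTCONN 0x0400

/* Hypothetical model of the task-setup step in the initiate routines:
 * the caller's extra flags are OR-ed into the flags that were always set. */
static int demo_task_flags(int caller_flags)
{
	/* This sketch accepts only "no extra flags" or the soft-connect flag. */
	assert((caller_flags & ~DEMO_TASK_SOFTCONN) == 0);
	return DEMO_TASK_ASYNC | caller_flags;
}

/* MDS paths pass 0, so their task flags are unchanged by the new parameter. */
static int demo_mds_flags(void)
{
	return demo_task_flags(0);
}

/* DS paths pass the soft-connect flag so connection errors are returned. */
static int demo_ds_flags(void)
{
	return demo_task_flags(DEMO_TASK_SOFTCONN);
}
```

In this model demo_mds_flags() yields only the async bit, and demo_ds_flags() yields the async bit plus the soft-connect bit, which mirrors the 0 versus RPC_TASK_SOFTCONN arguments in the diff.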


2012-03-22 13:53:03

by Andy Adamson

Subject: Re: [PATCH Version 2 05/12] NFSv4.1: mark deviceid invalid on filelayout DS connection errors

On Thu, Mar 22, 2012 at 9:44 AM, Myklebust, Trond
<[email protected]> wrote:
> On Thu, 2012-03-22 at 13:23 +0000, Adamson, Andy wrote:
>> On Mar 21, 2012, at 4:39 PM, Myklebust, Trond wrote:
>>
>> > On Wed, 2012-03-21 at 15:46 -0400, [email protected] wrote:
>> >> From: Andy Adamson <[email protected]>
>> >>
>> >> This prevents the use of any layout for i/o that references the deviceid.
>> >> I/O is redirected through the MDS.
>> >>
>> >> Redirect the unhandled failed I/O to the MDS without marking either the
>> >> layout or the deviceid invalid.
>> >>
>> >> Signed-off-by: Andy Adamson <[email protected]>
>> >> ---
>> >> fs/nfs/nfs4filelayout.c |   65 ++++++++++++++++++++++++++++++++++------------
>> >> fs/nfs/nfs4filelayout.h |    6 ++++
>> >> 2 files changed, 54 insertions(+), 17 deletions(-)
>> >>
>> >> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> >> index 3802937..1f1be26 100644
>> >> --- a/fs/nfs/nfs4filelayout.c
>> >> +++ b/fs/nfs/nfs4filelayout.c
>> >> @@ -116,7 +116,7 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
>> >> static int filelayout_async_handle_error(struct rpc_task *task,
>> >>                                     struct nfs4_state *state,
>> >>                                     struct nfs_client *clp,
>> >> -                                   int *reset)
>> >> +                                   unsigned long *reset)
>> >> {
>> >>    struct nfs_server *mds_server = NFS_SERVER(state->inode);
>> >>    struct nfs_client *mds_client = mds_server->nfs_client;
>> >> @@ -158,10 +158,23 @@ static int filelayout_async_handle_error(struct rpc_task *task,
>> >>            break;
>> >>    case -NFS4ERR_RETRY_UNCACHED_REP:
>> >>            break;
>> >> +  /* RPC connection errors */
>> >> +  case -ECONNREFUSED:
>> >> +  case -EHOSTDOWN:
>> >> +  case -EHOSTUNREACH:
>> >> +  case -ENETUNREACH:
>> >> +  case -EIO:
>> >> +  case -ETIMEDOUT:
>> >> +  case -EPIPE:
>> >> +          dprintk("%s DS connection error. Retry through MDS %d\n",
>> >> +                  __func__, task->tk_status);
>> >> +          set_bit(NFS4_RESET_DEVICEID, reset);
>> >> +          set_bit(NFS4_RESET_TO_MDS, reset);
>> >> +          break;
>> >>    default:
>> >> -          dprintk("%s DS error. Retry through MDS %d\n", __func__,
>> >> -                  task->tk_status);
>> >> -          *reset = 1;
>> >> +          dprintk("%s Unhandled DS error. Retry through MDS %d\n",
>> >> +                  __func__, task->tk_status);
>> >> +          set_bit(NFS4_RESET_TO_MDS, reset);
>> >>            break;
>> >>    }
>> >> out:
>> >> @@ -179,16 +192,22 @@ wait_on_recovery:
>> >> static int filelayout_read_done_cb(struct rpc_task *task,
>> >>                            struct nfs_read_data *data)
>> >> {
>> >> -  int reset = 0;
>> >> +  struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
>> >> +  unsigned long reset = 0;
>> >>
>> >>    dprintk("%s DS read\n", __func__);
>> >>
>> >>    if (filelayout_async_handle_error(task, data->args.context->state,
>> >>                                      data->ds_clp, &reset) == -EAGAIN) {
>> >> -          dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
>> >> -                  __func__, data->ds_clp, data->ds_clp->cl_session);
>> >> -          if (reset)
>> >> +
>> >> +          dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
>> >> +                  reset, data->ds_clp, data->ds_clp->cl_session);
>> >> +
>> >> +          if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
>> >>                    filelayout_reset_read(task, data);
>> >> +                  if (test_bit(NFS4_RESET_DEVICEID, &reset))
>> >> +                          filelayout_mark_devid_invalid(devid);
>> >
>> > Is there any reason why we shouldn't just do the
>> > filelayout_mark_devid_invalid() within filelayout_async_handle_error()
>> > instead of having the caller do it?
>>
>> We would have to pass in the lseg argument.
>
> No. You'd only have to pass in the device id.
>
>> >
>> > That should also enable us to get rid of the whole 'reset' argument and
>> > replace it with a return value != 0 && != -EAGAIN.
>>
>> We would need to pass in the operation because READ/WRITE/COMMIT call different reset functions.
>
> No. The caller would still do the reset. It's just that you would have a
> special return value to indicate it.
>
>> I chose the 'reset' argument method instead of passing in the operation because I thought it cleaner to keep the per operation logic in the per operation rpc functions instead of having a switch(operation) statement in the async handler for each group of errors.
>
> My point is that you don't need an extra parameter to do this. You just
> need a special return value.
>
>> >
>> >> +          }
>> >>            rpc_restart_call_prepare(task);
>> >
>> > This can probably also be done inside filelayout_async_handle_error(),
>> > BTW.
>>
>> COMMIT does not call rpc_restart_call_prepare on reset or invalid deviceid errors because we want the release function to fail with a verifier mismatch.
>
> Which is a grotesque hack in itself... (the verifier hack, that is). I'm
> hoping that Fred will fix that in the new code. It doesn't take much:
> just a flag in the struct nfs_write_data (or in his case: struct
> nfs_commit_data).
>
>> So it's up to you: I could move all the per operation logic into the async handler by passing in the operation and using it to dereference the tk_callback to get the lseg and other needed parameters - then moving the code from the done_cb routines into the async handler under a switch(operation) for both the default reset to mds and for the invalid deviceid.
>
> Then let's keep the rpc_restart_call_prepare in the caller, but move the
> deviceid invalidation into the async handler. Let's also replace the
> reset argument with a return value...

OK

-->Andy

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>
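
The design agreed on at the end of this exchange can be sketched in plain C. This is only a user-space illustration under stated assumptions, not the code that was eventually merged: struct demo_devid, DEMO_RESET_TO_MDS, DEMO_ERR_DELAY and both demo_* functions are hypothetical names, and the boolean outputs stand in for the real reset and restart calls. The handler receives the device ID, marks it invalid itself on connection errors, and reports "redirect to the MDS" through a return value that is neither 0 nor -EAGAIN; the caller keeps the per-operation reset and the restart.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device ID node; only the flag matters here. */
struct demo_devid {
	bool invalid;
};

/* Made-up return code meaning "caller should redirect to the MDS".
 * Any value other than 0 and -EAGAIN would serve. */
#define DEMO_RESET_TO_MDS 1

/* Stand-in for an error that is retried against the same DS. */
#define DEMO_ERR_DELAY 10008

/* Classify the task status; invalidate the device ID on connection errors. */
static int demo_async_handle_error(int tk_status, struct demo_devid *devid)
{
	assert(devid != NULL);

	switch (tk_status) {
	case 0:
		return 0;
	case -DEMO_ERR_DELAY:
		return -EAGAIN;		/* plain retry, no redirect */
	/* RPC connection errors, as listed in the patch under review */
	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EIO:
	case -ETIMEDOUT:
	case -EPIPE:
		devid->invalid = true;	/* done in the handler, not the caller */
		return DEMO_RESET_TO_MDS;
	default:
		/* Unhandled DS error: redirect without invalidating anything. */
		return DEMO_RESET_TO_MDS;
	}
}

/* Per-operation caller: returns true when the task has to be restarted.
 * *redirected stands in for calling the operation's reset function. */
static bool demo_read_done(int tk_status, struct demo_devid *devid,
			   bool *redirected)
{
	int err = demo_async_handle_error(tk_status, devid);

	if (err == 0)
		return false;
	if (err == DEMO_RESET_TO_MDS)
		*redirected = true;
	return true;			/* stands in for the restart call */
}
```

With this shape the handler needs no operation-specific knowledge: a WRITE or COMMIT caller would look like demo_read_done() but invoke its own reset, and COMMIT could skip the restart as described earlier in the thread.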