2012-03-15 18:40:56

by Andy Adamson

Subject: [PATCH Version 1 00/11] NFSv4.1 file layout data server quick failover

From: Andy Adamson <[email protected]>

Currently, when a data server connection goes down due to a network partition,
a data server failure, or an administrative action, RPC tasks in various
stages of the RPC finite state machine (FSM) must transmit and time out
(or otherwise fail) before being redirected to an alternative server
(the MDS or another DS).
This can take a very long time if the connection goes down during a heavy
I/O load where the data server fore channel session slot_tbl_waitq and the
transport sending/pending waitqs are populated with many requests.
(see RedHat Bugzilla 756212 "Redirecting I/O through the MDS after a data
server network partition is very slow")
The current code also keeps the client structure and the session to the failed
data server until umount.

These patches address this problem by setting data server RPC tasks to
RPC_TASK_SOFTCONN and handling the resultant connection errors as follows:

* The pNFS deviceid is marked invalid which blocks any new pNFS io using that
deviceid.
* The RPC done routines for READ, WRITE and COMMIT redirect the requests
to the new server (MDS) and send the request back through the RPC FSM.
* An rpc_action which also redirects the request to the MDS on an invalid
deviceid is registered with the data server session fore channel
slot_tbl_waitq rpc_sleep_on calls and is executed upon wake up.
* The data server session fore channel slot_tbl_waitq is drained using a
new rpc_drain_queue method.
* All data server io requests reference the data server client structure
across io calls, and the client is dereferenced upon deviceid invalidation so
that the client (and the session) is freed upon the last (failed) redirected io.
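The invalidate-and-redirect flow above can be sketched in miniature. The following is an illustrative user-space sketch only; every name in it is hypothetical, while the real patches operate on nfs4_deviceid_node flags, rpc_task, and the filelayout READ/WRITE/COMMIT done callbacks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only. */
enum target { TARGET_DS, TARGET_MDS };

struct deviceid { bool invalid; };

struct io_request {
	struct deviceid *devid;
	enum target target;
};

/* Done routine: on a DS connection error, invalidate the deviceid and
 * redirect the request to the MDS for another pass through the RPC FSM. */
static bool io_done(struct io_request *req, int status)
{
	if (req->target == TARGET_DS && status != 0) {
		req->devid->invalid = true;  /* blocks new pNFS io on this devid */
		req->target = TARGET_MDS;    /* redirect this request */
		return false;                /* resend through the state machine */
	}
	return true;                         /* request is complete */
}

/* New io consults the deviceid first and goes straight to the MDS
 * once the deviceid has been marked invalid. */
static enum target choose_target(const struct deviceid *devid)
{
	return devid->invalid ? TARGET_MDS : TARGET_DS;
}
```

The point of the sketch is the ordering: the first connection error both blocks new pNFS io and resends the failed request, so later requests never wait on the dead DS.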

Testing:
I use a pynfs file layout server with a DS to test. The pynfs server and DS
are modified to use the local host for MDS-to-DS communication. I add a
second IPv4 address to the single machine interface for DS-to-client
communication. While a "dd" or a read/write-heavy Connectathon test is
running, the DS IP address is removed from the ethernet interface, and the
client recovers io to the MDS.
I have tested READ and WRITE recovery multiple times, and have managed to
time the removal of the DS ip address during a DS COMMIT and have seen it
recover as well. :)


Comments welcome

--> Andy

Clean-up patches:
0001-NFSv4.1-move-nfs4_reset_read-and-nfs_reset_write.patch
0002-NFSv4.1-cleanup-filelayout-invalid-deviceid-handling.patch
0003-NFSv4.1-cleanup-filelayout-invalid-layout-handling.patch

Quick failover patches:
0004-NFSv4.1-set-RPC_TASK_SOFTCONN-for-filelayout-DS-RPC-.patch
0005-NFSv4.1-mark-deviceid-invalid-on-filelayout-DS-conne.patch
0006-NFSv4.1-send-filelayout-DS-commits-to-the-MDS-on-inv.patch
0007-NFSv4.1-Check-invalid-deviceid-upon-slot-table-waitq.patch
0008-SUNRPC-add-rpc_drain_queue-to-empty-an-rpc_waitq.patch
0009-NFSv4.1-wake-up-all-tasks-on-un-connected-DS-slot-ta.patch
0010-NFSv4.1-ref-count-nfs_client-across-filelayout-data-.patch
0011-NFSv4.1-de-reference-a-disconnected-data-server-clie.patch


Andy Adamson (11):
NFSv4.1 move nfs4_reset_read and nfs_reset_write
NFSv4.1: cleanup filelayout invalid deviceid handling
NFSv4.1 cleanup filelayout invalid layout handling
NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
NFSv4.1: mark deviceid invalid on filelayout DS connection errors
NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid
NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
SUNRPC: add rpc_drain_queue to empty an rpc_waitq
NFSv4.1 wake up all tasks on un-connected DS slot table waitq
NFSv4.1 ref count nfs_client across filelayout data server io
NFSv4.1 de reference a disconnected data server client record

fs/nfs/internal.h | 11 +-
fs/nfs/nfs4filelayout.c | 207 +++++++++++++++++++++++++++++++-----------
fs/nfs/nfs4filelayout.h | 28 +++++-
fs/nfs/nfs4filelayoutdev.c | 54 ++++++-----
fs/nfs/nfs4proc.c | 43 +--------
fs/nfs/read.c | 6 +-
fs/nfs/write.c | 13 ++-
include/linux/nfs_xdr.h | 1 +
include/linux/sunrpc/sched.h | 1 +
net/sunrpc/sched.c | 27 ++++++
10 files changed, 257 insertions(+), 134 deletions(-)

--
1.7.6.4



2012-03-15 18:40:57

by Andy Adamson

Subject: [PATCH Version 1 01/11] NFSv4.1 move nfs4_reset_read and nfs_reset_write

From: Andy Adamson <[email protected]>

Only called by the file layout code

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/internal.h | 5 +++--
fs/nfs/nfs4filelayout.c | 35 +++++++++++++++++++++++++++++++++--
fs/nfs/nfs4proc.c | 39 ++++-----------------------------------
3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 04a9147..44066d9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -344,13 +344,14 @@ extern int nfs_migrate_page(struct address_space *,

/* nfs4proc.c */
extern void __nfs4_read_done_cb(struct nfs_read_data *);
-extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
+extern int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data);
+extern int nfs4_write_done_cb(struct rpc_task *task,
+ struct nfs_write_data *data);
extern int nfs4_init_client(struct nfs_client *clp,
const struct rpc_timeout *timeparms,
const char *ip_addr,
rpc_authflavor_t authflavour,
int noresvport);
-extern void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data);
extern int _nfs4_call_sync(struct rpc_clnt *clnt,
struct nfs_server *server,
struct rpc_message *msg,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 379a085..447c2c9 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -82,6 +82,37 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
BUG();
}

+/* Reset the nfs_write_data to send the write to the MDS. */
+void filelayout_reset_write(struct rpc_task *task, struct nfs_write_data *data)
+{
+ dprintk("%s Reset task for i/o through\n", __func__);
+ put_lseg(data->lseg);
+ data->lseg = NULL;
+ data->ds_clp = NULL;
+ data->write_done_cb = nfs4_write_done_cb;
+ data->args.fh = NFS_FH(data->inode);
+ data->args.bitmask = data->res.server->cache_consistency_bitmask;
+ data->args.offset = data->mds_offset;
+ data->res.fattr = &data->fattr;
+ task->tk_ops = data->mds_ops;
+ rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+
+/* Reset the nfs_read_data to send the read to the MDS. */
+void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
+{
+ dprintk("%s Reset task for i/o through\n", __func__);
+ put_lseg(data->lseg);
+ data->lseg = NULL;
+ /* offsets will differ in the dense stripe case */
+ data->args.offset = data->mds_offset;
+ data->ds_clp = NULL;
+ data->args.fh = NFS_FH(data->inode);
+ data->read_done_cb = nfs4_read_done_cb;
+ task->tk_ops = data->mds_ops;
+ rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+
static int filelayout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
@@ -158,7 +189,7 @@ static int filelayout_read_done_cb(struct rpc_task *task,
__func__, data->ds_clp, data->ds_clp->cl_session);
if (reset) {
pnfs_set_lo_fail(data->lseg);
- nfs4_reset_read(task, data);
+ filelayout_reset_read(task, data);
}
rpc_restart_call_prepare(task);
return -EAGAIN;
@@ -238,7 +269,7 @@ static int filelayout_write_done_cb(struct rpc_task *task,
__func__, data->ds_clp, data->ds_clp->cl_session);
if (reset) {
pnfs_set_lo_fail(data->lseg);
- nfs4_reset_write(task, data);
+ filelayout_reset_write(task, data);
}
rpc_restart_call_prepare(task);
return -EAGAIN;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5e0961a..45b67d8 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3264,7 +3264,7 @@ void __nfs4_read_done_cb(struct nfs_read_data *data)
nfs_invalidate_atime(data->inode);
}

-static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
+int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
{
struct nfs_server *server = NFS_SERVER(data->inode);

@@ -3278,6 +3278,7 @@ static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
renew_lease(server, data->timestamp);
return 0;
}
+EXPORT_SYMBOL_GPL(nfs4_read_done_cb);

static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
{
@@ -3299,23 +3300,7 @@ static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message
nfs41_init_sequence(&data->args.seq_args, &data->res.seq_res, 0);
}

-/* Reset the the nfs_read_data to send the read to the MDS. */
-void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
-{
- dprintk("%s Reset task for i/o through\n", __func__);
- put_lseg(data->lseg);
- data->lseg = NULL;
- /* offsets will differ in the dense stripe case */
- data->args.offset = data->mds_offset;
- data->ds_clp = NULL;
- data->args.fh = NFS_FH(data->inode);
- data->read_done_cb = nfs4_read_done_cb;
- task->tk_ops = data->mds_ops;
- rpc_task_reset_client(task, NFS_CLIENT(data->inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_read);
-
-static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
+int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
{
struct inode *inode = data->inode;

@@ -3329,6 +3314,7 @@ static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data
}
return 0;
}
+EXPORT_SYMBOL_GPL(nfs4_write_done_cb);

static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
{
@@ -3338,23 +3324,6 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
nfs4_write_done_cb(task, data);
}

-/* Reset the the nfs_write_data to send the write to the MDS. */
-void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data)
-{
- dprintk("%s Reset task for i/o through\n", __func__);
- put_lseg(data->lseg);
- data->lseg = NULL;
- data->ds_clp = NULL;
- data->write_done_cb = nfs4_write_done_cb;
- data->args.fh = NFS_FH(data->inode);
- data->args.bitmask = data->res.server->cache_consistency_bitmask;
- data->args.offset = data->mds_offset;
- data->res.fattr = &data->fattr;
- task->tk_ops = data->mds_ops;
- rpc_task_reset_client(task, NFS_CLIENT(data->inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_write);
-
static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_message *msg)
{
struct nfs_server *server = NFS_SERVER(data->inode);
--
1.7.6.4


2012-03-15 18:41:01

by Andy Adamson

Subject: [PATCH Version 1 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 2528cb2..26c83b0 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -847,12 +847,16 @@ filelayout_choose_commit_list(struct nfs_page *req,
struct pnfs_layout_segment *lseg)
{
struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
u32 i, j;
struct list_head *list;

if (fl->commit_through_mds)
return &NFS_I(req->wb_context->dentry->d_inode)->commit_list;

+ if (filelayout_test_devid_invalid(devid))
+ return NULL; /* Resend I/O (writes and commits) to MDS */
+
/* Note that we are calling nfs4_fl_calc_j_index on each page
* that ends up being committed to a data server. An attractive
* alternative is to add a field to nfs_write_data and nfs_page
@@ -933,9 +937,14 @@ find_only_write_lseg_locked(struct inode *inode)
{
struct pnfs_layout_segment *lseg;

- list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
+ list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
+ if (filelayout_test_devid_invalid(devid))
+ /* Resend I/O (writes and commits) to MDS */
+ return NULL;
if (lseg->pls_range.iomode == IOMODE_RW)
return get_lseg(lseg);
+ }
return NULL;
}

--
1.7.6.4


2012-03-16 15:13:05

by Andy Adamson

Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq

On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond
<[email protected]> wrote:
> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> include/linux/sunrpc/sched.h |    1 +
>> net/sunrpc/sched.c           |   27 +++++++++++++++++++++++++++
>> 2 files changed, 28 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>> index dc0c3cc..fce0873 100644
>> --- a/include/linux/sunrpc/sched.h
>> +++ b/include/linux/sunrpc/sched.h
>> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
>>  void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>>          struct rpc_task *);
>>  void rpc_wake_up(struct rpc_wait_queue *);
>> +void rpc_drain_queue(struct rpc_wait_queue *);
>>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>>  struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>>          bool (*)(struct rpc_task *, void *),
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 1c570a8..11928ff 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>>  EXPORT_SYMBOL_GPL(rpc_wake_up);
>>
>>  /**
>> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
>> + * @queue: rpc_wait_queue on which the tasks are sleeping
>> + *
>> + * Grabs queue->lock
>> + */
>> +void rpc_drain_queue(struct rpc_wait_queue *queue)
>> +{
>> +	struct rpc_task *task;
>> +	struct list_head *head;
>> +
>> +	spin_lock_bh(&queue->lock);
>> +	head = &queue->tasks[queue->maxpriority];
>> +	for (;;) {
>> +		while (!list_empty(head)) {
>> +			task = list_entry(head->next, struct rpc_task,
>> +					  u.tk_wait.list);
>> +			rpc_wake_up_task_queue_locked(queue, task);
>> +		}
>> +		if (head == &queue->tasks[0])
>> +			break;
>> +		head--;
>> +	}
>> +	spin_unlock_bh(&queue->lock);
>> +}
>> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
>> +
>
> Confused... How is this function any different from rpc_wake_up()?

Because it actually drains the queue, whereas rpc_wake_up does not. See
the attached output, where I added the same printks to both
rpc_drain_queue and rpc_wake_up.

-->Andy




>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>


Attachments:
rpc-wake_up.txt (5.49 kB)
rpc-drain.txt (9.73 kB)

2012-03-19 17:31:40

by Myklebust, Trond

Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq

On Mon, 2012-03-19 at 16:49 +0000, Adamson, Andy wrote:
> On Mar 19, 2012, at 12:44 PM, Andy Adamson wrote:
>
> >
> > The reason rpc_wake_up() does not wake up all the tasks is that it
> > uses list_for_each_entry_safe, which assigns the "next"
> > task->u.tk_wait.list pointer in its for loop without taking into
> > consideration the priority queue tasks hanging off the
> > task->u.tk_wait.links list (note "list" vs "links") assigned in the
> > for loop body by the call to rpc_wake_up_task_queue_locked.
> >
> > Say we have a wait queue with the following: One task in the list (T1)
> > and two priority tasks in T1's "links" list (T2, T3).
> >
> > H -> T1 -> H    the next pointers in the "list" list
> > (T1->u.tk_wait.list) hanging off the list head "H"
> >
> > T1 -> T2 -> T3 -> T1  the next pointers in the "links" list.
> > (T1->u.tk_wait.links is the list head)
> >
> > Here is what rpc_wake_up_task_queue_locked does (it calls
> > __rpc_remove_wait_queue_priority on priority queues)
> >
> > H -> T2 -> H
> >
> > T2 -> T3 -> T2
> >
> > with T1 removed.  This is exactly what should happen: T1 is removed,
> > T1's u.tk_wait.links list is spliced onto T2's u.tk_wait.links, and T2
> > is moved to H's list of u.tk_wait.list tasks.
> >
> > #define list_for_each_entry_safe(pos, n, head, member)                  \
> >        for (pos = list_entry((head)->next, typeof(*pos), member),      \
> >                n = list_entry(pos->member.next, typeof(*pos), member); \
> >             &pos->member != (head);                                    \
> >             pos = n, n = list_entry(n->member.next, typeof(*n), member))
> >
> > BUT! the for loop in list_for_each_entry_safe(task, next, head,
> > u.tk_wait.list) does the following on the above example:
> >
> > Start:
> >
> > H -> T1 -> H
> > T1 -> T2 -> T3 -> T1
> >
> > for loop initializer:
> >
> > task = list_entry((head)->next, typeof(*task), u.tk_wait.list);
> > next = list_entry(task->u.tk_wait.list.next, typeof(*task), u.tk_wait.list)
> >
> > so task = T1, next = H.
> >
> > for loop test:
> > task != H. This is FALSE
>
> oops - got my TRUE/FALSE mixed ;)
> this is TRUE so execute body
>
> >
> > for loop body:
> > call rpc_wake_up_task_queue_locked
> >
> > H -> T2 -> H
> > T2 -> T3 -> T2
> >
> > for loop increment step
> > task = next;
> > next = list_entry(next->u.tk_wait.list.next, typeof(*next), u.tk_wait.list);
> >
> > so task = H, next = H.
> >
> > for loop test
> >
> > task != H  This is TRUE
>
> and this is FALSE so don't execute body
>
> -->Andy
> >
> > so stop.
> >
> > Note that only T1 was processed - T2 and T3 are still on the queue.
> >
> > So list_for_each_entry_safe will not process any u.tk_wait.links
> > tasks, because the next pointer is assigned prior to the call to
> > rpc_wake_up_task_queue_locked, so the for loop increment (or
> > initialization) setting of:
> >
> > task = next;
> >
> > does NOT pick up the fact that (in our example) T1 was REPLACED by T2,
> > not just removed.
> >
> > On the other hand, the rpc_drain_queue() patch uses while (!list_empty())
> > instead of list_for_each_entry_safe() and happily processes all of the
> > tasks on the queue:
> >
> > H -> T1 -> H
> >
> > list is not empty, so call rpc_wake_up_task_queue_locked.
> >
> > H -> T2 -> H
> >
> > list is not empty, so call rpc_wake_up_task_queue_locked.
> >
> > H -> T3 -> H
> >
> > list is not empty, so call rpc_wake_up_task_queue_locked.
> >
> > list is empty.
> >
> > -->Andy

Thanks Andy!!! Your explanation makes perfect sense. I can't believe
that we've missed this...

I'll write a patch for the various cases that we need to fix.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com
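The rpc_wake_up() vs rpc_drain_queue() behavior debated in this thread can be reproduced in a small user-space sketch. Everything below is a hypothetical stand-in for the kernel's list_head/rpc_task machinery, built on the assumption stated in the thread: waking a priority-queue task promotes one of its u.tk_wait.links peers into its queue position. A walk that caches the "next" pointer then misses the promoted tasks, while a loop that re-checks the head drains everything:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-ins; all names are hypothetical. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *e, struct list_head *h)
{
	e->next = h->next; e->prev = h;
	h->next->prev = e; h->next = e;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

struct task {
	struct list_head list;   /* like u.tk_wait.list: on the queue head */
	struct list_head links;  /* like u.tk_wait.links: same-priority peers */
};

/* Waking a task removes it and promotes the first task on its ->links
 * list into its queue slot, mirroring __rpc_remove_wait_queue_priority. */
static void wake_task(struct task *t)
{
	if (!list_empty(&t->links)) {
		struct task *peer = container_of(t->links.next, struct task, links);

		list_del(&peer->links);
		while (!list_empty(&t->links)) {  /* move remaining peers over */
			struct list_head *e = t->links.next;
			list_del(e);
			list_add(e, &peer->links);
		}
		list_add(&peer->list, &t->list);  /* peer takes t's queue slot */
	}
	list_del(&t->list);
}

/* Queue: T1 on the head's list, T2 and T3 on T1's links list. */
static void setup(struct list_head *q, struct task t[3])
{
	int i;

	list_init(q);
	for (i = 0; i < 3; i++) {
		list_init(&t[i].list);
		list_init(&t[i].links);
	}
	list_add(&t[0].list, q);             /* T1 on the queue proper */
	list_add(&t[2].links, &t[0].links);  /* T3 on T1's links list */
	list_add(&t[1].links, &t[0].links);  /* T2 on T1's links list */
}

/* Mirrors rpc_wake_up()'s list_for_each_entry_safe walk: the cached "next"
 * pointer never sees the task promoted into the woken task's position. */
static int drain_with_cached_next(void)
{
	struct list_head q, *pos, *n;
	struct task t[3];
	int woken = 0;

	setup(&q, t);
	for (pos = q.next, n = pos->next; pos != &q; pos = n, n = pos->next) {
		wake_task(container_of(pos, struct task, list));
		woken++;
	}
	return woken;   /* only T1 is woken; T2 and T3 remain queued */
}

/* Mirrors rpc_drain_queue(): re-check the head until the list is empty. */
static int drain_until_empty(void)
{
	struct list_head q;
	struct task t[3];
	int woken = 0;

	setup(&q, t);
	while (!list_empty(&q)) {
		wake_task(container_of(q.next, struct task, list));
		woken++;
	}
	return woken;   /* all three tasks are woken */
}
```

With three queued tasks, the cached-next walk wakes one task and stops, while the re-check loop wakes all three, matching the printk evidence attached earlier in the thread.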

2012-03-15 18:41:00

by Andy Adamson

Subject: [PATCH Version 1 04/11] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls

From: Andy Adamson <[email protected]>

RPC_TASK_SOFTCONN returns connection errors to the caller, which allows the pNFS
file layout to quickly try the MDS or perhaps another DS.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/internal.h | 6 +++---
fs/nfs/nfs4filelayout.c | 10 ++++++----
fs/nfs/read.c | 6 +++---
fs/nfs/write.c | 13 +++++++------
4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 44066d9..2986604 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -297,7 +297,7 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
struct nfs_pageio_descriptor;
/* read.c */
extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
- const struct rpc_call_ops *call_ops);
+ const struct rpc_call_ops *call_ops, int flags);
extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
extern int nfs_generic_pagein(struct nfs_pageio_descriptor *desc,
struct list_head *head);
@@ -320,12 +320,12 @@ extern void nfs_commit_free(struct nfs_write_data *p);
extern int nfs_initiate_write(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how);
+ int how, int flags);
extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
extern int nfs_initiate_commit(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how);
+ int how, int flags);
extern void nfs_init_commit(struct nfs_write_data *data,
struct list_head *head,
struct pnfs_layout_segment *lseg);
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index b0b775c..6da0874 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -403,7 +403,7 @@ filelayout_read_pagelist(struct nfs_read_data *data)

/* Perform an asynchronous read to ds */
status = nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
- &filelayout_read_call_ops);
+ &filelayout_read_call_ops, RPC_TASK_SOFTCONN);
BUG_ON(status != 0);
return PNFS_ATTEMPTED;
}
@@ -442,7 +442,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)

/* Perform an asynchronous write */
status = nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
- &filelayout_write_call_ops, sync);
+ &filelayout_write_call_ops, sync,
+ RPC_TASK_SOFTCONN);
BUG_ON(status != 0);
return PNFS_ATTEMPTED;
}
@@ -889,7 +890,8 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
if (fh)
data->args.fh = fh;
return nfs_initiate_commit(data, ds->ds_clp->cl_rpcclient,
- &filelayout_commit_call_ops, how);
+ &filelayout_commit_call_ops, how,
+ RPC_TASK_SOFTCONN);
}

/*
@@ -1008,7 +1010,7 @@ filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
if (!data->lseg) {
nfs_init_commit(data, mds_pages, NULL);
nfs_initiate_commit(data, NFS_CLIENT(inode),
- data->mds_ops, how);
+ data->mds_ops, how, 0);
} else {
nfs_init_commit(data, &FILELAYOUT_LSEG(data->lseg)->commit_buckets[data->ds_commit_index].committing, data->lseg);
filelayout_initiate_commit(data, how);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 3c2540d..0c1ed1a 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -172,7 +172,7 @@ static void nfs_readpage_release(struct nfs_page *req)
}

int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
- const struct rpc_call_ops *call_ops)
+ const struct rpc_call_ops *call_ops, int flags)
{
struct inode *inode = data->inode;
int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
@@ -189,7 +189,7 @@ int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
.callback_ops = call_ops,
.callback_data = data,
.workqueue = nfsiod_workqueue,
- .flags = RPC_TASK_ASYNC | swap_flags,
+ .flags = RPC_TASK_ASYNC | swap_flags | flags,
};

/* Set up the initial task struct. */
@@ -242,7 +242,7 @@ static int nfs_do_read(struct nfs_read_data *data,
{
struct inode *inode = data->args.context->dentry->d_inode;

- return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
+ return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops, 0);
}

static int
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a630ad6..f9ae1a2 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -807,7 +807,7 @@ static int flush_task_priority(int how)
int nfs_initiate_write(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how)
+ int how, int flags)
{
struct inode *inode = data->inode;
int priority = flush_task_priority(how);
@@ -824,7 +824,7 @@ int nfs_initiate_write(struct nfs_write_data *data,
.callback_ops = call_ops,
.callback_data = data,
.workqueue = nfsiod_workqueue,
- .flags = RPC_TASK_ASYNC,
+ .flags = RPC_TASK_ASYNC | flags,
.priority = priority,
};
int ret = 0;
@@ -905,7 +905,7 @@ static int nfs_do_write(struct nfs_write_data *data,
{
struct inode *inode = data->args.context->dentry->d_inode;

- return nfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how);
+ return nfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how, 0);
}

static int nfs_do_multiple_writes(struct list_head *head,
@@ -1345,7 +1345,7 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);

int nfs_initiate_commit(struct nfs_write_data *data, struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how)
+ int how, int flags)
{
struct rpc_task *task;
int priority = flush_task_priority(how);
@@ -1361,7 +1361,7 @@ int nfs_initiate_commit(struct nfs_write_data *data, struct rpc_clnt *clnt,
.callback_ops = call_ops,
.callback_data = data,
.workqueue = nfsiod_workqueue,
- .flags = RPC_TASK_ASYNC,
+ .flags = RPC_TASK_ASYNC | flags,
.priority = priority,
};
/* Set up the initial task struct. */
@@ -1443,7 +1443,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)

/* Set up the argument struct */
nfs_init_commit(data, head, NULL);
- return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops, how);
+ return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops,
+ how, 0);
out_bad:
nfs_retry_commit(head, NULL);
nfs_commit_clear_lock(NFS_I(inode));
--
1.7.6.4


2012-03-15 18:41:03

by Andy Adamson

Subject: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
include/linux/sunrpc/sched.h | 1 +
net/sunrpc/sched.c | 27 +++++++++++++++++++++++++++
2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index dc0c3cc..fce0873 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
void rpc_wake_up_queued_task(struct rpc_wait_queue *,
struct rpc_task *);
void rpc_wake_up(struct rpc_wait_queue *);
+void rpc_drain_queue(struct rpc_wait_queue *);
struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
bool (*)(struct rpc_task *, void *),
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 1c570a8..11928ff 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
EXPORT_SYMBOL_GPL(rpc_wake_up);

/**
+ * rpc_drain_queue - empty the queue and wake up all rpc_tasks
+ * @queue: rpc_wait_queue on which the tasks are sleeping
+ *
+ * Grabs queue->lock
+ */
+void rpc_drain_queue(struct rpc_wait_queue *queue)
+{
+ struct rpc_task *task;
+ struct list_head *head;
+
+ spin_lock_bh(&queue->lock);
+ head = &queue->tasks[queue->maxpriority];
+ for (;;) {
+ while (!list_empty(head)) {
+ task = list_entry(head->next, struct rpc_task,
+ u.tk_wait.list);
+ rpc_wake_up_task_queue_locked(queue, task);
+ }
+ if (head == &queue->tasks[0])
+ break;
+ head--;
+ }
+ spin_unlock_bh(&queue->lock);
+}
+EXPORT_SYMBOL_GPL(rpc_drain_queue);
+
+/**
* rpc_wake_up_status - wake up all rpc_tasks and set their status value.
* @queue: rpc_wait_queue on which the tasks are sleeping
* @status: status value to set
--
1.7.6.4


2012-03-15 18:40:58

by Andy Adamson

Subject: [PATCH Version 1 02/11] NFSv4.1: cleanup filelayout invalid deviceid handling

From: Andy Adamson <[email protected]>

Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING.

An invalid device prevents pNFS io.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 10 ----------
fs/nfs/nfs4filelayout.h | 21 +++++++++++++++++----
fs/nfs/nfs4filelayoutdev.c | 37 +++++++++++--------------------------
3 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 447c2c9..e1c0670 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -389,9 +389,6 @@ filelayout_read_pagelist(struct nfs_read_data *data)
__func__, data->inode->i_ino,
data->args.pgbase, (size_t)data->args.count, offset);

- if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
- return PNFS_NOT_ATTEMPTED;
-
/* Retrieve the correct rpc_client for the byte range */
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
@@ -431,16 +428,11 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
struct nfs_fh *fh;
int status;

- if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
- return PNFS_NOT_ATTEMPTED;
-
/* Retrieve the correct rpc_client for the byte range */
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
- __func__);
set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
return PNFS_NOT_ATTEMPTED;
@@ -898,8 +890,6 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
- __func__);
set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
prepare_to_resend_writes(data);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 21190bb..b54b389 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -62,12 +62,8 @@ struct nfs4_pnfs_ds {
atomic_t ds_count;
};

-/* nfs4_file_layout_dsaddr flags */
-#define NFS4_DEVICE_ID_NEG_ENTRY 0x00000001
-
struct nfs4_file_layout_dsaddr {
struct nfs4_deviceid_node id_node;
- unsigned long flags;
u32 stripe_count;
u8 *stripe_indices;
u32 ds_num;
@@ -107,6 +103,23 @@ FILELAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg)
return &FILELAYOUT_LSEG(lseg)->dsaddr->id_node;
}

+static inline void
+filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
+{
+ u32 *p = (u32 *)&node->deviceid;
+
+ printk(KERN_WARNING "NFS: Deviceid [%x%x%x%x] marked out of use.\n",
+ p[0], p[1], p[2], p[3]);
+
+ set_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
+static inline bool
+filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
+{
+ return test_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
extern struct nfs_fh *
nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);

diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index a866bbd..2b8ae96 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -791,48 +791,33 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
return flseg->fh_array[i];
}

-static void
-filelayout_mark_devid_negative(struct nfs4_file_layout_dsaddr *dsaddr,
- int err, const char *ds_remotestr)
-{
- u32 *p = (u32 *)&dsaddr->id_node.deviceid;
-
- printk(KERN_ERR "NFS: data server %s connection error %d."
- " Deviceid [%x%x%x%x] marked out of use.\n",
- ds_remotestr, err, p[0], p[1], p[2], p[3]);
-
- spin_lock(&nfs4_ds_cache_lock);
- dsaddr->flags |= NFS4_DEVICE_ID_NEG_ENTRY;
- spin_unlock(&nfs4_ds_cache_lock);
-}
-
struct nfs4_pnfs_ds *
nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
{
struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
+
+ if (filelayout_test_devid_invalid(devid))
+ return NULL;

if (ds == NULL) {
printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
__func__, ds_idx);
- return NULL;
+ goto mark_dev_invalid;
}

if (!ds->ds_clp) {
struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
int err;

- if (dsaddr->flags & NFS4_DEVICE_ID_NEG_ENTRY) {
- /* Already tried to connect, don't try again */
- dprintk("%s Deviceid marked out of use\n", __func__);
- return NULL;
- }
err = nfs4_ds_connect(s, ds);
- if (err) {
- filelayout_mark_devid_negative(dsaddr, err,
- ds->ds_remotestr);
- return NULL;
- }
+ if (err)
+ goto mark_dev_invalid;
}
return ds;
+
+mark_dev_invalid:
+ filelayout_mark_devid_invalid(devid);
+ return NULL;
}
--
1.7.6.4


2012-03-15 18:41:04

by Andy Adamson

Subject: [PATCH Version 1 10/11] NFSv4.1 ref count nfs_client across filelayout data server io

From: Andy Adamson <[email protected]>

Prepare to put a disconnected DS client record.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 22 +++++++++++++++++-----
1 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index a67a137..4c846cb 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -88,6 +88,8 @@ void filelayout_reset_write(struct rpc_task *task, struct nfs_write_data *data)
dprintk("%s Reset task for i/o through\n", __func__);
put_lseg(data->lseg);
data->lseg = NULL;
+ /* balance nfs_get_client in filelayout_write_pagelist */
+ nfs_put_client(data->ds_clp);
data->ds_clp = NULL;
data->write_done_cb = nfs4_write_done_cb;
data->args.fh = NFS_FH(data->inode);
@@ -106,6 +108,8 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
data->lseg = NULL;
/* offsets will differ in the dense stripe case */
data->args.offset = data->mds_offset;
+ /* balance nfs_get_client in filelayout_read_pagelist */
+ nfs_put_client(data->ds_clp);
data->ds_clp = NULL;
data->args.fh = NFS_FH(data->inode);
data->read_done_cb = nfs4_read_done_cb;
@@ -288,6 +292,7 @@ static void filelayout_read_release(void *data)
{
struct nfs_read_data *rdata = (struct nfs_read_data *)data;

+ nfs_put_client(rdata->ds_clp);
rdata->mds_ops->rpc_release(data);
}

@@ -402,6 +407,7 @@ static void filelayout_write_release(void *data)
{
struct nfs_write_data *wdata = (struct nfs_write_data *)data;

+ nfs_put_client(wdata->ds_clp);
wdata->mds_ops->rpc_release(data);
}

@@ -409,6 +415,7 @@ static void filelayout_commit_release(void *data)
{
struct nfs_write_data *wdata = (struct nfs_write_data *)data;

+ nfs_put_client(wdata->ds_clp);
nfs_commit_release_pages(wdata);
if (atomic_dec_and_test(&NFS_I(wdata->inode)->commits_outstanding))
nfs_commit_clear_lock(NFS_I(wdata->inode));
@@ -456,9 +463,11 @@ filelayout_read_pagelist(struct nfs_read_data *data)
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds)
return PNFS_NOT_ATTEMPTED;
- dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);
+ dprintk("%s USE DS: %s cl_count %d\n", __func__,
+ ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));

/* No multipath support. Use first DS */
+ atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
@@ -491,11 +500,12 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds)
return PNFS_NOT_ATTEMPTED;
- dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
- data->inode->i_ino, sync, (size_t) data->args.count, offset,
- ds->ds_remotestr);
+ dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d\n",
+ __func__, data->inode->i_ino, sync, (size_t) data->args.count,
+ offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));

data->write_done_cb = filelayout_write_done_cb;
+ atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
@@ -953,8 +963,10 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
data->mds_ops->rpc_release(data);
return -EAGAIN;
}
- dprintk("%s ino %lu, how %d\n", __func__, data->inode->i_ino, how);
+ dprintk("%s ino %lu, how %d cl_count %d\n", __func__,
+ data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count));
data->write_done_cb = filelayout_commit_done_cb;
+ atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
if (fh)
--
1.7.6.4


2012-03-16 00:28:05

by Myklebust, Trond

Subject: Re: [PATCH Version 1 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup

On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
> 
> Register a new filelayout DS rpc_action callback for sleeping on the fore
> channel slot table waitq.  Avoid any additional RPC FSM states
> (such as timeout) when waking up to an invalid deviceid and reset
> the task for io to the MDS.

Why can't you simply put this call to filelayout_write_sleepon_cb in
filelayout_write_prepare (before calling nfs41_setup_sequence())?

Since nothing is going to change the task->tk_action if
nfs41_setup_sequence() puts you to sleep, what value does the callback
add?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-19 21:32:48

by Adamson, Andy

Subject: Re: [PATCH Version 1 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup


On Mar 16, 2012, at 11:12 AM, Andy Adamson wrote:

> On Thu, Mar 15, 2012 at 8:27 PM, Myklebust, Trond
> <[email protected]> wrote:
>> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>>> From: Andy Adamson <[email protected]>
>>>
>>> Register a new filelayout DS rpc_action callback for sleeping on the fore
>>> channel slot table waitq. Avoid any additional RPC FSM states
>>> (such as timeout) when waking up to an invalid deviceid and reset
>>> the task for io to the MDS.
>>
>> Why can't you simply put this call to filelayout_write_sleepon_cb in
>> filelayout_write_prepare (before calling nfs41_setup_sequence())?
>
> I guess I can. Will do.

Actually, that won't work.

>
> -->Andy
>
>>
>> Since nothing is going to change the task->tk_action if
>> nfs41_setup_sequence() puts you to sleep, what value does the callback
>> add?

The rpc_action remains rpc_call_prepare. But! We want the tasks coming off the failed DS fore channel slot table queue to be redirected to the MDS nfs_XXX_prepare, not the filelayout_xxx_prepare.

Note that filelayout_reset_read and filelayout_reset_write set the data->ds_clp to NULL which makes the call to nfs41_setup_sequence() in filelayout_write/read_prepare Oops…

So, I'll keep this patch as is.

-->Andy

>
>> Linux NFS client maintainer
>>
>> NetApp
>> [email protected]
>> http://www.netapp.com
>>


2012-03-15 18:41:00

by Andy Adamson

Subject: [PATCH Version 1 05/11] NFSv4.1: mark deviceid invalid on filelayout DS connection errors

From: Andy Adamson <[email protected]>

Marking the deviceid invalid prevents the use of any layout for I/O that
references that deviceid; such I/O is redirected through the MDS.

Unhandled failed I/O is redirected to the MDS without marking either the
layout or the deviceid invalid.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 65 ++++++++++++++++++++++++++++++++++------------
fs/nfs/nfs4filelayout.h | 6 ++++
2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 6da0874..2528cb2 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -116,7 +116,7 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
static int filelayout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
- int *reset)
+ unsigned long *reset)
{
struct nfs_server *mds_server = NFS_SERVER(state->inode);
struct nfs_client *mds_client = mds_server->nfs_client;
@@ -158,10 +158,23 @@ static int filelayout_async_handle_error(struct rpc_task *task,
break;
case -NFS4ERR_RETRY_UNCACHED_REP:
break;
+ /* RPC connection errors */
+ case -ECONNREFUSED:
+ case -EHOSTDOWN:
+ case -EHOSTUNREACH:
+ case -ENETUNREACH:
+ case -EIO:
+ case -ETIMEDOUT:
+ case -EPIPE:
+ dprintk("%s DS connection error. Retry through MDS %d\n",
+ __func__, task->tk_status);
+ set_bit(NFS4_RESET_DEVICEID, reset);
+ set_bit(NFS4_RESET_TO_MDS, reset);
+ break;
default:
- dprintk("%s DS error. Retry through MDS %d\n", __func__,
- task->tk_status);
- *reset = 1;
+ dprintk("%s Unhandled DS error. Retry through MDS %d\n",
+ __func__, task->tk_status);
+ set_bit(NFS4_RESET_TO_MDS, reset);
break;
}
out:
@@ -179,16 +192,22 @@ wait_on_recovery:
static int filelayout_read_done_cb(struct rpc_task *task,
struct nfs_read_data *data)
{
- int reset = 0;
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ unsigned long reset = 0;

dprintk("%s DS read\n", __func__);

if (filelayout_async_handle_error(task, data->args.context->state,
data->ds_clp, &reset) == -EAGAIN) {
- dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
- __func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset)
+
+ dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
+ reset, data->ds_clp, data->ds_clp->cl_session);
+
+ if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_read(task, data);
+ if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ filelayout_mark_devid_invalid(devid);
+ }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -259,14 +278,20 @@ static void filelayout_read_release(void *data)
static int filelayout_write_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
- int reset = 0;
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
data->ds_clp, &reset) == -EAGAIN) {
- dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
- __func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset)
+
+ dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
+ reset, data->ds_clp, data->ds_clp->cl_session);
+
+ if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_write(task, data);
+ if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ filelayout_mark_devid_invalid(devid);
+ }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -289,16 +314,22 @@ static void prepare_to_resend_writes(struct nfs_write_data *data)
static int filelayout_commit_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
- int reset = 0;
+ struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
data->ds_clp, &reset) == -EAGAIN) {
- dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
- __func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset)
+
+ dprintk("%s reset 0x%lx ds_clp %p session %p\n", __func__,
+ reset, data->ds_clp, data->ds_clp->cl_session);
+
+ if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
prepare_to_resend_writes(data);
- else
+ if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ filelayout_mark_devid_invalid(devid);
+ } else {
rpc_restart_call_prepare(task);
+ }
return -EAGAIN;
}

diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index b54b389..08b667a 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -41,6 +41,12 @@
#define NFS4_PNFS_MAX_STRIPE_CNT 4096
#define NFS4_PNFS_MAX_MULTI_CNT 256 /* 256 fit into a u8 stripe_index */

+/* internal use */
+enum nfs4_fl_reset_state {
+ NFS4_RESET_TO_MDS = 0,
+ NFS4_RESET_DEVICEID,
+};
+
enum stripetype4 {
STRIPE_SPARSE = 1,
STRIPE_DENSE = 2
--
1.7.6.4


2012-03-15 18:40:58

by Andy Adamson

Subject: [PATCH Version 1 03/11] NFSv4.1 cleanup filelayout invalid layout handling

From: Andy Adamson <[email protected]>

The invalid layout bits should only be used to block LAYOUTGETs.

Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 26 ++++++--------------------
1 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index e1c0670..b0b775c 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -187,10 +187,8 @@ static int filelayout_read_done_cb(struct rpc_task *task,
data->ds_clp, &reset) == -EAGAIN) {
dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
__func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset) {
- pnfs_set_lo_fail(data->lseg);
+ if (reset)
filelayout_reset_read(task, data);
- }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -267,10 +265,8 @@ static int filelayout_write_done_cb(struct rpc_task *task,
data->ds_clp, &reset) == -EAGAIN) {
dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
__func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset) {
- pnfs_set_lo_fail(data->lseg);
+ if (reset)
filelayout_reset_write(task, data);
- }
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -299,10 +295,9 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
data->ds_clp, &reset) == -EAGAIN) {
dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
__func__, data->ds_clp, data->ds_clp->cl_session);
- if (reset) {
+ if (reset)
prepare_to_resend_writes(data);
- pnfs_set_lo_fail(data->lseg);
- } else
+ else
rpc_restart_call_prepare(task);
return -EAGAIN;
}
@@ -393,12 +388,8 @@ filelayout_read_pagelist(struct nfs_read_data *data)
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
- if (!ds) {
- /* Either layout fh index faulty, or ds connect failed */
- set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
- set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+ if (!ds)
return PNFS_NOT_ATTEMPTED;
- }
dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);

/* No multipath support. Use first DS */
@@ -432,11 +423,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
j = nfs4_fl_calc_j_index(lseg, offset);
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
- if (!ds) {
- set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
- set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+ if (!ds)
return PNFS_NOT_ATTEMPTED;
- }
dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
data->inode->i_ino, sync, (size_t) data->args.count, offset,
ds->ds_remotestr);
@@ -890,8 +878,6 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
- set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
prepare_to_resend_writes(data);
data->mds_ops->rpc_release(data);
return -EAGAIN;
--
1.7.6.4


2012-03-16 00:10:45

by Myklebust, Trond

Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq

On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
> 
> Signed-off-by: Andy Adamson <[email protected]>
> ---
>  include/linux/sunrpc/sched.h |    1 +
>  net/sunrpc/sched.c           |   27 +++++++++++++++++++++++++++
>  2 files changed, 28 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index dc0c3cc..fce0873 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -235,6 +235,7 @@ void		rpc_sleep_on_priority(struct rpc_wait_queue *,
>  void		rpc_wake_up_queued_task(struct rpc_wait_queue *,
>  					struct rpc_task *);
>  void		rpc_wake_up(struct rpc_wait_queue *);
> +void		rpc_drain_queue(struct rpc_wait_queue *);
>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>  struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>  					bool (*)(struct rpc_task *, void *),
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 1c570a8..11928ff 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>  EXPORT_SYMBOL_GPL(rpc_wake_up);
>  
>  /**
> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
> + * @queue: rpc_wait_queue on which the tasks are sleeping
> + *
> + * Grabs queue->lock
> + */
> +void rpc_drain_queue(struct rpc_wait_queue *queue)
> +{
> +	struct rpc_task *task;
> +	struct list_head *head;
> +
> +	spin_lock_bh(&queue->lock);
> +	head = &queue->tasks[queue->maxpriority];
> +	for (;;) {
> +		while (!list_empty(head)) {
> +			task = list_entry(head->next, struct rpc_task,
> +					  u.tk_wait.list);
> +			rpc_wake_up_task_queue_locked(queue, task);
> +		}
> +		if (head == &queue->tasks[0])
> +			break;
> +		head--;
> +	}
> +	spin_unlock_bh(&queue->lock);
> +}
> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
> +

Confused... How is this function any different from rpc_wake_up()?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-16 15:25:42

by Adamson, Andy

Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq


On Mar 16, 2012, at 11:21 AM, Myklebust, Trond wrote:

> On Fri, 2012-03-16 at 11:13 -0400, Andy Adamson wrote:
>> On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond
>> <[email protected]> wrote:
>>> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>>>> From: Andy Adamson <[email protected]>
>>>>
>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>> ---
>>>> include/linux/sunrpc/sched.h | 1 +
>>>> net/sunrpc/sched.c | 27 +++++++++++++++++++++++++++
>>>> 2 files changed, 28 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>> index dc0c3cc..fce0873 100644
>>>> --- a/include/linux/sunrpc/sched.h
>>>> +++ b/include/linux/sunrpc/sched.h
>>>> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
>>>> void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>>>> struct rpc_task *);
>>>> void rpc_wake_up(struct rpc_wait_queue *);
>>>> +void rpc_drain_queue(struct rpc_wait_queue *);
>>>> struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>>>> struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>>>> bool (*)(struct rpc_task *, void *),
>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>> index 1c570a8..11928ff 100644
>>>> --- a/net/sunrpc/sched.c
>>>> +++ b/net/sunrpc/sched.c
>>>> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>>>> EXPORT_SYMBOL_GPL(rpc_wake_up);
>>>>
>>>> /**
>>>> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
>>>> + * @queue: rpc_wait_queue on which the tasks are sleeping
>>>> + *
>>>> + * Grabs queue->lock
>>>> + */
>>>> +void rpc_drain_queue(struct rpc_wait_queue *queue)
>>>> +{
>>>> + struct rpc_task *task;
>>>> + struct list_head *head;
>>>> +
>>>> + spin_lock_bh(&queue->lock);
>>>> + head = &queue->tasks[queue->maxpriority];
>>>> + for (;;) {
>>>> + while (!list_empty(head)) {
>>>> + task = list_entry(head->next, struct rpc_task,
>>>> + u.tk_wait.list);
>>>> + rpc_wake_up_task_queue_locked(queue, task);
>>>> + }
>>>> + if (head == &queue->tasks[0])
>>>> + break;
>>>> + head--;
>>>> + }
>>>> + spin_unlock_bh(&queue->lock);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
>>>> +
>>>
>>> Confused... How is this function any different from rpc_wake_up()?
>>
>> Because it actually drains the queues where rpc_wake_up does not. See
>> the attached output where I added the same printks to both
>> rpc_drain_queue and rpc_wake_up.
>
> So you are seeing a bug in rpc_wake_up()? I'm surprised; a bug of that
> magnitude should have caused a lot of hangs. Can you please look into
> what is happening.

OK. Didn't know it was a bug - just thought the comment was off...

-->Andy

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>


2012-03-16 15:12:25

by Andy Adamson

Subject: Re: [PATCH Version 1 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup

On Thu, Mar 15, 2012 at 8:27 PM, Myklebust, Trond
<[email protected]> wrote:
> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Register a new filelayout DS rpc_action callback for sleeping on the fore
>> channel slot table waitq. Avoid any additional RPC FSM states
>> (such as timeout) when waking up to an invalid deviceid and reset
>> the task for io to the MDS.
>
> Why can't you simply put this call to filelayout_write_sleepon_cb in
> filelayout_write_prepare (before calling nfs41_setup_sequence())?

I guess I can. Will do.

-->Andy

>
> Since nothing is going to change the task->tk_action if
> nfs41_setup_sequence() puts you to sleep, what value does the callback
> add?

> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>

2012-03-19 21:49:26

by Myklebust, Trond

Subject: Re: [PATCH Version 1 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup

On Mon, 2012-03-19 at 21:32 +0000, Adamson, Andy wrote:
> On Mar 16, 2012, at 11:12 AM, Andy Adamson wrote:
> 
> > On Thu, Mar 15, 2012 at 8:27 PM, Myklebust, Trond
> > <[email protected]> wrote:
> >> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
> >>> From: Andy Adamson <[email protected]>
> >>> 
> >>> Register a new filelayout DS rpc_action callback for sleeping on the fore
> >>> channel slot table waitq.  Avoid any additional RPC FSM states
> >>> (such as timeout) when waking up to an invalid deviceid and reset
> >>> the task for io to the MDS.
> >> 
> >> Why can't you simply put this call to filelayout_write_sleepon_cb in
> >> filelayout_write_prepare (before calling nfs41_setup_sequence())?
> > 
> > I guess I can. Will do.
> 
> Actually, that won't work.
> 
> > 
> > -->Andy
> > 
> >> 
> >> Since nothing is going to change the task->tk_action if
> >> nfs41_setup_sequence() puts you to sleep, what value does the callback
> >> add?
> 
> The rpc_action remains rpc_call_prepare. But! We want the tasks coming off the failed DS fore channel slot table queue to be redirected to the MDS nfs_XXX_prepare, not the filelayout_xxx_prepare.

So? Calling rpc_restart_call_prepare() will have exactly the same effect
if you do it within filelayout_xxx_prepare as it will within a callback.

> Note that filelayout_reset_read and filelayout_reset_write set the data->ds_clp to NULL which makes the call to nfs41_setup_sequence() in filelayout_write/read_prepare Oops…
> 
> So, I'll keep this patch as is.

NACK. Adding a parameter to the sequence arguments is way too ugly a
hack. This has nothing to do with the sequence op.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-19 16:44:09

by Adamson, Andy

Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq


The reason rpc_wake_up() does not wake up all the tasks is that it uses list_for_each_entry_safe, which assigns the "next" task->u.tk_wait.list pointer in its for loop without taking into consideration the priority queue tasks hanging off the task->u.tk_wait.links list (note "list" vs "links"), which are spliced back onto the queue in the for loop body by the call to rpc_wake_up_task_queue_locked.

Say we have a wait queue with the following: One task in the list (T1) and two priority tasks in T1's "links" list (T2, T3).

H -> T1 -> H: the next pointers in the "list" list (T1->u.tk_wait.list) hanging off the list head "H"

T1 -> T2 -> T3 -> T1: the next pointers in the "links" list (T1->u.tk_wait.links is the list head)

Here is what rpc_wake_up_task_queue_locked does (it calls __rpc_remove_wait_queue_priority on priority queues)

H -> T2 ->H

T2 -> T3 -> T2

with T1 removed. This is exactly what should happen: T1 is removed, T1's u.tk_wait.links list is spliced onto T2's u.tk_wait.links, and T2 is moved to H's list of u.tk_wait.list tasks.


#define list_for_each_entry_safe(pos, n, head, member) \
for (pos = list_entry((head)->next, typeof(*pos), member), \
n = list_entry(pos->member.next, typeof(*pos), member); \
&pos->member != (head); \
pos = n, n = list_entry(n->member.next, typeof(*n), member))


BUT! the for loop in list_for_each_entry_safe(task, next, head, u.tk_wait.list) does the following on the above example:

Start:

H-> T1 -> H
T1 -> T2 -> T3 -> T1

for loop initializer:

task = list_entry((head)->next, typeof(*task), u.tk_wait.list);
next = list_entry(task->u.tk_wait.list.next, typeof(*task), u.tk_wait.list);

so task = T1, next = H.

for loop test:
task != H. This is TRUE, so execute the body.

for loop body:
call rpc_wake_up_task_queue_locked

H -> T2 -> H
T2 -> T3 -> T2

for loop increment step
task = next;
next = list_entry(next->u.tk_wait.list.next, typeof(*next), u.tk_wait.list);

so task = H, next = T2.

for loop test:

task != H. This is FALSE (task is now H).

so stop.

Note that only T1 was processed - T2 and T3 are still on the queue.

So list_for_each_entry_safe will not process any u.tk_wait.links tasks, because the next pointer is assigned prior to the call to rpc_wake_up_task_queue_locked, so the for loop increment (or initialization) setting of:

task = next:

does NOT pick up the fact that (in our example) T1 was REPLACED by T2, not just removed.

On the other hand, the rpc_drain_queue() patch uses while (!list_empty()) instead of list_for_each_entry_safe() and happily processes all of the tasks on the queue:

H -> T1 -> H

list is not empty, so call rpc_wake_up_task_queue_locked.

H -> T2 -> H

list is not empty, so call rpc_wake_up_task_queue_locked.

H -> T3 -> H

list is not empty, so call rpc_wake_up_task_queue_locked.

list is empty.


-->Andy

On Mar 16, 2012, at 11:25 AM, Adamson, Andy wrote:

>
> On Mar 16, 2012, at 11:21 AM, Myklebust, Trond wrote:
>
>> On Fri, 2012-03-16 at 11:13 -0400, Andy Adamson wrote:
>>> On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond
>>> <[email protected]> wrote:
>>>> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>>>>> From: Andy Adamson <[email protected]>
>>>>>
>>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>>> ---
>>>>> include/linux/sunrpc/sched.h | 1 +
>>>>> net/sunrpc/sched.c | 27 +++++++++++++++++++++++++++
>>>>> 2 files changed, 28 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>>> index dc0c3cc..fce0873 100644
>>>>> --- a/include/linux/sunrpc/sched.h
>>>>> +++ b/include/linux/sunrpc/sched.h
>>>>> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
>>>>> void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>>>>> struct rpc_task *);
>>>>> void rpc_wake_up(struct rpc_wait_queue *);
>>>>> +void rpc_drain_queue(struct rpc_wait_queue *);
>>>>> struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>>>>> struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>>>>> bool (*)(struct rpc_task *, void *),
>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>> index 1c570a8..11928ff 100644
>>>>> --- a/net/sunrpc/sched.c
>>>>> +++ b/net/sunrpc/sched.c
>>>>> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>>>>> EXPORT_SYMBOL_GPL(rpc_wake_up);
>>>>>
>>>>> /**
>>>>> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
>>>>> + * @queue: rpc_wait_queue on which the tasks are sleeping
>>>>> + *
>>>>> + * Grabs queue->lock
>>>>> + */
>>>>> +void rpc_drain_queue(struct rpc_wait_queue *queue)
>>>>> +{
>>>>> + struct rpc_task *task;
>>>>> + struct list_head *head;
>>>>> +
>>>>> + spin_lock_bh(&queue->lock);
>>>>> + head = &queue->tasks[queue->maxpriority];
>>>>> + for (;;) {
>>>>> + while (!list_empty(head)) {
>>>>> + task = list_entry(head->next, struct rpc_task,
>>>>> + u.tk_wait.list);
>>>>> + rpc_wake_up_task_queue_locked(queue, task);
>>>>> + }
>>>>> + if (head == &queue->tasks[0])
>>>>> + break;
>>>>> + head--;
>>>>> + }
>>>>> + spin_unlock_bh(&queue->lock);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
>>>>> +
>>>>
>>>> Confused... How is this function any different from rpc_wake_up()?
>>>
>>> Because it actually drains the queues where rpc_wake_up does not. See
>>> the attached output where I added the same printks to both
>>> rpc_drain_queue and rpc_wake_up.
>>
>> So you are seeing a bug in rpc_wake_up()? I'm surprised; a bug of that
>> magnitude should have caused a lot of hangs. Can you please look into
>> what is happening.
>
> OK. Didn't know it was a bug - just thought the comment was off...
>
> -->Andy
>
>>
>> --
>> Trond Myklebust
>> Linux NFS client maintainer
>>
>> NetApp
>> [email protected]
>> http://www.netapp.com
>>
>


2012-03-16 00:36:10

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH Version 1 11/11] NFSv4.1 de reference a disconnected data server client record

On Thu, 2012-03-15 at 14:40 -0400, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> When the last DS io is processed, the data server client record will be
> freed.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/nfs4filelayout.c    |    4 ++++
>  fs/nfs/nfs4filelayout.h    |    1 +
>  fs/nfs/nfs4filelayoutdev.c |   17 +++++++++++++++++
>  3 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 4c846cb..28b71bf 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -209,10 +209,12 @@ static int filelayout_read_done_cb(struct rpc_task *task,
>  			reset, data->ds_clp, data->ds_clp->cl_session);
>  
>  		if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
> +			struct nfs_client *clp = data->ds_clp;
>  			filelayout_reset_read(task, data);
>  			if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
>  				filelayout_mark_devid_invalid(devid);
>  				rpc_drain_queue(&tbl->slot_tbl_waitq);
> +				nfs4_ds_disconnect(clp);
>  			}
>  		}
>  		rpc_restart_call_prepare(task);
> @@ -310,10 +312,12 @@ static int filelayout_write_done_cb(struct rpc_task *task,
>  			reset, data->ds_clp, data->ds_clp->cl_session);
>  
>  		if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
> +			struct nfs_client *clp = data->ds_clp;
>  			filelayout_reset_write(task, data);
>  			if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
>  				filelayout_mark_devid_invalid(devid);
>  				rpc_drain_queue(&tbl->slot_tbl_waitq);
> +				nfs4_ds_disconnect(clp);

What guarantees at this point that all the RPC calls have terminated?


--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
http://www.netapp.com

2012-03-19 22:12:25

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH Version 1 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup


On Mar 19, 2012, at 5:49 PM, Myklebust, Trond wrote:

> On Mon, 2012-03-19 at 21:32 +0000, Adamson, Andy wrote:
>> On Mar 16, 2012, at 11:12 AM, Andy Adamson wrote:
>>
>>> On Thu, Mar 15, 2012 at 8:27 PM, Myklebust, Trond
>>> <[email protected]> wrote:
>>>> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>>>>> From: Andy Adamson <[email protected]>
>>>>>
>>>>> Register a new filelayout DS rpc_action callback for sleeping on the fore
>>>>> channel slot table waitq. Avoid any additional RPC FSM states
>>>>> (such as timeout) when waking up to an invalid deviceid and reset
>>>>> the task for io to the MDS.
>>>>
>>>> Why can't you simply put this call to filelayout_write_sleepon_cb in
>>>> filelayout_write_prepare (before calling nfs41_setup_sequence())?
>>>
>>> I guess I can. Will do.
>>
>> Actually, that won't work.
>>
>>>
>>> -->Andy
>>>
>>>>
>>>> Since nothing is going to change the task->tk_action if
>>>> nfs41_setup_sequence() puts you to sleep, what value does the callback
>>>> add?
>>
>> The rpc_action remains rpc_call_prepare. But! We want the tasks coming off the failed DS fore channel slot table queue to be redirected to the MDS nfs_XXX_prepare, not the filelayout_xxx_prepare.
>
> So? Calling rpc_restart_call_prepare() will have exactly the same effect
> if you do it within filelayout_xxx_prepare as it will within a callback.

Sigh. For some reason I removed the call to rpc_restart_call_prepare.

Got it.

-->Andy
>
>> Note that filelayout_reset_read and filelayout_reset_write set data->ds_clp to NULL, which makes the call to nfs41_setup_sequence() in filelayout_write/read_prepare Oops.
>>
>> So, I'll keep this patch as is.
>
> NACK. Adding a parameter to the sequence arguments is way too ugly a
> hack. This has nothing to do with the sequence op.
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>


2012-03-15 18:41:04

by Andy Adamson

[permalink] [raw]
Subject: [PATCH Version 1 09/11] NFSv4.1 wake up all tasks on un-connected DS slot table waitq

From: Andy Adamson <[email protected]>

The DS has a connection error (invalid deviceid). Drain the fore channel
slot table waitq.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 8531161..a67a137 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -193,6 +193,7 @@ static int filelayout_read_done_cb(struct rpc_task *task,
struct nfs_read_data *data)
{
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ struct nfs4_slot_table *tbl = &data->ds_clp->cl_session->fc_slot_table;
unsigned long reset = 0;

dprintk("%s DS read\n", __func__);
@@ -205,8 +206,10 @@ static int filelayout_read_done_cb(struct rpc_task *task,

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_read(task, data);
- if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
+ rpc_drain_queue(&tbl->slot_tbl_waitq);
+ }
}
rpc_restart_call_prepare(task);
return -EAGAIN;
@@ -292,6 +295,7 @@ static int filelayout_write_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ struct nfs4_slot_table *tbl = &data->ds_clp->cl_session->fc_slot_table;
unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
@@ -302,8 +306,10 @@ static int filelayout_write_done_cb(struct rpc_task *task,

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
filelayout_reset_write(task, data);
- if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
+ rpc_drain_queue(&tbl->slot_tbl_waitq);
+ }
}
rpc_restart_call_prepare(task);
return -EAGAIN;
@@ -328,6 +334,7 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
struct nfs_write_data *data)
{
struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+ struct nfs4_slot_table *tbl = &data->ds_clp->cl_session->fc_slot_table;
unsigned long reset = 0;

if (filelayout_async_handle_error(task, data->args.context->state,
@@ -338,8 +345,10 @@ static int filelayout_commit_done_cb(struct rpc_task *task,

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
prepare_to_resend_writes(data);
- if (test_bit(NFS4_RESET_DEVICEID, &reset))
+ if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
+ rpc_drain_queue(&tbl->slot_tbl_waitq);
+ }
} else {
rpc_restart_call_prepare(task);
}
--
1.7.6.4


2012-03-15 18:41:04

by Andy Adamson

[permalink] [raw]
Subject: [PATCH Version 1 11/11] NFSv4.1 de reference a disconnected data server client record

From: Andy Adamson <[email protected]>

When the last DS io is processed, the data server client record will be
freed.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 4 ++++
fs/nfs/nfs4filelayout.h | 1 +
fs/nfs/nfs4filelayoutdev.c | 17 +++++++++++++++++
3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 4c846cb..28b71bf 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -209,10 +209,12 @@ static int filelayout_read_done_cb(struct rpc_task *task,
reset, data->ds_clp, data->ds_clp->cl_session);

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
+ struct nfs_client *clp = data->ds_clp;
filelayout_reset_read(task, data);
if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
rpc_drain_queue(&tbl->slot_tbl_waitq);
+ nfs4_ds_disconnect(clp);
}
}
rpc_restart_call_prepare(task);
@@ -310,10 +312,12 @@ static int filelayout_write_done_cb(struct rpc_task *task,
reset, data->ds_clp, data->ds_clp->cl_session);

if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
+ struct nfs_client *clp = data->ds_clp;
filelayout_reset_write(task, data);
if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
filelayout_mark_devid_invalid(devid);
rpc_drain_queue(&tbl->slot_tbl_waitq);
+ nfs4_ds_disconnect(clp);
}
}
rpc_restart_call_prepare(task);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 08b667a..3abf7d9 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -138,5 +138,6 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
struct nfs4_file_layout_dsaddr *
get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
+void nfs4_ds_disconnect(struct nfs_client *clp);

#endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 2b8ae96..025275f 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -145,6 +145,23 @@ _data_server_lookup_locked(const struct list_head *dsaddrs)
}

/*
+ * Lookup DS by nfs_client pointer. Zero data server client pointer
+ */
+void nfs4_ds_disconnect(struct nfs_client *clp)
+{
+ struct nfs4_pnfs_ds *ds;
+
+ dprintk("%s clp %p\n", __func__, clp);
+ spin_lock(&nfs4_ds_cache_lock);
+ list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
+ if (ds->ds_clp && ds->ds_clp == clp) {
+ nfs_put_client(clp);
+ ds->ds_clp = NULL;
+ }
+ spin_unlock(&nfs4_ds_cache_lock);
+}
+
+/*
* Create an rpc connection to the nfs4_pnfs_ds data server
* Currently only supports IPv4 and IPv6 addresses
*/
--
1.7.6.4


2012-03-16 00:18:14

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH Version 1 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid

How does sending the commit to the MDS help if this is a commit-to-DS
setup? AFAICS we need to resend all the writes through the MDS.

On Thu, 2012-03-15 at 14:40 -0400, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/nfs4filelayout.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 2528cb2..26c83b0 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -847,12 +847,16 @@ filelayout_choose_commit_list(struct nfs_page *req,
>  			      struct pnfs_layout_segment *lseg)
>  {
>  	struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
> +	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
>  	u32 i, j;
>  	struct list_head *list;
> 
>  	if (fl->commit_through_mds)
>  		return &NFS_I(req->wb_context->dentry->d_inode)->commit_list;
> 
> +	if (filelayout_test_devid_invalid(devid))
> +		return NULL; /* Resend I/O (writes and commits) to MDS */
> +
>  	/* Note that we are calling nfs4_fl_calc_j_index on each page
>  	 * that ends up being committed to a data server.  An attractive
>  	 * alternative is to add a field to nfs_write_data and nfs_page
> @@ -933,9 +937,14 @@ find_only_write_lseg_locked(struct inode *inode)
>  {
>  	struct pnfs_layout_segment *lseg;
> 
> -	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
> +	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
> +		struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
> +		if (filelayout_test_devid_invalid(devid))
> +			/* Resend I/O (writes and commits) to MDS */
> +			return NULL;
>  		if (lseg->pls_range.iomode == IOMODE_RW)
>  			return get_lseg(lseg);
> +	}
>  	return NULL;
>  }
> 

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
http://www.netapp.com

2012-03-19 16:49:35

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq


On Mar 19, 2012, at 12:44 PM, Andy Adamson wrote:

>
> The reason rpc_wake_up() does not wake up all the tasks is that it uses list_for_each_entry_safe, which assigns the "next" task->u.tk_wait.list pointer in its for loop without taking into account the priority queue tasks hanging off the task->u.tk_wait.links list (note "list" vs. "links") that are promoted in the for loop body by the call to rpc_wake_up_task_queue_locked.
>
> Say we have a wait queue with the following: One task in the list (T1) and two priority tasks in T1's "links" list (T2, T3).
>
> H -> T1 -> H the next pointers in the "lists" list (T1->u.tk_wait.list) hanging off the list head "H"
>
> T1 -> T2 -> T3->T1 the next pointers in the "links" list. (T1->u.tk_wait.links is the list head)
>
> Here is what rpc_wake_up_task_queue_locked does (it calls __rpc_remove_wait_queue_priority on priority queues)
>
> H -> T2 ->H
>
> T2 -> T3 -> T2
>
> with T1 removed. This is exactly what should happen, T1 is removed, T1's u.tk_wait.links list is spliced onto T2's u.tk_wait.links, and T2 is moved to H's list of u.tk_wait.list tasks.
>
>
> #define list_for_each_entry_safe(pos, n, head, member) \
> for (pos = list_entry((head)->next, typeof(*pos), member), \
> n = list_entry(pos->member.next, typeof(*pos), member); \
> &pos->member != (head); \
> pos = n, n = list_entry(n->member.next, typeof(*n), member))
>
>
> BUT! the for loop in list_for_each_entry_safe(task, next, head, u.tk_wait.list) does the following on the above example:
>
> Start:
>
> H-> T1 -> H
> T1 -> T2 -> T3 -> T1
>
> for loop initializer:
>
> task = list_entry((head)->next, typeof(*task), u.tk_wait.list):
> next = list_entry(task->u.tk_wait.list.next, typeof(*task), u.tk_wait.list)
>
> so task = T1, next = H.
>
> for loop test:
> task != H. This is TRUE, so execute the body.

>
> for loop body:
> call rpc_wake_up_task_queue_locked
>
> H -> T2 -> H
> T2 -> T3 -> T2
>
> for loop increment step
> task = next;
> next = list_entry(next->u.tk_wait.list.next, typeof(*next), u.tk_wait.list);
>
> so task = H, next = H.
>
> for loop test
>
> task != H. This is FALSE, so don't execute the body

-->Andy
>
> so stop.
>
> Note that only T1 was processed - T2 and T3 are still on the queue.
>
> So list_for_each_entry_safe will not process any u.tk_wait.links tasks, because the next pointer is assigned prior to the call to rpc_wake_up_task_queue_locked, so the for loop increment (or initialization) setting of:
>
> task = next:
>
> does NOT pick up the fact that (in our example) T1 was REPLACED by T2, not just removed.
>
> On the other hand, the rpc_drain_queue() patch uses while(!list_empty()) instead of list_for_each_entry_safe() and happily processes all of the tasks on the queue:
>
> H -> T1 -> H
>
> list is not empty, so call rpc_wake_up_task_queue_locked.
>
> H -> T2 -> H
>
> list is not empty, so call rpc_wake_up_task_queue_locked.
>
> H -> T3 -> H
>
> list is not empty; so call rpc_wake_up_task_queue_locked
>
> list is empty.
>
>
> -->Andy
>
> On Mar 16, 2012, at 11:25 AM, Adamson, Andy wrote:
>
>>
>> On Mar 16, 2012, at 11:21 AM, Myklebust, Trond wrote:
>>
>>> On Fri, 2012-03-16 at 11:13 -0400, Andy Adamson wrote:
>>>> On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond
>>>> <[email protected]> wrote:
>>>>> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>>>>>> From: Andy Adamson <[email protected]>
>>>>>>
>>>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>>>> ---
>>>>>> include/linux/sunrpc/sched.h | 1 +
>>>>>> net/sunrpc/sched.c | 27 +++++++++++++++++++++++++++
>>>>>> 2 files changed, 28 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>>>> index dc0c3cc..fce0873 100644
>>>>>> --- a/include/linux/sunrpc/sched.h
>>>>>> +++ b/include/linux/sunrpc/sched.h
>>>>>> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
>>>>>> void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>>>>>> struct rpc_task *);
>>>>>> void rpc_wake_up(struct rpc_wait_queue *);
>>>>>> +void rpc_drain_queue(struct rpc_wait_queue *);
>>>>>> struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>>>>>> struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>>>>>> bool (*)(struct rpc_task *, void *),
>>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>>> index 1c570a8..11928ff 100644
>>>>>> --- a/net/sunrpc/sched.c
>>>>>> +++ b/net/sunrpc/sched.c
>>>>>> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>>>>>> EXPORT_SYMBOL_GPL(rpc_wake_up);
>>>>>>
>>>>>> /**
>>>>>> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
>>>>>> + * @queue: rpc_wait_queue on which the tasks are sleeping
>>>>>> + *
>>>>>> + * Grabs queue->lock
>>>>>> + */
>>>>>> +void rpc_drain_queue(struct rpc_wait_queue *queue)
>>>>>> +{
>>>>>> + struct rpc_task *task;
>>>>>> + struct list_head *head;
>>>>>> +
>>>>>> + spin_lock_bh(&queue->lock);
>>>>>> + head = &queue->tasks[queue->maxpriority];
>>>>>> + for (;;) {
>>>>>> + while (!list_empty(head)) {
>>>>>> + task = list_entry(head->next, struct rpc_task,
>>>>>> + u.tk_wait.list);
>>>>>> + rpc_wake_up_task_queue_locked(queue, task);
>>>>>> + }
>>>>>> + if (head == &queue->tasks[0])
>>>>>> + break;
>>>>>> + head--;
>>>>>> + }
>>>>>> + spin_unlock_bh(&queue->lock);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
>>>>>> +
>>>>>
>>>>> Confused... How is this function any different from rpc_wake_up()?
>>>>
>>>> Because it actually drains the queues where rpc_wake_up does not. See
>>>> the attached output where I added the same printks to both
>>>> rpc_drain_queue and rpc_wake_up.
>>>
>>> So you are seeing a bug in rpc_wake_up()? I'm surprised; a bug of that
>>> magnitude should have caused a lot of hangs. Can you please look into
>>> what is happening.
>>
>> OK. Didn't know it was a bug - just thought the comment was off...
>>
>> -->Andy
>>
>>>
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer
>>>
>>> NetApp
>>> [email protected]
>>> http://www.netapp.com
>>>
>>
>


2012-03-16 15:05:18

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH Version 1 11/11] NFSv4.1 de reference a disconnected data server client record

On Thu, Mar 15, 2012 at 8:35 PM, Myklebust, Trond
<[email protected]> wrote:
> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> When the last DS io is processed, the data server client record will be
>> freed.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>>  fs/nfs/nfs4filelayout.c    |    4 ++++
>>  fs/nfs/nfs4filelayout.h    |    1 +
>>  fs/nfs/nfs4filelayoutdev.c |   17 +++++++++++++++++
>>  3 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 4c846cb..28b71bf 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -209,10 +209,12 @@ static int filelayout_read_done_cb(struct rpc_task *task,
>>  			reset, data->ds_clp, data->ds_clp->cl_session);
>>
>>  		if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
>> +			struct nfs_client *clp = data->ds_clp;
>>  			filelayout_reset_read(task, data);
>>  			if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
>>  				filelayout_mark_devid_invalid(devid);
>>  				rpc_drain_queue(&tbl->slot_tbl_waitq);
>> +				nfs4_ds_disconnect(clp);
>>  			}
>>  		}
>>  		rpc_restart_call_prepare(task);
>> @@ -310,10 +312,12 @@ static int filelayout_write_done_cb(struct rpc_task *task,
>>  			reset, data->ds_clp, data->ds_clp->cl_session);
>>
>>  		if (test_bit(NFS4_RESET_TO_MDS, &reset)) {
>> +			struct nfs_client *clp = data->ds_clp;
>>  			filelayout_reset_write(task, data);
>>  			if (test_bit(NFS4_RESET_DEVICEID, &reset)) {
>>  				filelayout_mark_devid_invalid(devid);
>>  				rpc_drain_queue(&tbl->slot_tbl_waitq);
>> +				nfs4_ds_disconnect(clp);
>
> What guarantees at this point that all the RPC calls have terminated?

That is not the intention of the call to nfs4_ds_disconnect.

What is guaranteed is:

1) there are no more new calls using the DS session due to the invalid deviceid.
2) all calls (including outstanding calls) have a reference to the DS nfs_client
3) once the last outstanding call has been processed and its
nfs_put_client has been called, the DS nfs_client is freed, because
nfs4_ds_disconnect has already dropped the DS cache's reference.


>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>

2012-03-16 15:21:59

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq

On Fri, 2012-03-16 at 11:13 -0400, Andy Adamson wrote:
> On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond
> <Trond.Myklebust@netapp.com> wrote:
> > On Thu, 2012-03-15 at 14:40 -0400, andros@netapp.com wrote:
> >> From: Andy Adamson <andros@netapp.com>
> >>
> >> Signed-off-by: Andy Adamson <andros@netapp.com>
> >> ---
> >>  include/linux/sunrpc/sched.h |    1 +
> >>  net/sunrpc/sched.c           |   27 +++++++++++++++++++++++++++
> >>  2 files changed, 28 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> >> index dc0c3cc..fce0873 100644
> >> --- a/include/linux/sunrpc/sched.h
> >> +++ b/include/linux/sunrpc/sched.h
> >> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
> >>  void rpc_wake_up_queued_task(struct rpc_wait_queue *,
> >>  					struct rpc_task *);
> >>  void rpc_wake_up(struct rpc_wait_queue *);
> >> +void rpc_drain_queue(struct rpc_wait_queue *);
> >>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
> >>  struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
> >>  					bool (*)(struct rpc_task *, void *),
> >> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> >> index 1c570a8..11928ff 100644
> >> --- a/net/sunrpc/sched.c
> >> +++ b/net/sunrpc/sched.c
> >> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
> >>  EXPORT_SYMBOL_GPL(rpc_wake_up);
> >>
> >>  /**
> >> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
> >> + * @queue: rpc_wait_queue on which the tasks are sleeping
> >> + *
> >> + * Grabs queue->lock
> >> + */
> >> +void rpc_drain_queue(struct rpc_wait_queue *queue)
> >> +{
> >> +	struct rpc_task *task;
> >> +	struct list_head *head;
> >> +
> >> +	spin_lock_bh(&queue->lock);
> >> +	head = &queue->tasks[queue->maxpriority];
> >> +	for (;;) {
> >> +		while (!list_empty(head)) {
> >> +			task = list_entry(head->next, struct rpc_task,
> >> +					  u.tk_wait.list);
> >> +			rpc_wake_up_task_queue_locked(queue, task);
> >> +		}
> >> +		if (head == &queue->tasks[0])
> >> +			break;
> >> +		head--;
> >> +	}
> >> +	spin_unlock_bh(&queue->lock);
> >> +}
> >> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
> >> +
> >
> > Confused... How is this function any different from rpc_wake_up()?
> 
> Because it actually drains the queues where rpc_wake_up does not.  See
> the attached output where I added the same printks to both
> rpc_drain_queue and rpc_wake_up.

So you are seeing a bug in rpc_wake_up()? I'm surprised; a bug of that
magnitude should have caused a lot of hangs. Can you please look into
what is happening.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
http://www.netapp.com

2012-03-15 18:41:02

by Andy Adamson

[permalink] [raw]
Subject: [PATCH Version 1 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup

From: Andy Adamson <[email protected]>

Register a new filelayout DS rpc_action callback for sleeping on the fore
channel slot table waitq. Avoid any additional RPC FSM states
(such as timeout) when waking up to an invalid deviceid and reset
the task for io to the MDS.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 26 ++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 4 ++--
include/linux/nfs_xdr.h | 1 +
3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 26c83b0..8531161 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -232,6 +232,18 @@ filelayout_set_layoutcommit(struct nfs_write_data *wdata)
(unsigned long) NFS_I(wdata->inode)->layout->plh_lwb);
}

+static void filelayout_read_sleepon_cb(struct rpc_task *task)
+{
+ struct nfs_read_data *rd = (struct nfs_read_data *)task->tk_calldata;
+
+ if (rd->lseg &&
+ filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(rd->lseg))) {
+ dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+ filelayout_reset_read(task, rd);
+ rpc_restart_call_prepare(task);
+ }
+}
+
/*
* Call ops for the async read/write cases
* In the case of dense layouts, the offset needs to be reset to its
@@ -241,6 +253,7 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
{
struct nfs_read_data *rdata = (struct nfs_read_data *)data;

+ rdata->args.seq_args.sa_action = &filelayout_read_sleepon_cb;
rdata->read_done_cb = filelayout_read_done_cb;

if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
@@ -336,10 +349,23 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
return 0;
}

+static void filelayout_write_sleepon_cb(struct rpc_task *task)
+{
+ struct nfs_write_data *wd = (struct nfs_write_data *)task->tk_calldata;
+
+ if (wd->lseg &&
+ filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(wd->lseg))) {
+ dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+ filelayout_reset_write(task, wd);
+ rpc_restart_call_prepare(task);
+ }
+}
+
static void filelayout_write_prepare(struct rpc_task *task, void *data)
{
struct nfs_write_data *wdata = (struct nfs_write_data *)data;

+ wdata->args.seq_args.sa_action = &filelayout_write_sleepon_cb;
if (nfs41_setup_sequence(wdata->ds_clp->cl_session,
&wdata->args.seq_args, &wdata->res.seq_res,
task))
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 45b67d8..e5e651e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -590,7 +590,7 @@ int nfs41_setup_sequence(struct nfs4_session *session,

if (!rpc_queue_empty(&tbl->slot_tbl_waitq) &&
!rpc_task_has_priority(task, RPC_PRIORITY_PRIVILEGED)) {
- rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL);
+ rpc_sleep_on(&tbl->slot_tbl_waitq, task, args->sa_action);
spin_unlock(&tbl->slot_tbl_lock);
dprintk("%s enforce FIFO order\n", __func__);
return -EAGAIN;
@@ -598,7 +598,7 @@ int nfs41_setup_sequence(struct nfs4_session *session,

slotid = nfs4_find_slot(tbl);
if (slotid == NFS4_NO_SLOT) {
- rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL);
+ rpc_sleep_on(&tbl->slot_tbl_waitq, task, args->sa_action);
spin_unlock(&tbl->slot_tbl_lock);
dprintk("<-- %s: no free slots\n", __func__);
return -EAGAIN;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index c1cf86c..2d86f6e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -182,6 +182,7 @@ struct nfs4_slot {

struct nfs4_sequence_args {
struct nfs4_session *sa_session;
+ rpc_action sa_action; /* for slot_tbl_waitq wakeup */
u32 sa_slotid;
u8 sa_cache_this;
};
--
1.7.6.4


2012-03-16 15:12:05

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH Version 1 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid

On Thu, Mar 15, 2012 at 8:18 PM, Myklebust, Trond
<[email protected]> wrote:
> How does sending the commit to the MDS help if this is a commit-to-DS
> setup? AFAICS we need to resend all the writes through the MDS.

That is actually what happens - bad patch name. Of course, after all
the writes are re-sent, the commit is sent to the MDS...

-->Andy

BTW: I note that filelayout_commit_pagelist returns -ENOMEM on error
which is incorrect: being a pnfs_layoutdriver_type.commit_pagelist
routine, it should return one of PNFS_ATTEMPTED, PNFS_NOT_ATTEMPTED.
I'll send a patch.

>
> On Thu, 2012-03-15 at 14:40 -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>>  fs/nfs/nfs4filelayout.c |   11 ++++++++++-
>>  1 files changed, 10 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 2528cb2..26c83b0 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -847,12 +847,16 @@ filelayout_choose_commit_list(struct nfs_page *req,
>>  			      struct pnfs_layout_segment *lseg)
>>  {
>>  	struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
>> +	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
>>  	u32 i, j;
>>  	struct list_head *list;
>>
>>  	if (fl->commit_through_mds)
>>  		return &NFS_I(req->wb_context->dentry->d_inode)->commit_list;
>>
>> +	if (filelayout_test_devid_invalid(devid))
>> +		return NULL; /* Resend I/O (writes and commits) to MDS */
>> +
>>  	/* Note that we are calling nfs4_fl_calc_j_index on each page
>>  	 * that ends up being committed to a data server.  An attractive
>>  	 * alternative is to add a field to nfs_write_data and nfs_page
>> @@ -933,9 +937,14 @@ find_only_write_lseg_locked(struct inode *inode)
>>  {
>>  	struct pnfs_layout_segment *lseg;
>>
>> -	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
>> +	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
>> +		struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
>> +		if (filelayout_test_devid_invalid(devid))
>> +			/* Resend I/O (writes and commits) to MDS */
>> +			return NULL;
>>  		if (lseg->pls_range.iomode == IOMODE_RW)
>>  			return get_lseg(lseg);
>> +	}
>>  	return NULL;
>>  }
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>