2011-02-14 19:18:42

by Andy Adamson

Subject: [PATCH 0/16] pnfs wave 3 submission

These patches implement wave 3 of the pNFS submission, which encompasses file
layout data server connection, READ I/O, and recovery through the MDS.

They apply on top of Fred's recent three-patch 'lock inversion' series
(commits 2d767077..2cc09edd in Trond's nfs-for-next tree).

They are based upon Benny's current pnfs-submit-wave3 branch, re-arranged
into a more coherent series of patches and rebased upon Trond's nfs-for-next.

-->Andy

0001-NFS-remove-unnecessary-CONFIG_NFS_V4-from-nfs_read_d.patch
0002-NFS-put_layout_hdr-can-remove-nfsi-layout.patch
0003-NFS-move-nfs_client-initialization-into-nfs_get_clie.patch
0004-pnfs-wave-3-send-zero-stateid-seqid-on-v4.1-i-o.patch
0005-pnfs-wave-3-new-flag-for-state-renewal-check.patch
0006-pnfs-wave-3-new-flag-for-lease-time-check.patch
0007-pnfs-wave-3-add-MDS-mount-DS-only-check.patch
0008-pnfs-wave-3-lseg-refcounting.patch
0009-pnfs-wave-3-shift-pnfs_update_layout-locations.patch
0010-pnfs-wave-3-coelesce-across-layout-stripes.patch
0011-pnfs-wave-3-generic-read.patch
0012-pnfs-wave-3-data-server-connection.patch
0013-pnfs-wave-3-filelayout-i-o-helpers.patch
0014-pnfs-wave-3-filelayout-read.patch
0015-pnfs-wave-3-filelayout-async-error-handler.patch
0016-pnfs-wave-3-turn-off-pNFS-on-ds-connection-failure.patch



2011-02-16 15:52:22

by Benny Halevy

Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read

On 2011-02-16 10:09, Trond Myklebust wrote:
> On Wed, 2011-02-16 at 09:53 -0500, Andy Adamson wrote:
>> On Feb 15, 2011, at 10:16 PM, Benny Halevy wrote:
>>
>>> On 2011-02-14 14:18, [email protected] wrote:
>>>> From: Andy Adamson <[email protected]>
>>>
>>> Andy, taking into account the many contributors to this patch
>>> the author should be "The pNFS Team" IMO.
>>
>> The author can't be "The pNFS Team". Somebody needs to be the author. I asked for volunteers and said I would be the default. Do you want to be the author?
>
> Right. Patches authored by 'The pNFS Team' will be rejected, as
> discussed in Hopkinton last autumn.
>

OK. I'm not the original author so I can't claim authorship for this patch.
FWIW, the earliest version of this code in my tree is
authored by Andy and Mike Sager...

Benny

2011-02-14 19:19:00

by Andy Adamson

Subject: [PATCH 15/16] pnfs: wave 3: filelayout async error handler

From: Andy Adamson <[email protected]>

Use our own async error handler.
Mark the layout as failed and retry i/o through the MDS on specified errors.

Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
to a DS gets correctly resent through the MDS.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/internal.h | 1 +
fs/nfs/nfs4filelayout.c | 79 +++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 33 +++++++++++++++---
fs/nfs/nfs4state.c | 1 +
fs/nfs/read.c | 1 +
include/linux/nfs_xdr.h | 1 +
include/linux/sunrpc/clnt.h | 1 +
net/sunrpc/clnt.c | 8 ++++
8 files changed, 119 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5518d61..f69a322 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -281,6 +281,7 @@ extern int nfs_migrate_page(struct address_space *,
#endif

/* nfs4proc.c */
+extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
extern int _nfs4_call_sync(struct nfs_server *server,
struct rpc_message *msg,
struct nfs4_sequence_args *args,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index a352674..c818042 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -40,6 +40,8 @@ MODULE_LICENSE("GPL");
MODULE_AUTHOR("Dean Hildebrand <[email protected]>");
MODULE_DESCRIPTION("The NFSv4 file layout driver");

+#define FILELAYOUT_POLL_RETRY_MAX (15*HZ)
+
static int
filelayout_set_layoutdriver(struct nfs_server *nfss)
{
@@ -95,6 +97,81 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
BUG();
}

+/* For data server errors we don't recover from */
+static void
+filelayout_set_lo_fail(struct pnfs_layout_segment *lseg)
+{
+ if (lseg->pls_range.iomode == IOMODE_RW) {
+ dprintk("%s Setting layout IOMODE_RW fail bit\n", __func__);
+ set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
+ } else {
+ dprintk("%s Setting layout IOMODE_READ fail bit\n", __func__);
+ set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+ }
+}
+
+static int filelayout_async_handle_error(struct rpc_task *task,
+ struct nfs4_state *state,
+ struct nfs_client *clp,
+ int *reset)
+{
+ if (task->tk_status >= 0)
+ return 0;
+ switch (task->tk_status) {
+ case -NFS4ERR_BADSESSION:
+ case -NFS4ERR_BADSLOT:
+ case -NFS4ERR_BAD_HIGH_SLOT:
+ case -NFS4ERR_DEADSESSION:
+ case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+ case -NFS4ERR_SEQ_FALSE_RETRY:
+ case -NFS4ERR_SEQ_MISORDERED:
+ dprintk("%s ERROR %d, Reset session. Exchangeid "
+ "flags 0x%x\n", __func__, task->tk_status,
+ clp->cl_exchange_flags);
+ nfs4_schedule_state_recovery(clp);
+ task->tk_status = 0;
+ return -EAGAIN;
+ case -NFS4ERR_DELAY:
+ case -NFS4ERR_GRACE:
+ case -EKEYEXPIRED:
+ rpc_delay(task, FILELAYOUT_POLL_RETRY_MAX);
+ task->tk_status = 0;
+ return -EAGAIN;
+ default:
+ dprintk("%s DS error. Retry through MDS %d\n", __func__,
+ task->tk_status);
+ *reset = 1;
+ task->tk_status = 0;
+ return -EAGAIN;
+ }
+}
+
+/* NFS_PROTO call done callback routines */
+
+static int filelayout_read_done_cb(struct rpc_task *task,
+ struct nfs_read_data *data)
+{
+ struct nfs_client *clp = data->ds_clp;
+ int reset = 0;
+
+ dprintk("%s DS read\n", __func__);
+
+ if (filelayout_async_handle_error(task, data->args.context->state,
+ data->ds_clp, &reset) == -EAGAIN) {
+ dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
+ __func__, data->ds_clp, data->ds_clp->cl_session);
+ if (reset) {
+ nfs4_reset_read(task, data);
+ filelayout_set_lo_fail(data->lseg);
+ clp = NFS_SERVER(data->inode)->nfs_client;
+ }
+ nfs_restart_rpc(task, clp);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
/*
* Call ops for the async read/write cases
* In the case of dense layouts, the offset needs to be reset to its
@@ -104,6 +181,8 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
{
struct nfs_read_data *rdata = (struct nfs_read_data *)data;

+ rdata->read_done_cb = filelayout_read_done_cb;
+
if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
&rdata->args.seq_args, &rdata->res.seq_res,
0, task))
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index bdf6fa6..0f73db0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3075,15 +3075,10 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
return err;
}

-static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
+static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
{
struct nfs_server *server = NFS_SERVER(data->inode);

- dprintk("--> %s\n", __func__);
-
- if (!nfs4_sequence_done(task, &data->res.seq_res))
- return -EAGAIN;
-
if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
nfs_restart_rpc(task, server->nfs_client);
return -EAGAIN;
@@ -3095,12 +3090,38 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
return 0;
}

+static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
+{
+
+ dprintk("--> %s\n", __func__);
+
+ if (!nfs4_sequence_done(task, &data->res.seq_res))
+ return -EAGAIN;
+
+ return data->read_done_cb(task, data);
+}
+
static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message *msg)
{
data->timestamp = jiffies;
+ data->read_done_cb = nfs4_read_done_cb;
msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
}

+/* Reset the nfs_read_data to send the read to the MDS. */
+void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
+{
+ dprintk("%s Reset task for i/o through\n", __func__);
+ /* offsets will differ in the dense stripe case */
+ data->args.offset = data->mds_offset;
+ data->ds_clp = NULL;
+ data->args.fh = NFS_FH(data->inode);
+ data->read_done_cb = nfs4_read_done_cb;
+ task->tk_ops = data->mds_ops;
+ rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+EXPORT_SYMBOL_GPL(nfs4_reset_read);
+
static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
{
struct inode *inode = data->inode;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 9e33e88..6da026a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1022,6 +1022,7 @@ void nfs4_schedule_state_recovery(struct nfs_client *clp)
set_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
nfs4_schedule_state_manager(clp);
}
+EXPORT_SYMBOL_GPL(nfs4_schedule_state_recovery);

int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
{
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 5c09d72..9447156 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -387,6 +387,7 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
return;

/* Yes, so retry the read at the end of the data */
+ data->mds_offset += resp->count;
argp->offset += resp->count;
argp->pgbase += resp->count;
argp->count -= resp->count;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 3c74807..4121c3e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1020,6 +1020,7 @@ struct nfs_read_data {
struct pnfs_layout_segment *lseg;
struct nfs_client *ds_clp; /* pNFS data server */
const struct rpc_call_ops *mds_ops;
+ int (*read_done_cb) (struct rpc_task *task, struct nfs_read_data *data);
__u64 mds_offset;
struct page *page_array[NFS_PAGEVEC_SIZE];
};
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index ef9476a..db7bcaf 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -129,6 +129,7 @@ struct rpc_create_args {
struct rpc_clnt *rpc_create(struct rpc_create_args *args);
struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *,
struct rpc_program *, u32);
+void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt);
struct rpc_clnt *rpc_clone_client(struct rpc_clnt *);
void rpc_shutdown_client(struct rpc_clnt *);
void rpc_release_client(struct rpc_clnt *);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 57d344c..5c4df70 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -597,6 +597,14 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
}
}

+void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt)
+{
+ rpc_task_release_client(task);
+ rpc_task_set_client(task, clnt);
+}
+EXPORT_SYMBOL_GPL(rpc_task_reset_client);
+
+
static void
rpc_task_set_rpc_message(struct rpc_task *task, const struct rpc_message *msg)
{
--
1.7.2.3
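The control flow in filelayout_read_done_cb() and nfs4_reset_read() can be modelled in userspace to make the three outcomes explicit: session recovery, a delayed retry on the DS, and fallback to the MDS. This is a hypothetical sketch, not kernel code: the error names mirror the NFS4ERR_* cases in the patch but their numeric values are placeholders, and struct read_data, classify(), read_done() and short_read_advance() are invented for illustration. It also shows why mds_offset has to advance on a short read: a later reset copies it back into the wire offset.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the errors named in the patch; the numeric
 * values are placeholders, not the protocol values. */
enum ds_err {
	E_BADSESSION = 1, E_BADSLOT, E_BAD_HIGH_SLOT, E_DEADSESSION,
	E_CONN_NOT_BOUND, E_SEQ_FALSE_RETRY, E_SEQ_MISORDERED,
	E_DELAY, E_GRACE, E_KEYEXPIRED, E_OTHER
};

enum ds_action {
	DS_DONE,            /* success: nothing to retry */
	DS_RECOVER_SESSION, /* schedule session recovery, retry on the DS */
	DS_DELAY_RETRY,     /* back off, then retry on the DS */
	DS_RETRY_MDS        /* mark the layout failed, resend via the MDS */
};

struct read_data {
	uint64_t offset;      /* wire offset (DS-relative for dense layouts) */
	uint64_t mds_offset;  /* file offset to use if we fall back to the MDS */
	int on_ds;            /* 1 while the read targets a data server */
	int layout_failed;    /* stands in for the lo_fail_bit in plh_flags */
};

static enum ds_action classify(int status)
{
	if (status >= 0)
		return DS_DONE;
	switch (-status) {
	case E_BADSESSION:
	case E_BADSLOT:
	case E_BAD_HIGH_SLOT:
	case E_DEADSESSION:
	case E_CONN_NOT_BOUND:
	case E_SEQ_FALSE_RETRY:
	case E_SEQ_MISORDERED:
		return DS_RECOVER_SESSION;
	case E_DELAY:
	case E_GRACE:
	case E_KEYEXPIRED:
		return DS_DELAY_RETRY;
	default:
		return DS_RETRY_MDS;
	}
}

/* Same shape as filelayout_read_done_cb() followed by nfs4_reset_read(). */
static enum ds_action read_done(struct read_data *rd, int status)
{
	enum ds_action act = classify(status);

	if (act == DS_RETRY_MDS) {
		rd->offset = rd->mds_offset;  /* dense DS offsets differ from file offsets */
		rd->on_ds = 0;
		rd->layout_failed = 1;
	}
	return act;
}

/* Short read: advance both offsets so that an MDS resend starts at the
 * right file position. */
static void short_read_advance(struct read_data *rd, uint32_t count)
{
	rd->offset += count;
	rd->mds_offset += count;
}
```

Without the mds_offset update, a short read followed by a DS failure would resend from the original file offset and re-read bytes that were already returned.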


2011-02-14 19:18:59

by Andy Adamson

Subject: [PATCH 14/16] pnfs: wave 3: filelayout read

From: Andy Adamson <[email protected]>

Attempt a pNFS file layout read by setting up the nfs_read_data struct and
calling nfs_initiate_read with the data server rpc client and the
filelayout rpc call ops.

Error handling is implemented in a subsequent patch.

Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Dean Hildebrand <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Mingyang Guo <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Ricardo Labiaga <[email protected]>
Tested-by: Guo Mingyang <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/nfs4_fs.h | 3 ++
fs/nfs/nfs4filelayout.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 3 +-
include/linux/nfs_xdr.h | 1 +
4 files changed, 86 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5dc378e..457b1fe 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -252,6 +252,9 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
extern int nfs4_setup_sequence(const struct nfs_server *server,
struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
int cache_reply, struct rpc_task *task);
+extern int nfs41_setup_sequence(struct nfs4_session *session,
+ struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
+ int cache_reply, struct rpc_task *task);
extern void nfs4_destroy_session(struct nfs4_session *session);
extern struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp);
extern int nfs4_proc_create_session(struct nfs_client *);
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 1c34809..a352674 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -96,6 +96,85 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
}

/*
+ * Call ops for the async read/write cases
+ * In the case of dense layouts, the offset needs to be reset to its
+ * original value.
+ */
+static void filelayout_read_prepare(struct rpc_task *task, void *data)
+{
+ struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+ if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
+ &rdata->args.seq_args, &rdata->res.seq_res,
+ 0, task))
+ return;
+
+ rpc_call_start(task);
+}
+
+static void filelayout_read_call_done(struct rpc_task *task, void *data)
+{
+ struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+ dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
+
+ /* Note this may cause RPC to be resent */
+ rdata->mds_ops->rpc_call_done(task, data);
+}
+
+static void filelayout_read_release(void *data)
+{
+ struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+ rdata->mds_ops->rpc_release(data);
+}
+
+struct rpc_call_ops filelayout_read_call_ops = {
+ .rpc_call_prepare = filelayout_read_prepare,
+ .rpc_call_done = filelayout_read_call_done,
+ .rpc_release = filelayout_read_release,
+};
+
+static enum pnfs_try_status
+filelayout_read_pagelist(struct nfs_read_data *data)
+{
+ struct pnfs_layout_segment *lseg = data->lseg;
+ struct nfs4_pnfs_ds *ds;
+ loff_t offset = data->args.offset;
+ u32 j, idx;
+ struct nfs_fh *fh;
+
+ dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
+ __func__, data->inode->i_ino,
+ data->args.pgbase, (size_t)data->args.count, offset);
+
+ /* Retrieve the correct rpc_client for the byte range */
+ j = nfs4_fl_calc_j_index(lseg, offset);
+ idx = nfs4_fl_calc_ds_index(lseg, j);
+ ds = nfs4_fl_prepare_ds(lseg, idx);
+ if (!ds) {
+ printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
+ return PNFS_NOT_ATTEMPTED;
+ }
+ dprintk("%s USE DS:ip %x %hu\n", __func__,
+ ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
+
+ /* No multipath support. Use first DS */
+ data->ds_clp = ds->ds_clp;
+ fh = nfs4_fl_select_ds_fh(lseg, j);
+ if (fh)
+ data->args.fh = fh;
+
+ data->args.offset = filelayout_get_dserver_offset(lseg, offset);
+ data->mds_offset = offset;
+
+ /* Perform an asynchronous read to ds */
+ nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
+ &filelayout_read_call_ops);
+ return PNFS_ATTEMPTED;
+}
+
+/*
* filelayout_check_layout()
*
* Make sure layout segment parameters are sane WRT the device.
@@ -315,6 +394,7 @@ static struct pnfs_layoutdriver_type filelayout_type = {
.alloc_lseg = filelayout_alloc_lseg,
.free_lseg = filelayout_free_lseg,
.pg_test = filelayout_pg_test,
+ .read_pagelist = filelayout_read_pagelist,
};

static int __init nfs4filelayout_init(void)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fe75ebd..bdf6fa6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -505,7 +505,7 @@ out:
return ret_id;
}

-static int nfs41_setup_sequence(struct nfs4_session *session,
+int nfs41_setup_sequence(struct nfs4_session *session,
struct nfs4_sequence_args *args,
struct nfs4_sequence_res *res,
int cache_reply,
@@ -571,6 +571,7 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
res->sr_status = 1;
return 0;
}
+EXPORT_SYMBOL_GPL(nfs41_setup_sequence);

int nfs4_setup_sequence(const struct nfs_server *server,
struct nfs4_sequence_args *args,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index a607c65..3c74807 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1020,6 +1020,7 @@ struct nfs_read_data {
struct pnfs_layout_segment *lseg;
struct nfs_client *ds_clp; /* pNFS data server */
const struct rpc_call_ops *mds_ops;
+ __u64 mds_offset;
struct page *page_array[NFS_PAGEVEC_SIZE];
};

--
1.7.2.3
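A simplified userspace model of the decision made in filelayout_read_pagelist(): pick the stripe entry that holds the starting byte, fall back to the MDS if that data server cannot be prepared, and otherwise remember the file offset in mds_offset. All names are hypothetical. The stripe mapping is assumed to follow the RFC 5661 file layout rules (stripe unit number plus first_stripe_index, modulo the pattern length). The sketch skips the second lookup that nfs4_fl_calc_ds_index() performs through the stripe indices array, and assumes a sparse layout so that the wire offset equals the file offset.

```c
#include <stdbool.h>
#include <stdint.h>

enum try_status { TRY_ATTEMPTED, TRY_NOT_ATTEMPTED };

/* Hypothetical, simplified view of a file layout segment. */
struct fl_layout {
	uint64_t pattern_offset;
	uint32_t stripe_unit;        /* bytes per stripe unit */
	uint32_t stripe_count;       /* entries in the stripe pattern */
	uint32_t first_stripe_index;
	const bool *ds_ok;           /* ds_ok[j]: could DS j be connected? */
};

struct rd_args {
	uint64_t offset;      /* offset placed in the READ arguments */
	uint64_t mds_offset;  /* file offset, kept for an MDS fallback */
	int ds;               /* chosen stripe index, or -1 for the MDS */
};

/* Assumed stripe mapping: which pattern entry holds this byte? */
static uint32_t calc_j_index(const struct fl_layout *lo, uint64_t offset)
{
	uint64_t unit_no = (offset - lo->pattern_offset) / lo->stripe_unit;

	return (uint32_t)((unit_no + lo->first_stripe_index) % lo->stripe_count);
}

static enum try_status read_pagelist(const struct fl_layout *lo,
				     struct rd_args *a)
{
	uint32_t j = calc_j_index(lo, a->offset);

	if (!lo->ds_ok[j]) {
		a->ds = -1;             /* caller reads through the MDS instead */
		return TRY_NOT_ATTEMPTED;
	}
	a->ds = (int)j;
	a->mds_offset = a->offset;  /* sparse layout: DS offset == file offset */
	return TRY_ATTEMPTED;
}
```

Returning a "not attempted" status rather than an error keeps the fallback cheap: the generic read path simply issues the ordinary MDS READ.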


2011-02-15 09:31:20

by Christoph Hellwig

Subject: Re: [PATCH 13/16] pnfs: wave 3: filelayout i/o helpers

On Mon, Feb 14, 2011 at 02:18:33PM -0500, [email protected] wrote:
> +static loff_t
> +filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
> +{
> + struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
> +
> + switch (flseg->stripe_type) {
> + case STRIPE_SPARSE:
> + return offset;
> +
> + case STRIPE_DENSE:
> + {
> + u32 stripe_width;
> + u64 tmp, off;
> + u32 unit = flseg->stripe_unit;
> +
> + stripe_width = unit * flseg->dsaddr->stripe_count;
> + tmp = off = offset - flseg->pattern_offset;
> + do_div(tmp, stripe_width);
> + return tmp * unit + do_div(off, unit);

For readability's sake I'd split this out into a helper:

static loff_t
filelayout_get_dense_offset(struct nfs4_filelayout_segment *flseg,
loff_t offset)
{
u32 stripe_width = flseg->stripe_unit * flseg->dsaddr->stripe_count;
u64 off = offset - flseg->pattern_offset;
u64 tmp = off;

do_div(tmp, stripe_width);

return tmp * flseg->stripe_unit + do_div(off, flseg->stripe_unit);
}

...


case STRIPE_DENSE:
return filelayout_get_dense_offset(flseg, offset);
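Outside the kernel, where plain 64-bit division can stand in for do_div(), the dense mapping is a single expression. The sketch below uses a hypothetical struct fl_seg in place of struct nfs4_filelayout_segment. As a worked example with a 4096-byte stripe unit and three data servers: file offset 20000 lies in stripe unit 4 (20000 / 4096), which is row 1 of the pattern, so its dense DS offset is 1 * 4096 + (20000 mod 4096) = 4096 + 3616 = 7712.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct nfs4_filelayout_segment. */
struct fl_seg {
	uint64_t pattern_offset;
	uint32_t stripe_unit;   /* bytes per stripe unit */
	uint32_t stripe_count;  /* number of data servers in the pattern */
};

/* Dense layouts: each DS stores only its own stripe units back to back,
 * so the DS offset is (full stripes skipped) * unit + offset within unit. */
static uint64_t dense_ds_offset(const struct fl_seg *fl, uint64_t offset)
{
	uint64_t rel = offset - fl->pattern_offset;
	uint64_t stripe_width = (uint64_t)fl->stripe_unit * fl->stripe_count;

	return (rel / stripe_width) * fl->stripe_unit + rel % fl->stripe_unit;
}
```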


2011-02-15 09:25:32

by Christoph Hellwig

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

> +_put_lseg_common(struct pnfs_layout_segment *lseg)

The naming of _put_lseg_common is pretty weird compared to standard Linux
function naming. I'd either expect __put_lseg or put_lseg_common.

> +{
> + struct inode *ino = lseg->pls_layout->plh_inode;

Please call this inode. ino is usually used for variables of type ino_t.


> + BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
> + list_del(&lseg->pls_list);
> + if (list_empty(&lseg->pls_layout->plh_segs)) {
> + set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
> + /* Matched by initial refcount set in alloc_init_layout_hdr */
> + put_layout_hdr_locked(lseg->pls_layout);
> + }
> + rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
> +}
> +



> @@ -242,22 +257,35 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
> atomic_read(&lseg->pls_refcount),
> test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
> if (atomic_dec_and_test(&lseg->pls_refcount)) {
> + _put_lseg_common(lseg);
> list_add(&lseg->pls_list, tmp_list);
> return 1;
> }
> return 0;

Given that put_lseg_locked is pretty trivial now, and has an awkward
calling convention, I would just inline it into the only caller.

> +static void
> +put_lseg(struct pnfs_layout_segment *lseg)
> +{
> + struct inode *ino;

Again, please call this inode.

> +
> + if (!lseg)
> + return;
> +
> + dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
> + atomic_read(&lseg->pls_refcount),
> + test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
> + ino = lseg->pls_layout->plh_inode;
> + if (atomic_dec_and_lock(&lseg->pls_refcount, &ino->i_lock)) {
> + LIST_HEAD(free_me);
> +
> + _put_lseg_common(lseg);
> + list_add(&lseg->pls_list, &free_me);
> + spin_unlock(&ino->i_lock);
> + pnfs_free_lseg_list(&free_me);
> + }

What's the point of the list operations here? You'd be much better to
just do a

free_lseg(lseg);

after releasing the lock.
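A userspace sketch of the shape being suggested: drop the reference with a dec-and-lock primitive, do the unlinking under the lock, and free the segment only after the lock is released, with no temporary list. Everything here is hypothetical: dec_and_lock() is a C11-atomics analogue of the kernel's atomic_dec_and_lock(), a spinning atomic_flag stands in for inode->i_lock, and the live_lsegs counter stands in for the list_del() and layout header bookkeeping done in _put_lseg_common().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified layout segment: only the refcount matters here. */
struct lseg {
	atomic_int refcount;
};

/* Stand-in for inode->i_lock and the state it protects. */
static atomic_flag inode_lock = ATOMIC_FLAG_INIT;
static int live_lsegs;   /* protected by inode_lock */

static void lock_inode(void)
{
	while (atomic_flag_test_and_set_explicit(&inode_lock, memory_order_acquire))
		;   /* spin */
}

static void unlock_inode(void)
{
	atomic_flag_clear_explicit(&inode_lock, memory_order_release);
}

/* Analogue of atomic_dec_and_lock(): returns true, with the lock held,
 * only if this call dropped the count to zero. */
static bool dec_and_lock(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	while (old > 1)   /* fast path: not the last reference, no lock needed */
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	lock_inode();
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	unlock_inode();
	return false;
}

static struct lseg *alloc_lseg(void)
{
	struct lseg *l = malloc(sizeof(*l));

	if (!l)
		return NULL;
	atomic_init(&l->refcount, 1);
	lock_inode();
	live_lsegs++;
	unlock_inode();
	return l;
}

static void get_lseg(struct lseg *l)
{
	atomic_fetch_add(&l->refcount, 1);
}

/* Unlink under the lock, free after dropping it; no temporary list. */
static void put_lseg(struct lseg *l)
{
	if (!l)
		return;
	if (dec_and_lock(&l->refcount)) {
		live_lsegs--;   /* stands in for list_del() and header cleanup */
		unlock_inode();
		free(l);
	}
}
```

The temporary list only pays off when several segments are collected under one lock hold; for a single segment, freeing it directly after the unlock is equivalent and simpler.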


2011-02-14 19:18:52

by Andy Adamson

Subject: [PATCH 10/16] pnfs: wave 3: coelesce across layout stripes

From: Fred Isaman <[email protected]>

Add a pg_test layout driver hook which is used to avoid coalescing I/O across
layout stripes.

Signed-off-by: Andy Adamon <[email protected]>
Signed-off-by: Andy Adamon <[email protected]>
Signed-off-by: Dean Hildebrand <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Boaz Harrosh <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Tao Guo <[email protected]>
---
fs/nfs/nfs4filelayout.c | 26 ++++++++++++++++++++++++++
fs/nfs/pagelist.c | 18 +++++++++++++-----
fs/nfs/pnfs.c | 19 +++++++++++++++++++
fs/nfs/pnfs.h | 12 ++++++++++++
fs/nfs/read.c | 1 +
fs/nfs/write.c | 3 +++
include/linux/nfs_page.h | 1 +
7 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 23f930c..98e26e0 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -252,6 +252,31 @@ filelayout_free_lseg(struct pnfs_layout_segment *lseg)
_filelayout_free_lseg(fl);
}

+/*
+ * filelayout_pg_test(). Called by nfs_can_coalesce_requests()
+ *
+ * return 1 : coalesce page
+ * return 0 : don't coalesce page
+ */
+int
+filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
+ struct nfs_page *req)
+{
+ u64 p_stripe, r_stripe;
+ u32 stripe_unit;
+
+ if (!pgio->pg_lseg)
+ return 1;
+ p_stripe = (u64)prev->wb_index << PAGE_CACHE_SHIFT;
+ r_stripe = (u64)req->wb_index << PAGE_CACHE_SHIFT;
+ stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
+
+ do_div(p_stripe, stripe_unit);
+ do_div(r_stripe, stripe_unit);
+
+ return (p_stripe == r_stripe);
+}
+
static struct pnfs_layoutdriver_type filelayout_type = {
.id = LAYOUT_NFSV4_1_FILES,
.name = "LAYOUT_NFSV4_1_FILES",
@@ -260,6 +285,7 @@ static struct pnfs_layoutdriver_type filelayout_type = {
.clear_layoutdriver = filelayout_clear_layoutdriver,
.alloc_lseg = filelayout_alloc_lseg,
.free_lseg = filelayout_free_lseg,
+ .pg_test = filelayout_pg_test,
};

static int __init nfs4filelayout_init(void)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e0a0cb4..2c793a7 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -242,7 +242,8 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
* Return 'true' if this is the case, else return 'false'.
*/
static int nfs_can_coalesce_requests(struct nfs_page *prev,
- struct nfs_page *req)
+ struct nfs_page *req,
+ struct nfs_pageio_descriptor *pgio)
{
if (req->wb_context->cred != prev->wb_context->cred)
return 0;
@@ -256,6 +257,12 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
return 0;
if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
return 0;
+ /*
+ * Non-whole file layouts need to check that req is inside of
+ * pgio->pg_lseg.
+ */
+ if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
+ return 0;
return 1;
}

@@ -288,14 +295,15 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
if (newlen > desc->pg_bsize)
return 0;
prev = nfs_list_entry(desc->pg_list.prev);
- if (!nfs_can_coalesce_requests(prev, req))
+ if (!nfs_can_coalesce_requests(prev, req, desc))
return 0;
} else {
put_lseg(desc->pg_lseg);
desc->pg_base = req->wb_pgbase;
- desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
- req->wb_context,
- IOMODE_READ);
+ if (desc->pg_test)
+ desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
+ req->wb_context,
+ IOMODE_READ);
}
nfs_list_remove_request(req);
nfs_list_add_request(req, &desc->pg_list);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index dcd4356..f200e34 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -871,6 +871,25 @@ out_forget_reply:
goto out;
}

+static void
+pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
+{
+ struct pnfs_layoutdriver_type *ld;
+
+ ld = NFS_SERVER(inode)->pnfs_curr_ld;
+ pgio->pg_test = (ld ? ld->pg_test : NULL);
+}
+
+/*
+ * rsize is already set by caller to MDS rsize.
+ */
+void
+pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
+ struct inode *inode)
+{
+ pnfs_set_pg_test(inode, pgio);
+}
+
/*
* Device ID cache. Currently supports one layout type per struct nfs_client.
* Add layout type to the lookup key to expand to support multiple types.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 121d6a3..5107d14 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -30,6 +30,8 @@
#ifndef FS_NFS_PNFS_H
#define FS_NFS_PNFS_H

+#include <linux/nfs_page.h>
+
enum {
NFS_LSEG_VALID = 0, /* cleared when lseg is recalled/returned */
NFS_LSEG_ROC, /* roc bit received from server */
@@ -65,6 +67,9 @@ struct pnfs_layoutdriver_type {
int (*clear_layoutdriver) (struct nfs_server *);
struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
void (*free_lseg) (struct pnfs_layout_segment *lseg);
+
+ /* test for nfs page cache coalescing */
+ int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
};

struct pnfs_layout_hdr {
@@ -152,6 +157,7 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
enum pnfs_iomode access_type);
void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
void unset_pnfs_layoutdriver(struct nfs_server *);
+void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
int pnfs_layout_process(struct nfs4_layoutget *lgp);
void pnfs_free_lseg_list(struct list_head *tmp_list);
void pnfs_destroy_layout(struct nfs_inode *);
@@ -251,6 +257,12 @@ static inline void unset_pnfs_layoutdriver(struct nfs_server *s)
{
}

+static inline void
+pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio, struct inode *ino)
+{
+ pgio->pg_test = NULL;
+}
+
#endif /* CONFIG_NFS_V4_1 */

#endif /* FS_NFS_PNFS_H */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index c453164..20cc936 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -630,6 +630,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
if (ret == 0)
goto read_complete; /* all pages were read */

+ pnfs_pageio_init_read(&pgio, inode);
if (rsize < PAGE_CACHE_SIZE)
nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
else
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 004c28b..aca0268 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -28,6 +28,7 @@
#include "iostat.h"
#include "nfs4_fs.h"
#include "fscache.h"
+#include "pnfs.h"

#define NFSDBG_FACILITY NFSDBG_PAGECACHE

@@ -982,6 +983,8 @@ static void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
{
size_t wsize = NFS_SERVER(inode)->wsize;

+ pgio->pg_test = NULL;
+
if (wsize < PAGE_CACHE_SIZE)
nfs_pageio_init(pgio, inode, nfs_flush_multi, wsize, ioflags);
else
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 2db0372..ba88ff4 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -63,6 +63,7 @@ struct nfs_pageio_descriptor {
int pg_ioflags;
int pg_error;
struct pnfs_layout_segment *pg_lseg;
+ int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
};

#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
--
1.7.2.3
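The pg_test hook amounts to an optional function pointer consulted from the coalescing check. In this hypothetical userspace model, 4 KiB pages are assumed, the contiguity test is reduced to consecutive page indices, and a NULL pg_test (no layout driver, as in the write path) leaves coalescing unchanged. With a 64 KiB stripe unit (16 pages), pages 14 and 15 share a stripe unit and may be merged, while pages 15 and 16 straddle a boundary and end up in separate requests.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12   /* assumed 4 KiB pages */

struct page_req { uint64_t index; };   /* page index within the file */

struct pageio_desc;
typedef int (*pg_test_fn)(const struct pageio_desc *,
			  const struct page_req *prev,
			  const struct page_req *req);

struct pageio_desc {
	pg_test_fn pg_test;     /* NULL when no layout driver is active */
	int have_lseg;          /* stands in for pg_lseg != NULL */
	uint32_t stripe_unit;   /* bytes per stripe unit */
};

/* Coalesce only if both pages fall in the same stripe unit. */
static int stripe_pg_test(const struct pageio_desc *d,
			  const struct page_req *prev,
			  const struct page_req *req)
{
	if (!d->have_lseg)
		return 1;
	return (prev->index << SKETCH_PAGE_SHIFT) / d->stripe_unit ==
	       (req->index << SKETCH_PAGE_SHIFT) / d->stripe_unit;
}

static int can_coalesce(const struct pageio_desc *d,
			const struct page_req *prev,
			const struct page_req *req)
{
	if (req->index != prev->index + 1)   /* simplified contiguity check */
		return 0;
	if (d->pg_test && !d->pg_test(d, prev, req))
		return 0;
	return 1;
}
```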


2011-02-16 03:11:44

by Benny Halevy

Subject: Re: [PATCH 09/16] pnfs: wave 3: shift pnfs_update_layout locations

On 2011-02-15 09:41, Fred Isaman wrote:
> On Mon, Feb 14, 2011 at 6:14 PM, Trond Myklebust
> <[email protected]> wrote:
>> On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
>>> From: Fred Isaman <[email protected]>
>>>
>>> Move the pnfs_update_layout call location to nfs_pageio_do_add_request().
>>> Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach
>>> it to each nfs_read_data so it can be sent to the layout driver.
>>>
>>> Signed-off-by: Andy Adamon <[email protected]>
>>> Signed-off-by: Andy Adamon <[email protected]>
>>> Signed-off-by: Dean Hildebrand <[email protected]>
>>> Signed-off-by: Fred Isaman <[email protected]>
>>> Signed-off-by: Fred Isaman <[email protected]>
>>> Signed-off-by: Benny Halevy <[email protected]>
>>> Signed-off-by: Boaz Harrosh <[email protected]>
>>> Signed-off-by: Oleg Drokin <[email protected]>
>>> Signed-off-by: Tao Guo <[email protected]>
>>> ---
>>> fs/nfs/file.c | 4 ----
>>> fs/nfs/pagelist.c | 15 ++++++++++++---
>>> fs/nfs/pnfs.c | 4 ++--
>>> fs/nfs/pnfs.h | 1 +
>>> fs/nfs/read.c | 28 ++++++++++++++++------------
>>> fs/nfs/write.c | 4 ++--
>>> include/linux/nfs_page.h | 5 +++--
>>> include/linux/nfs_xdr.h | 1 +
>>> 8 files changed, 37 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>>> index 7bf029e..d85a534 100644
>>> --- a/fs/nfs/file.c
>>> +++ b/fs/nfs/file.c
>>> @@ -387,10 +387,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>>> file->f_path.dentry->d_name.name,
>>> mapping->host->i_ino, len, (long long) pos);
>>>
>>> - pnfs_update_layout(mapping->host,
>>> - nfs_file_open_context(file),
>>> - IOMODE_RW);
>>> -
>>> start:
>>> /*
>>> * Prevent starvation issues if someone is doing a consistency
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index e1164e3..e0a0cb4 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -20,6 +20,7 @@
>>> #include <linux/nfs_mount.h>
>>>
>>> #include "internal.h"
>>> +#include "pnfs.h"
>>>
>>> static struct kmem_cache *nfs_page_cachep;
>>>
>>> @@ -213,7 +214,7 @@ nfs_wait_on_request(struct nfs_page *req)
>>> */
>>> void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>> struct inode *inode,
>>> - int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
>>> + int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
>>> size_t bsize,
>>> int io_flags)
>>> {
>>> @@ -226,6 +227,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>> desc->pg_doio = doio;
>>> desc->pg_ioflags = io_flags;
>>> desc->pg_error = 0;
>>> + desc->pg_lseg = NULL;
>>> }
>>>
>>> /**
>>> @@ -288,8 +290,13 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>>> prev = nfs_list_entry(desc->pg_list.prev);
>>> if (!nfs_can_coalesce_requests(prev, req))
>>> return 0;
>>> - } else
>>> + } else {
>>> + put_lseg(desc->pg_lseg);
>>> desc->pg_base = req->wb_pgbase;
>>> + desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
>>> + req->wb_context,
>>> + IOMODE_READ);
>>
>> Looking at this afresh after a week of vacation. Isn't it more natural
>> to do this as part of the pg_doio() callback?
>>
>> Your only reason for introducing the ->pg_lseg pointer is to be able to
>> pass it to the ->pg_doio() in the first place. Why not do that by simply
>> passing the 'desc' pointer to ->pg_doio(), and then having it call
>> pnfs_update_layout() instead of 'get_layout()'?
>>
>
> The problem is that it is not the only reason. Passing the lseg into
> nfs_can_coalesce_requests is another. Calling pnfs_update_layout
> in ->pg_doio would eliminate the opportunity to have a say in
> coalescing based on the layout.
>
>

As long as you correctly deal with short I/Os in the doio path (as we did
many moons ago), you should be fine if the layout you got does not cover
the whole coalesced range.
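One way to read the short-I/O suggestion: if the segment returned by pnfs_update_layout() covers only part of the coalesced range, the doio path sends the covered part to the data server and lets the remainder be resent like any other short read. A minimal sketch, with a hypothetical struct seg_range describing the byte range [start, end) the segment covers and a hypothetical clamp_to_lseg() helper:

```c
#include <stdint.h>

/* Hypothetical: byte range [start, end) covered by a layout segment. */
struct seg_range {
	uint64_t start;
	uint64_t end;
};

/* Returns how many bytes of [offset, offset + count) can be sent under
 * this segment; the caller resends the rest as a follow-up (short) I/O. */
static uint64_t clamp_to_lseg(const struct seg_range *r,
			      uint64_t offset, uint64_t count)
{
	if (offset < r->start || offset >= r->end)
		return 0;                 /* not covered: caller uses the MDS */
	if (count > r->end - offset)
		return r->end - offset;   /* truncate at the segment end */
	return count;
}
```

For a segment covering the first 64 KiB, a coalesced 8192-byte request at offset 61440 would send 4096 bytes under this segment and leave the remaining 4096 bytes for a follow-up request.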

>>> + }
>>> nfs_list_remove_request(req);
>>> nfs_list_add_request(req, &desc->pg_list);
>>> desc->pg_count = newlen;
>>> @@ -307,7 +314,8 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>> nfs_page_array_len(desc->pg_base,
>>> desc->pg_count),
>>> desc->pg_count,
>>> - desc->pg_ioflags);
>>> + desc->pg_ioflags,
>>> + desc->pg_lseg);
>>> if (error < 0)
>>> desc->pg_error = error;
>>> else
>>> @@ -345,6 +353,7 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>>> {
>>> nfs_pageio_doio(desc);
>>> + put_lseg(desc->pg_lseg);
>>> }
>>>
>>> /**
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index f0a9578..dcd4356 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -264,7 +264,7 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
>>> return 0;
>>> }
>>>
>>> -static void
>>> +void
>>> put_lseg(struct pnfs_layout_segment *lseg)
>>> {
>>> struct inode *ino;
>>> @@ -285,6 +285,7 @@ put_lseg(struct pnfs_layout_segment *lseg)
>>> pnfs_free_lseg_list(&free_me);
>>> }
>>> }
>>> +EXPORT_SYMBOL_GPL(put_lseg);
>>
>> Why is this needed here?
>>
>
> That looks like an artifact left over from older code. It is not needed.
>
>>
>>> static bool
>>> should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
>>> @@ -797,7 +798,6 @@ pnfs_update_layout(struct inode *ino,
>>> out:
>>> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
>>> nfsi->layout ? nfsi->layout->plh_flags : -1, lseg);
>>> - put_lseg(lseg); /* STUB - callers currently ignore return value */
>>> return lseg;
>>> out_unlock:
>>> spin_unlock(&ino->i_lock);
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index 9a994bc..121d6a3 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -146,6 +146,7 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
>>>
>>> /* pnfs.c */
>>> void get_layout_hdr(struct pnfs_layout_hdr *lo);
>>> +void put_lseg(struct pnfs_layout_segment *lseg);
>>> struct pnfs_layout_segment *
>>> pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>>> enum pnfs_iomode access_type);
>>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>>> index aedcaa7..c453164 100644
>>> --- a/fs/nfs/read.c
>>> +++ b/fs/nfs/read.c
>>> @@ -20,17 +20,17 @@
>>> #include <linux/nfs_page.h>
>>>
>>> #include <asm/system.h>
>>> +#include "pnfs.h"
>>>
>>> #include "nfs4_fs.h"
>>> #include "internal.h"
>>> #include "iostat.h"
>>> #include "fscache.h"
>>> -#include "pnfs.h"
>>>
>>> #define NFSDBG_FACILITY NFSDBG_PAGECACHE
>>>
>>> -static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int);
>>> -static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int);
>>> +static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
>>> +static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
>>> static const struct rpc_call_ops nfs_read_partial_ops;
>>> static const struct rpc_call_ops nfs_read_full_ops;
>>>
>>> @@ -70,6 +70,7 @@ void nfs_readdata_free(struct nfs_read_data *p)
>>> static void nfs_readdata_release(struct nfs_read_data *rdata)
>>> {
>>> put_nfs_open_context(rdata->args.context);
>>> + put_lseg(rdata->lseg);
>>
>> Shouldn't you be calling put_lseg() _before_ put_nfs_open_context()? You
>> are not guaranteed that the inode still exists after that call.
>>

Good catch. If we need the layout to outlive the open context then
we should get a reference on the inode using iget and iput the inode
in put_layout_hdr_locked.

Benny
>
> Yes.
>
> Fred

2011-02-14 19:18:45

by Andy Adamson

Subject: [PATCH 04/16] pnfs: wave 3: send zero stateid seqid on v4.1 i/o

From: Andy Adamson <[email protected]>

Data servers require a zero stateid seqid, and there is no reason not to do
the same for all NFSv4.1 I/O.
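The effect of the new zero_seqid argument can be modelled in userspace as below. The `model_stateid` layout (a host-order 32-bit seqid followed by the 12-byte "other" field, 16 bytes on the wire) and `encode_stateid_model()` are hypothetical; the kernel copies an already-encoded nfs4_stateid instead. Because the patch passes hdr->minorversion as the flag, NFSv4.0 (minorversion 0) keeps the real seqid, while any NFSv4.1 READ or WRITE sends zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATEID_SIZE 16	/* 4-byte seqid + 12-byte "other" field */

/* Hypothetical host-order stateid; the kernel's nfs4_stateid differs. */
struct model_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/*
 * Encode @sid into @buf in XDR (big-endian) order. When @zero_seqid is
 * non-zero, the seqid is sent as 0, as the patch does for minorversion >= 1.
 */
static void encode_stateid_model(unsigned char buf[STATEID_SIZE],
				 const struct model_stateid *sid,
				 int zero_seqid)
{
	uint32_t seqid = zero_seqid ? 0 : sid->seqid;

	buf[0] = (unsigned char)(seqid >> 24);
	buf[1] = (unsigned char)(seqid >> 16);
	buf[2] = (unsigned char)(seqid >> 8);
	buf[3] = (unsigned char)seqid;
	memcpy(buf + 4, sid->other, sizeof(sid->other));
}
```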

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4xdr.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 4e2c168..2380c45 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1384,7 +1384,7 @@ static void encode_putrootfh(struct xdr_stream *xdr, struct compound_hdr *hdr)
hdr->replen += decode_putrootfh_maxsz;
}

-static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx)
+static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx, int zero_seqid)
{
nfs4_stateid stateid;
__be32 *p;
@@ -1392,6 +1392,8 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
p = reserve_space(xdr, NFS4_STATEID_SIZE);
if (ctx->state != NULL) {
nfs4_copy_stateid(&stateid, ctx->state, l_ctx->lockowner, l_ctx->pid);
+ if (zero_seqid)
+ stateid.stateid.seqid = 0;
xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
} else
xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
@@ -1404,7 +1406,8 @@ static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args,
p = reserve_space(xdr, 4);
*p = cpu_to_be32(OP_READ);

- encode_stateid(xdr, args->context, args->lock_context);
+ encode_stateid(xdr, args->context, args->lock_context,
+ hdr->minorversion);

p = reserve_space(xdr, 12);
p = xdr_encode_hyper(p, args->offset);
@@ -1592,7 +1595,8 @@ static void encode_write(struct xdr_stream *xdr, const struct nfs_writeargs *arg
p = reserve_space(xdr, 4);
*p = cpu_to_be32(OP_WRITE);

- encode_stateid(xdr, args->context, args->lock_context);
+ encode_stateid(xdr, args->context, args->lock_context,
+ hdr->minorversion);

p = reserve_space(xdr, 16);
p = xdr_encode_hyper(p, args->offset);
--
1.7.2.3


2011-02-15 14:48:28

by Fred Isaman

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On Tue, Feb 15, 2011 at 4:25 AM, Christoph Hellwig <[email protected]> wrote:
>> +_put_lseg_common(struct pnfs_layout_segment *lseg)
>
> The naming of _put_lseg_common is pretty weird compared to standard Linux
> function naming.  I'd either expect __put_lseg or put_lseg_common.
>
>> +{
>> +	struct inode *ino = lseg->pls_layout->plh_inode;
>
> Please call this inode.  ino is usually used for variables of type ino_t.
>
>
>> +	BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
>> +	list_del(&lseg->pls_list);
>> +	if (list_empty(&lseg->pls_layout->plh_segs)) {
>> +		set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
>> +		/* Matched by initial refcount set in alloc_init_layout_hdr */
>> +		put_layout_hdr_locked(lseg->pls_layout);
>> +	}
>> +	rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
>> +}
>> +
>
>
>
>> @@ -242,22 +257,35 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
>> 		atomic_read(&lseg->pls_refcount),
>> 		test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
>> 	if (atomic_dec_and_test(&lseg->pls_refcount)) {
>> +		_put_lseg_common(lseg);
>> 		list_add(&lseg->pls_list, tmp_list);
>> 		return 1;
>> 	}
>> 	return 0;
>
> Given that put_lseg_locked is pretty trivial now, and has an awkward
> calling convention I would just inline it into the only caller.
>
>> +static void
>> +put_lseg(struct pnfs_layout_segment *lseg)
>> +{
>> +	struct inode *ino;
>
> Again, please call this inode.
>
>> +
>> +	if (!lseg)
>> +		return;
>> +
>> +	dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
>> +		atomic_read(&lseg->pls_refcount),
>> +		test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
>> +	ino = lseg->pls_layout->plh_inode;
>> +	if (atomic_dec_and_lock(&lseg->pls_refcount, &ino->i_lock)) {
>> +		LIST_HEAD(free_me);
>> +
>> +		_put_lseg_common(lseg);
>> +		list_add(&lseg->pls_list, &free_me);
>> +		spin_unlock(&ino->i_lock);
>> +		pnfs_free_lseg_list(&free_me);
>> +	}
>
> What's the point of the list operations here?  You'd be much better off
> just doing a
>
>	free_lseg(lseg);
>
> after releasing the lock.
>

pnfs_free_lseg_list, besides calling free_lseg, also potentially
removes the layout from the client's list of inodes with layouts.
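Christoph's suggested shape (unlink under the lock, free after unlocking, no temporary list) can be sketched in userspace. `dec_and_lock()` imitates the semantics of the kernel's atomic_dec_and_lock() with C11 atomics and a toy `atomic_flag` spinlock standing in for inode->i_lock; the `model_*` types are hypothetical. As Fred notes, the real pnfs_free_lseg_list() does more than free the segment, so this only illustrates the locking pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy spinlock standing in for inode->i_lock. */
static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/*
 * Userspace analogue of atomic_dec_and_lock(): decrement @cnt and return
 * true, with @lock held, only if the count dropped to zero.
 */
static bool dec_and_lock(atomic_int *cnt, atomic_flag *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: this cannot be the last reference, so skip the lock. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}
	/* Slow path: possibly the last reference, so decrement under the lock. */
	toy_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	toy_unlock(lock);
	return false;
}

/* Hypothetical layout header and segment, not the real pNFS structures. */
struct model_layout {
	atomic_flag lock;
	int nsegs;		/* segments still on the layout's list */
	bool destroyed;
};

struct model_lseg {
	struct model_layout *layout;
	atomic_int refcount;
};

static int lsegs_freed;	/* counts frees, for illustration only */

static void model_put_lseg(struct model_lseg *lseg)
{
	struct model_layout *lo;

	if (lseg == NULL)
		return;
	lo = lseg->layout;
	if (dec_and_lock(&lseg->refcount, &lo->lock)) {
		/* Unlink while holding the lock... */
		if (--lo->nsegs == 0)
			lo->destroyed = true;
		toy_unlock(&lo->lock);
		/* ...and free the single segment directly, after unlocking. */
		free(lseg);
		lsegs_freed++;
	}
}
```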

Fred

> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-02-14 19:18:46

by Andy Adamson

Subject: [PATCH 05/16] pnfs: wave 3: new flag for state renewal check

From: Andy Adamson <[email protected]>

Data servers not sharing a session with the mount MDS always have an empty
cl_superblocks list.
Replace the cl_superblocks empty-list check, which decides whether it is time
to shut down renewd, with a test of the NFS_CS_STOP_RENEW bit, which is never
set for such a data server.
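The flag bookkeeping can be modelled in a few lines of userspace C. The `model_*` names and the superblock counter are hypothetical, and the sketch omits locking (the patch updates the bit under nfs_client_lock). A client that only ever acts as a data server never goes through the insert/remove paths, so the bit stays clear and renewal keeps running, whereas the old list_empty() test would have stopped it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CS_STOP_RENEW 4	/* same bit number the patch uses */

/* Hypothetical stand-in for struct nfs_client. */
struct model_client {
	unsigned long res_state;	/* bit flags, like cl_res_state */
	int nsuperblocks;		/* length of cl_superblocks */
};

/* Mirrors nfs_server_insert_lists(): a mounted server re-enables renewal. */
static void model_insert_server(struct model_client *clp)
{
	clp->nsuperblocks++;
	clp->res_state &= ~(1UL << MODEL_CS_STOP_RENEW);
}

/* Mirrors nfs_server_remove_lists(): the last unmount stops renewal. */
static void model_remove_server(struct model_client *clp)
{
	clp->nsuperblocks--;
	if (clp->nsuperblocks == 0)
		clp->res_state |= 1UL << MODEL_CS_STOP_RENEW;
}

/* Mirrors the new test in nfs4_renew_state(): renew unless the bit is set. */
static bool model_should_renew(const struct model_client *clp)
{
	return !(clp->res_state & (1UL << MODEL_CS_STOP_RENEW));
}
```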

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/client.c | 5 +++++
fs/nfs/nfs4renewd.c | 6 +-----
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 75b236f..5891cf8 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1032,14 +1032,19 @@ static void nfs_server_insert_lists(struct nfs_server *server)
spin_lock(&nfs_client_lock);
list_add_tail_rcu(&server->client_link, &clp->cl_superblocks);
list_add_tail(&server->master_link, &nfs_volume_list);
+ clear_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state);
spin_unlock(&nfs_client_lock);

}

static void nfs_server_remove_lists(struct nfs_server *server)
{
+ struct nfs_client *clp = server->nfs_client;
+
spin_lock(&nfs_client_lock);
list_del_rcu(&server->client_link);
+ if (clp && list_empty(&clp->cl_superblocks))
+ set_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state);
list_del(&server->master_link);
spin_unlock(&nfs_client_lock);

diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index 402143d..df8e7f3 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -64,12 +64,8 @@ nfs4_renew_state(struct work_struct *work)
ops = clp->cl_mvops->state_renewal_ops;
dprintk("%s: start\n", __func__);

- rcu_read_lock();
- if (list_empty(&clp->cl_superblocks)) {
- rcu_read_unlock();
+ if (test_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state))
goto out;
- }
- rcu_read_unlock();

spin_lock(&clp->cl_lock);
lease = clp->cl_lease_time;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b197563..2c2dc18 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -30,6 +30,7 @@ struct nfs_client {
#define NFS_CS_CALLBACK 1 /* - callback started */
#define NFS_CS_IDMAP 2 /* - idmap started */
#define NFS_CS_RENEWD 3 /* - renewd started */
+#define NFS_CS_STOP_RENEW 4 /* no more state to renew */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
--
1.7.2.3


2011-02-15 14:58:30

by Christoph Hellwig

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On Tue, Feb 15, 2011 at 09:48:26AM -0500, Fred Isaman wrote:
> pnfs_free_lseg_list, besides calling free_lseg, also potentially
> removes the layout from the client's list of inodes with layouts.

Looks like the routine has changed from the mainline variant
I looked at.  I took a quick look at the one from pnfs-submit,
which looks quite suspicious, as it special-cases the first item
on the list without a good explanation and then iterates the list.

Does your tree have another caller of pnfs_free_lseg_list?  If not,
please just open-code the right thing in the caller, instead of
pretending we're dealing with a list if you're always dealing with
one entry.  If the tree grows a caller that needs to deal with a list
with more than 1 entry we can revisit whether there's a point in sharing
code.


2011-02-14 19:18:44

by Andy Adamson

Subject: [PATCH 03/16] NFS move nfs_client initialization into nfs_get_client

From: Andy Adamson <[email protected]>

nfs_get_client now returns an nfs_client that is ready to be used, whether it
was found or newly created.
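The reshaped find-or-create logic can be sketched in userspace as below. Everything here is a hypothetical simplification: `model_get_client()` reports errors through an out-parameter rather than ERR_PTR(), there is no nfs_client_lock, and the version-4 init failing with -EPROTONOSUPPORT mimics the !CONFIG_NFS_V4 stub the patch adds. What it shows is the new contract: a client is initialised on the create path, and if that fails the caller gets an error with the half-built client already put.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct nfs_client. */
struct model_client {
	struct model_client *next;	/* link on the global client list */
	int version;			/* NFS protocol version */
	int refcount;
	bool ready;
};

static struct model_client *client_list;

/* Per-version init; v4 fails the way the patch's !CONFIG_NFS_V4 stub does. */
static int model_init_client(struct model_client *clp)
{
	if (clp->version == 4)
		return -EPROTONOSUPPORT;
	clp->ready = true;
	return 0;
}

/* Drop a reference; the last one unlinks the client and frees it. */
static void model_put_client(struct model_client *clp)
{
	struct model_client **pp;

	if (--clp->refcount > 0)
		return;
	for (pp = &client_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == clp) {
			*pp = clp->next;
			break;
		}
	}
	free(clp);
}

/*
 * Find or create a client. A non-NULL result is always initialised; on
 * failure NULL is returned and *errp holds a negative errno.
 */
static struct model_client *model_get_client(int version, int *errp)
{
	struct model_client *clp;
	int error;

	*errp = 0;
	for (clp = client_list; clp != NULL; clp = clp->next) {
		if (clp->version == version) {
			clp->refcount++;
			return clp;	/* initialised when it was created */
		}
	}

	clp = calloc(1, sizeof(*clp));
	if (clp == NULL) {
		*errp = -ENOMEM;
		return NULL;
	}
	clp->version = version;
	clp->refcount = 1;
	clp->next = client_list;
	client_list = clp;

	/* The step the patch moves into nfs_get_client(): init on creation. */
	error = model_init_client(clp);
	if (error < 0) {
		model_put_client(clp);
		*errp = error;
		return NULL;
	}
	return clp;
}
```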

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/client.c | 67 ++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 47 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index bd3ca32..75b236f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -81,6 +81,15 @@ retry:
}
#endif /* CONFIG_NFS_V4 */

+static int nfs4_init_client(struct nfs_client *clp,
+ const struct rpc_timeout *timeparms,
+ const char *ip_addr,
+ rpc_authflavor_t authflavour,
+ int noresvport);
+static int nfs_init_client(struct nfs_client *clp,
+ const struct rpc_timeout *timeparms,
+ int noresvport);
+
/*
* RPC cruft for NFS
*/
@@ -481,7 +490,12 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
* Look up a client by IP address and protocol version
* - creates a new record if one doesn't yet exist
*/
-static struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
+static struct nfs_client *
+nfs_get_client(const struct nfs_client_initdata *cl_init,
+ const struct rpc_timeout *timeparms,
+ const char *ip_addr,
+ rpc_authflavor_t authflavour,
+ int noresvport)
{
struct nfs_client *clp, *new = NULL;
int error;
@@ -512,6 +526,17 @@ install_client:
clp = new;
list_add(&clp->cl_share_link, &nfs_client_list);
spin_unlock(&nfs_client_lock);
+
+ if (cl_init->rpc_ops->version == 4)
+ error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
+ noresvport);
+ else
+ error = nfs_init_client(clp, timeparms, noresvport);
+
+ if (error < 0) {
+ nfs_put_client(clp);
+ return ERR_PTR(error);
+ }
dprintk("--> nfs_get_client() = %p [new]\n", clp);
return clp;

@@ -769,7 +794,7 @@ static int nfs_init_server_rpcclient(struct nfs_server *server,
*/
static int nfs_init_client(struct nfs_client *clp,
const struct rpc_timeout *timeparms,
- const struct nfs_parsed_mount_data *data)
+ int noresvport)
{
int error;

@@ -784,7 +809,7 @@ static int nfs_init_client(struct nfs_client *clp,
* - RFC 2623, sec 2.3.2
*/
error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_UNIX,
- 0, data->flags & NFS_MOUNT_NORESVPORT);
+ 0, noresvport);
if (error < 0)
goto error;
nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -820,19 +845,17 @@ static int nfs_init_server(struct nfs_server *server,
cl_init.rpc_ops = &nfs_v3_clientops;
#endif

+ nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
+ data->timeo, data->retrans);
+
/* Allocate or find a client reference we can use */
- clp = nfs_get_client(&cl_init);
+ clp = nfs_get_client(&cl_init, &timeparms, NULL, RPC_AUTH_UNIX,
+ data->flags & NFS_MOUNT_NORESVPORT);
if (IS_ERR(clp)) {
dprintk("<-- nfs_init_server() = error %ld\n", PTR_ERR(clp));
return PTR_ERR(clp);
}

- nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
- data->timeo, data->retrans);
- error = nfs_init_client(clp, &timeparms, data);
- if (error < 0)
- goto error;
-
server->nfs_client = clp;

/* Initialise the client representation from the mount data */
@@ -1311,7 +1334,7 @@ static int nfs4_init_client(struct nfs_client *clp,
const struct rpc_timeout *timeparms,
const char *ip_addr,
rpc_authflavor_t authflavour,
- int flags)
+ int noresvport)
{
int error;

@@ -1325,7 +1348,7 @@ static int nfs4_init_client(struct nfs_client *clp,
clp->rpc_ops = &nfs_v4_clientops;

error = nfs_create_rpc_client(clp, timeparms, authflavour,
- 1, flags & NFS_MOUNT_NORESVPORT);
+ 1, noresvport);
if (error < 0)
goto error;
strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
@@ -1378,22 +1401,16 @@ static int nfs4_set_client(struct nfs_server *server,
dprintk("--> nfs4_set_client()\n");

/* Allocate or find a client reference we can use */
- clp = nfs_get_client(&cl_init);
+ clp = nfs_get_client(&cl_init, timeparms, ip_addr, authflavour,
+ server->flags & NFS_MOUNT_NORESVPORT);
if (IS_ERR(clp)) {
error = PTR_ERR(clp);
goto error;
}
- error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
- server->flags);
- if (error < 0)
- goto error_put;

server->nfs_client = clp;
dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
return 0;
-
-error_put:
- nfs_put_client(clp);
error:
dprintk("<-- nfs4_set_client() = xerror %d\n", error);
return error;
@@ -1611,6 +1628,16 @@ error:
return ERR_PTR(error);
}

+#else /* CONFIG_NFS_V4 */
+static int nfs4_init_client(struct nfs_client *clp,
+ const struct rpc_timeout *timeparms,
+ const char *ip_addr,
+ rpc_authflavor_t authflavour,
+ int noresvport)
+{
+ return -EPROTONOSUPPORT;
+}
+
#endif /* CONFIG_NFS_V4 */

/*
--
1.7.2.3


2011-02-16 15:56:31

by Andy Adamson

Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read


On Feb 16, 2011, at 10:52 AM, Benny Halevy wrote:

> On 2011-02-16 10:09, Trond Myklebust wrote:
>> On Wed, 2011-02-16 at 09:53 -0500, Andy Adamson wrote:
>>> On Feb 15, 2011, at 10:16 PM, Benny Halevy wrote:
>>>
>>>> On 2011-02-14 14:18, [email protected] wrote:
>>>>> From: Andy Adamson <[email protected]>
>>>>
>>>> Andy, taking into account the many contributors to this patch
>>>> the author should be "The pNFS Team" IMO.
>>>
>>> The author can't be "The pNFS Team". Somebody needs to be the author. I asked for volunteers and said I would be the default. Do you want to be the author?
>>
>> Right. Patches authored by 'The pNFS Team' will be rejected, as
>> discussed in Hopkinton last autumn.
>>
>
> OK. I'm not the original author, so I can't claim authorship for this patch.
> FWIW, the earliest record I have in my tree for the earliest versions of this code is
> authored by Andy and Mike Sager...

Thanks for looking Benny. I'll remain the author.

-->Andy

>
> Benny


2011-02-15 19:17:56

by Andy Adamson

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting


On Feb 15, 2011, at 11:37 AM, William A. (Andy) Adamson wrote:

> On Tue, Feb 15, 2011 at 11:02 AM, Christoph Hellwig <[email protected]> wrote:
>> FYI that whole device layout cache thingy looks like a complete fucking
>> mess to me.
>>
>> It's nothing but a trivial hash lookup which is only used in the file
>> layout driver. But instead of just having a hash allocated in the file
>> layout driver on module load and a trivial open-coded lookup for it, it's
>> a massively overcomplicated set of routines. Please rip this stuff out
>> before doing further work in this area.
>>
>> The patch below removes the maze of pointless abstractions and just
>> keeps a simple hash of deviceids in the filelayout driver.
>
>
> The abstraction layer exists so that this code is not replicated in each
> layout driver. Object and block drivers need to do the same task, and indeed
> use this code in their prototypes.
> That said, we don't have those other layout drivers in kernel, so
> moving it all to the file layout driver is fine with me, so long as we
> don't have to move it back once we get other drivers.
>
> Trond?
>
> -->Andy

OK. We all agree. Move the deviceid cache to the filelayout driver until there is a need for a common cache.
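A per-driver deviceid cache of the kind agreed on here reduces to a small hash table. The following single-threaded userspace sketch keeps the hash function from Christoph's patch (multiply by 37 per byte, masked to 5 bits) and its find-or-discard insertion for racing GETDEVICEINFO replies, but replaces RCU and filelayout_deviceid_lock with nothing at all. All `model_*`/`devid_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEVICEID_SIZE 16	/* NFS4_DEVICEID4_SIZE */
#define DEVID_HASH_BITS 5
#define DEVID_HASH_SIZE (1 << DEVID_HASH_BITS)
#define DEVID_HASH_MASK (DEVID_HASH_SIZE - 1)

struct model_deviceid {
	unsigned char data[DEVICEID_SIZE];
};

struct model_dsaddr {
	struct model_dsaddr *next;	/* hash chain */
	struct model_deviceid deviceid;
	int ref;
};

static struct model_dsaddr *devid_cache[DEVID_HASH_SIZE];

/* Same multiplicative hash as nfs4_fl_deviceid_hash() in the patch. */
static uint32_t devid_hash(const struct model_deviceid *id)
{
	const unsigned char *cptr = id->data;
	unsigned int nbytes = DEVICEID_SIZE;
	uint32_t x = 0;

	while (nbytes--) {
		x *= 37;
		x += *cptr++;
	}
	return x & DEVID_HASH_MASK;
}

/* Look up @id and take a reference on a match. */
static struct model_dsaddr *devid_find_get(const struct model_deviceid *id)
{
	struct model_dsaddr *d;

	for (d = devid_cache[devid_hash(id)]; d != NULL; d = d->next) {
		if (memcmp(&d->deviceid, id, sizeof(*id)) == 0) {
			if (d->ref == 0)
				return NULL;	/* like atomic_inc_not_zero() failing */
			d->ref++;
			return d;
		}
	}
	return NULL;
}

/* Insert @new unless an equal deviceid is cached; if so, discard @new. */
static struct model_dsaddr *devid_add(struct model_dsaddr *new)
{
	struct model_dsaddr *d = devid_find_get(&new->deviceid);
	uint32_t h;

	if (d != NULL) {
		free(new);
		return d;
	}
	h = devid_hash(&new->deviceid);
	new->ref = 1;
	new->next = devid_cache[h];
	devid_cache[h] = new;
	return new;
}

/* Drop a reference; the last one unlinks and frees the entry. */
static void devid_put(struct model_dsaddr *dsaddr)
{
	struct model_dsaddr **pp;

	if (--dsaddr->ref > 0)
		return;
	pp = &devid_cache[devid_hash(&dsaddr->deviceid)];
	while (*pp != dsaddr)
		pp = &(*pp)->next;
	*pp = dsaddr->next;
	free(dsaddr);
}
```

In the kernel version, lookups run under rcu_read_lock() and the final put calls synchronize_rcu() before freeing, which is why a reader can still observe an entry whose count has already reached zero.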

-->Andy

>
>>
>>
>> Index: linux-2.6/fs/nfs/nfs4filelayout.c
>> ===================================================================
>> --- linux-2.6.orig/fs/nfs/nfs4filelayout.c 2011-02-15 16:10:51.108421283 +0100
>> +++ linux-2.6/fs/nfs/nfs4filelayout.c 2011-02-15 16:55:22.087422176 +0100
>> @@ -40,32 +40,6 @@ MODULE_LICENSE("GPL");
>> MODULE_AUTHOR("Dean Hildebrand <[email protected]>");
>> MODULE_DESCRIPTION("The NFSv4 file layout driver");
>>
>> -static int
>> -filelayout_set_layoutdriver(struct nfs_server *nfss)
>> -{
>> - int status = pnfs_alloc_init_deviceid_cache(nfss->nfs_client,
>> - nfs4_fl_free_deviceid_callback);
>> - if (status) {
>> - printk(KERN_WARNING "%s: deviceid cache could not be "
>> - "initialized\n", __func__);
>> - return status;
>> - }
>> - dprintk("%s: deviceid cache has been initialized successfully\n",
>> - __func__);
>> - return 0;
>> -}
>> -
>> -/* Clear out the layout by destroying its device list */
>> -static int
>> -filelayout_clear_layoutdriver(struct nfs_server *nfss)
>> -{
>> - dprintk("--> %s\n", __func__);
>> -
>> - if (nfss->nfs_client->cl_devid_cache)
>> - pnfs_put_deviceid_cache(nfss->nfs_client);
>> - return 0;
>> -}
>> -
>> /*
>> * filelayout_check_layout()
>> *
>> @@ -99,7 +73,7 @@ filelayout_check_layout(struct pnfs_layo
>> }
>>
>> /* find and reference the deviceid */
>> - dsaddr = nfs4_fl_find_get_deviceid(nfss->nfs_client, id);
>> + dsaddr = nfs4_fl_find_get_deviceid(id);
>> if (dsaddr == NULL) {
>> dsaddr = get_device_info(lo->plh_inode, id);
>> if (dsaddr == NULL)
>> @@ -134,7 +108,7 @@ out:
>> dprintk("--> %s returns %d\n", __func__, status);
>> return status;
>> out_put:
>> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache, &dsaddr->deviceid);
>> + nfs4_fl_put_deviceid(dsaddr);
>> goto out;
>> }
>>
>> @@ -243,23 +217,19 @@ filelayout_alloc_lseg(struct pnfs_layout
>> static void
>> filelayout_free_lseg(struct pnfs_layout_segment *lseg)
>> {
>> - struct nfs_server *nfss = NFS_SERVER(lseg->pls_layout->plh_inode);
>> struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
>>
>> dprintk("--> %s\n", __func__);
>> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache,
>> - &fl->dsaddr->deviceid);
>> + nfs4_fl_put_deviceid(fl->dsaddr);
>> _filelayout_free_lseg(fl);
>> }
>>
>> static struct pnfs_layoutdriver_type filelayout_type = {
>> - .id = LAYOUT_NFSV4_1_FILES,
>> - .name = "LAYOUT_NFSV4_1_FILES",
>> - .owner = THIS_MODULE,
>> - .set_layoutdriver = filelayout_set_layoutdriver,
>> - .clear_layoutdriver = filelayout_clear_layoutdriver,
>> - .alloc_lseg = filelayout_alloc_lseg,
>> - .free_lseg = filelayout_free_lseg,
>> + .id = LAYOUT_NFSV4_1_FILES,
>> + .name = "LAYOUT_NFSV4_1_FILES",
>> + .owner = THIS_MODULE,
>> + .alloc_lseg = filelayout_alloc_lseg,
>> + .free_lseg = filelayout_free_lseg,
>> };
>>
>> static int __init nfs4filelayout_init(void)
>> Index: linux-2.6/fs/nfs/nfs4filelayout.h
>> ===================================================================
>> --- linux-2.6.orig/fs/nfs/nfs4filelayout.h 2011-02-15 16:30:25.270920897 +0100
>> +++ linux-2.6/fs/nfs/nfs4filelayout.h 2011-02-15 16:47:50.063445740 +0100
>> @@ -56,7 +56,9 @@ struct nfs4_pnfs_ds {
>> };
>>
>> struct nfs4_file_layout_dsaddr {
>> - struct pnfs_deviceid_node deviceid;
>> + struct hlist_node node;
>> + struct nfs4_deviceid deviceid;
>> + atomic_t ref;
>> u32 stripe_count;
>> u8 *stripe_indices;
>> u32 ds_num;
>> @@ -83,11 +85,11 @@ FILELAYOUT_LSEG(struct pnfs_layout_segme
>> generic_hdr);
>> }
>>
>> -extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
>> extern void print_ds(struct nfs4_pnfs_ds *ds);
>> extern void print_deviceid(struct nfs4_deviceid *dev_id);
>> extern struct nfs4_file_layout_dsaddr *
>> -nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
>> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
>> +extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
>> struct nfs4_file_layout_dsaddr *
>> get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
>>
>> Index: linux-2.6/fs/nfs/nfs4filelayoutdev.c
>> ===================================================================
>> --- linux-2.6.orig/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:23:03.480487362 +0100
>> +++ linux-2.6/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:55:02.894924739 +0100
>> @@ -37,6 +37,30 @@
>> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>>
>> /*
>> + * Device ID RCU cache. A device ID is unique per client ID and layout type.
>> + */
>> +#define NFS4_FL_DEVICE_ID_HASH_BITS 5
>> +#define NFS4_FL_DEVICE_ID_HASH_SIZE (1 << NFS4_FL_DEVICE_ID_HASH_BITS)
>> +#define NFS4_FL_DEVICE_ID_HASH_MASK (NFS4_FL_DEVICE_ID_HASH_SIZE - 1)
>> +
>> +static inline u32
>> +nfs4_fl_deviceid_hash(struct nfs4_deviceid *id)
>> +{
>> + unsigned char *cptr = (unsigned char *)id->data;
>> + unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>> + u32 x = 0;
>> +
>> + while (nbytes--) {
>> + x *= 37;
>> + x += *cptr++;
>> + }
>> + return x & NFS4_FL_DEVICE_ID_HASH_MASK;
>> +}
>> +
>> +static struct hlist_head filelayout_deviceid_cache[NFS4_FL_DEVICE_ID_HASH_SIZE];
>> +static DEFINE_SPINLOCK(filelayout_deviceid_lock);
>> +
>> +/*
>> * Data server cache
>> *
>> * Data servers can be mapped to different device ids.
>> @@ -122,7 +146,7 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
>> struct nfs4_pnfs_ds *ds;
>> int i;
>>
>> - print_deviceid(&dsaddr->deviceid.de_id);
>> + print_deviceid(&dsaddr->deviceid);
>>
>> for (i = 0; i < dsaddr->ds_num; i++) {
>> ds = dsaddr->ds_list[i];
>> @@ -139,15 +163,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
>> kfree(dsaddr);
>> }
>>
>> -void
>> -nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *device)
>> -{
>> - struct nfs4_file_layout_dsaddr *dsaddr =
>> - container_of(device, struct nfs4_file_layout_dsaddr, deviceid);
>> -
>> - nfs4_fl_free_deviceid(dsaddr);
>> -}
>> -
>> static struct nfs4_pnfs_ds *
>> nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port)
>> {
>> @@ -296,7 +311,7 @@ decode_device(struct inode *ino, struct
>> dsaddr->stripe_count = cnt;
>> dsaddr->ds_num = num;
>>
>> - memcpy(&dsaddr->deviceid.de_id, &pdev->dev_id, sizeof(pdev->dev_id));
>> + memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));
>>
>> /* Go back an read stripe indices */
>> p = indicesp;
>> @@ -346,28 +361,37 @@ out_err:
>> }
>>
>> /*
>> - * Decode the opaque device specified in 'dev'
>> - * and add it to the list of available devices.
>> - * If the deviceid is already cached, nfs4_add_deviceid will return
>> - * a pointer to the cached struct and throw away the new.
>> + * Decode the opaque device specified in 'dev' and add it to the cache of
>> + * available devices.
>> */
>> -static struct nfs4_file_layout_dsaddr*
>> +static struct nfs4_file_layout_dsaddr *
>> decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
>> {
>> - struct nfs4_file_layout_dsaddr *dsaddr;
>> - struct pnfs_deviceid_node *d;
>> + struct nfs4_file_layout_dsaddr *d, *new;
>> + long hash;
>>
>> - dsaddr = decode_device(inode, dev);
>> - if (!dsaddr) {
>> + new = decode_device(inode, dev);
>> + if (!new) {
>> printk(KERN_WARNING "%s: Could not decode or add device\n",
>> __func__);
>> return NULL;
>> }
>>
>> - d = pnfs_add_deviceid(NFS_SERVER(inode)->nfs_client->cl_devid_cache,
>> - &dsaddr->deviceid);
>> + spin_lock(&filelayout_deviceid_lock);
>> + d = nfs4_fl_find_get_deviceid(&new->deviceid);
>> + if (d) {
>> + spin_unlock(&filelayout_deviceid_lock);
>> + nfs4_fl_free_deviceid(new);
>> + return d;
>> + }
>> +
>> + INIT_HLIST_NODE(&new->node);
>> + atomic_set(&new->ref, 1);
>> + hash = nfs4_fl_deviceid_hash(&new->deviceid);
>> + hlist_add_head_rcu(&new->node, &filelayout_deviceid_cache[hash]);
>> + spin_unlock(&filelayout_deviceid_lock);
>>
>> - return container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
>> + return new;
>> }
>>
>> /*
>> @@ -442,12 +466,36 @@ out_free:
>> return dsaddr;
>> }
>>
>> +void
>> +nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
>> +{
>> + if (atomic_dec_and_lock(&dsaddr->ref, &filelayout_deviceid_lock)) {
>> + hlist_del_rcu(&dsaddr->node);
>> + spin_unlock(&filelayout_deviceid_lock);
>> +
>> + synchronize_rcu();
>> + nfs4_fl_free_deviceid(dsaddr);
>> + }
>> +}
>> +
>> struct nfs4_file_layout_dsaddr *
>> -nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
>> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *id)
>> {
>> - struct pnfs_deviceid_node *d;
>> + struct nfs4_file_layout_dsaddr *d;
>> + struct hlist_node *n;
>> + long hash = nfs4_fl_deviceid_hash(id);
>> +
>>
>> - d = pnfs_find_get_deviceid(clp->cl_devid_cache, id);
>> - return (d == NULL) ? NULL :
>> - container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
>> + rcu_read_lock();
>> + hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
>> + if (!memcmp(&d->deviceid, id, sizeof(*id))) {
>> + if (!atomic_inc_not_zero(&d->ref))
>> + goto fail;
>> + rcu_read_unlock();
>> + return d;
>> + }
>> + }
>> +fail:
>> + rcu_read_unlock();
>> + return NULL;
>> }
>> Index: linux-2.6/fs/nfs/pnfs.c
>> ===================================================================
>> --- linux-2.6.orig/fs/nfs/pnfs.c 2011-02-15 16:10:33.284421051 +0100
>> +++ linux-2.6/fs/nfs/pnfs.c 2011-02-15 16:21:47.115422052 +0100
>> @@ -74,10 +74,8 @@ find_pnfs_driver(u32 id)
>> void
>> unset_pnfs_layoutdriver(struct nfs_server *nfss)
>> {
>> - if (nfss->pnfs_curr_ld) {
>> - nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
>> + if (nfss->pnfs_curr_ld)
>> module_put(nfss->pnfs_curr_ld->owner);
>> - }
>> nfss->pnfs_curr_ld = NULL;
>> }
>>
>> @@ -115,13 +113,7 @@ set_pnfs_layoutdriver(struct nfs_server
>> goto out_no_driver;
>> }
>> server->pnfs_curr_ld = ld_type;
>> - if (ld_type->set_layoutdriver(server)) {
>> - printk(KERN_ERR
>> - "%s: Error initializing mount point for layout driver %u.\n",
>> - __func__, id);
>> - module_put(ld_type->owner);
>> - goto out_no_driver;
>> - }
>> +
>> dprintk("%s: pNFS module for %u set\n", __func__, id);
>> return;
>>
>> @@ -828,138 +820,3 @@ out_forget_reply:
>> NFS_SERVER(ino)->pnfs_curr_ld->free_lseg(lseg);
>> goto out;
>> }
>> -
>> -/*
>> - * Device ID cache. Currently supports one layout type per struct nfs_client.
>> - * Add layout type to the lookup key to expand to support multiple types.
>> - */
>> -int
>> -pnfs_alloc_init_deviceid_cache(struct nfs_client *clp,
>> - void (*free_callback)(struct pnfs_deviceid_node *))
>> -{
>> - struct pnfs_deviceid_cache *c;
>> -
>> - c = kzalloc(sizeof(struct pnfs_deviceid_cache), GFP_KERNEL);
>> - if (!c)
>> - return -ENOMEM;
>> - spin_lock(&clp->cl_lock);
>> - if (clp->cl_devid_cache != NULL) {
>> - atomic_inc(&clp->cl_devid_cache->dc_ref);
>> - dprintk("%s [kref [%d]]\n", __func__,
>> - atomic_read(&clp->cl_devid_cache->dc_ref));
>> - kfree(c);
>> - } else {
>> - /* kzalloc initializes hlists */
>> - spin_lock_init(&c->dc_lock);
>> - atomic_set(&c->dc_ref, 1);
>> - c->dc_free_callback = free_callback;
>> - clp->cl_devid_cache = c;
>> - dprintk("%s [new]\n", __func__);
>> - }
>> - spin_unlock(&clp->cl_lock);
>> - return 0;
>> -}
>> -EXPORT_SYMBOL_GPL(pnfs_alloc_init_deviceid_cache);
>> -
>> -/*
>> - * Called from pnfs_layoutdriver_type->free_lseg
>> - * last layout segment reference frees deviceid
>> - */
>> -void
>> -pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
>> - struct pnfs_deviceid_node *devid)
>> -{
>> - struct nfs4_deviceid *id = &devid->de_id;
>> - struct pnfs_deviceid_node *d;
>> - struct hlist_node *n;
>> - long h = nfs4_deviceid_hash(id);
>> -
>> - dprintk("%s [%d]\n", __func__, atomic_read(&devid->de_ref));
>> - if (!atomic_dec_and_lock(&devid->de_ref, &c->dc_lock))
>> - return;
>> -
>> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[h], de_node)
>> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
>> - hlist_del_rcu(&d->de_node);
>> - spin_unlock(&c->dc_lock);
>> - synchronize_rcu();
>> - c->dc_free_callback(devid);
>> - return;
>> - }
>> - spin_unlock(&c->dc_lock);
>> - /* Why wasn't it found in the list? */
>> - BUG();
>> -}
>> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid);
>> -
>> -/* Find and reference a deviceid */
>> -struct pnfs_deviceid_node *
>> -pnfs_find_get_deviceid(struct pnfs_deviceid_cache *c, struct nfs4_deviceid *id)
>> -{
>> - struct pnfs_deviceid_node *d;
>> - struct hlist_node *n;
>> - long hash = nfs4_deviceid_hash(id);
>> -
>> - dprintk("--> %s hash %ld\n", __func__, hash);
>> - rcu_read_lock();
>> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
>> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
>> - if (!atomic_inc_not_zero(&d->de_ref)) {
>> - goto fail;
>> - } else {
>> - rcu_read_unlock();
>> - return d;
>> - }
>> - }
>> - }
>> -fail:
>> - rcu_read_unlock();
>> - return NULL;
>> -}
>> -EXPORT_SYMBOL_GPL(pnfs_find_get_deviceid);
>> -
>> -/*
>> - * Add a deviceid to the cache.
>> - * GETDEVICEINFOs for same deviceid can race. If deviceid is found, discard new
>> - */
>> -struct pnfs_deviceid_node *
>> -pnfs_add_deviceid(struct pnfs_deviceid_cache *c, struct pnfs_deviceid_node *new)
>> -{
>> - struct pnfs_deviceid_node *d;
>> - long hash = nfs4_deviceid_hash(&new->de_id);
>> -
>> - dprintk("--> %s hash %ld\n", __func__, hash);
>> - spin_lock(&c->dc_lock);
>> - d = pnfs_find_get_deviceid(c, &new->de_id);
>> - if (d) {
>> - spin_unlock(&c->dc_lock);
>> - dprintk("%s [discard]\n", __func__);
>> - c->dc_free_callback(new);
>> - return d;
>> - }
>> - INIT_HLIST_NODE(&new->de_node);
>> - atomic_set(&new->de_ref, 1);
>> - hlist_add_head_rcu(&new->de_node, &c->dc_deviceids[hash]);
>> - spin_unlock(&c->dc_lock);
>> - dprintk("%s [new]\n", __func__);
>> - return new;
>> -}
>> -EXPORT_SYMBOL_GPL(pnfs_add_deviceid);
>> -
>> -void
>> -pnfs_put_deviceid_cache(struct nfs_client *clp)
>> -{
>> - struct pnfs_deviceid_cache *local = clp->cl_devid_cache;
>> -
>> - dprintk("--> %s ({%d})\n", __func__, atomic_read(&local->dc_ref));
>> - if (atomic_dec_and_lock(&local->dc_ref, &clp->cl_lock)) {
>> - int i;
>> - /* Verify cache is empty */
>> - for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE; i++)
>> - BUG_ON(!hlist_empty(&local->dc_deviceids[i]));
>> - clp->cl_devid_cache = NULL;
>> - spin_unlock(&clp->cl_lock);
>> - kfree(local);
>> - }
>> -}
>> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
>> Index: linux-2.6/fs/nfs/pnfs.h
>> ===================================================================
>> --- linux-2.6.orig/fs/nfs/pnfs.h 2011-02-15 16:10:51.088421060 +0100
>> +++ linux-2.6/fs/nfs/pnfs.h 2011-02-15 16:21:34.995159583 +0100
>> @@ -61,8 +61,6 @@ struct pnfs_layoutdriver_type {
>> const u32 id;
>> const char *name;
>> struct module *owner;
>> - int (*set_layoutdriver) (struct nfs_server *);
>> - int (*clear_layoutdriver) (struct nfs_server *);
>> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
>> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>> };
>> @@ -90,52 +88,6 @@ struct pnfs_device {
>> unsigned int pglen;
>> };
>>
>> -/*
>> - * Device ID RCU cache. A device ID is unique per client ID and layout type.
>> - */
>> -#define NFS4_DEVICE_ID_HASH_BITS 5
>> -#define NFS4_DEVICE_ID_HASH_SIZE (1 << NFS4_DEVICE_ID_HASH_BITS)
>> -#define NFS4_DEVICE_ID_HASH_MASK (NFS4_DEVICE_ID_HASH_SIZE - 1)
>> -
>> -static inline u32
>> -nfs4_deviceid_hash(struct nfs4_deviceid *id)
>> -{
>> - unsigned char *cptr = (unsigned char *)id->data;
>> - unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>> - u32 x = 0;
>> -
>> - while (nbytes--) {
>> - x *= 37;
>> - x += *cptr++;
>> - }
>> - return x & NFS4_DEVICE_ID_HASH_MASK;
>> -}
>> -
>> -struct pnfs_deviceid_node {
>> - struct hlist_node de_node;
>> - struct nfs4_deviceid de_id;
>> - atomic_t de_ref;
>> -};
>> -
>> -struct pnfs_deviceid_cache {
>> - spinlock_t dc_lock;
>> - atomic_t dc_ref;
>> - void (*dc_free_callback)(struct pnfs_deviceid_node *);
>> - struct hlist_head dc_deviceids[NFS4_DEVICE_ID_HASH_SIZE];
>> -};
>> -
>> -extern int pnfs_alloc_init_deviceid_cache(struct nfs_client *,
>> - void (*free_callback)(struct pnfs_deviceid_node *));
>> -extern void pnfs_put_deviceid_cache(struct nfs_client *);
>> -extern struct pnfs_deviceid_node *pnfs_find_get_deviceid(
>> - struct pnfs_deviceid_cache *,
>> - struct nfs4_deviceid *);
>> -extern struct pnfs_deviceid_node *pnfs_add_deviceid(
>> - struct pnfs_deviceid_cache *,
>> - struct pnfs_deviceid_node *);
>> -extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
>> - struct pnfs_deviceid_node *devid);
>> -
>> extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>> extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>>
>> Index: linux-2.6/include/linux/nfs_fs_sb.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/nfs_fs_sb.h 2011-02-15 16:16:45.976420895 +0100
>> +++ linux-2.6/include/linux/nfs_fs_sb.h 2011-02-15 16:16:50.347380534 +0100
>> @@ -79,7 +79,6 @@ struct nfs_client {
>> u32 cl_exchange_flags;
>> struct nfs4_session *cl_session; /* sharred session */
>> struct list_head cl_layouts;
>> - struct pnfs_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
>> #endif /* CONFIG_NFS_V4_1 */
>>
>> #ifdef CONFIG_NFS_FSCACHE
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
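The find/add protocol of the generic helpers removed above works as follows. A lookup takes a reference only through atomic_inc_not_zero(). When two GETDEVICEINFO replies race, the entry already in the table wins and the new one is discarded. This protocol can be modelled in plain C11. This is a userspace sketch only, under stated assumptions: a pthread mutex stands in for dc_lock, lookups take that mutex instead of rcu_read_lock(), there is a single hash bucket, and every name (dev_node, add_deviceid, put_deviceid, ...) is hypothetical rather than kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ID_SIZE 16 /* deviceid4 is 16 opaque bytes */

/* Hypothetical userspace stand-in for struct pnfs_deviceid_node. */
struct dev_node {
	struct dev_node *next;
	unsigned char id[ID_SIZE];
	atomic_int ref;
};

static struct dev_node *cache_head;                 /* one bucket, for brevity */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Like atomic_inc_not_zero(): take a reference only if the node is not
 * already being torn down (ref == 0). */
static bool inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Caller holds cache_lock (the kernel walks the bucket under RCU). */
static struct dev_node *find_get_locked(const unsigned char *id)
{
	for (struct dev_node *d = cache_head; d; d = d->next)
		if (!memcmp(d->id, id, ID_SIZE))
			return inc_not_zero(&d->ref) ? d : NULL;
	return NULL;
}

/* If the id is already cached, free 'new' and return the cached node
 * with an extra reference; otherwise insert 'new' with ref == 1. */
static struct dev_node *add_deviceid(struct dev_node *new)
{
	struct dev_node *d;

	pthread_mutex_lock(&cache_lock);
	d = find_get_locked(new->id);
	if (d) {
		pthread_mutex_unlock(&cache_lock);
		free(new);
		return d;
	}
	atomic_init(&new->ref, 1);
	new->next = cache_head;
	cache_head = new;
	pthread_mutex_unlock(&cache_lock);
	return new;
}

/* Simplified put: the last reference unlinks and frees the node.
 * (atomic_dec_and_lock() only takes the lock on the final put.) */
static void put_deviceid(struct dev_node *d)
{
	pthread_mutex_lock(&cache_lock);
	if (atomic_fetch_sub(&d->ref, 1) == 1) {
		struct dev_node **pp = &cache_head;

		while (*pp != d)
			pp = &(*pp)->next;
		*pp = d->next;
		pthread_mutex_unlock(&cache_lock);
		free(d);
		return;
	}
	pthread_mutex_unlock(&cache_lock);
}
```

Freeing right after unlinking is safe here only because lookups also hold the mutex. With lock-free RCU readers, as in the kernel code, a grace period (synchronize_rcu() or call_rcu()) has to elapse before the memory can be released.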


2011-02-15 09:16:20

by Christoph Hellwig

Subject: Re: [PATCH 01/16] NFS remove unnecessary CONFIG_NFS_V4 from nfs_read_data

On Mon, Feb 14, 2011 at 02:18:21PM -0500, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Signed-off-by: Andy Adamson <[email protected]>

Either the patch or the description is incorrect. If you actually need
it for NFSv2/3 the description should say so. Otherwise it's just a
"cleanup" which bloats the structure for people not having v4 support
compiled in.
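To illustrate the size argument: a member wrapped in `#ifdef` costs nothing when the option is off, while an unguarded member is carried by every build. This is a minimal sketch; the structures and field names below are hypothetical and are not the actual members of struct nfs_read_data touched by patch 01.

```c
#include <assert.h>

/* Hypothetical stand-in for state that only NFSv4 I/O needs. */
struct v4_only_args {
	unsigned long seq_slot;
	unsigned long timestamp;
};

/* Field compiled in only when v4 support is configured. */
struct read_data_guarded {
	unsigned int npages;
#ifdef CONFIG_NFS_V4
	struct v4_only_args v4;
#endif
};

/* Same layout with the #ifdef dropped: v2/v3-only builds pay for it too. */
struct read_data_unguarded {
	unsigned int npages;
	struct v4_only_args v4;
};
```

Compiled without `-DCONFIG_NFS_V4`, the unguarded variant is strictly larger, which is the bloat the review objects to unless the field is genuinely needed for NFSv2/3.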


2011-02-15 15:03:44

by Myklebust, Trond

Subject: Re: [PATCH 10/16] pnfs: wave 3: coelesce across layout stripes

On Tue, 2011-02-15 at 09:43 -0500, William A. (Andy) Adamson wrote:
> On Mon, Feb 14, 2011 at 6:42 PM, Trond Myklebust
> <[email protected]> wrote:
> > On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
> >> From: Fred Isaman <[email protected]>
> >>
> >> Add a pg_test layout driver hook which is used to avoid coalescing I/O across
> >> layout stripes.
> >
> > Doesn't this belong before [PATCH 09/16] pnfs: wave 3: shift
> > pnfs_update_layout locations?
>
> The pg_test uses the pg_lseg declared in [PATCH 09/16] pnfs: wave 3:
> shift pnfs_update_layout locations, which is why the patches are
> ordered this way.

What prevents you from moving the pg_lseg declaration into this patch,
and just relying on the initialisation being NULL?

The current ordering means that applying 9/16 without 10/16 gives rise
to broken stripe sizes.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
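The pg_test hook discussed here keeps the page-I/O coalescing code from merging requests that belong to different stripes. Without it, a merged request can span a stripe boundary and be sent as one RPC to a single data server. The following is a minimal sketch of that kind of check. It assumes the stripe unit is given in bytes, is non-zero, and pages are 4 KiB. The names same_stripe and can_coalesce are hypothetical, not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Two byte offsets are in the same stripe when
 * floor(offset / stripe_unit) is equal for both. */
static bool same_stripe(uint64_t prev_offset, uint64_t req_offset,
			uint32_t stripe_unit)
{
	assert(stripe_unit != 0);
	return prev_offset / stripe_unit == req_offset / stripe_unit;
}

/* Hypothetical pg_test-style predicate on page indices: allow 'req'
 * to join the request that currently ends with 'prev' only if both
 * pages fall in the same stripe unit. */
static bool can_coalesce(uint64_t prev_index, uint64_t req_index,
			 uint32_t stripe_unit)
{
	return same_stripe(prev_index << SKETCH_PAGE_SHIFT,
			   req_index << SKETCH_PAGE_SHIFT, stripe_unit);
}
```

With a 64 KiB stripe unit (16 pages), pages 14 and 15 may be merged, but pages 15 and 16 may not, because page 16 starts the next stripe.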


2011-02-15 15:06:30

by Christoph Hellwig

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

Btw, what's the point of deferring the free_lseg calls? It looks like it's
to avoid calling something that might block under i_lock, but looking around
the pnfs-submit branch it seems that root cause could be fixed trivially.

In common code *free_lseg* and *put_layout_hdr* do nothing but list
manipulations and kfrees. And in filelayout_free_lseg we have just kfrees
and a call to pnfs_put_deviceid which may sleep due to calling
synchronize_rcu. But synchronize_rcu is horribly inefficient to start with,
and you'd be better off using call_rcu to free the device id, which will
lead to much saner code and better performance.
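The deferral being questioned follows a common pattern. Under the inode lock, segments whose last reference is dropped are moved to a private list. They are freed only after the lock is released, so a free routine that may sleep never runs under the lock. This is a userspace sketch under stated assumptions: a pthread mutex plays i_lock, and struct seg, put_segs and free_seg are hypothetical names. If the free path no longer sleeps, for example because it uses call_rcu() as suggested above, the segments could be freed in place and the deferral becomes unnecessary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a layout segment. */
struct seg {
	struct seg *next;
	int refcount;           /* protected by inode_lock in this sketch */
};

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER; /* plays i_lock */
static int freed;               /* number of segments freed, for checking */

/* Stands in for a free path that may block (e.g. one that ends up in
 * synchronize_rcu()), so it must not run while inode_lock is held. */
static void free_seg(struct seg *s)
{
	free(s);
	freed++;
}

/* Drop one reference on each segment. Segments that reach zero are
 * collected on a private list under the lock and freed after unlock. */
static void put_segs(struct seg **segs, int n)
{
	struct seg *tmp_list = NULL;

	pthread_mutex_lock(&inode_lock);
	for (int i = 0; i < n; i++) {
		if (--segs[i]->refcount == 0) {
			segs[i]->next = tmp_list;
			tmp_list = segs[i];
		}
	}
	pthread_mutex_unlock(&inode_lock);

	while (tmp_list) {
		struct seg *s = tmp_list;

		tmp_list = s->next;
		free_seg(s);
	}
}
```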


2011-02-14 19:18:47

by Andy Adamson

Subject: [PATCH 07/16] pnfs: wave 3: add MDS mount DS only check

From: Andy Adamson <[email protected]>

A server that holds only the pNFS data server (DS) role cannot be mounted.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/client.c | 6 ++++++
fs/nfs/nfs4_fs.h | 13 +++++++++++++
2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 4d15331..e48457a 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1413,6 +1413,12 @@ static int nfs4_set_client(struct nfs_server *server,
goto error;
}

+ /* Cannot mount a DS only server */
+ if (is_ds_only_client(clp)) {
+ error = -ENODEV;
+ goto error;
+ }
+
/*
* Query for the lease time on clientid setup or renewal
*
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7a74740..5d84642 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -259,6 +259,13 @@ extern int nfs4_proc_destroy_session(struct nfs4_session *);
extern int nfs4_init_session(struct nfs_server *server);
extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
struct nfs_fsinfo *fsinfo);
+
+static inline bool
+is_ds_only_client(struct nfs_client *clp)
+{
+ return (clp->cl_exchange_flags & EXCHGID4_FLAG_MASK_PNFS) ==
+ EXCHGID4_FLAG_USE_PNFS_DS;
+}
#else /* CONFIG_NFS_v4_1 */
static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *server)
{
@@ -276,6 +283,12 @@ static inline int nfs4_init_session(struct nfs_server *server)
{
return 0;
}
+
+static inline bool
+is_ds_only_client(struct nfs_client *clp)
+{
+ return false;
+}
#endif /* CONFIG_NFS_V4_1 */

extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];
--
1.7.2.3
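The check in is_ds_only_client() looks only at the pNFS role bits that the server returns from EXCHANGE_ID. It matches when the DS bit is the only role bit set. A server that returns both the MDS and DS roles therefore still mounts. The sketch below reproduces that test in userspace. The flag values are the EXCHGID4_FLAG_* constants from RFC 5661 (section 18.35), and ds_only is a hypothetical name for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pNFS role bits in the EXCHANGE_ID flags (RFC 5661, section 18.35). */
#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000
#define EXCHGID4_FLAG_MASK_PNFS    0x00070000

/* Same expression as is_ds_only_client(): after masking off everything
 * but the role bits, exactly the DS role must remain. */
static bool ds_only(uint32_t exchange_flags)
{
	return (exchange_flags & EXCHGID4_FLAG_MASK_PNFS) ==
		EXCHGID4_FLAG_USE_PNFS_DS;
}
```

In the patch, a match makes nfs4_set_client() fail with -ENODEV. Flags outside the mask, such as EXCHGID4_FLAG_SUPP_MOVED_REFER (0x1), do not affect the result.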


2011-02-15 16:37:10

by Andy Adamson

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On Tue, Feb 15, 2011 at 11:02 AM, Christoph Hellwig <[email protected]> wrote:
> FYI that whole device layout cache thingy looks like a complete fucking
> mess to me.
>
> It's nothing but a trivial hash lookup which is only used in the file
> layout driver. But instead of just having a hash allocated in the file
> layout driver on module load, and a trivial opencoded lookup for it it's
> a massively overcomplicated set of routines. Please rip this stuff out
> before doing further work in this area.
>
> The patch below removes the maze of pointless abstractions and just
> keeps a simple hash of deviceids in the filelayout driver.


The abstract layer is so that this code is not replicated per layout
driver. Object and block drivers need to do the same task, and indeed
use this code in their prototypes.
That said, we don't have those other layout drivers in kernel, so
moving it all to the file layout driver is fine with me, so long as we
don't have to move it back once we get other drivers.

Trond?

-->Andy

>
>
> Index: linux-2.6/fs/nfs/nfs4filelayout.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/nfs4filelayout.c 2011-02-15 16:10:51.108421283 +0100
> +++ linux-2.6/fs/nfs/nfs4filelayout.c 2011-02-15 16:55:22.087422176 +0100
> @@ -40,32 +40,6 @@ MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Dean Hildebrand <[email protected]>");
> MODULE_DESCRIPTION("The NFSv4 file layout driver");
>
> -static int
> -filelayout_set_layoutdriver(struct nfs_server *nfss)
> -{
> - int status = pnfs_alloc_init_deviceid_cache(nfss->nfs_client,
> - nfs4_fl_free_deviceid_callback);
> - if (status) {
> - printk(KERN_WARNING "%s: deviceid cache could not be "
> - "initialized\n", __func__);
> - return status;
> - }
> - dprintk("%s: deviceid cache has been initialized successfully\n",
> - __func__);
> - return 0;
> -}
> -
> -/* Clear out the layout by destroying its device list */
> -static int
> -filelayout_clear_layoutdriver(struct nfs_server *nfss)
> -{
> - dprintk("--> %s\n", __func__);
> -
> - if (nfss->nfs_client->cl_devid_cache)
> - pnfs_put_deviceid_cache(nfss->nfs_client);
> - return 0;
> -}
> -
> /*
> * filelayout_check_layout()
> *
> @@ -99,7 +73,7 @@ filelayout_check_layout(struct pnfs_layo
> }
>
> /* find and reference the deviceid */
> - dsaddr = nfs4_fl_find_get_deviceid(nfss->nfs_client, id);
> + dsaddr = nfs4_fl_find_get_deviceid(id);
> if (dsaddr == NULL) {
> dsaddr = get_device_info(lo->plh_inode, id);
> if (dsaddr == NULL)
> @@ -134,7 +108,7 @@ out:
> dprintk("--> %s returns %d\n", __func__, status);
> return status;
> out_put:
> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache, &dsaddr->deviceid);
> + nfs4_fl_put_deviceid(dsaddr);
> goto out;
> }
>
> @@ -243,23 +217,19 @@ filelayout_alloc_lseg(struct pnfs_layout
> static void
> filelayout_free_lseg(struct pnfs_layout_segment *lseg)
> {
> - struct nfs_server *nfss = NFS_SERVER(lseg->pls_layout->plh_inode);
> struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
>
> dprintk("--> %s\n", __func__);
> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache,
> - &fl->dsaddr->deviceid);
> + nfs4_fl_put_deviceid(fl->dsaddr);
> _filelayout_free_lseg(fl);
> }
>
> static struct pnfs_layoutdriver_type filelayout_type = {
> - .id = LAYOUT_NFSV4_1_FILES,
> - .name = "LAYOUT_NFSV4_1_FILES",
> - .owner = THIS_MODULE,
> - .set_layoutdriver = filelayout_set_layoutdriver,
> - .clear_layoutdriver = filelayout_clear_layoutdriver,
> - .alloc_lseg = filelayout_alloc_lseg,
> - .free_lseg = filelayout_free_lseg,
> + .id = LAYOUT_NFSV4_1_FILES,
> + .name = "LAYOUT_NFSV4_1_FILES",
> + .owner = THIS_MODULE,
> + .alloc_lseg = filelayout_alloc_lseg,
> + .free_lseg = filelayout_free_lseg,
> };
>
> static int __init nfs4filelayout_init(void)
> Index: linux-2.6/fs/nfs/nfs4filelayout.h
> ===================================================================
> --- linux-2.6.orig/fs/nfs/nfs4filelayout.h 2011-02-15 16:30:25.270920897 +0100
> +++ linux-2.6/fs/nfs/nfs4filelayout.h 2011-02-15 16:47:50.063445740 +0100
> @@ -56,7 +56,9 @@ struct nfs4_pnfs_ds {
> };
>
> struct nfs4_file_layout_dsaddr {
> - struct pnfs_deviceid_node deviceid;
> + struct hlist_node node;
> + struct nfs4_deviceid deviceid;
> + atomic_t ref;
> u32 stripe_count;
> u8 *stripe_indices;
> u32 ds_num;
> @@ -83,11 +85,11 @@ FILELAYOUT_LSEG(struct pnfs_layout_segme
> generic_hdr);
> }
>
> -extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
> extern void print_ds(struct nfs4_pnfs_ds *ds);
> extern void print_deviceid(struct nfs4_deviceid *dev_id);
> extern struct nfs4_file_layout_dsaddr *
> -nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
> +extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
> struct nfs4_file_layout_dsaddr *
> get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
>
> Index: linux-2.6/fs/nfs/nfs4filelayoutdev.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:23:03.480487362 +0100
> +++ linux-2.6/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:55:02.894924739 +0100
> @@ -37,6 +37,30 @@
> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>
> /*
> + * Device ID RCU cache. A device ID is unique per client ID and layout type.
> + */
> +#define NFS4_FL_DEVICE_ID_HASH_BITS 5
> +#define NFS4_FL_DEVICE_ID_HASH_SIZE (1 << NFS4_FL_DEVICE_ID_HASH_BITS)
> +#define NFS4_FL_DEVICE_ID_HASH_MASK (NFS4_FL_DEVICE_ID_HASH_SIZE - 1)
> +
> +static inline u32
> +nfs4_fl_deviceid_hash(struct nfs4_deviceid *id)
> +{
> + unsigned char *cptr = (unsigned char *)id->data;
> + unsigned int nbytes = NFS4_DEVICEID4_SIZE;
> + u32 x = 0;
> +
> + while (nbytes--) {
> + x *= 37;
> + x += *cptr++;
> + }
> + return x & NFS4_FL_DEVICE_ID_HASH_MASK;
> +}
> +
> +static struct hlist_head filelayout_deviceid_cache[NFS4_FL_DEVICE_ID_HASH_SIZE];
> +static DEFINE_SPINLOCK(filelayout_deviceid_lock);
> +
> +/*
> * Data server cache
> *
> * Data servers can be mapped to different device ids.
> @@ -122,7 +146,7 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
> struct nfs4_pnfs_ds *ds;
> int i;
>
> - print_deviceid(&dsaddr->deviceid.de_id);
> + print_deviceid(&dsaddr->deviceid);
>
> for (i = 0; i < dsaddr->ds_num; i++) {
> ds = dsaddr->ds_list[i];
> @@ -139,15 +163,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
> kfree(dsaddr);
> }
>
> -void
> -nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *device)
> -{
> - struct nfs4_file_layout_dsaddr *dsaddr =
> - container_of(device, struct nfs4_file_layout_dsaddr, deviceid);
> -
> - nfs4_fl_free_deviceid(dsaddr);
> -}
> -
> static struct nfs4_pnfs_ds *
> nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port)
> {
> @@ -296,7 +311,7 @@ decode_device(struct inode *ino, struct
> dsaddr->stripe_count = cnt;
> dsaddr->ds_num = num;
>
> - memcpy(&dsaddr->deviceid.de_id, &pdev->dev_id, sizeof(pdev->dev_id));
> + memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));
>
> /* Go back an read stripe indices */
> p = indicesp;
> @@ -346,28 +361,37 @@ out_err:
> }
>
> /*
> - * Decode the opaque device specified in 'dev'
> - * and add it to the list of available devices.
> - * If the deviceid is already cached, nfs4_add_deviceid will return
> - * a pointer to the cached struct and throw away the new.
> + * Decode the opaque device specified in 'dev' and add it to the cache of
> + * available devices.
> */
> -static struct nfs4_file_layout_dsaddr*
> +static struct nfs4_file_layout_dsaddr *
> decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
> {
> - struct nfs4_file_layout_dsaddr *dsaddr;
> - struct pnfs_deviceid_node *d;
> + struct nfs4_file_layout_dsaddr *d, *new;
> + long hash;
>
> - dsaddr = decode_device(inode, dev);
> - if (!dsaddr) {
> + new = decode_device(inode, dev);
> + if (!new) {
> printk(KERN_WARNING "%s: Could not decode or add device\n",
> __func__);
> return NULL;
> }
>
> - d = pnfs_add_deviceid(NFS_SERVER(inode)->nfs_client->cl_devid_cache,
> - &dsaddr->deviceid);
> + spin_lock(&filelayout_deviceid_lock);
> + d = nfs4_fl_find_get_deviceid(&new->deviceid);
> + if (d) {
> + spin_unlock(&filelayout_deviceid_lock);
> + nfs4_fl_free_deviceid(new);
> + return d;
> + }
> +
> + INIT_HLIST_NODE(&new->node);
> + atomic_set(&new->ref, 1);
> + hash = nfs4_fl_deviceid_hash(&new->deviceid);
> + hlist_add_head_rcu(&new->node, &filelayout_deviceid_cache[hash]);
> + spin_unlock(&filelayout_deviceid_lock);
>
> - return container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
> + return new;
> }
>
> /*
> @@ -442,12 +466,36 @@ out_free:
> return dsaddr;
> }
>
> +void
> +nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
> +{
> + if (atomic_dec_and_lock(&dsaddr->ref, &filelayout_deviceid_lock)) {
> + hlist_del_rcu(&dsaddr->node);
> + spin_unlock(&filelayout_deviceid_lock);
> +
> + synchronize_rcu();
> + nfs4_fl_free_deviceid(dsaddr);
> + }
> +}
> +
> struct nfs4_file_layout_dsaddr *
> -nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *id)
> {
> - struct pnfs_deviceid_node *d;
> + struct nfs4_file_layout_dsaddr *d;
> + struct hlist_node *n;
> + long hash = nfs4_fl_deviceid_hash(id);
> +
>
> - d = pnfs_find_get_deviceid(clp->cl_devid_cache, id);
> - return (d == NULL) ? NULL :
> - container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
> + rcu_read_lock();
> + hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
> + if (!memcmp(&d->deviceid, id, sizeof(*id))) {
> + if (!atomic_inc_not_zero(&d->ref))
> + goto fail;
> + rcu_read_unlock();
> + return d;
> + }
> + }
> +fail:
> + rcu_read_unlock();
> + return NULL;
> }
> Index: linux-2.6/fs/nfs/pnfs.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/pnfs.c 2011-02-15 16:10:33.284421051 +0100
> +++ linux-2.6/fs/nfs/pnfs.c 2011-02-15 16:21:47.115422052 +0100
> @@ -74,10 +74,8 @@ find_pnfs_driver(u32 id)
> void
> unset_pnfs_layoutdriver(struct nfs_server *nfss)
> {
> - if (nfss->pnfs_curr_ld) {
> - nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
> + if (nfss->pnfs_curr_ld)
> module_put(nfss->pnfs_curr_ld->owner);
> - }
> nfss->pnfs_curr_ld = NULL;
> }
>
> @@ -115,13 +113,7 @@ set_pnfs_layoutdriver(struct nfs_server
> goto out_no_driver;
> }
> server->pnfs_curr_ld = ld_type;
> - if (ld_type->set_layoutdriver(server)) {
> - printk(KERN_ERR
> - "%s: Error initializing mount point for layout driver %u.\n",
> - __func__, id);
> - module_put(ld_type->owner);
> - goto out_no_driver;
> - }
> +
> dprintk("%s: pNFS module for %u set\n", __func__, id);
> return;
>
> @@ -828,138 +820,3 @@ out_forget_reply:
> NFS_SERVER(ino)->pnfs_curr_ld->free_lseg(lseg);
> goto out;
> }
> -
> -/*
> - * Device ID cache. Currently supports one layout type per struct nfs_client.
> - * Add layout type to the lookup key to expand to support multiple types.
> - */
> -int
> -pnfs_alloc_init_deviceid_cache(struct nfs_client *clp,
> - void (*free_callback)(struct pnfs_deviceid_node *))
> -{
> - struct pnfs_deviceid_cache *c;
> -
> - c = kzalloc(sizeof(struct pnfs_deviceid_cache), GFP_KERNEL);
> - if (!c)
> - return -ENOMEM;
> - spin_lock(&clp->cl_lock);
> - if (clp->cl_devid_cache != NULL) {
> - atomic_inc(&clp->cl_devid_cache->dc_ref);
> - dprintk("%s [kref [%d]]\n", __func__,
> - atomic_read(&clp->cl_devid_cache->dc_ref));
> - kfree(c);
> - } else {
> - /* kzalloc initializes hlists */
> - spin_lock_init(&c->dc_lock);
> - atomic_set(&c->dc_ref, 1);
> - c->dc_free_callback = free_callback;
> - clp->cl_devid_cache = c;
> - dprintk("%s [new]\n", __func__);
> - }
> - spin_unlock(&clp->cl_lock);
> - return 0;
> -}
> -EXPORT_SYMBOL_GPL(pnfs_alloc_init_deviceid_cache);
> -
> -/*
> - * Called from pnfs_layoutdriver_type->free_lseg
> - * last layout segment reference frees deviceid
> - */
> -void
> -pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
> - struct pnfs_deviceid_node *devid)
> -{
> - struct nfs4_deviceid *id = &devid->de_id;
> - struct pnfs_deviceid_node *d;
> - struct hlist_node *n;
> - long h = nfs4_deviceid_hash(id);
> -
> - dprintk("%s [%d]\n", __func__, atomic_read(&devid->de_ref));
> - if (!atomic_dec_and_lock(&devid->de_ref, &c->dc_lock))
> - return;
> -
> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[h], de_node)
> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
> - hlist_del_rcu(&d->de_node);
> - spin_unlock(&c->dc_lock);
> - synchronize_rcu();
> - c->dc_free_callback(devid);
> - return;
> - }
> - spin_unlock(&c->dc_lock);
> - /* Why wasn't it found in the list? */
> - BUG();
> -}
> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid);
> -
> -/* Find and reference a deviceid */
> -struct pnfs_deviceid_node *
> -pnfs_find_get_deviceid(struct pnfs_deviceid_cache *c, struct nfs4_deviceid *id)
> -{
> - struct pnfs_deviceid_node *d;
> - struct hlist_node *n;
> - long hash = nfs4_deviceid_hash(id);
> -
> - dprintk("--> %s hash %ld\n", __func__, hash);
> - rcu_read_lock();
> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
> - if (!atomic_inc_not_zero(&d->de_ref)) {
> - goto fail;
> - } else {
> - rcu_read_unlock();
> - return d;
> - }
> - }
> - }
> -fail:
> - rcu_read_unlock();
> - return NULL;
> -}
> -EXPORT_SYMBOL_GPL(pnfs_find_get_deviceid);
> -
> -/*
> - * Add a deviceid to the cache.
> - * GETDEVICEINFOs for same deviceid can race. If deviceid is found, discard new
> - */
> -struct pnfs_deviceid_node *
> -pnfs_add_deviceid(struct pnfs_deviceid_cache *c, struct pnfs_deviceid_node *new)
> -{
> - struct pnfs_deviceid_node *d;
> - long hash = nfs4_deviceid_hash(&new->de_id);
> -
> - dprintk("--> %s hash %ld\n", __func__, hash);
> - spin_lock(&c->dc_lock);
> - d = pnfs_find_get_deviceid(c, &new->de_id);
> - if (d) {
> - spin_unlock(&c->dc_lock);
> - dprintk("%s [discard]\n", __func__);
> - c->dc_free_callback(new);
> - return d;
> - }
> - INIT_HLIST_NODE(&new->de_node);
> - atomic_set(&new->de_ref, 1);
> - hlist_add_head_rcu(&new->de_node, &c->dc_deviceids[hash]);
> - spin_unlock(&c->dc_lock);
> - dprintk("%s [new]\n", __func__);
> - return new;
> -}
> -EXPORT_SYMBOL_GPL(pnfs_add_deviceid);
> -
> -void
> -pnfs_put_deviceid_cache(struct nfs_client *clp)
> -{
> - struct pnfs_deviceid_cache *local = clp->cl_devid_cache;
> -
> - dprintk("--> %s ({%d})\n", __func__, atomic_read(&local->dc_ref));
> - if (atomic_dec_and_lock(&local->dc_ref, &clp->cl_lock)) {
> - int i;
> - /* Verify cache is empty */
> - for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE; i++)
> - BUG_ON(!hlist_empty(&local->dc_deviceids[i]));
> - clp->cl_devid_cache = NULL;
> - spin_unlock(&clp->cl_lock);
> - kfree(local);
> - }
> -}
> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
> Index: linux-2.6/fs/nfs/pnfs.h
> ===================================================================
> --- linux-2.6.orig/fs/nfs/pnfs.h 2011-02-15 16:10:51.088421060 +0100
> +++ linux-2.6/fs/nfs/pnfs.h 2011-02-15 16:21:34.995159583 +0100
> @@ -61,8 +61,6 @@ struct pnfs_layoutdriver_type {
> const u32 id;
> const char *name;
> struct module *owner;
> - int (*set_layoutdriver) (struct nfs_server *);
> - int (*clear_layoutdriver) (struct nfs_server *);
> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
> void (*free_lseg) (struct pnfs_layout_segment *lseg);
> };
> @@ -90,52 +88,6 @@ struct pnfs_device {
> unsigned int pglen;
> };
>
> -/*
> - * Device ID RCU cache. A device ID is unique per client ID and layout type.
> - */
> -#define NFS4_DEVICE_ID_HASH_BITS 5
> -#define NFS4_DEVICE_ID_HASH_SIZE (1 << NFS4_DEVICE_ID_HASH_BITS)
> -#define NFS4_DEVICE_ID_HASH_MASK (NFS4_DEVICE_ID_HASH_SIZE - 1)
> -
> -static inline u32
> -nfs4_deviceid_hash(struct nfs4_deviceid *id)
> -{
> - unsigned char *cptr = (unsigned char *)id->data;
> - unsigned int nbytes = NFS4_DEVICEID4_SIZE;
> - u32 x = 0;
> -
> - while (nbytes--) {
> - x *= 37;
> - x += *cptr++;
> - }
> - return x & NFS4_DEVICE_ID_HASH_MASK;
> -}
> -
> -struct pnfs_deviceid_node {
> - struct hlist_node de_node;
> - struct nfs4_deviceid de_id;
> - atomic_t de_ref;
> -};
> -
> -struct pnfs_deviceid_cache {
> - spinlock_t dc_lock;
> - atomic_t dc_ref;
> - void (*dc_free_callback)(struct pnfs_deviceid_node *);
> - struct hlist_head dc_deviceids[NFS4_DEVICE_ID_HASH_SIZE];
> -};
> -
> -extern int pnfs_alloc_init_deviceid_cache(struct nfs_client *,
> - void (*free_callback)(struct pnfs_deviceid_node *));
> -extern void pnfs_put_deviceid_cache(struct nfs_client *);
> -extern struct pnfs_deviceid_node *pnfs_find_get_deviceid(
> - struct pnfs_deviceid_cache *,
> - struct nfs4_deviceid *);
> -extern struct pnfs_deviceid_node *pnfs_add_deviceid(
> - struct pnfs_deviceid_cache *,
> - struct pnfs_deviceid_node *);
> -extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
> - struct pnfs_deviceid_node *devid);
> -
> extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
> extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>
> Index: linux-2.6/include/linux/nfs_fs_sb.h
> ===================================================================
> --- linux-2.6.orig/include/linux/nfs_fs_sb.h 2011-02-15 16:16:45.976420895 +0100
> +++ linux-2.6/include/linux/nfs_fs_sb.h 2011-02-15 16:16:50.347380534 +0100
> @@ -79,7 +79,6 @@ struct nfs_client {
> u32 cl_exchange_flags;
> struct nfs4_session *cl_session; /* sharred session */
> struct list_head cl_layouts;
> - struct pnfs_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
> #endif /* CONFIG_NFS_V4_1 */
>
> #ifdef CONFIG_NFS_FSCACHE
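Both the generic cache and the file-layout-only version in the patch above pick a bucket in the same way. They run a multiply-by-37 hash over all 16 bytes of the deviceid and keep the low 5 bits, which selects one of 32 hlist buckets. The following userspace copy of that function checks a few values by hand. NFS4_DEVICEID4_SIZE is 16 as in the protocol, and the remaining macro and function names are local to the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NFS4_DEVICEID4_SIZE 16           /* deviceid4: 16 opaque bytes */
#define SKETCH_HASH_BITS 5
#define SKETCH_HASH_SIZE (1u << SKETCH_HASH_BITS)
#define SKETCH_HASH_MASK (SKETCH_HASH_SIZE - 1)

/* Same arithmetic as nfs4_fl_deviceid_hash(): x = x * 37 + byte for
 * every byte, with unsigned 32-bit wraparound, then mask to 5 bits. */
static uint32_t deviceid_hash(const unsigned char id[NFS4_DEVICEID4_SIZE])
{
	uint32_t x = 0;

	for (unsigned int i = 0; i < NFS4_DEVICEID4_SIZE; i++) {
		x *= 37;
		x += id[i];
	}
	return x & SKETCH_HASH_MASK;
}
```

For example, an id whose only non-zero byte is byte 14 with value 1 hashes to 37 & 31 = 5.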

2011-02-15 19:30:24

by Andy Adamson

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting


On Feb 15, 2011, at 2:29 PM, Benny Halevy wrote:

> On 2011-02-15 14:17, Andy Adamson wrote:
>>
>> On Feb 15, 2011, at 11:37 AM, William A. (Andy) Adamson wrote:
>>
>>> On Tue, Feb 15, 2011 at 11:02 AM, Christoph Hellwig <[email protected]> wrote:
>>>> FYI that whole device layout cache thingy looks like a complete fucking
>>>> mess to me.
>>>>
>>>> It's nothing but a trivial hash lookup which is only used in the file
>>>> layout driver. But instead of just having a hash allocated in the file
>>>> layout driver on module load, and a trivial opencoded lookup for it it's
>>>> a massively overcomplicated set of routines. Please rip this stuff out
>>>> before doing further work in this area.
>>>>
>>>> The patch below removes the maze of pointless abstractions and just
>>>> keeps a simple hash of deviceids in the filelayout driver.
>>>
>>>
>>> The abstract layer is so that this code is not replicated per layout
>>> driver. Object and block drivers need to do the same task, and indeed
>>> use this code in their prototypes.
>>> That said, we don't have those other layout drivers in kernel, so
>>> moving it all to the file layout driver is fine with me, so long as we
>>> don't have to move it back once we get other drivers.
>
> Why not move it back later on?
> I don't want to replicate any code if it can be factored out and reused.

Yes, we should move it back later on. Christoph's single patch makes this easy to revert.

-->Andy

>
> Benny
>
>>>
>>> Trond?
>>>
>>> -->Andy
>>
>> OK. We all agree. Move the deviceid cache to the filelayout driver until there is a need for a common cache.
>>
>> -->Andy
>>
>>>
>>>>
>>>>
>>>> Index: linux-2.6/fs/nfs/nfs4filelayout.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/fs/nfs/nfs4filelayout.c 2011-02-15 16:10:51.108421283 +0100
>>>> +++ linux-2.6/fs/nfs/nfs4filelayout.c 2011-02-15 16:55:22.087422176 +0100
>>>> @@ -40,32 +40,6 @@ MODULE_LICENSE("GPL");
>>>> MODULE_AUTHOR("Dean Hildebrand <[email protected]>");
>>>> MODULE_DESCRIPTION("The NFSv4 file layout driver");
>>>>
>>>> -static int
>>>> -filelayout_set_layoutdriver(struct nfs_server *nfss)
>>>> -{
>>>> - int status = pnfs_alloc_init_deviceid_cache(nfss->nfs_client,
>>>> - nfs4_fl_free_deviceid_callback);
>>>> - if (status) {
>>>> - printk(KERN_WARNING "%s: deviceid cache could not be "
>>>> - "initialized\n", __func__);
>>>> - return status;
>>>> - }
>>>> - dprintk("%s: deviceid cache has been initialized successfully\n",
>>>> - __func__);
>>>> - return 0;
>>>> -}
>>>> -
>>>> -/* Clear out the layout by destroying its device list */
>>>> -static int
>>>> -filelayout_clear_layoutdriver(struct nfs_server *nfss)
>>>> -{
>>>> - dprintk("--> %s\n", __func__);
>>>> -
>>>> - if (nfss->nfs_client->cl_devid_cache)
>>>> - pnfs_put_deviceid_cache(nfss->nfs_client);
>>>> - return 0;
>>>> -}
>>>> -
>>>> /*
>>>> * filelayout_check_layout()
>>>> *
>>>> @@ -99,7 +73,7 @@ filelayout_check_layout(struct pnfs_layo
>>>> }
>>>>
>>>> /* find and reference the deviceid */
>>>> - dsaddr = nfs4_fl_find_get_deviceid(nfss->nfs_client, id);
>>>> + dsaddr = nfs4_fl_find_get_deviceid(id);
>>>> if (dsaddr == NULL) {
>>>> dsaddr = get_device_info(lo->plh_inode, id);
>>>> if (dsaddr == NULL)
>>>> @@ -134,7 +108,7 @@ out:
>>>> dprintk("--> %s returns %d\n", __func__, status);
>>>> return status;
>>>> out_put:
>>>> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache, &dsaddr->deviceid);
>>>> + nfs4_fl_put_deviceid(dsaddr);
>>>> goto out;
>>>> }
>>>>
>>>> @@ -243,23 +217,19 @@ filelayout_alloc_lseg(struct pnfs_layout
>>>> static void
>>>> filelayout_free_lseg(struct pnfs_layout_segment *lseg)
>>>> {
>>>> - struct nfs_server *nfss = NFS_SERVER(lseg->pls_layout->plh_inode);
>>>> struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
>>>>
>>>> dprintk("--> %s\n", __func__);
>>>> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache,
>>>> - &fl->dsaddr->deviceid);
>>>> + nfs4_fl_put_deviceid(fl->dsaddr);
>>>> _filelayout_free_lseg(fl);
>>>> }
>>>>
>>>> static struct pnfs_layoutdriver_type filelayout_type = {
>>>> - .id = LAYOUT_NFSV4_1_FILES,
>>>> - .name = "LAYOUT_NFSV4_1_FILES",
>>>> - .owner = THIS_MODULE,
>>>> - .set_layoutdriver = filelayout_set_layoutdriver,
>>>> - .clear_layoutdriver = filelayout_clear_layoutdriver,
>>>> - .alloc_lseg = filelayout_alloc_lseg,
>>>> - .free_lseg = filelayout_free_lseg,
>>>> + .id = LAYOUT_NFSV4_1_FILES,
>>>> + .name = "LAYOUT_NFSV4_1_FILES",
>>>> + .owner = THIS_MODULE,
>>>> + .alloc_lseg = filelayout_alloc_lseg,
>>>> + .free_lseg = filelayout_free_lseg,
>>>> };
>>>>
>>>> static int __init nfs4filelayout_init(void)
>>>> Index: linux-2.6/fs/nfs/nfs4filelayout.h
>>>> ===================================================================
>>>> --- linux-2.6.orig/fs/nfs/nfs4filelayout.h 2011-02-15 16:30:25.270920897 +0100
>>>> +++ linux-2.6/fs/nfs/nfs4filelayout.h 2011-02-15 16:47:50.063445740 +0100
>>>> @@ -56,7 +56,9 @@ struct nfs4_pnfs_ds {
>>>> };
>>>>
>>>> struct nfs4_file_layout_dsaddr {
>>>> - struct pnfs_deviceid_node deviceid;
>>>> + struct hlist_node node;
>>>> + struct nfs4_deviceid deviceid;
>>>> + atomic_t ref;
>>>> u32 stripe_count;
>>>> u8 *stripe_indices;
>>>> u32 ds_num;
>>>> @@ -83,11 +85,11 @@ FILELAYOUT_LSEG(struct pnfs_layout_segme
>>>> generic_hdr);
>>>> }
>>>>
>>>> -extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
>>>> extern void print_ds(struct nfs4_pnfs_ds *ds);
>>>> extern void print_deviceid(struct nfs4_deviceid *dev_id);
>>>> extern struct nfs4_file_layout_dsaddr *
>>>> -nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
>>>> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
>>>> +extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
>>>> struct nfs4_file_layout_dsaddr *
>>>> get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
>>>>
>>>> Index: linux-2.6/fs/nfs/nfs4filelayoutdev.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:23:03.480487362 +0100
>>>> +++ linux-2.6/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:55:02.894924739 +0100
>>>> @@ -37,6 +37,30 @@
>>>> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>>>>
>>>> /*
>>>> + * Device ID RCU cache. A device ID is unique per client ID and layout type.
>>>> + */
>>>> +#define NFS4_FL_DEVICE_ID_HASH_BITS 5
>>>> +#define NFS4_FL_DEVICE_ID_HASH_SIZE (1 << NFS4_FL_DEVICE_ID_HASH_BITS)
>>>> +#define NFS4_FL_DEVICE_ID_HASH_MASK (NFS4_FL_DEVICE_ID_HASH_SIZE - 1)
>>>> +
>>>> +static inline u32
>>>> +nfs4_fl_deviceid_hash(struct nfs4_deviceid *id)
>>>> +{
>>>> + unsigned char *cptr = (unsigned char *)id->data;
>>>> + unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>>>> + u32 x = 0;
>>>> +
>>>> + while (nbytes--) {
>>>> + x *= 37;
>>>> + x += *cptr++;
>>>> + }
>>>> + return x & NFS4_FL_DEVICE_ID_HASH_MASK;
>>>> +}
>>>> +
>>>> +static struct hlist_head filelayout_deviceid_cache[NFS4_FL_DEVICE_ID_HASH_SIZE];
>>>> +static DEFINE_SPINLOCK(filelayout_deviceid_lock);
>>>> +
>>>> +/*
>>>> * Data server cache
>>>> *
>>>> * Data servers can be mapped to different device ids.
>>>> @@ -122,7 +146,7 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
>>>> struct nfs4_pnfs_ds *ds;
>>>> int i;
>>>>
>>>> - print_deviceid(&dsaddr->deviceid.de_id);
>>>> + print_deviceid(&dsaddr->deviceid);
>>>>
>>>> for (i = 0; i < dsaddr->ds_num; i++) {
>>>> ds = dsaddr->ds_list[i];
>>>> @@ -139,15 +163,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
>>>> kfree(dsaddr);
>>>> }
>>>>
>>>> -void
>>>> -nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *device)
>>>> -{
>>>> - struct nfs4_file_layout_dsaddr *dsaddr =
>>>> - container_of(device, struct nfs4_file_layout_dsaddr, deviceid);
>>>> -
>>>> - nfs4_fl_free_deviceid(dsaddr);
>>>> -}
>>>> -
>>>> static struct nfs4_pnfs_ds *
>>>> nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port)
>>>> {
>>>> @@ -296,7 +311,7 @@ decode_device(struct inode *ino, struct
>>>> dsaddr->stripe_count = cnt;
>>>> dsaddr->ds_num = num;
>>>>
>>>> - memcpy(&dsaddr->deviceid.de_id, &pdev->dev_id, sizeof(pdev->dev_id));
>>>> + memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));
>>>>
>>>> /* Go back an read stripe indices */
>>>> p = indicesp;
>>>> @@ -346,28 +361,37 @@ out_err:
>>>> }
>>>>
>>>> /*
>>>> - * Decode the opaque device specified in 'dev'
>>>> - * and add it to the list of available devices.
>>>> - * If the deviceid is already cached, nfs4_add_deviceid will return
>>>> - * a pointer to the cached struct and throw away the new.
>>>> + * Decode the opaque device specified in 'dev' and add it to the cache of
>>>> + * available devices.
>>>> */
>>>> -static struct nfs4_file_layout_dsaddr*
>>>> +static struct nfs4_file_layout_dsaddr *
>>>> decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
>>>> {
>>>> - struct nfs4_file_layout_dsaddr *dsaddr;
>>>> - struct pnfs_deviceid_node *d;
>>>> + struct nfs4_file_layout_dsaddr *d, *new;
>>>> + long hash;
>>>>
>>>> - dsaddr = decode_device(inode, dev);
>>>> - if (!dsaddr) {
>>>> + new = decode_device(inode, dev);
>>>> + if (!new) {
>>>> printk(KERN_WARNING "%s: Could not decode or add device\n",
>>>> __func__);
>>>> return NULL;
>>>> }
>>>>
>>>> - d = pnfs_add_deviceid(NFS_SERVER(inode)->nfs_client->cl_devid_cache,
>>>> - &dsaddr->deviceid);
>>>> + spin_lock(&filelayout_deviceid_lock);
>>>> + d = nfs4_fl_find_get_deviceid(&new->deviceid);
>>>> + if (d) {
>>>> + spin_unlock(&filelayout_deviceid_lock);
>>>> + nfs4_fl_free_deviceid(new);
>>>> + return d;
>>>> + }
>>>> +
>>>> + INIT_HLIST_NODE(&new->node);
>>>> + atomic_set(&new->ref, 1);
>>>> + hash = nfs4_fl_deviceid_hash(&new->deviceid);
>>>> + hlist_add_head_rcu(&new->node, &filelayout_deviceid_cache[hash]);
>>>> + spin_unlock(&filelayout_deviceid_lock);
>>>>
>>>> - return container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
>>>> + return new;
>>>> }
>>>>
>>>> /*
>>>> @@ -442,12 +466,36 @@ out_free:
>>>> return dsaddr;
>>>> }
>>>>
>>>> +void
>>>> +nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
>>>> +{
>>>> + if (atomic_dec_and_lock(&dsaddr->ref, &filelayout_deviceid_lock)) {
>>>> + hlist_del_rcu(&dsaddr->node);
>>>> + spin_unlock(&filelayout_deviceid_lock);
>>>> +
>>>> + synchronize_rcu();
>>>> + nfs4_fl_free_deviceid(dsaddr);
>>>> + }
>>>> +}
>>>> +
>>>> struct nfs4_file_layout_dsaddr *
>>>> -nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
>>>> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *id)
>>>> {
>>>> - struct pnfs_deviceid_node *d;
>>>> + struct nfs4_file_layout_dsaddr *d;
>>>> + struct hlist_node *n;
>>>> + long hash = nfs4_fl_deviceid_hash(id);
>>>> +
>>>>
>>>> - d = pnfs_find_get_deviceid(clp->cl_devid_cache, id);
>>>> - return (d == NULL) ? NULL :
>>>> - container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
>>>> + rcu_read_lock();
>>>> + hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
>>>> + if (!memcmp(&d->deviceid, id, sizeof(*id))) {
>>>> + if (!atomic_inc_not_zero(&d->ref))
>>>> + goto fail;
>>>> + rcu_read_unlock();
>>>> + return d;
>>>> + }
>>>> + }
>>>> +fail:
>>>> + rcu_read_unlock();
>>>> + return NULL;
>>>> }
>>>> Index: linux-2.6/fs/nfs/pnfs.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/fs/nfs/pnfs.c 2011-02-15 16:10:33.284421051 +0100
>>>> +++ linux-2.6/fs/nfs/pnfs.c 2011-02-15 16:21:47.115422052 +0100
>>>> @@ -74,10 +74,8 @@ find_pnfs_driver(u32 id)
>>>> void
>>>> unset_pnfs_layoutdriver(struct nfs_server *nfss)
>>>> {
>>>> - if (nfss->pnfs_curr_ld) {
>>>> - nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
>>>> + if (nfss->pnfs_curr_ld)
>>>> module_put(nfss->pnfs_curr_ld->owner);
>>>> - }
>>>> nfss->pnfs_curr_ld = NULL;
>>>> }
>>>>
>>>> @@ -115,13 +113,7 @@ set_pnfs_layoutdriver(struct nfs_server
>>>> goto out_no_driver;
>>>> }
>>>> server->pnfs_curr_ld = ld_type;
>>>> - if (ld_type->set_layoutdriver(server)) {
>>>> - printk(KERN_ERR
>>>> - "%s: Error initializing mount point for layout driver %u.\n",
>>>> - __func__, id);
>>>> - module_put(ld_type->owner);
>>>> - goto out_no_driver;
>>>> - }
>>>> +
>>>> dprintk("%s: pNFS module for %u set\n", __func__, id);
>>>> return;
>>>>
>>>> @@ -828,138 +820,3 @@ out_forget_reply:
>>>> NFS_SERVER(ino)->pnfs_curr_ld->free_lseg(lseg);
>>>> goto out;
>>>> }
>>>> -
>>>> -/*
>>>> - * Device ID cache. Currently supports one layout type per struct nfs_client.
>>>> - * Add layout type to the lookup key to expand to support multiple types.
>>>> - */
>>>> -int
>>>> -pnfs_alloc_init_deviceid_cache(struct nfs_client *clp,
>>>> - void (*free_callback)(struct pnfs_deviceid_node *))
>>>> -{
>>>> - struct pnfs_deviceid_cache *c;
>>>> -
>>>> - c = kzalloc(sizeof(struct pnfs_deviceid_cache), GFP_KERNEL);
>>>> - if (!c)
>>>> - return -ENOMEM;
>>>> - spin_lock(&clp->cl_lock);
>>>> - if (clp->cl_devid_cache != NULL) {
>>>> - atomic_inc(&clp->cl_devid_cache->dc_ref);
>>>> - dprintk("%s [kref [%d]]\n", __func__,
>>>> - atomic_read(&clp->cl_devid_cache->dc_ref));
>>>> - kfree(c);
>>>> - } else {
>>>> - /* kzalloc initializes hlists */
>>>> - spin_lock_init(&c->dc_lock);
>>>> - atomic_set(&c->dc_ref, 1);
>>>> - c->dc_free_callback = free_callback;
>>>> - clp->cl_devid_cache = c;
>>>> - dprintk("%s [new]\n", __func__);
>>>> - }
>>>> - spin_unlock(&clp->cl_lock);
>>>> - return 0;
>>>> -}
>>>> -EXPORT_SYMBOL_GPL(pnfs_alloc_init_deviceid_cache);
>>>> -
>>>> -/*
>>>> - * Called from pnfs_layoutdriver_type->free_lseg
>>>> - * last layout segment reference frees deviceid
>>>> - */
>>>> -void
>>>> -pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
>>>> - struct pnfs_deviceid_node *devid)
>>>> -{
>>>> - struct nfs4_deviceid *id = &devid->de_id;
>>>> - struct pnfs_deviceid_node *d;
>>>> - struct hlist_node *n;
>>>> - long h = nfs4_deviceid_hash(id);
>>>> -
>>>> - dprintk("%s [%d]\n", __func__, atomic_read(&devid->de_ref));
>>>> - if (!atomic_dec_and_lock(&devid->de_ref, &c->dc_lock))
>>>> - return;
>>>> -
>>>> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[h], de_node)
>>>> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
>>>> - hlist_del_rcu(&d->de_node);
>>>> - spin_unlock(&c->dc_lock);
>>>> - synchronize_rcu();
>>>> - c->dc_free_callback(devid);
>>>> - return;
>>>> - }
>>>> - spin_unlock(&c->dc_lock);
>>>> - /* Why wasn't it found in the list? */
>>>> - BUG();
>>>> -}
>>>> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid);
>>>> -
>>>> -/* Find and reference a deviceid */
>>>> -struct pnfs_deviceid_node *
>>>> -pnfs_find_get_deviceid(struct pnfs_deviceid_cache *c, struct nfs4_deviceid *id)
>>>> -{
>>>> - struct pnfs_deviceid_node *d;
>>>> - struct hlist_node *n;
>>>> - long hash = nfs4_deviceid_hash(id);
>>>> -
>>>> - dprintk("--> %s hash %ld\n", __func__, hash);
>>>> - rcu_read_lock();
>>>> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
>>>> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
>>>> - if (!atomic_inc_not_zero(&d->de_ref)) {
>>>> - goto fail;
>>>> - } else {
>>>> - rcu_read_unlock();
>>>> - return d;
>>>> - }
>>>> - }
>>>> - }
>>>> -fail:
>>>> - rcu_read_unlock();
>>>> - return NULL;
>>>> -}
>>>> -EXPORT_SYMBOL_GPL(pnfs_find_get_deviceid);
>>>> -
>>>> -/*
>>>> - * Add a deviceid to the cache.
>>>> - * GETDEVICEINFOs for same deviceid can race. If deviceid is found, discard new
>>>> - */
>>>> -struct pnfs_deviceid_node *
>>>> -pnfs_add_deviceid(struct pnfs_deviceid_cache *c, struct pnfs_deviceid_node *new)
>>>> -{
>>>> - struct pnfs_deviceid_node *d;
>>>> - long hash = nfs4_deviceid_hash(&new->de_id);
>>>> -
>>>> - dprintk("--> %s hash %ld\n", __func__, hash);
>>>> - spin_lock(&c->dc_lock);
>>>> - d = pnfs_find_get_deviceid(c, &new->de_id);
>>>> - if (d) {
>>>> - spin_unlock(&c->dc_lock);
>>>> - dprintk("%s [discard]\n", __func__);
>>>> - c->dc_free_callback(new);
>>>> - return d;
>>>> - }
>>>> - INIT_HLIST_NODE(&new->de_node);
>>>> - atomic_set(&new->de_ref, 1);
>>>> - hlist_add_head_rcu(&new->de_node, &c->dc_deviceids[hash]);
>>>> - spin_unlock(&c->dc_lock);
>>>> - dprintk("%s [new]\n", __func__);
>>>> - return new;
>>>> -}
>>>> -EXPORT_SYMBOL_GPL(pnfs_add_deviceid);
>>>> -
>>>> -void
>>>> -pnfs_put_deviceid_cache(struct nfs_client *clp)
>>>> -{
>>>> - struct pnfs_deviceid_cache *local = clp->cl_devid_cache;
>>>> -
>>>> - dprintk("--> %s ({%d})\n", __func__, atomic_read(&local->dc_ref));
>>>> - if (atomic_dec_and_lock(&local->dc_ref, &clp->cl_lock)) {
>>>> - int i;
>>>> - /* Verify cache is empty */
>>>> - for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE; i++)
>>>> - BUG_ON(!hlist_empty(&local->dc_deviceids[i]));
>>>> - clp->cl_devid_cache = NULL;
>>>> - spin_unlock(&clp->cl_lock);
>>>> - kfree(local);
>>>> - }
>>>> -}
>>>> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
>>>> Index: linux-2.6/fs/nfs/pnfs.h
>>>> ===================================================================
>>>> --- linux-2.6.orig/fs/nfs/pnfs.h 2011-02-15 16:10:51.088421060 +0100
>>>> +++ linux-2.6/fs/nfs/pnfs.h 2011-02-15 16:21:34.995159583 +0100
>>>> @@ -61,8 +61,6 @@ struct pnfs_layoutdriver_type {
>>>> const u32 id;
>>>> const char *name;
>>>> struct module *owner;
>>>> - int (*set_layoutdriver) (struct nfs_server *);
>>>> - int (*clear_layoutdriver) (struct nfs_server *);
>>>> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
>>>> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>>>> };
>>>> @@ -90,52 +88,6 @@ struct pnfs_device {
>>>> unsigned int pglen;
>>>> };
>>>>
>>>> -/*
>>>> - * Device ID RCU cache. A device ID is unique per client ID and layout type.
>>>> - */
>>>> -#define NFS4_DEVICE_ID_HASH_BITS 5
>>>> -#define NFS4_DEVICE_ID_HASH_SIZE (1 << NFS4_DEVICE_ID_HASH_BITS)
>>>> -#define NFS4_DEVICE_ID_HASH_MASK (NFS4_DEVICE_ID_HASH_SIZE - 1)
>>>> -
>>>> -static inline u32
>>>> -nfs4_deviceid_hash(struct nfs4_deviceid *id)
>>>> -{
>>>> - unsigned char *cptr = (unsigned char *)id->data;
>>>> - unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>>>> - u32 x = 0;
>>>> -
>>>> - while (nbytes--) {
>>>> - x *= 37;
>>>> - x += *cptr++;
>>>> - }
>>>> - return x & NFS4_DEVICE_ID_HASH_MASK;
>>>> -}
>>>> -
>>>> -struct pnfs_deviceid_node {
>>>> - struct hlist_node de_node;
>>>> - struct nfs4_deviceid de_id;
>>>> - atomic_t de_ref;
>>>> -};
>>>> -
>>>> -struct pnfs_deviceid_cache {
>>>> - spinlock_t dc_lock;
>>>> - atomic_t dc_ref;
>>>> - void (*dc_free_callback)(struct pnfs_deviceid_node *);
>>>> - struct hlist_head dc_deviceids[NFS4_DEVICE_ID_HASH_SIZE];
>>>> -};
>>>> -
>>>> -extern int pnfs_alloc_init_deviceid_cache(struct nfs_client *,
>>>> - void (*free_callback)(struct pnfs_deviceid_node *));
>>>> -extern void pnfs_put_deviceid_cache(struct nfs_client *);
>>>> -extern struct pnfs_deviceid_node *pnfs_find_get_deviceid(
>>>> - struct pnfs_deviceid_cache *,
>>>> - struct nfs4_deviceid *);
>>>> -extern struct pnfs_deviceid_node *pnfs_add_deviceid(
>>>> - struct pnfs_deviceid_cache *,
>>>> - struct pnfs_deviceid_node *);
>>>> -extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
>>>> - struct pnfs_deviceid_node *devid);
>>>> -
>>>> extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>>>> extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>>>>
>>>> Index: linux-2.6/include/linux/nfs_fs_sb.h
>>>> ===================================================================
>>>> --- linux-2.6.orig/include/linux/nfs_fs_sb.h 2011-02-15 16:16:45.976420895 +0100
>>>> +++ linux-2.6/include/linux/nfs_fs_sb.h 2011-02-15 16:16:50.347380534 +0100
>>>> @@ -79,7 +79,6 @@ struct nfs_client {
>>>> u32 cl_exchange_flags;
>>>> struct nfs4_session *cl_session; /* sharred session */
>>>> struct list_head cl_layouts;
>>>> - struct pnfs_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
>>>> #endif /* CONFIG_NFS_V4_1 */
>>>>
>>>> #ifdef CONFIG_NFS_FSCACHE
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>


2011-02-15 15:07:15

by Fred Isaman

[permalink] [raw]
Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On Tue, Feb 15, 2011 at 9:58 AM, Christoph Hellwig <[email protected]> wrote:
> On Tue, Feb 15, 2011 at 09:48:26AM -0500, Fred Isaman wrote:
>> pnfs_free_lseg_list, besides calling free_lseg, also potentially
>> removes the layout from the client's list of inodes with layouts.
>
> Looks like the routine then changed from the mainline variant
> I looked at.  I took a quick look at the one from pnfs-submit,
> which looks quite suspicious, as it special cases the first item
> on the list without a good explanation and then iterates the list.
>

I can add a comment, but every element on the list is from the same
layout, so we just grab the layout from the first item in the list.

> Does your tree have another caller of pnfs_free_lseg_list?

Yes, there are several callers, two in callback_proc.c for example
which are both potentially actual lists.

Fred


> If not
> please just open code the right thing in the caller, instead of
> pretending we're dealing with a list if you're always dealing with
> one entry.  If the tree grows a caller that needs to deal with a list
> with more than 1 entry we can revisit if there's a point in sharing
> code.
>

2011-02-14 19:18:55

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 12/16] pnfs: wave 3: data server connection

From: Andy Adamson <[email protected]>

Introduce data server counterparts of set_client and init_session
(nfs4_set_ds_client and nfs4_init_ds_session), following the
nfs4_set_client and nfs4_init_session conventions.

Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state
serializes access to creating an nfs_client struct with matching properties.

Use the new nfs_get_client() that initializes new clients.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/client.c | 41 +++++++++++++++++++++++++++++
fs/nfs/internal.h | 5 +++
fs/nfs/nfs4_fs.h | 12 ++++++++
fs/nfs/nfs4filelayoutdev.c | 61 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 29 +++++++++++++++++++-
include/linux/nfs_xdr.h | 1 +
6 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e48457a..78e6ebe 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1436,6 +1436,47 @@ error:
return error;
}

+/*
+ * Set up a pNFS Data Server client.
+ *
+ * Return any existing nfs_client that matches server address, port, version
+ * and minorversion.
+ *
+ * For a new nfs_client, use a soft mount (default), a low retrans and a
+ * low timeout interval so that if a connection is lost, we retry through
+ * the MDS.
+ */
+struct nfs_client *nfs4_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr,
+ int ds_addrlen, int ds_proto)
+{
+ struct nfs_client_initdata cl_init = {
+ .addr = ds_addr,
+ .addrlen = ds_addrlen,
+ .rpc_ops = &nfs_v4_clientops,
+ .proto = ds_proto,
+ .minorversion = mds_clp->cl_minorversion,
+ };
+ struct rpc_timeout ds_timeout = {
+ .to_initval = 15 * HZ,
+ .to_maxval = 15 * HZ,
+ .to_retries = 1,
+ .to_exponential = 1,
+ };
+ struct nfs_client *clp;
+
+ /*
+ * Set an authflavor equal to the MDS value. Use the MDS nfs_client
+ * cl_ipaddr so as to use the same EXCHANGE_ID co_ownerid as the MDS
+ * (section 13.1 RFC 5661).
+ */
+ clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
+ mds_clp->cl_rpcclient->cl_auth->au_flavor, 0);
+
+ dprintk("<-- %s %p\n", __func__, clp);
+ return clp;
+}
+EXPORT_SYMBOL(nfs4_set_ds_client);

/*
* Session has been established, and the client marked ready.
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 335755d..5518d61 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -148,6 +148,9 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
struct nfs_fattr *);
extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
extern int nfs4_check_client_ready(struct nfs_client *clp);
+extern struct nfs_client *nfs4_set_ds_client(struct nfs_client *mds_clp,
+ const struct sockaddr *ds_addr,
+ int ds_addrlen, int ds_proto);
#ifdef CONFIG_PROC_FS
extern int __init nfs_fs_proc_init(void);
extern void nfs_fs_proc_exit(void);
@@ -213,6 +216,8 @@ extern const u32 nfs41_maxwrite_overhead;
extern struct rpc_procinfo nfs4_procedures[];
#endif

+extern int nfs4_init_ds_session(struct nfs_client *clp);
+
/* proc.c */
void nfs_close_context(struct nfs_open_context *ctx, int is_sync);

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5d84642..5dc378e 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -266,6 +266,12 @@ is_ds_only_client(struct nfs_client *clp)
return (clp->cl_exchange_flags & EXCHGID4_FLAG_MASK_PNFS) ==
EXCHGID4_FLAG_USE_PNFS_DS;
}
+
+static inline bool
+is_ds_client(struct nfs_client *clp)
+{
+ return clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS;
+}
#else /* CONFIG_NFS_v4_1 */
static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *server)
{
@@ -289,6 +295,12 @@ is_ds_only_client(struct nfs_client *clp)
{
return false;
}
+
+static inline bool
+is_ds_client(struct nfs_client *clp)
+{
+ return false;
+}
#endif /* CONFIG_NFS_V4_1 */

extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f5c9b12..8e21e65 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -104,6 +104,67 @@ _data_server_lookup_locked(u32 ip_addr, u32 port)
return NULL;
}

+/*
+ * Create an RPC connection to the nfs4_pnfs_ds data server.
+ * Currently only IPv4 is supported.
+ */
+static int
+nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
+{
+ struct nfs_client *clp;
+ struct sockaddr_in sin;
+ int status = 0;
+
+ dprintk("--> %s ip:port %x:%hu au_flavor %d\n", __func__,
+ ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
+ mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
+
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = ds->ds_ip_addr;
+ sin.sin_port = ds->ds_port;
+
+ clp = nfs4_set_ds_client(mds_srv->nfs_client, (struct sockaddr *)&sin,
+ sizeof(sin), IPPROTO_TCP);
+ if (IS_ERR(clp)) {
+ status = PTR_ERR(clp);
+ goto out;
+ }
+
+ if ((clp->cl_exchange_flags & EXCHGID4_FLAG_MASK_PNFS) != 0) {
+ if (!is_ds_client(clp)) {
+ status = -ENODEV;
+ goto out_put;
+ }
+ ds->ds_clp = clp;
+ dprintk("%s [existing] ip=%x, port=%hu\n", __func__,
+ ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
+ goto out;
+ }
+
+ /*
+ * Do not set NFS_CS_CHECK_LEASE_TIME; instead, set the DS lease to
+ * be equal to the MDS lease. Renewal is scheduled in create_session.
+ */
+ spin_lock(&mds_srv->nfs_client->cl_lock);
+ clp->cl_lease_time = mds_srv->nfs_client->cl_lease_time;
+ spin_unlock(&mds_srv->nfs_client->cl_lock);
+ clp->cl_last_renewal = jiffies;
+
+ /* New nfs_client */
+ status = nfs4_init_ds_session(clp);
+ if (status)
+ goto out_put;
+
+ ds->ds_clp = clp;
+ dprintk("%s [new] ip=%x, port=%hu\n", __func__, ntohl(ds->ds_ip_addr),
+ ntohs(ds->ds_port));
+out:
+ return status;
+out_put:
+ nfs_put_client(clp);
+ goto out;
+}
+
static void
destroy_ds(struct nfs4_pnfs_ds *ds)
{
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 78936a8..fe75ebd 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1574,9 +1574,8 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
return 0;
}

-static int nfs4_recover_expired_lease(struct nfs_server *server)
+static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
{
- struct nfs_client *clp = server->nfs_client;
unsigned int loop;
int ret;

@@ -1593,6 +1592,11 @@ static int nfs4_recover_expired_lease(struct nfs_server *server)
return ret;
}

+static int nfs4_recover_expired_lease(struct nfs_server *server)
+{
+ return nfs4_client_recover_expired_lease(server->nfs_client);
+}
+
/*
* OPEN_EXPIRED:
* reclaim state on the server after a network partition.
@@ -5073,6 +5077,27 @@ int nfs4_init_session(struct nfs_server *server)
return ret;
}

+int nfs4_init_ds_session(struct nfs_client *clp)
+{
+ struct nfs4_session *session = clp->cl_session;
+ int ret;
+
+ if (!test_and_clear_bit(NFS4_SESSION_INITING, &session->session_state))
+ return 0;
+
+ ret = nfs4_client_recover_expired_lease(clp);
+ if (!ret)
+ /* Test for the DS role */
+ if (!is_ds_client(clp))
+ ret = -ENODEV;
+ if (!ret)
+ ret = nfs4_check_client_ready(clp);
+ return ret;
+
+}
+EXPORT_SYMBOL_GPL(nfs4_init_ds_session);
+
+
/*
* Renew the cl_session lease.
*/
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 4591075..a607c65 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1018,6 +1018,7 @@ struct nfs_read_data {
struct nfs_readres res;
unsigned long timestamp; /* For lease renewal */
struct pnfs_layout_segment *lseg;
+ struct nfs_client *ds_clp; /* pNFS data server */
const struct rpc_call_ops *mds_ops;
struct page *page_array[NFS_PAGEVEC_SIZE];
};
--
1.7.2.3


2011-02-14 19:18:55

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 11/16] pnfs: wave 3: generic read

From: Andy Adamson <[email protected]>

Separate the rpc run portion of nfs_read_rpcsetup into a new function
nfs_initiate_read that is called for normal NFS I/O.

Add a pNFS read_pagelist function that is called instead of nfs_initiate_read
for pNFS reads.

Reported-by: Alexandros Batsakis <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Boaz Harrosh <[email protected]>
Signed-off-by: Dean Hildebrand <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Mike Sager <[email protected]>
Signed-off-by: Mingyang Guo <[email protected]>
Signed-off-by: Ricardo Labiaga <[email protected]>
Signed-off-by: Tao Guo <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/internal.h | 2 +
fs/nfs/pnfs.c | 28 ++++++++++++++++++
fs/nfs/pnfs.h | 20 +++++++++++++
fs/nfs/read.c | 66 +++++++++++++++++++++++++++----------------
include/linux/nfs_iostat.h | 1 +
include/linux/nfs_xdr.h | 1 +
6 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index cf9fdbd..335755d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -262,6 +262,8 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
#endif

/* read.c */
+extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+ const struct rpc_call_ops *call_ops);
extern void nfs_read_prepare(struct rpc_task *task, void *calldata);

/* write.c */
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index f200e34..6f4a5ab 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -30,6 +30,7 @@
#include <linux/nfs_fs.h>
#include "internal.h"
#include "pnfs.h"
+#include "iostat.h"

#define NFSDBG_FACILITY NFSDBG_PNFS

@@ -891,6 +892,33 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
}

/*
+ * Call the appropriate parallel I/O subsystem read function.
+ */
+enum pnfs_try_status
+pnfs_try_to_read_data(struct nfs_read_data *rdata,
+ const struct rpc_call_ops *call_ops)
+{
+ struct inode *inode = rdata->inode;
+ struct nfs_server *nfss = NFS_SERVER(inode);
+ enum pnfs_try_status trypnfs;
+
+ rdata->mds_ops = call_ops;
+
+ dprintk("%s: Reading ino:%lu %u@%llu\n",
+ __func__, inode->i_ino, rdata->args.count, rdata->args.offset);
+
+ trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
+ if (trypnfs == PNFS_NOT_ATTEMPTED) {
+ put_lseg(rdata->lseg);
+ rdata->lseg = NULL;
+ } else {
+ nfs_inc_stats(inode, NFSIOS_PNFS_READ);
+ }
+ dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
+ return trypnfs;
+}
+
+/*
* Device ID cache. Currently supports one layout type per struct nfs_client.
* Add layout type to the lookup key to expand to support multiple types.
*/
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 5107d14..585023f 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -45,6 +45,11 @@ struct pnfs_layout_segment {
struct pnfs_layout_hdr *pls_layout;
};

+enum pnfs_try_status {
+ PNFS_ATTEMPTED = 0,
+ PNFS_NOT_ATTEMPTED = 1,
+};
+
#ifdef CONFIG_NFS_V4_1

#define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
@@ -70,6 +75,12 @@ struct pnfs_layoutdriver_type {

/* test for nfs page cache coalescing */
int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
+
+ /*
+ * Return PNFS_ATTEMPTED to indicate the layout code has attempted
+ * I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
+ */
+ enum pnfs_try_status (*read_pagelist) (struct nfs_read_data *nfs_data);
};

struct pnfs_layout_hdr {
@@ -157,6 +168,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
enum pnfs_iomode access_type);
void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
void unset_pnfs_layoutdriver(struct nfs_server *);
+enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
+ const struct rpc_call_ops *);
void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
int pnfs_layout_process(struct nfs4_layoutget *lgp);
void pnfs_free_lseg_list(struct list_head *tmp_list);
@@ -227,6 +240,13 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
return NULL;
}

+static inline enum pnfs_try_status
+pnfs_try_to_read_data(struct nfs_read_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ return PNFS_NOT_ATTEMPTED;
+}
+
static inline bool
pnfs_roc(struct inode *ino)
{
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 20cc936..5c09d72 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -18,6 +18,8 @@
#include <linux/sunrpc/clnt.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
+#include <linux/smp_lock.h>
+#include <linux/module.h>

#include <asm/system.h>
#include "pnfs.h"
@@ -158,25 +160,20 @@ static void nfs_readpage_release(struct nfs_page *req)
nfs_release_request(req);
}

-/*
- * Set up the NFS read request struct
- */
-static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
- const struct rpc_call_ops *call_ops,
- unsigned int count, unsigned int offset,
- struct pnfs_layout_segment *lseg)
+int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+ const struct rpc_call_ops *call_ops)
{
- struct inode *inode = req->wb_context->path.dentry->d_inode;
+ struct inode *inode = data->inode;
int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
struct rpc_task *task;
struct rpc_message msg = {
.rpc_argp = &data->args,
.rpc_resp = &data->res,
- .rpc_cred = req->wb_context->cred,
+ .rpc_cred = data->cred,
};
struct rpc_task_setup task_setup_data = {
.task = &data->task,
- .rpc_client = NFS_CLIENT(inode),
+ .rpc_client = clnt,
.rpc_message = &msg,
.callback_ops = call_ops,
.callback_data = data,
@@ -184,9 +181,38 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
.flags = RPC_TASK_ASYNC | swap_flags,
};

+ /* Set up the initial task struct. */
+ NFS_PROTO(inode)->read_setup(data, &msg);
+
+ dprintk("NFS: %5u initiated read call (req %s/%lld, %u bytes @ "
+ "offset %llu)\n",
+ data->task.tk_pid,
+ inode->i_sb->s_id,
+ (long long)NFS_FILEID(inode),
+ data->args.count,
+ (unsigned long long)data->args.offset);
+
+ task = rpc_run_task(&task_setup_data);
+ if (IS_ERR(task))
+ return PTR_ERR(task);
+ rpc_put_task(task);
+ return 0;
+}
+EXPORT_SYMBOL(nfs_initiate_read);
+
+/*
+ * Set up the NFS read request struct
+ */
+static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
+ const struct rpc_call_ops *call_ops,
+ unsigned int count, unsigned int offset,
+ struct pnfs_layout_segment *lseg)
+{
+ struct inode *inode = req->wb_context->path.dentry->d_inode;
+
data->req = req;
data->inode = inode;
- data->cred = msg.rpc_cred;
+ data->cred = req->wb_context->cred;
data->lseg = get_lseg(lseg);

data->args.fh = NFS_FH(inode);
@@ -202,21 +228,11 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
data->res.eof = 0;
nfs_fattr_init(&data->fattr);

- /* Set up the initial task struct. */
- NFS_PROTO(inode)->read_setup(data, &msg);
-
- dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
- data->task.tk_pid,
- inode->i_sb->s_id,
- (long long)NFS_FILEID(inode),
- count,
- (unsigned long long)data->args.offset);
+ if (data->lseg &&
+ (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
+ return 0;

- task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
- rpc_put_task(task);
- return 0;
+ return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
}

static void
diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
index 68b10f5..37a1437 100644
--- a/include/linux/nfs_iostat.h
+++ b/include/linux/nfs_iostat.h
@@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
NFSIOS_SHORTREAD,
NFSIOS_SHORTWRITE,
NFSIOS_DELAY,
+ NFSIOS_PNFS_READ,
__NFSIOS_COUNTSMAX,
};

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 37e91c3..4591075 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1018,6 +1018,7 @@ struct nfs_read_data {
struct nfs_readres res;
unsigned long timestamp; /* For lease renewal */
struct pnfs_layout_segment *lseg;
+ const struct rpc_call_ops *mds_ops;
struct page *page_array[NFS_PAGEVEC_SIZE];
};

--
1.7.2.3


2011-02-15 14:47:52

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read


On Feb 14, 2011, at 6:36 PM, Trond Myklebust wrote:

> On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Separate the rpc run portion of nfs_read_rpcsetup into a new function
>> nfs_initiate_read that is called for normal NFS I/O.
>>
>> Add a pNFS read_pagelist function that is called instead of nfs_initiate_read
>> for pNFS reads.
>>
>> Reported-by: Alexandros Batsakis <[email protected]>
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Boaz Harrosh <[email protected]>
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> Signed-off-by: Fred Isaman <[email protected]>
>> Signed-off-by: Fred Isaman <[email protected]>
>> Signed-off-by: J. Bruce Fields <[email protected]>
>> Signed-off-by: Mike Sager <[email protected]>
>> Signed-off-by: Mingyang Guo <[email protected]>
>> Signed-off-by: Ricardo Labiaga <[email protected]>
>> Signed-off-by: Tao Guo <[email protected]>
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfs/internal.h | 2 +
>> fs/nfs/pnfs.c | 28 ++++++++++++++++++
>> fs/nfs/pnfs.h | 20 +++++++++++++
>> fs/nfs/read.c | 66 +++++++++++++++++++++++++++----------------
>> include/linux/nfs_iostat.h | 1 +
>> include/linux/nfs_xdr.h | 1 +
>> 6 files changed, 93 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index cf9fdbd..335755d 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -262,6 +262,8 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
>> #endif
>>
>> /* read.c */
>> +extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
>> + const struct rpc_call_ops *call_ops);
>> extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
>>
>> /* write.c */
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index f200e34..6f4a5ab 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -30,6 +30,7 @@
>> #include <linux/nfs_fs.h>
>> #include "internal.h"
>> #include "pnfs.h"
>> +#include "iostat.h"
>>
>> #define NFSDBG_FACILITY NFSDBG_PNFS
>>
>> @@ -891,6 +892,33 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>> }
>>
>> /*
>> + * Call the appropriate parallel I/O subsystem read function.
>> + */
>> +enum pnfs_try_status
>> +pnfs_try_to_read_data(struct nfs_read_data *rdata,
>> + const struct rpc_call_ops *call_ops)
>> +{
>> + struct inode *inode = rdata->inode;
>> + struct nfs_server *nfss = NFS_SERVER(inode);
>> + enum pnfs_try_status trypnfs;
>> +
>> + rdata->mds_ops = call_ops;
>> +
>> + dprintk("%s: Reading ino:%lu %u@%llu\n",
>> + __func__, inode->i_ino, rdata->args.count, rdata->args.offset);
>> +
>> + trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
>> + if (trypnfs == PNFS_NOT_ATTEMPTED) {
>> + put_lseg(rdata->lseg);
>> + rdata->lseg = NULL;
>> + } else {
>> + nfs_inc_stats(inode, NFSIOS_PNFS_READ);
>> + }
>> + dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>> + return trypnfs;
>> +}
>> +
>> +/*
>> * Device ID cache. Currently supports one layout type per struct nfs_client.
>> * Add layout type to the lookup key to expand to support multiple types.
>> */
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 5107d14..585023f 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -45,6 +45,11 @@ struct pnfs_layout_segment {
>> struct pnfs_layout_hdr *pls_layout;
>> };
>>
>> +enum pnfs_try_status {
>> + PNFS_ATTEMPTED = 0,
>> + PNFS_NOT_ATTEMPTED = 1,
>> +};
>> +
>> #ifdef CONFIG_NFS_V4_1
>>
>> #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
>> @@ -70,6 +75,12 @@ struct pnfs_layoutdriver_type {
>>
>> /* test for nfs page cache coalescing */
>> int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
>> +
>> + /*
>> + * Return PNFS_ATTEMPTED to indicate the layout code has attempted
>> + * I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
>> + */
>> + enum pnfs_try_status (*read_pagelist) (struct nfs_read_data *nfs_data);
>> };
>>
>> struct pnfs_layout_hdr {
>> @@ -157,6 +168,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> enum pnfs_iomode access_type);
>> void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
>> void unset_pnfs_layoutdriver(struct nfs_server *);
>> +enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
>> + const struct rpc_call_ops *);
>> void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
>> int pnfs_layout_process(struct nfs4_layoutget *lgp);
>> void pnfs_free_lseg_list(struct list_head *tmp_list);
>> @@ -227,6 +240,13 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> return NULL;
>> }
>>
>> +static inline enum pnfs_try_status
>> +pnfs_try_to_read_data(struct nfs_read_data *data,
>> + const struct rpc_call_ops *call_ops)
>> +{
>> + return PNFS_NOT_ATTEMPTED;
>> +}
>> +
>> static inline bool
>> pnfs_roc(struct inode *ino)
>> {
>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> index 20cc936..5c09d72 100644
>> --- a/fs/nfs/read.c
>> +++ b/fs/nfs/read.c
>> @@ -18,6 +18,8 @@
>> #include <linux/sunrpc/clnt.h>
>> #include <linux/nfs_fs.h>
>> #include <linux/nfs_page.h>
>> +#include <linux/smp_lock.h>
>> +#include <linux/module.h>
>>
>> #include <asm/system.h>
>> #include "pnfs.h"
>> @@ -158,25 +160,20 @@ static void nfs_readpage_release(struct nfs_page *req)
>> nfs_release_request(req);
>> }
>>
>> -/*
>> - * Set up the NFS read request struct
>> - */
>> -static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> - const struct rpc_call_ops *call_ops,
>> - unsigned int count, unsigned int offset,
>> - struct pnfs_layout_segment *lseg)
>> +int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
>
> static int.... Nobody is using this outside of fs/nfs/read.c

It should be static here, and made non-static in the patch that uses it.


>
>> + const struct rpc_call_ops *call_ops)
>> {
>> - struct inode *inode = req->wb_context->path.dentry->d_inode;
>> + struct inode *inode = data->inode;
>> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
>> struct rpc_task *task;
>> struct rpc_message msg = {
>> .rpc_argp = &data->args,
>> .rpc_resp = &data->res,
>> - .rpc_cred = req->wb_context->cred,
>> + .rpc_cred = data->cred,
>> };
>> struct rpc_task_setup task_setup_data = {
>> .task = &data->task,
>> - .rpc_client = NFS_CLIENT(inode),
>> + .rpc_client = clnt,
>> .rpc_message = &msg,
>> .callback_ops = call_ops,
>> .callback_data = data,
>> @@ -184,9 +181,38 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> .flags = RPC_TASK_ASYNC | swap_flags,
>> };
>>
>> + /* Set up the initial task struct. */
>> + NFS_PROTO(inode)->read_setup(data, &msg);
>> +
>> + dprintk("NFS: %5u initiated read call (req %s/%lld, %u bytes @ "
>> + "offset %llu)\n",
>> + data->task.tk_pid,
>> + inode->i_sb->s_id,
>> + (long long)NFS_FILEID(inode),
>> + data->args.count,
>> + (unsigned long long)data->args.offset);
>> +
>> + task = rpc_run_task(&task_setup_data);
>> + if (IS_ERR(task))
>> + return PTR_ERR(task);
>> + rpc_put_task(task);
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(nfs_initiate_read);
>
> Firstly, this should be EXPORT_SYMBOL_GPL, but in any case, why include
> it here? This patch contains no users for the export.

Oops - GPL indeed. Yes, I'll move it to the patch that uses it.

-->Andy

>
>> +
>> +/*
>> + * Set up the NFS read request struct
>> + */
>> +static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> + const struct rpc_call_ops *call_ops,
>> + unsigned int count, unsigned int offset,
>> + struct pnfs_layout_segment *lseg)
>> +{
>> + struct inode *inode = req->wb_context->path.dentry->d_inode;
>> +
>> data->req = req;
>> data->inode = inode;
>> - data->cred = msg.rpc_cred;
>> + data->cred = req->wb_context->cred;
>> data->lseg = get_lseg(lseg);
>>
>> data->args.fh = NFS_FH(inode);
>> @@ -202,21 +228,11 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> data->res.eof = 0;
>> nfs_fattr_init(&data->fattr);
>>
>> - /* Set up the initial task struct. */
>> - NFS_PROTO(inode)->read_setup(data, &msg);
>> -
>> - dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
>> - data->task.tk_pid,
>> - inode->i_sb->s_id,
>> - (long long)NFS_FILEID(inode),
>> - count,
>> - (unsigned long long)data->args.offset);
>> + if (data->lseg &&
>> + (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
>> + return 0;
>>
>> - task = rpc_run_task(&task_setup_data);
>> - if (IS_ERR(task))
>> - return PTR_ERR(task);
>> - rpc_put_task(task);
>> - return 0;
>> + return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
>> }
>>
>> static void
>> diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
>> index 68b10f5..37a1437 100644
>> --- a/include/linux/nfs_iostat.h
>> +++ b/include/linux/nfs_iostat.h
>> @@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
>> NFSIOS_SHORTREAD,
>> NFSIOS_SHORTWRITE,
>> NFSIOS_DELAY,
>> + NFSIOS_PNFS_READ,
>> __NFSIOS_COUNTSMAX,
>> };
>>
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 37e91c3..4591075 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1018,6 +1018,7 @@ struct nfs_read_data {
>> struct nfs_readres res;
>> unsigned long timestamp; /* For lease renewal */
>> struct pnfs_layout_segment *lseg;
>> + const struct rpc_call_ops *mds_ops;
>> struct page *page_array[NFS_PAGEVEC_SIZE];
>> };
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>


2011-02-15 14:41:14

by Fred Isaman

Subject: Re: [PATCH 09/16] pnfs: wave 3: shift pnfs_update_layout locations

On Mon, Feb 14, 2011 at 6:14 PM, Trond Myklebust
<[email protected]> wrote:
> On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
>> From: Fred Isaman <[email protected]>
>>
>> Move the pnfs_update_layout call location to nfs_pageio_do_add_request().
>> Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach
>> it to each nfs_read_data so it can be sent to the layout driver.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> Signed-off-by: Fred Isaman <[email protected]>
>> Signed-off-by: Fred Isaman <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Boaz Harrosh <[email protected]>
>> Signed-off-by: Oleg Drokin <[email protected]>
>> Signed-off-by: Tao Guo <[email protected]>
>> ---
>> fs/nfs/file.c | 4 ----
>> fs/nfs/pagelist.c | 15 ++++++++++++---
>> fs/nfs/pnfs.c | 4 ++--
>> fs/nfs/pnfs.h | 1 +
>> fs/nfs/read.c | 28 ++++++++++++++++------------
>> fs/nfs/write.c | 4 ++--
>> include/linux/nfs_page.h | 5 +++--
>> include/linux/nfs_xdr.h | 1 +
>> ?8 files changed, 37 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index 7bf029e..d85a534 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -387,10 +387,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>> file->f_path.dentry->d_name.name,
>> mapping->host->i_ino, len, (long long) pos);
>>
>> - pnfs_update_layout(mapping->host,
>> - nfs_file_open_context(file),
>> - IOMODE_RW);
>> -
>> start:
>> /*
>> * Prevent starvation issues if someone is doing a consistency
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index e1164e3..e0a0cb4 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -20,6 +20,7 @@
>> #include <linux/nfs_mount.h>
>>
>> #include "internal.h"
>> +#include "pnfs.h"
>>
>> static struct kmem_cache *nfs_page_cachep;
>>
>> @@ -213,7 +214,7 @@ nfs_wait_on_request(struct nfs_page *req)
>> */
>> void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>> struct inode *inode,
>> - int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
>> + int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
>> size_t bsize,
>> int io_flags)
>> {
>> @@ -226,6 +227,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>> desc->pg_doio = doio;
>> desc->pg_ioflags = io_flags;
>> desc->pg_error = 0;
>> + desc->pg_lseg = NULL;
>> }
>>
>> /**
>> @@ -288,8 +290,13 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>> prev = nfs_list_entry(desc->pg_list.prev);
>> if (!nfs_can_coalesce_requests(prev, req))
>> return 0;
>> - } else
>> + } else {
>> + put_lseg(desc->pg_lseg);
>> desc->pg_base = req->wb_pgbase;
>> + desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
>> + req->wb_context,
>> + IOMODE_READ);
>
> Looking at this afresh after a week of vacation. Isn't it more natural
> to do this as part of the pg_doio() callback?
>
> Your only reason for introducing the ->pg_lseg pointer is to be able to
> pass it to the ->pg_doio() in the first place. Why not do that by simply
> passing the 'desc' pointer to ->pg_doio(), and then having it call
> pnfs_update_layout() instead of 'get_layout()'?
>

The problem is that it is not the only reason. Passing the lseg into
the nfs_can_coalesce_requests is another. Calling pnfs_update_layout
in ->pg_doio would eliminate the opportunity to have a say in
coalescing based on the layout.
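The point is that the segment must already be known when requests are coalesced, not only when ->pg_doio() runs. The sketch below models that in userspace. The descriptor caches the segment for the current batch, and a layout-driver hook can veto merging two contiguous requests.

It assumes a hypothetical stripe-based layout whose `stripe_unit` comes from the cached segment. The names `pageio_desc`, `page_req`, `can_coalesce` and `stripe_pg_test` are illustrative and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout segment: file striped in units of stripe_unit bytes. */
struct layout_segment {
	unsigned long stripe_unit;
};

/* Hypothetical page request covering [offset, offset + len). */
struct page_req {
	unsigned long offset;
	unsigned long len;
};

/* Simplified nfs_pageio_descriptor: caches the lseg for the current batch. */
struct pageio_desc {
	struct layout_segment *lseg;
	bool (*pg_test)(struct pageio_desc *, struct page_req *, struct page_req *);
};

/* Layout-driver hook: refuse to merge requests in different stripes. */
static bool stripe_pg_test(struct pageio_desc *desc, struct page_req *prev,
			   struct page_req *req)
{
	unsigned long su;

	if (!desc->lseg)
		return true;	/* no layout: behave like plain NFS */
	su = desc->lseg->stripe_unit;
	return prev->offset / su == req->offset / su;
}

/* Generic check: requests must be byte-contiguous, then the driver may veto. */
static bool can_coalesce(struct pageio_desc *desc, struct page_req *prev,
			 struct page_req *req)
{
	if (prev->offset + prev->len != req->offset)
		return false;
	if (desc->pg_test && !desc->pg_test(desc, prev, req))
		return false;
	return true;
}
```

With an 8192-byte stripe unit, requests at offsets 0 and 4096 can merge. The request at 8192 starts a new batch even though it is contiguous. This decision is only possible if the segment is available before ->pg_doio() is called.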


>> + }
>> nfs_list_remove_request(req);
>> nfs_list_add_request(req, &desc->pg_list);
>> desc->pg_count = newlen;
>> @@ -307,7 +314,8 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>> nfs_page_array_len(desc->pg_base,
>> desc->pg_count),
>> desc->pg_count,
>> - desc->pg_ioflags);
>> + desc->pg_ioflags,
>> + desc->pg_lseg);
>> if (error < 0)
>> desc->pg_error = error;
>> else
>> @@ -345,6 +353,7 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
>> {
>> nfs_pageio_doio(desc);
>> + put_lseg(desc->pg_lseg);
>> }
>>
>> /**
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index f0a9578..dcd4356 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -264,7 +264,7 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
>> return 0;
>> }
>>
>> -static void
>> +void
>> put_lseg(struct pnfs_layout_segment *lseg)
>> {
>> struct inode *ino;
>> @@ -285,6 +285,7 @@ put_lseg(struct pnfs_layout_segment *lseg)
>> pnfs_free_lseg_list(&free_me);
>> }
>> }
>> +EXPORT_SYMBOL_GPL(put_lseg);
>
> Why is this needed here?
>

That looks like an artifact left over from older code. It is not needed.

>
>> static bool
>> should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
>> @@ -797,7 +798,6 @@ pnfs_update_layout(struct inode *ino,
>> out:
>> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
>> nfsi->layout ? nfsi->layout->plh_flags : -1, lseg);
>> - put_lseg(lseg); /* STUB - callers currently ignore return value */
>> return lseg;
>> out_unlock:
>> spin_unlock(&ino->i_lock);
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 9a994bc..121d6a3 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -146,6 +146,7 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
>>
>> /* pnfs.c */
>> void get_layout_hdr(struct pnfs_layout_hdr *lo);
>> +void put_lseg(struct pnfs_layout_segment *lseg);
>> struct pnfs_layout_segment *
>> pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> enum pnfs_iomode access_type);
>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> index aedcaa7..c453164 100644
>> --- a/fs/nfs/read.c
>> +++ b/fs/nfs/read.c
>> @@ -20,17 +20,17 @@
>> #include <linux/nfs_page.h>
>>
>> #include <asm/system.h>
>> +#include "pnfs.h"
>>
>> #include "nfs4_fs.h"
>> #include "internal.h"
>> #include "iostat.h"
>> #include "fscache.h"
>> -#include "pnfs.h"
>>
>> #define NFSDBG_FACILITY NFSDBG_PAGECACHE
>>
>> -static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int);
>> -static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int);
>> +static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
>> +static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
>> static const struct rpc_call_ops nfs_read_partial_ops;
>> static const struct rpc_call_ops nfs_read_full_ops;
>>
>> @@ -70,6 +70,7 @@ void nfs_readdata_free(struct nfs_read_data *p)
>> static void nfs_readdata_release(struct nfs_read_data *rdata)
>> {
>> put_nfs_open_context(rdata->args.context);
>> + put_lseg(rdata->lseg);
>
> Shouldn't you be calling put_lseg() _before_ put_nfs_open_context()? You
> are not guaranteed that the inode still exists after that call.
>

Yes.

Fred
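Trond's comment is about release ordering. put_lseg() dereferences the layout's inode, while put_nfs_open_context() may drop the last reference that keeps that inode alive.

The toy model below shows why the segment has to be released first. All types are hypothetical stand-ins, and the `alive` flag only simulates an inode being freed. Nothing is actually freed, so the "wrong order" case can be detected instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode: considered freed when its refcount reaches zero. */
struct toy_inode {
	int refcount;
	bool alive;
	int lsegs_released;	/* work done by put_lseg() that needs the inode */
};

/* Hypothetical open context: pins the inode, like the context's dentry. */
struct toy_ctx {
	struct toy_inode *inode;
};

/* Hypothetical layout segment whose release touches the inode. */
struct toy_lseg {
	struct toy_inode *inode;
};

static void toy_iput(struct toy_inode *ino)
{
	if (--ino->refcount == 0)
		ino->alive = false;	/* models the inode being freed */
}

static void toy_put_ctx(struct toy_ctx *ctx)
{
	toy_iput(ctx->inode);
}

/* Returns false if it would have touched a freed inode (use-after-free). */
static bool toy_put_lseg(struct toy_lseg *lseg)
{
	if (!lseg->inode->alive)
		return false;
	lseg->inode->lsegs_released++;	/* e.g. taking ino->i_lock */
	return true;
}

/* Corrected order for a release helper: lseg first, then the context. */
static bool release_read_data(struct toy_ctx *ctx, struct toy_lseg *lseg)
{
	bool ok = toy_put_lseg(lseg);

	toy_put_ctx(ctx);
	return ok;
}
```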

2011-02-15 14:43:51

by Andy Adamson

Subject: Re: [PATCH 10/16] pnfs: wave 3: coelesce across layout stripes

On Mon, Feb 14, 2011 at 6:42 PM, Trond Myklebust
<[email protected]> wrote:
> On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
>> From: Fred Isaman <[email protected]>
>>
>> Add a pg_test layout driver hook which is used to avoid coalescing I/O across
>> layout stripes.
>
> Doesn't this belong before [PATCH 09/16] pnfs: wave 3: shift
> pnfs_update_layout locations?

The pg_test uses the pg_lseg declared in [PATCH 09/16] pnfs: wave 3:
shift pnfs_update_layout locations, which is why the patches are
ordered this way.


-->Andy

>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-02-16 15:58:21

by Sager, Mike

Subject: RE: [PATCH 11/16] pnfs: wave 3: generic read

A lot of the early patches have my name on it because I was helping with a porting effort. Rahul Iyer (and several others) authored a good chunk of that code.

Mike

> -----Original Message-----
> From: Benny Halevy [mailto:[email protected]]
> Sent: Wednesday, February 16, 2011 10:52 AM
> To: Myklebust, Trond; Adamson, Andy
> Cc: [email protected]; Andy Adamson; Boaz Harrosh; Dean
> Hildebrand; Fred Isaman; Isaman, Fred; J. Bruce Fields; Sager, Mike;
> Mingyang Guo; Labiaga, Ricardo
> Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read
>
> On 2011-02-16 10:09, Trond Myklebust wrote:
> > On Wed, 2011-02-16 at 09:53 -0500, Andy Adamson wrote:
> >> On Feb 15, 2011, at 10:16 PM, Benny Halevy wrote:
> >>
> >>> On 2011-02-14 14:18, [email protected] wrote:
> >>>> From: Andy Adamson <[email protected]>
> >>>
> >>> Andy, taking into account the many contributors to this patch
> >>> the author should be "The pNFS Team" IMO.
> >>
> >> The author can't be "The pNFS Team". Somebody needs to be the
> author. I asked for volunteers and said I would be the default. Do you
> want to be the author?
> >
> > Right. Patches authored by 'The pNFS Team' will be rejected, as
> > discussed in Hopkinton last autumn.
> >
>
> OK. I'm not the original author so I can't claim authorship for this
> patch.
> FWIW, The earliest record I have in my tree for the earliest versions
> of this code is
> authored by Andy an Mike Sager...
>
> Benny

2011-02-14 19:19:01

by Andy Adamson

Subject: [PATCH 16/16] pnfs: wave 3: turn off pNFS on ds connection failure

From: Andy Adamson <[email protected]>

If a data server is unavailable, go through the MDS.

Mark the deviceid containing the data server as a negative cache entry.
Do not try to connect to any data server on a deviceid marked as a negative
cache entry. Mark any layout that tries to use the marked deviceid as failed.

Inodes with a layout marked as failed will not use the layout for I/O, and will
not perform any more layoutgets.
Inodes without a layout will still do a layoutget, but the new layout will be
marked as failed immediately.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 4 +++-
fs/nfs/nfs4filelayout.h | 3 +++
fs/nfs/nfs4filelayoutdev.c | 27 +++++++++++++++++++++++----
fs/nfs/pnfs.c | 18 ++++++++++++++----
fs/nfs/pnfs.h | 4 ++++
5 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c818042..3768377 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -232,7 +232,9 @@ filelayout_read_pagelist(struct nfs_read_data *data)
idx = nfs4_fl_calc_ds_index(lseg, j);
ds = nfs4_fl_prepare_ds(lseg, idx);
if (!ds) {
- printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
+ /* Either layout fh index faulty, or ds connect failed */
+ set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
+ set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
return PNFS_NOT_ATTEMPTED;
}
dprintk("%s USE DS:ip %x %hu\n", __func__,
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 9fef76e..1809aa6 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -97,5 +97,8 @@ extern struct nfs4_file_layout_dsaddr *
nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
struct nfs4_file_layout_dsaddr *
get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
+void filelayout_mark_devid_negative(struct nfs_client *clp,
+ struct pnfs_deviceid_node *devid,
+ int err, u32 ds_ipaddr);

#endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index e8496f3..b8b3dbb 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -553,6 +553,19 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
i = j;
return flseg->fh_array[i];
}
+void
+filelayout_mark_devid_negative(struct nfs_client *mds_clp,
+ struct pnfs_deviceid_node *devid,
+ int err, u32 ds_addr)
+{
+ u32 *p = (u32 *)&devid->de_id;
+
+ printk(KERN_ERR "NFS: data server %x connection error %d."
+ " Deviceid [%x%x%x%x] marked out of use.\n",
+ ds_addr, err, p[0], p[1], p[2], p[3]);
+
+ pnfs_mark_devid_negative(mds_clp, devid);
+}

struct nfs4_pnfs_ds *
nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
@@ -567,13 +580,19 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
}

if (!ds->ds_clp) {
+ struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
int err;

- err = nfs4_ds_connect(NFS_SERVER(lseg->pls_layout->plh_inode),
- dsaddr->ds_list[ds_idx]);
+ if (dsaddr->deviceid.de_flags & NFS4_DEVICE_ID_NEG_ENTRY) {
+ /* Already tried to connect, don't try again */
+ dprintk("%s Deviceid marked out of use\n", __func__);
+ return NULL;
+ }
+ err = nfs4_ds_connect(s, ds);
if (err) {
- printk(KERN_ERR "%s nfs4_ds_connect error %d\n",
- __func__, err);
+ filelayout_mark_devid_negative(s->nfs_client,
+ &dsaddr->deviceid, err,
+ ntohl(ds->ds_ip_addr));
return NULL;
}
}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 6f4a5ab..912b1ff 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -761,15 +761,16 @@ pnfs_update_layout(struct inode *ino,
dprintk("%s matches recall, use MDS\n", __func__);
goto out_unlock;
}
+
+ /* If LAYOUTGET or pNFS I/O already failed once we don't try again */
+ if (test_bit(lo_fail_bit(iomode), &nfsi->layout->plh_flags))
+ goto out_unlock;
+
/* Check to see if the layout for the given range already exists */
lseg = pnfs_find_lseg(lo, iomode);
if (lseg)
goto out_unlock;

- /* if LAYOUTGET already failed once we don't try again */
- if (test_bit(lo_fail_bit(iomode), &nfsi->layout->plh_flags))
- goto out_unlock;
-
if (pnfs_layoutgets_blocked(lo, NULL, 0))
goto out_unlock;
atomic_inc(&lo->plh_outstanding);
@@ -1052,3 +1053,12 @@ pnfs_put_deviceid_cache(struct nfs_client *clp)
}
}
EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
+
+void
+pnfs_mark_devid_negative(struct nfs_client *clp, struct pnfs_deviceid_node *d)
+{
+ spin_lock(&clp->cl_devid_cache->dc_lock);
+ d->de_flags |= NFS4_DEVICE_ID_NEG_ENTRY;
+ spin_unlock(&clp->cl_devid_cache->dc_lock);
+}
+EXPORT_SYMBOL_GPL(pnfs_mark_devid_negative);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 585023f..a760363 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -131,6 +131,8 @@ struct pnfs_deviceid_node {
struct hlist_node de_node;
struct nfs4_deviceid de_id;
atomic_t de_ref;
+ unsigned long de_flags;
+#define NFS4_DEVICE_ID_NEG_ENTRY 1
};

struct pnfs_deviceid_cache {
@@ -151,6 +153,8 @@ extern struct pnfs_deviceid_node *pnfs_add_deviceid(
struct pnfs_deviceid_node *);
extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
struct pnfs_deviceid_node *devid);
+extern void pnfs_mark_devid_negative(struct nfs_client *clp,
+ struct pnfs_deviceid_node *d);

extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
--
1.7.2.3
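The fallback policy this patch describes works in two steps. A failed data-server connection marks the device ID negative, so it is never retried. Any layout that tries to use that device is then failed for both iomodes, so later I/O goes to the MDS. This can be modelled as follows.

This is a hedged userspace sketch. NFS4_DEVICE_ID_NEG_ENTRY has the same value as in the patch. The `ds_up` connect stub, the `LO_FAIL_READ`/`LO_FAIL_RW` bits and all function names are hypothetical, and a pthread mutex stands in for the dc_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NFS4_DEVICE_ID_NEG_ENTRY 1	/* same value as in the patch */
#define LO_FAIL_READ (1UL << 0)		/* hypothetical per-iomode fail bits */
#define LO_FAIL_RW   (1UL << 1)

struct deviceid_node {
	unsigned long de_flags;
	bool ds_up;		/* hypothetical: would a connect succeed? */
	int connect_attempts;
};

struct layout_hdr {
	unsigned long plh_flags;
};

static pthread_mutex_t dc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models pnfs_mark_devid_negative(): flag update under the cache lock. */
static void mark_devid_negative(struct deviceid_node *d)
{
	pthread_mutex_lock(&dc_lock);
	d->de_flags |= NFS4_DEVICE_ID_NEG_ENTRY;
	pthread_mutex_unlock(&dc_lock);
}

/* Models nfs4_fl_prepare_ds(): never retries a device marked negative. */
static bool prepare_ds(struct deviceid_node *d)
{
	if (d->de_flags & NFS4_DEVICE_ID_NEG_ENTRY)
		return false;
	d->connect_attempts++;
	if (!d->ds_up) {
		mark_devid_negative(d);
		return false;
	}
	return true;
}

/* Models filelayout_read_pagelist(): on failure, fail both iomodes. */
static bool read_via_ds(struct layout_hdr *lo, struct deviceid_node *d)
{
	if (!prepare_ds(d)) {
		lo->plh_flags |= LO_FAIL_RW | LO_FAIL_READ;
		return false;	/* PNFS_NOT_ATTEMPTED: caller goes to the MDS */
	}
	return true;
}

/* Models the reordered test in pnfs_update_layout(): fail bit checked first. */
static bool layout_usable(const struct layout_hdr *lo, unsigned long fail_bit)
{
	return !(lo->plh_flags & fail_bit);
}
```

A second layout on the same device fails without a new connect attempt. Moving the fail-bit test ahead of pnfs_find_lseg() in pnfs_update_layout() means a cached segment is not handed out after the layout has failed.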


2011-02-15 15:11:35

by Fred Isaman

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On Tue, Feb 15, 2011 at 10:06 AM, Christoph Hellwig <[email protected]> wrote:
> Btw, what's the point of deferring the free_lseg calls? It looks like it's
> to avoid calling something that might block under i_lock, but looking around
> the pnfs-submit branch it seems that root cause could be fixed trivially.
>
> In common code *free_lseg* and *put_layout_hdr* do nothing but list
> manipulations and kfrees. And in filelayout_free_lseg we have just kfrees
> and a call to pnfs_put_deviceid which may sleep due to calling
> synchronize_rcu. But synchronize_rcu is horribly inefficient to start with,
> lead to much saner code and better performance.
>

The fundamental reason was that filelayout_free_lseg can call
nfs_put_client (through nfs4_fl_free_deviceid_callback), which in some
codepaths grabs a mutex.

Fred
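The pattern under discussion can be sketched as follows. Segments are unlinked onto a private list while the lock is held, and they are freed only after unlocking. This way a destructor that may sleep, for example one reaching nfs_put_client(), never runs under a spinlock.

This is an illustrative model only. `seg`, `put_seg_locked`, `free_seg_list` and `put_segs` are hypothetical, and a pthread mutex stands in for i_lock. The `in_lock` flag records whether a destructor ever ran while the lock was held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct seg {
	int refcount;
	struct seg *next;	/* linkage for the private free list */
};

static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;
static bool in_lock;		/* true while i_lock is held */
static int freed_under_lock;	/* destructor calls made with the lock held */
static int freed_total;

/* Destructor that may sleep in the real code. */
static void free_seg(struct seg *s)
{
	if (in_lock)
		freed_under_lock++;
	freed_total++;
	free(s);
}

/* Called with i_lock held: on the last reference, move s to the free list. */
static void put_seg_locked(struct seg *s, struct seg **free_me)
{
	if (--s->refcount == 0) {
		s->next = *free_me;
		*free_me = s;
	}
}

/* Called after unlocking: the safe place to run destructors. */
static void free_seg_list(struct seg *list)
{
	while (list) {
		struct seg *next = list->next;

		free_seg(list);
		list = next;
	}
}

/* Drop one reference on each of n segments, deferring the frees. */
static void put_segs(struct seg **segs, int n)
{
	struct seg *free_me = NULL;
	int i;

	pthread_mutex_lock(&i_lock);
	in_lock = true;
	for (i = 0; i < n; i++)
		put_seg_locked(segs[i], &free_me);
	in_lock = false;
	pthread_mutex_unlock(&i_lock);

	free_seg_list(free_me);
}
```

Christoph's alternative, using call_rcu() for the device ID instead of synchronize_rcu(), would also remove the sleep from the deviceid release. However, it does not by itself address the nfs_put_client() mutex that Fred mentions.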

2011-02-15 14:52:07

by Andy Adamson

Subject: Re: [PATCH 01/16] NFS remove unnecessary CONFIG_NFS_V4 from nfs_read_data


On Feb 15, 2011, at 4:16 AM, Christoph Hellwig wrote:

> On Mon, Feb 14, 2011 at 02:18:21PM -0500, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>
> Either the patch or the description is incorrect. If you actually need
> it for NFSv2/3 the description should say it. Otherwise it's just a
> "cleanup" which bloats the structure for people not having v4 support
> compiled in.

It is just a clean-up, per Trond's request to get rid of the CONFIG_NFS_V4 and CONFIG_NFS_V4_1 ifdefs in struct nfs_read_data. I'll change the description to match.

-->Andy

>


2011-02-14 19:18:43

by Andy Adamson

Subject: [PATCH 01/16] NFS remove unnecessary CONFIG_NFS_V4 from nfs_read_data

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
include/linux/nfs_xdr.h | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index b006857..51bfadb 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1016,9 +1016,7 @@ struct nfs_read_data {
unsigned int npages; /* Max length of pagevec */
struct nfs_readargs args;
struct nfs_readres res;
-#ifdef CONFIG_NFS_V4
unsigned long timestamp; /* For lease renewal */
-#endif
struct page *page_array[NFS_PAGEVEC_SIZE];
};

--
1.7.2.3


2011-02-15 14:44:25

by Andy Adamson

Subject: Re: [PATCH 0/16] pnfs wave 3 submission

On Mon, Feb 14, 2011 at 5:39 PM, Trond Myklebust
<[email protected]> wrote:
> On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
>> These patches implement wave 3 of the pNFS submission, which encompasses file
>> layout data server connection, READ I/O, and recovery through the MDS.
>>
>> They apply on top of Fred's recent 3 patch 'lock inversion' series
>> commits 2d767077 .. 2cc09edd in Trond's nfs-for-next tree.
>>
>> They are based upon Benny's current pnfs-submit-wave3 branch re-arranged
>> into a more coherent series of patches and rebased upon Trond's nfs-for-next.
>>
>> -->Andy
>>
>> 0001-NFS-remove-unnecessary-CONFIG_NFS_V4-from-nfs_read_d.patch
>> 0002-NFS-put_layout_hdr-can-remove-nfsi-layout.patch
>> 0003-NFS-move-nfs_client-initialization-into-nfs_get_clie.patch
>> 0004-pnfs-wave-3-send-zero-stateid-seqid-on-v4.1-i-o.patch
>> 0005-pnfs-wave-3-new-flag-for-state-renewal-check.patch
>> 0006-pnfs-wave-3-new-flag-for-lease-time-check.patch
>> 0007-pnfs-wave-3-add-MDS-mount-DS-only-check.patch
>> 0008-pnfs-wave-3-lseg-refcounting.patch
>> 0009-pnfs-wave-3-shift-pnfs_update_layout-locations.patch
>> 0010-pnfs-wave-3-coelesce-across-layout-stripes.patch
>> 0011-pnfs-wave-3-generic-read.patch
>> 0012-pnfs-wave-3-data-server-connection.patch
>> 0013-pnfs-wave-3-filelayout-read.patch
>> 0014-pnfs-wave-3-filelayout-read.patch
>> 0015-pnfs-wave-3-filelayout-async-error-handler.patch
>> 0016-pnfs-wave-3-turn-off-pNFS-on-ds-connection-failure.patch
>>
>
> Hi Andy,
>
> Can we please get rid of the 'wave 3:' in the subject of all these
> patches? It is slang that is only meaningful to the people who
> regularly follow the Linux pNFS conference calls.

Sure - makes sense.

-->Andy
>
> If you do want to label the patches, then please encode the label as
> something like
>
> [PATCH pNFS wave3] NFSv4.1: xxxxxxxxxxxx
>
> so that 'git am' strips out the [PNFS WAVE 3] bit, and just leaves the
> patch short description.
>
> You can do this using the --subject-prefix="PATCH pNFS wave3" argument
> to 'git format-patch'.
> Alternatively, if you are using 'stg mail' then you can achieve the same
> effect with --version="pNFS wave3".
>
> Cheers
> Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>
>

2011-02-16 03:16:38

by Benny Halevy

Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read

On 2011-02-14 14:18, [email protected] wrote:
> From: Andy Adamson <[email protected]>

Andy, taking into account the many contributors to this patch
the author should be "The pNFS Team" IMO.

>
> Separate the rpc run portion of nfs_read_rpcsetup into a new function
> nfs_initiate_read that is called for normal NFS I/O.
>
> Add a pNFS read_pagelist function that is called instead of nfs_initiate_read
> for pNFS reads.
>
> Reported-by: Alexandros Batsakis <[email protected]>

historical trivia? :)

Benny

> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Boaz Harrosh <[email protected]>
> Signed-off-by: Dean Hildebrand <[email protected]>
> Signed-off-by: Fred Isaman <[email protected]>
> Signed-off-by: Fred Isaman <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> Signed-off-by: Mike Sager <[email protected]>
> Signed-off-by: Mingyang Guo <[email protected]>
> Signed-off-by: Ricardo Labiaga <[email protected]>
> Signed-off-by: Tao Guo <[email protected]>
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfs/internal.h | 2 +
> fs/nfs/pnfs.c | 28 ++++++++++++++++++
> fs/nfs/pnfs.h | 20 +++++++++++++
> fs/nfs/read.c | 66 +++++++++++++++++++++++++++----------------
> include/linux/nfs_iostat.h | 1 +
> include/linux/nfs_xdr.h | 1 +
> 6 files changed, 93 insertions(+), 25 deletions(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index cf9fdbd..335755d 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -262,6 +262,8 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
> #endif
>
> /* read.c */
> +extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
> + const struct rpc_call_ops *call_ops);
> extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
>
> /* write.c */
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index f200e34..6f4a5ab 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -30,6 +30,7 @@
> #include <linux/nfs_fs.h>
> #include "internal.h"
> #include "pnfs.h"
> +#include "iostat.h"
>
> #define NFSDBG_FACILITY NFSDBG_PNFS
>
> @@ -891,6 +892,33 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
> }
>
> /*
> + * Call the appropriate parallel I/O subsystem read function.
> + */
> +enum pnfs_try_status
> +pnfs_try_to_read_data(struct nfs_read_data *rdata,
> + const struct rpc_call_ops *call_ops)
> +{
> + struct inode *inode = rdata->inode;
> + struct nfs_server *nfss = NFS_SERVER(inode);
> + enum pnfs_try_status trypnfs;
> +
> + rdata->mds_ops = call_ops;
> +
> + dprintk("%s: Reading ino:%lu %u@%llu\n",
> + __func__, inode->i_ino, rdata->args.count, rdata->args.offset);
> +
> + trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
> + if (trypnfs == PNFS_NOT_ATTEMPTED) {
> + put_lseg(rdata->lseg);
> + rdata->lseg = NULL;
> + } else {
> + nfs_inc_stats(inode, NFSIOS_PNFS_READ);
> + }
> + dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
> + return trypnfs;
> +}
> +
> +/*
> * Device ID cache. Currently supports one layout type per struct nfs_client.
> * Add layout type to the lookup key to expand to support multiple types.
> */
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 5107d14..585023f 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -45,6 +45,11 @@ struct pnfs_layout_segment {
> struct pnfs_layout_hdr *pls_layout;
> };
>
> +enum pnfs_try_status {
> + PNFS_ATTEMPTED = 0,
> + PNFS_NOT_ATTEMPTED = 1,
> +};
> +
> #ifdef CONFIG_NFS_V4_1
>
> #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
> @@ -70,6 +75,12 @@ struct pnfs_layoutdriver_type {
>
> /* test for nfs page cache coalescing */
> int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
> +
> + /*
> + * Return PNFS_ATTEMPTED to indicate the layout code has attempted
> + * I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
> + */
> + enum pnfs_try_status (*read_pagelist) (struct nfs_read_data *nfs_data);
> };
>
> struct pnfs_layout_hdr {
> @@ -157,6 +168,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> enum pnfs_iomode access_type);
> void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
> void unset_pnfs_layoutdriver(struct nfs_server *);
> +enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
> + const struct rpc_call_ops *);
> void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
> int pnfs_layout_process(struct nfs4_layoutget *lgp);
> void pnfs_free_lseg_list(struct list_head *tmp_list);
> @@ -227,6 +240,13 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> return NULL;
> }
>
> +static inline enum pnfs_try_status
> +pnfs_try_to_read_data(struct nfs_read_data *data,
> + const struct rpc_call_ops *call_ops)
> +{
> + return PNFS_NOT_ATTEMPTED;
> +}
> +
> static inline bool
> pnfs_roc(struct inode *ino)
> {
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 20cc936..5c09d72 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -18,6 +18,8 @@
> #include <linux/sunrpc/clnt.h>
> #include <linux/nfs_fs.h>
> #include <linux/nfs_page.h>
> +#include <linux/smp_lock.h>
> +#include <linux/module.h>
>
> #include <asm/system.h>
> #include "pnfs.h"
> @@ -158,25 +160,20 @@ static void nfs_readpage_release(struct nfs_page *req)
> nfs_release_request(req);
> }
>
> -/*
> - * Set up the NFS read request struct
> - */
> -static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> - const struct rpc_call_ops *call_ops,
> - unsigned int count, unsigned int offset,
> - struct pnfs_layout_segment *lseg)
> +int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
> + const struct rpc_call_ops *call_ops)
> {
> - struct inode *inode = req->wb_context->path.dentry->d_inode;
> + struct inode *inode = data->inode;
> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
> struct rpc_task *task;
> struct rpc_message msg = {
> .rpc_argp = &data->args,
> .rpc_resp = &data->res,
> - .rpc_cred = req->wb_context->cred,
> + .rpc_cred = data->cred,
> };
> struct rpc_task_setup task_setup_data = {
> .task = &data->task,
> - .rpc_client = NFS_CLIENT(inode),
> + .rpc_client = clnt,
> .rpc_message = &msg,
> .callback_ops = call_ops,
> .callback_data = data,
> @@ -184,9 +181,38 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> .flags = RPC_TASK_ASYNC | swap_flags,
> };
>
> + /* Set up the initial task struct. */
> + NFS_PROTO(inode)->read_setup(data, &msg);
> +
> + dprintk("NFS: %5u initiated read call (req %s/%lld, %u bytes @ "
> + "offset %llu)\n",
> + data->task.tk_pid,
> + inode->i_sb->s_id,
> + (long long)NFS_FILEID(inode),
> + data->args.count,
> + (unsigned long long)data->args.offset);
> +
> + task = rpc_run_task(&task_setup_data);
> + if (IS_ERR(task))
> + return PTR_ERR(task);
> + rpc_put_task(task);
> + return 0;
> +}
> +EXPORT_SYMBOL(nfs_initiate_read);
> +
> +/*
> + * Set up the NFS read request struct
> + */
> +static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> + const struct rpc_call_ops *call_ops,
> + unsigned int count, unsigned int offset,
> + struct pnfs_layout_segment *lseg)
> +{
> + struct inode *inode = req->wb_context->path.dentry->d_inode;
> +
> data->req = req;
> data->inode = inode;
> - data->cred = msg.rpc_cred;
> + data->cred = req->wb_context->cred;
> data->lseg = get_lseg(lseg);
>
> data->args.fh = NFS_FH(inode);
> @@ -202,21 +228,11 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> data->res.eof = 0;
> nfs_fattr_init(&data->fattr);
>
> - /* Set up the initial task struct. */
> - NFS_PROTO(inode)->read_setup(data, &msg);
> -
> - dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
> - data->task.tk_pid,
> - inode->i_sb->s_id,
> - (long long)NFS_FILEID(inode),
> - count,
> - (unsigned long long)data->args.offset);
> + if (data->lseg &&
> + (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
> + return 0;
>
> - task = rpc_run_task(&task_setup_data);
> - if (IS_ERR(task))
> - return PTR_ERR(task);
> - rpc_put_task(task);
> - return 0;
> + return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
> }
>
> static void
> diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
> index 68b10f5..37a1437 100644
> --- a/include/linux/nfs_iostat.h
> +++ b/include/linux/nfs_iostat.h
> @@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
> NFSIOS_SHORTREAD,
> NFSIOS_SHORTWRITE,
> NFSIOS_DELAY,
> + NFSIOS_PNFS_READ,
> __NFSIOS_COUNTSMAX,
> };
>
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 37e91c3..4591075 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1018,6 +1018,7 @@ struct nfs_read_data {
> struct nfs_readres res;
> unsigned long timestamp; /* For lease renewal */
> struct pnfs_layout_segment *lseg;
> + const struct rpc_call_ops *mds_ops;
> struct page *page_array[NFS_PAGEVEC_SIZE];
> };
>
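This patch introduces a decision between two paths. When a layout segment is attached, the read is handed to the layout driver's read_pagelist. When the driver returns PNFS_NOT_ATTEMPTED, the read falls back to nfs_initiate_read through the MDS. That decision can be sketched in isolation as below.

Only the pnfs_try_status enum is taken from the patch. The names read_data, layout_driver and initiate_read, and the counters, are hypothetical stand-ins. This is a userspace model of the decision, not kernel code.

```c
#include <assert.h>

/* Same values as the enum this patch adds to fs/nfs/pnfs.h. */
enum pnfs_try_status {
	PNFS_ATTEMPTED     = 0,
	PNFS_NOT_ATTEMPTED = 1,
};

/* Hypothetical stand-in for struct nfs_read_data. */
struct read_data {
	int has_lseg;	/* models data->lseg != NULL */
	int pnfs_reads;	/* models the NFSIOS_PNFS_READ counter */
	int mds_reads;	/* reads issued through the normal NFS path */
};

/* Hypothetical stand-in for struct pnfs_layoutdriver_type. */
struct layout_driver {
	enum pnfs_try_status (*read_pagelist)(struct read_data *);
};

/* Models nfs_initiate_read(): send the READ to the MDS. */
static int initiate_read(struct read_data *data)
{
	data->mds_reads++;
	return 0;
}

/* Models pnfs_try_to_read_data(). */
static enum pnfs_try_status try_to_read_data(const struct layout_driver *ld,
					     struct read_data *data)
{
	enum pnfs_try_status trypnfs = ld->read_pagelist(data);

	if (trypnfs == PNFS_NOT_ATTEMPTED)
		data->has_lseg = 0;	/* put_lseg(); rdata->lseg = NULL */
	else
		data->pnfs_reads++;	/* nfs_inc_stats(NFSIOS_PNFS_READ) */
	return trypnfs;
}

/* Models the tail of nfs_read_rpcsetup() after this patch. */
static int read_rpcsetup(const struct layout_driver *ld,
			 struct read_data *data)
{
	if (data->has_lseg && try_to_read_data(ld, data) == PNFS_ATTEMPTED)
		return 0;
	return initiate_read(data);
}

/* Two toy drivers: one takes the I/O, the other declines it. */
static enum pnfs_try_status ld_accepts(struct read_data *d)
{
	(void)d;
	return PNFS_ATTEMPTED;
}

static enum pnfs_try_status ld_declines(struct read_data *d)
{
	(void)d;
	return PNFS_NOT_ATTEMPTED;
}
```

A read without a segment never consults the driver. A declined read is counted only as an MDS read. Both behaviours match the fallback described in the read_pagelist comment.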

2011-02-14 19:18:43

by Andy Adamson

Subject: [PATCH 02/16] NFS put_layout_hdr can remove nfsi->layout

From: Andy Adamson <[email protected]>

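Prevent an Oops triggered by a race between CB_LAYOUTRECALL and LAYOUTGET
on the first pnfs_layout_segment of a pnfs_layout_hdr: put_layout_hdr can
clear nfsi->layout, so the debug message must not dereference it blindly.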

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0f5b66f..7d031cd 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -768,7 +768,7 @@ pnfs_update_layout(struct inode *ino,
put_layout_hdr(lo);
out:
dprintk("%s end, state 0x%lx lseg %p\n", __func__,
- nfsi->layout->plh_flags, lseg);
+ nfsi->layout ? nfsi->layout->plh_flags : -1, lseg);
return lseg;
out_unlock:
spin_unlock(&ino->i_lock);
--
1.7.2.3
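Below is a minimal sketch of the guard this one-line fix adds. It assumes hypothetical stand-in types that keep only the fields the dprintk reads.

In the patch, the -1 in `nfsi->layout ? nfsi->layout->plh_flags : -1` is converted to unsigned long by the usual arithmetic conversions. A missing layout is therefore printed as 0xffffffffffffffff on 64-bit builds. The sketch makes that conversion explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins keeping only what the dprintk reads. */
struct layout_hdr {
	unsigned long plh_flags;
};

struct nfs_inode_sketch {
	struct layout_hdr *layout;	/* may be cleared by put_layout_hdr() */
};

/* Value passed to "%s end, state 0x%lx lseg %p" after the fix. */
static unsigned long layout_state_for_debug(const struct nfs_inode_sketch *nfsi)
{
	return nfsi->layout ? nfsi->layout->plh_flags : (unsigned long)-1;
}
```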


2011-02-16 15:09:17

by Myklebust, Trond

Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read

On Wed, 2011-02-16 at 09:53 -0500, Andy Adamson wrote:
> On Feb 15, 2011, at 10:16 PM, Benny Halevy wrote:
>
> > On 2011-02-14 14:18, [email protected] wrote:
> >> From: Andy Adamson <[email protected]>
> >
> > Andy, taking into account the many contributors to this patch
> > the author should be "The pNFS Team" IMO.
>
> The author can't be "The pNFS Team". Somebody needs to be the author. I asked for volunteers and said I would be the default. Do you want to be the author?

Right. Patches authored by 'The pNFS Team' will be rejected, as
discussed in Hopkinton last autumn.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-02-14 23:42:09

by Myklebust, Trond

Subject: Re: [PATCH 10/16] pnfs: wave 3: coelesce across layout stripes

On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
> From: Fred Isaman <[email protected]>
>
> Add a pg_test layout driver hook which is used to avoid coalescing I/O across
> layout stripes.

Doesn't this belong before [PATCH 09/16] pnfs: wave 3: shift
pnfs_update_layout locations?


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
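The pg_test idea can be made concrete with a minimal sketch. It assumes a files layout with a fixed stripe unit that is a multiple of a 4 KiB page. The check refuses to coalesce two page requests whose offsets fall in different stripe units.

The names same_stripe and SKETCH_PAGE_SHIFT are hypothetical. The hook this patch actually adds is not quoted here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/*
 * Hypothetical pg_test-style helper: return nonzero only when the previous
 * page and the candidate page start in the same stripe unit. If stripe_unit
 * is a nonzero multiple of the page size, a coalesced request then never
 * straddles a stripe boundary.
 */
static int same_stripe(uint64_t prev_index, uint64_t req_index,
		       uint32_t stripe_unit)
{
	uint64_t p_stripe = (prev_index << SKETCH_PAGE_SHIFT) / stripe_unit;
	uint64_t r_stripe = (req_index << SKETCH_PAGE_SHIFT) / stripe_unit;

	return p_stripe == r_stripe;
}
```

With a 64 KiB stripe unit, pages 0 to 15 share stripe 0. Page 15 and page 16 would therefore go out as separate requests.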


2011-02-16 02:58:59

by Benny Halevy

Subject: Re: [PATCH 03/16] NFS move nfs_client initialization into nfs_get_client

On 2011-02-14 14:18, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> nfs_get_client now returns an nfs_client that is ready to use, whether it
> was found or newly created.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/client.c | 67 ++++++++++++++++++++++++++++++++++++++----------------
> 1 files changed, 47 insertions(+), 20 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index bd3ca32..75b236f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -81,6 +81,15 @@ retry:
> }
> #endif /* CONFIG_NFS_V4 */
>
> +static int nfs4_init_client(struct nfs_client *clp,
> + const struct rpc_timeout *timeparms,
> + const char *ip_addr,
> + rpc_authflavor_t authflavour,
> + int noresvport);
> +static int nfs_init_client(struct nfs_client *clp,
> + const struct rpc_timeout *timeparms,
> + int noresvport);
> +
> /*
> * RPC cruft for NFS
> */
> @@ -481,7 +490,12 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
> * Look up a client by IP address and protocol version
> * - creates a new record if one doesn't yet exist
> */
> -static struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
> +static struct nfs_client *
> +nfs_get_client(const struct nfs_client_initdata *cl_init,
> + const struct rpc_timeout *timeparms,
> + const char *ip_addr,
> + rpc_authflavor_t authflavour,
> + int noresvport)
> {
> struct nfs_client *clp, *new = NULL;
> int error;
> @@ -512,6 +526,17 @@ install_client:
> clp = new;
> list_add(&clp->cl_share_link, &nfs_client_list);
> spin_unlock(&nfs_client_lock);
> +
> + if (cl_init->rpc_ops->version == 4)
> + error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
> + noresvport);
> + else
> + error = nfs_init_client(clp, timeparms, noresvport);

To make that cleaner, you could have both take the same parameters, put
nfs_init_client in struct nfs_rpc_ops, and then call it via cl_init->rpc_ops.

Benny

> +
> + if (error < 0) {
> + nfs_put_client(clp);
> + return ERR_PTR(error);
> + }
> dprintk("--> nfs_get_client() = %p [new]\n", clp);
> return clp;
>
> @@ -769,7 +794,7 @@ static int nfs_init_server_rpcclient(struct nfs_server *server,
> */
> static int nfs_init_client(struct nfs_client *clp,
> const struct rpc_timeout *timeparms,
> - const struct nfs_parsed_mount_data *data)
> + int noresvport)
> {
> int error;
>
> @@ -784,7 +809,7 @@ static int nfs_init_client(struct nfs_client *clp,
> * - RFC 2623, sec 2.3.2
> */
> error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_UNIX,
> - 0, data->flags & NFS_MOUNT_NORESVPORT);
> + 0, noresvport);
> if (error < 0)
> goto error;
> nfs_mark_client_ready(clp, NFS_CS_READY);
> @@ -820,19 +845,17 @@ static int nfs_init_server(struct nfs_server *server,
> cl_init.rpc_ops = &nfs_v3_clientops;
> #endif
>
> + nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
> + data->timeo, data->retrans);
> +
> /* Allocate or find a client reference we can use */
> - clp = nfs_get_client(&cl_init);
> + clp = nfs_get_client(&cl_init, &timeparms, NULL, RPC_AUTH_UNIX,
> + data->flags & NFS_MOUNT_NORESVPORT);
> if (IS_ERR(clp)) {
> dprintk("<-- nfs_init_server() = error %ld\n", PTR_ERR(clp));
> return PTR_ERR(clp);
> }
>
> - nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
> - data->timeo, data->retrans);
> - error = nfs_init_client(clp, &timeparms, data);
> - if (error < 0)
> - goto error;
> -
> server->nfs_client = clp;
>
> /* Initialise the client representation from the mount data */
> @@ -1311,7 +1334,7 @@ static int nfs4_init_client(struct nfs_client *clp,
> const struct rpc_timeout *timeparms,
> const char *ip_addr,
> rpc_authflavor_t authflavour,
> - int flags)
> + int noresvport)
> {
> int error;
>
> @@ -1325,7 +1348,7 @@ static int nfs4_init_client(struct nfs_client *clp,
> clp->rpc_ops = &nfs_v4_clientops;
>
> error = nfs_create_rpc_client(clp, timeparms, authflavour,
> - 1, flags & NFS_MOUNT_NORESVPORT);
> + 1, noresvport);
> if (error < 0)
> goto error;
> strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> @@ -1378,22 +1401,16 @@ static int nfs4_set_client(struct nfs_server *server,
> dprintk("--> nfs4_set_client()\n");
>
> /* Allocate or find a client reference we can use */
> - clp = nfs_get_client(&cl_init);
> + clp = nfs_get_client(&cl_init, timeparms, ip_addr, authflavour,
> + server->flags & NFS_MOUNT_NORESVPORT);
> if (IS_ERR(clp)) {
> error = PTR_ERR(clp);
> goto error;
> }
> - error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
> - server->flags);
> - if (error < 0)
> - goto error_put;
>
> server->nfs_client = clp;
> dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
> return 0;
> -
> -error_put:
> - nfs_put_client(clp);
> error:
> dprintk("<-- nfs4_set_client() = xerror %d\n", error);
> return error;
> @@ -1611,6 +1628,16 @@ error:
> return ERR_PTR(error);
> }
>
> +#else /* CONFIG_NFS_V4 */
> +static int nfs4_init_client(struct nfs_client *clp,
> + const struct rpc_timeout *timeparms,
> + const char *ip_addr,
> + rpc_authflavor_t authflavour,
> + int noresvport)
> +{
> + return -EPROTONOSUPPORT;
> +}
> +
> #endif /* CONFIG_NFS_V4 */
>
> /*
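Benny's suggestion could look something like the following userspace C sketch, which uses hypothetical types. Both initializers take the same argument bundle and are reached through a function pointer in the per-version ops table. As a result, nfs_get_client() no longer needs the `version == 4` branch.

The names init_args, rpc_ops_sketch and init_client are illustrative assumptions. They are not the kernel's struct nfs_rpc_ops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct client;

/* Hypothetical common argument bundle for every protocol version. */
struct init_args {
	const char *ip_addr;	/* unused by the v2/v3 initializer */
	int authflavour;
	int noresvport;
};

/* Hypothetical per-version ops table with an init hook. */
struct rpc_ops_sketch {
	int version;
	int (*init_client)(struct client *, const struct init_args *);
};

struct client {
	const struct rpc_ops_sketch *rpc_ops;
	int ready_version;	/* set once initialization succeeds */
};

static int v3_init_client(struct client *clp, const struct init_args *args)
{
	(void)args;
	clp->ready_version = 3;
	return 0;
}

static int v4_init_client(struct client *clp, const struct init_args *args)
{
	if (args->ip_addr == NULL)
		return -EINVAL;	/* assumed policy: v4 needs a client address */
	clp->ready_version = 4;
	return 0;
}

static const struct rpc_ops_sketch v3_ops = { 3, v3_init_client };
static const struct rpc_ops_sketch v4_ops = { 4, v4_init_client };

/* With a shared signature, the caller needs no version switch. */
static int init_new_client(struct client *clp, const struct init_args *args)
{
	return clp->rpc_ops->init_client(clp, args);
}
```

nfs_init_server() passes a NULL ip_addr for v2/v3. With a shared signature, the v4 initializer therefore has to tolerate or reject arguments it would otherwise never see. The sketch rejects the NULL address with -EINVAL, which is an assumed policy.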

2011-02-14 22:39:22

by Myklebust, Trond

Subject: Re: [PATCH 0/16] pnfs wave 3 submission

On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
> These patches implement wave 3 of the pNFS submission, which encompasses file
> layout data server connection, READ I/O, and recovery through the MDS.
>
> They apply on top of Fred's recent 3 patch 'lock inversion' series
> commits 2d767077 .. 2cc09edd in Trond's nfs-for-next tree.
>
> They are based upon Benny's current pnfs-submit-wave3 branch re-arranged
> into a more coherent series of patches and rebased upon Trond's nfs-for-next.
>
> -->Andy
>
> 0001-NFS-remove-unnecessary-CONFIG_NFS_V4-from-nfs_read_d.patch
> 0002-NFS-put_layout_hdr-can-remove-nfsi-layout.patch
> 0003-NFS-move-nfs_client-initialization-into-nfs_get_clie.patch
> 0004-pnfs-wave-3-send-zero-stateid-seqid-on-v4.1-i-o.patch
> 0005-pnfs-wave-3-new-flag-for-state-renewal-check.patch
> 0006-pnfs-wave-3-new-flag-for-lease-time-check.patch
> 0007-pnfs-wave-3-add-MDS-mount-DS-only-check.patch
> 0008-pnfs-wave-3-lseg-refcounting.patch
> 0009-pnfs-wave-3-shift-pnfs_update_layout-locations.patch
> 0010-pnfs-wave-3-coelesce-across-layout-stripes.patch
> 0011-pnfs-wave-3-generic-read.patch
> 0012-pnfs-wave-3-data-server-connection.patch
> 0013-pnfs-wave-3-filelayout-read.patch
> 0014-pnfs-wave-3-filelayout-read.patch
> 0015-pnfs-wave-3-filelayout-async-error-handler.patch
> 0016-pnfs-wave-3-turn-off-pNFS-on-ds-connection-failure.patch
>

Hi Andy,

Can we please get rid of the 'wave 3:' in the subject of all these
patches? It is slang that is only meaningful to people who
regularly follow the Linux pNFS conference calls.

If you do want to label the patches, then please encode the label as
something like

[PATCH pNFS wave3] NFSv4.1: xxxxxxxxxxxx

so that 'git am' strips out the [PATCH pNFS wave3] bit and leaves just the
short patch description.

You can do this using the --subject-prefix="PATCH pNFS wave3" argument
to 'git format-patch'.
Alternatively, if you are using 'stg mail' then you can achieve the same
effect with --version="pNFS wave3".
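The following is a simplified approximation in C of the subject cleanup that 'git am' performs by default. It illustrates why the bracketed label disappears. Leading whitespace, "Re:" prefixes and leading [...] groups are dropped. Anything outside the brackets, such as "wave 3:", survives.

The function name strip_subject_prefix is hypothetical, and this is not git's code. Real git (via git mailinfo) handles more cases, and -k (--keep) disables the stripping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical, simplified subject cleanup: repeatedly skip leading
 * whitespace, "Re:" prefixes and one leading "[...]" group.
 * Returns a pointer into 'subject'; the string itself is not modified.
 */
static const char *strip_subject_prefix(const char *subject)
{
	const char *s = subject;

	for (;;) {
		while (isspace((unsigned char)*s))
			s++;
		if (strncasecmp(s, "re:", 3) == 0) {
			s += 3;
			continue;
		}
		if (*s == '[') {
			const char *end = strchr(s, ']');

			if (end == NULL)
				break;	/* unbalanced bracket: keep it */
			s = end + 1;
			continue;
		}
		break;
	}
	return s;
}
```

For example, "Re: [PATCH 11/16] pnfs: wave 3: generic read" reduces to "pnfs: wave 3: generic read". That leftover "wave 3:" is the label Trond asks to move inside the brackets.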

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-02-14 19:18:51

by Andy Adamson

Subject: [PATCH 09/16] pnfs: wave 3: shift pnfs_update_layout locations

From: Fred Isaman <[email protected]>

Move the pnfs_update_layout call to nfs_pageio_do_add_request(). Pass the
lseg obtained there through the doio function to nfs_read_rpcsetup, and
attach it to each nfs_read_data so that it can be handed to the layout driver.

Signed-off-by: Andy Adamon <[email protected]>
Signed-off-by: Andy Adamon <[email protected]>
Signed-off-by: Dean Hildebrand <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Boaz Harrosh <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Tao Guo <[email protected]>
---
fs/nfs/file.c | 4 ----
fs/nfs/pagelist.c | 15 ++++++++++++---
fs/nfs/pnfs.c | 4 ++--
fs/nfs/pnfs.h | 1 +
fs/nfs/read.c | 28 ++++++++++++++++------------
fs/nfs/write.c | 4 ++--
include/linux/nfs_page.h | 5 +++--
include/linux/nfs_xdr.h | 1 +
8 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7bf029e..d85a534 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -387,10 +387,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
file->f_path.dentry->d_name.name,
mapping->host->i_ino, len, (long long) pos);

- pnfs_update_layout(mapping->host,
- nfs_file_open_context(file),
- IOMODE_RW);
-
start:
/*
* Prevent starvation issues if someone is doing a consistency
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e1164e3..e0a0cb4 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -20,6 +20,7 @@
#include <linux/nfs_mount.h>

#include "internal.h"
+#include "pnfs.h"

static struct kmem_cache *nfs_page_cachep;

@@ -213,7 +214,7 @@ nfs_wait_on_request(struct nfs_page *req)
*/
void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
struct inode *inode,
- int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
+ int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
size_t bsize,
int io_flags)
{
@@ -226,6 +227,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
desc->pg_doio = doio;
desc->pg_ioflags = io_flags;
desc->pg_error = 0;
+ desc->pg_lseg = NULL;
}

/**
@@ -288,8 +290,13 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
prev = nfs_list_entry(desc->pg_list.prev);
if (!nfs_can_coalesce_requests(prev, req))
return 0;
- } else
+ } else {
+ put_lseg(desc->pg_lseg);
desc->pg_base = req->wb_pgbase;
+ desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
+ req->wb_context,
+ IOMODE_READ);
+ }
nfs_list_remove_request(req);
nfs_list_add_request(req, &desc->pg_list);
desc->pg_count = newlen;
@@ -307,7 +314,8 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
nfs_page_array_len(desc->pg_base,
desc->pg_count),
desc->pg_count,
- desc->pg_ioflags);
+ desc->pg_ioflags,
+ desc->pg_lseg);
if (error < 0)
desc->pg_error = error;
else
@@ -345,6 +353,7 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
{
nfs_pageio_doio(desc);
+ put_lseg(desc->pg_lseg);
}

/**
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index f0a9578..dcd4356 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -264,7 +264,7 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
return 0;
}

-static void
+void
put_lseg(struct pnfs_layout_segment *lseg)
{
struct inode *ino;
@@ -285,6 +285,7 @@ put_lseg(struct pnfs_layout_segment *lseg)
pnfs_free_lseg_list(&free_me);
}
}
+EXPORT_SYMBOL_GPL(put_lseg);

static bool
should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
@@ -797,7 +798,6 @@ pnfs_update_layout(struct inode *ino,
out:
dprintk("%s end, state 0x%lx lseg %p\n", __func__,
nfsi->layout ? nfsi->layout->plh_flags : -1, lseg);
- put_lseg(lseg); /* STUB - callers currently ignore return value */
return lseg;
out_unlock:
spin_unlock(&ino->i_lock);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9a994bc..121d6a3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -146,6 +146,7 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);

/* pnfs.c */
void get_layout_hdr(struct pnfs_layout_hdr *lo);
+void put_lseg(struct pnfs_layout_segment *lseg);
struct pnfs_layout_segment *
pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
enum pnfs_iomode access_type);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index aedcaa7..c453164 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -20,17 +20,17 @@
#include <linux/nfs_page.h>

#include <asm/system.h>
+#include "pnfs.h"

#include "nfs4_fs.h"
#include "internal.h"
#include "iostat.h"
#include "fscache.h"
-#include "pnfs.h"

#define NFSDBG_FACILITY NFSDBG_PAGECACHE

-static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int);
-static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int);
+static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
+static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
static const struct rpc_call_ops nfs_read_partial_ops;
static const struct rpc_call_ops nfs_read_full_ops;

@@ -70,6 +70,7 @@ void nfs_readdata_free(struct nfs_read_data *p)
static void nfs_readdata_release(struct nfs_read_data *rdata)
{
put_nfs_open_context(rdata->args.context);
+ put_lseg(rdata->lseg);
nfs_readdata_free(rdata);
}

@@ -117,11 +118,11 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
LIST_HEAD(one_request);
struct nfs_page *new;
unsigned int len;
+ struct pnfs_layout_segment *lseg;

len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);
- pnfs_update_layout(inode, ctx, IOMODE_READ);
new = nfs_create_request(ctx, inode, page, 0, len);
if (IS_ERR(new)) {
unlock_page(page);
@@ -131,10 +132,12 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
zero_user_segment(page, len, PAGE_CACHE_SIZE);

nfs_list_add_request(new, &one_request);
+ lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
if (NFS_SERVER(inode)->rsize < PAGE_CACHE_SIZE)
- nfs_pagein_multi(inode, &one_request, 1, len, 0);
+ nfs_pagein_multi(inode, &one_request, 1, len, 0, lseg);
else
- nfs_pagein_one(inode, &one_request, 1, len, 0);
+ nfs_pagein_one(inode, &one_request, 1, len, 0, lseg);
+ put_lseg(lseg);
return 0;
}

@@ -160,7 +163,8 @@ static void nfs_readpage_release(struct nfs_page *req)
*/
static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
const struct rpc_call_ops *call_ops,
- unsigned int count, unsigned int offset)
+ unsigned int count, unsigned int offset,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = req->wb_context->path.dentry->d_inode;
int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
@@ -183,6 +187,7 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
data->req = req;
data->inode = inode;
data->cred = msg.rpc_cred;
+ data->lseg = get_lseg(lseg);

data->args.fh = NFS_FH(inode);
data->args.offset = req_offset(req) + offset;
@@ -240,7 +245,7 @@ nfs_async_read_error(struct list_head *head)
* won't see the new data until our attribute cache is updated. This is more
* or less conventional NFS client behavior.
*/
-static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags)
+static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags, struct pnfs_layout_segment *lseg)
{
struct nfs_page *req = nfs_list_entry(head->next);
struct page *page = req->wb_page;
@@ -280,7 +285,7 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne
if (nbytes < rsize)
rsize = nbytes;
ret2 = nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
- rsize, offset);
+ rsize, offset, lseg);
if (ret == 0)
ret = ret2;
offset += rsize;
@@ -300,7 +305,7 @@ out_bad:
return -ENOMEM;
}

-static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags)
+static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags, struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;
struct page **pages;
@@ -321,7 +326,7 @@ static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned
}
req = nfs_list_entry(data->pages.next);

- return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
+ return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0, lseg);
out_bad:
nfs_async_read_error(head);
return ret;
@@ -625,7 +630,6 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
if (ret == 0)
goto read_complete; /* all pages were read */

- pnfs_update_layout(inode, desc.ctx, IOMODE_READ);
if (rsize < PAGE_CACHE_SIZE)
nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
else
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c8278f4..004c28b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -879,7 +879,7 @@ static void nfs_redirty_request(struct nfs_page *req)
* Generate multiple small requests to write out a single
* contiguous dirty area on one page.
*/
-static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how)
+static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how, struct pnfs_layout_segment *lseg)
{
struct nfs_page *req = nfs_list_entry(head->next);
struct page *page = req->wb_page;
@@ -946,7 +946,7 @@ out_bad:
* This is the case if nfs_updatepage detects a conflicting request
* that has been written but not committed.
*/
-static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how)
+static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how, struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;
struct page **pages;
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d55cee7..2db0372 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -59,9 +59,10 @@ struct nfs_pageio_descriptor {
unsigned int pg_base;

struct inode *pg_inode;
- int (*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
+ int (*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
int pg_ioflags;
int pg_error;
+ struct pnfs_layout_segment *pg_lseg;
};

#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
@@ -79,7 +80,7 @@ extern int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
pgoff_t idx_start, unsigned int npages, int tag);
extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
struct inode *inode,
- int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
+ int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
size_t bsize,
int how);
extern int nfs_pageio_add_request(struct nfs_pageio_descriptor *,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 51bfadb..37e91c3 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1017,6 +1017,7 @@ struct nfs_read_data {
struct nfs_readargs args;
struct nfs_readres res;
unsigned long timestamp; /* For lease renewal */
+ struct pnfs_layout_segment *lseg;
struct page *page_array[NFS_PAGEVEC_SIZE];
};

--
1.7.2.3
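The reference counting this patch sets up can be modelled as follows:

- The pageio descriptor holds one reference to the current layout segment. That reference is dropped and re-acquired when a new run of requests starts, and dropped again in nfs_pageio_complete.
- Each nfs_read_data takes its own reference, which nfs_readdata_release drops.

This is a hypothetical userspace model that loosely follows the patch. The names struct lseg, pageio_start and update_layout are illustrative. Unlike the patch, the sketch also clears pg_lseg on completion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted layout segment; not the kernel's struct. */
struct lseg {
	int refcount;
};

static struct lseg *get_lseg(struct lseg *l)
{
	if (l)
		l->refcount++;
	return l;
}

/* NULL-safe, so callers need not special-case "no layout". */
static void put_lseg(struct lseg *l)
{
	if (l == NULL)
		return;
	assert(l->refcount > 0);
	if (--l->refcount == 0)
		free(l);
}

struct pageio {
	struct lseg *pg_lseg;	/* reference owned by the descriptor */
};

/* Stand-in for pnfs_update_layout(): returns a new reference or NULL. */
static struct lseg *update_layout(struct lseg *cached)
{
	return get_lseg(cached);
}

/* First request of a new run, as in nfs_pageio_do_add_request(). */
static void pageio_start(struct pageio *d, struct lseg *cached)
{
	put_lseg(d->pg_lseg);
	d->pg_lseg = update_layout(cached);
}

/* As in nfs_pageio_complete(), plus clearing the stale pointer. */
static void pageio_complete(struct pageio *d)
{
	put_lseg(d->pg_lseg);
	d->pg_lseg = NULL;
}
```

In the test below, the layout cache holds the initial reference. Every get is balanced by a put, so the segment is freed exactly once, by the final put.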


2011-02-15 16:02:13

by Christoph Hellwig

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

FYI that whole device layout cache thingy looks like a complete fucking
mess to me.

It's nothing but a trivial hash lookup which is only used in the file
layout driver. But instead of just having a hash allocated in the file
layout driver on module load, and a trivial open-coded lookup for it, it's
a massively overcomplicated set of routines. Please rip this stuff out
before doing further work in this area.

The patch below removes the maze of pointless abstractions and just
keeps a simple hash of deviceids in the filelayout driver.
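For comparison, here is a minimal single-threaded sketch of the kind of structure Christoph describes. It is a fixed-size hash of device IDs with two operations:

- an open-coded lookup that takes a reference;
- a put that unhashes and frees the entry on the last reference.

DEVICEID_SIZE matches the 16-byte deviceid4 from the NFSv4.1 protocol. The bucket count, hash function and all names are assumptions. A real driver would also need locking around the table, for example a spinlock or RCU.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEVICEID_SIZE	16	/* deviceid4 is a 16-byte opaque */
#define DEVID_HASH_BITS	5	/* assumed: 32 buckets */
#define DEVID_HASH_SIZE	(1u << DEVID_HASH_BITS)

/* Hypothetical per-device entry; not struct nfs4_file_layout_dsaddr. */
struct dsaddr {
	unsigned char id[DEVICEID_SIZE];
	int refcount;
	struct dsaddr *next;	/* hash chain */
};

static struct dsaddr *devid_hash[DEVID_HASH_SIZE];

static unsigned int devid_hashfn(const unsigned char *id)
{
	unsigned int h = 0;

	for (int i = 0; i < DEVICEID_SIZE; i++)
		h = h * 31 + id[i];
	return h & (DEVID_HASH_SIZE - 1);
}

/* Find an entry and take a reference, or return NULL. */
static struct dsaddr *find_get_deviceid(const unsigned char *id)
{
	for (struct dsaddr *d = devid_hash[devid_hashfn(id)]; d; d = d->next) {
		if (memcmp(d->id, id, DEVICEID_SIZE) == 0) {
			d->refcount++;
			return d;
		}
	}
	return NULL;
}

/* Add a new entry holding one reference for the caller. */
static struct dsaddr *insert_deviceid(const unsigned char *id)
{
	struct dsaddr *d = calloc(1, sizeof(*d));
	unsigned int b = devid_hashfn(id);

	if (d == NULL)
		return NULL;
	memcpy(d->id, id, DEVICEID_SIZE);
	d->refcount = 1;
	d->next = devid_hash[b];
	devid_hash[b] = d;
	return d;
}

/* Drop a reference; unhash and free the entry on the last one. */
static void put_deviceid(struct dsaddr *d)
{
	struct dsaddr **pp;

	if (--d->refcount > 0)
		return;
	for (pp = &devid_hash[devid_hashfn(d->id)]; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			break;
		}
	}
	free(d);
}
```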


Index: linux-2.6/fs/nfs/nfs4filelayout.c
===================================================================
--- linux-2.6.orig/fs/nfs/nfs4filelayout.c 2011-02-15 16:10:51.108421283 +0100
+++ linux-2.6/fs/nfs/nfs4filelayout.c 2011-02-15 16:55:22.087422176 +0100
@@ -40,32 +40,6 @@ MODULE_LICENSE("GPL");
MODULE_AUTHOR("Dean Hildebrand <[email protected]>");
MODULE_DESCRIPTION("The NFSv4 file layout driver");

-static int
-filelayout_set_layoutdriver(struct nfs_server *nfss)
-{
- int status = pnfs_alloc_init_deviceid_cache(nfss->nfs_client,
- nfs4_fl_free_deviceid_callback);
- if (status) {
- printk(KERN_WARNING "%s: deviceid cache could not be "
- "initialized\n", __func__);
- return status;
- }
- dprintk("%s: deviceid cache has been initialized successfully\n",
- __func__);
- return 0;
-}
-
-/* Clear out the layout by destroying its device list */
-static int
-filelayout_clear_layoutdriver(struct nfs_server *nfss)
-{
- dprintk("--> %s\n", __func__);
-
- if (nfss->nfs_client->cl_devid_cache)
- pnfs_put_deviceid_cache(nfss->nfs_client);
- return 0;
-}
-
/*
* filelayout_check_layout()
*
@@ -99,7 +73,7 @@ filelayout_check_layout(struct pnfs_layo
}

/* find and reference the deviceid */
- dsaddr = nfs4_fl_find_get_deviceid(nfss->nfs_client, id);
+ dsaddr = nfs4_fl_find_get_deviceid(id);
if (dsaddr == NULL) {
dsaddr = get_device_info(lo->plh_inode, id);
if (dsaddr == NULL)
@@ -134,7 +108,7 @@ out:
dprintk("--> %s returns %d\n", __func__, status);
return status;
out_put:
- pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache, &dsaddr->deviceid);
+ nfs4_fl_put_deviceid(dsaddr);
goto out;
}

@@ -243,23 +217,19 @@ filelayout_alloc_lseg(struct pnfs_layout
static void
filelayout_free_lseg(struct pnfs_layout_segment *lseg)
{
- struct nfs_server *nfss = NFS_SERVER(lseg->pls_layout->plh_inode);
struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);

dprintk("--> %s\n", __func__);
- pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache,
- &fl->dsaddr->deviceid);
+ nfs4_fl_put_deviceid(fl->dsaddr);
_filelayout_free_lseg(fl);
}

static struct pnfs_layoutdriver_type filelayout_type = {
- .id = LAYOUT_NFSV4_1_FILES,
- .name = "LAYOUT_NFSV4_1_FILES",
- .owner = THIS_MODULE,
- .set_layoutdriver = filelayout_set_layoutdriver,
- .clear_layoutdriver = filelayout_clear_layoutdriver,
- .alloc_lseg = filelayout_alloc_lseg,
- .free_lseg = filelayout_free_lseg,
+ .id = LAYOUT_NFSV4_1_FILES,
+ .name = "LAYOUT_NFSV4_1_FILES",
+ .owner = THIS_MODULE,
+ .alloc_lseg = filelayout_alloc_lseg,
+ .free_lseg = filelayout_free_lseg,
};

static int __init nfs4filelayout_init(void)
Index: linux-2.6/fs/nfs/nfs4filelayout.h
===================================================================
--- linux-2.6.orig/fs/nfs/nfs4filelayout.h 2011-02-15 16:30:25.270920897 +0100
+++ linux-2.6/fs/nfs/nfs4filelayout.h 2011-02-15 16:47:50.063445740 +0100
@@ -56,7 +56,9 @@ struct nfs4_pnfs_ds {
};

struct nfs4_file_layout_dsaddr {
- struct pnfs_deviceid_node deviceid;
+ struct hlist_node node;
+ struct nfs4_deviceid deviceid;
+ atomic_t ref;
u32 stripe_count;
u8 *stripe_indices;
u32 ds_num;
@@ -83,11 +85,11 @@ FILELAYOUT_LSEG(struct pnfs_layout_segme
generic_hdr);
}

-extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
extern void print_ds(struct nfs4_pnfs_ds *ds);
extern void print_deviceid(struct nfs4_deviceid *dev_id);
extern struct nfs4_file_layout_dsaddr *
-nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
+nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
+extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
struct nfs4_file_layout_dsaddr *
get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);

Index: linux-2.6/fs/nfs/nfs4filelayoutdev.c
===================================================================
--- linux-2.6.orig/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:23:03.480487362 +0100
+++ linux-2.6/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:55:02.894924739 +0100
@@ -37,6 +37,30 @@
#define NFSDBG_FACILITY NFSDBG_PNFS_LD

/*
+ * Device ID RCU cache. A device ID is unique per client ID and layout type.
+ */
+#define NFS4_FL_DEVICE_ID_HASH_BITS 5
+#define NFS4_FL_DEVICE_ID_HASH_SIZE (1 << NFS4_FL_DEVICE_ID_HASH_BITS)
+#define NFS4_FL_DEVICE_ID_HASH_MASK (NFS4_FL_DEVICE_ID_HASH_SIZE - 1)
+
+static inline u32
+nfs4_fl_deviceid_hash(struct nfs4_deviceid *id)
+{
+ unsigned char *cptr = (unsigned char *)id->data;
+ unsigned int nbytes = NFS4_DEVICEID4_SIZE;
+ u32 x = 0;
+
+ while (nbytes--) {
+ x *= 37;
+ x += *cptr++;
+ }
+ return x & NFS4_FL_DEVICE_ID_HASH_MASK;
+}
+
+static struct hlist_head filelayout_deviceid_cache[NFS4_FL_DEVICE_ID_HASH_SIZE];
+static DEFINE_SPINLOCK(filelayout_deviceid_lock);
+
+/*
* Data server cache
*
* Data servers can be mapped to different device ids.
@@ -122,7 +146,7 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
struct nfs4_pnfs_ds *ds;
int i;

- print_deviceid(&dsaddr->deviceid.de_id);
+ print_deviceid(&dsaddr->deviceid);

for (i = 0; i < dsaddr->ds_num; i++) {
ds = dsaddr->ds_list[i];
@@ -139,15 +163,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
kfree(dsaddr);
}

-void
-nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *device)
-{
- struct nfs4_file_layout_dsaddr *dsaddr =
- container_of(device, struct nfs4_file_layout_dsaddr, deviceid);
-
- nfs4_fl_free_deviceid(dsaddr);
-}
-
static struct nfs4_pnfs_ds *
nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port)
{
@@ -296,7 +311,7 @@ decode_device(struct inode *ino, struct
dsaddr->stripe_count = cnt;
dsaddr->ds_num = num;

- memcpy(&dsaddr->deviceid.de_id, &pdev->dev_id, sizeof(pdev->dev_id));
+ memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));

/* Go back an read stripe indices */
p = indicesp;
@@ -346,28 +361,37 @@ out_err:
}

/*
- * Decode the opaque device specified in 'dev'
- * and add it to the list of available devices.
- * If the deviceid is already cached, nfs4_add_deviceid will return
- * a pointer to the cached struct and throw away the new.
+ * Decode the opaque device specified in 'dev' and add it to the cache of
+ * available devices.
*/
-static struct nfs4_file_layout_dsaddr*
+static struct nfs4_file_layout_dsaddr *
decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
{
- struct nfs4_file_layout_dsaddr *dsaddr;
- struct pnfs_deviceid_node *d;
+ struct nfs4_file_layout_dsaddr *d, *new;
+ long hash;

- dsaddr = decode_device(inode, dev);
- if (!dsaddr) {
+ new = decode_device(inode, dev);
+ if (!new) {
printk(KERN_WARNING "%s: Could not decode or add device\n",
__func__);
return NULL;
}

- d = pnfs_add_deviceid(NFS_SERVER(inode)->nfs_client->cl_devid_cache,
- &dsaddr->deviceid);
+ spin_lock(&filelayout_deviceid_lock);
+ d = nfs4_fl_find_get_deviceid(&new->deviceid);
+ if (d) {
+ spin_unlock(&filelayout_deviceid_lock);
+ nfs4_fl_free_deviceid(new);
+ return d;
+ }
+
+ INIT_HLIST_NODE(&new->node);
+ atomic_set(&new->ref, 1);
+ hash = nfs4_fl_deviceid_hash(&new->deviceid);
+ hlist_add_head_rcu(&new->node, &filelayout_deviceid_cache[hash]);
+ spin_unlock(&filelayout_deviceid_lock);

- return container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
+ return new;
}

/*
@@ -442,12 +466,36 @@ out_free:
return dsaddr;
}

+void
+nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
+{
+ if (atomic_dec_and_lock(&dsaddr->ref, &filelayout_deviceid_lock)) {
+ hlist_del_rcu(&dsaddr->node);
+ spin_unlock(&filelayout_deviceid_lock);
+
+ synchronize_rcu();
+ nfs4_fl_free_deviceid(dsaddr);
+ }
+}
+
struct nfs4_file_layout_dsaddr *
-nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
+nfs4_fl_find_get_deviceid(struct nfs4_deviceid *id)
{
- struct pnfs_deviceid_node *d;
+ struct nfs4_file_layout_dsaddr *d;
+ struct hlist_node *n;
+ long hash = nfs4_fl_deviceid_hash(id);
+

- d = pnfs_find_get_deviceid(clp->cl_devid_cache, id);
- return (d == NULL) ? NULL :
- container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
+ if (!memcmp(&d->deviceid, id, sizeof(*id))) {
+ if (!atomic_inc_not_zero(&d->ref))
+ goto fail;
+ rcu_read_unlock();
+ return d;
+ }
+ }
+fail:
+ rcu_read_unlock();
+ return NULL;
}
Index: linux-2.6/fs/nfs/pnfs.c
===================================================================
--- linux-2.6.orig/fs/nfs/pnfs.c 2011-02-15 16:10:33.284421051 +0100
+++ linux-2.6/fs/nfs/pnfs.c 2011-02-15 16:21:47.115422052 +0100
@@ -74,10 +74,8 @@ find_pnfs_driver(u32 id)
void
unset_pnfs_layoutdriver(struct nfs_server *nfss)
{
- if (nfss->pnfs_curr_ld) {
- nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
+ if (nfss->pnfs_curr_ld)
module_put(nfss->pnfs_curr_ld->owner);
- }
nfss->pnfs_curr_ld = NULL;
}

@@ -115,13 +113,7 @@ set_pnfs_layoutdriver(struct nfs_server
goto out_no_driver;
}
server->pnfs_curr_ld = ld_type;
- if (ld_type->set_layoutdriver(server)) {
- printk(KERN_ERR
- "%s: Error initializing mount point for layout driver %u.\n",
- __func__, id);
- module_put(ld_type->owner);
- goto out_no_driver;
- }
+
dprintk("%s: pNFS module for %u set\n", __func__, id);
return;

@@ -828,138 +820,3 @@ out_forget_reply:
NFS_SERVER(ino)->pnfs_curr_ld->free_lseg(lseg);
goto out;
}
-
-/*
- * Device ID cache. Currently supports one layout type per struct nfs_client.
- * Add layout type to the lookup key to expand to support multiple types.
- */
-int
-pnfs_alloc_init_deviceid_cache(struct nfs_client *clp,
- void (*free_callback)(struct pnfs_deviceid_node *))
-{
- struct pnfs_deviceid_cache *c;
-
- c = kzalloc(sizeof(struct pnfs_deviceid_cache), GFP_KERNEL);
- if (!c)
- return -ENOMEM;
- spin_lock(&clp->cl_lock);
- if (clp->cl_devid_cache != NULL) {
- atomic_inc(&clp->cl_devid_cache->dc_ref);
- dprintk("%s [kref [%d]]\n", __func__,
- atomic_read(&clp->cl_devid_cache->dc_ref));
- kfree(c);
- } else {
- /* kzalloc initializes hlists */
- spin_lock_init(&c->dc_lock);
- atomic_set(&c->dc_ref, 1);
- c->dc_free_callback = free_callback;
- clp->cl_devid_cache = c;
- dprintk("%s [new]\n", __func__);
- }
- spin_unlock(&clp->cl_lock);
- return 0;
-}
-EXPORT_SYMBOL_GPL(pnfs_alloc_init_deviceid_cache);
-
-/*
- * Called from pnfs_layoutdriver_type->free_lseg
- * last layout segment reference frees deviceid
- */
-void
-pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
- struct pnfs_deviceid_node *devid)
-{
- struct nfs4_deviceid *id = &devid->de_id;
- struct pnfs_deviceid_node *d;
- struct hlist_node *n;
- long h = nfs4_deviceid_hash(id);
-
- dprintk("%s [%d]\n", __func__, atomic_read(&devid->de_ref));
- if (!atomic_dec_and_lock(&devid->de_ref, &c->dc_lock))
- return;
-
- hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[h], de_node)
- if (!memcmp(&d->de_id, id, sizeof(*id))) {
- hlist_del_rcu(&d->de_node);
- spin_unlock(&c->dc_lock);
- synchronize_rcu();
- c->dc_free_callback(devid);
- return;
- }
- spin_unlock(&c->dc_lock);
- /* Why wasn't it found in the list? */
- BUG();
-}
-EXPORT_SYMBOL_GPL(pnfs_put_deviceid);
-
-/* Find and reference a deviceid */
-struct pnfs_deviceid_node *
-pnfs_find_get_deviceid(struct pnfs_deviceid_cache *c, struct nfs4_deviceid *id)
-{
- struct pnfs_deviceid_node *d;
- struct hlist_node *n;
- long hash = nfs4_deviceid_hash(id);
-
- dprintk("--> %s hash %ld\n", __func__, hash);
- rcu_read_lock();
- hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
- if (!memcmp(&d->de_id, id, sizeof(*id))) {
- if (!atomic_inc_not_zero(&d->de_ref)) {
- goto fail;
- } else {
- rcu_read_unlock();
- return d;
- }
- }
- }
-fail:
- rcu_read_unlock();
- return NULL;
-}
-EXPORT_SYMBOL_GPL(pnfs_find_get_deviceid);
-
-/*
- * Add a deviceid to the cache.
- * GETDEVICEINFOs for same deviceid can race. If deviceid is found, discard new
- */
-struct pnfs_deviceid_node *
-pnfs_add_deviceid(struct pnfs_deviceid_cache *c, struct pnfs_deviceid_node *new)
-{
- struct pnfs_deviceid_node *d;
- long hash = nfs4_deviceid_hash(&new->de_id);
-
- dprintk("--> %s hash %ld\n", __func__, hash);
- spin_lock(&c->dc_lock);
- d = pnfs_find_get_deviceid(c, &new->de_id);
- if (d) {
- spin_unlock(&c->dc_lock);
- dprintk("%s [discard]\n", __func__);
- c->dc_free_callback(new);
- return d;
- }
- INIT_HLIST_NODE(&new->de_node);
- atomic_set(&new->de_ref, 1);
- hlist_add_head_rcu(&new->de_node, &c->dc_deviceids[hash]);
- spin_unlock(&c->dc_lock);
- dprintk("%s [new]\n", __func__);
- return new;
-}
-EXPORT_SYMBOL_GPL(pnfs_add_deviceid);
-
-void
-pnfs_put_deviceid_cache(struct nfs_client *clp)
-{
- struct pnfs_deviceid_cache *local = clp->cl_devid_cache;
-
- dprintk("--> %s ({%d})\n", __func__, atomic_read(&local->dc_ref));
- if (atomic_dec_and_lock(&local->dc_ref, &clp->cl_lock)) {
- int i;
- /* Verify cache is empty */
- for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE; i++)
- BUG_ON(!hlist_empty(&local->dc_deviceids[i]));
- clp->cl_devid_cache = NULL;
- spin_unlock(&clp->cl_lock);
- kfree(local);
- }
-}
-EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
Index: linux-2.6/fs/nfs/pnfs.h
===================================================================
--- linux-2.6.orig/fs/nfs/pnfs.h 2011-02-15 16:10:51.088421060 +0100
+++ linux-2.6/fs/nfs/pnfs.h 2011-02-15 16:21:34.995159583 +0100
@@ -61,8 +61,6 @@ struct pnfs_layoutdriver_type {
const u32 id;
const char *name;
struct module *owner;
- int (*set_layoutdriver) (struct nfs_server *);
- int (*clear_layoutdriver) (struct nfs_server *);
struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
void (*free_lseg) (struct pnfs_layout_segment *lseg);
};
@@ -90,52 +88,6 @@ struct pnfs_device {
unsigned int pglen;
};

-/*
- * Device ID RCU cache. A device ID is unique per client ID and layout type.
- */
-#define NFS4_DEVICE_ID_HASH_BITS 5
-#define NFS4_DEVICE_ID_HASH_SIZE (1 << NFS4_DEVICE_ID_HASH_BITS)
-#define NFS4_DEVICE_ID_HASH_MASK (NFS4_DEVICE_ID_HASH_SIZE - 1)
-
-static inline u32
-nfs4_deviceid_hash(struct nfs4_deviceid *id)
-{
- unsigned char *cptr = (unsigned char *)id->data;
- unsigned int nbytes = NFS4_DEVICEID4_SIZE;
- u32 x = 0;
-
- while (nbytes--) {
- x *= 37;
- x += *cptr++;
- }
- return x & NFS4_DEVICE_ID_HASH_MASK;
-}
-
-struct pnfs_deviceid_node {
- struct hlist_node de_node;
- struct nfs4_deviceid de_id;
- atomic_t de_ref;
-};
-
-struct pnfs_deviceid_cache {
- spinlock_t dc_lock;
- atomic_t dc_ref;
- void (*dc_free_callback)(struct pnfs_deviceid_node *);
- struct hlist_head dc_deviceids[NFS4_DEVICE_ID_HASH_SIZE];
-};
-
-extern int pnfs_alloc_init_deviceid_cache(struct nfs_client *,
- void (*free_callback)(struct pnfs_deviceid_node *));
-extern void pnfs_put_deviceid_cache(struct nfs_client *);
-extern struct pnfs_deviceid_node *pnfs_find_get_deviceid(
- struct pnfs_deviceid_cache *,
- struct nfs4_deviceid *);
-extern struct pnfs_deviceid_node *pnfs_add_deviceid(
- struct pnfs_deviceid_cache *,
- struct pnfs_deviceid_node *);
-extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
- struct pnfs_deviceid_node *devid);
-
extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);

Index: linux-2.6/include/linux/nfs_fs_sb.h
===================================================================
--- linux-2.6.orig/include/linux/nfs_fs_sb.h 2011-02-15 16:16:45.976420895 +0100
+++ linux-2.6/include/linux/nfs_fs_sb.h 2011-02-15 16:16:50.347380534 +0100
@@ -79,7 +79,6 @@ struct nfs_client {
u32 cl_exchange_flags;
struct nfs4_session *cl_session; /* sharred session */
struct list_head cl_layouts;
- struct pnfs_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
#endif /* CONFIG_NFS_V4_1 */

#ifdef CONFIG_NFS_FSCACHE

2011-02-14 19:18:58

by Andy Adamson

Subject: [PATCH 13/16] pnfs: wave 3: filelayout i/o helpers

From: Fred Isaman <[email protected]>

Prepare for filelayout_read_pagelist with helper functions that find the correct
data server, filehandle, and offset.

Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Dean Hildebrand <[email protected]>
Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Marc Eshel <[email protected]>
Signed-off-by: Mike Sager <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Tao Guo <[email protected]>
Signed-off-by: Tigran Mkrtchyan <[email protected]>
Signed-off-by: Tigran Mkrtchyan <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/nfs4filelayout.c | 29 +++++++++++++++++++
fs/nfs/nfs4filelayout.h | 7 ++++
fs/nfs/nfs4filelayoutdev.c | 67 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 98e26e0..1c34809 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -66,6 +66,35 @@ filelayout_clear_layoutdriver(struct nfs_server *nfss)
return 0;
}

+/* This function is used by the layout driver to calculate the
+ * offset of the file on the dserver based on whether the
+ * layout type is STRIPE_DENSE or STRIPE_SPARSE
+ */
+static loff_t
+filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+ struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
+
+ switch (flseg->stripe_type) {
+ case STRIPE_SPARSE:
+ return offset;
+
+ case STRIPE_DENSE:
+ {
+ u32 stripe_width;
+ u64 tmp, off;
+ u32 unit = flseg->stripe_unit;
+
+ stripe_width = unit * flseg->dsaddr->stripe_count;
+ tmp = off = offset - flseg->pattern_offset;
+ do_div(tmp, stripe_width);
+ return tmp * unit + do_div(off, unit);
+ }
+ }
+
+ BUG();
+}
+
/*
* filelayout_check_layout()
*
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index bbf60dd..9fef76e 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -83,9 +83,16 @@ FILELAYOUT_LSEG(struct pnfs_layout_segment *lseg)
generic_hdr);
}

+extern struct nfs_fh *
+nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
+
extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
extern void print_ds(struct nfs4_pnfs_ds *ds);
extern void print_deviceid(struct nfs4_deviceid *dev_id);
+u32 nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset);
+u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j);
+struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
+ u32 ds_idx);
extern struct nfs4_file_layout_dsaddr *
nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
struct nfs4_file_layout_dsaddr *
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 8e21e65..e8496f3 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -512,3 +512,70 @@ nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
return (d == NULL) ? NULL :
container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
}
+
+/*
+ * Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
+ * Then: ((res + fsi) % dsaddr->stripe_count)
+ */
+u32
+nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+ struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
+ u64 tmp;
+
+ tmp = offset - flseg->pattern_offset;
+ do_div(tmp, flseg->stripe_unit);
+ tmp += flseg->first_stripe_index;
+ return do_div(tmp, flseg->dsaddr->stripe_count);
+}
+
+u32
+nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j)
+{
+ return FILELAYOUT_LSEG(lseg)->dsaddr->stripe_indices[j];
+}
+
+struct nfs_fh *
+nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
+{
+ struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
+ u32 i;
+
+ if (flseg->stripe_type == STRIPE_SPARSE) {
+ if (flseg->num_fh == 1)
+ i = 0;
+ else if (flseg->num_fh == 0)
+ /* Use the MDS OPEN fh set in nfs_read_rpcsetup */
+ return NULL;
+ else
+ i = nfs4_fl_calc_ds_index(lseg, j);
+ } else
+ i = j;
+ return flseg->fh_array[i];
+}
+
+struct nfs4_pnfs_ds *
+nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+ struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
+ struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
+
+ if (ds == NULL) {
+ printk(KERN_ERR "%s: No data server for offset index %d\n",
+ __func__, ds_idx);
+ return NULL;
+ }
+
+ if (!ds->ds_clp) {
+ int err;
+
+ err = nfs4_ds_connect(NFS_SERVER(lseg->pls_layout->plh_inode),
+ dsaddr->ds_list[ds_idx]);
+ if (err) {
+ printk(KERN_ERR "%s nfs4_ds_connect error %d\n",
+ __func__, err);
+ return NULL;
+ }
+ }
+ return ds;
+}
--
1.7.2.3


2011-02-15 19:29:03

by Benny Halevy

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On 2011-02-15 14:17, Andy Adamson wrote:
>
> On Feb 15, 2011, at 11:37 AM, William A. (Andy) Adamson wrote:
>
>> On Tue, Feb 15, 2011 at 11:02 AM, Christoph Hellwig <[email protected]> wrote:
>>> FYI that whole device layout cache thingy looks like a complete fucking
>>> mess to me.
>>>
>>> It's nothing but a trivial hash lookup which is only used in the file
>>> layout driver. But instead of just having a hash allocated in the file
>>> layout driver on module load, and a trivial opencoded lookup for it, it's
>>> a massively overcomplicated set of routines. Please rip this stuff out
>>> before doing further work in this area.
>>>
>>> The patch below removes the maze of pointless abstractions and just
>>> keeps a simple hash of deviceids in the filelayout driver.
>>
>>
>> The abstract layer is so that this code is not replicated per layout
>> driver. Object and block drivers need to do the same task, and indeed
>> use this code in their prototypes.
>> That said, we don't have those other layout drivers in kernel, so
>> moving it all to the file layout driver is fine with me, so long as we
>> don't have to move it back once we get other drivers.

Why not move it back later on?
I don't want to replicate any code if it can be factored out and reused.

Benny

>>
>> Trond?
>>
>> -->Andy
>
> OK. We all agree. Move the deviceid cache to the filelayout driver until there is a need for a common cache.
>
> -->Andy
>
>>
>>>
>>>
>>> Index: linux-2.6/fs/nfs/nfs4filelayout.c
>>> ===================================================================
>>> --- linux-2.6.orig/fs/nfs/nfs4filelayout.c 2011-02-15 16:10:51.108421283 +0100
>>> +++ linux-2.6/fs/nfs/nfs4filelayout.c 2011-02-15 16:55:22.087422176 +0100
>>> @@ -40,32 +40,6 @@ MODULE_LICENSE("GPL");
>>> MODULE_AUTHOR("Dean Hildebrand <[email protected]>");
>>> MODULE_DESCRIPTION("The NFSv4 file layout driver");
>>>
>>> -static int
>>> -filelayout_set_layoutdriver(struct nfs_server *nfss)
>>> -{
>>> - int status = pnfs_alloc_init_deviceid_cache(nfss->nfs_client,
>>> - nfs4_fl_free_deviceid_callback);
>>> - if (status) {
>>> - printk(KERN_WARNING "%s: deviceid cache could not be "
>>> - "initialized\n", __func__);
>>> - return status;
>>> - }
>>> - dprintk("%s: deviceid cache has been initialized successfully\n",
>>> - __func__);
>>> - return 0;
>>> -}
>>> -
>>> -/* Clear out the layout by destroying its device list */
>>> -static int
>>> -filelayout_clear_layoutdriver(struct nfs_server *nfss)
>>> -{
>>> - dprintk("--> %s\n", __func__);
>>> -
>>> - if (nfss->nfs_client->cl_devid_cache)
>>> - pnfs_put_deviceid_cache(nfss->nfs_client);
>>> - return 0;
>>> -}
>>> -
>>> /*
>>> * filelayout_check_layout()
>>> *
>>> @@ -99,7 +73,7 @@ filelayout_check_layout(struct pnfs_layo
>>> }
>>>
>>> /* find and reference the deviceid */
>>> - dsaddr = nfs4_fl_find_get_deviceid(nfss->nfs_client, id);
>>> + dsaddr = nfs4_fl_find_get_deviceid(id);
>>> if (dsaddr == NULL) {
>>> dsaddr = get_device_info(lo->plh_inode, id);
>>> if (dsaddr == NULL)
>>> @@ -134,7 +108,7 @@ out:
>>> dprintk("--> %s returns %d\n", __func__, status);
>>> return status;
>>> out_put:
>>> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache, &dsaddr->deviceid);
>>> + nfs4_fl_put_deviceid(dsaddr);
>>> goto out;
>>> }
>>>
>>> @@ -243,23 +217,19 @@ filelayout_alloc_lseg(struct pnfs_layout
>>> static void
>>> filelayout_free_lseg(struct pnfs_layout_segment *lseg)
>>> {
>>> - struct nfs_server *nfss = NFS_SERVER(lseg->pls_layout->plh_inode);
>>> struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
>>>
>>> dprintk("--> %s\n", __func__);
>>> - pnfs_put_deviceid(nfss->nfs_client->cl_devid_cache,
>>> - &fl->dsaddr->deviceid);
>>> + nfs4_fl_put_deviceid(fl->dsaddr);
>>> _filelayout_free_lseg(fl);
>>> }
>>>
>>> static struct pnfs_layoutdriver_type filelayout_type = {
>>> - .id = LAYOUT_NFSV4_1_FILES,
>>> - .name = "LAYOUT_NFSV4_1_FILES",
>>> - .owner = THIS_MODULE,
>>> - .set_layoutdriver = filelayout_set_layoutdriver,
>>> - .clear_layoutdriver = filelayout_clear_layoutdriver,
>>> - .alloc_lseg = filelayout_alloc_lseg,
>>> - .free_lseg = filelayout_free_lseg,
>>> + .id = LAYOUT_NFSV4_1_FILES,
>>> + .name = "LAYOUT_NFSV4_1_FILES",
>>> + .owner = THIS_MODULE,
>>> + .alloc_lseg = filelayout_alloc_lseg,
>>> + .free_lseg = filelayout_free_lseg,
>>> };
>>>
>>> static int __init nfs4filelayout_init(void)
>>> Index: linux-2.6/fs/nfs/nfs4filelayout.h
>>> ===================================================================
>>> --- linux-2.6.orig/fs/nfs/nfs4filelayout.h 2011-02-15 16:30:25.270920897 +0100
>>> +++ linux-2.6/fs/nfs/nfs4filelayout.h 2011-02-15 16:47:50.063445740 +0100
>>> @@ -56,7 +56,9 @@ struct nfs4_pnfs_ds {
>>> };
>>>
>>> struct nfs4_file_layout_dsaddr {
>>> - struct pnfs_deviceid_node deviceid;
>>> + struct hlist_node node;
>>> + struct nfs4_deviceid deviceid;
>>> + atomic_t ref;
>>> u32 stripe_count;
>>> u8 *stripe_indices;
>>> u32 ds_num;
>>> @@ -83,11 +85,11 @@ FILELAYOUT_LSEG(struct pnfs_layout_segme
>>> generic_hdr);
>>> }
>>>
>>> -extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
>>> extern void print_ds(struct nfs4_pnfs_ds *ds);
>>> extern void print_deviceid(struct nfs4_deviceid *dev_id);
>>> extern struct nfs4_file_layout_dsaddr *
>>> -nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
>>> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
>>> +extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
>>> struct nfs4_file_layout_dsaddr *
>>> get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
>>>
>>> Index: linux-2.6/fs/nfs/nfs4filelayoutdev.c
>>> ===================================================================
>>> --- linux-2.6.orig/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:23:03.480487362 +0100
>>> +++ linux-2.6/fs/nfs/nfs4filelayoutdev.c 2011-02-15 16:55:02.894924739 +0100
>>> @@ -37,6 +37,30 @@
>>> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>>>
>>> /*
>>> + * Device ID RCU cache. A device ID is unique per client ID and layout type.
>>> + */
>>> +#define NFS4_FL_DEVICE_ID_HASH_BITS 5
>>> +#define NFS4_FL_DEVICE_ID_HASH_SIZE (1 << NFS4_FL_DEVICE_ID_HASH_BITS)
>>> +#define NFS4_FL_DEVICE_ID_HASH_MASK (NFS4_FL_DEVICE_ID_HASH_SIZE - 1)
>>> +
>>> +static inline u32
>>> +nfs4_fl_deviceid_hash(struct nfs4_deviceid *id)
>>> +{
>>> + unsigned char *cptr = (unsigned char *)id->data;
>>> + unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>>> + u32 x = 0;
>>> +
>>> + while (nbytes--) {
>>> + x *= 37;
>>> + x += *cptr++;
>>> + }
>>> + return x & NFS4_FL_DEVICE_ID_HASH_MASK;
>>> +}
>>> +
>>> +static struct hlist_head filelayout_deviceid_cache[NFS4_FL_DEVICE_ID_HASH_SIZE];
>>> +static DEFINE_SPINLOCK(filelayout_deviceid_lock);
>>> +
>>> +/*
>>> * Data server cache
>>> *
>>> * Data servers can be mapped to different device ids.
>>> @@ -122,7 +146,7 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
>>> struct nfs4_pnfs_ds *ds;
>>> int i;
>>>
>>> - print_deviceid(&dsaddr->deviceid.de_id);
>>> + print_deviceid(&dsaddr->deviceid);
>>>
>>> for (i = 0; i < dsaddr->ds_num; i++) {
>>> ds = dsaddr->ds_list[i];
>>> @@ -139,15 +163,6 @@ nfs4_fl_free_deviceid(struct nfs4_file_l
>>> kfree(dsaddr);
>>> }
>>>
>>> -void
>>> -nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *device)
>>> -{
>>> - struct nfs4_file_layout_dsaddr *dsaddr =
>>> - container_of(device, struct nfs4_file_layout_dsaddr, deviceid);
>>> -
>>> - nfs4_fl_free_deviceid(dsaddr);
>>> -}
>>> -
>>> static struct nfs4_pnfs_ds *
>>> nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port)
>>> {
>>> @@ -296,7 +311,7 @@ decode_device(struct inode *ino, struct
>>> dsaddr->stripe_count = cnt;
>>> dsaddr->ds_num = num;
>>>
>>> - memcpy(&dsaddr->deviceid.de_id, &pdev->dev_id, sizeof(pdev->dev_id));
>>> + memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));
>>>
>>> /* Go back an read stripe indices */
>>> p = indicesp;
>>> @@ -346,28 +361,37 @@ out_err:
>>> }
>>>
>>> /*
>>> - * Decode the opaque device specified in 'dev'
>>> - * and add it to the list of available devices.
>>> - * If the deviceid is already cached, nfs4_add_deviceid will return
>>> - * a pointer to the cached struct and throw away the new.
>>> + * Decode the opaque device specified in 'dev' and add it to the cache of
>>> + * available devices.
>>> */
>>> -static struct nfs4_file_layout_dsaddr*
>>> +static struct nfs4_file_layout_dsaddr *
>>> decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
>>> {
>>> - struct nfs4_file_layout_dsaddr *dsaddr;
>>> - struct pnfs_deviceid_node *d;
>>> + struct nfs4_file_layout_dsaddr *d, *new;
>>> + long hash;
>>>
>>> - dsaddr = decode_device(inode, dev);
>>> - if (!dsaddr) {
>>> + new = decode_device(inode, dev);
>>> + if (!new) {
>>> printk(KERN_WARNING "%s: Could not decode or add device\n",
>>> __func__);
>>> return NULL;
>>> }
>>>
>>> - d = pnfs_add_deviceid(NFS_SERVER(inode)->nfs_client->cl_devid_cache,
>>> - &dsaddr->deviceid);
>>> + spin_lock(&filelayout_deviceid_lock);
>>> + d = nfs4_fl_find_get_deviceid(&new->deviceid);
>>> + if (d) {
>>> + spin_unlock(&filelayout_deviceid_lock);
>>> + nfs4_fl_free_deviceid(new);
>>> + return d;
>>> + }
>>> +
>>> + INIT_HLIST_NODE(&new->node);
>>> + atomic_set(&new->ref, 1);
>>> + hash = nfs4_fl_deviceid_hash(&new->deviceid);
>>> + hlist_add_head_rcu(&new->node, &filelayout_deviceid_cache[hash]);
>>> + spin_unlock(&filelayout_deviceid_lock);
>>>
>>> - return container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
>>> + return new;
>>> }
>>>
>>> /*
>>> @@ -442,12 +466,36 @@ out_free:
>>> return dsaddr;
>>> }
>>>
>>> +void
>>> +nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
>>> +{
>>> + if (atomic_dec_and_lock(&dsaddr->ref, &filelayout_deviceid_lock)) {
>>> + hlist_del_rcu(&dsaddr->node);
>>> + spin_unlock(&filelayout_deviceid_lock);
>>> +
>>> + synchronize_rcu();
>>> + nfs4_fl_free_deviceid(dsaddr);
>>> + }
>>> +}
>>> +
>>> struct nfs4_file_layout_dsaddr *
>>> -nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
>>> +nfs4_fl_find_get_deviceid(struct nfs4_deviceid *id)
>>> {
>>> - struct pnfs_deviceid_node *d;
>>> + struct nfs4_file_layout_dsaddr *d;
>>> + struct hlist_node *n;
>>> + long hash = nfs4_fl_deviceid_hash(id);
>>> +
>>>
>>> - d = pnfs_find_get_deviceid(clp->cl_devid_cache, id);
>>> - return (d == NULL) ? NULL :
>>> - container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
>>> + rcu_read_lock();
>>> + hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
>>> + if (!memcmp(&d->deviceid, id, sizeof(*id))) {
>>> + if (!atomic_inc_not_zero(&d->ref))
>>> + goto fail;
>>> + rcu_read_unlock();
>>> + return d;
>>> + }
>>> + }
>>> +fail:
>>> + rcu_read_unlock();
>>> + return NULL;
>>> }
>>> Index: linux-2.6/fs/nfs/pnfs.c
>>> ===================================================================
>>> --- linux-2.6.orig/fs/nfs/pnfs.c 2011-02-15 16:10:33.284421051 +0100
>>> +++ linux-2.6/fs/nfs/pnfs.c 2011-02-15 16:21:47.115422052 +0100
>>> @@ -74,10 +74,8 @@ find_pnfs_driver(u32 id)
>>> void
>>> unset_pnfs_layoutdriver(struct nfs_server *nfss)
>>> {
>>> - if (nfss->pnfs_curr_ld) {
>>> - nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
>>> + if (nfss->pnfs_curr_ld)
>>> module_put(nfss->pnfs_curr_ld->owner);
>>> - }
>>> nfss->pnfs_curr_ld = NULL;
>>> }
>>>
>>> @@ -115,13 +113,7 @@ set_pnfs_layoutdriver(struct nfs_server
>>> goto out_no_driver;
>>> }
>>> server->pnfs_curr_ld = ld_type;
>>> - if (ld_type->set_layoutdriver(server)) {
>>> - printk(KERN_ERR
>>> - "%s: Error initializing mount point for layout driver %u.\n",
>>> - __func__, id);
>>> - module_put(ld_type->owner);
>>> - goto out_no_driver;
>>> - }
>>> +
>>> dprintk("%s: pNFS module for %u set\n", __func__, id);
>>> return;
>>>
>>> @@ -828,138 +820,3 @@ out_forget_reply:
>>> NFS_SERVER(ino)->pnfs_curr_ld->free_lseg(lseg);
>>> goto out;
>>> }
>>> -
>>> -/*
>>> - * Device ID cache. Currently supports one layout type per struct nfs_client.
>>> - * Add layout type to the lookup key to expand to support multiple types.
>>> - */
>>> -int
>>> -pnfs_alloc_init_deviceid_cache(struct nfs_client *clp,
>>> - void (*free_callback)(struct pnfs_deviceid_node *))
>>> -{
>>> - struct pnfs_deviceid_cache *c;
>>> -
>>> - c = kzalloc(sizeof(struct pnfs_deviceid_cache), GFP_KERNEL);
>>> - if (!c)
>>> - return -ENOMEM;
>>> - spin_lock(&clp->cl_lock);
>>> - if (clp->cl_devid_cache != NULL) {
>>> - atomic_inc(&clp->cl_devid_cache->dc_ref);
>>> - dprintk("%s [kref [%d]]\n", __func__,
>>> - atomic_read(&clp->cl_devid_cache->dc_ref));
>>> - kfree(c);
>>> - } else {
>>> - /* kzalloc initializes hlists */
>>> - spin_lock_init(&c->dc_lock);
>>> - atomic_set(&c->dc_ref, 1);
>>> - c->dc_free_callback = free_callback;
>>> - clp->cl_devid_cache = c;
>>> - dprintk("%s [new]\n", __func__);
>>> - }
>>> - spin_unlock(&clp->cl_lock);
>>> - return 0;
>>> -}
>>> -EXPORT_SYMBOL_GPL(pnfs_alloc_init_deviceid_cache);
>>> -
>>> -/*
>>> - * Called from pnfs_layoutdriver_type->free_lseg
>>> - * last layout segment reference frees deviceid
>>> - */
>>> -void
>>> -pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
>>> - struct pnfs_deviceid_node *devid)
>>> -{
>>> - struct nfs4_deviceid *id = &devid->de_id;
>>> - struct pnfs_deviceid_node *d;
>>> - struct hlist_node *n;
>>> - long h = nfs4_deviceid_hash(id);
>>> -
>>> - dprintk("%s [%d]\n", __func__, atomic_read(&devid->de_ref));
>>> - if (!atomic_dec_and_lock(&devid->de_ref, &c->dc_lock))
>>> - return;
>>> -
>>> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[h], de_node)
>>> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
>>> - hlist_del_rcu(&d->de_node);
>>> - spin_unlock(&c->dc_lock);
>>> - synchronize_rcu();
>>> - c->dc_free_callback(devid);
>>> - return;
>>> - }
>>> - spin_unlock(&c->dc_lock);
>>> - /* Why wasn't it found in the list? */
>>> - BUG();
>>> -}
>>> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid);
>>> -
>>> -/* Find and reference a deviceid */
>>> -struct pnfs_deviceid_node *
>>> -pnfs_find_get_deviceid(struct pnfs_deviceid_cache *c, struct nfs4_deviceid *id)
>>> -{
>>> - struct pnfs_deviceid_node *d;
>>> - struct hlist_node *n;
>>> - long hash = nfs4_deviceid_hash(id);
>>> -
>>> - dprintk("--> %s hash %ld\n", __func__, hash);
>>> - rcu_read_lock();
>>> - hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
>>> - if (!memcmp(&d->de_id, id, sizeof(*id))) {
>>> - if (!atomic_inc_not_zero(&d->de_ref)) {
>>> - goto fail;
>>> - } else {
>>> - rcu_read_unlock();
>>> - return d;
>>> - }
>>> - }
>>> - }
>>> -fail:
>>> - rcu_read_unlock();
>>> - return NULL;
>>> -}
>>> -EXPORT_SYMBOL_GPL(pnfs_find_get_deviceid);
>>> -
>>> -/*
>>> - * Add a deviceid to the cache.
>>> - * GETDEVICEINFOs for same deviceid can race. If deviceid is found, discard new
>>> - */
>>> -struct pnfs_deviceid_node *
>>> -pnfs_add_deviceid(struct pnfs_deviceid_cache *c, struct pnfs_deviceid_node *new)
>>> -{
>>> - struct pnfs_deviceid_node *d;
>>> - long hash = nfs4_deviceid_hash(&new->de_id);
>>> -
>>> - dprintk("--> %s hash %ld\n", __func__, hash);
>>> - spin_lock(&c->dc_lock);
>>> - d = pnfs_find_get_deviceid(c, &new->de_id);
>>> - if (d) {
>>> - spin_unlock(&c->dc_lock);
>>> - dprintk("%s [discard]\n", __func__);
>>> - c->dc_free_callback(new);
>>> - return d;
>>> - }
>>> - INIT_HLIST_NODE(&new->de_node);
>>> - atomic_set(&new->de_ref, 1);
>>> - hlist_add_head_rcu(&new->de_node, &c->dc_deviceids[hash]);
>>> - spin_unlock(&c->dc_lock);
>>> - dprintk("%s [new]\n", __func__);
>>> - return new;
>>> -}
>>> -EXPORT_SYMBOL_GPL(pnfs_add_deviceid);
>>> -
>>> -void
>>> -pnfs_put_deviceid_cache(struct nfs_client *clp)
>>> -{
>>> - struct pnfs_deviceid_cache *local = clp->cl_devid_cache;
>>> -
>>> - dprintk("--> %s ({%d})\n", __func__, atomic_read(&local->dc_ref));
>>> - if (atomic_dec_and_lock(&local->dc_ref, &clp->cl_lock)) {
>>> - int i;
>>> - /* Verify cache is empty */
>>> - for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE; i++)
>>> - BUG_ON(!hlist_empty(&local->dc_deviceids[i]));
>>> - clp->cl_devid_cache = NULL;
>>> - spin_unlock(&clp->cl_lock);
>>> - kfree(local);
>>> - }
>>> -}
>>> -EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
>>> Index: linux-2.6/fs/nfs/pnfs.h
>>> ===================================================================
>>> --- linux-2.6.orig/fs/nfs/pnfs.h 2011-02-15 16:10:51.088421060 +0100
>>> +++ linux-2.6/fs/nfs/pnfs.h 2011-02-15 16:21:34.995159583 +0100
>>> @@ -61,8 +61,6 @@ struct pnfs_layoutdriver_type {
>>> const u32 id;
>>> const char *name;
>>> struct module *owner;
>>> - int (*set_layoutdriver) (struct nfs_server *);
>>> - int (*clear_layoutdriver) (struct nfs_server *);
>>> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
>>> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>>> };
>>> @@ -90,52 +88,6 @@ struct pnfs_device {
>>> unsigned int pglen;
>>> };
>>>
>>> -/*
>>> - * Device ID RCU cache. A device ID is unique per client ID and layout type.
>>> - */
>>> -#define NFS4_DEVICE_ID_HASH_BITS 5
>>> -#define NFS4_DEVICE_ID_HASH_SIZE (1 << NFS4_DEVICE_ID_HASH_BITS)
>>> -#define NFS4_DEVICE_ID_HASH_MASK (NFS4_DEVICE_ID_HASH_SIZE - 1)
>>> -
>>> -static inline u32
>>> -nfs4_deviceid_hash(struct nfs4_deviceid *id)
>>> -{
>>> - unsigned char *cptr = (unsigned char *)id->data;
>>> - unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>>> - u32 x = 0;
>>> -
>>> - while (nbytes--) {
>>> - x *= 37;
>>> - x += *cptr++;
>>> - }
>>> - return x & NFS4_DEVICE_ID_HASH_MASK;
>>> -}
>>> -
>>> -struct pnfs_deviceid_node {
>>> - struct hlist_node de_node;
>>> - struct nfs4_deviceid de_id;
>>> - atomic_t de_ref;
>>> -};
>>> -
>>> -struct pnfs_deviceid_cache {
>>> - spinlock_t dc_lock;
>>> - atomic_t dc_ref;
>>> - void (*dc_free_callback)(struct pnfs_deviceid_node *);
>>> - struct hlist_head dc_deviceids[NFS4_DEVICE_ID_HASH_SIZE];
>>> -};
>>> -
>>> -extern int pnfs_alloc_init_deviceid_cache(struct nfs_client *,
>>> - void (*free_callback)(struct pnfs_deviceid_node *));
>>> -extern void pnfs_put_deviceid_cache(struct nfs_client *);
>>> -extern struct pnfs_deviceid_node *pnfs_find_get_deviceid(
>>> - struct pnfs_deviceid_cache *,
>>> - struct nfs4_deviceid *);
>>> -extern struct pnfs_deviceid_node *pnfs_add_deviceid(
>>> - struct pnfs_deviceid_cache *,
>>> - struct pnfs_deviceid_node *);
>>> -extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
>>> - struct pnfs_deviceid_node *devid);
>>> -
>>> extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>>> extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>>>
>>> Index: linux-2.6/include/linux/nfs_fs_sb.h
>>> ===================================================================
>>> --- linux-2.6.orig/include/linux/nfs_fs_sb.h 2011-02-15 16:16:45.976420895 +0100
>>> +++ linux-2.6/include/linux/nfs_fs_sb.h 2011-02-15 16:16:50.347380534 +0100
>>> @@ -79,7 +79,6 @@ struct nfs_client {
>>> u32 cl_exchange_flags;
>>> struct nfs4_session *cl_session; /* sharred session */
>>> struct list_head cl_layouts;
>>> - struct pnfs_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
>>> #endif /* CONFIG_NFS_V4_1 */
>>>
>>> #ifdef CONFIG_NFS_FSCACHE
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>

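The hunks above retire the generic pnfs_deviceid_cache in favour of a per-driver lookup. That lookup takes a reference with atomic_inc_not_zero() inside an RCU read-side section. The userspace sketch below models three of the ideas involved:

- the multiply-by-37 hash masked to 32 buckets (NFS4_DEVICE_ID_HASH_BITS = 5),
- taking a reference only while the count is still non-zero,
- the add-or-discard step that the removed pnfs_add_deviceid() used to resolve racing GETDEVICEINFO replies.

It is a hedged, single-threaded illustration. RCU and dc_lock are omitted, and C11 atomics stand in for the kernel's atomic_t. All type and function names (devnode, inc_not_zero, find_get, add_or_get) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVICEID_SIZE 16            /* NFS4_DEVICEID4_SIZE */
#define HASH_BITS 5                 /* NFS4_DEVICE_ID_HASH_BITS */
#define HASH_SIZE (1u << HASH_BITS)
#define HASH_MASK (HASH_SIZE - 1)

struct deviceid { unsigned char data[DEVICEID_SIZE]; };

/* Hypothetical cache entry; 'next' replaces the kernel hlist_node. */
struct devnode {
	struct devnode *next;
	struct deviceid id;
	atomic_int ref;
};

/* Same arithmetic as the removed nfs4_deviceid_hash(). */
static uint32_t deviceid_hash(const struct deviceid *id)
{
	uint32_t x = 0;
	for (size_t i = 0; i < DEVICEID_SIZE; i++)
		x = x * 37 + id->data[i];
	return x & HASH_MASK;
}

/* Userspace analogue of atomic_inc_not_zero(): only take a reference
 * if the entry has not already dropped to zero (i.e. is being freed). */
static bool inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);
	while (old != 0) {
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Find an entry by ID and return it referenced, or NULL. */
static struct devnode *find_get(struct devnode *table[],
				const struct deviceid *id)
{
	for (struct devnode *d = table[deviceid_hash(id)]; d; d = d->next) {
		if (memcmp(&d->id, id, sizeof(*id)) == 0)
			return inc_not_zero(&d->ref) ? d : NULL;
	}
	return NULL;
}

/* Insert 'new' unless an equal, live ID is already cached. In that case
 * the cached entry is returned referenced and the caller frees 'new'. */
static struct devnode *add_or_get(struct devnode *table[],
				  struct devnode *new)
{
	struct devnode *d = find_get(table, &new->id);
	if (d)
		return d;
	uint32_t h = deviceid_hash(&new->id);
	atomic_init(&new->ref, 1);
	new->next = table[h];
	table[h] = new;
	return new;
}
```

Two replies for the same device ID therefore leave a single cached entry holding two references, as the removed kernel code intended.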
2011-02-14 19:18:48

by Andy Adamson

Subject: [PATCH 08/16] pnfs: wave 3: lseg refcounting

From: Fred Isaman <[email protected]>

Prepare put_lseg and get_lseg to be called from the pNFS I/O code.
Pull common code out of put_lseg_locked into _put_lseg_common so that it
can also be called from put_lseg.

Signed-off-by: Fred Isaman <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/pnfs.c | 53 +++++++++++++++++++++++++++++++++++++++++------------
fs/nfs/pnfs.h | 20 ++++++++++++++++++++
2 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7d031cd..f0a9578 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -230,6 +230,21 @@ static void free_lseg(struct pnfs_layout_segment *lseg)
put_layout_hdr(NFS_I(ino)->layout);
}

+static void
+_put_lseg_common(struct pnfs_layout_segment *lseg)
+{
+ struct inode *ino = lseg->pls_layout->plh_inode;
+
+ BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
+ list_del(&lseg->pls_list);
+ if (list_empty(&lseg->pls_layout->plh_segs)) {
+ set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
+ /* Matched by initial refcount set in alloc_init_layout_hdr */
+ put_layout_hdr_locked(lseg->pls_layout);
+ }
+ rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
+}
+
/* The use of tmp_list is necessary because pnfs_curr_ld->free_lseg
* could sleep, so must be called outside of the lock.
* Returns 1 if object was removed, otherwise return 0.
@@ -242,22 +257,35 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
atomic_read(&lseg->pls_refcount),
test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
if (atomic_dec_and_test(&lseg->pls_refcount)) {
- struct inode *ino = lseg->pls_layout->plh_inode;
-
- BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
- list_del(&lseg->pls_list);
- if (list_empty(&lseg->pls_layout->plh_segs)) {
- set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
- /* Matched by initial refcount set in alloc_init_layout_hdr */
- put_layout_hdr_locked(lseg->pls_layout);
- }
- rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
+ _put_lseg_common(lseg);
list_add(&lseg->pls_list, tmp_list);
return 1;
}
return 0;
}

+static void
+put_lseg(struct pnfs_layout_segment *lseg)
+{
+ struct inode *ino;
+
+ if (!lseg)
+ return;
+
+ dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+ atomic_read(&lseg->pls_refcount),
+ test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
+ ino = lseg->pls_layout->plh_inode;
+ if (atomic_dec_and_lock(&lseg->pls_refcount, &ino->i_lock)) {
+ LIST_HEAD(free_me);
+
+ _put_lseg_common(lseg);
+ list_add(&lseg->pls_list, &free_me);
+ spin_unlock(&ino->i_lock);
+ pnfs_free_lseg_list(&free_me);
+ }
+}
+
static bool
should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
{
@@ -689,7 +717,7 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo, u32 iomode)
list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags) &&
is_matching_lseg(lseg, iomode)) {
- ret = lseg;
+ ret = get_lseg(lseg);
break;
}
if (cmp_layout(iomode, lseg->pls_range.iomode) > 0)
@@ -769,6 +797,7 @@ pnfs_update_layout(struct inode *ino,
out:
dprintk("%s end, state 0x%lx lseg %p\n", __func__,
nfsi->layout ? nfsi->layout->plh_flags : -1, lseg);
+ put_lseg(lseg); /* STUB - callers currently ignore return value */
return lseg;
out_unlock:
spin_unlock(&ino->i_lock);
@@ -821,7 +850,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
}
init_lseg(lo, lseg);
lseg->pls_range = res->range;
- *lgp->lsegpp = lseg;
+ *lgp->lsegpp = get_lseg(lseg);
pnfs_insert_layout(lo, lseg);

if (res->return_on_close) {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index e2612ea..9a994bc 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -177,6 +177,16 @@ static inline int lo_fail_bit(u32 iomode)
NFS_LAYOUT_RW_FAILED : NFS_LAYOUT_RO_FAILED;
}

+static inline struct pnfs_layout_segment *
+get_lseg(struct pnfs_layout_segment *lseg)
+{
+ if (lseg) {
+ atomic_inc(&lseg->pls_refcount);
+ smp_mb__after_atomic_inc();
+ }
+ return lseg;
+}
+
/* Return true if a layout driver is being used for this mountpoint */
static inline int pnfs_enabled_sb(struct nfs_server *nfss)
{
@@ -194,6 +204,16 @@ static inline void pnfs_destroy_layout(struct nfs_inode *nfsi)
}

static inline struct pnfs_layout_segment *
+get_lseg(struct pnfs_layout_segment *lseg)
+{
+ return NULL;
+}
+
+static inline void put_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline struct pnfs_layout_segment *
pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
enum pnfs_iomode access_type)
{
--
1.7.2.3


2011-02-14 23:36:28

by Myklebust, Trond

Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read

On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Separate the rpc run portion of nfs_read_rpcsetup into a new function
> nfs_initiate_read that is called for normal NFS I/O.
>
> Add a pNFS read_pagelist function that is called instead of nfs_initiate_read
> for pNFS reads.
>
> Reported-by: Alexandros Batsakis <[email protected]>
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Boaz Harrosh <[email protected]>
> Signed-off-by: Dean Hildebrand <[email protected]>
> Signed-off-by: Fred Isaman <[email protected]>
> Signed-off-by: Fred Isaman <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> Signed-off-by: Mike Sager <[email protected]>
> Signed-off-by: Mingyang Guo <[email protected]>
> Signed-off-by: Ricardo Labiaga <[email protected]>
> Signed-off-by: Tao Guo <[email protected]>
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfs/internal.h | 2 +
> fs/nfs/pnfs.c | 28 ++++++++++++++++++
> fs/nfs/pnfs.h | 20 +++++++++++++
> fs/nfs/read.c | 66 +++++++++++++++++++++++++++----------------
> include/linux/nfs_iostat.h | 1 +
> include/linux/nfs_xdr.h | 1 +
> 6 files changed, 93 insertions(+), 25 deletions(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index cf9fdbd..335755d 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -262,6 +262,8 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
> #endif
>
> /* read.c */
> +extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
> + const struct rpc_call_ops *call_ops);
> extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
>
> /* write.c */
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index f200e34..6f4a5ab 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -30,6 +30,7 @@
> #include <linux/nfs_fs.h>
> #include "internal.h"
> #include "pnfs.h"
> +#include "iostat.h"
>
> #define NFSDBG_FACILITY NFSDBG_PNFS
>
> @@ -891,6 +892,33 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
> }
>
> /*
> + * Call the appropriate parallel I/O subsystem read function.
> + */
> +enum pnfs_try_status
> +pnfs_try_to_read_data(struct nfs_read_data *rdata,
> + const struct rpc_call_ops *call_ops)
> +{
> + struct inode *inode = rdata->inode;
> + struct nfs_server *nfss = NFS_SERVER(inode);
> + enum pnfs_try_status trypnfs;
> +
> + rdata->mds_ops = call_ops;
> +
> + dprintk("%s: Reading ino:%lu %u@%llu\n",
> + __func__, inode->i_ino, rdata->args.count, rdata->args.offset);
> +
> + trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
> + if (trypnfs == PNFS_NOT_ATTEMPTED) {
> + put_lseg(rdata->lseg);
> + rdata->lseg = NULL;
> + } else {
> + nfs_inc_stats(inode, NFSIOS_PNFS_READ);
> + }
> + dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
> + return trypnfs;
> +}
> +
> +/*
> * Device ID cache. Currently supports one layout type per struct nfs_client.
> * Add layout type to the lookup key to expand to support multiple types.
> */
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 5107d14..585023f 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -45,6 +45,11 @@ struct pnfs_layout_segment {
> struct pnfs_layout_hdr *pls_layout;
> };
>
> +enum pnfs_try_status {
> + PNFS_ATTEMPTED = 0,
> + PNFS_NOT_ATTEMPTED = 1,
> +};
> +
> #ifdef CONFIG_NFS_V4_1
>
> #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
> @@ -70,6 +75,12 @@ struct pnfs_layoutdriver_type {
>
> /* test for nfs page cache coalescing */
> int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
> +
> + /*
> + * Return PNFS_ATTEMPTED to indicate the layout code has attempted
> + * I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
> + */
> + enum pnfs_try_status (*read_pagelist) (struct nfs_read_data *nfs_data);
> };
>
> struct pnfs_layout_hdr {
> @@ -157,6 +168,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> enum pnfs_iomode access_type);
> void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
> void unset_pnfs_layoutdriver(struct nfs_server *);
> +enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
> + const struct rpc_call_ops *);
> void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
> int pnfs_layout_process(struct nfs4_layoutget *lgp);
> void pnfs_free_lseg_list(struct list_head *tmp_list);
> @@ -227,6 +240,13 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> return NULL;
> }
>
> +static inline enum pnfs_try_status
> +pnfs_try_to_read_data(struct nfs_read_data *data,
> + const struct rpc_call_ops *call_ops)
> +{
> + return PNFS_NOT_ATTEMPTED;
> +}
> +
> static inline bool
> pnfs_roc(struct inode *ino)
> {
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 20cc936..5c09d72 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -18,6 +18,8 @@
> #include <linux/sunrpc/clnt.h>
> #include <linux/nfs_fs.h>
> #include <linux/nfs_page.h>
> +#include <linux/smp_lock.h>
> +#include <linux/module.h>
>
> #include <asm/system.h>
> #include "pnfs.h"
> @@ -158,25 +160,20 @@ static void nfs_readpage_release(struct nfs_page *req)
> nfs_release_request(req);
> }
>
> -/*
> - * Set up the NFS read request struct
> - */
> -static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> - const struct rpc_call_ops *call_ops,
> - unsigned int count, unsigned int offset,
> - struct pnfs_layout_segment *lseg)
> +int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,

static int.... Nobody is using this outside of fs/nfs/read.c

> + const struct rpc_call_ops *call_ops)
> {
> - struct inode *inode = req->wb_context->path.dentry->d_inode;
> + struct inode *inode = data->inode;
> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
> struct rpc_task *task;
> struct rpc_message msg = {
> .rpc_argp = &data->args,
> .rpc_resp = &data->res,
> - .rpc_cred = req->wb_context->cred,
> + .rpc_cred = data->cred,
> };
> struct rpc_task_setup task_setup_data = {
> .task = &data->task,
> - .rpc_client = NFS_CLIENT(inode),
> + .rpc_client = clnt,
> .rpc_message = &msg,
> .callback_ops = call_ops,
> .callback_data = data,
> @@ -184,9 +181,38 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> .flags = RPC_TASK_ASYNC | swap_flags,
> };
>
> + /* Set up the initial task struct. */
> + NFS_PROTO(inode)->read_setup(data, &msg);
> +
> + dprintk("NFS: %5u initiated read call (req %s/%lld, %u bytes @ "
> + "offset %llu)\n",
> + data->task.tk_pid,
> + inode->i_sb->s_id,
> + (long long)NFS_FILEID(inode),
> + data->args.count,
> + (unsigned long long)data->args.offset);
> +
> + task = rpc_run_task(&task_setup_data);
> + if (IS_ERR(task))
> + return PTR_ERR(task);
> + rpc_put_task(task);
> + return 0;
> +}
> +EXPORT_SYMBOL(nfs_initiate_read);

Firstly, this should be EXPORT_SYMBOL_GPL, but in any case, why include
it here? This patch contains no users for the export.

> +
> +/*
> + * Set up the NFS read request struct
> + */
> +static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> + const struct rpc_call_ops *call_ops,
> + unsigned int count, unsigned int offset,
> + struct pnfs_layout_segment *lseg)
> +{
> + struct inode *inode = req->wb_context->path.dentry->d_inode;
> +
> data->req = req;
> data->inode = inode;
> - data->cred = msg.rpc_cred;
> + data->cred = req->wb_context->cred;
> data->lseg = get_lseg(lseg);
>
> data->args.fh = NFS_FH(inode);
> @@ -202,21 +228,11 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> data->res.eof = 0;
> nfs_fattr_init(&data->fattr);
>
> - /* Set up the initial task struct. */
> - NFS_PROTO(inode)->read_setup(data, &msg);
> -
> - dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
> - data->task.tk_pid,
> - inode->i_sb->s_id,
> - (long long)NFS_FILEID(inode),
> - count,
> - (unsigned long long)data->args.offset);
> + if (data->lseg &&
> + (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
> + return 0;
>
> - task = rpc_run_task(&task_setup_data);
> - if (IS_ERR(task))
> - return PTR_ERR(task);
> - rpc_put_task(task);
> - return 0;
> + return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
> }
>
> static void
> diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
> index 68b10f5..37a1437 100644
> --- a/include/linux/nfs_iostat.h
> +++ b/include/linux/nfs_iostat.h
> @@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
> NFSIOS_SHORTREAD,
> NFSIOS_SHORTWRITE,
> NFSIOS_DELAY,
> + NFSIOS_PNFS_READ,
> __NFSIOS_COUNTSMAX,
> };
>
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 37e91c3..4591075 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1018,6 +1018,7 @@ struct nfs_read_data {
> struct nfs_readres res;
> unsigned long timestamp; /* For lease renewal */
> struct pnfs_layout_segment *lseg;
> + const struct rpc_call_ops *mds_ops;
> struct page *page_array[NFS_PAGEVEC_SIZE];
> };
>

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
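Patch 11 introduces the following control flow in nfs_read_rpcsetup():

1. When a layout segment is attached, offer the read to the layout driver's read_pagelist().
2. On PNFS_NOT_ATTEMPTED, drop the segment and fall back to an ordinary READ through the MDS client.

The sketch below reproduces only that dispatch. It is a hedged userspace model with hypothetical types, and it uses counters in place of queuing an RPC or bumping NFSIOS_PNFS_READ. The MDS path is kept static, in line with Trond's remark that nothing outside read.c needs nfs_initiate_read() at this point in the series.

```c
#include <assert.h>
#include <stddef.h>

enum try_status { ATTEMPTED = 0, NOT_ATTEMPTED = 1 };

struct read_data;

/* Hypothetical stand-in for struct pnfs_layoutdriver_type. */
struct layout_driver {
	enum try_status (*read_pagelist)(struct read_data *);
};

struct read_data {
	const struct layout_driver *ld;
	void *lseg;             /* non-NULL when a layout segment was found */
	int pnfs_reads;         /* stands in for the NFSIOS_PNFS_READ counter */
	int mds_reads;          /* counts fallbacks to an ordinary NFS READ */
};

/* Static, like Trond suggests for nfs_initiate_read(). */
static int initiate_mds_read(struct read_data *d)
{
	d->mds_reads++;         /* the real code queues an RPC to the MDS */
	return 0;
}

static enum try_status try_to_read_data(struct read_data *d)
{
	enum try_status st = d->ld->read_pagelist(d);
	if (st == NOT_ATTEMPTED)
		d->lseg = NULL; /* the patch calls put_lseg() here */
	else
		d->pnfs_reads++;
	return st;
}

static int read_rpcsetup(struct read_data *d)
{
	if (d->lseg && try_to_read_data(d) == ATTEMPTED)
		return 0;
	return initiate_mds_read(d);
}

/* Two trivial example drivers. */
static enum try_status always_decline(struct read_data *d)
{
	(void)d;
	return NOT_ATTEMPTED;
}

static enum try_status always_accept(struct read_data *d)
{
	(void)d;
	return ATTEMPTED;
}
```

A driver that declines, or a read with no segment attached, ends up on the MDS path. Only an accepted pNFS read bypasses it.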


2011-02-15 15:00:42

by Myklebust, Trond

Subject: Re: [PATCH 09/16] pnfs: wave 3: shift pnfs_update_layout locations

On Tue, 2011-02-15 at 09:41 -0500, Fred Isaman wrote:
> On Mon, Feb 14, 2011 at 6:14 PM, Trond Myklebust
> <[email protected]> wrote:
> > On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
> >> From: Fred Isaman <[email protected]>
> >>
> >> Move the pnfs_update_layout call location to nfs_pageio_do_add_request().
> >> Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach
> >> it to each nfs_read_data so it can be sent to the layout driver.
> >>
> >> Signed-off-by: Andy Adamon <[email protected]>
> >> Signed-off-by: Andy Adamon <[email protected]>
> >> Signed-off-by: Dean Hildebrand <[email protected]>
> >> Signed-off-by: Fred Isaman <[email protected]>
> >> Signed-off-by: Fred Isaman <[email protected]>
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> Signed-off-by: Boaz Harrosh <[email protected]>
> >> Signed-off-by: Oleg Drokin <[email protected]>
> >> Signed-off-by: Tao Guo <[email protected]>
> >> ---
> >> fs/nfs/file.c | 4 ----
> >> fs/nfs/pagelist.c | 15 ++++++++++++---
> >> fs/nfs/pnfs.c | 4 ++--
> >> fs/nfs/pnfs.h | 1 +
> >> fs/nfs/read.c | 28 ++++++++++++++++------------
> >> fs/nfs/write.c | 4 ++--
> >> include/linux/nfs_page.h | 5 +++--
> >> include/linux/nfs_xdr.h | 1 +
> >> 8 files changed, 37 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> >> index 7bf029e..d85a534 100644
> >> --- a/fs/nfs/file.c
> >> +++ b/fs/nfs/file.c
> >> @@ -387,10 +387,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
> >> file->f_path.dentry->d_name.name,
> >> mapping->host->i_ino, len, (long long) pos);
> >>
> >> - pnfs_update_layout(mapping->host,
> >> - nfs_file_open_context(file),
> >> - IOMODE_RW);
> >> -
> >> start:
> >> /*
> >> * Prevent starvation issues if someone is doing a consistency
> >> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> >> index e1164e3..e0a0cb4 100644
> >> --- a/fs/nfs/pagelist.c
> >> +++ b/fs/nfs/pagelist.c
> >> @@ -20,6 +20,7 @@
> >> #include <linux/nfs_mount.h>
> >>
> >> #include "internal.h"
> >> +#include "pnfs.h"
> >>
> >> static struct kmem_cache *nfs_page_cachep;
> >>
> >> @@ -213,7 +214,7 @@ nfs_wait_on_request(struct nfs_page *req)
> >> */
> >> void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> >> struct inode *inode,
> >> - int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
> >> + int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
> >> size_t bsize,
> >> int io_flags)
> >> {
> >> @@ -226,6 +227,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> >> desc->pg_doio = doio;
> >> desc->pg_ioflags = io_flags;
> >> desc->pg_error = 0;
> >> + desc->pg_lseg = NULL;
> >> }
> >>
> >> /**
> >> @@ -288,8 +290,13 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
> >> prev = nfs_list_entry(desc->pg_list.prev);
> >> if (!nfs_can_coalesce_requests(prev, req))
> >> return 0;
> >> - } else
> >> + } else {
> >> + put_lseg(desc->pg_lseg);
> >> desc->pg_base = req->wb_pgbase;
> >> + desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
> >> + req->wb_context,
> >> + IOMODE_READ);
> >
> > Looking at this afresh after a week of vacation. Isn't it more natural
> > to do this as part of the pg_doio() callback?
> >
> > Your only reason for introducing the ->pg_lseg pointer is to be able to
> > pass it to the ->pg_doio() in the first place. Why not do that by simply
> > passing the 'desc' pointer to ->pg_doio(), and then having it call
> > pnfs_update_layout() instead of 'get_layout()'?
> >
>
> The problem is that it is not the only reason. Passing the lseg into
> nfs_can_coalesce_requests is another. Calling pnfs_update_layout
> in ->pg_doio would eliminate the opportunity to have a say in
> coalescing based on the layout.

The point is that you are adding intimate knowledge of layouts to sites
like nfs_pageio_do_add_request and nfs_pageio_complete, and then on top
of that adding callbacks whose sole purpose is to support layouts.

A better approach is to keep the layouts in the callbacks (i.e. pg_test
and pg_doio). I don't care if you cache the layout in a pg_lseg field,
but I do object to the proliferation of layout knowledge in places where
we don't need it.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
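One way to read Trond's suggestion is this: the generic pageio code should only coalesce requests and invoke callbacks, while all layout knowledge, including any cached lseg, lives behind pg_test and pg_doio.

The sketch below illustrates that split under several assumptions:

- the descriptor, the 4-page stripe unit and the run_log type are hypothetical;
- page contiguity is reduced to consecutive indices;
- pg_private stands in for a cached pg_lseg.

Fred's concern still holds in this model, because the layout-aware pg_test can veto coalescing across a stripe boundary.

```c
#include <assert.h>
#include <stdbool.h>

struct page_req { long index; };

/* Generic descriptor: no layout types appear in its interface. */
struct pageio_desc {
	bool (*pg_test)(struct pageio_desc *, const struct page_req *prev,
			const struct page_req *req);
	int (*pg_doio)(struct pageio_desc *);
	void *pg_private;       /* a layout-aware doio may cache its lseg here */
	long first, count;      /* current run of coalesced pages */
};

/* Generic code: coalesce contiguous requests, flush through pg_doio. */
static void pageio_add(struct pageio_desc *d, const struct page_req *req)
{
	if (d->count) {
		struct page_req prev = { d->first + d->count - 1 };
		if (req->index == prev.index + 1 && d->pg_test(d, &prev, req)) {
			d->count++;
			return;
		}
		d->pg_doio(d);
	}
	d->first = req->index;
	d->count = 1;
}

static void pageio_complete(struct pageio_desc *d)
{
	if (d->count)
		d->pg_doio(d);
	d->count = 0;
}

/* Hypothetical layout-aware callbacks for a 4-page stripe unit. */
#define STRIPE_PAGES 4
struct run_log { long first[8], count[8]; int n; };

static bool stripe_test(struct pageio_desc *d, const struct page_req *prev,
			const struct page_req *req)
{
	(void)d;
	/* Refuse to merge requests that fall in different stripe units. */
	return prev->index / STRIPE_PAGES == req->index / STRIPE_PAGES;
}

static int stripe_doio(struct pageio_desc *d)
{
	struct run_log *log = d->pg_private;
	log->first[log->n] = d->first;
	log->count[log->n] = d->count;
	log->n++;
	return 0;
}
```

Feeding pages 0 through 5 with stripe_test produces two I/Os, pages 0-3 and pages 4-5. Neither pageio_add() nor pageio_complete() needs to know anything about stripes.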


2011-02-16 14:53:57

by Andy Adamson

Subject: Re: [PATCH 11/16] pnfs: wave 3: generic read


On Feb 15, 2011, at 10:16 PM, Benny Halevy wrote:

> On 2011-02-14 14:18, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>
> Andy, taking into account the many contributors to this patch
> the author should be "The pNFS Team" IMO.

The author can't be "The pNFS Team". Somebody needs to be the author. I asked for volunteers and said I would be the default. Do you want to be the author?

>
>>
>> Separate the rpc run portion of nfs_read_rpcsetup into a new function
>> nfs_initiate_read that is called for normal NFS I/O.
>>
>> Add a pNFS read_pagelist function that is called instead of nfs_initiate_read
>> for pNFS reads.
>>
>> Reported-by: Alexandros Batsakis <[email protected]>
>
> historical trivia? :)

Yes.

-->Andy
>
> Benny
>
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Boaz Harrosh <[email protected]>
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> Signed-off-by: Fred Isaman <[email protected]>
>> Signed-off-by: Fred Isaman <[email protected]>
>> Signed-off-by: J. Bruce Fields <[email protected]>
>> Signed-off-by: Mike Sager <[email protected]>
>> Signed-off-by: Mingyang Guo <[email protected]>
>> Signed-off-by: Ricardo Labiaga <[email protected]>
>> Signed-off-by: Tao Guo <[email protected]>
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfs/internal.h | 2 +
>> fs/nfs/pnfs.c | 28 ++++++++++++++++++
>> fs/nfs/pnfs.h | 20 +++++++++++++
>> fs/nfs/read.c | 66 +++++++++++++++++++++++++++----------------
>> include/linux/nfs_iostat.h | 1 +
>> include/linux/nfs_xdr.h | 1 +
>> 6 files changed, 93 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index cf9fdbd..335755d 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -262,6 +262,8 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
>> #endif
>>
>> /* read.c */
>> +extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
>> + const struct rpc_call_ops *call_ops);
>> extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
>>
>> /* write.c */
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index f200e34..6f4a5ab 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -30,6 +30,7 @@
>> #include <linux/nfs_fs.h>
>> #include "internal.h"
>> #include "pnfs.h"
>> +#include "iostat.h"
>>
>> #define NFSDBG_FACILITY NFSDBG_PNFS
>>
>> @@ -891,6 +892,33 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>> }
>>
>> /*
>> + * Call the appropriate parallel I/O subsystem read function.
>> + */
>> +enum pnfs_try_status
>> +pnfs_try_to_read_data(struct nfs_read_data *rdata,
>> + const struct rpc_call_ops *call_ops)
>> +{
>> + struct inode *inode = rdata->inode;
>> + struct nfs_server *nfss = NFS_SERVER(inode);
>> + enum pnfs_try_status trypnfs;
>> +
>> + rdata->mds_ops = call_ops;
>> +
>> + dprintk("%s: Reading ino:%lu %u@%llu\n",
>> + __func__, inode->i_ino, rdata->args.count, rdata->args.offset);
>> +
>> + trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
>> + if (trypnfs == PNFS_NOT_ATTEMPTED) {
>> + put_lseg(rdata->lseg);
>> + rdata->lseg = NULL;
>> + } else {
>> + nfs_inc_stats(inode, NFSIOS_PNFS_READ);
>> + }
>> + dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>> + return trypnfs;
>> +}
>> +
>> +/*
>> * Device ID cache. Currently supports one layout type per struct nfs_client.
>> * Add layout type to the lookup key to expand to support multiple types.
>> */
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 5107d14..585023f 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -45,6 +45,11 @@ struct pnfs_layout_segment {
>> struct pnfs_layout_hdr *pls_layout;
>> };
>>
>> +enum pnfs_try_status {
>> + PNFS_ATTEMPTED = 0,
>> + PNFS_NOT_ATTEMPTED = 1,
>> +};
>> +
>> #ifdef CONFIG_NFS_V4_1
>>
>> #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
>> @@ -70,6 +75,12 @@ struct pnfs_layoutdriver_type {
>>
>> /* test for nfs page cache coalescing */
>> int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
>> +
>> + /*
>> + * Return PNFS_ATTEMPTED to indicate the layout code has attempted
>> + * I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
>> + */
>> + enum pnfs_try_status (*read_pagelist) (struct nfs_read_data *nfs_data);
>> };
>>
>> struct pnfs_layout_hdr {
>> @@ -157,6 +168,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> enum pnfs_iomode access_type);
>> void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
>> void unset_pnfs_layoutdriver(struct nfs_server *);
>> +enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
>> + const struct rpc_call_ops *);
>> void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
>> int pnfs_layout_process(struct nfs4_layoutget *lgp);
>> void pnfs_free_lseg_list(struct list_head *tmp_list);
>> @@ -227,6 +240,13 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> return NULL;
>> }
>>
>> +static inline enum pnfs_try_status
>> +pnfs_try_to_read_data(struct nfs_read_data *data,
>> + const struct rpc_call_ops *call_ops)
>> +{
>> + return PNFS_NOT_ATTEMPTED;
>> +}
>> +
>> static inline bool
>> pnfs_roc(struct inode *ino)
>> {
>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> index 20cc936..5c09d72 100644
>> --- a/fs/nfs/read.c
>> +++ b/fs/nfs/read.c
>> @@ -18,6 +18,8 @@
>> #include <linux/sunrpc/clnt.h>
>> #include <linux/nfs_fs.h>
>> #include <linux/nfs_page.h>
>> +#include <linux/smp_lock.h>
>> +#include <linux/module.h>
>>
>> #include <asm/system.h>
>> #include "pnfs.h"
>> @@ -158,25 +160,20 @@ static void nfs_readpage_release(struct nfs_page *req)
>> nfs_release_request(req);
>> }
>>
>> -/*
>> - * Set up the NFS read request struct
>> - */
>> -static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> - const struct rpc_call_ops *call_ops,
>> - unsigned int count, unsigned int offset,
>> - struct pnfs_layout_segment *lseg)
>> +int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
>> + const struct rpc_call_ops *call_ops)
>> {
>> - struct inode *inode = req->wb_context->path.dentry->d_inode;
>> + struct inode *inode = data->inode;
>> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
>> struct rpc_task *task;
>> struct rpc_message msg = {
>> .rpc_argp = &data->args,
>> .rpc_resp = &data->res,
>> - .rpc_cred = req->wb_context->cred,
>> + .rpc_cred = data->cred,
>> };
>> struct rpc_task_setup task_setup_data = {
>> .task = &data->task,
>> - .rpc_client = NFS_CLIENT(inode),
>> + .rpc_client = clnt,
>> .rpc_message = &msg,
>> .callback_ops = call_ops,
>> .callback_data = data,
>> @@ -184,9 +181,38 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> .flags = RPC_TASK_ASYNC | swap_flags,
>> };
>>
>> + /* Set up the initial task struct. */
>> + NFS_PROTO(inode)->read_setup(data, &msg);
>> +
>> + dprintk("NFS: %5u initiated read call (req %s/%lld, %u bytes @ "
>> + "offset %llu)\n",
>> + data->task.tk_pid,
>> + inode->i_sb->s_id,
>> + (long long)NFS_FILEID(inode),
>> + data->args.count,
>> + (unsigned long long)data->args.offset);
>> +
>> + task = rpc_run_task(&task_setup_data);
>> + if (IS_ERR(task))
>> + return PTR_ERR(task);
>> + rpc_put_task(task);
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(nfs_initiate_read);
>> +
>> +/*
>> + * Set up the NFS read request struct
>> + */
>> +static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> + const struct rpc_call_ops *call_ops,
>> + unsigned int count, unsigned int offset,
>> + struct pnfs_layout_segment *lseg)
>> +{
>> + struct inode *inode = req->wb_context->path.dentry->d_inode;
>> +
>> data->req = req;
>> data->inode = inode;
>> - data->cred = msg.rpc_cred;
>> + data->cred = req->wb_context->cred;
>> data->lseg = get_lseg(lseg);
>>
>> data->args.fh = NFS_FH(inode);
>> @@ -202,21 +228,11 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
>> data->res.eof = 0;
>> nfs_fattr_init(&data->fattr);
>>
>> - /* Set up the initial task struct. */
>> - NFS_PROTO(inode)->read_setup(data, &msg);
>> -
>> - dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
>> - data->task.tk_pid,
>> - inode->i_sb->s_id,
>> - (long long)NFS_FILEID(inode),
>> - count,
>> - (unsigned long long)data->args.offset);
>> + if (data->lseg &&
>> + (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
>> + return 0;
>>
>> - task = rpc_run_task(&task_setup_data);
>> - if (IS_ERR(task))
>> - return PTR_ERR(task);
>> - rpc_put_task(task);
>> - return 0;
>> + return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
>> }
>>
>> static void
>> diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
>> index 68b10f5..37a1437 100644
>> --- a/include/linux/nfs_iostat.h
>> +++ b/include/linux/nfs_iostat.h
>> @@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
>> NFSIOS_SHORTREAD,
>> NFSIOS_SHORTWRITE,
>> NFSIOS_DELAY,
>> + NFSIOS_PNFS_READ,
>> __NFSIOS_COUNTSMAX,
>> };
>>
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 37e91c3..4591075 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1018,6 +1018,7 @@ struct nfs_read_data {
>> struct nfs_readres res;
>> unsigned long timestamp; /* For lease renewal */
>> struct pnfs_layout_segment *lseg;
>> + const struct rpc_call_ops *mds_ops;
>> struct page *page_array[NFS_PAGEVEC_SIZE];
>> };
>>
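The control flow that the patched nfs_read_rpcsetup() now follows can be modelled in user space. It tries the layout driver first when a layout segment is attached, and falls back to a normal READ through the MDS rpc_clnt when the driver returns PNFS_NOT_ATTEMPTED. This is a sketch under stated assumptions. Only the enum names mirror the patch. Every mock_* type and function is hypothetical and stands in for the kernel structures, not for a real API.

```c
#include <assert.h>
#include <stddef.h>

/* Names mirror the patch; the numeric values are illustrative only. */
enum pnfs_try_status {
	PNFS_ATTEMPTED,
	PNFS_NOT_ATTEMPTED,
};

/* Hypothetical stand-in for struct nfs_read_data. */
struct mock_read_data {
	const void *lseg;      /* non-NULL when a layout segment covers the I/O */
	int driver_accepts;    /* whether the mock layout driver takes the I/O */
	int sent_to_ds;        /* set when the I/O went to a data server */
	int sent_to_mds;       /* set when the I/O went to the MDS */
};

/* Stand-in for pnfs_try_to_read_data() calling ld->read_pagelist(). */
static enum pnfs_try_status mock_try_to_read_data(struct mock_read_data *data)
{
	assert(data->lseg != NULL);    /* only called with a layout segment */
	if (!data->driver_accepts)
		return PNFS_NOT_ATTEMPTED;
	data->sent_to_ds = 1;
	return PNFS_ATTEMPTED;
}

/* Stand-in for nfs_initiate_read() on the MDS rpc_clnt. */
static int mock_initiate_read(struct mock_read_data *data)
{
	data->sent_to_mds = 1;
	return 0;
}

/* Same decision as the tail of the patched nfs_read_rpcsetup(). */
static int mock_read_rpcsetup(struct mock_read_data *data)
{
	if (data->lseg != NULL &&
	    mock_try_to_read_data(data) == PNFS_ATTEMPTED)
		return 0;
	return mock_initiate_read(data);
}
```

Splitting nfs_initiate_read() out of nfs_read_rpcsetup() is what allows the MDS fallback and, presumably, a layout driver to issue the same RPC against a different rpc_clnt.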


2011-02-14 19:18:46

by Andy Adamson

Subject: [PATCH 06/16] pnfs: wave 3: new flag for lease time check

From: Andy Adamson <[email protected]>

nfs4_proc_get_lease_time cannot be sent to data servers, but those clients still
need to set up state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate
whether the lease time can be checked.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/client.c | 9 +++++++++
fs/nfs/nfs4state.c | 5 +++++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 5891cf8..4d15331 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1413,6 +1413,15 @@ static int nfs4_set_client(struct nfs_server *server,
goto error;
}

+ /*
+ * Query for the lease time on clientid setup or renewal
+ *
+ * Note that this will be set on nfs_clients that were created
+ * only for the DS role and did not set this bit, but now will
+ * serve a dual role.
+ */
+ set_bit(NFS_CS_CHECK_LEASE_TIME, &clp->cl_res_state);
+
server->nfs_client = clp;
dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
return 0;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e6742b5..9e33e88 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -153,6 +153,11 @@ static int nfs41_setup_state_renewal(struct nfs_client *clp)
int status;
struct nfs_fsinfo fsinfo;

+ if (!test_bit(NFS_CS_CHECK_LEASE_TIME, &clp->cl_res_state)) {
+ nfs4_schedule_state_renewal(clp);
+ return 0;
+ }
+
status = nfs4_proc_get_lease_time(clp, &fsinfo);
if (status == 0) {
/* Update lease time and schedule renewal */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2c2dc18..2669a9a 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -31,6 +31,7 @@ struct nfs_client {
#define NFS_CS_IDMAP 2 /* - idmap started */
#define NFS_CS_RENEWD 3 /* - renewd started */
#define NFS_CS_STOP_RENEW 4 /* no more state to renew */
+#define NFS_CS_CHECK_LEASE_TIME 5 /* need to check lease time */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
--
1.7.2.3
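The effect of the new NFS_CS_CHECK_LEASE_TIME bit can be sketched in user space. A client created only for the DS role schedules renewal without querying the lease time. Once the MDS path (nfs4_set_client) sets the bit, which can also happen on a previously DS-only client that now serves a dual role, the query is made. The bit number comes from the patch. The mock_* structure, counters, and bit helpers are hypothetical simplifications of struct nfs_client, set_bit() and test_bit().

```c
/* Bit number taken from the patch; everything else is a hypothetical mock. */
#define NFS_CS_CHECK_LEASE_TIME 5

struct mock_client {
	unsigned long cl_res_state;
	int lease_queries;        /* stands in for nfs4_proc_get_lease_time() calls */
	int renewals_scheduled;   /* stands in for nfs4_schedule_state_renewal() */
};

static void mock_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static int mock_test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* The MDS mount path (nfs4_set_client in the patch) marks the client. */
static void mock_set_client_for_mds(struct mock_client *clp)
{
	mock_set_bit(NFS_CS_CHECK_LEASE_TIME, &clp->cl_res_state);
}

/* Mirrors the new early return in nfs41_setup_state_renewal(). */
static int mock_setup_state_renewal(struct mock_client *clp)
{
	if (!mock_test_bit(NFS_CS_CHECK_LEASE_TIME, &clp->cl_res_state)) {
		clp->renewals_scheduled++;   /* DS-only: keep the current lease */
		return 0;
	}
	clp->lease_queries++;            /* query the lease time first... */
	clp->renewals_scheduled++;       /* ...then schedule renewal */
	return 0;
}
```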


2011-02-14 23:14:31

by Myklebust, Trond

Subject: Re: [PATCH 09/16] pnfs: wave 3: shift pnfs_update_layout locations

On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
> From: Fred Isaman <[email protected]>
>
> Move the pnfs_update_layout call location to nfs_pageio_do_add_request().
> Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach
> it to each nfs_read_data so it can be sent to the layout driver.
>
> Signed-off-by: Andy Adamon <[email protected]>
> Signed-off-by: Andy Adamon <[email protected]>
> Signed-off-by: Dean Hildebrand <[email protected]>
> Signed-off-by: Fred Isaman <[email protected]>
> Signed-off-by: Fred Isaman <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Boaz Harrosh <[email protected]>
> Signed-off-by: Oleg Drokin <[email protected]>
> Signed-off-by: Tao Guo <[email protected]>
> ---
> fs/nfs/file.c | 4 ----
> fs/nfs/pagelist.c | 15 ++++++++++++---
> fs/nfs/pnfs.c | 4 ++--
> fs/nfs/pnfs.h | 1 +
> fs/nfs/read.c | 28 ++++++++++++++++------------
> fs/nfs/write.c | 4 ++--
> include/linux/nfs_page.h | 5 +++--
> include/linux/nfs_xdr.h | 1 +
> 8 files changed, 37 insertions(+), 25 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 7bf029e..d85a534 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -387,10 +387,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
> file->f_path.dentry->d_name.name,
> mapping->host->i_ino, len, (long long) pos);
>
> - pnfs_update_layout(mapping->host,
> - nfs_file_open_context(file),
> - IOMODE_RW);
> -
> start:
> /*
> * Prevent starvation issues if someone is doing a consistency
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index e1164e3..e0a0cb4 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -20,6 +20,7 @@
> #include <linux/nfs_mount.h>
>
> #include "internal.h"
> +#include "pnfs.h"
>
> static struct kmem_cache *nfs_page_cachep;
>
> @@ -213,7 +214,7 @@ nfs_wait_on_request(struct nfs_page *req)
> */
> void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> struct inode *inode,
> - int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
> + int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
> size_t bsize,
> int io_flags)
> {
> @@ -226,6 +227,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> desc->pg_doio = doio;
> desc->pg_ioflags = io_flags;
> desc->pg_error = 0;
> + desc->pg_lseg = NULL;
> }
>
> /**
> @@ -288,8 +290,13 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
> prev = nfs_list_entry(desc->pg_list.prev);
> if (!nfs_can_coalesce_requests(prev, req))
> return 0;
> - } else
> + } else {
> + put_lseg(desc->pg_lseg);
> desc->pg_base = req->wb_pgbase;
> + desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
> + req->wb_context,
> + IOMODE_READ);

Looking at this afresh after a week of vacation. Isn't it more natural
to do this as part of the pg_doio() callback?

Your only reason for introducing the ->pg_lseg pointer is to be able to
pass it to the ->pg_doio() in the first place. Why not do that by simply
passing the 'desc' pointer to ->pg_doio(), and then having it call
pnfs_update_layout() instead of 'get_layout()'?
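The alternative suggested here, passing the descriptor itself to ->pg_doio() and letting the callback look up the layout, can be sketched with a user-space mock. The descriptor carries no pg_lseg field. The doio callback takes its own reference, uses it, and drops it. All mock_* names are hypothetical, and the reference handling is simplified: in the patch, the per-request reference is held by nfs_read_data until nfs_readdata_release().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted layout segment. */
struct mock_lseg {
	int refcount;
};

/* Hypothetical descriptor: no pg_lseg; ->pg_doio() receives desc itself. */
struct mock_pageio_desc {
	struct mock_lseg *layout;   /* stands in for the inode's layout state */
	size_t pg_count;
	int (*pg_doio)(struct mock_pageio_desc *desc);
};

/* Stand-in for pnfs_update_layout(): returns a referenced segment or NULL. */
static struct mock_lseg *mock_update_layout(struct mock_pageio_desc *desc)
{
	if (desc->layout)
		desc->layout->refcount++;
	return desc->layout;
}

static void mock_put_lseg(struct mock_lseg *lseg)
{
	if (lseg) {
		assert(lseg->refcount > 0);
		lseg->refcount--;
	}
}

/* The doio callback looks up the layout itself, uses it, then drops it. */
static int mock_pagein(struct mock_pageio_desc *desc)
{
	struct mock_lseg *lseg = mock_update_layout(desc);
	int used_pnfs = (lseg != NULL);

	/* ... build and send the READ for desc->pg_count bytes here ... */
	mock_put_lseg(lseg);
	return used_pnfs;
}
```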

> + }
> nfs_list_remove_request(req);
> nfs_list_add_request(req, &desc->pg_list);
> desc->pg_count = newlen;
> @@ -307,7 +314,8 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
> nfs_page_array_len(desc->pg_base,
> desc->pg_count),
> desc->pg_count,
> - desc->pg_ioflags);
> + desc->pg_ioflags,
> + desc->pg_lseg);
> if (error < 0)
> desc->pg_error = error;
> else
> @@ -345,6 +353,7 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
> {
> nfs_pageio_doio(desc);
> + put_lseg(desc->pg_lseg);
> }
>
> /**
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index f0a9578..dcd4356 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -264,7 +264,7 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
> return 0;
> }
>
> -static void
> +void
> put_lseg(struct pnfs_layout_segment *lseg)
> {
> struct inode *ino;
> @@ -285,6 +285,7 @@ put_lseg(struct pnfs_layout_segment *lseg)
> pnfs_free_lseg_list(&free_me);
> }
> }
> +EXPORT_SYMBOL_GPL(put_lseg);

Why is this needed here?


> static bool
> should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
> @@ -797,7 +798,6 @@ pnfs_update_layout(struct inode *ino,
> out:
> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
> nfsi->layout ? nfsi->layout->plh_flags : -1, lseg);
> - put_lseg(lseg); /* STUB - callers currently ignore return value */
> return lseg;
> out_unlock:
> spin_unlock(&ino->i_lock);
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 9a994bc..121d6a3 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -146,6 +146,7 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
>
> /* pnfs.c */
> void get_layout_hdr(struct pnfs_layout_hdr *lo);
> +void put_lseg(struct pnfs_layout_segment *lseg);
> struct pnfs_layout_segment *
> pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> enum pnfs_iomode access_type);
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index aedcaa7..c453164 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -20,17 +20,17 @@
> #include <linux/nfs_page.h>
>
> #include <asm/system.h>
> +#include "pnfs.h"
>
> #include "nfs4_fs.h"
> #include "internal.h"
> #include "iostat.h"
> #include "fscache.h"
> -#include "pnfs.h"
>
> #define NFSDBG_FACILITY NFSDBG_PAGECACHE
>
> -static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int);
> -static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int);
> +static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
> +static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
> static const struct rpc_call_ops nfs_read_partial_ops;
> static const struct rpc_call_ops nfs_read_full_ops;
>
> @@ -70,6 +70,7 @@ void nfs_readdata_free(struct nfs_read_data *p)
> static void nfs_readdata_release(struct nfs_read_data *rdata)
> {
> put_nfs_open_context(rdata->args.context);
> + put_lseg(rdata->lseg);

Shouldn't you be calling put_lseg() _before_ put_nfs_open_context()? You
are not guaranteed that the inode still exists after that call.
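The ordering concern can be illustrated with a user-space mock. The open context pins the inode, while put_lseg() touches the inode (it takes ino->i_lock), so the segment must be released first. The mock_* types are hypothetical, and the "freed" flag stands in for the inode actually being freed.

```c
#include <assert.h>

/* Hypothetical mocks: the open context pins the inode; the layout
 * segment only borrows it. */
struct mock_inode { int refcount; int freed; };
struct mock_ctx   { struct mock_inode *inode; };
struct mock_lseg  { struct mock_inode *inode; int refcount; };

static void mock_iput(struct mock_inode *ino)
{
	if (--ino->refcount == 0)
		ino->freed = 1;              /* real code would free the inode here */
}

/* Stand-in for put_nfs_open_context(): drops the inode reference. */
static void mock_put_ctx(struct mock_ctx *ctx)
{
	mock_iput(ctx->inode);
}

/* Stand-in for put_lseg(): needs the inode to still be alive. */
static void mock_put_lseg(struct mock_lseg *lseg)
{
	if (!lseg)
		return;
	assert(!lseg->inode->freed);     /* a use-after-free in the kernel */
	lseg->refcount--;
}

/* Order suggested in review: drop the lseg while the context pins the inode. */
static void mock_readdata_release(struct mock_ctx *ctx, struct mock_lseg *lseg)
{
	mock_put_lseg(lseg);
	mock_put_ctx(ctx);
}
```

Swapping the two calls in mock_readdata_release() makes the assertion in mock_put_lseg() fire whenever the context held the last inode reference.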

> nfs_readdata_free(rdata);
> }
>
> @@ -117,11 +118,11 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> LIST_HEAD(one_request);
> struct nfs_page *new;
> unsigned int len;
> + struct pnfs_layout_segment *lseg;
>
> len = nfs_page_length(page);
> if (len == 0)
> return nfs_return_empty_page(page);
> - pnfs_update_layout(inode, ctx, IOMODE_READ);
> new = nfs_create_request(ctx, inode, page, 0, len);
> if (IS_ERR(new)) {
> unlock_page(page);
> @@ -131,10 +132,12 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> zero_user_segment(page, len, PAGE_CACHE_SIZE);
>
> nfs_list_add_request(new, &one_request);
> + lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
> if (NFS_SERVER(inode)->rsize < PAGE_CACHE_SIZE)
> - nfs_pagein_multi(inode, &one_request, 1, len, 0);
> + nfs_pagein_multi(inode, &one_request, 1, len, 0, lseg);
> else
> - nfs_pagein_one(inode, &one_request, 1, len, 0);
> + nfs_pagein_one(inode, &one_request, 1, len, 0, lseg);
> + put_lseg(lseg);
> return 0;
> }
>
> @@ -160,7 +163,8 @@ static void nfs_readpage_release(struct nfs_page *req)
> */
> static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> const struct rpc_call_ops *call_ops,
> - unsigned int count, unsigned int offset)
> + unsigned int count, unsigned int offset,
> + struct pnfs_layout_segment *lseg)
> {
> struct inode *inode = req->wb_context->path.dentry->d_inode;
> int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
> @@ -183,6 +187,7 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
> data->req = req;
> data->inode = inode;
> data->cred = msg.rpc_cred;
> + data->lseg = get_lseg(lseg);
>
> data->args.fh = NFS_FH(inode);
> data->args.offset = req_offset(req) + offset;
> @@ -240,7 +245,7 @@ nfs_async_read_error(struct list_head *head)
> * won't see the new data until our attribute cache is updated. This is more
> * or less conventional NFS client behavior.
> */
> -static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags)
> +static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags, struct pnfs_layout_segment *lseg)
> {
> struct nfs_page *req = nfs_list_entry(head->next);
> struct page *page = req->wb_page;
> @@ -280,7 +285,7 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne
> if (nbytes < rsize)
> rsize = nbytes;
> ret2 = nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
> - rsize, offset);
> + rsize, offset, lseg);
> if (ret == 0)
> ret = ret2;
> offset += rsize;
> @@ -300,7 +305,7 @@ out_bad:
> return -ENOMEM;
> }
>
> -static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags)
> +static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags, struct pnfs_layout_segment *lseg)
> {
> struct nfs_page *req;
> struct page **pages;
> @@ -321,7 +326,7 @@ static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned
> }
> req = nfs_list_entry(data->pages.next);
>
> - return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
> + return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0, lseg);
> out_bad:
> nfs_async_read_error(head);
> return ret;
> @@ -625,7 +630,6 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
> if (ret == 0)
> goto read_complete; /* all pages were read */
>
> - pnfs_update_layout(inode, desc.ctx, IOMODE_READ);
> if (rsize < PAGE_CACHE_SIZE)
> nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
> else
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index c8278f4..004c28b 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -879,7 +879,7 @@ static void nfs_redirty_request(struct nfs_page *req)
> * Generate multiple small requests to write out a single
> * contiguous dirty area on one page.
> */
> -static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how)
> +static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how, struct pnfs_layout_segment *lseg)
> {
> struct nfs_page *req = nfs_list_entry(head->next);
> struct page *page = req->wb_page;
> @@ -946,7 +946,7 @@ out_bad:
> * This is the case if nfs_updatepage detects a conflicting request
> * that has been written but not committed.
> */
> -static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how)
> +static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how, struct pnfs_layout_segment *lseg)
> {
> struct nfs_page *req;
> struct page **pages;
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index d55cee7..2db0372 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -59,9 +59,10 @@ struct nfs_pageio_descriptor {
> unsigned int pg_base;
>
> struct inode *pg_inode;
> - int (*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
> + int (*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
> int pg_ioflags;
> int pg_error;
> + struct pnfs_layout_segment *pg_lseg;
> };
>
> #define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
> @@ -79,7 +80,7 @@ extern int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
> pgoff_t idx_start, unsigned int npages, int tag);
> extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> struct inode *inode,
> - int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
> + int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
> size_t bsize,
> int how);
> extern int nfs_pageio_add_request(struct nfs_pageio_descriptor *,
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 51bfadb..37e91c3 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1017,6 +1017,7 @@ struct nfs_read_data {
> struct nfs_readargs args;
> struct nfs_readres res;
> unsigned long timestamp; /* For lease renewal */
> + struct pnfs_layout_segment *lseg;
> struct page *page_array[NFS_PAGEVEC_SIZE];
> };
>

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-02-15 15:10:15

by Andy Adamson

Subject: Re: [PATCH 10/16] pnfs: wave 3: coelesce across layout stripes


On Feb 15, 2011, at 10:03 AM, Trond Myklebust wrote:

> On Tue, 2011-02-15 at 09:43 -0500, William A. (Andy) Adamson wrote:
>> On Mon, Feb 14, 2011 at 6:42 PM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Mon, 2011-02-14 at 14:18 -0500, [email protected] wrote:
>>>> From: Fred Isaman <[email protected]>
>>>>
>>>> Add a pg_test layout driver hook that is used to avoid coalescing I/O across
>>>> layout stripes.
>>>
>>> Doesn't this belong before [PATCH 09/16] pnfs: wave 3: shift
>>> pnfs_update_layout locations?
>>
>> The pg_test uses the pg_lseg declared in [PATCH 09/16] pnfs: wave 3:
>> shift pnfs_update_layout locations, which is why the patches are
>> ordered this way.
>
> What prevents you from moving the pg_lseg declaration into this patch,
> and just relying on the initialisation being NULL?
>
> The current ordering means that applying 9/16 without 10/16 gives rise
> to broken stripe sizes.

OK - Good reason.

-->Andy
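A pg_test-style check for the file layout might look like the sketch below. Two requests are coalesced only if their byte offsets fall in the same stripe unit. A stripe_unit of 0 models "no layout segment" (the NULL pg_lseg case discussed above), in which case the plain NFS coalescing rules apply. This is an assumption-labelled illustration: mock_pg_test and its parameters are hypothetical and are not the hook signature from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pg_test-style check: may a request at req_offset be
 * appended to an I/O whose previous request is at prev_offset? */
static bool mock_pg_test(uint64_t prev_offset, uint64_t req_offset,
			 uint32_t stripe_unit)
{
	if (stripe_unit == 0)
		return true;    /* no layout segment: plain NFS coalescing rules */
	/* same stripe unit index means the same data server */
	return prev_offset / stripe_unit == req_offset / stripe_unit;
}
```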



2011-02-15 15:12:21

by Andy Adamson

Subject: Re: [PATCH 13/16] pnfs: wave 3: filelayout i/o helpers


On Feb 15, 2011, at 4:31 AM, Christoph Hellwig wrote:

> On Mon, Feb 14, 2011 at 02:18:33PM -0500, [email protected] wrote:
>> +static loff_t
>> +filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
>> +{
>> + struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
>> +
>> + switch (flseg->stripe_type) {
>> + case STRIPE_SPARSE:
>> + return offset;
>> +
>> + case STRIPE_DENSE:
>> + {
>> + u32 stripe_width;
>> + u64 tmp, off;
>> + u32 unit = flseg->stripe_unit;
>> +
>> + stripe_width = unit * flseg->dsaddr->stripe_count;
>> + tmp = off = offset - flseg->pattern_offset;
>> + do_div(tmp, stripe_width);
>> + return tmp * unit + do_div(off, unit);
>
> For readability's sake I'd split this out into a helper:
>
> static loff_t
> filelayout_get_dense_offset(struct nfs4_filelayout_segment *flseg,
> loff_t offset)
> {
> u32 stripe_width = flseg->stripe_unit * flseg->dsaddr->stripe_count;
> u64 tmp, off;
>
> offset -= flseg->pattern_offset;
>
> tmp = off = offset;
> do_div(tmp, stripe_width);
>
> return tmp * flseg->stripe_unit + do_div(off, flseg->stripe_unit);
> }

OK - I see your point.

-->Andy

>
> ...
>
>
> case STRIPE_DENSE:
> return filelayout_get_dense_offset(flseg, offset);
>
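The dense-offset arithmetic above can be checked in user space. Plain / and % replace the kernel's do_div(), which divides its first argument in place and returns the remainder. With dense striping, each data server stores one stripe unit per full stripe that precedes the offset, plus the position within the current unit. The mock_* structure is a hypothetical mirror of the fields the helper reads. The sketch assumes offset >= pattern_offset and computes stripe_width in 64 bits to avoid overflow.

```c
#include <assert.h>
#include <stdint.h>

enum mock_stripe_type { MOCK_STRIPE_SPARSE, MOCK_STRIPE_DENSE };

/* Hypothetical mirror of the nfs4_filelayout_segment fields used above. */
struct mock_flseg {
	enum mock_stripe_type stripe_type;
	uint32_t stripe_unit;
	uint32_t stripe_count;
	uint64_t pattern_offset;
};

/* Assumes offset >= pattern_offset. */
static uint64_t mock_dense_offset(const struct mock_flseg *flseg, uint64_t offset)
{
	uint64_t stripe_width = (uint64_t)flseg->stripe_unit * flseg->stripe_count;

	assert(flseg->stripe_unit != 0 && flseg->stripe_count != 0);
	offset -= flseg->pattern_offset;
	/* full stripes passed * one unit per DS + position inside the unit */
	return (offset / stripe_width) * flseg->stripe_unit +
	       offset % flseg->stripe_unit;
}

static uint64_t mock_dserver_offset(const struct mock_flseg *flseg, uint64_t offset)
{
	if (flseg->stripe_type == MOCK_STRIPE_SPARSE)
		return offset;   /* sparse: DS file offset equals file offset */
	return mock_dense_offset(flseg, offset);
}
```

For example, with a 4096-byte stripe unit across 4 data servers, file offset 20580 lies in the second full stripe, 100 bytes into its unit, so it maps to data server offset 4096 + 100 = 4196.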


2011-02-16 16:00:15

by Andy Adamson

Subject: Re: [PATCH 03/16] NFS move nfs_client initialization into nfs_get_client


On Feb 15, 2011, at 9:58 PM, Benny Halevy wrote:

> On 2011-02-14 14:18, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> nfs_get_client now returns an nfs_client that is ready to use, whether it was
>> found or newly created.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> fs/nfs/client.c | 67 ++++++++++++++++++++++++++++++++++++++----------------
>> 1 files changed, 47 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>> index bd3ca32..75b236f 100644
>> --- a/fs/nfs/client.c
>> +++ b/fs/nfs/client.c
>> @@ -81,6 +81,15 @@ retry:
>> }
>> #endif /* CONFIG_NFS_V4 */
>>
>> +static int nfs4_init_client(struct nfs_client *clp,
>> + const struct rpc_timeout *timeparms,
>> + const char *ip_addr,
>> + rpc_authflavor_t authflavour,
>> + int noresvport);
>> +static int nfs_init_client(struct nfs_client *clp,
>> + const struct rpc_timeout *timeparms,
>> + int noresvport);
>> +
>> /*
>> * RPC cruft for NFS
>> */
>> @@ -481,7 +490,12 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
>> * Look up a client by IP address and protocol version
>> * - creates a new record if one doesn't yet exist
>> */
>> -static struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
>> +static struct nfs_client *
>> +nfs_get_client(const struct nfs_client_initdata *cl_init,
>> + const struct rpc_timeout *timeparms,
>> + const char *ip_addr,
>> + rpc_authflavor_t authflavour,
>> + int noresvport)
>> {
>> struct nfs_client *clp, *new = NULL;
>> int error;
>> @@ -512,6 +526,17 @@ install_client:
>> clp = new;
>> list_add(&clp->cl_share_link, &nfs_client_list);
>> spin_unlock(&nfs_client_lock);
>> +
>> + if (cl_init->rpc_ops->version == 4)
>> + error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
>> + noresvport);
>> + else
>> + error = nfs_init_client(clp, timeparms, noresvport);
>
> To make that cleaner you could have both take the same parameters
> and put nfs_init_client in struct nfs_rpc_ops, then call it via cl_init->rpc_ops.

OK - looks good.

-->Andy
>
> Benny
>
>> +
>> + if (error < 0) {
>> + nfs_put_client(clp);
>> + return ERR_PTR(error);
>> + }
>> dprintk("--> nfs_get_client() = %p [new]\n", clp);
>> return clp;
>>
>> @@ -769,7 +794,7 @@ static int nfs_init_server_rpcclient(struct nfs_server *server,
>> */
>> static int nfs_init_client(struct nfs_client *clp,
>> const struct rpc_timeout *timeparms,
>> - const struct nfs_parsed_mount_data *data)
>> + int noresvport)
>> {
>> int error;
>>
>> @@ -784,7 +809,7 @@ static int nfs_init_client(struct nfs_client *clp,
>> * - RFC 2623, sec 2.3.2
>> */
>> error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_UNIX,
>> - 0, data->flags & NFS_MOUNT_NORESVPORT);
>> + 0, noresvport);
>> if (error < 0)
>> goto error;
>> nfs_mark_client_ready(clp, NFS_CS_READY);
>> @@ -820,19 +845,17 @@ static int nfs_init_server(struct nfs_server *server,
>> cl_init.rpc_ops = &nfs_v3_clientops;
>> #endif
>>
>> + nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
>> + data->timeo, data->retrans);
>> +
>> /* Allocate or find a client reference we can use */
>> - clp = nfs_get_client(&cl_init);
>> + clp = nfs_get_client(&cl_init, &timeparms, NULL, RPC_AUTH_UNIX,
>> + data->flags & NFS_MOUNT_NORESVPORT);
>> if (IS_ERR(clp)) {
>> dprintk("<-- nfs_init_server() = error %ld\n", PTR_ERR(clp));
>> return PTR_ERR(clp);
>> }
>>
>> - nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
>> - data->timeo, data->retrans);
>> - error = nfs_init_client(clp, &timeparms, data);
>> - if (error < 0)
>> - goto error;
>> -
>> server->nfs_client = clp;
>>
>> /* Initialise the client representation from the mount data */
>> @@ -1311,7 +1334,7 @@ static int nfs4_init_client(struct nfs_client *clp,
>> const struct rpc_timeout *timeparms,
>> const char *ip_addr,
>> rpc_authflavor_t authflavour,
>> - int flags)
>> + int noresvport)
>> {
>> int error;
>>
>> @@ -1325,7 +1348,7 @@ static int nfs4_init_client(struct nfs_client *clp,
>> clp->rpc_ops = &nfs_v4_clientops;
>>
>> error = nfs_create_rpc_client(clp, timeparms, authflavour,
>> - 1, flags & NFS_MOUNT_NORESVPORT);
>> + 1, noresvport);
>> if (error < 0)
>> goto error;
>> strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
>> @@ -1378,22 +1401,16 @@ static int nfs4_set_client(struct nfs_server *server,
>> dprintk("--> nfs4_set_client()\n");
>>
>> /* Allocate or find a client reference we can use */
>> - clp = nfs_get_client(&cl_init);
>> + clp = nfs_get_client(&cl_init, timeparms, ip_addr, authflavour,
>> + server->flags & NFS_MOUNT_NORESVPORT);
>> if (IS_ERR(clp)) {
>> error = PTR_ERR(clp);
>> goto error;
>> }
>> - error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
>> - server->flags);
>> - if (error < 0)
>> - goto error_put;
>>
>> server->nfs_client = clp;
>> dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
>> return 0;
>> -
>> -error_put:
>> - nfs_put_client(clp);
>> error:
>> dprintk("<-- nfs4_set_client() = xerror %d\n", error);
>> return error;
>> @@ -1611,6 +1628,16 @@ error:
>> return ERR_PTR(error);
>> }
>>
>> +#else /* CONFIG_NFS_V4 */
>> +static int nfs4_init_client(struct nfs_client *clp,
>> + const struct rpc_timeout *timeparms,
>> + const char *ip_addr,
>> + rpc_authflavor_t authflavour,
>> + int noresvport)
>> +{
>> + return -EPROTONOSUPPORT;
>> +}
>> +
>> #endif /* CONFIG_NFS_V4 */
>>
>> /*
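Benny's suggestion (give both initializers the same parameters and dispatch through the rpc_ops table instead of checking cl_init->rpc_ops->version) can be sketched in user space. The mock_* types, the init_client member, and the -EINVAL error for a missing v4 address are all hypothetical illustrations, not the eventual kernel code.

```c
#include <errno.h>
#include <stddef.h>

struct mock_client;

/* Hypothetical ops table: one initializer signature shared by all versions. */
struct mock_rpc_ops {
	int version;
	int (*init_client)(struct mock_client *clp, const char *ip_addr,
			   int noresvport);
};

struct mock_client {
	const struct mock_rpc_ops *rpc_ops;
	int ready;
	const char *ipaddr;
};

static int mock_nfs_init_client(struct mock_client *clp, const char *ip_addr,
				int noresvport)
{
	(void)ip_addr;         /* v2/v3 ignore the client address */
	(void)noresvport;
	clp->ready = 1;
	return 0;
}

static int mock_nfs4_init_client(struct mock_client *clp, const char *ip_addr,
				 int noresvport)
{
	(void)noresvport;
	if (ip_addr == NULL)
		return -EINVAL;    /* hypothetical: v4 needs a client address */
	clp->ipaddr = ip_addr;
	clp->ready = 1;
	return 0;
}

static const struct mock_rpc_ops mock_v3_ops = { 3, mock_nfs_init_client };
static const struct mock_rpc_ops mock_v4_ops = { 4, mock_nfs4_init_client };

/* No version check: the ops table selects the initializer. */
static int mock_init_new_client(struct mock_client *clp,
				const struct mock_rpc_ops *ops,
				const char *ip_addr, int noresvport)
{
	clp->rpc_ops = ops;
	return ops->init_client(clp, ip_addr, noresvport);
}
```

As in the patch, a caller that sees a negative return would drop its reference (nfs_put_client) and return ERR_PTR(error).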


Subject: RE: [PATCH 01/16] NFS remove unnecessary CONFIG_NFS_V4 from nfs_read_data



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Christoph Hellwig
Sent: Tuesday, February 15, 2011 2:46 PM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: Re: [PATCH 01/16] NFS remove unnecessary CONFIG_NFS_V4 from nfs_read_data

On Mon, Feb 14, 2011 at 02:18:21PM -0500, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Signed-off-by: Andy Adamson <[email protected]>

Either the patch or the description is incorrect. If you actually need
it for NFSv2/3, the description should say so. Otherwise it's just a
"cleanup" that bloats the structure for people who don't have v4 support
compiled in.

2011-02-15 14:59:27

by Benny Halevy

Subject: Re: [PATCH 08/16] pnfs: wave 3: lseg refcounting

On 2011-02-15 09:58, Christoph Hellwig wrote:
> On Tue, Feb 15, 2011 at 09:48:26AM -0500, Fred Isaman wrote:
>> pnfs_free_lseg_list, besides calling free_lseg, also potentially
>> removes the layout from the client's list of inodes with layouts.
>
> Looks like the routine has changed from the mainline variant
> I looked at. I took a quick look at the one from pnfs-submit,
> which looks quite suspicious, as it special-cases the first item
> on the list without a good explanation and then iterates the list.
>
> Does your tree have another caller of pnfs_free_lseg_list? If not
> please just open code the right thing in the caller, instead of
> pretending we're dealing with a list if you're always dealing with
> one entry. If the tree grows a caller that needs to deal with a list
> with more than 1 entry we can revisit if there's a point in sharing
> code.
>

Agreed.

Benny