2010-06-11 07:32:19

by Fred Isaman

Subject: [PATCH 00/26] LAYOUT invocation v2

This is version 2 of a patch series that limits LAYOUTGET invocation
to the beginning of the IO paths. It is intended for the pnfs_submit branch,
without reversion in a post_submit branch.

Patches 1-4 revert direct IO. Commit is already broken there, and this
series breaks those paths further. The problem is that the direct IO code
redefines data->wb_req and data->pages, so it can only work with
the pnfs code if we don't look at those fields. The reverted code should
be saved somewhere. I tend to agree with Boaz that keeping it in git is
preferable, but I can supply a patch which returns the code ifdef'ed out
if that is preferred.

Patches 5-9 do some code cleanup in preparation for the real work.

Patches 10-21 implement the change. NOTE that patch 20 changes the
calling convention of the layout drivers' commit calls. There is no
longer a universal lseg for the commit; instead each nfs_page has an
lseg attached, with NULL meaning to go through the MDS.

Patches 22-26 rework the filelayout commit function, and then do some
other code cleanup.



The basic idea of these patches is as follows:

We attempt to grab an lseg (possibly invoking LAYOUTGET) early in the
IO. If we succeed, we refcount and stash it, using it through the
rest of the IO. If we fail, we revert to straight nfs, even if the
area becomes covered by a layout due to other IO.
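
To make the idea concrete, here is a minimal userspace sketch, not the
kernel code. Every name in it (lseg_sketch, io_req_sketch, the sketch_*
helpers) is hypothetical and stands in for pnfs_layout_segment, nfs_page
and get_lseg/put_lseg. The point is that the decision between pnfs and
straight nfs is made once, from the stashed pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct lseg_sketch {
	int refcount;
};

/* Hypothetical stand-in for an IO request such as struct nfs_page. */
struct io_req_sketch {
	struct lseg_sketch *lseg;	/* NULL means "go through the MDS" */
};

static void sketch_get_lseg(struct lseg_sketch *lseg)
{
	lseg->refcount++;
}

static void sketch_put_lseg(struct lseg_sketch *lseg)
{
	if (lseg)
		lseg->refcount--;
}

/* Start of IO: stash whatever layout was found, which may be NULL. */
static void sketch_io_begin(struct io_req_sketch *req,
			    struct lseg_sketch *found)
{
	req->lseg = found;
	if (found)
		sketch_get_lseg(found);
}

/* Later stages look only at the stashed lseg, never at the layout cache. */
static int sketch_uses_pnfs(const struct io_req_sketch *req)
{
	return req->lseg != NULL;
}

/* End of IO: drop the reference taken in sketch_io_begin(). */
static void sketch_io_end(struct io_req_sketch *req)
{
	sketch_put_lseg(req->lseg);
	req->lseg = NULL;
}
```

A request that began with a NULL lseg stays on the nfs path for its whole
lifetime in this sketch, even if another request obtains a layout for the
same range in the meantime.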

The tricky, though hopefully anomalous, case is when we start without
the layout but have it at a later stage of the IO. We ignore
this for the moment at write_pages, which will cause block and object
to issue CB_LAYOUTRECALL. At commit it is tricky to handle, but
since block doesn't use commit, and file needs to handle complicated
splitting anyway, I just push all the complicated decisions about
splitting the commit between nfs (for IO started without a layout) and
pnfs to the driver.

Fred



2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 04/26] pnfs-submit: Revert "pnfs: Add function to set up O_DIRECT I/O"

This reverts commit 4bc73cd4118b5d5b710c28c83a750bf4e02e8269.

Conflicts:

fs/nfs/pnfs.c
fs/nfs/pnfs.h

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 31 -------------------------------
fs/nfs/pnfs.h | 25 -------------------------
2 files changed, 0 insertions(+), 56 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 27b7b48..6717a9d 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1374,37 +1374,6 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
pnfs_set_pg_test(inode, pgio);
}

-/* Retrieve I/O parameters for O_DIRECT.
- * Out Args:
- * iosize - min of boundary and (rsize or wsize)
- * remaining - # bytes remaining in the current stripe unit
- */
-void
-_pnfs_direct_init_io(struct inode *inode, struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite, size_t *iosize,
- size_t *remaining)
-{
- struct nfs_server *nfss = NFS_SERVER(inode);
- u32 boundary;
- unsigned int rwsize;
-
- if (count <= 0 ||
- pnfs_update_layout(inode, ctx, count, loff, IOMODE_READ, NULL))
- return;
-
- if (iswrite)
- rwsize = nfss->wsize;
- else
- rwsize = nfss->rsize;
-
- boundary = pnfs_getboundary(inode);
-
- *iosize = min(rwsize, boundary);
- *remaining = boundary - (do_div(loff, boundary));
-
- dprintk("%s Rem %Zu iosize %Zu\n", __func__, *remaining, *iosize);
-}
-
/*
* Get a layoutout for COMMIT
*/
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 47de1ba..4581a3e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -65,9 +65,6 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
const nfs4_stateid *stateid);
void pnfs_destroy_layout(struct nfs_inode *);
-void _pnfs_direct_init_io(struct inode *inode, struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite,
- size_t *rwsize, size_t *remaining);

#define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
(srv)->pnfs_curr_ld->ld_io_ops && \
@@ -183,20 +180,6 @@ static inline int pnfs_get_read_status(struct nfs_read_data *data)
return data->pdata.pnfs_error;
}

-static inline void pnfs_direct_init_io(struct inode *inode,
- struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite,
- size_t *iosize, size_t *remaining)
-{
- struct nfs_server *nfss = NFS_SERVER(inode);
-
- if (pnfs_enabled_sb(nfss))
- return _pnfs_direct_init_io(inode, ctx, count, loff, iswrite,
- iosize, remaining);
-
- return;
-}
-
static inline int pnfs_use_rpc(struct nfs_server *nfss)
{
if (pnfs_enabled_sb(nfss))
@@ -242,14 +225,6 @@ static inline int pnfs_get_read_status(struct nfs_read_data *data)
return 0;
}

-/* Set num of remaining bytes, which is everything */
-static inline void pnfs_direct_init_io(struct inode *inode,
- struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite,
- size_t *iosize, size_t *remaining)
-{
-}
-
static inline int pnfs_use_rpc(struct nfs_server *nfss)
{
return 1;
--
1.6.6.1


2010-06-11 07:32:21

by Fred Isaman

Subject: [PATCH 10/26] pnfs_submit: mandate basic io path operations for layout drivers

Mandate read_pagelist, write_pagelist, and commit. This will help
avoid needless checks in the io path.
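
As an illustration only, the registration-time check can be modelled in
userspace as below. The names io_ops_sketch, register_driver_sketch and
noop_sketch are hypothetical, and the function pointer signatures are
simplified placeholders rather than the real layoutdriver_io_operations
prototypes.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for layoutdriver_io_operations. */
struct io_ops_sketch {
	int (*read_pagelist)(void);
	int (*write_pagelist)(void);
	int (*commit)(void);
};

/* Returns 0 on success, -1 if a mandatory operation is missing. */
static int register_driver_sketch(const struct io_ops_sketch *ops)
{
	if (!ops || !ops->read_pagelist || !ops->write_pagelist ||
	    !ops->commit)
		return -1;
	return 0;
}

/* Placeholder operation used to fill in a complete ops table. */
static int noop_sketch(void)
{
	return 0;
}
```

Once registration has rejected incomplete drivers, the io path can call
the three operations without testing the pointers each time.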

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c73ab80..a82eac7 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -270,6 +270,14 @@ pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
return NULL;
}

+ if (!io_ops->read_pagelist || !io_ops->write_pagelist ||
+ !io_ops->commit) {
+ printk(KERN_ERR "%s Layout driver must provide "
+ "read_pagelist, write_pagelist, and commit.\n",
+ __func__);
+ return NULL;
+ }
+
pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
if (pnfs_mod != NULL) {
dprintk("%s Registering id:%u name:%s\n",
--
1.6.6.1


2010-06-11 07:32:21

by Fred Isaman

Subject: [PATCH 09/26] pnfs-submit: track the number of outstanding commits

Commit 71d0a6112a3 "NFS: Fix an unstable write data integrity race"
adds locking which is incompatible with the current file layout commit code,
which splits the commit into several RPCs cloned from the original.
Add a counter so the layout driver can properly unlock only once.
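
The counting scheme can be sketched in userspace roughly as follows.
This is an assumption-laden model, not the patch itself: commit_sketch,
commit_sketch_init and commit_sketch_put are hypothetical names, a plain
int replaces struct kref, and incrementing *unlock_count stands in for
nfs_commit_clear_lock().

```c
#include <stddef.h>

/* Hypothetical stand-in for the commit part of struct nfs_write_data. */
struct commit_sketch {
	int refcount;
	struct commit_sketch *parent;	/* NULL for the original commit */
	int *unlock_count;		/* how many times the lock was cleared */
};

/* A clone holds one reference on its parent for as long as it lives. */
static void commit_sketch_init(struct commit_sketch *c,
			       struct commit_sketch *parent,
			       int *unlock_count)
{
	c->refcount = 1;
	c->parent = parent;
	c->unlock_count = unlock_count;
	if (parent)
		parent->refcount++;
}

/* Drop one reference; only the last put on the original unlocks. */
static void commit_sketch_put(struct commit_sketch *c)
{
	if (--c->refcount > 0)
		return;
	if (c->parent)
		commit_sketch_put(c->parent);
	else
		(*c->unlock_count)++;
}
```

With one original and two clones, the lock is cleared exactly once, after
the last of the three releases, regardless of the order in which they
complete.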

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 3 +++
fs/nfs/write.c | 19 ++++++++++++++++---
include/linux/nfs_xdr.h | 2 ++
3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 7fc93e6..e36c95d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -518,6 +518,9 @@ filelayout_clone_write_data(struct nfs_write_data *old)
new = nfs_commitdata_alloc();
if (!new)
goto out;
+ kref_init(&new->refcount);
+ new->parent = old;
+ kref_get(&old->refcount);
new->inode = old->inode;
new->cred = old->cred;
new->args.offset = 0;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index fcdc4cd..fb3ceca 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1369,7 +1369,8 @@ static int nfs_commit_rpcsetup(struct list_head *head,
data->res.fattr = &data->fattr;
data->res.verf = &data->verf;
nfs_fattr_init(&data->fattr);
-
+ kref_init(&data->refcount);
+ data->parent = NULL;
data->args.context = first->wb_context; /* used by commit done */

return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
@@ -1421,6 +1422,19 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
return;
}

+static inline void nfs_commit_cleanup(struct kref *kref)
+{
+ struct nfs_write_data *data;
+
+ data = container_of(kref, struct nfs_write_data, refcount);
+ /* Clear lock only when all cloned commits are finished */
+ if (data->parent)
+ kref_put(&data->parent->refcount, nfs_commit_cleanup);
+ else
+ nfs_commit_clear_lock(NFS_I(data->inode));
+ nfs_commitdata_release(data);
+}
+
static void nfs_commit_release(void *calldata)
{
struct nfs_write_data *data = calldata;
@@ -1458,8 +1472,7 @@ static void nfs_commit_release(void *calldata)
next:
nfs_clear_page_tag_locked(req);
}
- nfs_commit_clear_lock(NFS_I(data->inode));
- nfs_commitdata_release(calldata);
+ kref_put(&data->refcount, nfs_commit_cleanup);
}

static const struct rpc_call_ops nfs_commit_ops = {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index af30cbf..a8b85b6 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1012,6 +1012,8 @@ struct nfs_read_data {
};

struct nfs_write_data {
+ struct kref refcount; /* For pnfs commit splitting */
+ struct nfs_write_data *parent; /* For pnfs commit splitting */
int flags;
struct rpc_task task;
struct inode *inode;
--
1.6.6.1


2010-06-11 07:32:21

by Fred Isaman

Subject: [PATCH 12/26] pnfs_submit: stash and refcount lseg in read path

Note we are not using it yet, but refcounting should be accurate.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pagelist.c | 12 ++++++++++--
fs/nfs/pnfs.c | 4 +++-
fs/nfs/read.c | 9 +++++++--
fs/nfs/write.c | 2 +-
include/linux/nfs_page.h | 5 ++++-
5 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 8314915..b9d3baf 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -20,6 +20,7 @@
#include <linux/nfs_mount.h>

#include "internal.h"
+#include "pnfs.h"

static struct kmem_cache *nfs_page_cachep;

@@ -56,7 +57,8 @@ nfs_page_free(struct nfs_page *p)
struct nfs_page *
nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
struct page *page,
- unsigned int offset, unsigned int count)
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;

@@ -80,6 +82,9 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
req->wb_bytes = count;
req->wb_context = get_nfs_open_context(ctx);
kref_init(&req->wb_kref);
+ req->wb_lseg = lseg;
+ if (lseg)
+ get_lseg(lseg);
return req;
}

@@ -150,9 +155,12 @@ void nfs_clear_request(struct nfs_page *req)
put_nfs_open_context(ctx);
req->wb_context = NULL;
}
+ if (req->wb_lseg != NULL) {
+ put_lseg(req->wb_lseg);
+ req->wb_lseg = NULL;
+ }
}

-
/**
* nfs_release_request - Release the count on an NFS read/write request
* @req: request to release
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 52dbcbe..c1eb02f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1369,6 +1369,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
pgio->pg_iswrite = 0;
pgio->pg_boundary = 0;
pgio->pg_test = NULL;
+ pgio->pg_lseg = NULL;

if (!pnfs_enabled_sb(nfss))
return;
@@ -1378,7 +1379,8 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,

if (count > 0) {
status = _pnfs_update_layout(inode, ctx, count,
- loff, IOMODE_READ, NULL);
+ loff, IOMODE_READ,
+ &pgio->pg_lseg);
dprintk("%s virt update returned %d\n", __func__, status);
if (status != 0)
return;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 28c49f1..68b4ca8 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -121,11 +121,14 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
LIST_HEAD(one_request);
struct nfs_page *new;
unsigned int len;
+ struct pnfs_layout_segment *lseg;

len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);
- new = nfs_create_request(ctx, inode, page, 0, len);
+ pnfs_update_layout(inode, ctx, NFS4_MAX_UINT64, 0, IOMODE_READ, &lseg);
+ new = nfs_create_request(ctx, inode, page, 0, len, lseg);
+ put_lseg(lseg);
if (IS_ERR(new)) {
unlock_page(page);
return PTR_ERR(new);
@@ -606,7 +609,8 @@ readpage_async_filler(void *data, struct page *page)
if (len == 0)
return nfs_return_empty_page(page);

- new = nfs_create_request(desc->ctx, inode, page, 0, len);
+ new = nfs_create_request(desc->ctx, inode, page, 0, len,
+ desc->pgio->pg_lseg);
if (IS_ERR(new))
goto out_error;

@@ -673,6 +677,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);

nfs_pageio_complete(&pgio);
+ put_lseg(pgio.pg_lseg);
npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
nfs_add_stats(inode, NFSIOS_READPAGES, npages);
read_complete:
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index fb3ceca..30f4c09 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -653,7 +653,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
req = nfs_try_to_update_request(inode, page, offset, bytes);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, inode, page, offset, bytes);
+ req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
if (IS_ERR(req))
goto out;
error = nfs_inode_add_request(inode, req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d04ebb2..18a455c 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -48,6 +48,7 @@ struct nfs_page {
struct kref wb_kref; /* reference count */
unsigned long wb_flags;
struct nfs_writeverf wb_verf; /* Commit cookie */
+ struct pnfs_layout_segment *wb_lseg; /* Pnfs layout info */
};

struct nfs_pageio_descriptor {
@@ -61,6 +62,7 @@ struct nfs_pageio_descriptor {
int (*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
int pg_ioflags;
int pg_error;
+ struct pnfs_layout_segment *pg_lseg;
#ifdef CONFIG_NFS_V4_1
int pg_iswrite;
int pg_boundary;
@@ -74,7 +76,8 @@ extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
struct inode *inode,
struct page *page,
unsigned int offset,
- unsigned int count);
+ unsigned int count,
+ struct pnfs_layout_segment *lseg);
extern void nfs_clear_request(struct nfs_page *req);
extern void nfs_release_request(struct nfs_page *req);

--
1.6.6.1


2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 01/26] pnfs-submit: Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT"

This reverts commit 05277f5f5236462a11e7a20ebe9009449f8a463d.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/direct.c | 10 ----------
1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e111e9f..02e5918 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -191,22 +191,12 @@ static ssize_t nfs_direct_wait(struct nfs_direct_req *dreq)
{
ssize_t result = -EIOCBQUEUED;

- if (!pnfs_use_rpc(NFS_SERVER(dreq->inode))) {
- /* FIXME: Right now non-rpc layout types must perform
- * syncronous direct i/o.
- * New pNFS callback to wait on outstanding requests?
- */
- result = 0;
- goto set_result;
- }
-
/* Async requests don't wait here */
if (dreq->iocb)
goto out;

result = wait_for_completion_killable(&dreq->completion);

-set_result:
if (!result)
result = dreq->error;
if (!result)
--
1.6.6.1


2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 03/26] pnfs-submit: Revert "pnfs: Enable O_DIRECT read path."

This reverts commit fe1dbd120b6a94bbacec205d0a4ae40d36e314b5.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/direct.c | 26 +-------------------------
1 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1148214..3ef9b0c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -56,7 +56,6 @@

#include "internal.h"
#include "iostat.h"
-#include "pnfs.h"

#define NFSDBG_FACILITY NFSDBG_VFS

@@ -329,17 +328,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
unsigned int pgbase;
int result;
ssize_t started = 0;
- size_t pnfs_stripe_rem = count;
- enum pnfs_try_status trypnfs;
-
- /* pnfs_stripe_rem will be set to the remaining bytes in
- * the first stripe_unit (which for standard nfs is count)
- */
- pnfs_direct_init_io(inode, ctx, count, pos, 0, &rsize,
- &pnfs_stripe_rem);
-
- dprintk("%s: pos %llu count %Zu wsize %Zu\n",
- __func__, pos, count, rsize);

do {
struct nfs_read_data *data;
@@ -347,12 +335,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,

pgbase = user_addr & ~PAGE_MASK;
bytes = min(rsize,count);
-#if defined(CONFIG_NFS_V4_1)
- if (pnfs_enabled_sb(NFS_SERVER(inode))) {
- bytes = min(bytes, pnfs_stripe_rem);
- pnfs_stripe_rem = rsize;
- }
-#endif /* CONFIG_NFS_V4_1 */

result = -ENOMEM;
data = nfs_readdata_alloc(nfs_page_array_len(pgbase, bytes));
@@ -393,14 +375,8 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
data->res.eof = 0;
data->res.count = bytes;

- trypnfs = pnfs_try_to_read_data(data, &nfs_read_direct_ops);
- if (trypnfs == PNFS_ATTEMPTED) {
- result = pnfs_get_read_status(data);
- if (result)
- break;
- } else if (nfs_direct_read_execute(data, &task_setup_data, &msg)) {
+ if (nfs_direct_read_execute(data, &task_setup_data, &msg))
break;
- }

started += bytes;
user_addr += bytes;
--
1.6.6.1


2010-06-11 07:32:23

by Fred Isaman

Subject: [PATCH 14/26] pnfs_submit: use fsdata to pass lseg

Preparing for LAYOUTGET invocation in nfs_write_begin to be the
only invocation in the write path.

It isn't used at all yet, but it should be properly referenced/dereferenced.
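
A rough userspace model of the handoff, with hypothetical names
(lseg_fs, fs_put_lseg, write_begin_sketch, write_end_sketch) and a
"fail" flag standing in for the real error paths: the begin side owns a
reference from the layout lookup and either passes it on through the
opaque fsdata pointer or drops it on error, and the end side drops it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct lseg_fs {
	int refcount;
};

static void fs_put_lseg(struct lseg_fs *lseg)
{
	if (lseg)
		lseg->refcount--;
}

/*
 * Begin side: "lseg" arrives with a reference already held, as if a
 * layout lookup had just returned it. On failure the reference is
 * dropped here and *fsdata is cleared, so the end side is never handed
 * a stale pointer.
 */
static int write_begin_sketch(struct lseg_fs *lseg, int fail, void **fsdata)
{
	int ret = fail ? -1 : 0;

	*fsdata = lseg;
	if (ret) {
		fs_put_lseg(lseg);
		*fsdata = NULL;
	}
	return ret;
}

/* End side: recover the lseg from fsdata and drop the reference. */
static void write_end_sketch(void *fsdata)
{
	struct lseg_fs *lseg = fsdata;

	fs_put_lseg(lseg);
}
```

Either way, each reference taken at the start of the write is dropped
exactly once.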

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 21 ++++++++++++++++++---
1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 03601d2..e308244 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -414,12 +414,17 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
pgoff_t index = pos >> PAGE_CACHE_SHIFT;
struct page *page;
int once_thru = 0;
+ struct pnfs_layout_segment *lseg;

dfprintk(PAGECACHE, "NFS: write_begin(%s/%s(%ld), %u@%lld)\n",
file->f_path.dentry->d_parent->d_name.name,
file->f_path.dentry->d_name.name,
mapping->host->i_ino, len, (long long) pos);

+ pnfs_update_layout(mapping->host,
+ nfs_file_open_context(file),
+ NFS4_MAX_UINT64, 0, IOMODE_RW,
+ &lseg);
start:
/*
* Prevent starvation issues if someone is doing a consistency
@@ -428,11 +433,13 @@ start:
ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
nfs_wait_bit_killable, TASK_KILLABLE);
if (ret)
- return ret;
+ goto out;

page = grab_cache_page_write_begin(mapping, index, flags);
- if (!page)
- return -ENOMEM;
+ if (!page) {
+ ret = -ENOMEM;
+ goto out;
+ }
*pagep = page;

ret = nfs_flush_incompatible(file, page);
@@ -447,6 +454,12 @@ start:
if (!ret)
goto start;
}
+ *fsdata = lseg;
+ out:
+ if (ret) {
+ put_lseg(lseg);
+ *fsdata = NULL;
+ }
return ret;
}

@@ -456,6 +469,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
{
unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
int status;
+ struct pnfs_layout_segment *lseg = fsdata;

dfprintk(PAGECACHE, "NFS: write_end(%s/%s(%ld), %u@%lld)\n",
file->f_path.dentry->d_parent->d_name.name,
@@ -486,6 +500,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,

unlock_page(page);
page_cache_release(page);
+ put_lseg(lseg);

if (status < 0)
return status;
--
1.6.6.1


2010-06-11 07:32:21

by Fred Isaman

Subject: [PATCH 11/26] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions

These will be used in the generic code. Set so they will compile away to
nothing if CONFIG_NFS_V4_1 is not set.

This requires kref_put to be under lock. See rule 3 of Documentation/kref.txt.
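
The compile-away pattern, sketched in userspace under assumptions:
SKETCH_PNFS_ENABLED is a hypothetical macro standing in for
CONFIG_NFS_V4_1, and lseg_stub and the stub_* helpers are illustrative
names with a simplified signature. The locking requirement is not
modelled here.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct lseg_stub {
	int refcount;
};

#ifdef SKETCH_PNFS_ENABLED

/* Enabled build: real reference counting on a single demo layout. */
static struct lseg_stub stub_layout = { 1 };

static inline void stub_get_lseg(struct lseg_stub *lseg)
{
	lseg->refcount++;
}

static inline void stub_put_lseg(struct lseg_stub *lseg)
{
	if (lseg)
		lseg->refcount--;
}

static inline int stub_update_layout(struct lseg_stub **lsegpp)
{
	if (lsegpp) {
		stub_get_lseg(&stub_layout);
		*lsegpp = &stub_layout;
	}
	return 0;
}

#else /* !SKETCH_PNFS_ENABLED */

/* Disabled build: empty inlines, so callers need no #ifdef of their own. */
static inline void stub_get_lseg(struct lseg_stub *lseg)
{
	(void)lseg;
}

static inline void stub_put_lseg(struct lseg_stub *lseg)
{
	(void)lseg;
}

static inline int stub_update_layout(struct lseg_stub **lsegpp)
{
	if (lsegpp)
		*lsegpp = NULL;
	return 0;
}

#endif
```

Generic code can then call the three helpers unconditionally; in the
disabled build the caller always sees a NULL lseg and takes the plain nfs
path.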

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 45 ++++++++++++++++++++++++++++++++-------------
fs/nfs/pnfs.h | 43 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index a82eac7..52dbcbe 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -415,7 +415,25 @@ destroy_lseg(struct kref *kref)
PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
}

-static inline void
+static void
+put_lseg_locked(struct pnfs_layout_segment *lseg)
+{
+ bool do_wake_up;
+ struct nfs_inode *nfsi;
+
+ if (!lseg)
+ return;
+
+ dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+ atomic_read(&lseg->kref.refcount), lseg->valid);
+ do_wake_up = !lseg->valid;
+ nfsi = PNFS_NFS_INODE(lseg->layout);
+ kref_put(&lseg->kref, destroy_lseg);
+ if (do_wake_up)
+ wake_up(&nfsi->lo_waitq);
+}
+
+void
put_lseg(struct pnfs_layout_segment *lseg)
{
bool do_wake_up;
@@ -428,7 +446,9 @@ put_lseg(struct pnfs_layout_segment *lseg)
atomic_read(&lseg->kref.refcount), lseg->valid);
do_wake_up = !lseg->valid;
nfsi = PNFS_NFS_INODE(lseg->layout);
+ spin_lock(&nfsi->lo_lock);
kref_put(&lseg->kref, destroy_lseg);
+ spin_unlock(&nfsi->lo_lock);
if (do_wake_up)
wake_up(&nfsi->lo_waitq);
}
@@ -653,7 +673,7 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
lseg, lseg->range.iomode, lseg->range.offset,
lseg->range.length);
list_del(&lseg->fi_list);
- put_lseg(lseg);
+ put_lseg_locked(lseg);
}

dprintk("%s:Return\n", __func__);
@@ -1011,7 +1031,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
(lseg->valid || !only_valid)) {
ret = lseg;
if (take_ref)
- kref_get(&ret->kref);
+ get_lseg(ret);
break;
}
if (cmp_layout(range, &lseg->range) > 0)
@@ -1031,7 +1051,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
* returned to the caller.
*/
int
-pnfs_update_layout(struct inode *ino,
+_pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
u64 count,
loff_t pos,
@@ -1063,8 +1083,7 @@ pnfs_update_layout(struct inode *ino,
lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
if (lseg && !lseg->valid) {
if (take_ref)
- put_lseg(lseg);
-
+ put_lseg_locked(lseg);
/* someone is cleaning the layout */
lseg = NULL;
result = -EAGAIN;
@@ -1240,7 +1259,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
init_lseg(lo, lseg);
lseg->range = res->lseg;
if (lgp->lsegpp) {
- kref_get(&lseg->kref);
+ get_lseg(lseg);
*lgp->lsegpp = lseg;
}

@@ -1358,7 +1377,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
readahead_range(inode, pages, &loff, &count);

if (count > 0) {
- status = pnfs_update_layout(inode, ctx, count,
+ status = _pnfs_update_layout(inode, ctx, count,
loff, IOMODE_READ, NULL);
dprintk("%s virt update returned %d\n", __func__, status);
if (status != 0)
@@ -1416,7 +1435,7 @@ pnfs_update_layout_commit(struct inode *inode,
if (start == 0 && count == 0)
count = NFS4_MAX_UINT64;

- status = pnfs_update_layout(inode, nfs_page->wb_context,
+ status = _pnfs_update_layout(inode, nfs_page->wb_context,
count,
start,
IOMODE_RW,
@@ -1516,7 +1535,7 @@ pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
goto out;

/* Retrieve and set layout if not allready cached */
- status = pnfs_update_layout(inode,
+ status = _pnfs_update_layout(inode,
context,
count,
*pos,
@@ -1558,7 +1577,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
args->offset);

/* Retrieve and set layout if not allready cached */
- status = pnfs_update_layout(inode,
+ status = _pnfs_update_layout(inode,
args->context,
args->count,
args->offset,
@@ -1659,7 +1678,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
args->offset);

/* Retrieve and set layout if not allready cached */
- status = pnfs_update_layout(inode,
+ status = _pnfs_update_layout(inode,
args->context,
args->count,
args->offset,
@@ -1823,7 +1842,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
new one. If it was recalled we better commit the data first
before returning it, otherwise the data needs to be rewritten,
either with a new layout or to the MDS */
- result = pnfs_update_layout(data->inode,
+ result = _pnfs_update_layout(data->inode,
NULL,
count,
first->wb_offset,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 4581a3e..a2a7b94 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -31,7 +31,8 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
/* pnfs.c */
extern const nfs4_stateid zero_stateid;

-int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+void put_lseg(struct pnfs_layout_segment *lseg);
+int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
u64 count, loff_t pos, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp);

@@ -81,6 +82,11 @@ static inline int lo_fail_bit(u32 iomode)
NFS_INO_RW_LAYOUT_FAILED : NFS_INO_RO_LAYOUT_FAILED;
}

+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+ kref_get(&lseg->kref);
+}
+
/* Return true if a layout driver is being used for this mountpoint */
static inline int pnfs_enabled_sb(struct nfs_server *nfss)
{
@@ -170,6 +176,23 @@ static inline int pnfs_return_layout(struct inode *ino,
return 0;
}

+static inline int pnfs_update_layout(struct inode *ino,
+ struct nfs_open_context *ctx,
+ u64 count, loff_t pos, enum pnfs_iomode access_type,
+ struct pnfs_layout_segment **lsegpp)
+{
+ struct nfs_server *nfss = NFS_SERVER(ino);
+
+ if (pnfs_enabled_sb(nfss))
+ return _pnfs_update_layout(ino, ctx, count, pos,
+ access_type, lsegpp);
+ else {
+ if (lsegpp)
+ *lsegpp = NULL;
+ return 0;
+ }
+}
+
static inline int pnfs_get_write_status(struct nfs_write_data *data)
{
return data->pdata.pnfs_error;
@@ -190,6 +213,24 @@ static inline int pnfs_use_rpc(struct nfs_server *nfss)

#else /* CONFIG_NFS_V4_1 */

+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline void put_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline int
+pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+ u64 count, loff_t pos, enum pnfs_iomode access_type,
+ struct pnfs_layout_segment **lsegpp)
+{
+ if (lsegpp)
+ *lsegpp = NULL;
+ return 0;
+}
+
static inline enum pnfs_try_status
pnfs_try_to_read_data(struct nfs_read_data *data,
const struct rpc_call_ops *call_ops)
--
1.6.6.1


2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 02/26] pnfs-submit: Revert "pnfs: Enable O_DIRECT write path."

This reverts commit 2faf680af973895bdfe19f2254b59dc1a153dd82.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/direct.c | 41 +----------------------------------------
1 files changed, 1 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 02e5918..1148214 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -505,7 +505,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
.workqueue = nfsiod_workqueue,
.flags = RPC_TASK_ASYNC,
};
- enum pnfs_try_status trypnfs;

dreq->count = 0;
get_dreq(dreq);
@@ -529,11 +528,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
* Reuse data->task; data->args should not have changed
* since the original request was sent.
*/
- trypnfs = pnfs_try_to_write_data(data, &nfs_write_direct_ops,
- NFS_FILE_SYNC);
- if (trypnfs == PNFS_ATTEMPTED)
- continue;
-
nfs_direct_write_execute(data, &task_setup_data, &msg);
}

@@ -616,7 +610,6 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
.workqueue = nfsiod_workqueue,
.flags = RPC_TASK_ASYNC,
};
- enum pnfs_try_status trypnfs;

data->inode = dreq->inode;
data->cred = msg.rpc_cred;
@@ -630,11 +623,6 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
data->res.verf = &data->verf;
nfs_fattr_init(&data->fattr);

- trypnfs = pnfs_try_to_commit(data, &nfs_commit_direct_ops,
- RPC_TASK_ASYNC);
- if (trypnfs == PNFS_ATTEMPTED)
- return;
-
nfs_direct_commit_execute(dreq, data, &task_setup_data, &msg);
}

@@ -683,9 +671,6 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
{
struct nfs_write_data *data = calldata;

- dprintk("%s: verf: %d stable %d\n", __func__,
- data->res.verf->committed, data->args.stable);
-
if (nfs_writeback_done(task, data) != 0)
return;
}
@@ -799,17 +784,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
unsigned int pgbase;
int result;
ssize_t started = 0;
- size_t pnfs_stripe_rem = count;
- enum pnfs_try_status trypnfs;
-
- /* pnfs_stripe_rem will be set to the remaining bytes in
- * the first stripe_unit (which for standard nfs is count)
- */
- pnfs_direct_init_io(inode, ctx, count, pos, 1,
- &wsize, &pnfs_stripe_rem);
-
- dprintk("%s: pos %llu count %Zu wsize %Zu\n",
- __func__, pos, count, wsize);

do {
struct nfs_write_data *data;
@@ -818,12 +792,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
pgbase = user_addr & ~PAGE_MASK;
bytes = min(wsize,count);

-#if defined(CONFIG_NFS_V4_1)
- if (pnfs_enabled_sb(NFS_SERVER(inode))) {
- bytes = min(bytes, pnfs_stripe_rem);
- pnfs_stripe_rem = wsize;
- }
-#endif /* CONFIG_NFS_V4_1 */
result = -ENOMEM;
data = nfs_writedata_alloc(nfs_page_array_len(pgbase, bytes));
if (unlikely(!data))
@@ -867,15 +835,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
data->res.verf = &data->verf;
nfs_fattr_init(&data->fattr);

- trypnfs = pnfs_try_to_write_data(data, &nfs_write_direct_ops,
- sync);
- if (trypnfs == PNFS_ATTEMPTED) {
- result = pnfs_get_write_status(data);
- if (result)
- break;
- } else if (nfs_direct_write_execute(data, &task_setup_data, &msg)) {
+ if (nfs_direct_write_execute(data, &task_setup_data, &msg))
break;
- }

started += bytes;
user_addr += bytes;
--
1.6.6.1


2010-06-11 07:32:21

by Fred Isaman

Subject: [PATCH 08/26] pnfs-submit: remove PNFS_LAYOUTGET_ON_OPEN

It is not used anywhere.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 3 +--
include/linux/nfs4_pnfs.h | 14 --------------
2 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c15b90a..7fc93e6 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -710,8 +710,7 @@ struct layoutdriver_io_operations filelayout_io_operations = {
};

struct layoutdriver_policy_operations filelayout_policy_operations = {
- .flags = PNFS_USE_RPC_CODE |
- PNFS_LAYOUTGET_ON_OPEN,
+ .flags = PNFS_USE_RPC_CODE,
.get_stripesize = filelayout_get_stripesize,
.pg_test = filelayout_pg_test,
};
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 9b8dd26..f6e97c8 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -179,11 +179,6 @@ enum layoutdriver_policy_flags {
/* Should the NFS req. gather algorithm cross stripe boundaries? */
PNFS_GATHER_ACROSS_STRIPES = 1 << 1,

- /* Should the pNFS client issue a layoutget call in the
- * same compound as the OPEN operation?
- */
- PNFS_LAYOUTGET_ON_OPEN = 1 << 2,
-
/* Should the pNFS client commit and return the layout upon a setattr */
PNFS_LAYOUTRET_ON_SETATTR = 1 << 3,
};
@@ -212,15 +207,6 @@ pnfs_ld_gather_across_stripes(struct pnfs_layoutdriver_type *ld)
return ld->ld_policy_ops->flags & PNFS_GATHER_ACROSS_STRIPES;
}

-/* Should the pNFS client issue a layoutget call in the
- * same compound as the OPEN operation?
- */
-static inline int
-pnfs_ld_layoutget_on_open(struct pnfs_layoutdriver_type *ld)
-{
- return ld->ld_policy_ops->flags & PNFS_LAYOUTGET_ON_OPEN;
-}
-
/* Should the pNFS client commit and return the layout upon a setattr
*/
static inline int
--
1.6.6.1


2010-06-11 07:32:26

by Fred Isaman

Subject: [PATCH 15/26] pnfs_submit: stash and refcount lseg in write path

Store the lseg in each nfs_page. Note this necessitates adding checks
for compatibility with the lsegs of pre-existing nfs_pages.
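
The compatibility test amounts to one extra condition on the existing
range check. A userspace sketch, with hypothetical names (req_sketch,
can_merge_sketch) and an opaque pointer in place of the real lseg type:

```c
/* Hypothetical stand-in for the relevant fields of struct nfs_page. */
struct req_sketch {
	unsigned int offset;
	unsigned int bytes;
	const void *lseg;	/* NULL means the request goes to the MDS */
};

/*
 * Returns 1 if a new write of [offset, offset + bytes) using "lseg" may
 * be folded into the existing request, 0 if the old request has to be
 * flushed first.
 */
static int can_merge_sketch(const struct req_sketch *req,
			    unsigned int offset, unsigned int bytes,
			    const void *lseg)
{
	unsigned int rqend = req->offset + req->bytes;
	unsigned int end = offset + bytes;

	/* Ranges must overlap or touch. */
	if (offset > rqend || end < req->offset)
		return 0;
	/* The new condition: both must use the same lseg (or both none). */
	return req->lseg == lseg;
}
```

So an adjacent write that would previously have been merged is flushed
instead when it was started under a different lseg, including the case
where one side has no layout at all.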

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 10 ++++++----
fs/nfs/write.c | 30 ++++++++++++++++++------------
include/linux/nfs_fs.h | 8 ++++++--
3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index e308244..184535a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -442,7 +442,7 @@ start:
}
*pagep = page;

- ret = nfs_flush_incompatible(file, page);
+ ret = nfs_flush_incompatible(file, page, lseg);
if (ret) {
unlock_page(page);
page_cache_release(page);
@@ -496,7 +496,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
zero_user_segment(page, pglen, PAGE_CACHE_SIZE);
}

- status = nfs_updatepage(file, page, offset, copied);
+ status = nfs_updatepage(file, page, offset, copied, lseg);

unlock_page(page);
page_cache_release(page);
@@ -603,6 +603,8 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
/* make sure the cache has finished storing the page */
nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);

+ /* XXX Do we want to call pnfs_update_layout here? */
+
lock_page(page);
mapping = page->mapping;
if (mapping != dentry->d_inode->i_mapping)
@@ -613,11 +615,11 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
if (pagelen == 0)
goto out_unlock;

- ret = nfs_flush_incompatible(filp, page);
+ ret = nfs_flush_incompatible(filp, page, NULL);
if (ret != 0)
goto out_unlock;

- ret = nfs_updatepage(filp, page, 0, pagelen);
+ ret = nfs_updatepage(filp, page, 0, pagelen, NULL);
out_unlock:
if (!ret)
return VM_FAULT_LOCKED;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 30f4c09..8a0c845 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -570,7 +570,8 @@ static inline int nfs_scan_commit(struct inode *inode, struct list_head *dst, pg
static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
struct page *page,
unsigned int offset,
- unsigned int bytes)
+ unsigned int bytes,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;
unsigned int rqend;
@@ -595,8 +596,8 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
* Note: nfs_flush_incompatible() will already
* have flushed out requests having wrong owners.
*/
- if (offset > rqend
- || end < req->wb_offset)
+ if (offset > rqend || end < req->wb_offset ||
+ req->wb_lseg != lseg)
goto out_flushme;

if (nfs_set_page_tag_locked(req))
@@ -644,16 +645,17 @@ out_err:
* already called nfs_flush_incompatible() if necessary.
*/
static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
- struct page *page, unsigned int offset, unsigned int bytes)
+ struct page *page, unsigned int offset, unsigned int bytes,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = page->mapping->host;
struct nfs_page *req;
int error;

- req = nfs_try_to_update_request(inode, page, offset, bytes);
+ req = nfs_try_to_update_request(inode, page, offset, bytes, lseg);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
+ req = nfs_create_request(ctx, inode, page, offset, bytes, lseg);
if (IS_ERR(req))
goto out;
error = nfs_inode_add_request(inode, req);
@@ -666,11 +668,12 @@ out:
}

static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
- unsigned int offset, unsigned int count)
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;

- req = nfs_setup_write_request(ctx, page, offset, count);
+ req = nfs_setup_write_request(ctx, page, offset, count, lseg);
if (IS_ERR(req))
return PTR_ERR(req);
nfs_mark_request_dirty(req);
@@ -682,7 +685,8 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
return 0;
}

-int nfs_flush_incompatible(struct file *file, struct page *page)
+int nfs_flush_incompatible(struct file *file, struct page *page,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_open_context *ctx = nfs_file_open_context(file);
struct nfs_page *req;
@@ -699,7 +703,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
req = nfs_page_find_request(page);
if (req == NULL)
return 0;
- do_flush = req->wb_page != page || req->wb_context != ctx;
+ do_flush = req->wb_page != page || req->wb_context != ctx ||
+ req->wb_lseg != lseg;
nfs_release_request(req);
if (!do_flush)
return 0;
@@ -726,7 +731,8 @@ static int nfs_write_pageuptodate(struct page *page, struct inode *inode)
* things with a page scheduled for an RPC call (e.g. invalidate it).
*/
int nfs_updatepage(struct file *file, struct page *page,
- unsigned int offset, unsigned int count)
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_open_context *ctx = nfs_file_open_context(file);
struct inode *inode = page->mapping->host;
@@ -751,7 +757,7 @@ int nfs_updatepage(struct file *file, struct page *page,
offset = 0;
}

- status = nfs_writepage_setup(ctx, page, offset, count);
+ status = nfs_writepage_setup(ctx, page, offset, count, lseg);
if (status < 0)
nfs_set_pageerror(page);

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index ee45eac..0de7847 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -512,8 +512,12 @@ extern void nfs_unblock_sillyrename(struct dentry *dentry);
extern int nfs_congestion_kb;
extern int nfs_writepage(struct page *page, struct writeback_control *wbc);
extern int nfs_writepages(struct address_space *, struct writeback_control *);
-extern int nfs_flush_incompatible(struct file *file, struct page *page);
-extern int nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
+struct pnfs_layout_segment;
+extern int nfs_flush_incompatible(struct file *file, struct page *page,
+ struct pnfs_layout_segment *lseg);
+extern int nfs_updatepage(struct file *, struct page *,
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg);
extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);

/*
--
1.6.6.1


2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 07/26] pnfs-submit: filelayout: remove some dead code from filelayout_commit

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 10 ++--------
1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 2ffca74..c15b90a 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -562,7 +562,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
struct nfs_page *req, *reqt;
struct list_head *pos, *tmp, head, head2;
loff_t file_offset, comp_offset;
- size_t stripesz, cbytes;
enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
u32 idx1, idx2;

@@ -577,9 +576,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
return PNFS_NOT_ATTEMPTED;
}

- stripesz = filelayout_get_stripesize(layoutid);
- dprintk("%s stripesize %Zd\n", __func__, stripesz);
-
INIT_LIST_HEAD(&head);
INIT_LIST_HEAD(&head2);
list_add(&head, &data->pages);
@@ -587,7 +583,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,

/* COMMIT to each Data Server */
while (!list_empty(&head)) {
- cbytes = 0;
req = nfs_list_entry(head.next);

file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
@@ -613,7 +608,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
if (idx1 == idx2) {
nfs_list_remove_request(reqt);
nfs_list_add_request(reqt, &head2);
- cbytes += reqt->wb_bytes;
}
}

@@ -637,8 +631,8 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
dsdata->fldata.ds_nfs_client = ds->ds_clp;
dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);

- dprintk("%s: Initiating commit: %Zu@%llu USE DS:\n",
- __func__, cbytes, file_offset);
+ dprintk("%s: Initiating commit: %llu USE DS:\n",
+ __func__, file_offset);
print_ds(ds);

/* Send COMMIT to data server */
--
1.6.6.1


2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 05/26] SQUASHME: pnfs-submit: ensure pnfs_update_layout clears lsegp on error

Compensate for pnfs_update_layout returning an error while still
assigning the lseg.
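
The pattern being enforced is "clear the out-parameter up front so no
error path can leak a stale pointer". A minimal userspace sketch of that
idea follows. The names (layout_segment, lookup_segment) and the
being_cleaned flag are hypothetical and only loosely modelled on
pnfs_update_layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a layout segment. */
struct layout_segment {
	int refcount;
};

/*
 * Look up a cached segment and hand back a counted reference.
 * On any failure the function returns a negative errno and *out is
 * guaranteed to be NULL, so callers never see a dangling pointer.
 */
static int lookup_segment(struct layout_segment *cached, int being_cleaned,
			  struct layout_segment **out)
{
	assert(out != NULL);
	*out = NULL;		/* clear first: every error path is now safe */

	if (cached == NULL)
		return -ENOENT;
	if (being_cleaned)
		return -EAGAIN;	/* someone is cleaning the layout */

	cached->refcount++;	/* reference owned by the caller */
	*out = cached;
	return 0;
}
```

With this shape a caller can test either the return value or the
pointer and reach the same conclusion.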

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 6717a9d..c73ab80 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1042,6 +1042,8 @@ pnfs_update_layout(struct inode *ino,
DEFINE_WAIT(__wait);
int result = 0;

+ if (take_ref)
+ *lsegpp = NULL;
lo = get_lock_alloc_layout(ino);
if (IS_ERR(lo)) {
dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
@@ -1056,6 +1058,7 @@ pnfs_update_layout(struct inode *ino,
put_lseg(lseg);

/* someone is cleaning the layout */
+ lseg = NULL;
result = -EAGAIN;
goto out_put;
}
--
1.6.6.1


2010-06-11 07:32:30

by Fred Isaman

Subject: [PATCH 18/26] pnfs_submit: remove pnfs_writepages LAYOUTGET invocation

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 37 +++++++------------------------------
fs/nfs/pnfs.h | 15 ++++++---------
2 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index dd221e2..afafd0a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1492,7 +1492,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
{
struct nfs_writeargs *args = &wdata->args;
struct inode *inode = wdata->inode;
- int numpages, status;
+ int numpages;
enum pnfs_try_status trypnfs;
struct nfs_server *nfss = NFS_SERVER(inode);
struct nfs_inode *nfsi = NFS_I(inode);
@@ -1504,19 +1504,8 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
args->count,
args->offset);

- /* Retrieve and set layout if not allready cached */
- status = _pnfs_update_layout(inode,
- args->context,
- args->count,
- args->offset,
- IOMODE_RW,
- &lseg);
- if (status) {
- dprintk("%s: Updating layout failed (%d), retry with NFS \n",
- __func__, status);
- trypnfs = PNFS_NOT_ATTEMPTED; /* retry with nfs I/O */
- goto out;
- }
+ lseg = wdata->req->wb_lseg;
+ get_lseg(lseg);

/* Determine number of pages
*/
@@ -1544,7 +1533,6 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
wdata->pdata.lseg = NULL;
put_lseg(lseg);
}
-out:
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
}
@@ -1651,22 +1639,11 @@ enum pnfs_try_status
_pnfs_try_to_write_data(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops, int how)
{
- struct inode *ino = data->inode;
- struct nfs_server *nfss = NFS_SERVER(ino);
-
dprintk("--> %s\n", __func__);
- /* Only create an rpc request if utilizing NFSv4 I/O */
- if (!pnfs_enabled_sb(nfss) ||
- !nfss->pnfs_curr_ld->ld_io_ops->write_pagelist) {
- dprintk("<-- %s: not using pnfs\n", __func__);
- return PNFS_NOT_ATTEMPTED;
- } else {
- dprintk("%s: Utilizing pNFS I/O\n", __func__);
- data->pdata.call_ops = call_ops;
- data->pdata.pnfs_error = 0;
- data->pdata.how = how;
- return pnfs_writepages(data, how);
- }
+ data->pdata.call_ops = call_ops;
+ data->pdata.pnfs_error = 0;
+ data->pdata.how = how;
+ return pnfs_writepages(data, how);
}

enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7bff487..60a9db6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -122,18 +122,15 @@ pnfs_try_to_write_data(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops,
int how)
{
- struct inode *inode = data->inode;
- struct nfs_server *nfss = NFS_SERVER(inode);
enum pnfs_try_status ret;

- /* FIXME: write_pagelist should probably be mandated */
- if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
- ret = _pnfs_try_to_write_data(data, call_ops, how);
- else
- ret = PNFS_NOT_ATTEMPTED;
-
+ if (!data->req->wb_lseg)
+ return PNFS_NOT_ATTEMPTED;
+ ret = _pnfs_try_to_write_data(data, call_ops, how);
if (ret == PNFS_ATTEMPTED)
- nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
+ nfs_inc_stats(data->inode, NFSIOS_PNFS_WRITE);
+ else
+ _pnfs_clear_lseg_from_pages(&data->pages);
return ret;
}

--
1.6.6.1


2010-06-11 07:32:22

by Fred Isaman

Subject: [PATCH 13/26] pnfs_submit: read path changeover

Change readpages path to only call LAYOUTGET once.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pagelist.c | 2 ++
fs/nfs/pnfs.c | 37 +++++++------------------------------
fs/nfs/pnfs.h | 25 ++++++++++++++++---------
3 files changed, 25 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index b9d3baf..5b20545 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -254,6 +254,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
return 0;
if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
return 0;
+ if (req->wb_lseg != prev->wb_lseg)
+ return 0;
#ifdef CONFIG_NFS_V4_1
if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
return 0;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c1eb02f..0e3208b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1667,7 +1667,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
{
struct nfs_readargs *args = &rdata->args;
struct inode *inode = rdata->inode;
- int numpages, status, pgcount, temp;
+ int numpages, pgcount, temp;
struct nfs_server *nfss = NFS_SERVER(inode);
struct nfs_inode *nfsi = NFS_I(inode);
struct pnfs_layout_segment *lseg;
@@ -1679,19 +1679,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
args->count,
args->offset);

- /* Retrieve and set layout if not allready cached */
- status = _pnfs_update_layout(inode,
- args->context,
- args->count,
- args->offset,
- IOMODE_READ,
- &lseg);
- if (status) {
- dprintk("%s: Updating layout failed (%d), retry with NFS \n",
- __func__, status);
- trypnfs = PNFS_NOT_ATTEMPTED;
- goto out;
- }
+ lseg = rdata->req->wb_lseg;
+ get_lseg(lseg);

/* Determine number of pages. */
pgcount = args->pgbase + args->count;
@@ -1718,7 +1707,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
rdata->pdata.lseg = NULL;
put_lseg(lseg);
}
- out:
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
}
@@ -1727,21 +1715,10 @@ enum pnfs_try_status
_pnfs_try_to_read_data(struct nfs_read_data *data,
const struct rpc_call_ops *call_ops)
{
- struct inode *ino = data->inode;
- struct nfs_server *nfss = NFS_SERVER(ino);
-
- dprintk("--> %s\n", __func__);
- /* Only create an rpc request if utilizing NFSv4 I/O */
- if (!pnfs_enabled_sb(nfss) ||
- !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
- dprintk("<-- %s: not using pnfs\n", __func__);
- return PNFS_NOT_ATTEMPTED;
- } else {
- dprintk("%s: Utilizing pNFS I/O\n", __func__);
- data->pdata.call_ops = call_ops;
- data->pdata.pnfs_error = 0;
- return pnfs_readpages(data);
- }
+ dprintk("%s: Utilizing pNFS I/O\n", __func__);
+ data->pdata.call_ops = call_ops;
+ data->pdata.pnfs_error = 0;
+ return pnfs_readpages(data);
}

enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a2a7b94..1620026 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -93,22 +93,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
return nfss->pnfs_curr_ld != NULL;
}

+static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
+{
+ struct nfs_page *req;
+
+ list_for_each_entry(req, head, wb_list) {
+ put_lseg(req->wb_lseg);
+ req->wb_lseg = NULL;
+ }
+}
+
static inline enum pnfs_try_status
pnfs_try_to_read_data(struct nfs_read_data *data,
const struct rpc_call_ops *call_ops)
{
- struct inode *inode = data->inode;
- struct nfs_server *nfss = NFS_SERVER(inode);
enum pnfs_try_status ret;

- /* FIXME: read_pagelist should probably be mandated */
- if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
- ret = _pnfs_try_to_read_data(data, call_ops);
- else
- ret = PNFS_NOT_ATTEMPTED;
-
+ if (!data->req->wb_lseg)
+ return PNFS_NOT_ATTEMPTED;
+ ret = _pnfs_try_to_read_data(data, call_ops);
if (ret == PNFS_ATTEMPTED)
- nfs_inc_stats(inode, NFSIOS_PNFS_READ);
+ nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
+ else
+ _pnfs_clear_lseg_from_pages(&data->pages);
return ret;
}

--
1.6.6.1


2010-06-11 07:32:34

by Fred Isaman

Subject: [PATCH 21/26] pnfs_submit: filelayout: rewrite filelayout_commit to use new API

In the process, give it a much-needed rewrite.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 192 ++++++++++++++++++++++++++---------------------
fs/nfs/write.c | 17 ++++
2 files changed, 123 insertions(+), 86 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index e36c95d..756cb64 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -530,8 +530,7 @@ filelayout_clone_write_data(struct nfs_write_data *old)
nfs_fattr_init(&new->fattr);
new->res.verf = &new->verf;
new->args.context = get_nfs_open_context(old->args.context);
- new->pdata.lseg = old->pdata.lseg;
- kref_get(&new->pdata.lseg->kref);
+ new->pdata.lseg = NULL;
new->pdata.call_ops = old->pdata.call_ops;
new->pdata.how = old->pdata.how;
out:
@@ -559,103 +558,124 @@ enum pnfs_try_status
filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
struct nfs_write_data *data)
{
- struct nfs4_filelayout_segment *nfslay;
- struct nfs_write_data *dsdata = NULL;
+ LIST_HEAD(head);
+ struct nfs_page *req;
+ loff_t file_offset = 0;
+ u16 idx, i;
+ struct list_head **ds_page_list = NULL;
+ u16 *indices_used;
+ int num_indices_seen = 0;
+ const struct rpc_call_ops *call_ops;
+ struct rpc_clnt *clnt;
+ struct nfs_write_data **clone_list = NULL;
+ struct nfs_write_data *dsdata;
struct nfs4_pnfs_ds *ds;
- struct nfs_page *req, *reqt;
- struct list_head *pos, *tmp, head, head2;
- loff_t file_offset, comp_offset;
- enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
- u32 idx1, idx2;

- nfslay = LSEG_LD_DATA(data->pdata.lseg);
-
- dprintk("%s data %p pnfs_client %p nfslay %p sync %d\n",
- __func__, data, data->fldata.pnfs_client, nfslay, sync);
-
- data->fldata.commit_through_mds = nfslay->commit_through_mds;
- if (nfslay->commit_through_mds) {
- dprintk("%s data %p commit through mds\n", __func__, data);
- return PNFS_NOT_ATTEMPTED;
- }
-
- INIT_LIST_HEAD(&head);
- INIT_LIST_HEAD(&head2);
- list_add(&head, &data->pages);
- list_del_init(&data->pages);
-
- /* COMMIT to each Data Server */
- while (!list_empty(&head)) {
- req = nfs_list_entry(head.next);
-
- file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
-
- /* Get dserver for the current page */
- idx1 = nfs4_fl_calc_ds_index(data->pdata.lseg, file_offset);
- ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx1);
- if (!ds) {
- data->pdata.pnfs_error = -EIO;
- goto err_rewind;
+ dprintk("%s data %p pnfs_client %p sync %d\n",
+ __func__, data, data->fldata.pnfs_client, sync);
+
+ /* Alloc room for both in one go */
+ ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
+ (sizeof(u16) + sizeof(struct list_head *)),
+ GFP_KERNEL);
+ if (!ds_page_list)
+ goto mem_error;
+ indices_used = (u16 *) (ds_page_list + NFS4_PNFS_MAX_MULTI_CNT + 1);
+
+ /* Sort pages based on which ds to send to.
+ * MDS is given index equal to NFS4_PNFS_MAX_MULTI_CNT.
+ * Note we are assuming there is only a single lseg in play.
+ * When that is not true, we could first sort on lseg, then
+ * sort within each as we do here.
+ */
+ while (!list_empty(&data->pages)) {
+ req = nfs_list_entry(data->pages.next);
+ nfs_list_remove_request(req);
+ if (!req->wb_lseg ||
+ ((struct nfs4_filelayout_segment *)
+ LSEG_LD_DATA(req->wb_lseg))->commit_through_mds)
+ idx = NFS4_PNFS_MAX_MULTI_CNT;
+ else {
+ file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
+ idx = nfs4_fl_calc_ds_index(req->wb_lseg, file_offset);
}
-
- /* Gather all pages going to the current data server by
- * comparing their indices.
- * XXX: This recalculates the indices unecessarily.
- * One idea would be to calc the index for every page
- * and then compare if they are the same. */
- list_for_each_safe(pos, tmp, &head) {
- reqt = nfs_list_entry(pos);
- comp_offset = (loff_t)reqt->wb_index << PAGE_CACHE_SHIFT;
- idx2 = nfs4_fl_calc_ds_index(data->pdata.lseg,
- comp_offset);
- if (idx1 == idx2) {
- nfs_list_remove_request(reqt);
- nfs_list_add_request(reqt, &head2);
- }
+ if (ds_page_list[idx]) {
+ /* Already seen this idx */
+ list_add(&req->wb_list, ds_page_list[idx]);
+ } else {
+ /* New idx not seen so far */
+ list_add_tail(&req->wb_list, &head);
+ indices_used[num_indices_seen++] = idx;
}
-
- if (!list_empty(&head)) {
- dsdata = filelayout_clone_write_data(data);
- if (!dsdata) {
- /* return pages back to head */
- list_splice(&head2, &head);
- INIT_LIST_HEAD(&head2);
- data->pdata.pnfs_error = -ENOMEM;
- goto err_rewind;
- }
+ ds_page_list[idx] = &req->wb_list;
+ }
+ /* Once created, clone must be released via call_op */
+ clone_list = kzalloc(num_indices_seen *
+ sizeof(struct nfs_write_data *), GFP_KERNEL);
+ if (!clone_list)
+ goto mem_error;
+ for (i = 0; i < num_indices_seen - 1; i++) {
+ clone_list[i] = filelayout_clone_write_data(data);
+ if (!clone_list[i])
+ goto mem_error;
+ }
+ clone_list[i] = data;
+ /* Now send off the RPCs to each ds. Note that it is important
+ * that any RPC to the MDS be sent last (or at least after all
+ * clones have been made.)
+ */
+ for (i = 0; i < num_indices_seen; i++) {
+ dsdata = clone_list[i];
+ idx = indices_used[i];
+ list_cut_position(&dsdata->pages, &head, ds_page_list[idx]);
+ if (idx == NFS4_PNFS_MAX_MULTI_CNT) {
+ call_ops = data->pdata.call_ops;;
+ clnt = NFS_CLIENT(dsdata->inode);
+ ds = NULL;
} else {
- dsdata = data;
+ call_ops = &filelayout_commit_call_ops;
+ req = nfs_list_entry(dsdata->pages.next);
+ ds = nfs4_fl_prepare_ds(req->wb_lseg, idx);
+ if (!ds) {
+ /* Trigger retry of this chunk through MDS */
+ dsdata->task.tk_status = -EIO;
+ data->pdata.call_ops->rpc_release(dsdata);
+ continue;
+ }
+ clnt = ds->ds_clp->cl_rpcclient;
+ dsdata->fldata.pnfs_client = clnt;
+ dsdata->fldata.ds_nfs_client = ds->ds_clp;
+ dsdata->args.fh = \
+ nfs4_fl_select_ds_fh(LSEG_LD_DATA(req->wb_lseg),
+ idx);
}
-
- list_add(&dsdata->pages, &head2);
- list_del_init(&head2);
-
- dsdata->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
- dsdata->fldata.ds_nfs_client = ds->ds_clp;
- dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);
-
dprintk("%s: Initiating commit: %llu USE DS:\n",
__func__, file_offset);
print_ds(ds);

/* Send COMMIT to data server */
- nfs_initiate_commit(dsdata, dsdata->fldata.pnfs_client,
- &filelayout_commit_call_ops, sync);
+ nfs_initiate_commit(dsdata, clnt, call_ops, sync);
}
+ kfree(clone_list);
+ kfree(ds_page_list);
+ data->pdata.pnfs_error = 0;
+ return PNFS_ATTEMPTED;

-out:
- if (data->pdata.pnfs_error)
- printk(KERN_ERR "%s: ERROR %d\n", __func__,
- data->pdata.pnfs_error);
-
- /* XXX should we send COMMIT to MDS e.g. not free data and return 1 ? */
- return trypnfs;
-err_rewind:
- /* put remaining pages back onto the original data->pages */
- list_add(&data->pages, &head);
- list_del_init(&head);
- trypnfs = PNFS_NOT_ATTEMPTED;
- goto out;
+ mem_error:
+ if (clone_list) {
+ for (i = 0; i < num_indices_seen - 1; i++) {
+ if (!clone_list[i])
+ break;
+ data->pdata.call_ops->rpc_release(clone_list[i]);
+ }
+ kfree(clone_list);
+ }
+ kfree(ds_page_list);
+ /* One of these will be empty, but doesn't hurt to do both */
+ nfs_mark_list_commit(&head);
+ nfs_mark_list_commit(&data->pages);
+ data->pdata.call_ops->rpc_release(data);
+ return PNFS_ATTEMPTED;
}

/* Return the stripesize for the specified file.
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2427c1d..f1e4120 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -422,6 +422,17 @@ static void nfs_inode_remove_request(struct nfs_page *req)
nfs_clear_request(req);
nfs_release_request(req);
}
+static void
+nfs_mark_request_nopnfs(struct nfs_page *req)
+{
+ struct pnfs_layout_segment *lseg = req->wb_lseg;
+
+ if (req->wb_lseg == NULL)
+ return;
+ req->wb_lseg = NULL;
+ put_lseg(lseg);
+ dprintk(" retry through MDS\n");
+}

static void
nfs_mark_request_dirty(struct nfs_page *req)
@@ -1461,6 +1472,11 @@ static void nfs_commit_release(void *calldata)
req->wb_bytes,
(long long)req_offset(req));
if (status < 0) {
+ if (req->wb_lseg) {
+ nfs_mark_request_nopnfs(req);
+ nfs_mark_request_dirty(req);
+ goto next;
+ }
nfs_context_set_write_error(req->wb_context, status);
nfs_inode_remove_request(req);
dprintk(", error = %d\n", status);
@@ -1477,6 +1493,7 @@ static void nfs_commit_release(void *calldata)
}
/* We have a mismatch. Write the page again */
dprintk(" mismatch\n");
+ nfs_mark_request_nopnfs(req);
nfs_mark_request_dirty(req);
next:
nfs_clear_page_tag_locked(req);
--
1.6.6.1


2010-06-11 07:32:27

by Fred Isaman

Subject: [PATCH 16/26] pnfs_submit: remove pnfs_file_operations

pnfs_writepages is useful, but not necessary, for determining size
parameters for LAYOUTGET.

Also, the pnfs_file_operations were getting out of sync with
nfs_file_operations (see commits e1ebfd33be068 and bf40d3435caf49369).

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 24 ------------------------
fs/nfs/nfs4proc.c | 1 -
fs/nfs/pnfs.c | 35 -----------------------------------
fs/nfs/pnfs.h | 1 -
include/linux/nfs_fs.h | 3 ---
5 files changed, 0 insertions(+), 64 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 184535a..409892f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -82,30 +82,6 @@ const struct file_operations nfs_file_operations = {
.setlease = nfs_setlease,
};

-#ifdef CONFIG_NFS_V4_1
-const struct file_operations pnfs_file_operations = {
- .llseek = nfs_file_llseek,
- .read = do_sync_read,
- .write = pnfs_file_write,
- .aio_read = nfs_file_read,
- .aio_write = nfs_file_write,
-#ifdef CONFIG_MMU
- .mmap = nfs_file_mmap,
-#else
- .mmap = generic_file_mmap,
-#endif
- .open = nfs_file_open,
- .flush = nfs_file_flush,
- .release = nfs_file_release,
- .fsync = nfs_file_fsync,
- .lock = nfs_lock,
- .flock = nfs_flock,
- .splice_read = nfs_file_splice_read,
- .check_flags = nfs_check_flags,
- .setlease = nfs_setlease,
-};
-#endif /* CONFIG_NFS_V4_1 */
-
const struct inode_operations nfs_file_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 48492ae..7e6cb89 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5984,7 +5984,6 @@ pnfs_v4_clientops_init(void)
struct nfs_rpc_ops *p = (struct nfs_rpc_ops *)&pnfs_v4_clientops;

memcpy(p, &nfs_v4_clientops, sizeof(*p));
- p->file_ops = &pnfs_file_operations;
p->setattr = pnfs4_proc_setattr;
p->read_done = pnfs4_read_done;
p->write_setup = pnfs4_proc_write_setup;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0e3208b..0f891d4 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1516,41 +1516,6 @@ pnfs_writeback_done(struct nfs_write_data *data)
}

/*
- * Obtain a layout for the the write range, and call do_sync_write.
- *
- * Unlike the read path which can wait until page coalescing
- * (pnfs_pageio_init_read) to get a layout, the write path discards the
- * request range to form the address_mapping - so we get a layout in
- * the file operations write method.
- *
- * If pnfs_update_layout fails, pages will be coalesced for MDS I/O.
- */
-ssize_t
-pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
- loff_t *pos)
-{
- struct inode *inode = filp->f_dentry->d_inode;
- struct nfs_open_context *context = filp->private_data;
- int status;
-
- if (!pnfs_enabled_sb(NFS_SERVER(inode)))
- goto out;
-
- /* Retrieve and set layout if not allready cached */
- status = _pnfs_update_layout(inode,
- context,
- count,
- *pos,
- IOMODE_RW,
- NULL);
- if (status)
- dprintk("%s: Unable to get a layout for %Zu@%llu iomode %d)\n",
- __func__, count, *pos, IOMODE_RW);
-out:
- return do_sync_write(filp, buf, count, pos);
-}
-
-/*
* Call the appropriate parallel I/O subsystem write function.
* If no I/O device driver exists, or one does match the returned
* fstype, then return a positive status for regular NFS processing.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1620026..1922ffa 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -59,7 +59,6 @@ void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
struct nfs_open_context *, struct list_head *);
void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
void pnfs_update_layout_commit(struct inode *, struct list_head *, pgoff_t, unsigned int);
-ssize_t pnfs_file_write(struct file *, const char __user *, size_t, loff_t *);
void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 0de7847..41026cb 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -411,9 +411,6 @@ extern const struct inode_operations nfs3_file_inode_operations;
#endif /* CONFIG_NFS_V3 */
extern const struct file_operations nfs_file_operations;
extern const struct address_space_operations nfs_file_aops;
-#ifdef CONFIG_NFS_V4_1
-extern const struct file_operations pnfs_file_operations;
-#endif /* CONFIG_NFS_V4_1 */

static inline struct nfs_open_context *nfs_file_open_context(struct file *filp)
{
--
1.6.6.1


2010-06-11 07:32:33

by Fred Isaman

Subject: [PATCH 20/26] pnfs_submit: API change: remove pnfs_commit layoutget invocation

WARNING - this is an API change.

The layout driver's commit operation no longer takes an lseg.
This is because each nfs_page may or may not have an associated lseg.
It is the layout driver's task to send commits to the appropriate place.
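
To make the new contract concrete, here is a small userspace sketch of
what a driver now has to do with a mixed commit list: separate the
requests that carry an lseg from those that must go through the MDS. It
is an illustration only. The types and the split_commit_list name are
hypothetical, and an array stands in for the kernel's linked list of
nfs_pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a layout segment. */
struct layout_segment {
	int id;
};

/* Hypothetical, simplified stand-in for an nfs_page on a commit list. */
struct page_request {
	unsigned long index;         /* page index within the file */
	struct layout_segment *lseg; /* NULL means "commit through the MDS" */
};

/*
 * Partition reqs[] in place so that every request with an lseg comes
 * first. Returns the number of such requests: reqs[0..ret) belong to
 * the layout driver's data path and reqs[ret..n) go to the MDS.
 * Relative order within each group is not preserved.
 */
static size_t split_commit_list(struct page_request *reqs, size_t n)
{
	size_t pnfs = 0;

	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (reqs[i].lseg != NULL) {
			struct page_request tmp = reqs[pnfs];

			reqs[pnfs] = reqs[i];
			reqs[i] = tmp;
			pnfs++;
		}
	}
	return pnfs;
}
```

A real driver would go one step further and group the first partition
by destination, but the NULL versus non-NULL split is the part this API
change pushes down to the driver.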

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/internal.h | 2 +-
fs/nfs/pagelist.c | 5 ++-
fs/nfs/pnfs.c | 79 ++++++++-------------------------------------
fs/nfs/pnfs.h | 21 +++++-------
fs/nfs/write.c | 23 +++++++------
include/linux/nfs_page.h | 3 +-
6 files changed, 43 insertions(+), 90 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index b754446..a30974a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -286,7 +286,7 @@ extern int nfs_initiate_commit(struct nfs_write_data *data,
extern int pnfs_initiate_commit(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how);
+ int how, int pnfs);
extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
extern void nfs_mark_list_commit(struct list_head *head);
#ifdef CONFIG_MIGRATION
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 5b20545..453d100 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -381,6 +381,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
* @idx_start: lower bound of page->index to scan
* @npages: idx_start + npages sets the upper bound to scan.
* @tag: tag to scan for
+ * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
*
* Moves elements from one of the inode request lists.
* If the number of requests is set to 0, the entire address_space
@@ -390,7 +391,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
*/
int nfs_scan_list(struct nfs_inode *nfsi,
struct list_head *dst, pgoff_t idx_start,
- unsigned int npages, int tag)
+ unsigned int npages, int tag, int *use_pnfs)
{
struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
struct nfs_page *req;
@@ -421,6 +422,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
radix_tree_tag_clear(&nfsi->nfs_page_tree,
req->wb_index, tag);
nfs_list_add_request(req, dst);
+ if (req->wb_lseg)
+ *use_pnfs = 1;
res++;
if (res == INT_MAX)
goto out;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index afafd0a..e37b71e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1650,19 +1650,11 @@ enum pnfs_try_status
_pnfs_try_to_commit(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops, int how)
{
- struct inode *inode = data->inode;
-
- if (!pnfs_enabled_sb(NFS_SERVER(inode))) {
- dprintk("%s: Not using pNFS I/O\n", __func__);
- return PNFS_NOT_ATTEMPTED;
- } else {
- /* data->call_ops and data->how set in nfs_commit_rpcsetup */
- dprintk("%s: Utilizing pNFS I/O\n", __func__);
- data->pdata.call_ops = call_ops;
- data->pdata.pnfs_error = 0;
- data->pdata.how = how;
- return pnfs_commit(data, how);
- }
+ dprintk("%s: Utilizing pNFS I/O\n", __func__);
+ data->pdata.call_ops = call_ops;
+ data->pdata.pnfs_error = 0;
+ data->pdata.how = how;
+ return pnfs_commit(data, how);
}

/* pNFS Commit callback function for all layout drivers */
@@ -1683,76 +1675,33 @@ pnfs_commit_done(struct nfs_write_data *data)
_pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
true);
pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
- pdata->call_ops, pdata->how);
+ pdata->call_ops, pdata->how, 1);
}
}

static enum pnfs_try_status
pnfs_commit(struct nfs_write_data *data, int sync)
{
- int result;
struct nfs_inode *nfsi = NFS_I(data->inode);
struct nfs_server *nfss = NFS_SERVER(data->inode);
- struct pnfs_layout_segment *lseg;
- struct nfs_page *first, *last, *p;
- int npages;
enum pnfs_try_status trypnfs;
- u64 count;

dprintk("%s: Begin\n", __func__);

- /* If the layout driver doesn't define its own commit function
- * use standard NFSv4 commit
- */
- first = last = nfs_list_entry(data->pages.next);
- npages = 0;
- list_for_each_entry(p, &data->pages, wb_list) {
- last = p;
- npages++;
- }
- /* COMMIT indicates the whole file with offset = count = 0
- * whereas layout segments indicate whole file with offset = 0,
- * count = NFS4_MAX_UINT64.
+ /* We need to account for possibility that
+ * each nfs_page can point to a different lseg (or be NULL).
+ * For the immediate case of whole-file-only layouts, we at
+ * least know there can be only a single lseg.
+ * We still have to account for the possibility of some being NULL.
+ * This will be done by passing the buck to the layout driver.
*/
- count = ((npages - 1) << PAGE_CACHE_SHIFT) + first->wb_bytes +
- (first != last) ? last->wb_bytes : 0;
- if (first->wb_offset == 0 && count == 0)
- count = NFS4_MAX_UINT64;
-
- /* FIXME: we really ought to keep the layout segment that we used
- to write the page around for committing it and never ask for a
- new one. If it was recalled we better commit the data first
- before returning it, otherwise the data needs to be rewritten,
- either with a new layout or to the MDS */
- result = _pnfs_update_layout(data->inode,
- NULL,
- count,
- first->wb_offset,
- IOMODE_RW,
- &lseg);
- /* If no layout have been retrieved,
- * use standard NFSv4 commit
- */
- if (result) {
- dprintk("%s: Updating layout failed (%d), retry with NFS \n",
- __func__, result);
- trypnfs = PNFS_NOT_ATTEMPTED;
- goto out;
- }
-
- dprintk("%s: Calling layout driver commit\n", __func__);
+ data->pdata.lseg = NULL;
if (!pnfs_use_rpc(nfss))
data->pdata.pnfsflags |= PNFS_NO_RPC;
- data->pdata.lseg = lseg;
trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
sync, data);
- if (trypnfs == PNFS_NOT_ATTEMPTED) {
+ if (trypnfs == PNFS_NOT_ATTEMPTED)
data->pdata.pnfsflags &= ~PNFS_NO_RPC;
- data->pdata.lseg = NULL;
- put_lseg(lseg);
- }
-
-out:
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 60a9db6..bdf11bc 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -139,21 +139,18 @@ pnfs_try_to_commit(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops,
int how)
{
- struct inode *inode = data->inode;
- struct nfs_server *nfss = NFS_SERVER(inode);
enum pnfs_try_status ret;

- /* Note that we check for "write_pagelist" and not for "commit"
- since if async writes were done and pages weren't marked as stable
- the commit method MUST be defined by the LD */
- /* FIXME: write_pagelist should probably be mandated */
- if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
- ret = _pnfs_try_to_commit(data, call_ops, how);
- else
- ret = PNFS_NOT_ATTEMPTED;
-
+ /* Unlike in pnfs_try_to_write_data and pnfs_try_to_read_data,
+ * we have no guarantee that all nfs_pages point to the same
+ * lseg. However, if we reach here, we are guaranteed that at
+ * least one points to some lseg.
+ */
+ ret = _pnfs_try_to_commit(data, call_ops, how);
if (ret == PNFS_ATTEMPTED)
- nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
+ nfs_inc_stats(data->inode, NFSIOS_PNFS_COMMIT);
+ else
+ _pnfs_clear_lseg_from_pages(&data->pages);
return ret;
}

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 40abbd0..2427c1d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -527,7 +527,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
* The requests are *not* checked to ensure that they form a contiguous set.
*/
static int
-nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
+nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
{
struct nfs_inode *nfsi = NFS_I(inode);
int ret;
@@ -535,7 +535,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
if (!nfs_need_commit(nfsi))
return 0;

- ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
+ ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
+ use_pnfs);
if (ret > 0)
nfsi->ncommit -= ret;
if (nfs_need_commit(NFS_I(inode)))
@@ -1334,9 +1335,10 @@ EXPORT_SYMBOL(nfs_initiate_commit);
int pnfs_initiate_commit(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how)
+ int how, int pnfs)
{
- if (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED)
+ if (pnfs &&
+ (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
return pnfs_get_write_status(data);

return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
@@ -1347,7 +1349,7 @@ int pnfs_initiate_commit(struct nfs_write_data *data,
*/
static int nfs_commit_rpcsetup(struct list_head *head,
struct nfs_write_data *data,
- int how)
+ int how, int pnfs)
{
struct nfs_page *first = nfs_list_entry(head->next);
struct inode *inode = first->wb_context->path.dentry->d_inode;
@@ -1374,7 +1376,7 @@ static int nfs_commit_rpcsetup(struct list_head *head,
data->args.context = first->wb_context; /* used by commit done */

return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
- how);
+ how, pnfs);
}

/* Handle memory error during commit */
@@ -1398,7 +1400,7 @@ EXPORT_SYMBOL(nfs_mark_list_commit);
* Commit dirty pages
*/
static int
-nfs_commit_list(struct inode *inode, struct list_head *head, int how)
+nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
{
struct nfs_write_data *data;

@@ -1407,7 +1409,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
goto out_bad;

/* Set up the argument struct */
- return nfs_commit_rpcsetup(head, data, how);
+ return nfs_commit_rpcsetup(head, data, how, pnfs);
out_bad:
nfs_mark_list_commit(head);
nfs_commit_clear_lock(NFS_I(inode));
@@ -1495,14 +1497,15 @@ static int nfs_commit_inode(struct inode *inode, int how)
LIST_HEAD(head);
int may_wait = how & FLUSH_SYNC;
int res = 0;
+ int use_pnfs = 0;

if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
goto out_mark_dirty;
spin_lock(&inode->i_lock);
- res = nfs_scan_commit(inode, &head, 0, 0);
+ res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
spin_unlock(&inode->i_lock);
if (res) {
- int error = nfs_commit_list(inode, &head, how);
+ int error = nfs_commit_list(inode, &head, how, use_pnfs);
if (error < 0)
return error;
if (may_wait) {
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 18a455c..06e5157 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -83,7 +83,8 @@ extern void nfs_release_request(struct nfs_page *req);


extern int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
- pgoff_t idx_start, unsigned int npages, int tag);
+ pgoff_t idx_start, unsigned int npages, int tag,
+ int *use_pnfs);
extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
struct inode *inode,
int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
--
1.6.6.1


2010-06-11 07:32:20

by Fred Isaman

Subject: [PATCH 06/26] pnfs-submit: filelayout: clean and breakup nfs4_pnfs_dserver_get

Rewrite nfs4_pnfs_dserver_get() as two functions, nfs4_fl_calc_ds_index() and
nfs4_fl_prepare_ds(). This cleans up the code a bit and prepares for a more
extensive rewrite of filelayout_commit().
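
For readers following the index arithmetic, here is a minimal userspace
sketch of the calculation described in the comment above
_nfs4_fl_calc_j_index(): subtract the pattern offset, divide by the stripe
unit, add the first stripe index, and reduce modulo the stripe count. The
struct and function names below are hypothetical stand-ins, not the kernel
definitions, the type of stripe_indices is an assumption, and plain 64-bit
division replaces do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fl_segment_sketch {
	uint64_t pattern_offset;     /* file offset where the stripe pattern starts */
	uint32_t stripe_unit;        /* bytes per stripe unit */
	uint32_t first_stripe_index; /* "fsi" in the kernel comment */
};

struct fl_dsaddr_sketch {
	uint32_t stripe_count;         /* number of entries in stripe_indices */
	const uint8_t *stripe_indices; /* maps stripe slot j to a data server index */
};

/* j = (((offset - pattern_offset) / stripe_unit) + fsi) % stripe_count */
static uint32_t calc_j_index(uint64_t offset,
			     const struct fl_dsaddr_sketch *dsaddr,
			     const struct fl_segment_sketch *seg)
{
	uint64_t tmp;

	assert(seg->stripe_unit > 0 && dsaddr->stripe_count > 0);
	assert(offset >= seg->pattern_offset);

	tmp = (offset - seg->pattern_offset) / seg->stripe_unit;
	tmp += seg->first_stripe_index;
	return (uint32_t)(tmp % dsaddr->stripe_count);
}

/* Translate a file offset into a data server index via the slot table. */
static uint32_t calc_ds_index(uint64_t offset,
			      const struct fl_dsaddr_sketch *dsaddr,
			      const struct fl_segment_sketch *seg)
{
	return dsaddr->stripe_indices[calc_j_index(offset, dsaddr, seg)];
}
```

Keeping the index calculation separate from the data server lookup lets a
caller compare the indices of two pages without touching the data server
list at all.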

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 75 ++++++++++++----------------------
fs/nfs/nfs4filelayout.h | 33 +++++++--------
fs/nfs/nfs4filelayoutdev.c | 95 +++++++++++++++----------------------------
3 files changed, 75 insertions(+), 128 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index b0cda5d..2ffca74 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -196,8 +196,8 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
{
struct inode *inode = PNFS_INODE(layoutid);
struct nfs4_filelayout_segment *flseg;
- struct nfs4_pnfs_dserver dserver;
- int status;
+ struct nfs4_pnfs_ds *ds;
+ u32 idx;

dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
__func__, inode->i_ino, nr_pages, pgbase, count, offset);
@@ -205,23 +205,19 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
flseg = LSEG_LD_DATA(data->pdata.lseg);

/* Retrieve the correct rpc_client for the byte range */
- status = nfs4_pnfs_dserver_get(data->pdata.lseg,
- offset,
- count,
- &dserver);
- if (status) {
- printk(KERN_ERR "%s: dserver get failed status %d use MDS\n",
- __func__, status);
+ idx = nfs4_fl_calc_ds_index(data->pdata.lseg, offset);
+ ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx);
+ if (!ds) {
+ printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
return PNFS_NOT_ATTEMPTED;
}
-
dprintk("%s USE DS:ip %x %s\n", __func__,
- htonl(dserver.ds->ds_ip_addr), dserver.ds->r_addr);
+ htonl(ds->ds_ip_addr), ds->r_addr);

/* just try the first data server for the index..*/
- data->fldata.pnfs_client = dserver.ds->ds_clp->cl_rpcclient;
- data->fldata.ds_nfs_client = dserver.ds->ds_clp;
- data->args.fh = dserver.fh;
+ data->fldata.ds_nfs_client = ds->ds_clp;
+ data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
+ data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Now get the file offset on the dserver
* Set the read offset to this offset, and
@@ -255,32 +251,26 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
{
struct inode *inode = PNFS_INODE(layoutid);
struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(data->pdata.lseg);
- struct nfs4_pnfs_dserver dserver;
- int status;
+ struct nfs4_pnfs_ds *ds;
+ u32 idx;

dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu sync %d\n",
__func__, inode->i_ino, nr_pages, pgbase, count, offset, sync);

/* Retrieve the correct rpc_client for the byte range */
- status = nfs4_pnfs_dserver_get(data->pdata.lseg,
- offset,
- count,
- &dserver);
-
- if (status) {
- printk(KERN_ERR "%s: dserver get failed status %d use MDS\n",
- __func__, status);
+ idx = nfs4_fl_calc_ds_index(data->pdata.lseg, offset);
+ ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx);
+ if (!ds) {
+ printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
return PNFS_NOT_ATTEMPTED;
}
-
dprintk("%s ino %lu %Zu@%llu DS:%x:%hu %s\n",
__func__, inode->i_ino, count, offset,
- htonl(dserver.ds->ds_ip_addr), ntohs(dserver.ds->ds_port),
- dserver.ds->r_addr);
+ htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);

- data->fldata.pnfs_client = dserver.ds->ds_clp->cl_rpcclient;
- data->fldata.ds_nfs_client = dserver.ds->ds_clp;
- data->args.fh = dserver.fh;
+ data->fldata.ds_nfs_client = ds->ds_clp;
+ data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
+ data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Get the file offset on the dserver. Set the write offset to
* this offset and save the original offset.
@@ -568,15 +558,12 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
{
struct nfs4_filelayout_segment *nfslay;
struct nfs_write_data *dsdata = NULL;
- struct nfs4_pnfs_dserver dserver;
struct nfs4_pnfs_ds *ds;
struct nfs_page *req, *reqt;
struct list_head *pos, *tmp, head, head2;
loff_t file_offset, comp_offset;
size_t stripesz, cbytes;
- int status;
enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
- struct nfs4_file_layout_dsaddr *dsaddr;
u32 idx1, idx2;

nfslay = LSEG_LD_DATA(data->pdata.lseg);
@@ -593,9 +580,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
stripesz = filelayout_get_stripesize(layoutid);
dprintk("%s stripesize %Zd\n", __func__, stripesz);

- dsaddr = container_of(data->pdata.lseg->deviceid,
- struct nfs4_file_layout_dsaddr, deviceid);
-
INIT_LIST_HEAD(&head);
INIT_LIST_HEAD(&head2);
list_add(&head, &data->pages);
@@ -609,19 +593,13 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;

/* Get dserver for the current page */
- status = nfs4_pnfs_dserver_get(data->pdata.lseg,
- file_offset,
- req->wb_bytes,
- &dserver);
- if (status) {
+ idx1 = nfs4_fl_calc_ds_index(data->pdata.lseg, file_offset);
+ ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx1);
+ if (!ds) {
data->pdata.pnfs_error = -EIO;
goto err_rewind;
}

- /* Get its index */
- idx1 = filelayout_dserver_get_index(file_offset, dsaddr,
- nfslay);
-
/* Gather all pages going to the current data server by
* comparing their indices.
* XXX: This recalculates the indices unecessarily.
@@ -630,8 +608,8 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
list_for_each_safe(pos, tmp, &head) {
reqt = nfs_list_entry(pos);
comp_offset = (loff_t)reqt->wb_index << PAGE_CACHE_SHIFT;
- idx2 = filelayout_dserver_get_index(comp_offset,
- dsaddr, nfslay);
+ idx2 = nfs4_fl_calc_ds_index(data->pdata.lseg,
+ comp_offset);
if (idx1 == idx2) {
nfs_list_remove_request(reqt);
nfs_list_add_request(reqt, &head2);
@@ -655,10 +633,9 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
list_add(&dsdata->pages, &head2);
list_del_init(&head2);

- ds = dserver.ds;
dsdata->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
dsdata->fldata.ds_nfs_client = ds->ds_clp;
- dsdata->args.fh = dserver.fh;
+ dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);

dprintk("%s: Initiating commit: %Zu@%llu USE DS:\n",
__func__, cbytes, file_offset);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index fbf307c..3697926 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -26,6 +26,9 @@

#define FILE_MT(inode) ((struct filelayout_mount_type *) \
(NFS_SERVER(inode)->pnfs_mountid->mountid))
+#define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
+ struct nfs4_file_layout_dsaddr, \
+ deviceid))

enum stripetype4 {
STRIPE_SPARSE = 1,
@@ -55,16 +58,6 @@ struct nfs4_pnfs_dev_hlist {
struct hlist_head dev_list[NFS4_PNFS_DEV_HASH_SIZE];
};

-/*
- * Used for I/O, Maps a stripe index to a layout file handle and a
- * multipath data server.
- */
-
-struct nfs4_pnfs_dserver {
- struct nfs_fh *fh;
- struct nfs4_pnfs_ds *ds;
-};
-
struct nfs4_filelayout_segment {
u32 stripe_type;
u32 commit_through_mds;
@@ -87,18 +80,24 @@ struct filelayout_mount_type {
struct super_block *fl_sb;
};

+static inline struct nfs_fh *
+nfs4_fl_select_ds_fh(struct nfs4_filelayout_segment *flseg, u32 idx)
+{
+ /* FRED - what about case == 0??? */
+ if (flseg->num_fh == 1)
+ return &flseg->fh_array[0];
+ else
+ return &flseg->fh_array[idx];
+}
+
extern struct pnfs_client_operations *pnfs_callback_ops;

extern void nfs4_fl_free_deviceid_callback(struct kref *);
extern void print_ds(struct nfs4_pnfs_ds *ds);
char *deviceid_fmt(const struct pnfs_deviceid *dev_id);
-int nfs4_pnfs_dserver_get(struct pnfs_layout_segment *lseg,
- loff_t offset,
- size_t count,
- struct nfs4_pnfs_dserver *dserver);
-u32 filelayout_dserver_get_index(loff_t offset,
- struct nfs4_file_layout_dsaddr *di,
- struct nfs4_filelayout_segment *layout);
+u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset);
+struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
+ u32 ds_idx);
extern struct nfs4_file_layout_dsaddr *
nfs4_pnfs_device_item_find(struct nfs_client *, struct pnfs_deviceid *dev_id);
struct nfs4_file_layout_dsaddr *
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f5eb5f1..404dd5f 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -554,90 +554,61 @@ nfs4_pnfs_device_item_find(struct nfs_client *clp, struct pnfs_deviceid *id)
container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
}

-/* Want res = ((offset / layout->stripe_unit) % dsaddr->stripe_count)
+/* Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
* Then: ((res + fsi) % dsaddr->stripe_count)
*/
-u32
-filelayout_dserver_get_index(loff_t offset,
- struct nfs4_file_layout_dsaddr *dsaddr,
- struct nfs4_filelayout_segment *layout)
+static inline u32
+_nfs4_fl_calc_j_index(loff_t offset,
+ struct nfs4_file_layout_dsaddr *dsaddr,
+ struct nfs4_filelayout_segment *layout)
{
- u64 tmp, tmp2;
+ u64 tmp;

- tmp = offset;
+ tmp = offset - layout->pattern_offset;
do_div(tmp, layout->stripe_unit);
- tmp2 = do_div(tmp, dsaddr->stripe_count) + layout->first_stripe_index;
- return do_div(tmp2, dsaddr->stripe_count);
+ tmp += layout->first_stripe_index;
+ return do_div(tmp, dsaddr->stripe_count);
}

-/* Retrieve the rpc client for a specified byte range
- * in 'inode' by filling in the contents of 'dserver'.
- */
-int
-nfs4_pnfs_dserver_get(struct pnfs_layout_segment *lseg,
- loff_t offset,
- size_t count,
- struct nfs4_pnfs_dserver *dserver)
+u32
+nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset)
{
- struct nfs4_filelayout_segment *layout = LSEG_LD_DATA(lseg);
- struct inode *inode = PNFS_INODE(lseg->layout);
- struct nfs_server *mds_srv = NFS_SERVER(inode);
+ struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
struct nfs4_file_layout_dsaddr *dsaddr;
- u64 tmp, tmp2;
- u32 stripe_idx, end_idx, ds_idx;
-
- if (!layout)
- return 1;
-
- dsaddr = container_of(lseg->deviceid, struct nfs4_file_layout_dsaddr,
- deviceid);
-
- stripe_idx = filelayout_dserver_get_index(offset, dsaddr, layout);
-
- /* For debugging, ensure entire requested range is in this dserver */
- tmp = offset + count - 1;
- do_div(tmp, layout->stripe_unit);
- tmp2 = do_div(tmp, dsaddr->stripe_count) + layout->first_stripe_index;
- end_idx = do_div(tmp2, dsaddr->stripe_count);
+ u32 j;

- dprintk("%s: offset=%Lu, count=%Zu, si=%u, dsi=%u, "
- "stripe_count=%u, stripe_unit=%u first_stripe_index %u\n",
- __func__,
- offset, count, stripe_idx, end_idx, dsaddr->stripe_count,
- layout->stripe_unit, layout->first_stripe_index);
+ dsaddr = FILE_DSADDR(lseg);
+ j = _nfs4_fl_calc_j_index(offset, dsaddr, flseg);
+ return dsaddr->stripe_indices[j];
+}

- BUG_ON(end_idx != stripe_idx);
- BUG_ON(stripe_idx >= dsaddr->stripe_count);
+struct nfs4_pnfs_ds *
+nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+ struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+ struct nfs4_file_layout_dsaddr *dsaddr;

- ds_idx = dsaddr->stripe_indices[stripe_idx];
+ dsaddr = FILE_DSADDR(lseg);
if (dsaddr->ds_list[ds_idx] == NULL) {
- printk(KERN_ERR "%s: No data server for device id (%s)!! \n",
- __func__, deviceid_fmt(&layout->dev_id));
- return 1;
+ printk(KERN_ERR "%s: No data server for device id (%s)!!\n",
+ __func__, deviceid_fmt(&flseg->dev_id));
+ return NULL;
}

if (!dsaddr->ds_list[ds_idx]->ds_clp) {
int err;

- err = nfs4_pnfs_ds_create(mds_srv, dsaddr->ds_list[ds_idx]);
+ err = nfs4_pnfs_ds_create(PNFS_NFS_SERVER(lseg->layout),
+ dsaddr->ds_list[ds_idx]);
if (err) {
printk(KERN_ERR "%s nfs4_pnfs_ds_create error %d\n",
__func__, err);
- return 1;
+ return NULL;
}
}
- dserver->ds = dsaddr->ds_list[ds_idx];
+ dprintk("%s: dev_id=%s, ds_idx=%u\n",
+ __func__, deviceid_fmt(&flseg->dev_id), ds_idx);

- if (layout->num_fh == 1)
- dserver->fh = &layout->fh_array[0];
- else
- dserver->fh = &layout->fh_array[ds_idx];
-
- dprintk("%s: dev_id=%s, ip:port=%s, ds_idx=%u stripe_idx=%u, "
- "offset=%llu, count=%Zu\n",
- __func__, deviceid_fmt(&layout->dev_id),
- dserver->ds->r_addr,
- ds_idx, stripe_idx, offset, count);
-
- return 0;
+ return dsaddr->ds_list[ds_idx];
}
+
--
1.6.6.1


2010-06-11 07:32:28

by Fred Isaman

Subject: [PATCH 17/26] pnfs_submit: remove pnfs_update_layout_commit

pnfs_update_layout_commit() seems completely extraneous. Also note that it
was being called with a spinlock held: nfs_commit_inode() calls
nfs_scan_commit() under inode->i_lock.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 39 ---------------------------------------
fs/nfs/pnfs.h | 1 -
fs/nfs/write.c | 8 +-------
3 files changed, 1 insertions(+), 47 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0f891d4..dd221e2 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1406,45 +1406,6 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
pnfs_set_pg_test(inode, pgio);
}

-/*
- * Get a layoutout for COMMIT
- */
-void
-pnfs_update_layout_commit(struct inode *inode,
- struct list_head *head,
- pgoff_t idx_start,
- unsigned int npages)
-{
- struct nfs_server *nfss = NFS_SERVER(inode);
- struct nfs_page *nfs_page = nfs_list_entry(head->next);
- u64 count;
- loff_t start;
- int status;
-
- dprintk("--> %s inode %p layout range: %Zd@%llu\n", __func__, inode,
- (size_t)(npages * PAGE_CACHE_SIZE),
- (u64)((u64)idx_start << PAGE_CACHE_SHIFT));
-
- if (!pnfs_enabled_sb(nfss))
- return;
-
- /* COMMIT indicates the whole file with offset = count = 0
- * whereas layout segments indicate whole file with offset = 0,
- * count = NFS4_MAX_UINT64.
- */
- count = (size_t)npages * PAGE_CACHE_SIZE;
- start = (loff_t)idx_start << PAGE_CACHE_SHIFT;
- if (start == 0 && count == 0)
- count = NFS4_MAX_UINT64;
-
- status = _pnfs_update_layout(inode, nfs_page->wb_context,
- count,
- start,
- IOMODE_RW,
- NULL);
- dprintk("%s virt update status %d\n", __func__, status);
-}
-
static int
pnfs_call_done(struct pnfs_call_data *pdata, struct rpc_task *task, void *data)
{
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1922ffa..7bff487 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -58,7 +58,6 @@ enum pnfs_try_status _pnfs_try_to_commit(struct nfs_write_data *,
void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
struct nfs_open_context *, struct list_head *);
void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
-void pnfs_update_layout_commit(struct inode *, struct list_head *, pgoff_t, unsigned int);
void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8a0c845..ac2608f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -538,14 +538,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
if (ret > 0)
nfsi->ncommit -= ret;
- if (nfs_need_commit(NFS_I(inode))) {
+ if (nfs_need_commit(NFS_I(inode)))
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-#ifdef CONFIG_NFS_V4_1
- /* FIXME: change pnfs_update_layout_commit to derive
- idx_start from head of list and pass ret rather than npages */
- pnfs_update_layout_commit(inode, dst, idx_start, npages);
-#endif /* CONFIG_NFS_V4_1 */
- }
return ret;
}
#else
--
1.6.6.1


2010-06-11 07:32:32

by Fred Isaman

Subject: [PATCH 19/26] pnfs-submit: export some commit error handling for use by layout drivers

Code already exists to deal with a memory error during commit before the
RPC has been sent. Separate it out and export it for later use by the
filelayout driver.
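
As a rough illustration of the refactoring, the sketch below moves a
"drain the list and re-mark every request" loop into its own helper so that
a second caller can reuse it. All names are hypothetical, the request type
is a bare singly linked list rather than struct nfs_page, and the page
accounting done by the real code is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_page: only what the sketch needs. */
struct req_sketch {
	struct req_sketch *next;
	int needs_commit; /* set when the request is put back for a later commit */
	int locked;       /* models the per-request lock held during commit setup */
};

/* Drain every request from *head, re-mark it for commit and unlock it. */
static void mark_list_commit_sketch(struct req_sketch **head)
{
	assert(head != NULL);

	while (*head != NULL) {
		struct req_sketch *req = *head;

		*head = req->next;     /* remove the request from the list */
		req->next = NULL;
		req->needs_commit = 1; /* the commit will be retried later */
		req->locked = 0;       /* release the per-request lock */
	}
}

/* Caller: on allocation failure, hand the whole list to the helper. */
static int commit_list_sketch(struct req_sketch **head, void *data)
{
	if (data == NULL) {
		mark_list_commit_sketch(head);
		return -1; /* the kernel code returns -ENOMEM at this point */
	}
	/* ... set up and send the commit using data ... */
	return 0;
}
```

With the loop in one place, a layout driver that fails to allocate its own
per-server commit data can take the same recovery path.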

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/internal.h | 1 +
fs/nfs/write.c | 29 ++++++++++++++++++-----------
2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 517aa0b..b754446 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -288,6 +288,7 @@ extern int pnfs_initiate_commit(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops,
int how);
extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
+extern void nfs_mark_list_commit(struct list_head *head);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
struct page *, struct page *);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ac2608f..40abbd0 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1377,6 +1377,23 @@ static int nfs_commit_rpcsetup(struct list_head *head,
how);
}

+/* Handle memory error during commit */
+void nfs_mark_list_commit(struct list_head *head)
+{
+ struct nfs_page *req;
+
+ while (!list_empty(head)) {
+ req = nfs_list_entry(head->next);
+ nfs_list_remove_request(req);
+ nfs_mark_request_commit(req);
+ dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+ dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+ BDI_RECLAIMABLE);
+ nfs_clear_page_tag_locked(req);
+ }
+}
+EXPORT_SYMBOL(nfs_mark_list_commit);
+
/*
* Commit dirty pages
*/
@@ -1384,25 +1401,15 @@ static int
nfs_commit_list(struct inode *inode, struct list_head *head, int how)
{
struct nfs_write_data *data;
- struct nfs_page *req;

data = nfs_commitdata_alloc();
-
if (!data)
goto out_bad;

/* Set up the argument struct */
return nfs_commit_rpcsetup(head, data, how);
out_bad:
- while (!list_empty(head)) {
- req = nfs_list_entry(head->next);
- nfs_list_remove_request(req);
- nfs_mark_request_commit(req);
- dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
- dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
- BDI_RECLAIMABLE);
- nfs_clear_page_tag_locked(req);
- }
+ nfs_mark_list_commit(head);
nfs_commit_clear_lock(NFS_I(inode));
return -ENOMEM;
}
--
1.6.6.1


2010-06-11 07:32:37

by Fred Isaman

Subject: [PATCH 23/26] pnfs_submit: remove unnecessary pnfs_fl_call_data field commit_through_mds

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4proc.c | 8 ++++----
include/linux/nfs_xdr.h | 1 -
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7e6cb89..192afc3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3362,17 +3362,17 @@ static void nfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_messa

#if defined(CONFIG_NFS_V4_1)
/*
- * pNFS doew not send a getattr to Data Serfers on commit.
+ * pNFS doew not send a getattr to Data Servers on commit.
*/
static void
pnfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_message *msg)
{
struct nfs_server *server = NFS_SERVER(data->inode);

- dprintk("--> %s ds_nfs_client %p commit_through_mds %d\n", __func__,
- data->fldata.ds_nfs_client, data->fldata.commit_through_mds);
+ dprintk("--> %s ds_nfs_client %p\n", __func__,
+ data->fldata.ds_nfs_client);

- if (!data->fldata.ds_nfs_client || data->fldata.commit_through_mds)
+ if (!data->fldata.ds_nfs_client)
return nfs4_proc_commit_setup(data, msg);

data->args.bitmask = server->attr_bitmask;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 38a5349..22113a1 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -984,7 +984,6 @@ struct pnfs_call_data {
struct pnfs_fl_call_data {
struct nfs_client *ds_nfs_client;
__u64 orig_offset;
- int commit_through_mds;
};
#endif /* CONFIG_NFS_V4_1 */

--
1.6.6.1


2010-06-11 07:32:35

by Fred Isaman

Subject: [PATCH 22/26] pnfs_submit: remove unnecessary pnfs_fl_call_data field pnfs_client

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 9 +++------
include/linux/nfs_xdr.h | 1 -
2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 756cb64..b82e4ff 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -216,7 +216,6 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,

/* just try the first data server for the index..*/
data->fldata.ds_nfs_client = ds->ds_clp;
- data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Now get the file offset on the dserver
@@ -230,7 +229,7 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
data->fldata.orig_offset = offset;

/* Perform an asynchronous read */
- nfs_initiate_read(data, data->fldata.pnfs_client,
+ nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
&filelayout_read_call_ops);

data->pdata.pnfs_error = 0;
@@ -269,7 +268,6 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);

data->fldata.ds_nfs_client = ds->ds_clp;
- data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Get the file offset on the dserver. Set the write offset to
@@ -281,7 +279,7 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
/* Perform an asynchronous write The offset will be reset in the
* call_ops->rpc_call_done() routine
*/
- nfs_initiate_write(data, data->fldata.pnfs_client,
+ nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
&filelayout_write_call_ops, sync);

data->pdata.pnfs_error = 0;
@@ -572,7 +570,7 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
struct nfs4_pnfs_ds *ds;

dprintk("%s data %p pnfs_client %p sync %d\n",
- __func__, data, data->fldata.pnfs_client, sync);
+ __func__, data, data->fldata.ds_nfs_client->cl_rpcclient, sync);

/* Alloc room for both in one go */
ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
@@ -643,7 +641,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
continue;
}
clnt = ds->ds_clp->cl_rpcclient;
- dsdata->fldata.pnfs_client = clnt;
dsdata->fldata.ds_nfs_client = ds->ds_clp;
dsdata->args.fh = \
nfs4_fl_select_ds_fh(LSEG_LD_DATA(req->wb_lseg),
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index a8b85b6..38a5349 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -982,7 +982,6 @@ struct pnfs_call_data {

/* files layout-type specific data for read, write, and commit */
struct pnfs_fl_call_data {
- struct rpc_clnt *pnfs_client; /* Holds pNFS device across async calls */
struct nfs_client *ds_nfs_client;
__u64 orig_offset;
int commit_through_mds;
--
1.6.6.1


2010-06-11 07:32:38

by Fred Isaman

Subject: [PATCH 24/26] pnfs_submit: pnfs_update_layout can return void

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 27 +++++++++------------------
fs/nfs/pnfs.h | 11 ++++-------
2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e37b71e..ae76e88 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1050,7 +1050,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
* If lsegpp is given, the appropriate layout segment is referenced and
* returned to the caller.
*/
-int
+void
_pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
u64 count,
@@ -1068,14 +1068,12 @@ _pnfs_update_layout(struct inode *ino,
struct pnfs_layout_segment *lseg = NULL;
bool take_ref = (lsegpp != NULL);
DEFINE_WAIT(__wait);
- int result = 0;

if (take_ref)
*lsegpp = NULL;
lo = get_lock_alloc_layout(ino);
if (IS_ERR(lo)) {
dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
- result = PTR_ERR(lo);
goto out;
}

@@ -1086,7 +1084,6 @@ _pnfs_update_layout(struct inode *ino,
put_lseg_locked(lseg);
/* someone is cleaning the layout */
lseg = NULL;
- result = -EAGAIN;
goto out_put;
}

@@ -1109,21 +1106,18 @@ _pnfs_update_layout(struct inode *ino,
clear_bit(lo_fail_bit(iomode),
&nfsi->layout.pnfs_layout_state);
nfsi->layout.pnfs_layout_suspend = 0;
- } else {
- result = 1;
+ } else
goto out_put;
- }
}

/* Lose lock, but not reference, match this with pnfs_layout_release */
spin_unlock(&nfsi->lo_lock);

- result = get_layout(ino, ctx, &arg, lsegpp, lo);
+ get_layout(ino, ctx, &arg, lsegpp, lo);
out:
- dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
- __func__, result, nfsi->layout.pnfs_layout_state,
- lseg);
- return result;
+ dprintk("%s end, state 0x%lx lseg %p\n", __func__,
+ nfsi->layout.pnfs_layout_state, lseg);
+ return;
out_put:
if (lsegpp)
*lsegpp = lseg;
@@ -1364,7 +1358,6 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs_server *nfss = NFS_SERVER(inode);
size_t count = 0;
loff_t loff;
- int status = 0;

pgio->pg_iswrite = 0;
pgio->pg_boundary = 0;
@@ -1378,11 +1371,9 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
readahead_range(inode, pages, &loff, &count);

if (count > 0) {
- status = _pnfs_update_layout(inode, ctx, count,
- loff, IOMODE_READ,
- &pgio->pg_lseg);
- dprintk("%s virt update returned %d\n", __func__, status);
- if (status != 0)
+ _pnfs_update_layout(inode, ctx, count, loff, IOMODE_READ,
+ &pgio->pg_lseg);
+ if (!pgio->pg_lseg)
return;

pgio->pg_boundary = pnfs_getboundary(inode);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index bdf11bc..09de7a3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -32,7 +32,7 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
extern const nfs4_stateid zero_stateid;

void put_lseg(struct pnfs_layout_segment *lseg);
-int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
u64 count, loff_t pos, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp);

@@ -175,7 +175,7 @@ static inline int pnfs_return_layout(struct inode *ino,
return 0;
}

-static inline int pnfs_update_layout(struct inode *ino,
+static inline void pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
u64 count, loff_t pos, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp)
@@ -183,12 +183,10 @@ static inline int pnfs_update_layout(struct inode *ino,
struct nfs_server *nfss = NFS_SERVER(ino);

if (pnfs_enabled_sb(nfss))
- return _pnfs_update_layout(ino, ctx, count, pos,
- access_type, lsegpp);
+ _pnfs_update_layout(ino, ctx, count, pos, access_type, lsegpp);
else {
if (lsegpp)
*lsegpp = NULL;
- return 0;
}
}

@@ -220,14 +218,13 @@ static inline void put_lseg(struct pnfs_layout_segment *lseg)
{
}

-static inline int
+static inline void
pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
u64 count, loff_t pos, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp)
{
if (lsegpp)
*lsegpp = NULL;
- return 0;
}

static inline enum pnfs_try_status
--
1.6.6.1
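
A note on the calling convention in the hunks above: _pnfs_update_layout()
now returns void and reports its outcome only through *lsegpp, so callers
test the pointer instead of a status code. Below is a minimal userspace
sketch of that pattern. The names demo_lseg, demo_update_layout() and
demo_pageio_init() are hypothetical stand-ins, not the kernel code, and the
"cached" argument is an assumption that replaces the real layout lookup and
LAYOUTGET call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct demo_lseg {
	int refcount;
};

/*
 * Sketch of a void "update layout" call: the only result is *lsegpp.
 * 'cached' models a segment that is already available (assumption);
 * NULL models every failure path.
 */
static void demo_update_layout(struct demo_lseg *cached,
			       struct demo_lseg **lsegpp)
{
	if (lsegpp)
		*lsegpp = NULL;		/* default: no layout */
	if (!cached || !lsegpp)
		return;			/* failure, or caller wants no reference */
	cached->refcount++;		/* the caller now owns one reference */
	*lsegpp = cached;
}

/* Caller side: returns 1 for the pNFS path, 0 for the plain NFS fallback. */
static int demo_pageio_init(struct demo_lseg *cached,
			    struct demo_lseg **pg_lseg)
{
	assert(pg_lseg != NULL);
	demo_update_layout(cached, pg_lseg);
	if (!*pg_lseg)
		return 0;		/* no lseg: go through the MDS */
	return 1;
}
```

Because a NULL segment is the single failure signal, the caller cannot tell
"someone is cleaning the layout" apart from an allocation failure, which
matches the removal of the separate result codes in the diff.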


2010-06-11 07:32:41

by Fred Isaman

Subject: [PATCH 25/26] pnfs-submit: Revert "pnfs: pnfs_redirty_request"

The existence of req->wb_lseg can take the place of PG_USE_PNFS.

This reverts commit 447b65adcc53ab21e76ce8795827df7d4c165af1.

Signed-off-by: Fred Isaman <[email protected]>
---
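Note: the idea is that a non-NULL per-request segment pointer carries the
same information as the flag bit did. A minimal userspace sketch, assuming
simplified stand-in types: demo_lseg, demo_page, demo_uses_pnfs() and
demo_mark_nopnfs() are hypothetical names, and the body of
demo_mark_nopnfs() is only a guess at what dropping the segment involves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the layout segment and struct nfs_page. */
struct demo_lseg {
	int refcount;
};

struct demo_page {
	struct demo_lseg *wb_lseg;	/* NULL means: go through the MDS */
};

/* The pointer itself answers the question the flag bit used to answer. */
static bool demo_uses_pnfs(const struct demo_page *req)
{
	assert(req != NULL);
	return req->wb_lseg != NULL;
}

/* Assumed behaviour: drop the reference and clear the pointer. */
static void demo_mark_nopnfs(struct demo_page *req)
{
	assert(req != NULL);
	if (req->wb_lseg) {
		req->wb_lseg->refcount--;
		req->wb_lseg = NULL;
	}
}
```

With this shape there is one piece of state to keep consistent, so a
request can never be flagged for pNFS without a segment to use.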
fs/nfs/pnfs.h | 9 ---------
fs/nfs/write.c | 2 +-
include/linux/nfs_page.h | 1 -
3 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 09de7a3..541e3fd 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -154,11 +154,6 @@ pnfs_try_to_commit(struct nfs_write_data *data,
return ret;
}

-static inline void pnfs_redirty_request(struct nfs_page *req)
-{
- clear_bit(PG_USE_PNFS, &req->wb_flags);
-}
-
static inline int pnfs_return_layout(struct inode *ino,
struct nfs4_pnfs_layout_segment *lseg,
const nfs4_stateid *stateid, /* optional */
@@ -248,10 +243,6 @@ pnfs_try_to_commit(struct nfs_write_data *data,
return PNFS_NOT_ATTEMPTED;
}

-static inline void pnfs_redirty_request(struct nfs_page *req)
-{
-}
-
static inline int pnfs_get_write_status(struct nfs_write_data *data)
{
return 0;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f1e4120..65e2c62 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -903,7 +903,7 @@ static void nfs_redirty_request(struct nfs_page *req)
{
struct page *page = req->wb_page;

- pnfs_redirty_request(req);
+ nfs_mark_request_nopnfs(req);
nfs_mark_request_dirty(req);
nfs_clear_page_tag_locked(req);
nfs_end_page_writeback(page);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 06e5157..7709d3e 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -32,7 +32,6 @@ enum {
PG_CLEAN,
PG_NEED_COMMIT,
PG_NEED_RESCHED,
- PG_USE_PNFS,
};

struct nfs_inode;
--
1.6.6.1


2010-06-11 07:32:42

by Fred Isaman

Subject: [PATCH 26/26] pnfs-submit: Reorder arguments to pnfs_update_layout

The offset and count arguments were in (count, offset) order for some
reason. Swap them to (offset, count) in the prototypes and all callers.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 2 +-
fs/nfs/pnfs.c | 4 ++--
fs/nfs/pnfs.h | 8 ++++----
fs/nfs/read.c | 2 +-
4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 409892f..3066141 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -399,7 +399,7 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,

pnfs_update_layout(mapping->host,
nfs_file_open_context(file),
- NFS4_MAX_UINT64, 0, IOMODE_RW,
+ 0, NFS4_MAX_UINT64, IOMODE_RW,
&lseg);
start:
/*
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index ae76e88..7d322c9 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1053,8 +1053,8 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
void
_pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
- u64 count,
loff_t pos,
+ u64 count,
enum pnfs_iomode iomode,
struct pnfs_layout_segment **lsegpp)
{
@@ -1371,7 +1371,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
readahead_range(inode, pages, &loff, &count);

if (count > 0) {
- _pnfs_update_layout(inode, ctx, count, loff, IOMODE_READ,
+ _pnfs_update_layout(inode, ctx, loff, count, IOMODE_READ,
&pgio->pg_lseg);
if (!pgio->pg_lseg)
return;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 541e3fd..75f6cf1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -33,7 +33,7 @@ extern const nfs4_stateid zero_stateid;

void put_lseg(struct pnfs_layout_segment *lseg);
void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
- u64 count, loff_t pos, enum pnfs_iomode access_type,
+ loff_t pos, u64 count, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp);

int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
@@ -172,13 +172,13 @@ static inline int pnfs_return_layout(struct inode *ino,

static inline void pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
- u64 count, loff_t pos, enum pnfs_iomode access_type,
+ loff_t pos, u64 count, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp)
{
struct nfs_server *nfss = NFS_SERVER(ino);

if (pnfs_enabled_sb(nfss))
- _pnfs_update_layout(ino, ctx, count, pos, access_type, lsegpp);
+ _pnfs_update_layout(ino, ctx, pos, count, access_type, lsegpp);
else {
if (lsegpp)
*lsegpp = NULL;
@@ -215,7 +215,7 @@ static inline void put_lseg(struct pnfs_layout_segment *lseg)

static inline void
pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
- u64 count, loff_t pos, enum pnfs_iomode access_type,
+ loff_t pos, u64 count, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp)
{
if (lsegpp)
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 68b4ca8..9cabf88 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -126,7 +126,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);
- pnfs_update_layout(inode, ctx, NFS4_MAX_UINT64, 0, IOMODE_READ, &lseg);
+ pnfs_update_layout(inode, ctx, 0, NFS4_MAX_UINT64, IOMODE_READ, &lseg);
new = nfs_create_request(ctx, inode, page, 0, len, lseg);
put_lseg(lseg);
if (IS_ERR(new)) {
--
1.6.6.1


2010-06-12 20:35:59

by Benny Halevy

Subject: Re: [PATCH 00/26] LAYOUT invocation v2

Fred, I merged these patches, as well as the rebased dependent patches
I took from your pnfs-block git tree, under pnfs-all-2.6.35-rc2-2010-06-12.

Thanks!

Benny

On Jun. 11, 2010, 10:31 +0300, Fred Isaman <[email protected]> wrote:
> This is version 2 of a patch series that limits LAYOUTGET invocation
> to the beginning of the IO paths. It is intended for the pnfs_submit branch,
> without reversion in a post_submit branch.
>
> Patches 1-4 revert direct IO. Commit is already broken, and this
> series breaks them further. The problem is that the direct IO
> redefines data->wb_req and data->pages, so that it can only work with
> the pnfs code if we don't look at those fields. The reverted code should
> be saved somewhere. I tend to agree with Boaz that keeping it in git is preferable, but I can supply a patch which returns the code ifdef'ed out if that is preferred.
>
> Patches 5-9 do some code cleanup in preparation for the real work.
>
> Patches 10-21 implement the change. NOTE that patch 20 changes the
> calling convention of the layout drivers commit calls. There is no
> longer a universal lseg for the commit, instead each nfs_page has an
> lseg attached, with NULL meaning to go through the MDS.
>
> Patches 22-26 rework the filelayout commit function, and then do some
> other code cleanup.
>
>
>
> The basic idea of these patches is as follows:
>
> We attempt to grab a lseg (possibly invoking LAYOUTGET) early in the
> IO. If we succeed, we refcount and stash it, using it through the
> rest of the io. If we fail, we revert to straight nfs, even if the
> area becomes covered by a layout due to other io.
>
> The tricky, though hopefully anomalous, case is when we start without
> the layout, but have it at this particular stage of the IO. We ignore
> this for the moment at write_pages, which will cause block and object
> to issue CB_LAYOUTRECALL. At commit, it is tricky to handle, but
> since block doesn't use commit, and file needs to handle complicated
> splitting anyway, I just push all complicated decisions of splitting
> commit between nfs (for IO started without layout) and pnfs to the
> driver.
>
> Fred
>