2014-04-24 18:15:35

by Weston Andros Adamson

Subject: [PATCH 00/18 v2] nfs: support multiple requests per page

This patchset changes the read and write paths to be more flexible in dealing
with requests that are not page aligned. Until now there was a 1:1 mapping
of struct nfs_page (referred to as "nfs requests") to struct page, which
limited the client to page aligned I/O in several pNFS scenarios.

This patchset allows multiple requests per page, loosely following the
sub-page accounting approach taken by struct buffer_head in the kernel's
block I/O path.
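For readers unfamiliar with the buffer_head-style accounting, here is a
minimal user-space sketch of the idea (not the kernel code; all names here
are illustrative): several requests cover one page and are linked in a
circular list with a designated head request:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for struct nfs_page: requests on the same page
 * form a circular singly linked list with one designated "head". */
struct req {
	struct req *head;       /* head request of the page group */
	struct req *this_page;  /* next request on the same page (circular) */
	unsigned int pgbase;    /* offset of this request within the page */
	unsigned int bytes;     /* length of this request */
};

/* Link @req into @prev's group, or make a singleton group if @prev is NULL. */
static void group_init(struct req *req, struct req *prev)
{
	if (!prev) {
		req->head = req;
		req->this_page = req;
	} else {
		req->head = prev->head;
		req->this_page = prev->this_page;
		prev->this_page = req;
	}
}

/* Total bytes covered by the group containing @req. */
static unsigned int group_bytes(struct req *req)
{
	unsigned int total = 0;
	struct req *tmp = req;

	do {
		total += tmp->bytes;
		tmp = tmp->this_page;
	} while (tmp != req);
	return total;
}
```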

With this patchset the client now supports:
- non-page-aligned O_DIRECT I/O to DSes (instead of reverting to MDS)
- arbitrary pnfs layout segment boundaries
- arbitrary pnfs filelayout stripe sizes

This patchset also includes a lot of cleanup - notably we no longer need
a separate code path to support rsize/wsize < PAGE_SIZE.

This new approach opens the door to many optimizations, such as not having to
flush a page on a non-contiguous write, but for the time being we are focusing
on correctness -- this patchset touches the read and write path for *all*
versions of NFS!

This has been tested against v2, v3, v4.0 and v4.1 (no pnfs) servers with
different rsize/wsize settings, and against pynfs filelayout servers hacked to
have non-page-aligned stripe sizes.

I've already had some code review (with changes applied) and we've been
testing this pretty extensively for more than a month, focusing mostly on
v2, v3 and v4.x (no pnfs).

The patchset applies against Trond's testing branch and includes the fix I
posted earlier today, "pnfs: fix race in filelayout commit path", as the
race seems to be easier to hit with this patchset applied.

I'm pretty sure I didn't break anything in the object and block layouts, but
some extra attention there would be helpful.

I plan on sharing some performance numbers once I'm able to run some nfsometer
workloads. Expect them soon.

Changes in V2:
- now includes "pnfs: fix race in filelayout commit path" which was posted
separately, but is needed for testing

- applied fixup to "nfs: add support for multiple nfs reqs per page"
to fix ref counting issues.

- fixed "pnfs: support multiple verfs per direct req" to set dreq->flag
to _RESCHED_WRITES, just as before

- applied comment from Boaz to "nfs: modify pg_test interface to return size_t" and "nfs: chain calls to pg_test"

- I also rebased on top of Trond's testing branch with recent pgio patches
from Christoph and Anna applied first. The patches are:

% git log --oneline HEAD~37..HEAD~18
c1dac13 NFS: Create a common nfs_pageio_ops struct
a9f5822 NFS: Create a common generic_pg_pgios()
aab1abe NFS: Create a common multiple_pgios() function
60695af NFS: Create a common initiate_pgio() function
3e2e4ad NFS: Create a generic_pgio function
741d1af NFS: Create a common pgio_error function
087f8d8 NFS: Create a common rpcsetup function for reads and writes
fce97cb NFS: Create a common rpc_call_ops struct
c4c63d7 NFS: Create a common nfs_pgio_result_common function
fcc4ad0 NFS: Create a common pgio_rpc_prepare function
06a5f36 NFS: Create a common rw_header_alloc and rw_header_free function
d4aca41 NFS: Create a common pgio_alloc and pgio_release function
80afdb2 NFS: Move the write verifier into the nfs_pgio_header
4f1abf9 NFS: Create a common read and write header struct
fc7a26b NFS: Create a common read and write data struct
81ae1c0 NFS: Create a common results structure for reads and writes
f824d97 NFS: Create a common argument structure for reads and writes
0d08e1b nfs: remove ->read_pageio_init from rpc ops
9c4bc08 nfs: remove ->write_pageio_init from rpc ops

This can be found on my (new) public repo at:

git://git.linux-nfs.org/projects/dros/linux-nfs

on the branch "pgio".

-dros



Weston Andros Adamson (18):
pnfs: fix race in filelayout commit path
nfs: clean up PG_* flags
nfs: remove unused arg from nfs_create_request
nfs: modify pg_test interface to return size_t
nfs: call nfs_can_coalesce_requests for every req
nfs: add support for multiple nfs reqs per page
nfs: page group syncing in read path
nfs: page group syncing in write path
nfs: page group support in nfs_mark_uptodate
pnfs: clean up filelayout_alloc_commit_info
nfs: allow coalescing of subpage requests
nfs: chain calls to pg_test
nfs: use > 1 request to handle bsize < PAGE_SIZE
nfs: remove list of [rw]data from pgio header
pnfs: support multiple verfs per direct req
pnfs: allow non page aligned pnfs layout segments
pnfs: filelayout: support non page aligned layouts
nfs: support page groups in nfs_read_completion

fs/nfs/blocklayout/blocklayout.c | 16 ++-
fs/nfs/direct.c | 107 +++++++++++++--
fs/nfs/nfs4filelayout.c | 142 ++++++++++---------
fs/nfs/objlayout/objio_osd.c | 18 ++-
fs/nfs/pageio.c | 91 +-----------
fs/nfs/pagelist.c | 290 +++++++++++++++++++++++++++++++++------
fs/nfs/pnfs.c | 77 +++++------
fs/nfs/pnfs.h | 3 +-
fs/nfs/read.c | 51 +++++--
fs/nfs/write.c | 119 +++++++++++++---
include/linux/nfs.h | 5 +-
include/linux/nfs_page.h | 33 +++--
include/linux/nfs_xdr.h | 6 +-
13 files changed, 659 insertions(+), 299 deletions(-)

--
1.8.5.2 (Apple Git-48)



2014-04-24 18:15:40

by Weston Andros Adamson

Subject: [PATCH 03/18 v2] nfs: remove unused arg from nfs_create_request

@inode is passed but not used.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 6 ++----
fs/nfs/pagelist.c | 4 +---
fs/nfs/read.c | 5 ++---
fs/nfs/write.c | 2 +-
include/linux/nfs_page.h | 1 -
5 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 164b016..1dd8c62 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -380,8 +380,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
/* XXX do we need to do the eof zeroing found in async_filler? */
- req = nfs_create_request(dreq->ctx, dreq->inode,
- pagevec[i],
+ req = nfs_create_request(dreq->ctx, pagevec[i],
pgbase, req_len);
if (IS_ERR(req)) {
result = PTR_ERR(req);
@@ -750,8 +749,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);

- req = nfs_create_request(dreq->ctx, dreq->inode,
- pagevec[i],
+ req = nfs_create_request(dreq->ctx, pagevec[i],
pgbase, req_len);
if (IS_ERR(req)) {
result = PTR_ERR(req);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 8221706..30c221c 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -136,7 +136,6 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
/**
* nfs_create_request - Create an NFS read/write request.
* @ctx: open context to use
- * @inode: inode to which the request is attached
* @page: page to write
* @offset: starting offset within the page for the write
* @count: number of bytes to read/write
@@ -146,8 +145,7 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
* User should ensure it is safe to sleep in this function.
*/
struct nfs_page *
-nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
- struct page *page,
+nfs_create_request(struct nfs_open_context *ctx, struct page *page,
unsigned int offset, unsigned int count)
{
struct nfs_page *req;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index d7e9460..a48b909 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -85,7 +85,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);
- new = nfs_create_request(ctx, inode, page, 0, len);
+ new = nfs_create_request(ctx, page, 0, len);
if (IS_ERR(new)) {
unlock_page(page);
return PTR_ERR(new);
@@ -303,7 +303,6 @@ static int
readpage_async_filler(void *data, struct page *page)
{
struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
- struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *new;
unsigned int len;
int error;
@@ -312,7 +311,7 @@ readpage_async_filler(void *data, struct page *page)
if (len == 0)
return nfs_return_empty_page(page);

- new = nfs_create_request(desc->ctx, inode, page, 0, len);
+ new = nfs_create_request(desc->ctx, page, 0, len);
if (IS_ERR(new))
goto out_error;

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3483dd4..5f1c965 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -761,7 +761,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
req = nfs_try_to_update_request(inode, page, offset, bytes);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, inode, page, offset, bytes);
+ req = nfs_create_request(ctx, page, offset, bytes);
if (IS_ERR(req))
goto out;
nfs_inode_add_request(inode, req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 91ac4d4..8cb2640 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -85,7 +85,6 @@ struct nfs_pageio_descriptor {
#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))

extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
- struct inode *inode,
struct page *page,
unsigned int offset,
unsigned int count);
--
1.8.5.2 (Apple Git-48)


2014-04-24 18:15:47

by Weston Andros Adamson

Subject: [PATCH 07/18 v2] nfs: page group syncing in read path

Operations that modify state for a whole page must be synchronized across
all requests within a page group. In the read path, this means calling
unlock_page and SetPageUptodate. Neither of these functions should be
called until all requests in a page group have reached the point where
they would call them.

This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
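As a rough user-space model of the sync-on-bit scheme (assumed names, no
locking shown; the real code holds the group lock): every request sets its
bit, and only the last request in the group to do so gets to perform the
per-page action, so unlock_page/SetPageUptodate run exactly once per group:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct req {
	struct req *this_page;  /* circular list of requests on one page */
	unsigned long flags;    /* PG_* style bit flags */
};

/* Set @bit on @req; return true only if every member of the circular
 * group now has the bit set, clearing the bits again for reuse. */
static bool sync_on_bit(struct req *req, unsigned int bit)
{
	struct req *tmp;

	req->flags |= 1UL << bit;
	/* if any member still lacks the bit, we are not the last one */
	for (tmp = req->this_page; tmp != req; tmp = tmp->this_page)
		if (!(tmp->flags & (1UL << bit)))
			return false;
	/* last one in: reset the bit on everyone and report completion */
	tmp = req;
	do {
		tmp->flags &= ~(1UL << bit);
		tmp = tmp->this_page;
	} while (tmp != req);
	return true;
}
```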

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/read.c | 22 +++++++++++++++++-----
include/linux/nfs_page.h | 2 ++
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index f814e8a..a889319 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -105,10 +105,16 @@ static void nfs_readpage_release(struct nfs_page *req)
{
struct inode *d_inode = req->wb_context->dentry->d_inode;

- if (PageUptodate(req->wb_page))
- nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+ dprintk("NFS: read done (%s/%llu %d@%lld)\n", d_inode->i_sb->s_id,
+ (unsigned long long)NFS_FILEID(d_inode), req->wb_bytes,
+ (long long)req_offset(req));

- unlock_page(req->wb_page);
+ if (nfs_page_group_sync_on_bit(req, PG_UNLOCKPAGE)) {
+ if (PageUptodate(req->wb_page))
+ nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
+ unlock_page(req->wb_page);
+ }

dprintk("NFS: read done (%s/%Lu %d@%Ld)\n",
req->wb_context->dentry->d_inode->i_sb->s_id,
@@ -118,6 +124,12 @@ static void nfs_readpage_release(struct nfs_page *req)
nfs_release_request(req);
}

+static void nfs_page_group_set_uptodate(struct nfs_page *req)
+{
+ if (nfs_page_group_sync_on_bit(req, PG_UPTODATE))
+ SetPageUptodate(req->wb_page);
+}
+
/* Note io was page aligned */
static void nfs_read_completion(struct nfs_pgio_header *hdr)
{
@@ -140,9 +152,9 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
bytes += req->wb_bytes;
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
if (bytes <= hdr->good_bytes)
- SetPageUptodate(page);
+ nfs_page_group_set_uptodate(req);
} else
- SetPageUptodate(page);
+ nfs_page_group_set_uptodate(req);
nfs_list_remove_request(req);
nfs_readpage_release(req);
}
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index b99deb7..002f999 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -29,6 +29,8 @@ enum {
PG_INODE_REF, /* extra ref held by inode (head req only) */
PG_HEADLOCK, /* page group lock of wb_head */
PG_TEARDOWN, /* page group sync for destroy */
+ PG_UNLOCKPAGE, /* page group sync bit in read path */
+ PG_UPTODATE, /* page group sync bit in read path */
};

struct nfs_inode;
--
1.8.5.2 (Apple Git-48)


2014-04-24 18:15:42

by Weston Andros Adamson

Subject: [PATCH 04/18 v2] nfs: modify pg_test interface to return size_t

This is a step toward allowing pg_test to tell the coalescing code to
reduce the size of requests so that they fit within whatever scheme the
pg_test callback wants to define.

For now, just return the size of the request if there is space, or 0 if
there is not. This shouldn't change any behavior, as it acts the same as
when the pg_test functions returned bool.
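The new contract can be sketched in isolation (a simplified user-space
model with assumed field names, mirroring nfs_generic_pg_test below):
return the number of coalescible bytes, or 0 when the request doesn't fit:

```c
#include <assert.h>
#include <stddef.h>

struct desc {
	size_t pg_count;  /* bytes already coalesced into this descriptor */
	size_t pg_bsize;  /* maximum I/O size (rsize/wsize) */
};

struct req {
	size_t wb_bytes;  /* length of this request */
};

/* Return how many bytes of @req can be coalesced into @desc:
 * the full request if it fits, 0 if it does not. */
static size_t generic_pg_test(const struct desc *desc, const struct req *req)
{
	if (desc->pg_count + req->wb_bytes <= desc->pg_bsize)
		return req->wb_bytes;
	return 0;
}
```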

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/blocklayout/blocklayout.c | 16 ++++++++++++----
fs/nfs/nfs4filelayout.c | 12 +++++++-----
fs/nfs/objlayout/objio_osd.c | 15 ++++++++++-----
fs/nfs/pagelist.c | 22 +++++++++++++++++++---
fs/nfs/pnfs.c | 12 +++++++++---
fs/nfs/pnfs.h | 3 ++-
include/linux/nfs_page.h | 5 +++--
7 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 206cc68..9b431f4 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -1189,13 +1189,17 @@ bl_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
pnfs_generic_pg_init_read(pgio, req);
}

-static bool
+/*
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
+ */
+static size_t
bl_pg_test_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
if (pgio->pg_dreq != NULL &&
!is_aligned_req(req, SECTOR_SIZE))
- return false;
+ return 0;

return pnfs_generic_pg_test(pgio, prev, req);
}
@@ -1241,13 +1245,17 @@ bl_pg_init_write(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
}
}

-static bool
+/*
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
+ */
+static size_t
bl_pg_test_write(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
if (pgio->pg_dreq != NULL &&
!is_aligned_req(req, PAGE_CACHE_SIZE))
- return false;
+ return 0;

return pnfs_generic_pg_test(pgio, prev, req);
}
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 9fd7ceb..ba9a9aa 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -915,10 +915,10 @@ filelayout_alloc_lseg(struct pnfs_layout_hdr *layoutid,
/*
* filelayout_pg_test(). Called by nfs_can_coalesce_requests()
*
- * return true : coalesce page
- * return false : don't coalesce page
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
*/
-static bool
+static size_t
filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
@@ -927,7 +927,7 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,

if (!pnfs_generic_pg_test(pgio, prev, req) ||
!nfs_generic_pg_test(pgio, prev, req))
- return false;
+ return 0;

p_stripe = (u64)req_offset(prev);
r_stripe = (u64)req_offset(req);
@@ -936,7 +936,9 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
do_div(p_stripe, stripe_unit);
do_div(r_stripe, stripe_unit);

- return (p_stripe == r_stripe);
+ if (p_stripe == r_stripe)
+ return req->wb_bytes;
+ return 0;
}

static void
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 426b366..71b9c69 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -564,14 +564,19 @@ int objio_write_pagelist(struct nfs_pgio_data *wdata, int how)
return 0;
}

-static bool objio_pg_test(struct nfs_pageio_descriptor *pgio,
+/*
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
+ */
+static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req)
{
- if (!pnfs_generic_pg_test(pgio, prev, req))
- return false;
+ if (!pnfs_generic_pg_test(pgio, prev, req) ||
+ pgio->pg_count + req->wb_bytes >
+ (unsigned long)pgio->pg_layout_private)
+ return 0;

- return pgio->pg_count + req->wb_bytes <=
- (unsigned long)pgio->pg_layout_private;
+ return req->wb_bytes;
}

static void objio_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 30c221c..4cd7a2a 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -277,7 +277,17 @@ nfs_wait_on_request(struct nfs_page *req)
TASK_UNINTERRUPTIBLE);
}

-bool nfs_generic_pg_test(struct nfs_pageio_descriptor *desc, struct nfs_page *prev, struct nfs_page *req)
+/*
+ * nfs_generic_pg_test - determine if requests can be coalesced
+ * @desc: pointer to descriptor
+ * @prev: previous request in desc, or NULL
+ * @req: this request
+ *
+ * Returns zero if @req cannot be coalesced into @desc, otherwise it returns
+ * the size of the request.
+ */
+size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
+ struct nfs_page *prev, struct nfs_page *req)
{
/*
* FIXME: ideally we should be able to coalesce all requests
@@ -289,7 +299,9 @@ bool nfs_generic_pg_test(struct nfs_pageio_descriptor *desc, struct nfs_page *pr
if (desc->pg_bsize < PAGE_SIZE)
return 0;

- return desc->pg_count + req->wb_bytes <= desc->pg_bsize;
+ if (desc->pg_count + req->wb_bytes <= desc->pg_bsize)
+ return req->wb_bytes;
+ return 0;
}
EXPORT_SYMBOL_GPL(nfs_generic_pg_test);

@@ -356,6 +368,8 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
struct nfs_page *req,
struct nfs_pageio_descriptor *pgio)
{
+ size_t size;
+
if (!nfs_match_open_context(req->wb_context, prev->wb_context))
return false;
if (req->wb_context->dentry->d_inode->i_flock != NULL &&
@@ -367,7 +381,9 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
return false;
if (req_offset(req) != req_offset(prev) + prev->wb_bytes)
return false;
- return pgio->pg_ops->pg_test(pgio, prev, req);
+ size = pgio->pg_ops->pg_test(pgio, prev, req);
+ WARN_ON_ONCE(size && size != req->wb_bytes);
+ return size > 0;
}

/**
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0fe6701..de6eb16 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1434,7 +1434,11 @@ pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_init_write);

-bool
+/*
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
+ */
+size_t
pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
@@ -1455,8 +1459,10 @@ pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
* first byte that lies outside the pnfs_layout_range. FIXME?
*
*/
- return req_offset(req) < end_offset(pgio->pg_lseg->pls_range.offset,
- pgio->pg_lseg->pls_range.length);
+ if (req_offset(req) < end_offset(pgio->pg_lseg->pls_range.offset,
+ pgio->pg_lseg->pls_range.length))
+ return req->wb_bytes;
+ return 0;
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_test);

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 0031267..dccf182 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -187,7 +187,8 @@ int pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc);
void pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
struct nfs_page *req, u64 wb_size);
int pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc);
-bool pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev, struct nfs_page *req);
+size_t pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *prev, struct nfs_page *req);
void pnfs_set_lo_fail(struct pnfs_layout_segment *lseg);
struct pnfs_layout_segment *pnfs_layout_process(struct nfs4_layoutget *lgp);
void pnfs_free_lseg_list(struct list_head *tmp_list);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 8cb2640..40faddd 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -46,7 +46,8 @@ struct nfs_page {
struct nfs_pageio_descriptor;
struct nfs_pageio_ops {
void (*pg_init)(struct nfs_pageio_descriptor *, struct nfs_page *);
- bool (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
+ size_t (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
+ struct nfs_page *);
int (*pg_doio)(struct nfs_pageio_descriptor *);
};

@@ -102,7 +103,7 @@ extern int nfs_pageio_add_request(struct nfs_pageio_descriptor *,
struct nfs_page *);
extern void nfs_pageio_complete(struct nfs_pageio_descriptor *desc);
extern void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *, pgoff_t);
-extern bool nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
+extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev,
struct nfs_page *req);
extern int nfs_wait_on_request(struct nfs_page *);
--
1.8.5.2 (Apple Git-48)


2014-04-24 18:16:05

by Weston Andros Adamson

Subject: [PATCH 18/18 v2] nfs: support page groups in nfs_read_completion

nfs_read_completion relied on the fact that there was a 1:1 mapping
of page to nfs_request, but this has now changed.

Regions not covered by a request have already been zeroed elsewhere.
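The zeroing arithmetic in the hunk below can be factored out as a small
sketch (illustrative names, not the kernel code): given the running byte
count and the header's good_bytes, compute the [start, end) range of a
request that must be zeroed on a short (EOF) read:

```c
#include <assert.h>
#include <stdbool.h>

struct req {
	unsigned long wb_pgbase;  /* offset of the request within the page */
	unsigned long wb_bytes;   /* length of this request */
};

/* Return true if part of @req must be zeroed, storing the range in
 * [*start, *end). @bytes_so_far counts bytes of earlier requests. */
static bool zero_range(const struct req *req, unsigned long bytes_so_far,
		       unsigned long good_bytes,
		       unsigned long *start, unsigned long *end)
{
	*end = req->wb_pgbase + req->wb_bytes;
	if (bytes_so_far > good_bytes) {
		/* nothing in this request was good: zero all of it */
		*start = req->wb_pgbase;
		return true;
	}
	if (good_bytes - bytes_so_far < req->wb_bytes) {
		/* partially good: zero only the bad tail */
		*start = req->wb_pgbase + (good_bytes - bytes_so_far);
		return true;
	}
	return false;  /* fully good, nothing to zero */
}
```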

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/read.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index a889319..3ea06f0 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -130,7 +130,6 @@ static void nfs_page_group_set_uptodate(struct nfs_page *req)
SetPageUptodate(req->wb_page);
}

-/* Note io was page aligned */
static void nfs_read_completion(struct nfs_pgio_header *hdr)
{
unsigned long bytes = 0;
@@ -140,14 +139,25 @@ static void nfs_read_completion(struct nfs_pgio_header *hdr)
while (!list_empty(&hdr->pages)) {
struct nfs_page *req = nfs_list_entry(hdr->pages.next);
struct page *page = req->wb_page;
+ unsigned long start = req->wb_pgbase;
+ unsigned long end = req->wb_pgbase + req->wb_bytes;

if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) {
- if (bytes > hdr->good_bytes)
- zero_user(page, 0, PAGE_SIZE);
- else if (hdr->good_bytes - bytes < PAGE_SIZE)
- zero_user_segment(page,
- hdr->good_bytes & ~PAGE_MASK,
- PAGE_SIZE);
+ /* note: regions of the page not covered by a
+ * request are zeroed in nfs_readpage_async /
+ * readpage_async_filler */
+ if (bytes > hdr->good_bytes) {
+ /* nothing in this request was good, so zero
+ * the full extent of the request */
+ zero_user_segment(page, start, end);
+
+ } else if (hdr->good_bytes - bytes < req->wb_bytes) {
+ /* part of this request has good bytes, but
+ * not all. zero the bad bytes */
+ start += hdr->good_bytes - bytes;
+ WARN_ON(start < req->wb_pgbase);
+ zero_user_segment(page, start, end);
+ }
}
bytes += req->wb_bytes;
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
--
1.8.5.2 (Apple Git-48)


2014-04-24 20:49:18

by Anna Schumaker

Subject: Re: [PATCH 06/18 v2] nfs: add support for multiple nfs reqs per page

On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
> Add "page groups" - a circular list of nfs requests (struct nfs_page)
> that all reference the same page. This gives nfs read and write paths
> the ability to account for sub-page regions independently. This
> somewhat follows the design of struct buffer_head's sub-page
> accounting.
>
> Only "head" requests are ever added/removed from the inode list in
> the buffered write path. "head" and "sub" requests are treated the
> same through the read path and the rest of the write/commit path.
> Requests are given an extra reference across the life of the list.
>
> Page groups are never rejoined after being split. If the read/write
> request fails and the client falls back to another path (ie revert
> to MDS in PNFS case), the already split requests are pushed through
> the recoalescing code again, which may split them further and then
> coalesce them into properly sized requests on the wire. Fragmentation
> shouldn't be a problem with the current design, because we flush all
> requests in page group when a non-contiguous request is added, so
> the only time resplitting should occur is on a resend of a read or
> write.
>
> This patch lays the groundwork for sub-page splitting, but does not
> actually do any splitting. For now all page groups have one request
> as pg_test functions don't yet split pages. There are several related
> patches that are needed to support multiple requests per page group.
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> ---
> fs/nfs/direct.c | 7 +-
> fs/nfs/pagelist.c | 224 ++++++++++++++++++++++++++++++++++++++++++++---
> fs/nfs/read.c | 4 +-
> fs/nfs/write.c | 13 ++-
> include/linux/nfs_page.h | 13 ++-
> 5 files changed, 240 insertions(+), 21 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 1dd8c62..2c0e08f 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -380,7 +380,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
> struct nfs_page *req;
> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
> /* XXX do we need to do the eof zeroing found in async_filler? */
> - req = nfs_create_request(dreq->ctx, pagevec[i],
> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
> pgbase, req_len);
> if (IS_ERR(req)) {
> result = PTR_ERR(req);
> @@ -749,7 +749,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
> struct nfs_page *req;
> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>
> - req = nfs_create_request(dreq->ctx, pagevec[i],
> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
> pgbase, req_len);
> if (IS_ERR(req)) {
> result = PTR_ERR(req);
> @@ -827,6 +827,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
> spin_unlock(&dreq->lock);
>
> while (!list_empty(&hdr->pages)) {
> + bool do_destroy = true;
> +
> req = nfs_list_entry(hdr->pages.next);
> nfs_list_remove_request(req);
> switch (bit) {
> @@ -834,6 +836,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
> case NFS_IOHDR_NEED_COMMIT:
> kref_get(&req->wb_kref);
> nfs_mark_request_commit(req, hdr->lseg, &cinfo);
> + do_destroy = false;
> }
> nfs_unlock_and_release_request(req);
> }
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index b120728..d57d675 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -26,6 +26,8 @@
>
> static struct kmem_cache *nfs_page_cachep;
>
> +static void nfs_free_request(struct nfs_page *);
> +
> bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
> {
> p->npages = pagecount;
> @@ -133,10 +135,151 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
> return __nfs_iocounter_wait(c);
> }
>
> +/*
> + * nfs_page_group_lock - lock the head of the page group
> + * @req - request in group that is to be locked
> + *
> + * this lock must be held if modifying the page group list
> + */
> +void
> +nfs_page_group_lock(struct nfs_page *req)
> +{
> + struct nfs_page *head = req->wb_head;
> + int err = -EAGAIN;
> +
> + WARN_ON_ONCE(head != head->wb_head);
> +
> + while (err)
> + err = wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
> + nfs_wait_bit_killable, TASK_KILLABLE);
> +}
> +
> +/*
> + * nfs_page_group_unlock - unlock the head of the page group
> + * @req - request in group that is to be unlocked
> + */
> +void
> +nfs_page_group_unlock(struct nfs_page *req)
> +{
> + struct nfs_page *head = req->wb_head;
> +
> + WARN_ON_ONCE(head != head->wb_head);
> +
> + smp_mb__before_clear_bit();
> + clear_bit(PG_HEADLOCK, &head->wb_flags);
> + smp_mb__after_clear_bit();
> + wake_up_bit(&head->wb_flags, PG_HEADLOCK);
> +}
> +
> +/*
> + * nfs_page_group_sync_on_bit_locked
> + *
> + * must be called with page group lock held
> + */
> +static bool
> +nfs_page_group_sync_on_bit_locked(struct nfs_page *req, unsigned int bit)
> +{
> + struct nfs_page *head = req->wb_head;
> + struct nfs_page *tmp;
> +
> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_flags));
> + WARN_ON_ONCE(test_and_set_bit(bit, &req->wb_flags));
> +
> + tmp = req->wb_this_page;
> + while (tmp != req) {
> + if (!test_bit(bit, &tmp->wb_flags))
> + return false;
> + tmp = tmp->wb_this_page;
> + }
> +
> + /* true! reset all bits */
> + tmp = req;
> + do {
> + clear_bit(bit, &tmp->wb_flags);
> + tmp = tmp->wb_this_page;
> + } while (tmp != req);
> +
> + return true;
> +}
> +
> +/*
> + * nfs_page_group_sync_on_bit - set bit on current request, but only
> + * return true if the bit is set for all requests in page group
> + * @req - request in page group
> + * @bit - PG_* bit that is used to sync page group
> + */
> +bool nfs_page_group_sync_on_bit(struct nfs_page *req, unsigned int bit)
> +{
> + bool ret;
> +
> + nfs_page_group_lock(req);
> + ret = nfs_page_group_sync_on_bit_locked(req, bit);
> + nfs_page_group_unlock(req);
> +
> + return ret;
> +}
> +
> +/*
> + * nfs_page_group_init - Initialize the page group linkage for @req
> + * @req - a new nfs request
> + * @prev - the previous request in page group, or NULL if @req is the first
> + * or only request in the group (the head).
> + */
> +static inline void
> +nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
> +{
> + WARN_ON_ONCE(prev == req);
> +
> + if (!prev) {
> + req->wb_head = req;
> + req->wb_this_page = req;
> + } else {
> + WARN_ON_ONCE(prev->wb_this_page != prev->wb_head);
> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags));
> + req->wb_head = prev->wb_head;
> + req->wb_this_page = prev->wb_this_page;
> + prev->wb_this_page = req;
> +
> + /* grab extra ref if head request has extra ref from
> + * the write/commit path to handle handoff between write
> + * and commit lists */
> + if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags))
> + kref_get(&req->wb_kref);
> + }
> +}
> +
> +/*
> + * nfs_page_group_destroy - sync the destruction of page groups
> + * @req - request that no longer needs the page group
> + *
> + * releases the page group reference from each member once all
> + * members have called this function.
> + */
> +static void
> +nfs_page_group_destroy(struct kref *kref)
> +{
> + struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
> + struct nfs_page *tmp, *next;
> +
> + if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
> + return;
> +
> + tmp = req;
> + do {
> + next = tmp->wb_this_page;
> + /* unlink and free */
> + tmp->wb_this_page = tmp;
> + tmp->wb_head = tmp;
> + nfs_free_request(tmp);
> + tmp = next;
> + } while (tmp != req);
> +}
> +
> /**
> * nfs_create_request - Create an NFS read/write request.
> * @ctx: open context to use
> * @page: page to write
> + * @last: last nfs request created for this page group or NULL if head
> * @offset: starting offset within the page for the write
> * @count: number of bytes to read/write
> *
> @@ -146,7 +289,8 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
> */
> struct nfs_page *
> nfs_create_request(struct nfs_open_context *ctx, struct page *page,
> - unsigned int offset, unsigned int count)
> + struct nfs_page *last, unsigned int offset,
> + unsigned int count)
> {
> struct nfs_page *req;
> struct nfs_lock_context *l_ctx;
> @@ -178,6 +322,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
> req->wb_bytes = count;
> req->wb_context = get_nfs_open_context(ctx);
> kref_init(&req->wb_kref);
> + nfs_page_group_init(req, last);
> return req;
> }
>
> @@ -235,16 +380,22 @@ static void nfs_clear_request(struct nfs_page *req)
> }
> }
>
> -
> /**
> * nfs_release_request - Release the count on an NFS read/write request
> * @req: request to release
> *
> * Note: Should never be called with the spinlock held!
> */
> -static void nfs_free_request(struct kref *kref)
> +static void nfs_free_request(struct nfs_page *req)
> {
> - struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
> + WARN_ON_ONCE(req->wb_this_page != req);
> +
> + /* extra debug: make sure no sync bits are still set */
> + WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
> + WARN_ON_ONCE(test_bit(PG_UNLOCKPAGE, &req->wb_flags));
> + WARN_ON_ONCE(test_bit(PG_UPTODATE, &req->wb_flags));
> + WARN_ON_ONCE(test_bit(PG_WB_END, &req->wb_flags));
My compiler doesn't know about PG_UNLOCKPAGE, PG_UPTODATE or PG_WB_END. Did you forget to add something?

Anna
> + WARN_ON_ONCE(test_bit(PG_REMOVE, &req->wb_flags));
>
> /* Release struct file and open context */
> nfs_clear_request(req);
> @@ -253,7 +404,7 @@ static void nfs_free_request(struct kref *kref)
>
> void nfs_release_request(struct nfs_page *req)
> {
> - kref_put(&req->wb_kref, nfs_free_request);
> + kref_put(&req->wb_kref, nfs_page_group_destroy);
> }
>
> static int nfs_wait_bit_uninterruptible(void *word)
> @@ -441,21 +592,66 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
> * @desc: destination io descriptor
> * @req: request
> *
> + * This may split a request into subrequests which are all part of the
> + * same page group.
> + *
> * Returns true if the request 'req' was successfully coalesced into the
> * existing list of pages 'desc'.
> */
> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> struct nfs_page *req)
> {
> - while (!nfs_pageio_do_add_request(desc, req)) {
> - desc->pg_moreio = 1;
> - nfs_pageio_doio(desc);
> - if (desc->pg_error < 0)
> - return 0;
> - desc->pg_moreio = 0;
> - if (desc->pg_recoalesce)
> - return 0;
> - }
> + struct nfs_page *subreq;
> + unsigned int bytes_left = 0;
> + unsigned int offset, pgbase;
> +
> + nfs_page_group_lock(req);
> +
> + subreq = req;
> + bytes_left = subreq->wb_bytes;
> + offset = subreq->wb_offset;
> + pgbase = subreq->wb_pgbase;
> +
> + do {
> + if (!nfs_pageio_do_add_request(desc, subreq)) {
> + /* make sure pg_test call(s) did nothing */
> + WARN_ON_ONCE(subreq->wb_bytes != bytes_left);
> + WARN_ON_ONCE(subreq->wb_offset != offset);
> + WARN_ON_ONCE(subreq->wb_pgbase != pgbase);
> +
> + nfs_page_group_unlock(req);
> + desc->pg_moreio = 1;
> + nfs_pageio_doio(desc);
> + if (desc->pg_error < 0)
> + return 0;
> + desc->pg_moreio = 0;
> + if (desc->pg_recoalesce)
> + return 0;
> + /* retry add_request for this subreq */
> + nfs_page_group_lock(req);
> + continue;
> + }
> +
> + /* check for buggy pg_test call(s) */
> + WARN_ON_ONCE(subreq->wb_bytes + subreq->wb_pgbase > PAGE_SIZE);
> + WARN_ON_ONCE(subreq->wb_bytes > bytes_left);
> + WARN_ON_ONCE(subreq->wb_bytes == 0);
> +
> + bytes_left -= subreq->wb_bytes;
> + offset += subreq->wb_bytes;
> + pgbase += subreq->wb_bytes;
> +
> + if (bytes_left) {
> + subreq = nfs_create_request(req->wb_context,
> + req->wb_page,
> + subreq, pgbase, bytes_left);
> + nfs_lock_request(subreq);
> + subreq->wb_offset = offset;
> + subreq->wb_index = req->wb_index;
> + }
> + } while (bytes_left > 0);
> +
> + nfs_page_group_unlock(req);
> return 1;
> }
>
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index a48b909..f814e8a 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -85,7 +85,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> len = nfs_page_length(page);
> if (len == 0)
> return nfs_return_empty_page(page);
> - new = nfs_create_request(ctx, page, 0, len);
> + new = nfs_create_request(ctx, page, NULL, 0, len);
> if (IS_ERR(new)) {
> unlock_page(page);
> return PTR_ERR(new);
> @@ -311,7 +311,7 @@ readpage_async_filler(void *data, struct page *page)
> if (len == 0)
> return nfs_return_empty_page(page);
>
> - new = nfs_create_request(desc->ctx, page, 0, len);
> + new = nfs_create_request(desc->ctx, page, NULL, 0, len);
> if (IS_ERR(new))
> goto out_error;
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 5f1c965..a65755a 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -367,6 +367,8 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
> {
> struct nfs_inode *nfsi = NFS_I(inode);
>
> + WARN_ON_ONCE(req->wb_this_page != req);
> +
> /* Lock the request! */
> nfs_lock_request(req);
>
> @@ -383,6 +385,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
> set_page_private(req->wb_page, (unsigned long)req);
> }
> nfsi->npages++;
> + set_bit(PG_INODE_REF, &req->wb_flags);
> kref_get(&req->wb_kref);
> spin_unlock(&inode->i_lock);
> }
> @@ -567,6 +570,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
> {
> struct nfs_commit_info cinfo;
> unsigned long bytes = 0;
> + bool do_destroy;
>
> if (test_bit(NFS_IOHDR_REDO, &hdr->flags))
> goto out;
> @@ -596,6 +600,7 @@ remove_req:
> next:
> nfs_unlock_request(req);
> nfs_end_page_writeback(req->wb_page);
> + do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags);
> nfs_release_request(req);
> }
> out:
> @@ -700,6 +705,10 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
> if (req == NULL)
> goto out_unlock;
>
> + /* should be handled by nfs_flush_incompatible */
> + WARN_ON_ONCE(req->wb_head != req);
> + WARN_ON_ONCE(req->wb_this_page != req);
> +
> rqend = req->wb_offset + req->wb_bytes;
> /*
> * Tell the caller to flush out the request if
> @@ -761,7 +770,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
> req = nfs_try_to_update_request(inode, page, offset, bytes);
> if (req != NULL)
> goto out;
> - req = nfs_create_request(ctx, page, offset, bytes);
> + req = nfs_create_request(ctx, page, NULL, offset, bytes);
> if (IS_ERR(req))
> goto out;
> nfs_inode_add_request(inode, req);
> @@ -805,6 +814,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
> return 0;
> l_ctx = req->wb_lock_context;
> do_flush = req->wb_page != page || req->wb_context != ctx;
> + /* for now, flush if more than 1 request in page_group */
> + do_flush |= req->wb_this_page != req;
> if (l_ctx && ctx->dentry->d_inode->i_flock != NULL) {
> do_flush |= l_ctx->lockowner.l_owner != current->files
> || l_ctx->lockowner.l_pid != current->tgid;
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index 40faddd..b99deb7 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -26,6 +26,9 @@ enum {
> PG_MAPPED, /* page private set for buffered io */
> PG_CLEAN, /* write succeeded */
> PG_COMMIT_TO_DS, /* used by pnfs layouts */
> + PG_INODE_REF, /* extra ref held by inode (head req only) */
> + PG_HEADLOCK, /* page group lock of wb_head */
> + PG_TEARDOWN, /* page group sync for destroy */
> };
>
> struct nfs_inode;
> @@ -41,6 +44,8 @@ struct nfs_page {
> struct kref wb_kref; /* reference count */
> unsigned long wb_flags;
> struct nfs_write_verifier wb_verf; /* Commit cookie */
> + struct nfs_page *wb_this_page; /* list of reqs for this page */
> + struct nfs_page *wb_head; /* head pointer for req list */
> };
>
> struct nfs_pageio_descriptor;
> @@ -87,9 +92,10 @@ struct nfs_pageio_descriptor {
>
> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
> struct page *page,
> + struct nfs_page *last,
> unsigned int offset,
> unsigned int count);
> -extern void nfs_release_request(struct nfs_page *req);
> +extern void nfs_release_request(struct nfs_page *);
>
>
> extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
> @@ -108,7 +114,10 @@ extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
> struct nfs_page *req);
> extern int nfs_wait_on_request(struct nfs_page *);
> extern void nfs_unlock_request(struct nfs_page *req);
> -extern void nfs_unlock_and_release_request(struct nfs_page *req);
> +extern void nfs_unlock_and_release_request(struct nfs_page *);
> +extern void nfs_page_group_lock(struct nfs_page *);
> +extern void nfs_page_group_unlock(struct nfs_page *);
> +extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);
>
> /*
> * Lock the page of an asynchronous request


2014-04-24 18:15:39

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 02/18 v2] nfs: clean up PG_* flags

Remove unused flags PG_NEED_COMMIT and PG_NEED_RESCHED.
Add comments describing how each flag is used.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
include/linux/nfs_page.h | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index dc5e71c..91ac4d4 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -22,12 +22,10 @@
* Valid flags for a dirty buffer
*/
enum {
- PG_BUSY = 0,
- PG_MAPPED,
- PG_CLEAN,
- PG_NEED_COMMIT,
- PG_NEED_RESCHED,
- PG_COMMIT_TO_DS,
+ PG_BUSY = 0, /* nfs_{un}lock_request */
+ PG_MAPPED, /* page private set for buffered io */
+ PG_CLEAN, /* write succeeded */
+ PG_COMMIT_TO_DS, /* used by pnfs layouts */
};

struct nfs_inode;
--
1.8.5.2 (Apple Git-48)


2014-04-25 16:04:35

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

On Apr 25, 2014, at 11:53 AM, Anna Schumaker <[email protected]> wrote:

> On 04/25/2014 11:38 AM, Weston Andros Adamson wrote:
>> On Apr 25, 2014, at 11:12 AM, Weston Andros Adamson <[email protected]> wrote:
>>
>>> On Apr 25, 2014, at 10:15 AM, Anna Schumaker <[email protected]> wrote:
>>>
>>>> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>>>>> Support direct requests that span multiple pnfs data servers by
>>>>> comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
>>>>> Continue to use dreq->verf if the MDS is used / non-pNFS.
>>>>>
>>>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>>>> ---
>>>>> fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>> fs/nfs/nfs4filelayout.c | 6 +++
>>>>> include/linux/nfs.h | 5 ++-
>>>>> include/linux/nfs_xdr.h | 2 +
>>>>> 4 files changed, 105 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>>>> index 2c0e08f..9349933 100644
>>>>> --- a/fs/nfs/direct.c
>>>>> +++ b/fs/nfs/direct.c
>>>>> @@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
>>>>> return atomic_dec_and_test(&dreq->io_count);
>>>>> }
>>>>>
>>>>> +/*
>>>>> + * nfs_direct_select_verf - select the right verifier
>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>> + * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
>>>>> + * @ds_idx - index of data server in data server list, only valid if ds_clp set
>>>>> + *
>>>>> + * returns the correct verifier to use given the role of the server
>>>>> + */
>>>>> +static struct nfs_writeverf *
>>>>> +nfs_direct_select_verf(struct nfs_direct_req *dreq,
>>>>> + struct nfs_client *ds_clp,
>>>>> + int ds_idx)
>>>>> +{
>>>>> + struct nfs_writeverf *verfp = &dreq->verf;
>>>>> +
>>>>> + if (ds_clp) {
>>>>> + /* pNFS is in use, use the DS verf */
>>>>> + if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
>>>> The struct pnfs_ds_commit_info is empty if CONFIG_NFS_V4_1=n, so this won't compile.
>>>>
>>>> Anna
>>> Good catch, I'll iterate through the patchset and test with v4.1 disabled.
>>>
>>> Time to add some #ifdefs
>>>
>>> -dros
>> That was the only problem I found with 4.1 disabled.
>>
>> Fixed and pushed.
>
> Thanks! I'm now getting a "defined but not used" error if just v2 is enabled:
>
> NFS_V4_2=n PNFS_OBJLAYOUT=n PNFS_BLOCK=n PNFS_FILE_LAYOUT=n NFS_V4_1=n NFS_V4=n NFS_V3=n NFS_V2=y
>
> fs/nfs/direct.c:189:12: error: ‘nfs_direct_cmp_commit_data_verf’ defined but not used [-Werror=unused-function]
> static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
>
> Anna

Oh, I didn't find that because the direct commit path uses:

IS_ENABLED(CONFIG_NFS_V3) || IS_ENABLED(CONFIG_NFS_V4)

fixed and pushed.

-dros

>
>>
>> -dros
>>
>>>>> + verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
>>>>> + else
>>>>> + WARN_ON_ONCE(1);
>>>>> + }
>>>>> + return verfp;
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/*
>>>>> + * nfs_direct_set_hdr_verf - set the write/commit verifier
>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>> + * @hdr - pageio header to validate against previously seen verfs
>>>>> + *
>>>>> + * Set the server's (MDS or DS) "seen" verifier
>>>>> + */
>>>>> +static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
>>>>> + struct nfs_pgio_header *hdr)
>>>>> +{
>>>>> + struct nfs_writeverf *verfp;
>>>>> +
>>>>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>>>>> + hdr->data->ds_idx);
>>>>> + WARN_ON_ONCE(verfp->committed >= 0);
>>>>> + memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>>>>> + WARN_ON_ONCE(verfp->committed < 0);
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>> + * @hdr - pageio header to validate against previously seen verf
>>>>> + *
>>>>> + * set the server's "seen" verf if not initialized.
>>>>> + * returns result of comparison between @hdr->verf and the "seen"
>>>>> + * verf of the server used by @hdr (DS or MDS)
>>>>> + */
>>>>> +static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
>>>>> + struct nfs_pgio_header *hdr)
>>>>> +{
>>>>> + struct nfs_writeverf *verfp;
>>>>> +
>>>>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>>>>> + hdr->data->ds_idx);
>>>>> + if (verfp->committed < 0) {
>>>>> + nfs_direct_set_hdr_verf(dreq, hdr);
>>>>> + return 0;
>>>>> + }
>>>>> + return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * nfs_direct_cmp_commit_data_verf - compare verifier for commit data
>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>> + * @data - commit data to validate against previously seen verf
>>>>> + *
>>>>> + * returns result of comparison between @data->verf and the verf of
>>>>> + * the server used by @data (DS or MDS)
>>>>> + */
>>>>> +static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
>>>>> + struct nfs_commit_data *data)
>>>>> +{
>>>>> + struct nfs_writeverf *verfp;
>>>>> +
>>>>> + verfp = nfs_direct_select_verf(dreq, data->ds_clp,
>>>>> + data->ds_commit_index);
>>>>> + WARN_ON_ONCE(verfp->committed < 0);
>>>>> + return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * nfs_direct_IO - NFS address space operation for direct I/O
>>>>> * @rw: direction (read or write)
>>>>> @@ -168,6 +255,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
>>>>> kref_get(&dreq->kref);
>>>>> init_completion(&dreq->completion);
>>>>> INIT_LIST_HEAD(&dreq->mds_cinfo.list);
>>>>> + dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
>>>>> INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
>>>>> spin_lock_init(&dreq->lock);
>>>>>
>>>>> @@ -602,7 +690,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
>>>>> dprintk("NFS: %5u commit failed with error %d.\n",
>>>>> data->task.tk_pid, status);
>>>>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>>> - } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
>>>>> + } else if (nfs_direct_cmp_commit_data_verf(dreq, data)) {
>>>>> dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
>>>>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>>> }
>>>>> @@ -811,13 +899,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>>>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>>>>> bit = NFS_IOHDR_NEED_RESCHED;
>>>>> else if (dreq->flags == 0) {
>>>>> - memcpy(&dreq->verf, &hdr->verf,
>>>>> - sizeof(dreq->verf));
>>>>> + nfs_direct_set_hdr_verf(dreq, hdr);
>>>>> bit = NFS_IOHDR_NEED_COMMIT;
>>>>> dreq->flags = NFS_ODIRECT_DO_COMMIT;
>>>>> } else if (dreq->flags == NFS_ODIRECT_DO_COMMIT) {
>>>>> - if (memcmp(&dreq->verf, &hdr->verf, sizeof(dreq->verf))) {
>>>>> - dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>>> + if (nfs_direct_set_or_cmp_hdr_verf(dreq, hdr)) {
>>>>> + dreq->flags =
>>>>> + NFS_ODIRECT_RESCHED_WRITES;
>>>>> bit = NFS_IOHDR_NEED_RESCHED;
>>>>> } else
>>>>> bit = NFS_IOHDR_NEED_COMMIT;
>>>>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>>>>> index 7a665e0..0ebc521 100644
>>>>> --- a/fs/nfs/nfs4filelayout.c
>>>>> +++ b/fs/nfs/nfs4filelayout.c
>>>>> @@ -560,6 +560,7 @@ filelayout_read_pagelist(struct nfs_pgio_data *data)
>>>>> /* No multipath support. Use first DS */
>>>>> atomic_inc(&ds->ds_clp->cl_count);
>>>>> data->ds_clp = ds->ds_clp;
>>>>> + data->ds_idx = idx;
>>>>> fh = nfs4_fl_select_ds_fh(lseg, j);
>>>>> if (fh)
>>>>> data->args.fh = fh;
>>>>> @@ -603,6 +604,7 @@ filelayout_write_pagelist(struct nfs_pgio_data *data, int sync)
>>>>> data->pgio_done_cb = filelayout_write_done_cb;
>>>>> atomic_inc(&ds->ds_clp->cl_count);
>>>>> data->ds_clp = ds->ds_clp;
>>>>> + data->ds_idx = idx;
>>>>> fh = nfs4_fl_select_ds_fh(lseg, j);
>>>>> if (fh)
>>>>> data->args.fh = fh;
>>>>> @@ -875,6 +877,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>>>>> for (i = 0; i < size; i++) {
>>>>> INIT_LIST_HEAD(&buckets[i].written);
>>>>> INIT_LIST_HEAD(&buckets[i].committing);
>>>>> + /* mark direct verifier as unset */
>>>>> + buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
>>>>> }
>>>>>
>>>>> spin_lock(cinfo->lock);
>>>>> @@ -885,6 +889,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>>>>> &buckets[i].written);
>>>>> list_splice(&cinfo->ds->buckets[i].committing,
>>>>> &buckets[i].committing);
>>>>> + buckets[i].direct_verf.committed =
>>>>> + cinfo->ds->buckets[i].direct_verf.committed;
>>>>> buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
>>>>> buckets[i].clseg = cinfo->ds->buckets[i].clseg;
>>>>> }
>>>>> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
>>>>> index 3e794c1..610af51 100644
>>>>> --- a/include/linux/nfs.h
>>>>> +++ b/include/linux/nfs.h
>>>>> @@ -46,6 +46,9 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
>>>>> enum nfs3_stable_how {
>>>>> NFS_UNSTABLE = 0,
>>>>> NFS_DATA_SYNC = 1,
>>>>> - NFS_FILE_SYNC = 2
>>>>> + NFS_FILE_SYNC = 2,
>>>>> +
>>>>> + /* used by direct.c to mark verf as invalid */
>>>>> + NFS_INVALID_STABLE_HOW = -1
>>>>> };
>>>>> #endif /* _LINUX_NFS_H */
>>>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>>>> index 29828c7..bb9fb88 100644
>>>>> --- a/include/linux/nfs_xdr.h
>>>>> +++ b/include/linux/nfs_xdr.h
>>>>> @@ -1111,6 +1111,7 @@ struct pnfs_commit_bucket {
>>>>> struct list_head committing;
>>>>> struct pnfs_layout_segment *wlseg;
>>>>> struct pnfs_layout_segment *clseg;
>>>>> + struct nfs_writeverf direct_verf;
>>>>> };
>>>>>
>>>>> struct pnfs_ds_commit_info {
>>>>> @@ -1294,6 +1295,7 @@ struct nfs_pgio_data {
>>>>> __u64 mds_offset; /* Filelayout dense stripe */
>>>>> struct nfs_page_array pages;
>>>>> struct nfs_client *ds_clp; /* pNFS data server */
>>>>> + int ds_idx; /* ds index if ds_clp is set */
>>>>> };
>>>>>
>>>>> struct nfs_rw_header {


2014-04-25 15:38:39

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

On Apr 25, 2014, at 11:12 AM, Weston Andros Adamson <[email protected]> wrote:

> On Apr 25, 2014, at 10:15 AM, Anna Schumaker <[email protected]> wrote:
>
>> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>>> Support direct requests that span multiple pnfs data servers by
>>> comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
>>> Continue to use dreq->verf if the MDS is used / non-pNFS.
>>>
>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>> ---
>>> fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
>>> fs/nfs/nfs4filelayout.c | 6 +++
>>> include/linux/nfs.h | 5 ++-
>>> include/linux/nfs_xdr.h | 2 +
>>> 4 files changed, 105 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>> index 2c0e08f..9349933 100644
>>> --- a/fs/nfs/direct.c
>>> +++ b/fs/nfs/direct.c
>>> @@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
>>> return atomic_dec_and_test(&dreq->io_count);
>>> }
>>>
>>> +/*
>>> + * nfs_direct_select_verf - select the right verifier
>>> + * @dreq - direct request possibly spanning multiple servers
>>> + * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
>>> + * @ds_idx - index of data server in data server list, only valid if ds_clp set
>>> + *
>>> + * returns the correct verifier to use given the role of the server
>>> + */
>>> +static struct nfs_writeverf *
>>> +nfs_direct_select_verf(struct nfs_direct_req *dreq,
>>> + struct nfs_client *ds_clp,
>>> + int ds_idx)
>>> +{
>>> + struct nfs_writeverf *verfp = &dreq->verf;
>>> +
>>> + if (ds_clp) {
>>> + /* pNFS is in use, use the DS verf */
>>> + if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
>>
>> The struct pnfs_ds_commit_info is empty if CONFIG_NFS_V4_1=n, so this won't compile.
>>
>> Anna
>
> Good catch, I'll iterate through the patchset and test with v4.1 disabled.
>
> Time to add some #ifdefs
>
> -dros

That was the only problem I found with 4.1 disabled.

Fixed and pushed.

-dros


2014-04-24 18:15:49

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 08/18 v2] nfs: page group syncing in write path

Operations that modify state for a whole page must be synchronized across
all requests within a page group. In the write path, this means calling
end_page_writeback and removing the head request from the inode.
Neither of these operations should be called until all requests
in a page group have reached the point where they would call it.

This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/write.c | 32 ++++++++++++++++++++------------
include/linux/nfs_page.h | 2 ++
2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a65755a..a3e3065 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -201,12 +201,15 @@ static void nfs_set_page_writeback(struct page *page)
}
}

-static void nfs_end_page_writeback(struct page *page)
+static void nfs_end_page_writeback(struct nfs_page *req)
{
- struct inode *inode = page_file_mapping(page)->host;
+ struct inode *inode = page_file_mapping(req->wb_page)->host;
struct nfs_server *nfss = NFS_SERVER(inode);

- end_page_writeback(page);
+ if (!nfs_page_group_sync_on_bit(req, PG_WB_END))
+ return;
+
+ end_page_writeback(req->wb_page);
if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC);
}
@@ -397,15 +400,20 @@ static void nfs_inode_remove_request(struct nfs_page *req)
{
struct inode *inode = req->wb_context->dentry->d_inode;
struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs_page *head;

- spin_lock(&inode->i_lock);
- if (likely(!PageSwapCache(req->wb_page))) {
- set_page_private(req->wb_page, 0);
- ClearPagePrivate(req->wb_page);
- clear_bit(PG_MAPPED, &req->wb_flags);
+ if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
+ head = req->wb_head;
+
+ spin_lock(&inode->i_lock);
+ if (likely(!PageSwapCache(head->wb_page))) {
+ set_page_private(head->wb_page, 0);
+ ClearPagePrivate(head->wb_page);
+ clear_bit(PG_MAPPED, &head->wb_flags);
+ }
+ nfsi->npages--;
+ spin_unlock(&inode->i_lock);
}
- nfsi->npages--;
- spin_unlock(&inode->i_lock);
nfs_release_request(req);
}

@@ -599,7 +607,7 @@ remove_req:
nfs_inode_remove_request(req);
next:
nfs_unlock_request(req);
- nfs_end_page_writeback(req->wb_page);
+ nfs_end_page_writeback(req);
do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags);
nfs_release_request(req);
}
@@ -964,7 +972,7 @@ static void nfs_redirty_request(struct nfs_page *req)
{
nfs_mark_request_dirty(req);
nfs_unlock_request(req);
- nfs_end_page_writeback(req->wb_page);
+ nfs_end_page_writeback(req);
nfs_release_request(req);
}

diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 002f999..6500b85 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -31,6 +31,8 @@ enum {
PG_TEARDOWN, /* page group sync for destroy */
PG_UNLOCKPAGE, /* page group sync bit in read path */
PG_UPTODATE, /* page group sync bit in read path */
+ PG_WB_END, /* page group sync bit in write path */
+ PG_REMOVE, /* page group sync bit in write path */
};

struct nfs_inode;
--
1.8.5.2 (Apple Git-48)


2014-04-24 18:16:03

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 17/18 v2] pnfs: filelayout: support non page aligned layouts

Use the new pg_test interface to adjust requests to fit in the current
stripe / segment.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 50 ++++++++++++++++++-------------------------------
1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 0ebc521..9419061 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -639,7 +639,6 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
struct nfs4_deviceid_node *d;
struct nfs4_file_layout_dsaddr *dsaddr;
int status = -EINVAL;
- struct nfs_server *nfss = NFS_SERVER(lo->plh_inode);

dprintk("--> %s\n", __func__);

@@ -657,7 +656,7 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
goto out;
}

- if (!fl->stripe_unit || fl->stripe_unit % PAGE_SIZE) {
+ if (!fl->stripe_unit) {
dprintk("%s Invalid stripe unit (%u)\n",
__func__, fl->stripe_unit);
goto out;
@@ -694,12 +693,6 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
goto out_put;
}

- if (fl->stripe_unit % nfss->rsize || fl->stripe_unit % nfss->wsize) {
- dprintk("%s Stripe unit (%u) not aligned with rsize %u "
- "wsize %u\n", __func__, fl->stripe_unit, nfss->rsize,
- nfss->wsize);
- }
-
status = 0;
out:
dprintk("--> %s returns %d\n", __func__, status);
@@ -936,44 +929,40 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
{
unsigned int size;
u64 p_stripe, r_stripe;
- u32 stripe_unit;
+ u32 stripe_offset;
+ u64 segment_offset = pgio->pg_lseg->pls_range.offset;
+ u32 stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;

/* calls nfs_generic_pg_test */
size = pnfs_generic_pg_test(pgio, prev, req);
if (!size)
return 0;

+ /* see if req and prev are in the same stripe */
if (prev) {
- p_stripe = (u64)req_offset(prev);
- r_stripe = (u64)req_offset(req);
- stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
-
+ p_stripe = (u64)req_offset(prev) - segment_offset;
+ r_stripe = (u64)req_offset(req) - segment_offset;
do_div(p_stripe, stripe_unit);
do_div(r_stripe, stripe_unit);

if (p_stripe != r_stripe)
return 0;
}
- return min(size, req->wb_bytes);
+
+ /* calculate remaining bytes in the current stripe */
+ stripe_offset = ((u64)req_offset(req) - segment_offset) % stripe_unit;
+ WARN_ON_ONCE(stripe_offset > stripe_unit);
+ if (stripe_offset >= stripe_unit)
+ return 0;
+ return min(stripe_unit - (unsigned int)stripe_offset, size);
}

static void
filelayout_pg_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs_page *req)
{
- WARN_ON_ONCE(pgio->pg_lseg != NULL);
-
- if (req->wb_offset != req->wb_pgbase) {
- /*
- * Handling unaligned pages is difficult, because have to
- * somehow split a req in two in certain cases in the
- * pg.test code. Avoid this by just not using pnfs
- * in this case.
- */
- nfs_pageio_reset_read_mds(pgio);
- return;
- }
- pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
req->wb_context,
0,
NFS4_MAX_UINT64,
@@ -991,11 +980,8 @@ filelayout_pg_init_write(struct nfs_pageio_descriptor *pgio,
struct nfs_commit_info cinfo;
int status;

- WARN_ON_ONCE(pgio->pg_lseg != NULL);
-
- if (req->wb_offset != req->wb_pgbase)
- goto out_mds;
- pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
+ if (!pgio->pg_lseg)
+ pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
req->wb_context,
0,
NFS4_MAX_UINT64,
--
1.8.5.2 (Apple Git-48)
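The arithmetic in the new filelayout_pg_test above can be modeled outside the kernel: offsets are made relative to the layout segment, requests in different stripes are never coalesced, and a request is clamped to the bytes remaining in its stripe. A minimal sketch (plain Python, illustrative names):

```python
# Sketch of the stripe logic from filelayout_pg_test: with offsets taken
# relative to the segment, a request never crosses a stripe boundary,
# even when stripe_unit is not page aligned.

def stripe_clamp(req_offset, size, segment_offset, stripe_unit):
    """Bytes of the request that fit in its stripe (<= size)."""
    stripe_offset = (req_offset - segment_offset) % stripe_unit
    return min(stripe_unit - stripe_offset, size)

def same_stripe(a_offset, b_offset, segment_offset, stripe_unit):
    """True if both offsets fall in the same stripe of the segment."""
    rel = lambda off: (off - segment_offset) // stripe_unit
    return rel(a_offset) == rel(b_offset)
```

With a deliberately non-page-aligned stripe_unit of 3000 bytes, a 4096-byte request at offset 1000 is clamped to the 2000 bytes left in stripe 0, and offsets 2500 and 3100 land in different stripes, so coalescing is refused.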


2014-04-25 17:22:34

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

On 04/25/2014 12:04 PM, Weston Andros Adamson wrote:
> On Apr 25, 2014, at 11:53 AM, Anna Schumaker <[email protected]> wrote:
>
>> On 04/25/2014 11:38 AM, Weston Andros Adamson wrote:
>>> On Apr 25, 2014, at 11:12 AM, Weston Andros Adamson <[email protected]> wrote:
>>>
>>>> On Apr 25, 2014, at 10:15 AM, Anna Schumaker <[email protected]> wrote:
>>>>
>>>>> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>>>>>> Support direct requests that span multiple pnfs data servers by
>>>>>> comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
>>>>>> Continue to use dreq->verf if the MDS is used / non-pNFS.
>>>>>>
>>>>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>>>>> ---
>>>>>> fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>> fs/nfs/nfs4filelayout.c | 6 +++
>>>>>> include/linux/nfs.h | 5 ++-
>>>>>> include/linux/nfs_xdr.h | 2 +
>>>>>> 4 files changed, 105 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>>>>> index 2c0e08f..9349933 100644
>>>>>> --- a/fs/nfs/direct.c
>>>>>> +++ b/fs/nfs/direct.c
>>>>>> @@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
>>>>>> return atomic_dec_and_test(&dreq->io_count);
>>>>>> }
>>>>>>
>>>>>> +/*
>>>>>> + * nfs_direct_select_verf - select the right verifier
>>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>>> + * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
>>>>>> + * @ds_idx - index of data server in data server list, only valid if ds_clp set
>>>>>> + *
>>>>>> + * returns the correct verifier to use given the role of the server
>>>>>> + */
>>>>>> +static struct nfs_writeverf *
>>>>>> +nfs_direct_select_verf(struct nfs_direct_req *dreq,
>>>>>> + struct nfs_client *ds_clp,
>>>>>> + int ds_idx)
>>>>>> +{
>>>>>> + struct nfs_writeverf *verfp = &dreq->verf;
>>>>>> +
>>>>>> + if (ds_clp) {
>>>>>> + /* pNFS is in use, use the DS verf */
>>>>>> + if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
>>>>> The struct pnfs_ds_commit_info is empty if CONFIG_NFS_V4_1=n, so this won't compile.
>>>>>
>>>>> Anna
Good catch, I'll iterate through the patchset and test with v4.1 disabled.
>>>>
>>>> Time to add some #ifdefs
>>>>
>>>> -dros
>>> That was the only problem I found with 4.1 disabled.
>>>
>>> Fixed and pushed.
>> Thanks! I'm now getting a "defined but not used" error if just v2 is enabled:
>>
>> NFS_V4_2=n PNFS_OBJLAYOUT=n PNFS_BLOCK=n PNFS_FILE_LAYOUT=n NFS_V4_1=n NFS_V4=n NFS_V3=n NFS_V2=y
>>
>> fs/nfs/direct.c:189:12: error: ‘nfs_direct_cmp_commit_data_verf’ defined but not used [-Werror=unused-function]
>> static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
>>
>> Anna
> Oh, I didn’t find that because the direct commit path uses:
>
> IS_ENABLED(CONFIG_NFS_V3) || IS_ENABLED(CONFIG_NFS_V4)
>
> fixed and pushed.

Thanks! I haven't hit any other problems.

Anna

>
> -dros
>
>>> -dros
>>>
>>>>>> + verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
>>>>>> + else
>>>>>> + WARN_ON_ONCE(1);
>>>>>> + }
>>>>>> + return verfp;
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * nfs_direct_set_hdr_verf - set the write/commit verifier
>>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>>> + * @hdr - pageio header to validate against previously seen verfs
>>>>>> + *
>>>>>> + * Set the server's (MDS or DS) "seen" verifier
>>>>>> + */
>>>>>> +static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
>>>>>> + struct nfs_pgio_header *hdr)
>>>>>> +{
>>>>>> + struct nfs_writeverf *verfp;
>>>>>> +
>>>>>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>>>>>> + hdr->data->ds_idx);
>>>>>> + WARN_ON_ONCE(verfp->committed >= 0);
>>>>>> + memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>>>>>> + WARN_ON_ONCE(verfp->committed < 0);
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
>>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>>> + * @hdr - pageio header to validate against previously seen verf
>>>>>> + *
>>>>>> + * set the server's "seen" verf if not initialized.
>>>>>> + * returns result of comparison between @hdr->verf and the "seen"
>>>>>> + * verf of the server used by @hdr (DS or MDS)
>>>>>> + */
>>>>>> +static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
>>>>>> + struct nfs_pgio_header *hdr)
>>>>>> +{
>>>>>> + struct nfs_writeverf *verfp;
>>>>>> +
>>>>>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>>>>>> + hdr->data->ds_idx);
>>>>>> + if (verfp->committed < 0) {
>>>>>> + nfs_direct_set_hdr_verf(dreq, hdr);
>>>>>> + return 0;
>>>>>> + }
>>>>>> + return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * nfs_direct_cmp_commit_data_verf - compare verifier for commit data
>>>>>> + * @dreq - direct request possibly spanning multiple servers
>>>>>> + * @data - commit data to validate against previously seen verf
>>>>>> + *
>>>>>> + * returns result of comparison between @data->verf and the verf of
>>>>>> + * the server used by @data (DS or MDS)
>>>>>> + */
>>>>>> +static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
>>>>>> + struct nfs_commit_data *data)
>>>>>> +{
>>>>>> + struct nfs_writeverf *verfp;
>>>>>> +
>>>>>> + verfp = nfs_direct_select_verf(dreq, data->ds_clp,
>>>>>> + data->ds_commit_index);
>>>>>> + WARN_ON_ONCE(verfp->committed < 0);
>>>>>> + return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
>>>>>> +}
>>>>>> +
>>>>>> /**
>>>>>> * nfs_direct_IO - NFS address space operation for direct I/O
>>>>>> * @rw: direction (read or write)
>>>>>> @@ -168,6 +255,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
>>>>>> kref_get(&dreq->kref);
>>>>>> init_completion(&dreq->completion);
>>>>>> INIT_LIST_HEAD(&dreq->mds_cinfo.list);
>>>>>> + dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
>>>>>> INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
>>>>>> spin_lock_init(&dreq->lock);
>>>>>>
>>>>>> @@ -602,7 +690,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
>>>>>> dprintk("NFS: %5u commit failed with error %d.\n",
>>>>>> data->task.tk_pid, status);
>>>>>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>>>> - } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
>>>>>> + } else if (nfs_direct_cmp_commit_data_verf(dreq, data)) {
>>>>>> dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
>>>>>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>>>> }
>>>>>> @@ -811,13 +899,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>>>>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>>>>>> bit = NFS_IOHDR_NEED_RESCHED;
>>>>>> else if (dreq->flags == 0) {
>>>>>> - memcpy(&dreq->verf, &hdr->verf,
>>>>>> - sizeof(dreq->verf));
>>>>>> + nfs_direct_set_hdr_verf(dreq, hdr);
>>>>>> bit = NFS_IOHDR_NEED_COMMIT;
>>>>>> dreq->flags = NFS_ODIRECT_DO_COMMIT;
>>>>>> } else if (dreq->flags == NFS_ODIRECT_DO_COMMIT) {
>>>>>> - if (memcmp(&dreq->verf, &hdr->verf, sizeof(dreq->verf))) {
>>>>>> - dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>>>> + if (nfs_direct_set_or_cmp_hdr_verf(dreq, hdr)) {
>>>>>> + dreq->flags =
>>>>>> + NFS_ODIRECT_RESCHED_WRITES;
>>>>>> bit = NFS_IOHDR_NEED_RESCHED;
>>>>>> } else
>>>>>> bit = NFS_IOHDR_NEED_COMMIT;
>>>>>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>>>>>> index 7a665e0..0ebc521 100644
>>>>>> --- a/fs/nfs/nfs4filelayout.c
>>>>>> +++ b/fs/nfs/nfs4filelayout.c
>>>>>> @@ -560,6 +560,7 @@ filelayout_read_pagelist(struct nfs_pgio_data *data)
>>>>>> /* No multipath support. Use first DS */
>>>>>> atomic_inc(&ds->ds_clp->cl_count);
>>>>>> data->ds_clp = ds->ds_clp;
>>>>>> + data->ds_idx = idx;
>>>>>> fh = nfs4_fl_select_ds_fh(lseg, j);
>>>>>> if (fh)
>>>>>> data->args.fh = fh;
>>>>>> @@ -603,6 +604,7 @@ filelayout_write_pagelist(struct nfs_pgio_data *data, int sync)
>>>>>> data->pgio_done_cb = filelayout_write_done_cb;
>>>>>> atomic_inc(&ds->ds_clp->cl_count);
>>>>>> data->ds_clp = ds->ds_clp;
>>>>>> + data->ds_idx = idx;
>>>>>> fh = nfs4_fl_select_ds_fh(lseg, j);
>>>>>> if (fh)
>>>>>> data->args.fh = fh;
>>>>>> @@ -875,6 +877,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>>>>>> for (i = 0; i < size; i++) {
>>>>>> INIT_LIST_HEAD(&buckets[i].written);
>>>>>> INIT_LIST_HEAD(&buckets[i].committing);
>>>>>> + /* mark direct verifier as unset */
>>>>>> + buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
>>>>>> }
>>>>>>
>>>>>> spin_lock(cinfo->lock);
>>>>>> @@ -885,6 +889,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>>>>>> &buckets[i].written);
>>>>>> list_splice(&cinfo->ds->buckets[i].committing,
>>>>>> &buckets[i].committing);
>>>>>> + buckets[i].direct_verf.committed =
>>>>>> + cinfo->ds->buckets[i].direct_verf.committed;
>>>>>> buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
>>>>>> buckets[i].clseg = cinfo->ds->buckets[i].clseg;
>>>>>> }
>>>>>> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
>>>>>> index 3e794c1..610af51 100644
>>>>>> --- a/include/linux/nfs.h
>>>>>> +++ b/include/linux/nfs.h
>>>>>> @@ -46,6 +46,9 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
>>>>>> enum nfs3_stable_how {
>>>>>> NFS_UNSTABLE = 0,
>>>>>> NFS_DATA_SYNC = 1,
>>>>>> - NFS_FILE_SYNC = 2
>>>>>> + NFS_FILE_SYNC = 2,
>>>>>> +
>>>>>> + /* used by direct.c to mark verf as invalid */
>>>>>> + NFS_INVALID_STABLE_HOW = -1
>>>>>> };
>>>>>> #endif /* _LINUX_NFS_H */
>>>>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>>>>> index 29828c7..bb9fb88 100644
>>>>>> --- a/include/linux/nfs_xdr.h
>>>>>> +++ b/include/linux/nfs_xdr.h
>>>>>> @@ -1111,6 +1111,7 @@ struct pnfs_commit_bucket {
>>>>>> struct list_head committing;
>>>>>> struct pnfs_layout_segment *wlseg;
>>>>>> struct pnfs_layout_segment *clseg;
>>>>>> + struct nfs_writeverf direct_verf;
>>>>>> };
>>>>>>
>>>>>> struct pnfs_ds_commit_info {
>>>>>> @@ -1294,6 +1295,7 @@ struct nfs_pgio_data {
>>>>>> __u64 mds_offset; /* Filelayout dense stripe */
>>>>>> struct nfs_page_array pages;
>>>>>> struct nfs_client *ds_clp; /* pNFS data server */
>>>>>> + int ds_idx; /* ds index if ds_clp is set */
>>>>>> };
>>>>>>
>>>>>> struct nfs_rw_header {
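The per-bucket verifier scheme this patch introduces can be summarized outside the diff: a direct request keeps one write verifier per commit bucket (one per DS) plus one for the MDS; the first completion for a server records its verf, and any later mismatch for the same server forces a reschedule. A hedged sketch with illustrative names (not the kernel API):

```python
# Toy model of nfs_direct_select_verf / nfs_direct_set_or_cmp_hdr_verf:
# verifiers are tracked per server, so writes spanning multiple DSes no
# longer falsely trip the single-verf comparison.

INVALID = None  # stands in for NFS_INVALID_STABLE_HOW marking an unset verf

class DirectReq:
    def __init__(self, nbuckets):
        self.mds_verf = INVALID
        self.bucket_verfs = [INVALID] * nbuckets

    def set_or_cmp(self, ds_idx, verf):
        """Record first verf seen for this server; else compare.

        ds_idx is None on the MDS (non-pNFS) path. Returns 0 on match
        (or first sighting), nonzero on mismatch -> reschedule writes.
        """
        if ds_idx is None:
            if self.mds_verf is INVALID:
                self.mds_verf = verf
                return 0
            return 0 if self.mds_verf == verf else 1
        if self.bucket_verfs[ds_idx] is INVALID:
            self.bucket_verfs[ds_idx] = verf
            return 0
        return 0 if self.bucket_verfs[ds_idx] == verf else 1

dreq = DirectReq(nbuckets=2)
```

Two DSes returning different verfs is now fine; only a changed verf from the *same* server (e.g. after a reboot) triggers a resend.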


2014-04-24 18:16:02

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 16/18 v2] pnfs: allow non page aligned pnfs layout segments

Remove alignment checks that would revert to the MDS, and change pg_test
to return the maximum amount left in the segment (or imposed by another
pg_test call), up to the size of the passed request, or 0 if no space is left.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/pnfs.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 6ef108b..ce46a41 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1388,11 +1388,6 @@ pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *r

WARN_ON_ONCE(pgio->pg_lseg != NULL);

- if (req->wb_offset != req->wb_pgbase) {
- nfs_pageio_reset_read_mds(pgio);
- return;
- }
-
if (pgio->pg_dreq == NULL)
rd_size = i_size_read(pgio->pg_inode) - req_offset(req);
else
@@ -1417,11 +1412,6 @@ pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
{
WARN_ON_ONCE(pgio->pg_lseg != NULL);

- if (req->wb_offset != req->wb_pgbase) {
- nfs_pageio_reset_write_mds(pgio);
- return;
- }
-
pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
req->wb_context,
req_offset(req),
@@ -1443,9 +1433,9 @@ pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
unsigned int size;
+ u64 end;

size = nfs_generic_pg_test(pgio, prev, req);
-
if (!size)
return 0;

@@ -1463,11 +1453,16 @@ pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
* first byte that lies outside the pnfs_layout_range. FIXME?
*
*/
- if (req_offset(req) >= end_offset(pgio->pg_lseg->pls_range.offset,
- pgio->pg_lseg->pls_range.length))
- return 0;
+ if (pgio->pg_lseg) {
+ end = end_offset(pgio->pg_lseg->pls_range.offset,
+ pgio->pg_lseg->pls_range.length);
+ WARN_ON_ONCE(req_offset(req) > end);
+ if (req_offset(req) >= end)
+ return 0;
+ size = min((unsigned int)(end - req_offset(req)), size);
+ }

- return min(size, req->wb_bytes);
+ return size;
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_test);

--
1.8.5.2 (Apple Git-48)
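The behaviour change in pnfs_generic_pg_test above boils down to a clamp: instead of refusing non-aligned I/O, the request is trimmed to the bytes left before the end of the layout segment, returning 0 only when nothing fits. A minimal model (plain Python, names illustrative):

```python
# Sketch of the new pnfs_generic_pg_test segment handling: clamp the
# request to the segment end rather than require page alignment.

def segment_clamp(req_offset, size, seg_offset, seg_length):
    """Usable bytes of a request within [seg_offset, seg_offset+seg_length)."""
    end = seg_offset + seg_length      # end_offset() in the patch
    if req_offset >= end:
        return 0                       # request starts past the segment
    return min(end - req_offset, size)
```

For a segment covering bytes 0..9999: a 4096-byte request at offset 9000 is trimmed to 1000 bytes, one at offset 10000 gets 0 (forcing a new coalesce round), and one well inside the segment passes through unchanged.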


2014-04-24 23:06:30

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 06/18 v2] nfs: add support for multiple nfs reqs per page

Ah my bad, I’ll fix that up.

-dros

On Apr 24, 2014, at 5:03 PM, Anna Schumaker <[email protected]> wrote:

> On 04/24/2014 04:49 PM, Anna Schumaker wrote:
>> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>>> Add "page groups" - a circular list of nfs requests (struct nfs_page)
>>> that all reference the same page. This gives nfs read and write paths
>>> the ability to account for sub-page regions independently. This
>>> somewhat follows the design of struct buffer_head's sub-page
>>> accounting.
>>>
>>> Only "head" requests are ever added/removed from the inode list in
>>> the buffered write path. "head" and "sub" requests are treated the
>>> same through the read path and the rest of the write/commit path.
>>> Requests are given an extra reference across the life of the list.
>>>
>>> Page groups are never rejoined after being split. If the read/write
>>> request fails and the client falls back to another path (i.e. reverts
>>> to the MDS in the pNFS case), the already split requests are pushed through
>>> the recoalescing code again, which may split them further and then
>>> coalesce them into properly sized requests on the wire. Fragmentation
>>> shouldn't be a problem with the current design, because we flush all
>>> requests in a page group when a non-contiguous request is added, so
>>> the only time resplitting should occur is on a resend of a read or
>>> write.
>>>
>>> This patch lays the groundwork for sub-page splitting, but does not
>>> actually do any splitting. For now all page groups have one request,
>>> as pg_test functions don't yet split pages. There are several related
>>> patches that are needed to support multiple requests per page group.
>>>
>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>> ---
>>> fs/nfs/direct.c | 7 +-
>>> fs/nfs/pagelist.c | 224 ++++++++++++++++++++++++++++++++++++++++++++---
>>> fs/nfs/read.c | 4 +-
>>> fs/nfs/write.c | 13 ++-
>>> include/linux/nfs_page.h | 13 ++-
>>> 5 files changed, 240 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>> index 1dd8c62..2c0e08f 100644
>>> --- a/fs/nfs/direct.c
>>> +++ b/fs/nfs/direct.c
>>> @@ -380,7 +380,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
>>> struct nfs_page *req;
>>> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>>> /* XXX do we need to do the eof zeroing found in async_filler? */
>>> - req = nfs_create_request(dreq->ctx, pagevec[i],
>>> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
>>> pgbase, req_len);
>>> if (IS_ERR(req)) {
>>> result = PTR_ERR(req);
>>> @@ -749,7 +749,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
>>> struct nfs_page *req;
>>> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>>>
>>> - req = nfs_create_request(dreq->ctx, pagevec[i],
>>> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
>>> pgbase, req_len);
>>> if (IS_ERR(req)) {
>>> result = PTR_ERR(req);
>>> @@ -827,6 +827,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>> spin_unlock(&dreq->lock);
>>>
>>> while (!list_empty(&hdr->pages)) {
>>> + bool do_destroy = true;
>>> +
>>> req = nfs_list_entry(hdr->pages.next);
>>> nfs_list_remove_request(req);
>>> switch (bit) {
>>> @@ -834,6 +836,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>> case NFS_IOHDR_NEED_COMMIT:
>>> kref_get(&req->wb_kref);
>>> nfs_mark_request_commit(req, hdr->lseg, &cinfo);
>>> + do_destroy = false;
>>> }
>>> nfs_unlock_and_release_request(req);
>>> }
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index b120728..d57d675 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -26,6 +26,8 @@
>>>
>>> static struct kmem_cache *nfs_page_cachep;
>>>
>>> +static void nfs_free_request(struct nfs_page *);
>>> +
>>> bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
>>> {
>>> p->npages = pagecount;
>>> @@ -133,10 +135,151 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
>>> return __nfs_iocounter_wait(c);
>>> }
>>>
>>> +/*
>>> + * nfs_page_group_lock - lock the head of the page group
>>> + * @req - request in group that is to be locked
>>> + *
>>> + * this lock must be held if modifying the page group list
>>> + */
>>> +void
>>> +nfs_page_group_lock(struct nfs_page *req)
>>> +{
>>> + struct nfs_page *head = req->wb_head;
>>> + int err = -EAGAIN;
>>> +
>>> + WARN_ON_ONCE(head != head->wb_head);
>>> +
>>> + while (err)
>>> + err = wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
>>> + nfs_wait_bit_killable, TASK_KILLABLE);
>>> +}
>>> +
>>> +/*
>>> + * nfs_page_group_unlock - unlock the head of the page group
>>> + * @req - request in group that is to be unlocked
>>> + */
>>> +void
>>> +nfs_page_group_unlock(struct nfs_page *req)
>>> +{
>>> + struct nfs_page *head = req->wb_head;
>>> +
>>> + WARN_ON_ONCE(head != head->wb_head);
>>> +
>>> + smp_mb__before_clear_bit();
>>> + clear_bit(PG_HEADLOCK, &head->wb_flags);
>>> + smp_mb__after_clear_bit();
>>> + wake_up_bit(&head->wb_flags, PG_HEADLOCK);
>>> +}
>>> +
>>> +/*
>>> + * nfs_page_group_sync_on_bit_locked
>>> + *
>>> + * must be called with page group lock held
>>> + */
>>> +static bool
>>> +nfs_page_group_sync_on_bit_locked(struct nfs_page *req, unsigned int bit)
>>> +{
>>> + struct nfs_page *head = req->wb_head;
>>> + struct nfs_page *tmp;
>>> +
>>> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_flags));
>>> + WARN_ON_ONCE(test_and_set_bit(bit, &req->wb_flags));
>>> +
>>> + tmp = req->wb_this_page;
>>> + while (tmp != req) {
>>> + if (!test_bit(bit, &tmp->wb_flags))
>>> + return false;
>>> + tmp = tmp->wb_this_page;
>>> + }
>>> +
>>> + /* true! reset all bits */
>>> + tmp = req;
>>> + do {
>>> + clear_bit(bit, &tmp->wb_flags);
>>> + tmp = tmp->wb_this_page;
>>> + } while (tmp != req);
>>> +
>>> + return true;
>>> +}
>>> +
>>> +/*
>>> + * nfs_page_group_sync_on_bit - set bit on current request, but only
>>> + * return true if the bit is set for all requests in page group
>>> + * @req - request in page group
>>> + * @bit - PG_* bit that is used to sync page group
>>> + */
>>> +bool nfs_page_group_sync_on_bit(struct nfs_page *req, unsigned int bit)
>>> +{
>>> + bool ret;
>>> +
>>> + nfs_page_group_lock(req);
>>> + ret = nfs_page_group_sync_on_bit_locked(req, bit);
>>> + nfs_page_group_unlock(req);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +/*
>>> + * nfs_page_group_init - Initialize the page group linkage for @req
>>> + * @req - a new nfs request
>>> + * @prev - the previous request in page group, or NULL if @req is the first
>>> + * or only request in the group (the head).
>>> + */
>>> +static inline void
>>> +nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
>>> +{
>>> + WARN_ON_ONCE(prev == req);
>>> +
>>> + if (!prev) {
>>> + req->wb_head = req;
>>> + req->wb_this_page = req;
>>> + } else {
>>> + WARN_ON_ONCE(prev->wb_this_page != prev->wb_head);
>>> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags));
>>> + req->wb_head = prev->wb_head;
>>> + req->wb_this_page = prev->wb_this_page;
>>> + prev->wb_this_page = req;
>>> +
>>> + /* grab extra ref if head request has extra ref from
>>> + * the write/commit path to handle handoff between write
>>> + * and commit lists */
>>> + if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags))
>>> + kref_get(&req->wb_kref);
>>> + }
>>> +}
>>> +
>>> +/*
>>> + * nfs_page_group_destroy - sync the destruction of page groups
>>> + * @req - request that no longer needs the page group
>>> + *
>>> + * releases the page group reference from each member once all
>>> + * members have called this function.
>>> + */
>>> +static void
>>> +nfs_page_group_destroy(struct kref *kref)
>>> +{
>>> + struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
>>> + struct nfs_page *tmp, *next;
>>> +
>>> + if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
>>> + return;
>>> +
>>> + tmp = req;
>>> + do {
>>> + next = tmp->wb_this_page;
>>> + /* unlink and free */
>>> + tmp->wb_this_page = tmp;
>>> + tmp->wb_head = tmp;
>>> + nfs_free_request(tmp);
>>> + tmp = next;
>>> + } while (tmp != req);
>>> +}
>>> +
>>> /**
>>> * nfs_create_request - Create an NFS read/write request.
>>> * @ctx: open context to use
>>> * @page: page to write
>>> + * @last: last nfs request created for this page group or NULL if head
>>> * @offset: starting offset within the page for the write
>>> * @count: number of bytes to read/write
>>> *
>>> @@ -146,7 +289,8 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
>>> */
>>> struct nfs_page *
>>> nfs_create_request(struct nfs_open_context *ctx, struct page *page,
>>> - unsigned int offset, unsigned int count)
>>> + struct nfs_page *last, unsigned int offset,
>>> + unsigned int count)
>>> {
>>> struct nfs_page *req;
>>> struct nfs_lock_context *l_ctx;
>>> @@ -178,6 +322,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
>>> req->wb_bytes = count;
>>> req->wb_context = get_nfs_open_context(ctx);
>>> kref_init(&req->wb_kref);
>>> + nfs_page_group_init(req, last);
>>> return req;
>>> }
>>>
>>> @@ -235,16 +380,22 @@ static void nfs_clear_request(struct nfs_page *req)
>>> }
>>> }
>>>
>>> -
>>> /**
>>> * nfs_release_request - Release the count on an NFS read/write request
>>> * @req: request to release
>>> *
>>> * Note: Should never be called with the spinlock held!
>>> */
>>> -static void nfs_free_request(struct kref *kref)
>>> +static void nfs_free_request(struct nfs_page *req)
>>> {
>>> - struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
>>> + WARN_ON_ONCE(req->wb_this_page != req);
>>> +
>>> + /* extra debug: make sure no sync bits are still set */
>>> + WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
>>> + WARN_ON_ONCE(test_bit(PG_UNLOCKPAGE, &req->wb_flags));
>>> + WARN_ON_ONCE(test_bit(PG_UPTODATE, &req->wb_flags));
>>> + WARN_ON_ONCE(test_bit(PG_WB_END, &req->wb_flags));
>> My compiler doesn't know about PG_UNLOCKPAGE, PG_UPTODATE or PG_WB_END. Did you forget to add something?
>>
>> Anna
>
> Ah, patches 7 and 8 introduce these flags. Can you either reorder the patches or add the flags in this one?
>
> Anna
>
>>> + WARN_ON_ONCE(test_bit(PG_REMOVE, &req->wb_flags));
>>>
>>> /* Release struct file and open context */
>>> nfs_clear_request(req);
>>> @@ -253,7 +404,7 @@ static void nfs_free_request(struct kref *kref)
>>>
>>> void nfs_release_request(struct nfs_page *req)
>>> {
>>> - kref_put(&req->wb_kref, nfs_free_request);
>>> + kref_put(&req->wb_kref, nfs_page_group_destroy);
>>> }
>>>
>>> static int nfs_wait_bit_uninterruptible(void *word)
>>> @@ -441,21 +592,66 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>> * @desc: destination io descriptor
>>> * @req: request
>>> *
>>> + * This may split a request into subrequests which are all part of the
>>> + * same page group.
>>> + *
>>> * Returns true if the request 'req' was successfully coalesced into the
>>> * existing list of pages 'desc'.
>>> */
>>> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>> struct nfs_page *req)
>>> {
>>> - while (!nfs_pageio_do_add_request(desc, req)) {
>>> - desc->pg_moreio = 1;
>>> - nfs_pageio_doio(desc);
>>> - if (desc->pg_error < 0)
>>> - return 0;
>>> - desc->pg_moreio = 0;
>>> - if (desc->pg_recoalesce)
>>> - return 0;
>>> - }
>>> + struct nfs_page *subreq;
>>> + unsigned int bytes_left = 0;
>>> + unsigned int offset, pgbase;
>>> +
>>> + nfs_page_group_lock(req);
>>> +
>>> + subreq = req;
>>> + bytes_left = subreq->wb_bytes;
>>> + offset = subreq->wb_offset;
>>> + pgbase = subreq->wb_pgbase;
>>> +
>>> + do {
>>> + if (!nfs_pageio_do_add_request(desc, subreq)) {
>>> + /* make sure pg_test call(s) did nothing */
>>> + WARN_ON_ONCE(subreq->wb_bytes != bytes_left);
>>> + WARN_ON_ONCE(subreq->wb_offset != offset);
>>> + WARN_ON_ONCE(subreq->wb_pgbase != pgbase);
>>> +
>>> + nfs_page_group_unlock(req);
>>> + desc->pg_moreio = 1;
>>> + nfs_pageio_doio(desc);
>>> + if (desc->pg_error < 0)
>>> + return 0;
>>> + desc->pg_moreio = 0;
>>> + if (desc->pg_recoalesce)
>>> + return 0;
>>> + /* retry add_request for this subreq */
>>> + nfs_page_group_lock(req);
>>> + continue;
>>> + }
>>> +
>>> + /* check for buggy pg_test call(s) */
>>> + WARN_ON_ONCE(subreq->wb_bytes + subreq->wb_pgbase > PAGE_SIZE);
>>> + WARN_ON_ONCE(subreq->wb_bytes > bytes_left);
>>> + WARN_ON_ONCE(subreq->wb_bytes == 0);
>>> +
>>> + bytes_left -= subreq->wb_bytes;
>>> + offset += subreq->wb_bytes;
>>> + pgbase += subreq->wb_bytes;
>>> +
>>> + if (bytes_left) {
>>> + subreq = nfs_create_request(req->wb_context,
>>> + req->wb_page,
>>> + subreq, pgbase, bytes_left);
>>> + nfs_lock_request(subreq);
>>> + subreq->wb_offset = offset;
>>> + subreq->wb_index = req->wb_index;
>>> + }
>>> + } while (bytes_left > 0);
>>> +
>>> + nfs_page_group_unlock(req);
>>> return 1;
>>> }
>>>
>>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>>> index a48b909..f814e8a 100644
>>> --- a/fs/nfs/read.c
>>> +++ b/fs/nfs/read.c
>>> @@ -85,7 +85,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>>> len = nfs_page_length(page);
>>> if (len == 0)
>>> return nfs_return_empty_page(page);
>>> - new = nfs_create_request(ctx, page, 0, len);
>>> + new = nfs_create_request(ctx, page, NULL, 0, len);
>>> if (IS_ERR(new)) {
>>> unlock_page(page);
>>> return PTR_ERR(new);
>>> @@ -311,7 +311,7 @@ readpage_async_filler(void *data, struct page *page)
>>> if (len == 0)
>>> return nfs_return_empty_page(page);
>>>
>>> - new = nfs_create_request(desc->ctx, page, 0, len);
>>> + new = nfs_create_request(desc->ctx, page, NULL, 0, len);
>>> if (IS_ERR(new))
>>> goto out_error;
>>>
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index 5f1c965..a65755a 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -367,6 +367,8 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
>>> {
>>> struct nfs_inode *nfsi = NFS_I(inode);
>>>
>>> + WARN_ON_ONCE(req->wb_this_page != req);
>>> +
>>> /* Lock the request! */
>>> nfs_lock_request(req);
>>>
>>> @@ -383,6 +385,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
>>> set_page_private(req->wb_page, (unsigned long)req);
>>> }
>>> nfsi->npages++;
>>> + set_bit(PG_INODE_REF, &req->wb_flags);
>>> kref_get(&req->wb_kref);
>>> spin_unlock(&inode->i_lock);
>>> }
>>> @@ -567,6 +570,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>>> {
>>> struct nfs_commit_info cinfo;
>>> unsigned long bytes = 0;
>>> + bool do_destroy;
>>>
>>> if (test_bit(NFS_IOHDR_REDO, &hdr->flags))
>>> goto out;
>>> @@ -596,6 +600,7 @@ remove_req:
>>> next:
>>> nfs_unlock_request(req);
>>> nfs_end_page_writeback(req->wb_page);
>>> + do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags);
>>> nfs_release_request(req);
>>> }
>>> out:
>>> @@ -700,6 +705,10 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
>>> if (req == NULL)
>>> goto out_unlock;
>>>
>>> + /* should be handled by nfs_flush_incompatible */
>>> + WARN_ON_ONCE(req->wb_head != req);
>>> + WARN_ON_ONCE(req->wb_this_page != req);
>>> +
>>> rqend = req->wb_offset + req->wb_bytes;
>>> /*
>>> * Tell the caller to flush out the request if
>>> @@ -761,7 +770,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
>>> req = nfs_try_to_update_request(inode, page, offset, bytes);
>>> if (req != NULL)
>>> goto out;
>>> - req = nfs_create_request(ctx, page, offset, bytes);
>>> + req = nfs_create_request(ctx, page, NULL, offset, bytes);
>>> if (IS_ERR(req))
>>> goto out;
>>> nfs_inode_add_request(inode, req);
>>> @@ -805,6 +814,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
>>> return 0;
>>> l_ctx = req->wb_lock_context;
>>> do_flush = req->wb_page != page || req->wb_context != ctx;
>>> + /* for now, flush if more than 1 request in page_group */
>>> + do_flush |= req->wb_this_page != req;
>>> if (l_ctx && ctx->dentry->d_inode->i_flock != NULL) {
>>> do_flush |= l_ctx->lockowner.l_owner != current->files
>>> || l_ctx->lockowner.l_pid != current->tgid;
>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>> index 40faddd..b99deb7 100644
>>> --- a/include/linux/nfs_page.h
>>> +++ b/include/linux/nfs_page.h
>>> @@ -26,6 +26,9 @@ enum {
>>> PG_MAPPED, /* page private set for buffered io */
>>> PG_CLEAN, /* write succeeded */
>>> PG_COMMIT_TO_DS, /* used by pnfs layouts */
>>> + PG_INODE_REF, /* extra ref held by inode (head req only) */
>>> + PG_HEADLOCK, /* page group lock of wb_head */
>>> + PG_TEARDOWN, /* page group sync for destroy */
>>> };
>>>
>>> struct nfs_inode;
>>> @@ -41,6 +44,8 @@ struct nfs_page {
>>> struct kref wb_kref; /* reference count */
>>> unsigned long wb_flags;
>>> struct nfs_write_verifier wb_verf; /* Commit cookie */
>>> + struct nfs_page *wb_this_page; /* list of reqs for this page */
>>> + struct nfs_page *wb_head; /* head pointer for req list */
>>> };
>>>
>>> struct nfs_pageio_descriptor;
>>> @@ -87,9 +92,10 @@ struct nfs_pageio_descriptor {
>>>
>>> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
>>> struct page *page,
>>> + struct nfs_page *last,
>>> unsigned int offset,
>>> unsigned int count);
>>> -extern void nfs_release_request(struct nfs_page *req);
>>> +extern void nfs_release_request(struct nfs_page *);
>>>
>>>
>>> extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>> @@ -108,7 +114,10 @@ extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>>> struct nfs_page *req);
>>> extern int nfs_wait_on_request(struct nfs_page *);
>>> extern void nfs_unlock_request(struct nfs_page *req);
>>> -extern void nfs_unlock_and_release_request(struct nfs_page *req);
>>> +extern void nfs_unlock_and_release_request(struct nfs_page *);
>>> +extern void nfs_page_group_lock(struct nfs_page *);
>>> +extern void nfs_page_group_unlock(struct nfs_page *);
>>> +extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);
>>>
>>> /*
>>> * Lock the page of an asynchronous request
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html


2014-04-24 21:03:44

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 06/18 v2] nfs: add support for multiple nfs reqs per page

On 04/24/2014 04:49 PM, Anna Schumaker wrote:
> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>> Add "page groups" - a circular list of nfs requests (struct nfs_page)
>> that all reference the same page. This gives nfs read and write paths
>> the ability to account for sub-page regions independently. This
>> somewhat follows the design of struct buffer_head's sub-page
>> accounting.
>>
>> Only "head" requests are ever added/removed from the inode list in
>> the buffered write path. "head" and "sub" requests are treated the
>> same through the read path and the rest of the write/commit path.
>> Requests are given an extra reference across the life of the list.
>>
>> Page groups are never rejoined after being split. If the read/write
>> request fails and the client falls back to another path (ie revert
>> to MDS in PNFS case), the already split requests are pushed through
>> the recoalescing code again, which may split them further and then
>> coalesce them into properly sized requests on the wire. Fragmentation
>> shouldn't be a problem with the current design, because we flush all
>> requests in page group when a non-contiguous request is added, so
>> the only time resplitting should occur is on a resend of a read or
>> write.
>>
>> This patch lays the groundwork for sub-page splitting, but does not
>> actually do any splitting. For now all page groups have one request
>> as pg_test functions don't yet split pages. There are several related
>> patches that are needed to support multiple requests per page group.
>>
>> Signed-off-by: Weston Andros Adamson <[email protected]>
>> ---
>> fs/nfs/direct.c | 7 +-
>> fs/nfs/pagelist.c | 224 ++++++++++++++++++++++++++++++++++++++++++++---
>> fs/nfs/read.c | 4 +-
>> fs/nfs/write.c | 13 ++-
>> include/linux/nfs_page.h | 13 ++-
>> 5 files changed, 240 insertions(+), 21 deletions(-)
>>
>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>> index 1dd8c62..2c0e08f 100644
>> --- a/fs/nfs/direct.c
>> +++ b/fs/nfs/direct.c
>> @@ -380,7 +380,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
>> struct nfs_page *req;
>> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>> /* XXX do we need to do the eof zeroing found in async_filler? */
>> - req = nfs_create_request(dreq->ctx, pagevec[i],
>> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
>> pgbase, req_len);
>> if (IS_ERR(req)) {
>> result = PTR_ERR(req);
>> @@ -749,7 +749,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
>> struct nfs_page *req;
>> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>>
>> - req = nfs_create_request(dreq->ctx, pagevec[i],
>> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
>> pgbase, req_len);
>> if (IS_ERR(req)) {
>> result = PTR_ERR(req);
>> @@ -827,6 +827,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>> spin_unlock(&dreq->lock);
>>
>> while (!list_empty(&hdr->pages)) {
>> + bool do_destroy = true;
>> +
>> req = nfs_list_entry(hdr->pages.next);
>> nfs_list_remove_request(req);
>> switch (bit) {
>> @@ -834,6 +836,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>> case NFS_IOHDR_NEED_COMMIT:
>> kref_get(&req->wb_kref);
>> nfs_mark_request_commit(req, hdr->lseg, &cinfo);
>> + do_destroy = false;
>> }
>> nfs_unlock_and_release_request(req);
>> }
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index b120728..d57d675 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -26,6 +26,8 @@
>>
>> static struct kmem_cache *nfs_page_cachep;
>>
>> +static void nfs_free_request(struct nfs_page *);
>> +
>> bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
>> {
>> p->npages = pagecount;
>> @@ -133,10 +135,151 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
>> return __nfs_iocounter_wait(c);
>> }
>>
>> +/*
>> + * nfs_page_group_lock - lock the head of the page group
>> + * @req - request in group that is to be locked
>> + *
>> + * this lock must be held if modifying the page group list
>> + */
>> +void
>> +nfs_page_group_lock(struct nfs_page *req)
>> +{
>> + struct nfs_page *head = req->wb_head;
>> + int err = -EAGAIN;
>> +
>> + WARN_ON_ONCE(head != head->wb_head);
>> +
>> + while (err)
>> + err = wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
>> + nfs_wait_bit_killable, TASK_KILLABLE);
>> +}
>> +
>> +/*
>> + * nfs_page_group_unlock - unlock the head of the page group
>> + * @req - request in group that is to be unlocked
>> + */
>> +void
>> +nfs_page_group_unlock(struct nfs_page *req)
>> +{
>> + struct nfs_page *head = req->wb_head;
>> +
>> + WARN_ON_ONCE(head != head->wb_head);
>> +
>> + smp_mb__before_clear_bit();
>> + clear_bit(PG_HEADLOCK, &head->wb_flags);
>> + smp_mb__after_clear_bit();
>> + wake_up_bit(&head->wb_flags, PG_HEADLOCK);
>> +}
>> +
>> +/*
>> + * nfs_page_group_sync_on_bit_locked
>> + *
>> + * must be called with page group lock held
>> + */
>> +static bool
>> +nfs_page_group_sync_on_bit_locked(struct nfs_page *req, unsigned int bit)
>> +{
>> + struct nfs_page *head = req->wb_head;
>> + struct nfs_page *tmp;
>> +
>> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_flags));
>> + WARN_ON_ONCE(test_and_set_bit(bit, &req->wb_flags));
>> +
>> + tmp = req->wb_this_page;
>> + while (tmp != req) {
>> + if (!test_bit(bit, &tmp->wb_flags))
>> + return false;
>> + tmp = tmp->wb_this_page;
>> + }
>> +
>> + /* true! reset all bits */
>> + tmp = req;
>> + do {
>> + clear_bit(bit, &tmp->wb_flags);
>> + tmp = tmp->wb_this_page;
>> + } while (tmp != req);
>> +
>> + return true;
>> +}
>> +
>> +/*
>> + * nfs_page_group_sync_on_bit - set bit on current request, but only
>> + * return true if the bit is set for all requests in page group
>> + * @req - request in page group
>> + * @bit - PG_* bit that is used to sync page group
>> + */
>> +bool nfs_page_group_sync_on_bit(struct nfs_page *req, unsigned int bit)
>> +{
>> + bool ret;
>> +
>> + nfs_page_group_lock(req);
>> + ret = nfs_page_group_sync_on_bit_locked(req, bit);
>> + nfs_page_group_unlock(req);
>> +
>> + return ret;
>> +}
>> +
>> +/*
>> + * nfs_page_group_init - Initialize the page group linkage for @req
>> + * @req - a new nfs request
>> + * @prev - the previous request in page group, or NULL if @req is the first
>> + * or only request in the group (the head).
>> + */
>> +static inline void
>> +nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
>> +{
>> + WARN_ON_ONCE(prev == req);
>> +
>> + if (!prev) {
>> + req->wb_head = req;
>> + req->wb_this_page = req;
>> + } else {
>> + WARN_ON_ONCE(prev->wb_this_page != prev->wb_head);
>> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags));
>> + req->wb_head = prev->wb_head;
>> + req->wb_this_page = prev->wb_this_page;
>> + prev->wb_this_page = req;
>> +
>> + /* grab extra ref if head request has extra ref from
>> + * the write/commit path to handle handoff between write
>> + * and commit lists */
>> + if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags))
>> + kref_get(&req->wb_kref);
>> + }
>> +}
>> +
>> +/*
>> + * nfs_page_group_destroy - sync the destruction of page groups
>> + * @req - request that no longer needs the page group
>> + *
>> + * releases the page group reference from each member once all
>> + * members have called this function.
>> + */
>> +static void
>> +nfs_page_group_destroy(struct kref *kref)
>> +{
>> + struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
>> + struct nfs_page *tmp, *next;
>> +
>> + if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
>> + return;
>> +
>> + tmp = req;
>> + do {
>> + next = tmp->wb_this_page;
>> + /* unlink and free */
>> + tmp->wb_this_page = tmp;
>> + tmp->wb_head = tmp;
>> + nfs_free_request(tmp);
>> + tmp = next;
>> + } while (tmp != req);
>> +}
>> +
>> /**
>> * nfs_create_request - Create an NFS read/write request.
>> * @ctx: open context to use
>> * @page: page to write
>> + * @last: last nfs request created for this page group or NULL if head
>> * @offset: starting offset within the page for the write
>> * @count: number of bytes to read/write
>> *
>> @@ -146,7 +289,8 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
>> */
>> struct nfs_page *
>> nfs_create_request(struct nfs_open_context *ctx, struct page *page,
>> - unsigned int offset, unsigned int count)
>> + struct nfs_page *last, unsigned int offset,
>> + unsigned int count)
>> {
>> struct nfs_page *req;
>> struct nfs_lock_context *l_ctx;
>> @@ -178,6 +322,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
>> req->wb_bytes = count;
>> req->wb_context = get_nfs_open_context(ctx);
>> kref_init(&req->wb_kref);
>> + nfs_page_group_init(req, last);
>> return req;
>> }
>>
>> @@ -235,16 +380,22 @@ static void nfs_clear_request(struct nfs_page *req)
>> }
>> }
>>
>> -
>> /**
>> * nfs_release_request - Release the count on an NFS read/write request
>> * @req: request to release
>> *
>> * Note: Should never be called with the spinlock held!
>> */
>> -static void nfs_free_request(struct kref *kref)
>> +static void nfs_free_request(struct nfs_page *req)
>> {
>> - struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
>> + WARN_ON_ONCE(req->wb_this_page != req);
>> +
>> + /* extra debug: make sure no sync bits are still set */
>> + WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
>> + WARN_ON_ONCE(test_bit(PG_UNLOCKPAGE, &req->wb_flags));
>> + WARN_ON_ONCE(test_bit(PG_UPTODATE, &req->wb_flags));
>> + WARN_ON_ONCE(test_bit(PG_WB_END, &req->wb_flags));
> My compiler doesn't know about PG_UNLOCKPAGE, PG_UPTODATE or PG_WB_END. Did you forget to add something?
>
> Anna

Ah, patches 7 and 8 introduce these flags. Can you either reorder the patches or add the flags in this one?

Anna

>> + WARN_ON_ONCE(test_bit(PG_REMOVE, &req->wb_flags));
>>
>> /* Release struct file and open context */
>> nfs_clear_request(req);
>> @@ -253,7 +404,7 @@ static void nfs_free_request(struct kref *kref)
>>
>> void nfs_release_request(struct nfs_page *req)
>> {
>> - kref_put(&req->wb_kref, nfs_free_request);
>> + kref_put(&req->wb_kref, nfs_page_group_destroy);
>> }
>>
>> static int nfs_wait_bit_uninterruptible(void *word)
>> @@ -441,21 +592,66 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>> * @desc: destination io descriptor
>> * @req: request
>> *
>> + * This may split a request into subrequests which are all part of the
>> + * same page group.
>> + *
>> * Returns true if the request 'req' was successfully coalesced into the
>> * existing list of pages 'desc'.
>> */
>> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>> struct nfs_page *req)
>> {
>> - while (!nfs_pageio_do_add_request(desc, req)) {
>> - desc->pg_moreio = 1;
>> - nfs_pageio_doio(desc);
>> - if (desc->pg_error < 0)
>> - return 0;
>> - desc->pg_moreio = 0;
>> - if (desc->pg_recoalesce)
>> - return 0;
>> - }
>> + struct nfs_page *subreq;
>> + unsigned int bytes_left = 0;
>> + unsigned int offset, pgbase;
>> +
>> + nfs_page_group_lock(req);
>> +
>> + subreq = req;
>> + bytes_left = subreq->wb_bytes;
>> + offset = subreq->wb_offset;
>> + pgbase = subreq->wb_pgbase;
>> +
>> + do {
>> + if (!nfs_pageio_do_add_request(desc, subreq)) {
>> + /* make sure pg_test call(s) did nothing */
>> + WARN_ON_ONCE(subreq->wb_bytes != bytes_left);
>> + WARN_ON_ONCE(subreq->wb_offset != offset);
>> + WARN_ON_ONCE(subreq->wb_pgbase != pgbase);
>> +
>> + nfs_page_group_unlock(req);
>> + desc->pg_moreio = 1;
>> + nfs_pageio_doio(desc);
>> + if (desc->pg_error < 0)
>> + return 0;
>> + desc->pg_moreio = 0;
>> + if (desc->pg_recoalesce)
>> + return 0;
>> + /* retry add_request for this subreq */
>> + nfs_page_group_lock(req);
>> + continue;
>> + }
>> +
>> + /* check for buggy pg_test call(s) */
>> + WARN_ON_ONCE(subreq->wb_bytes + subreq->wb_pgbase > PAGE_SIZE);
>> + WARN_ON_ONCE(subreq->wb_bytes > bytes_left);
>> + WARN_ON_ONCE(subreq->wb_bytes == 0);
>> +
>> + bytes_left -= subreq->wb_bytes;
>> + offset += subreq->wb_bytes;
>> + pgbase += subreq->wb_bytes;
>> +
>> + if (bytes_left) {
>> + subreq = nfs_create_request(req->wb_context,
>> + req->wb_page,
>> + subreq, pgbase, bytes_left);
>> + nfs_lock_request(subreq);
>> + subreq->wb_offset = offset;
>> + subreq->wb_index = req->wb_index;
>> + }
>> + } while (bytes_left > 0);
>> +
>> + nfs_page_group_unlock(req);
>> return 1;
>> }
>>
>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> index a48b909..f814e8a 100644
>> --- a/fs/nfs/read.c
>> +++ b/fs/nfs/read.c
>> @@ -85,7 +85,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>> len = nfs_page_length(page);
>> if (len == 0)
>> return nfs_return_empty_page(page);
>> - new = nfs_create_request(ctx, page, 0, len);
>> + new = nfs_create_request(ctx, page, NULL, 0, len);
>> if (IS_ERR(new)) {
>> unlock_page(page);
>> return PTR_ERR(new);
>> @@ -311,7 +311,7 @@ readpage_async_filler(void *data, struct page *page)
>> if (len == 0)
>> return nfs_return_empty_page(page);
>>
>> - new = nfs_create_request(desc->ctx, page, 0, len);
>> + new = nfs_create_request(desc->ctx, page, NULL, 0, len);
>> if (IS_ERR(new))
>> goto out_error;
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index 5f1c965..a65755a 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -367,6 +367,8 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
>> {
>> struct nfs_inode *nfsi = NFS_I(inode);
>>
>> + WARN_ON_ONCE(req->wb_this_page != req);
>> +
>> /* Lock the request! */
>> nfs_lock_request(req);
>>
>> @@ -383,6 +385,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
>> set_page_private(req->wb_page, (unsigned long)req);
>> }
>> nfsi->npages++;
>> + set_bit(PG_INODE_REF, &req->wb_flags);
>> kref_get(&req->wb_kref);
>> spin_unlock(&inode->i_lock);
>> }
>> @@ -567,6 +570,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>> {
>> struct nfs_commit_info cinfo;
>> unsigned long bytes = 0;
>> + bool do_destroy;
>>
>> if (test_bit(NFS_IOHDR_REDO, &hdr->flags))
>> goto out;
>> @@ -596,6 +600,7 @@ remove_req:
>> next:
>> nfs_unlock_request(req);
>> nfs_end_page_writeback(req->wb_page);
>> + do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags);
>> nfs_release_request(req);
>> }
>> out:
>> @@ -700,6 +705,10 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
>> if (req == NULL)
>> goto out_unlock;
>>
>> + /* should be handled by nfs_flush_incompatible */
>> + WARN_ON_ONCE(req->wb_head != req);
>> + WARN_ON_ONCE(req->wb_this_page != req);
>> +
>> rqend = req->wb_offset + req->wb_bytes;
>> /*
>> * Tell the caller to flush out the request if
>> @@ -761,7 +770,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
>> req = nfs_try_to_update_request(inode, page, offset, bytes);
>> if (req != NULL)
>> goto out;
>> - req = nfs_create_request(ctx, page, offset, bytes);
>> + req = nfs_create_request(ctx, page, NULL, offset, bytes);
>> if (IS_ERR(req))
>> goto out;
>> nfs_inode_add_request(inode, req);
>> @@ -805,6 +814,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
>> return 0;
>> l_ctx = req->wb_lock_context;
>> do_flush = req->wb_page != page || req->wb_context != ctx;
>> + /* for now, flush if more than 1 request in page_group */
>> + do_flush |= req->wb_this_page != req;
>> if (l_ctx && ctx->dentry->d_inode->i_flock != NULL) {
>> do_flush |= l_ctx->lockowner.l_owner != current->files
>> || l_ctx->lockowner.l_pid != current->tgid;
>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>> index 40faddd..b99deb7 100644
>> --- a/include/linux/nfs_page.h
>> +++ b/include/linux/nfs_page.h
>> @@ -26,6 +26,9 @@ enum {
>> PG_MAPPED, /* page private set for buffered io */
>> PG_CLEAN, /* write succeeded */
>> PG_COMMIT_TO_DS, /* used by pnfs layouts */
>> + PG_INODE_REF, /* extra ref held by inode (head req only) */
>> + PG_HEADLOCK, /* page group lock of wb_head */
>> + PG_TEARDOWN, /* page group sync for destroy */
>> };
>>
>> struct nfs_inode;
>> @@ -41,6 +44,8 @@ struct nfs_page {
>> struct kref wb_kref; /* reference count */
>> unsigned long wb_flags;
>> struct nfs_write_verifier wb_verf; /* Commit cookie */
>> + struct nfs_page *wb_this_page; /* list of reqs for this page */
>> + struct nfs_page *wb_head; /* head pointer for req list */
>> };
>>
>> struct nfs_pageio_descriptor;
>> @@ -87,9 +92,10 @@ struct nfs_pageio_descriptor {
>>
>> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
>> struct page *page,
>> + struct nfs_page *last,
>> unsigned int offset,
>> unsigned int count);
>> -extern void nfs_release_request(struct nfs_page *req);
>> +extern void nfs_release_request(struct nfs_page *);
>>
>>
>> extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>> @@ -108,7 +114,10 @@ extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>> struct nfs_page *req);
>> extern int nfs_wait_on_request(struct nfs_page *);
>> extern void nfs_unlock_request(struct nfs_page *req);
>> -extern void nfs_unlock_and_release_request(struct nfs_page *req);
>> +extern void nfs_unlock_and_release_request(struct nfs_page *);
>> +extern void nfs_page_group_lock(struct nfs_page *);
>> +extern void nfs_page_group_unlock(struct nfs_page *);
>> +extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);
>>
>> /*
>> * Lock the page of an asynchronous request
> --


2014-04-24 18:16:00

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

Support direct requests that span multiple pnfs data servers by
comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
Continue to use dreq->verf if the MDS is used / non-pNFS.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
fs/nfs/nfs4filelayout.c | 6 +++
include/linux/nfs.h | 5 ++-
include/linux/nfs_xdr.h | 2 +
4 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 2c0e08f..9349933 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
return atomic_dec_and_test(&dreq->io_count);
}

+/*
+ * nfs_direct_select_verf - select the right verifier
+ * @dreq - direct request possibly spanning multiple servers
+ * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
+ * @ds_idx - index of data server in data server list, only valid if ds_clp set
+ *
+ * returns the correct verifier to use given the role of the server
+ */
+static struct nfs_writeverf *
+nfs_direct_select_verf(struct nfs_direct_req *dreq,
+ struct nfs_client *ds_clp,
+ int ds_idx)
+{
+ struct nfs_writeverf *verfp = &dreq->verf;
+
+ if (ds_clp) {
+ /* pNFS is in use, use the DS verf */
+ if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
+ verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
+ else
+ WARN_ON_ONCE(1);
+ }
+ return verfp;
+}
+
+
+/*
+ * nfs_direct_set_hdr_verf - set the write/commit verifier
+ * @dreq - direct request possibly spanning multiple servers
+ * @hdr - pageio header to validate against previously seen verfs
+ *
+ * Set the server's (MDS or DS) "seen" verifier
+ */
+static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
+ struct nfs_pgio_header *hdr)
+{
+ struct nfs_writeverf *verfp;
+
+ verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
+ hdr->data->ds_idx);
+ WARN_ON_ONCE(verfp->committed >= 0);
+ memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
+ WARN_ON_ONCE(verfp->committed < 0);
+}
+
+/*
+ * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
+ * @dreq - direct request possibly spanning multiple servers
+ * @hdr - pageio header to validate against previously seen verf
+ *
+ * set the server's "seen" verf if not initialized.
+ * returns result of comparison between @hdr->verf and the "seen"
+ * verf of the server used by @hdr (DS or MDS)
+ */
+static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
+ struct nfs_pgio_header *hdr)
+{
+ struct nfs_writeverf *verfp;
+
+ verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
+ hdr->data->ds_idx);
+ if (verfp->committed < 0) {
+ nfs_direct_set_hdr_verf(dreq, hdr);
+ return 0;
+ }
+ return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
+}
+
+/*
+ * nfs_direct_cmp_commit_data_verf - compare verifier for commit data
+ * @dreq - direct request possibly spanning multiple servers
+ * @data - commit data to validate against previously seen verf
+ *
+ * returns result of comparison between @data->verf and the verf of
+ * the server used by @data (DS or MDS)
+ */
+static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
+ struct nfs_commit_data *data)
+{
+ struct nfs_writeverf *verfp;
+
+ verfp = nfs_direct_select_verf(dreq, data->ds_clp,
+ data->ds_commit_index);
+ WARN_ON_ONCE(verfp->committed < 0);
+ return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
+}
+
/**
* nfs_direct_IO - NFS address space operation for direct I/O
* @rw: direction (read or write)
@@ -168,6 +255,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
kref_get(&dreq->kref);
init_completion(&dreq->completion);
INIT_LIST_HEAD(&dreq->mds_cinfo.list);
+ dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
spin_lock_init(&dreq->lock);

@@ -602,7 +690,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
dprintk("NFS: %5u commit failed with error %d.\n",
data->task.tk_pid, status);
dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
- } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
+ } else if (nfs_direct_cmp_commit_data_verf(dreq, data)) {
dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
}
@@ -811,13 +899,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
bit = NFS_IOHDR_NEED_RESCHED;
else if (dreq->flags == 0) {
- memcpy(&dreq->verf, &hdr->verf,
- sizeof(dreq->verf));
+ nfs_direct_set_hdr_verf(dreq, hdr);
bit = NFS_IOHDR_NEED_COMMIT;
dreq->flags = NFS_ODIRECT_DO_COMMIT;
} else if (dreq->flags == NFS_ODIRECT_DO_COMMIT) {
- if (memcmp(&dreq->verf, &hdr->verf, sizeof(dreq->verf))) {
- dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
+ if (nfs_direct_set_or_cmp_hdr_verf(dreq, hdr)) {
+ dreq->flags =
+ NFS_ODIRECT_RESCHED_WRITES;
bit = NFS_IOHDR_NEED_RESCHED;
} else
bit = NFS_IOHDR_NEED_COMMIT;
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 7a665e0..0ebc521 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -560,6 +560,7 @@ filelayout_read_pagelist(struct nfs_pgio_data *data)
/* No multipath support. Use first DS */
atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
+ data->ds_idx = idx;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
data->args.fh = fh;
@@ -603,6 +604,7 @@ filelayout_write_pagelist(struct nfs_pgio_data *data, int sync)
data->pgio_done_cb = filelayout_write_done_cb;
atomic_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
+ data->ds_idx = idx;
fh = nfs4_fl_select_ds_fh(lseg, j);
if (fh)
data->args.fh = fh;
@@ -875,6 +877,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
for (i = 0; i < size; i++) {
INIT_LIST_HEAD(&buckets[i].written);
INIT_LIST_HEAD(&buckets[i].committing);
+ /* mark direct verifier as unset */
+ buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
}

spin_lock(cinfo->lock);
@@ -885,6 +889,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
&buckets[i].written);
list_splice(&cinfo->ds->buckets[i].committing,
&buckets[i].committing);
+ buckets[i].direct_verf.committed =
+ cinfo->ds->buckets[i].direct_verf.committed;
buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
buckets[i].clseg = cinfo->ds->buckets[i].clseg;
}
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 3e794c1..610af51 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -46,6 +46,9 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
enum nfs3_stable_how {
NFS_UNSTABLE = 0,
NFS_DATA_SYNC = 1,
- NFS_FILE_SYNC = 2
+ NFS_FILE_SYNC = 2,
+
+ /* used by direct.c to mark verf as invalid */
+ NFS_INVALID_STABLE_HOW = -1
};
#endif /* _LINUX_NFS_H */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 29828c7..bb9fb88 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1111,6 +1111,7 @@ struct pnfs_commit_bucket {
struct list_head committing;
struct pnfs_layout_segment *wlseg;
struct pnfs_layout_segment *clseg;
+ struct nfs_writeverf direct_verf;
};

struct pnfs_ds_commit_info {
@@ -1294,6 +1295,7 @@ struct nfs_pgio_data {
__u64 mds_offset; /* Filelayout dense stripe */
struct nfs_page_array pages;
struct nfs_client *ds_clp; /* pNFS data server */
+ int ds_idx; /* ds index if ds_clp is set */
};

struct nfs_rw_header {
--
1.8.5.2 (Apple Git-48)


2014-04-25 13:32:44

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 06/18 v2] nfs: add support for multiple nfs reqs per page

On Apr 24, 2014, at 7:06 PM, Weston Andros Adamson <[email protected]> wrote:

> Ah my bad, I'll fix that up.
>
> -dros
>
> On Apr 24, 2014, at 5:03 PM, Anna Schumaker <[email protected]> wrote:
>
>> On 04/24/2014 04:49 PM, Anna Schumaker wrote:
>>> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>>>> Add "page groups" - a circular list of nfs requests (struct nfs_page)
>>>> that all reference the same page. This gives nfs read and write paths
>>>> the ability to account for sub-page regions independently. This
>>>> somewhat follows the design of struct buffer_head's sub-page
>>>> accounting.
>>>>
>>>> Only "head" requests are ever added/removed from the inode list in
>>>> the buffered write path. "head" and "sub" requests are treated the
>>>> same through the read path and the rest of the write/commit path.
>>>> Requests are given an extra reference across the life of the list.
>>>>
>>>> Page groups are never rejoined after being split. If the read/write
>>>> request fails and the client falls back to another path (ie revert
>>>> to MDS in PNFS case), the already split requests are pushed through
>>>> the recoalescing code again, which may split them further and then
>>>> coalesce them into properly sized requests on the wire. Fragmentation
>>>> shouldn't be a problem with the current design, because we flush all
>>>> requests in page group when a non-contiguous request is added, so
>>>> the only time resplitting should occur is on a resend of a read or
>>>> write.
>>>>
>>>> This patch lays the groundwork for sub-page splitting, but does not
>>>> actually do any splitting. For now all page groups have one request
>>>> as pg_test functions don't yet split pages. There are several related
>>>> patches that are needed to support multiple requests per page group.
>>>>
>>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>>> ---
>>>> fs/nfs/direct.c | 7 +-
>>>> fs/nfs/pagelist.c | 224 ++++++++++++++++++++++++++++++++++++++++++++---
>>>> fs/nfs/read.c | 4 +-
>>>> fs/nfs/write.c | 13 ++-
>>>> include/linux/nfs_page.h | 13 ++-
>>>> 5 files changed, 240 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>>> index 1dd8c62..2c0e08f 100644
>>>> --- a/fs/nfs/direct.c
>>>> +++ b/fs/nfs/direct.c
>>>> @@ -380,7 +380,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
>>>> struct nfs_page *req;
>>>> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>>>> /* XXX do we need to do the eof zeroing found in async_filler? */
>>>> - req = nfs_create_request(dreq->ctx, pagevec[i],
>>>> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
>>>> pgbase, req_len);
>>>> if (IS_ERR(req)) {
>>>> result = PTR_ERR(req);
>>>> @@ -749,7 +749,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
>>>> struct nfs_page *req;
>>>> unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
>>>>
>>>> - req = nfs_create_request(dreq->ctx, pagevec[i],
>>>> + req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
>>>> pgbase, req_len);
>>>> if (IS_ERR(req)) {
>>>> result = PTR_ERR(req);
>>>> @@ -827,6 +827,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>>> spin_unlock(&dreq->lock);
>>>>
>>>> while (!list_empty(&hdr->pages)) {
>>>> + bool do_destroy = true;
>>>> +
>>>> req = nfs_list_entry(hdr->pages.next);
>>>> nfs_list_remove_request(req);
>>>> switch (bit) {
>>>> @@ -834,6 +836,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>>> case NFS_IOHDR_NEED_COMMIT:
>>>> kref_get(&req->wb_kref);
>>>> nfs_mark_request_commit(req, hdr->lseg, &cinfo);
>>>> + do_destroy = false;
>>>> }
>>>> nfs_unlock_and_release_request(req);
>>>> }
>>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>>> index b120728..d57d675 100644
>>>> --- a/fs/nfs/pagelist.c
>>>> +++ b/fs/nfs/pagelist.c
>>>> @@ -26,6 +26,8 @@
>>>>
>>>> static struct kmem_cache *nfs_page_cachep;
>>>>
>>>> +static void nfs_free_request(struct nfs_page *);
>>>> +
>>>> bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
>>>> {
>>>> p->npages = pagecount;
>>>> @@ -133,10 +135,151 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
>>>> return __nfs_iocounter_wait(c);
>>>> }
>>>>
>>>> +/*
>>>> + * nfs_page_group_lock - lock the head of the page group
>>>> + * @req - request in group that is to be locked
>>>> + *
>>>> + * this lock must be held if modifying the page group list
>>>> + */
>>>> +void
>>>> +nfs_page_group_lock(struct nfs_page *req)
>>>> +{
>>>> + struct nfs_page *head = req->wb_head;
>>>> + int err = -EAGAIN;
>>>> +
>>>> + WARN_ON_ONCE(head != head->wb_head);
>>>> +
>>>> + while (err)
>>>> + err = wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
>>>> + nfs_wait_bit_killable, TASK_KILLABLE);
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_page_group_unlock - unlock the head of the page group
>>>> + * @req - request in group that is to be unlocked
>>>> + */
>>>> +void
>>>> +nfs_page_group_unlock(struct nfs_page *req)
>>>> +{
>>>> + struct nfs_page *head = req->wb_head;
>>>> +
>>>> + WARN_ON_ONCE(head != head->wb_head);
>>>> +
>>>> + smp_mb__before_clear_bit();
>>>> + clear_bit(PG_HEADLOCK, &head->wb_flags);
>>>> + smp_mb__after_clear_bit();
>>>> + wake_up_bit(&head->wb_flags, PG_HEADLOCK);
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_page_group_sync_on_bit_locked
>>>> + *
>>>> + * must be called with page group lock held
>>>> + */
>>>> +static bool
>>>> +nfs_page_group_sync_on_bit_locked(struct nfs_page *req, unsigned int bit)
>>>> +{
>>>> + struct nfs_page *head = req->wb_head;
>>>> + struct nfs_page *tmp;
>>>> +
>>>> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_flags));
>>>> + WARN_ON_ONCE(test_and_set_bit(bit, &req->wb_flags));
>>>> +
>>>> + tmp = req->wb_this_page;
>>>> + while (tmp != req) {
>>>> + if (!test_bit(bit, &tmp->wb_flags))
>>>> + return false;
>>>> + tmp = tmp->wb_this_page;
>>>> + }
>>>> +
>>>> + /* true! reset all bits */
>>>> + tmp = req;
>>>> + do {
>>>> + clear_bit(bit, &tmp->wb_flags);
>>>> + tmp = tmp->wb_this_page;
>>>> + } while (tmp != req);
>>>> +
>>>> + return true;
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_page_group_sync_on_bit - set bit on current request, but only
>>>> + * return true if the bit is set for all requests in page group
>>>> + * @req - request in page group
>>>> + * @bit - PG_* bit that is used to sync page group
>>>> + */
>>>> +bool nfs_page_group_sync_on_bit(struct nfs_page *req, unsigned int bit)
>>>> +{
>>>> + bool ret;
>>>> +
>>>> + nfs_page_group_lock(req);
>>>> + ret = nfs_page_group_sync_on_bit_locked(req, bit);
>>>> + nfs_page_group_unlock(req);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_page_group_init - Initialize the page group linkage for @req
>>>> + * @req - a new nfs request
>>>> + * @prev - the previous request in page group, or NULL if @req is the first
>>>> + * or only request in the group (the head).
>>>> + */
>>>> +static inline void
>>>> +nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
>>>> +{
>>>> + WARN_ON_ONCE(prev == req);
>>>> +
>>>> + if (!prev) {
>>>> + req->wb_head = req;
>>>> + req->wb_this_page = req;
>>>> + } else {
>>>> + WARN_ON_ONCE(prev->wb_this_page != prev->wb_head);
>>>> + WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags));
>>>> + req->wb_head = prev->wb_head;
>>>> + req->wb_this_page = prev->wb_this_page;
>>>> + prev->wb_this_page = req;
>>>> +
>>>> + /* grab extra ref if head request has extra ref from
>>>> + * the write/commit path to handle handoff between write
>>>> + * and commit lists */
>>>> + if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags))
>>>> + kref_get(&req->wb_kref);
>>>> + }
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_page_group_destroy - sync the destruction of page groups
>>>> + * @req - request that no longer needs the page group
>>>> + *
>>>> + * releases the page group reference from each member once all
>>>> + * members have called this function.
>>>> + */
>>>> +static void
>>>> +nfs_page_group_destroy(struct kref *kref)
>>>> +{
>>>> + struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
>>>> + struct nfs_page *tmp, *next;
>>>> +
>>>> + if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
>>>> + return;
>>>> +
>>>> + tmp = req;
>>>> + do {
>>>> + next = tmp->wb_this_page;
>>>> + /* unlink and free */
>>>> + tmp->wb_this_page = tmp;
>>>> + tmp->wb_head = tmp;
>>>> + nfs_free_request(tmp);
>>>> + tmp = next;
>>>> + } while (tmp != req);
>>>> +}
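[Editor's note: the page-group primitives quoted above (nfs_page_group_init, nfs_page_group_sync_on_bit, nfs_page_group_destroy) implement a barrier over a circular singly linked list of subrequests: each member sets its bit, and only the last member to arrive sees `true`, at which point all bits are reset. A minimal user-space sketch of that semantic follows — simplified names, no locking or atomics, not the kernel implementation:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct nfs_page: a circular singly linked
 * list of subrequests that share one page, as in the patch. */
struct req {
	struct req *wb_this_page;	/* next request in group (circular) */
	struct req *wb_head;		/* group head */
	unsigned long wb_flags;		/* per-request sync bits */
};

/* Link @req into @prev's group, or make it a group of one (the head). */
static void group_init(struct req *req, struct req *prev)
{
	if (!prev) {
		req->wb_head = req;
		req->wb_this_page = req;
	} else {
		req->wb_head = prev->wb_head;
		req->wb_this_page = prev->wb_this_page;
		prev->wb_this_page = req;
	}
}

/* Set @bit on @req; return true only when every member of the group
 * has set it, in which case all bits are cleared for the next round. */
static bool group_sync_on_bit(struct req *req, unsigned int bit)
{
	struct req *tmp;

	req->wb_flags |= 1UL << bit;
	for (tmp = req->wb_this_page; tmp != req; tmp = tmp->wb_this_page)
		if (!(tmp->wb_flags & (1UL << bit)))
			return false;	/* some member has not arrived yet */

	tmp = req;
	do {				/* last one in: reset all bits */
		tmp->wb_flags &= ~(1UL << bit);
		tmp = tmp->wb_this_page;
	} while (tmp != req);
	return true;
}
```

Only the final caller observes `true`, which is how the series ensures per-page actions (unlock, end-writeback, teardown) run exactly once per page no matter how many subrequests cover it.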
>>>> +
>>>> /**
>>>> * nfs_create_request - Create an NFS read/write request.
>>>> * @ctx: open context to use
>>>> * @page: page to write
>>>> + * @last: last nfs request created for this page group or NULL if head
>>>> * @offset: starting offset within the page for the write
>>>> * @count: number of bytes to read/write
>>>> *
>>>> @@ -146,7 +289,8 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
>>>> */
>>>> struct nfs_page *
>>>> nfs_create_request(struct nfs_open_context *ctx, struct page *page,
>>>> - unsigned int offset, unsigned int count)
>>>> + struct nfs_page *last, unsigned int offset,
>>>> + unsigned int count)
>>>> {
>>>> struct nfs_page *req;
>>>> struct nfs_lock_context *l_ctx;
>>>> @@ -178,6 +322,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
>>>> req->wb_bytes = count;
>>>> req->wb_context = get_nfs_open_context(ctx);
>>>> kref_init(&req->wb_kref);
>>>> + nfs_page_group_init(req, last);
>>>> return req;
>>>> }
>>>>
>>>> @@ -235,16 +380,22 @@ static void nfs_clear_request(struct nfs_page *req)
>>>> }
>>>> }
>>>>
>>>> -
>>>> /**
>>>> * nfs_release_request - Release the count on an NFS read/write request
>>>> * @req: request to release
>>>> *
>>>> * Note: Should never be called with the spinlock held!
>>>> */
>>>> -static void nfs_free_request(struct kref *kref)
>>>> +static void nfs_free_request(struct nfs_page *req)
>>>> {
>>>> - struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
>>>> + WARN_ON_ONCE(req->wb_this_page != req);
>>>> +
>>>> + /* extra debug: make sure no sync bits are still set */
>>>> + WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
>>>> + WARN_ON_ONCE(test_bit(PG_UNLOCKPAGE, &req->wb_flags));
>>>> + WARN_ON_ONCE(test_bit(PG_UPTODATE, &req->wb_flags));
>>>> + WARN_ON_ONCE(test_bit(PG_WB_END, &req->wb_flags));
>>> My compiler doesn't know about PG_UNLOCKPAGE, PG_UPTODATE or PG_WB_END. Did you forget to add something?
>>>
>>> Anna
>>
>> Ah, patches 7 and 8 introduce these flags. Can you either reorder the patches or add the flags in this one?
>>
>> Anna
>>

This has been fixed in my public tree's "pgio" branch. I'm going to hold off on reposting until I get
more comments.

-dros

>>>> + WARN_ON_ONCE(test_bit(PG_REMOVE, &req->wb_flags));
>>>>
>>>> /* Release struct file and open context */
>>>> nfs_clear_request(req);
>>>> @@ -253,7 +404,7 @@ static void nfs_free_request(struct kref *kref)
>>>>
>>>> void nfs_release_request(struct nfs_page *req)
>>>> {
>>>> - kref_put(&req->wb_kref, nfs_free_request);
>>>> + kref_put(&req->wb_kref, nfs_page_group_destroy);
>>>> }
>>>>
>>>> static int nfs_wait_bit_uninterruptible(void *word)
>>>> @@ -441,21 +592,66 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
>>>> * @desc: destination io descriptor
>>>> * @req: request
>>>> *
>>>> + * This may split a request into subrequests which are all part of the
>>>> + * same page group.
>>>> + *
>>>> * Returns true if the request 'req' was successfully coalesced into the
>>>> * existing list of pages 'desc'.
>>>> */
>>>> static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_page *req)
>>>> {
>>>> - while (!nfs_pageio_do_add_request(desc, req)) {
>>>> - desc->pg_moreio = 1;
>>>> - nfs_pageio_doio(desc);
>>>> - if (desc->pg_error < 0)
>>>> - return 0;
>>>> - desc->pg_moreio = 0;
>>>> - if (desc->pg_recoalesce)
>>>> - return 0;
>>>> - }
>>>> + struct nfs_page *subreq;
>>>> + unsigned int bytes_left = 0;
>>>> + unsigned int offset, pgbase;
>>>> +
>>>> + nfs_page_group_lock(req);
>>>> +
>>>> + subreq = req;
>>>> + bytes_left = subreq->wb_bytes;
>>>> + offset = subreq->wb_offset;
>>>> + pgbase = subreq->wb_pgbase;
>>>> +
>>>> + do {
>>>> + if (!nfs_pageio_do_add_request(desc, subreq)) {
>>>> + /* make sure pg_test call(s) did nothing */
>>>> + WARN_ON_ONCE(subreq->wb_bytes != bytes_left);
>>>> + WARN_ON_ONCE(subreq->wb_offset != offset);
>>>> + WARN_ON_ONCE(subreq->wb_pgbase != pgbase);
>>>> +
>>>> + nfs_page_group_unlock(req);
>>>> + desc->pg_moreio = 1;
>>>> + nfs_pageio_doio(desc);
>>>> + if (desc->pg_error < 0)
>>>> + return 0;
>>>> + desc->pg_moreio = 0;
>>>> + if (desc->pg_recoalesce)
>>>> + return 0;
>>>> + /* retry add_request for this subreq */
>>>> + nfs_page_group_lock(req);
>>>> + continue;
>>>> + }
>>>> +
>>>> + /* check for buggy pg_test call(s) */
>>>> + WARN_ON_ONCE(subreq->wb_bytes + subreq->wb_pgbase > PAGE_SIZE);
>>>> + WARN_ON_ONCE(subreq->wb_bytes > bytes_left);
>>>> + WARN_ON_ONCE(subreq->wb_bytes == 0);
>>>> +
>>>> + bytes_left -= subreq->wb_bytes;
>>>> + offset += subreq->wb_bytes;
>>>> + pgbase += subreq->wb_bytes;
>>>> +
>>>> + if (bytes_left) {
>>>> + subreq = nfs_create_request(req->wb_context,
>>>> + req->wb_page,
>>>> + subreq, pgbase, bytes_left);
>>>> + nfs_lock_request(subreq);
>>>> + subreq->wb_offset = offset;
>>>> + subreq->wb_index = req->wb_index;
>>>> + }
>>>> + } while (bytes_left > 0);
>>>> +
>>>> + nfs_page_group_unlock(req);
>>>> return 1;
>>>> }
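[Editor's note: the rewritten __nfs_pageio_add_request() above is the heart of the series — when pg_test only accepts part of a request (a layout segment or stripe boundary falling mid-page), the remainder is carved into subrequests of the same page group, with wb_pgbase/wb_offset advancing by the accepted byte count each round. A toy model of that splitting arithmetic, with a fixed acceptance limit standing in for pg_test:]

```c
#include <assert.h>

/* Split a request of @bytes starting at page offset @pgbase into chunks
 * of at most @limit bytes (a stand-in for what pg_test will accept).
 * Records each chunk's pgbase into @out; returns the chunk count.
 * wb_offset advances identically to wb_pgbase in the real loop. */
static int split_request(unsigned int pgbase, unsigned int bytes,
			 unsigned int limit, unsigned int *out)
{
	int n = 0;
	unsigned int bytes_left = bytes;

	while (bytes_left > 0) {
		unsigned int accepted = bytes_left < limit ? bytes_left : limit;

		out[n++] = pgbase;	/* this subrequest's page offset */
		bytes_left -= accepted;
		pgbase += accepted;	/* next one starts where this ended */
	}
	return n;
}
```

With a 512-byte limit, a 1000-byte request at pgbase 100 splits into two subrequests at pgbases 100 and 612 — the same walk the patch performs while holding the group lock.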
>>>>
>>>> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>>>> index a48b909..f814e8a 100644
>>>> --- a/fs/nfs/read.c
>>>> +++ b/fs/nfs/read.c
>>>> @@ -85,7 +85,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>>>> len = nfs_page_length(page);
>>>> if (len == 0)
>>>> return nfs_return_empty_page(page);
>>>> - new = nfs_create_request(ctx, page, 0, len);
>>>> + new = nfs_create_request(ctx, page, NULL, 0, len);
>>>> if (IS_ERR(new)) {
>>>> unlock_page(page);
>>>> return PTR_ERR(new);
>>>> @@ -311,7 +311,7 @@ readpage_async_filler(void *data, struct page *page)
>>>> if (len == 0)
>>>> return nfs_return_empty_page(page);
>>>>
>>>> - new = nfs_create_request(desc->ctx, page, 0, len);
>>>> + new = nfs_create_request(desc->ctx, page, NULL, 0, len);
>>>> if (IS_ERR(new))
>>>> goto out_error;
>>>>
>>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>>> index 5f1c965..a65755a 100644
>>>> --- a/fs/nfs/write.c
>>>> +++ b/fs/nfs/write.c
>>>> @@ -367,6 +367,8 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
>>>> {
>>>> struct nfs_inode *nfsi = NFS_I(inode);
>>>>
>>>> + WARN_ON_ONCE(req->wb_this_page != req);
>>>> +
>>>> /* Lock the request! */
>>>> nfs_lock_request(req);
>>>>
>>>> @@ -383,6 +385,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
>>>> set_page_private(req->wb_page, (unsigned long)req);
>>>> }
>>>> nfsi->npages++;
>>>> + set_bit(PG_INODE_REF, &req->wb_flags);
>>>> kref_get(&req->wb_kref);
>>>> spin_unlock(&inode->i_lock);
>>>> }
>>>> @@ -567,6 +570,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>>>> {
>>>> struct nfs_commit_info cinfo;
>>>> unsigned long bytes = 0;
>>>> + bool do_destroy;
>>>>
>>>> if (test_bit(NFS_IOHDR_REDO, &hdr->flags))
>>>> goto out;
>>>> @@ -596,6 +600,7 @@ remove_req:
>>>> next:
>>>> nfs_unlock_request(req);
>>>> nfs_end_page_writeback(req->wb_page);
>>>> + do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags);
>>>> nfs_release_request(req);
>>>> }
>>>> out:
>>>> @@ -700,6 +705,10 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
>>>> if (req == NULL)
>>>> goto out_unlock;
>>>>
>>>> + /* should be handled by nfs_flush_incompatible */
>>>> + WARN_ON_ONCE(req->wb_head != req);
>>>> + WARN_ON_ONCE(req->wb_this_page != req);
>>>> +
>>>> rqend = req->wb_offset + req->wb_bytes;
>>>> /*
>>>> * Tell the caller to flush out the request if
>>>> @@ -761,7 +770,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
>>>> req = nfs_try_to_update_request(inode, page, offset, bytes);
>>>> if (req != NULL)
>>>> goto out;
>>>> - req = nfs_create_request(ctx, page, offset, bytes);
>>>> + req = nfs_create_request(ctx, page, NULL, offset, bytes);
>>>> if (IS_ERR(req))
>>>> goto out;
>>>> nfs_inode_add_request(inode, req);
>>>> @@ -805,6 +814,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
>>>> return 0;
>>>> l_ctx = req->wb_lock_context;
>>>> do_flush = req->wb_page != page || req->wb_context != ctx;
>>>> + /* for now, flush if more than 1 request in page_group */
>>>> + do_flush |= req->wb_this_page != req;
>>>> if (l_ctx && ctx->dentry->d_inode->i_flock != NULL) {
>>>> do_flush |= l_ctx->lockowner.l_owner != current->files
>>>> || l_ctx->lockowner.l_pid != current->tgid;
>>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>>> index 40faddd..b99deb7 100644
>>>> --- a/include/linux/nfs_page.h
>>>> +++ b/include/linux/nfs_page.h
>>>> @@ -26,6 +26,9 @@ enum {
>>>> PG_MAPPED, /* page private set for buffered io */
>>>> PG_CLEAN, /* write succeeded */
>>>> PG_COMMIT_TO_DS, /* used by pnfs layouts */
>>>> + PG_INODE_REF, /* extra ref held by inode (head req only) */
>>>> + PG_HEADLOCK, /* page group lock of wb_head */
>>>> + PG_TEARDOWN, /* page group sync for destroy */
>>>> };
>>>>
>>>> struct nfs_inode;
>>>> @@ -41,6 +44,8 @@ struct nfs_page {
>>>> struct kref wb_kref; /* reference count */
>>>> unsigned long wb_flags;
>>>> struct nfs_write_verifier wb_verf; /* Commit cookie */
>>>> + struct nfs_page *wb_this_page; /* list of reqs for this page */
>>>> + struct nfs_page *wb_head; /* head pointer for req list */
>>>> };
>>>>
>>>> struct nfs_pageio_descriptor;
>>>> @@ -87,9 +92,10 @@ struct nfs_pageio_descriptor {
>>>>
>>>> extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
>>>> struct page *page,
>>>> + struct nfs_page *last,
>>>> unsigned int offset,
>>>> unsigned int count);
>>>> -extern void nfs_release_request(struct nfs_page *req);
>>>> +extern void nfs_release_request(struct nfs_page *);
>>>>
>>>>
>>>> extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>>> @@ -108,7 +114,10 @@ extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
>>>> struct nfs_page *req);
>>>> extern int nfs_wait_on_request(struct nfs_page *);
>>>> extern void nfs_unlock_request(struct nfs_page *req);
>>>> -extern void nfs_unlock_and_release_request(struct nfs_page *req);
>>>> +extern void nfs_unlock_and_release_request(struct nfs_page *);
>>>> +extern void nfs_page_group_lock(struct nfs_page *);
>>>> +extern void nfs_page_group_unlock(struct nfs_page *);
>>>> +extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);
>>>>
>>>> /*
>>>> * Lock the page of an asynchronous request
>


2014-04-25 13:56:26

by Anna Schumaker

Subject: Re: [PATCH 14/18 v2] nfs: remove list of [rw]data from pgio header

On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
> Since the ability to split pages into subpage requests has been added,
> nfs_pgio_header->rpc_list only ever has one wdata/rdata.

[rw]data structs don't exist anymore. Can you update the patch description?

Anna

>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> ---
> fs/nfs/pageio.c | 35 ++++-------------------------------
> fs/nfs/pnfs.c | 41 +++++++++++++++--------------------------
> include/linux/nfs_xdr.h | 4 +++-
> 3 files changed, 22 insertions(+), 58 deletions(-)
>
> diff --git a/fs/nfs/pageio.c b/fs/nfs/pageio.c
> index 1899977..f578cc9 100644
> --- a/fs/nfs/pageio.c
> +++ b/fs/nfs/pageio.c
> @@ -29,7 +29,6 @@ struct nfs_rw_header *nfs_rw_header_alloc(const struct nfs_rw_ops *ops)
> struct nfs_pgio_header *hdr = &header->header;
>
> INIT_LIST_HEAD(&hdr->pages);
> - INIT_LIST_HEAD(&hdr->rpc_list);
> spin_lock_init(&hdr->lock);
> atomic_set(&hdr->refcnt, 0);
> hdr->rw_ops = ops;
> @@ -182,37 +181,12 @@ static int nfs_do_pgio(struct nfs_pgio_data *data,
> return nfs_initiate_pgio(NFS_CLIENT(inode), data, call_ops, how, 0);
> }
>
> -int nfs_do_multiple_pgios(struct list_head *head,
> - const struct rpc_call_ops *call_ops,
> - int how)
> -{
> - struct nfs_pgio_data *data;
> - int ret = 0;
> -
> - while (!list_empty(head)) {
> - int ret2;
> -
> - data = list_first_entry(head, struct nfs_pgio_data, list);
> - list_del_init(&data->list);
> -
> - ret2 = nfs_do_pgio(data, call_ops, how);
> - if (ret == 0)
> - ret = ret2;
> - }
> - return ret;
> -}
> -
> static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
> struct nfs_pgio_header *hdr)
> {
> - struct nfs_pgio_data *data;
> -
> set_bit(NFS_IOHDR_REDO, &hdr->flags);
> - while (!list_empty(&hdr->rpc_list)) {
> - data = list_first_entry(&hdr->rpc_list, struct nfs_pgio_data, list);
> - list_del(&data->list);
> - nfs_pgio_data_release(data);
> - }
> + nfs_pgio_data_release(hdr->data);
> + hdr->data = NULL;
> desc->pg_completion_ops->error_cleanup(&desc->pg_list);
> return -ENOMEM;
> }
> @@ -254,7 +228,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>
> /* Set up the argument struct */
> nfs_pgio_rpcsetup(data, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
> - list_add(&data->list, &hdr->rpc_list);
> + hdr->data = data;
> desc->pg_rpc_callops = &nfs_pgio_common_ops;
> return 0;
> }
> @@ -276,8 +250,7 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
> atomic_inc(&hdr->refcnt);
> ret = nfs_generic_pgio(desc, hdr);
> if (ret == 0)
> - ret = nfs_do_multiple_pgios(&hdr->rpc_list,
> - desc->pg_rpc_callops, 0);
> + ret = nfs_do_pgio(hdr->data, desc->pg_rpc_callops, 0);
> if (atomic_dec_and_test(&hdr->refcnt))
> hdr->completion_ops->completion(hdr);
> return ret;
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 354c53c..6ef108b 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1573,23 +1573,18 @@ pnfs_try_to_write_data(struct nfs_pgio_data *wdata,
> }
>
> static void
> -pnfs_do_multiple_writes(struct nfs_pageio_descriptor *desc, struct list_head *head, int how)
> +pnfs_do_write(struct nfs_pageio_descriptor *desc,
> + struct nfs_pgio_header *hdr, int how)
> {
> - struct nfs_pgio_data *data;
> + struct nfs_pgio_data *data = hdr->data;
> const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
> struct pnfs_layout_segment *lseg = desc->pg_lseg;
> + enum pnfs_try_status trypnfs;
>
> desc->pg_lseg = NULL;
> - while (!list_empty(head)) {
> - enum pnfs_try_status trypnfs;
> -
> - data = list_first_entry(head, struct nfs_pgio_data, list);
> - list_del_init(&data->list);
> -
> - trypnfs = pnfs_try_to_write_data(data, call_ops, lseg, how);
> - if (trypnfs == PNFS_NOT_ATTEMPTED)
> - pnfs_write_through_mds(desc, data);
> - }
> + trypnfs = pnfs_try_to_write_data(data, call_ops, lseg, how);
> + if (trypnfs == PNFS_NOT_ATTEMPTED)
> + pnfs_write_through_mds(desc, data);
> pnfs_put_lseg(lseg);
> }
>
> @@ -1623,7 +1618,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
> pnfs_put_lseg(desc->pg_lseg);
> desc->pg_lseg = NULL;
> } else
> - pnfs_do_multiple_writes(desc, &hdr->rpc_list, desc->pg_ioflags);
> + pnfs_do_write(desc, hdr, desc->pg_ioflags);
> if (atomic_dec_and_test(&hdr->refcnt))
> hdr->completion_ops->completion(hdr);
> return ret;
> @@ -1731,23 +1726,17 @@ pnfs_try_to_read_data(struct nfs_pgio_data *rdata,
> }
>
> static void
> -pnfs_do_multiple_reads(struct nfs_pageio_descriptor *desc, struct list_head *head)
> +pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
> {
> - struct nfs_pgio_data *data;
> + struct nfs_pgio_data *data = hdr->data;
> const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
> struct pnfs_layout_segment *lseg = desc->pg_lseg;
> + enum pnfs_try_status trypnfs;
>
> desc->pg_lseg = NULL;
> - while (!list_empty(head)) {
> - enum pnfs_try_status trypnfs;
> -
> - data = list_first_entry(head, struct nfs_pgio_data, list);
> - list_del_init(&data->list);
> -
> - trypnfs = pnfs_try_to_read_data(data, call_ops, lseg);
> - if (trypnfs == PNFS_NOT_ATTEMPTED)
> - pnfs_read_through_mds(desc, data);
> - }
> + trypnfs = pnfs_try_to_read_data(data, call_ops, lseg);
> + if (trypnfs == PNFS_NOT_ATTEMPTED)
> + pnfs_read_through_mds(desc, data);
> pnfs_put_lseg(lseg);
> }
>
> @@ -1782,7 +1771,7 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
> pnfs_put_lseg(desc->pg_lseg);
> desc->pg_lseg = NULL;
> } else
> - pnfs_do_multiple_reads(desc, &hdr->rpc_list);
> + pnfs_do_read(desc, hdr);
> if (atomic_dec_and_test(&hdr->refcnt))
> hdr->completion_ops->completion(hdr);
> return ret;
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index dacd240..29828c7 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1255,11 +1255,13 @@ enum {
> NFS_IOHDR_NEED_RESCHED,
> };
>
> +struct nfs_pgio_data;
> +
> struct nfs_pgio_header {
> struct inode *inode;
> struct rpc_cred *cred;
> struct list_head pages;
> - struct list_head rpc_list;
> + struct nfs_pgio_data *data;
> atomic_t refcnt;
> struct nfs_page *req;
> struct nfs_writeverf verf; /* Used for writes */


2014-04-25 14:15:09

by Anna Schumaker

Subject: Re: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
> Support direct requests that span multiple pnfs data servers by
> comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
> Continue to use dreq->verf if the MDS is used / non-pNFS.
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> ---
> fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
> fs/nfs/nfs4filelayout.c | 6 +++
> include/linux/nfs.h | 5 ++-
> include/linux/nfs_xdr.h | 2 +
> 4 files changed, 105 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 2c0e08f..9349933 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
> return atomic_dec_and_test(&dreq->io_count);
> }
>
> +/*
> + * nfs_direct_select_verf - select the right verifier
> + * @dreq - direct request possibly spanning multiple servers
> + * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
> + * @ds_idx - index of data server in data server list, only valid if ds_clp set
> + *
> + * returns the correct verifier to use given the role of the server
> + */
> +static struct nfs_writeverf *
> +nfs_direct_select_verf(struct nfs_direct_req *dreq,
> + struct nfs_client *ds_clp,
> + int ds_idx)
> +{
> + struct nfs_writeverf *verfp = &dreq->verf;
> +
> + if (ds_clp) {
> + /* pNFS is in use, use the DS verf */
> + if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)

The struct pnfs_ds_commit_info is empty if CONFIG_NFS_V4_1=n, so this won't compile.

Anna

> + verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
> + else
> + WARN_ON_ONCE(1);
> + }
> + return verfp;
> +}
> +
> +
> +/*
> + * nfs_direct_set_hdr_verf - set the write/commit verifier
> + * @dreq - direct request possibly spanning multiple servers
> + * @hdr - pageio header to validate against previously seen verfs
> + *
> + * Set the server's (MDS or DS) "seen" verifier
> + */
> +static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
> + struct nfs_pgio_header *hdr)
> +{
> + struct nfs_writeverf *verfp;
> +
> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
> + hdr->data->ds_idx);
> + WARN_ON_ONCE(verfp->committed >= 0);
> + memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
> + WARN_ON_ONCE(verfp->committed < 0);
> +}
> +
> +/*
> + * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
> + * @dreq - direct request possibly spanning multiple servers
> + * @hdr - pageio header to validate against previously seen verf
> + *
> + * set the server's "seen" verf if not initialized.
> + * returns result of comparison between @hdr->verf and the "seen"
> + * verf of the server used by @hdr (DS or MDS)
> + */
> +static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
> + struct nfs_pgio_header *hdr)
> +{
> + struct nfs_writeverf *verfp;
> +
> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
> + hdr->data->ds_idx);
> + if (verfp->committed < 0) {
> + nfs_direct_set_hdr_verf(dreq, hdr);
> + return 0;
> + }
> + return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
> +}
> +
> +/*
> + * nfs_direct_cmp_commit_data_verf - compare verifier for commit data
> + * @dreq - direct request possibly spanning multiple servers
> + * @data - commit data to validate against previously seen verf
> + *
> + * returns result of comparison between @data->verf and the verf of
> + * the server used by @data (DS or MDS)
> + */
> +static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
> + struct nfs_commit_data *data)
> +{
> + struct nfs_writeverf *verfp;
> +
> + verfp = nfs_direct_select_verf(dreq, data->ds_clp,
> + data->ds_commit_index);
> + WARN_ON_ONCE(verfp->committed < 0);
> + return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
> +}
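[Editor's note: the verifier helpers quoted above reduce to — pick the verifier slot for the server that performed the I/O (the dreq-wide slot for the MDS, the commit bucket's slot for a DS), cache it on first completion, and memcmp on every later one; a mismatch means the server rebooted and the writes must be resent. A self-contained sketch of that selection/compare logic, with simplified types in place of struct nfs_writeverf:]

```c
#include <assert.h>
#include <string.h>

#define INVALID_STABLE_HOW (-1)	/* mirrors NFS_INVALID_STABLE_HOW */
#define NBUCKETS 4

struct verf {
	int committed;			/* < 0 means "not seen yet" */
	unsigned char data[8];		/* opaque server verifier cookie */
};

struct direct_req {
	struct verf mds_verf;			 /* MDS / non-pNFS slot */
	struct verf bucket_verf[NBUCKETS];	 /* one per DS commit bucket */
};

/* Pick the slot matching the server role: ds_idx < 0 means MDS. */
static struct verf *select_verf(struct direct_req *dreq, int ds_idx)
{
	if (ds_idx >= 0 && ds_idx < NBUCKETS)
		return &dreq->bucket_verf[ds_idx];
	return &dreq->mds_verf;
}

/* First completion from a server caches its verifier and "matches";
 * later completions must present the same cookie. Returns 0 on match. */
static int set_or_cmp_verf(struct direct_req *dreq, int ds_idx,
			   const struct verf *seen)
{
	struct verf *verfp = select_verf(dreq, ds_idx);

	if (verfp->committed < 0) {
		*verfp = *seen;
		return 0;
	}
	return memcmp(verfp->data, seen->data, sizeof(verfp->data));
}
```

Each bucket's slot is independent, which is what lets a single O_DIRECT request span several data servers without false verifier mismatches.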
> +
> /**
> * nfs_direct_IO - NFS address space operation for direct I/O
> * @rw: direction (read or write)
> @@ -168,6 +255,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
> kref_get(&dreq->kref);
> init_completion(&dreq->completion);
> INIT_LIST_HEAD(&dreq->mds_cinfo.list);
> + dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
> INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
> spin_lock_init(&dreq->lock);
>
> @@ -602,7 +690,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
> dprintk("NFS: %5u commit failed with error %d.\n",
> data->task.tk_pid, status);
> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
> - } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
> + } else if (nfs_direct_cmp_commit_data_verf(dreq, data)) {
> dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
> }
> @@ -811,13 +899,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
> bit = NFS_IOHDR_NEED_RESCHED;
> else if (dreq->flags == 0) {
> - memcpy(&dreq->verf, &hdr->verf,
> - sizeof(dreq->verf));
> + nfs_direct_set_hdr_verf(dreq, hdr);
> bit = NFS_IOHDR_NEED_COMMIT;
> dreq->flags = NFS_ODIRECT_DO_COMMIT;
> } else if (dreq->flags == NFS_ODIRECT_DO_COMMIT) {
> - if (memcmp(&dreq->verf, &hdr->verf, sizeof(dreq->verf))) {
> - dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
> + if (nfs_direct_set_or_cmp_hdr_verf(dreq, hdr)) {
> + dreq->flags =
> + NFS_ODIRECT_RESCHED_WRITES;
> bit = NFS_IOHDR_NEED_RESCHED;
> } else
> bit = NFS_IOHDR_NEED_COMMIT;
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 7a665e0..0ebc521 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -560,6 +560,7 @@ filelayout_read_pagelist(struct nfs_pgio_data *data)
> /* No multipath support. Use first DS */
> atomic_inc(&ds->ds_clp->cl_count);
> data->ds_clp = ds->ds_clp;
> + data->ds_idx = idx;
> fh = nfs4_fl_select_ds_fh(lseg, j);
> if (fh)
> data->args.fh = fh;
> @@ -603,6 +604,7 @@ filelayout_write_pagelist(struct nfs_pgio_data *data, int sync)
> data->pgio_done_cb = filelayout_write_done_cb;
> atomic_inc(&ds->ds_clp->cl_count);
> data->ds_clp = ds->ds_clp;
> + data->ds_idx = idx;
> fh = nfs4_fl_select_ds_fh(lseg, j);
> if (fh)
> data->args.fh = fh;
> @@ -875,6 +877,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
> for (i = 0; i < size; i++) {
> INIT_LIST_HEAD(&buckets[i].written);
> INIT_LIST_HEAD(&buckets[i].committing);
> + /* mark direct verifier as unset */
> + buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
> }
>
> spin_lock(cinfo->lock);
> @@ -885,6 +889,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
> &buckets[i].written);
> list_splice(&cinfo->ds->buckets[i].committing,
> &buckets[i].committing);
> + buckets[i].direct_verf.committed =
> + cinfo->ds->buckets[i].direct_verf.committed;
> buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
> buckets[i].clseg = cinfo->ds->buckets[i].clseg;
> }
> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
> index 3e794c1..610af51 100644
> --- a/include/linux/nfs.h
> +++ b/include/linux/nfs.h
> @@ -46,6 +46,9 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
> enum nfs3_stable_how {
> NFS_UNSTABLE = 0,
> NFS_DATA_SYNC = 1,
> - NFS_FILE_SYNC = 2
> + NFS_FILE_SYNC = 2,
> +
> + /* used by direct.c to mark verf as invalid */
> + NFS_INVALID_STABLE_HOW = -1
> };
> #endif /* _LINUX_NFS_H */
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 29828c7..bb9fb88 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1111,6 +1111,7 @@ struct pnfs_commit_bucket {
> struct list_head committing;
> struct pnfs_layout_segment *wlseg;
> struct pnfs_layout_segment *clseg;
> + struct nfs_writeverf direct_verf;
> };
>
> struct pnfs_ds_commit_info {
> @@ -1294,6 +1295,7 @@ struct nfs_pgio_data {
> __u64 mds_offset; /* Filelayout dense stripe */
> struct nfs_page_array pages;
> struct nfs_client *ds_clp; /* pNFS data server */
> + int ds_idx; /* ds index if ds_clp is set */
> };
>
> struct nfs_rw_header {


2014-04-25 15:53:17

by Anna Schumaker

Subject: Re: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

On 04/25/2014 11:38 AM, Weston Andros Adamson wrote:
> On Apr 25, 2014, at 11:12 AM, Weston Andros Adamson <[email protected]> wrote:
>
>> On Apr 25, 2014, at 10:15 AM, Anna Schumaker <[email protected]> wrote:
>>
>>> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>>>> Support direct requests that span multiple pnfs data servers by
>>>> comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
>>>> Continue to use dreq->verf if the MDS is used / non-pNFS.
>>>>
>>>> Signed-off-by: Weston Andros Adamson <[email protected]>
>>>> ---
>>>> fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>> fs/nfs/nfs4filelayout.c | 6 +++
>>>> include/linux/nfs.h | 5 ++-
>>>> include/linux/nfs_xdr.h | 2 +
>>>> 4 files changed, 105 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>>> index 2c0e08f..9349933 100644
>>>> --- a/fs/nfs/direct.c
>>>> +++ b/fs/nfs/direct.c
>>>> @@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
>>>> return atomic_dec_and_test(&dreq->io_count);
>>>> }
>>>>
>>>> +/*
>>>> + * nfs_direct_select_verf - select the right verifier
>>>> + * @dreq - direct request possibly spanning multiple servers
>>>> + * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
>>>> + * @ds_idx - index of data server in data server list, only valid if ds_clp set
>>>> + *
>>>> + * returns the correct verifier to use given the role of the server
>>>> + */
>>>> +static struct nfs_writeverf *
>>>> +nfs_direct_select_verf(struct nfs_direct_req *dreq,
>>>> + struct nfs_client *ds_clp,
>>>> + int ds_idx)
>>>> +{
>>>> + struct nfs_writeverf *verfp = &dreq->verf;
>>>> +
>>>> + if (ds_clp) {
>>>> + /* pNFS is in use, use the DS verf */
>>>> + if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
>>> The struct pnfs_ds_commit_info is empty if CONFIG_NFS_V4_1=n, so this won't compile.
>>>
>>> Anna
>> Good catch, I'll iterate through the patchset and test with v4.1 disabled.
>>
>> Time to add some #ifdefs
>>
>> -dros
> That was the only problem I found with 4.1 disabled.
>
> Fixed and pushed.

Thanks! I'm now getting a "defined but not used" error if just v2 is enabled:

NFS_V4_2=n PNFS_OBJLAYOUT=n PNFS_BLOCK=n PNFS_FILE_LAYOUT=n NFS_V4_1=n NFS_V4=n NFS_V3=n NFS_V2=y

fs/nfs/direct.c:189:12: error: ‘nfs_direct_cmp_commit_data_verf’ defined but not used [-Werror=unused-function]
static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,

Anna

>
> -dros
>
>>>> + verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
>>>> + else
>>>> + WARN_ON_ONCE(1);
>>>> + }
>>>> + return verfp;
>>>> +}
>>>> +
>>>> +
>>>> +/*
>>>> + * nfs_direct_set_hdr_verf - set the write/commit verifier
>>>> + * @dreq - direct request possibly spanning multiple servers
>>>> + * @hdr - pageio header to validate against previously seen verfs
>>>> + *
>>>> + * Set the server's (MDS or DS) "seen" verifier
>>>> + */
>>>> +static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
>>>> + struct nfs_pgio_header *hdr)
>>>> +{
>>>> + struct nfs_writeverf *verfp;
>>>> +
>>>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>>>> + hdr->data->ds_idx);
>>>> + WARN_ON_ONCE(verfp->committed >= 0);
>>>> + memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>>>> + WARN_ON_ONCE(verfp->committed < 0);
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
>>>> + * @dreq - direct request possibly spanning multiple servers
>>>> + * @hdr - pageio header to validate against previously seen verf
>>>> + *
>>>> + * set the server's "seen" verf if not initialized.
>>>> + * returns result of comparison between @hdr->verf and the "seen"
>>>> + * verf of the server used by @hdr (DS or MDS)
>>>> + */
>>>> +static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
>>>> + struct nfs_pgio_header *hdr)
>>>> +{
>>>> + struct nfs_writeverf *verfp;
>>>> +
>>>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>>>> + hdr->data->ds_idx);
>>>> + if (verfp->committed < 0) {
>>>> + nfs_direct_set_hdr_verf(dreq, hdr);
>>>> + return 0;
>>>> + }
>>>> + return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>>>> +}
>>>> +
>>>> +/*
>>>> + * nfs_direct_cmp_commit_data_verf - compare verifier for commit data
>>>> + * @dreq - direct request possibly spanning multiple servers
>>>> + * @data - commit data to validate against previously seen verf
>>>> + *
>>>> + * returns result of comparison between @data->verf and the verf of
>>>> + * the server used by @data (DS or MDS)
>>>> + */
>>>> +static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
>>>> + struct nfs_commit_data *data)
>>>> +{
>>>> + struct nfs_writeverf *verfp;
>>>> +
>>>> + verfp = nfs_direct_select_verf(dreq, data->ds_clp,
>>>> + data->ds_commit_index);
>>>> + WARN_ON_ONCE(verfp->committed < 0);
>>>> + return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
>>>> +}
>>>> +
>>>> /**
>>>> * nfs_direct_IO - NFS address space operation for direct I/O
>>>> * @rw: direction (read or write)
>>>> @@ -168,6 +255,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
>>>> kref_get(&dreq->kref);
>>>> init_completion(&dreq->completion);
>>>> INIT_LIST_HEAD(&dreq->mds_cinfo.list);
>>>> + dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
>>>> INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
>>>> spin_lock_init(&dreq->lock);
>>>>
>>>> @@ -602,7 +690,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
>>>> dprintk("NFS: %5u commit failed with error %d.\n",
>>>> data->task.tk_pid, status);
>>>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>> - } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
>>>> + } else if (nfs_direct_cmp_commit_data_verf(dreq, data)) {
>>>> dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
>>>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>> }
>>>> @@ -811,13 +899,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>>>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>>>> bit = NFS_IOHDR_NEED_RESCHED;
>>>> else if (dreq->flags == 0) {
>>>> - memcpy(&dreq->verf, &hdr->verf,
>>>> - sizeof(dreq->verf));
>>>> + nfs_direct_set_hdr_verf(dreq, hdr);
>>>> bit = NFS_IOHDR_NEED_COMMIT;
>>>> dreq->flags = NFS_ODIRECT_DO_COMMIT;
>>>> } else if (dreq->flags == NFS_ODIRECT_DO_COMMIT) {
>>>> - if (memcmp(&dreq->verf, &hdr->verf, sizeof(dreq->verf))) {
>>>> - dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>>>> + if (nfs_direct_set_or_cmp_hdr_verf(dreq, hdr)) {
>>>> + dreq->flags =
>>>> + NFS_ODIRECT_RESCHED_WRITES;
>>>> bit = NFS_IOHDR_NEED_RESCHED;
>>>> } else
>>>> bit = NFS_IOHDR_NEED_COMMIT;
>>>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>>>> index 7a665e0..0ebc521 100644
>>>> --- a/fs/nfs/nfs4filelayout.c
>>>> +++ b/fs/nfs/nfs4filelayout.c
>>>> @@ -560,6 +560,7 @@ filelayout_read_pagelist(struct nfs_pgio_data *data)
>>>> /* No multipath support. Use first DS */
>>>> atomic_inc(&ds->ds_clp->cl_count);
>>>> data->ds_clp = ds->ds_clp;
>>>> + data->ds_idx = idx;
>>>> fh = nfs4_fl_select_ds_fh(lseg, j);
>>>> if (fh)
>>>> data->args.fh = fh;
>>>> @@ -603,6 +604,7 @@ filelayout_write_pagelist(struct nfs_pgio_data *data, int sync)
>>>> data->pgio_done_cb = filelayout_write_done_cb;
>>>> atomic_inc(&ds->ds_clp->cl_count);
>>>> data->ds_clp = ds->ds_clp;
>>>> + data->ds_idx = idx;
>>>> fh = nfs4_fl_select_ds_fh(lseg, j);
>>>> if (fh)
>>>> data->args.fh = fh;
>>>> @@ -875,6 +877,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>>>> for (i = 0; i < size; i++) {
>>>> INIT_LIST_HEAD(&buckets[i].written);
>>>> INIT_LIST_HEAD(&buckets[i].committing);
>>>> + /* mark direct verifier as unset */
>>>> + buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
>>>> }
>>>>
>>>> spin_lock(cinfo->lock);
>>>> @@ -885,6 +889,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>>>> &buckets[i].written);
>>>> list_splice(&cinfo->ds->buckets[i].committing,
>>>> &buckets[i].committing);
>>>> + buckets[i].direct_verf.committed =
>>>> + cinfo->ds->buckets[i].direct_verf.committed;
>>>> buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
>>>> buckets[i].clseg = cinfo->ds->buckets[i].clseg;
>>>> }
>>>> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
>>>> index 3e794c1..610af51 100644
>>>> --- a/include/linux/nfs.h
>>>> +++ b/include/linux/nfs.h
>>>> @@ -46,6 +46,9 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
>>>> enum nfs3_stable_how {
>>>> NFS_UNSTABLE = 0,
>>>> NFS_DATA_SYNC = 1,
>>>> - NFS_FILE_SYNC = 2
>>>> + NFS_FILE_SYNC = 2,
>>>> +
>>>> + /* used by direct.c to mark verf as invalid */
>>>> + NFS_INVALID_STABLE_HOW = -1
>>>> };
>>>> #endif /* _LINUX_NFS_H */
>>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>>> index 29828c7..bb9fb88 100644
>>>> --- a/include/linux/nfs_xdr.h
>>>> +++ b/include/linux/nfs_xdr.h
>>>> @@ -1111,6 +1111,7 @@ struct pnfs_commit_bucket {
>>>> struct list_head committing;
>>>> struct pnfs_layout_segment *wlseg;
>>>> struct pnfs_layout_segment *clseg;
>>>> + struct nfs_writeverf direct_verf;
>>>> };
>>>>
>>>> struct pnfs_ds_commit_info {
>>>> @@ -1294,6 +1295,7 @@ struct nfs_pgio_data {
>>>> __u64 mds_offset; /* Filelayout dense stripe */
>>>> struct nfs_page_array pages;
>>>> struct nfs_client *ds_clp; /* pNFS data server */
>>>> + int ds_idx; /* ds index if ds_clp is set */
>>>> };
>>>>
>>>> struct nfs_rw_header {


2014-04-25 13:58:27

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 14/18 v2] nfs: remove list of [rw]data from pgio header

good point.

reworded and pushed.

-dros



On Apr 25, 2014, at 9:56 AM, Anna Schumaker <[email protected]> wrote:

> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>> Since the ability to split pages into subpage requests has been added,
>> nfs_pgio_header->rpc_list only ever has one wdata/rdata.
>
> [rw]data structs don't exist anymore. Can you update the patch description?
>
> Anna
>
>>
>> Signed-off-by: Weston Andros Adamson <[email protected]>
>> ---
>> fs/nfs/pageio.c | 35 ++++-------------------------------
>> fs/nfs/pnfs.c | 41 +++++++++++++++--------------------------
>> include/linux/nfs_xdr.h | 4 +++-
>> 3 files changed, 22 insertions(+), 58 deletions(-)
>>
>> diff --git a/fs/nfs/pageio.c b/fs/nfs/pageio.c
>> index 1899977..f578cc9 100644
>> --- a/fs/nfs/pageio.c
>> +++ b/fs/nfs/pageio.c
>> @@ -29,7 +29,6 @@ struct nfs_rw_header *nfs_rw_header_alloc(const struct nfs_rw_ops *ops)
>> struct nfs_pgio_header *hdr = &header->header;
>>
>> INIT_LIST_HEAD(&hdr->pages);
>> - INIT_LIST_HEAD(&hdr->rpc_list);
>> spin_lock_init(&hdr->lock);
>> atomic_set(&hdr->refcnt, 0);
>> hdr->rw_ops = ops;
>> @@ -182,37 +181,12 @@ static int nfs_do_pgio(struct nfs_pgio_data *data,
>> return nfs_initiate_pgio(NFS_CLIENT(inode), data, call_ops, how, 0);
>> }
>>
>> -int nfs_do_multiple_pgios(struct list_head *head,
>> - const struct rpc_call_ops *call_ops,
>> - int how)
>> -{
>> - struct nfs_pgio_data *data;
>> - int ret = 0;
>> -
>> - while (!list_empty(head)) {
>> - int ret2;
>> -
>> - data = list_first_entry(head, struct nfs_pgio_data, list);
>> - list_del_init(&data->list);
>> -
>> - ret2 = nfs_do_pgio(data, call_ops, how);
>> - if (ret == 0)
>> - ret = ret2;
>> - }
>> - return ret;
>> -}
>> -
>> static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
>> struct nfs_pgio_header *hdr)
>> {
>> - struct nfs_pgio_data *data;
>> -
>> set_bit(NFS_IOHDR_REDO, &hdr->flags);
>> - while (!list_empty(&hdr->rpc_list)) {
>> - data = list_first_entry(&hdr->rpc_list, struct nfs_pgio_data, list);
>> - list_del(&data->list);
>> - nfs_pgio_data_release(data);
>> - }
>> + nfs_pgio_data_release(hdr->data);
>> + hdr->data = NULL;
>> desc->pg_completion_ops->error_cleanup(&desc->pg_list);
>> return -ENOMEM;
>> }
>> @@ -254,7 +228,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
>>
>> /* Set up the argument struct */
>> nfs_pgio_rpcsetup(data, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
>> - list_add(&data->list, &hdr->rpc_list);
>> + hdr->data = data;
>> desc->pg_rpc_callops = &nfs_pgio_common_ops;
>> return 0;
>> }
>> @@ -276,8 +250,7 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>> atomic_inc(&hdr->refcnt);
>> ret = nfs_generic_pgio(desc, hdr);
>> if (ret == 0)
>> - ret = nfs_do_multiple_pgios(&hdr->rpc_list,
>> - desc->pg_rpc_callops, 0);
>> + ret = nfs_do_pgio(hdr->data, desc->pg_rpc_callops, 0);
>> if (atomic_dec_and_test(&hdr->refcnt))
>> hdr->completion_ops->completion(hdr);
>> return ret;
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 354c53c..6ef108b 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1573,23 +1573,18 @@ pnfs_try_to_write_data(struct nfs_pgio_data *wdata,
>> }
>>
>> static void
>> -pnfs_do_multiple_writes(struct nfs_pageio_descriptor *desc, struct list_head *head, int how)
>> +pnfs_do_write(struct nfs_pageio_descriptor *desc,
>> + struct nfs_pgio_header *hdr, int how)
>> {
>> - struct nfs_pgio_data *data;
>> + struct nfs_pgio_data *data = hdr->data;
>> const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
>> struct pnfs_layout_segment *lseg = desc->pg_lseg;
>> + enum pnfs_try_status trypnfs;
>>
>> desc->pg_lseg = NULL;
>> - while (!list_empty(head)) {
>> - enum pnfs_try_status trypnfs;
>> -
>> - data = list_first_entry(head, struct nfs_pgio_data, list);
>> - list_del_init(&data->list);
>> -
>> - trypnfs = pnfs_try_to_write_data(data, call_ops, lseg, how);
>> - if (trypnfs == PNFS_NOT_ATTEMPTED)
>> - pnfs_write_through_mds(desc, data);
>> - }
>> + trypnfs = pnfs_try_to_write_data(data, call_ops, lseg, how);
>> + if (trypnfs == PNFS_NOT_ATTEMPTED)
>> + pnfs_write_through_mds(desc, data);
>> pnfs_put_lseg(lseg);
>> }
>>
>> @@ -1623,7 +1618,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
>> pnfs_put_lseg(desc->pg_lseg);
>> desc->pg_lseg = NULL;
>> } else
>> - pnfs_do_multiple_writes(desc, &hdr->rpc_list, desc->pg_ioflags);
>> + pnfs_do_write(desc, hdr, desc->pg_ioflags);
>> if (atomic_dec_and_test(&hdr->refcnt))
>> hdr->completion_ops->completion(hdr);
>> return ret;
>> @@ -1731,23 +1726,17 @@ pnfs_try_to_read_data(struct nfs_pgio_data *rdata,
>> }
>>
>> static void
>> -pnfs_do_multiple_reads(struct nfs_pageio_descriptor *desc, struct list_head *head)
>> +pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
>> {
>> - struct nfs_pgio_data *data;
>> + struct nfs_pgio_data *data = hdr->data;
>> const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
>> struct pnfs_layout_segment *lseg = desc->pg_lseg;
>> + enum pnfs_try_status trypnfs;
>>
>> desc->pg_lseg = NULL;
>> - while (!list_empty(head)) {
>> - enum pnfs_try_status trypnfs;
>> -
>> - data = list_first_entry(head, struct nfs_pgio_data, list);
>> - list_del_init(&data->list);
>> -
>> - trypnfs = pnfs_try_to_read_data(data, call_ops, lseg);
>> - if (trypnfs == PNFS_NOT_ATTEMPTED)
>> - pnfs_read_through_mds(desc, data);
>> - }
>> + trypnfs = pnfs_try_to_read_data(data, call_ops, lseg);
>> + if (trypnfs == PNFS_NOT_ATTEMPTED)
>> + pnfs_read_through_mds(desc, data);
>> pnfs_put_lseg(lseg);
>> }
>>
>> @@ -1782,7 +1771,7 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
>> pnfs_put_lseg(desc->pg_lseg);
>> desc->pg_lseg = NULL;
>> } else
>> - pnfs_do_multiple_reads(desc, &hdr->rpc_list);
>> + pnfs_do_read(desc, hdr);
>> if (atomic_dec_and_test(&hdr->refcnt))
>> hdr->completion_ops->completion(hdr);
>> return ret;
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index dacd240..29828c7 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1255,11 +1255,13 @@ enum {
>> NFS_IOHDR_NEED_RESCHED,
>> };
>>
>> +struct nfs_pgio_data;
>> +
>> struct nfs_pgio_header {
>> struct inode *inode;
>> struct rpc_cred *cred;
>> struct list_head pages;
>> - struct list_head rpc_list;
>> + struct nfs_pgio_data *data;
>> atomic_t refcnt;
>> struct nfs_page *req;
>> struct nfs_writeverf verf; /* Used for writes */
>


2014-04-24 18:15:55

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 12/18 v2] nfs: chain calls to pg_test

Now that pg_test can change the size of the request (by returning a non-zero
size smaller than the request), pg_test functions that call other
pg_test functions must return the minimum of the results - or 0 if any fail.

Also clean up the logic of some pg_test functions so that all checks are
for conditions where coalescing is not possible.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 27 ++++++++++++++-------------
fs/nfs/objlayout/objio_osd.c | 9 ++++++---
fs/nfs/pnfs.c | 15 ++++++++++-----
3 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 9cea935..7a665e0 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -928,26 +928,27 @@ static size_t
filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
+ unsigned int size;
u64 p_stripe, r_stripe;
u32 stripe_unit;

- if (!pnfs_generic_pg_test(pgio, prev, req) ||
- !nfs_generic_pg_test(pgio, prev, req))
+ /* calls nfs_generic_pg_test */
+ size = pnfs_generic_pg_test(pgio, prev, req);
+ if (!size)
return 0;

- if (!prev)
- return req->wb_bytes;
+ if (prev) {
+ p_stripe = (u64)req_offset(prev);
+ r_stripe = (u64)req_offset(req);
+ stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;

- p_stripe = (u64)req_offset(prev);
- r_stripe = (u64)req_offset(req);
- stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
+ do_div(p_stripe, stripe_unit);
+ do_div(r_stripe, stripe_unit);

- do_div(p_stripe, stripe_unit);
- do_div(r_stripe, stripe_unit);
-
- if (p_stripe == r_stripe)
- return req->wb_bytes;
- return 0;
+ if (p_stripe != r_stripe)
+ return 0;
+ }
+ return min(size, req->wb_bytes);
}

static void
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 71b9c69..6113207 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -571,12 +571,15 @@ int objio_write_pagelist(struct nfs_pgio_data *wdata, int how)
static size_t objio_pg_test(struct nfs_pageio_descriptor *pgio,
struct nfs_page *prev, struct nfs_page *req)
{
- if (!pnfs_generic_pg_test(pgio, prev, req) ||
- pgio->pg_count + req->wb_bytes >
+ unsigned int size;
+
+ size = pnfs_generic_pg_test(pgio, prev, req);
+
+ if (!size || pgio->pg_count + req->wb_bytes >
(unsigned long)pgio->pg_layout_private)
return 0;

- return req->wb_bytes;
+ return min(size, req->wb_bytes);
}

static void objio_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index de6eb16..354c53c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1442,8 +1442,12 @@ size_t
pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
struct nfs_page *req)
{
- if (pgio->pg_lseg == NULL)
- return nfs_generic_pg_test(pgio, prev, req);
+ unsigned int size;
+
+ size = nfs_generic_pg_test(pgio, prev, req);
+
+ if (!size)
+ return 0;

/*
* Test if a nfs_page is fully contained in the pnfs_layout_range.
@@ -1459,10 +1463,11 @@ pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
* first byte that lies outside the pnfs_layout_range. FIXME?
*
*/
- if (req_offset(req) < end_offset(pgio->pg_lseg->pls_range.offset,
+ if (req_offset(req) >= end_offset(pgio->pg_lseg->pls_range.offset,
pgio->pg_lseg->pls_range.length))
- return req->wb_bytes;
- return 0;
+ return 0;
+
+ return min(size, req->wb_bytes);
}
EXPORT_SYMBOL_GPL(pnfs_generic_pg_test);

--
1.8.5.2 (Apple Git-48)


2014-04-24 18:15:45

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 06/18 v2] nfs: add support for multiple nfs reqs per page

Add "page groups" - a circular list of nfs requests (struct nfs_page)
that all reference the same page. This gives nfs read and write paths
the ability to account for sub-page regions independently. This
somewhat follows the design of struct buffer_head's sub-page
accounting.

Only "head" requests are ever added/removed from the inode list in
the buffered write path. "head" and "sub" requests are treated the
same through the read path and the rest of the write/commit path.
Requests are given an extra reference across the life of the list.

Page groups are never rejoined after being split. If the read/write
request fails and the client falls back to another path (i.e. reverts
to the MDS in the pNFS case), the already split requests are pushed
through the recoalescing code again, which may split them further and
then coalesce them into properly sized requests on the wire.
Fragmentation shouldn't be a problem with the current design, because
we flush all requests in a page group when a non-contiguous request is
added, so the only time resplitting should occur is on a resend of a
read or write.

This patch lays the groundwork for sub-page splitting, but does not
actually do any splitting. For now all page groups have one request
as pg_test functions don't yet split pages. There are several related
patches that are needed to support multiple requests per page group.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/direct.c | 7 +-
fs/nfs/pagelist.c | 224 ++++++++++++++++++++++++++++++++++++++++++++---
fs/nfs/read.c | 4 +-
fs/nfs/write.c | 13 ++-
include/linux/nfs_page.h | 13 ++-
5 files changed, 240 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1dd8c62..2c0e08f 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -380,7 +380,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
/* XXX do we need to do the eof zeroing found in async_filler? */
- req = nfs_create_request(dreq->ctx, pagevec[i],
+ req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
pgbase, req_len);
if (IS_ERR(req)) {
result = PTR_ERR(req);
@@ -749,7 +749,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);

- req = nfs_create_request(dreq->ctx, pagevec[i],
+ req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
pgbase, req_len);
if (IS_ERR(req)) {
result = PTR_ERR(req);
@@ -827,6 +827,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
spin_unlock(&dreq->lock);

while (!list_empty(&hdr->pages)) {
+ bool do_destroy = true;
+
req = nfs_list_entry(hdr->pages.next);
nfs_list_remove_request(req);
switch (bit) {
@@ -834,6 +836,7 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
case NFS_IOHDR_NEED_COMMIT:
kref_get(&req->wb_kref);
nfs_mark_request_commit(req, hdr->lseg, &cinfo);
+ do_destroy = false;
}
nfs_unlock_and_release_request(req);
}
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index b120728..d57d675 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -26,6 +26,8 @@

static struct kmem_cache *nfs_page_cachep;

+static void nfs_free_request(struct nfs_page *);
+
bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
{
p->npages = pagecount;
@@ -133,10 +135,151 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
return __nfs_iocounter_wait(c);
}

+/*
+ * nfs_page_group_lock - lock the head of the page group
+ * @req - request in group that is to be locked
+ *
+ * this lock must be held if modifying the page group list
+ */
+void
+nfs_page_group_lock(struct nfs_page *req)
+{
+ struct nfs_page *head = req->wb_head;
+ int err = -EAGAIN;
+
+ WARN_ON_ONCE(head != head->wb_head);
+
+ while (err)
+ err = wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
+ nfs_wait_bit_killable, TASK_KILLABLE);
+}
+
+/*
+ * nfs_page_group_unlock - unlock the head of the page group
+ * @req - request in group that is to be unlocked
+ */
+void
+nfs_page_group_unlock(struct nfs_page *req)
+{
+ struct nfs_page *head = req->wb_head;
+
+ WARN_ON_ONCE(head != head->wb_head);
+
+ smp_mb__before_clear_bit();
+ clear_bit(PG_HEADLOCK, &head->wb_flags);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&head->wb_flags, PG_HEADLOCK);
+}
+
+/*
+ * nfs_page_group_sync_on_bit_locked
+ *
+ * must be called with page group lock held
+ */
+static bool
+nfs_page_group_sync_on_bit_locked(struct nfs_page *req, unsigned int bit)
+{
+ struct nfs_page *head = req->wb_head;
+ struct nfs_page *tmp;
+
+ WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_flags));
+ WARN_ON_ONCE(test_and_set_bit(bit, &req->wb_flags));
+
+ tmp = req->wb_this_page;
+ while (tmp != req) {
+ if (!test_bit(bit, &tmp->wb_flags))
+ return false;
+ tmp = tmp->wb_this_page;
+ }
+
+ /* true! reset all bits */
+ tmp = req;
+ do {
+ clear_bit(bit, &tmp->wb_flags);
+ tmp = tmp->wb_this_page;
+ } while (tmp != req);
+
+ return true;
+}
+
+/*
+ * nfs_page_group_sync_on_bit - set bit on current request, but only
+ * return true if the bit is set for all requests in page group
+ * @req - request in page group
+ * @bit - PG_* bit that is used to sync page group
+ */
+bool nfs_page_group_sync_on_bit(struct nfs_page *req, unsigned int bit)
+{
+ bool ret;
+
+ nfs_page_group_lock(req);
+ ret = nfs_page_group_sync_on_bit_locked(req, bit);
+ nfs_page_group_unlock(req);
+
+ return ret;
+}
+
+/*
+ * nfs_page_group_init - Initialize the page group linkage for @req
+ * @req - a new nfs request
+ * @prev - the previous request in page group, or NULL if @req is the first
+ * or only request in the group (the head).
+ */
+static inline void
+nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
+{
+ WARN_ON_ONCE(prev == req);
+
+ if (!prev) {
+ req->wb_head = req;
+ req->wb_this_page = req;
+ } else {
+ WARN_ON_ONCE(prev->wb_this_page != prev->wb_head);
+ WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags));
+ req->wb_head = prev->wb_head;
+ req->wb_this_page = prev->wb_this_page;
+ prev->wb_this_page = req;
+
+ /* grab extra ref if head request has extra ref from
+ * the write/commit path to handle handoff between write
+ * and commit lists */
+ if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags))
+ kref_get(&req->wb_kref);
+ }
+}
+
+/*
+ * nfs_page_group_destroy - sync the destruction of page groups
+ * @req - request that no longer needs the page group
+ *
+ * releases the page group reference from each member once all
+ * members have called this function.
+ */
+static void
+nfs_page_group_destroy(struct kref *kref)
+{
+ struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
+ struct nfs_page *tmp, *next;
+
+ if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
+ return;
+
+ tmp = req;
+ do {
+ next = tmp->wb_this_page;
+ /* unlink and free */
+ tmp->wb_this_page = tmp;
+ tmp->wb_head = tmp;
+ nfs_free_request(tmp);
+ tmp = next;
+ } while (tmp != req);
+}
+
/**
* nfs_create_request - Create an NFS read/write request.
* @ctx: open context to use
* @page: page to write
+ * @last: last nfs request created for this page group or NULL if head
* @offset: starting offset within the page for the write
* @count: number of bytes to read/write
*
@@ -146,7 +289,8 @@ nfs_iocounter_wait(struct nfs_io_counter *c)
*/
struct nfs_page *
nfs_create_request(struct nfs_open_context *ctx, struct page *page,
- unsigned int offset, unsigned int count)
+ struct nfs_page *last, unsigned int offset,
+ unsigned int count)
{
struct nfs_page *req;
struct nfs_lock_context *l_ctx;
@@ -178,6 +322,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
req->wb_bytes = count;
req->wb_context = get_nfs_open_context(ctx);
kref_init(&req->wb_kref);
+ nfs_page_group_init(req, last);
return req;
}

@@ -235,16 +380,22 @@ static void nfs_clear_request(struct nfs_page *req)
}
}

-
/**
* nfs_release_request - Release the count on an NFS read/write request
* @req: request to release
*
* Note: Should never be called with the spinlock held!
*/
-static void nfs_free_request(struct kref *kref)
+static void nfs_free_request(struct nfs_page *req)
{
- struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
+ WARN_ON_ONCE(req->wb_this_page != req);
+
+ /* extra debug: make sure no sync bits are still set */
+ WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
+ WARN_ON_ONCE(test_bit(PG_UNLOCKPAGE, &req->wb_flags));
+ WARN_ON_ONCE(test_bit(PG_UPTODATE, &req->wb_flags));
+ WARN_ON_ONCE(test_bit(PG_WB_END, &req->wb_flags));
+ WARN_ON_ONCE(test_bit(PG_REMOVE, &req->wb_flags));

/* Release struct file and open context */
nfs_clear_request(req);
@@ -253,7 +404,7 @@ static void nfs_free_request(struct kref *kref)

void nfs_release_request(struct nfs_page *req)
{
- kref_put(&req->wb_kref, nfs_free_request);
+ kref_put(&req->wb_kref, nfs_page_group_destroy);
}

static int nfs_wait_bit_uninterruptible(void *word)
@@ -441,21 +592,66 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
* @desc: destination io descriptor
* @req: request
*
+ * This may split a request into subrequests which are all part of the
+ * same page group.
+ *
* Returns true if the request 'req' was successfully coalesced into the
* existing list of pages 'desc'.
*/
static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
- while (!nfs_pageio_do_add_request(desc, req)) {
- desc->pg_moreio = 1;
- nfs_pageio_doio(desc);
- if (desc->pg_error < 0)
- return 0;
- desc->pg_moreio = 0;
- if (desc->pg_recoalesce)
- return 0;
- }
+ struct nfs_page *subreq;
+ unsigned int bytes_left = 0;
+ unsigned int offset, pgbase;
+
+ nfs_page_group_lock(req);
+
+ subreq = req;
+ bytes_left = subreq->wb_bytes;
+ offset = subreq->wb_offset;
+ pgbase = subreq->wb_pgbase;
+
+ do {
+ if (!nfs_pageio_do_add_request(desc, subreq)) {
+ /* make sure pg_test call(s) did nothing */
+ WARN_ON_ONCE(subreq->wb_bytes != bytes_left);
+ WARN_ON_ONCE(subreq->wb_offset != offset);
+ WARN_ON_ONCE(subreq->wb_pgbase != pgbase);
+
+ nfs_page_group_unlock(req);
+ desc->pg_moreio = 1;
+ nfs_pageio_doio(desc);
+ if (desc->pg_error < 0)
+ return 0;
+ desc->pg_moreio = 0;
+ if (desc->pg_recoalesce)
+ return 0;
+ /* retry add_request for this subreq */
+ nfs_page_group_lock(req);
+ continue;
+ }
+
+ /* check for buggy pg_test call(s) */
+ WARN_ON_ONCE(subreq->wb_bytes + subreq->wb_pgbase > PAGE_SIZE);
+ WARN_ON_ONCE(subreq->wb_bytes > bytes_left);
+ WARN_ON_ONCE(subreq->wb_bytes == 0);
+
+ bytes_left -= subreq->wb_bytes;
+ offset += subreq->wb_bytes;
+ pgbase += subreq->wb_bytes;
+
+ if (bytes_left) {
+ subreq = nfs_create_request(req->wb_context,
+ req->wb_page,
+ subreq, pgbase, bytes_left);
+ nfs_lock_request(subreq);
+ subreq->wb_offset = offset;
+ subreq->wb_index = req->wb_index;
+ }
+ } while (bytes_left > 0);
+
+ nfs_page_group_unlock(req);
return 1;
}

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index a48b909..f814e8a 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -85,7 +85,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);
- new = nfs_create_request(ctx, page, 0, len);
+ new = nfs_create_request(ctx, page, NULL, 0, len);
if (IS_ERR(new)) {
unlock_page(page);
return PTR_ERR(new);
@@ -311,7 +311,7 @@ readpage_async_filler(void *data, struct page *page)
if (len == 0)
return nfs_return_empty_page(page);

- new = nfs_create_request(desc->ctx, page, 0, len);
+ new = nfs_create_request(desc->ctx, page, NULL, 0, len);
if (IS_ERR(new))
goto out_error;

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5f1c965..a65755a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -367,6 +367,8 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
{
struct nfs_inode *nfsi = NFS_I(inode);

+ WARN_ON_ONCE(req->wb_this_page != req);
+
/* Lock the request! */
nfs_lock_request(req);

@@ -383,6 +385,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
set_page_private(req->wb_page, (unsigned long)req);
}
nfsi->npages++;
+ set_bit(PG_INODE_REF, &req->wb_flags);
kref_get(&req->wb_kref);
spin_unlock(&inode->i_lock);
}
@@ -567,6 +570,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
{
struct nfs_commit_info cinfo;
unsigned long bytes = 0;
+ bool do_destroy;

if (test_bit(NFS_IOHDR_REDO, &hdr->flags))
goto out;
@@ -596,6 +600,7 @@ remove_req:
next:
nfs_unlock_request(req);
nfs_end_page_writeback(req->wb_page);
+ do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags);
nfs_release_request(req);
}
out:
@@ -700,6 +705,10 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
if (req == NULL)
goto out_unlock;

+ /* should be handled by nfs_flush_incompatible */
+ WARN_ON_ONCE(req->wb_head != req);
+ WARN_ON_ONCE(req->wb_this_page != req);
+
rqend = req->wb_offset + req->wb_bytes;
/*
* Tell the caller to flush out the request if
@@ -761,7 +770,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
req = nfs_try_to_update_request(inode, page, offset, bytes);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, page, offset, bytes);
+ req = nfs_create_request(ctx, page, NULL, offset, bytes);
if (IS_ERR(req))
goto out;
nfs_inode_add_request(inode, req);
@@ -805,6 +814,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
return 0;
l_ctx = req->wb_lock_context;
do_flush = req->wb_page != page || req->wb_context != ctx;
+ /* for now, flush if more than 1 request in page_group */
+ do_flush |= req->wb_this_page != req;
if (l_ctx && ctx->dentry->d_inode->i_flock != NULL) {
do_flush |= l_ctx->lockowner.l_owner != current->files
|| l_ctx->lockowner.l_pid != current->tgid;
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 40faddd..b99deb7 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -26,6 +26,9 @@ enum {
PG_MAPPED, /* page private set for buffered io */
PG_CLEAN, /* write succeeded */
PG_COMMIT_TO_DS, /* used by pnfs layouts */
+ PG_INODE_REF, /* extra ref held by inode (head req only) */
+ PG_HEADLOCK, /* page group lock of wb_head */
+ PG_TEARDOWN, /* page group sync for destroy */
};

struct nfs_inode;
@@ -41,6 +44,8 @@ struct nfs_page {
struct kref wb_kref; /* reference count */
unsigned long wb_flags;
struct nfs_write_verifier wb_verf; /* Commit cookie */
+ struct nfs_page *wb_this_page; /* list of reqs for this page */
+ struct nfs_page *wb_head; /* head pointer for req list */
};

struct nfs_pageio_descriptor;
@@ -87,9 +92,10 @@ struct nfs_pageio_descriptor {

extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
struct page *page,
+ struct nfs_page *last,
unsigned int offset,
unsigned int count);
-extern void nfs_release_request(struct nfs_page *req);
+extern void nfs_release_request(struct nfs_page *);


extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
@@ -108,7 +114,10 @@ extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *req);
extern int nfs_wait_on_request(struct nfs_page *);
extern void nfs_unlock_request(struct nfs_page *req);
-extern void nfs_unlock_and_release_request(struct nfs_page *req);
+extern void nfs_unlock_and_release_request(struct nfs_page *);
+extern void nfs_page_group_lock(struct nfs_page *);
+extern void nfs_page_group_unlock(struct nfs_page *);
+extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);

/*
* Lock the page of an asynchronous request
--
1.8.5.2 (Apple Git-48)


2014-04-24 18:15:43

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 05/18 v2] nfs: call nfs_can_coalesce_requests for every req

Call nfs_can_coalesce_requests for every request, even the first one.
This is needed by future patches, which give pg_test a way to tell
add_request to reduce the size of the request.

@prev can now be NULL in nfs_can_coalesce_requests and in the pg_test
functions.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 3 +++
fs/nfs/pagelist.c | 34 +++++++++++++++++++---------------
2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index ba9a9aa..9319427 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -929,6 +929,9 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
!nfs_generic_pg_test(pgio, prev, req))
return 0;

+ if (!prev)
+ return req->wb_bytes;
+
p_stripe = (u64)req_offset(prev);
r_stripe = (u64)req_offset(req);
stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 4cd7a2a..b120728 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -289,6 +289,8 @@ nfs_wait_on_request(struct nfs_page *req)
size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev, struct nfs_page *req)
{
+ if (!prev)
+ return req->wb_bytes;
/*
* FIXME: ideally we should be able to coalesce all requests
* that are not block boundary aligned, but currently this
@@ -370,17 +372,20 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
{
size_t size;

- if (!nfs_match_open_context(req->wb_context, prev->wb_context))
- return false;
- if (req->wb_context->dentry->d_inode->i_flock != NULL &&
- !nfs_match_lock_context(req->wb_lock_context, prev->wb_lock_context))
- return false;
- if (req->wb_pgbase != 0)
- return false;
- if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
- return false;
- if (req_offset(req) != req_offset(prev) + prev->wb_bytes)
- return false;
+ if (prev) {
+ if (!nfs_match_open_context(req->wb_context, prev->wb_context))
+ return false;
+ if (req->wb_context->dentry->d_inode->i_flock != NULL &&
+ !nfs_match_lock_context(req->wb_lock_context,
+ prev->wb_lock_context))
+ return false;
+ if (req->wb_pgbase != 0)
+ return false;
+ if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
+ return false;
+ if (req_offset(req) != req_offset(prev) + prev->wb_bytes)
+ return false;
+ }
size = pgio->pg_ops->pg_test(pgio, prev, req);
WARN_ON_ONCE(size && size != req->wb_bytes);
return size > 0;
@@ -397,17 +402,16 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
+ struct nfs_page *prev = NULL;
if (desc->pg_count != 0) {
- struct nfs_page *prev;
-
prev = nfs_list_entry(desc->pg_list.prev);
- if (!nfs_can_coalesce_requests(prev, req, desc))
- return 0;
} else {
if (desc->pg_ops->pg_init)
desc->pg_ops->pg_init(desc, req);
desc->pg_base = req->wb_pgbase;
}
+ if (!nfs_can_coalesce_requests(prev, req, desc))
+ return 0;
nfs_list_remove_request(req);
nfs_list_add_request(req, &desc->pg_list);
desc->pg_count += req->wb_bytes;
--
1.8.5.2 (Apple Git-48)
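
The pg_test contract changed by the patch above can be sketched in
userspace C. This is a simplified illustration with invented struct and
function names, not the kernel code: the point is that pg_test is now
called for every request, and a NULL @prev means there is nothing to
coalesce against yet, so the full request is accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct nfs_page_sketch {
	size_t wb_bytes;	/* length of this request */
	unsigned long offset;	/* absolute offset of the request */
};

/*
 * Sketch of the new pg_test contract: called for every request,
 * including the first, where prev == NULL.
 */
static size_t pg_test_sketch(const struct nfs_page_sketch *prev,
			     const struct nfs_page_sketch *req)
{
	if (!prev)
		return req->wb_bytes;	/* first request: accept as-is */
	if (req->offset != prev->offset + prev->wb_bytes)
		return 0;		/* not contiguous: start new I/O */
	return req->wb_bytes;
}
```

A return of 0 still means "cannot coalesce"; a nonzero return is how
many bytes of @req may join the current I/O.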


2014-04-25 15:12:54

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 15/18 v2] pnfs: support multiple verfs per direct req

On Apr 25, 2014, at 10:15 AM, Anna Schumaker <[email protected]> wrote:

> On 04/24/2014 02:15 PM, Weston Andros Adamson wrote:
>> Support direct requests that span multiple pnfs data servers by
>> comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
>> Continue to use dreq->verf if the MDS is used / non-pNFS.
>>
>> Signed-off-by: Weston Andros Adamson <[email protected]>
>> ---
>> fs/nfs/direct.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---
>> fs/nfs/nfs4filelayout.c | 6 +++
>> include/linux/nfs.h | 5 ++-
>> include/linux/nfs_xdr.h | 2 +
>> 4 files changed, 105 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>> index 2c0e08f..9349933 100644
>> --- a/fs/nfs/direct.c
>> +++ b/fs/nfs/direct.c
>> @@ -108,6 +108,93 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
>> return atomic_dec_and_test(&dreq->io_count);
>> }
>>
>> +/*
>> + * nfs_direct_select_verf - select the right verifier
>> + * @dreq - direct request possibly spanning multiple servers
>> + * @ds_clp - nfs_client of data server or NULL if MDS / non-pnfs
>> + * @ds_idx - index of data server in data server list, only valid if ds_clp set
>> + *
>> + * returns the correct verifier to use given the role of the server
>> + */
>> +static struct nfs_writeverf *
>> +nfs_direct_select_verf(struct nfs_direct_req *dreq,
>> + struct nfs_client *ds_clp,
>> + int ds_idx)
>> +{
>> + struct nfs_writeverf *verfp = &dreq->verf;
>> +
>> + if (ds_clp) {
>> + /* pNFS is in use, use the DS verf */
>> + if (ds_idx >= 0 && ds_idx < dreq->ds_cinfo.nbuckets)
>
> The struct pnfs_ds_commit_info is empty if CONFIG_NFS_V4_1=n, so this won't compile.
>
> Anna

Good catch, I'll iterate through the patchset and test with v4.1 disabled.

Time to add some #ifdefs

-dros

>
>> + verfp = &dreq->ds_cinfo.buckets[ds_idx].direct_verf;
>> + else
>> + WARN_ON_ONCE(1);
>> + }
>> + return verfp;
>> +}
>> +
>> +
>> +/*
>> + * nfs_direct_set_hdr_verf - set the write/commit verifier
>> + * @dreq - direct request possibly spanning multiple servers
>> + * @hdr - pageio header to validate against previously seen verfs
>> + *
>> + * Set the server's (MDS or DS) "seen" verifier
>> + */
>> +static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
>> + struct nfs_pgio_header *hdr)
>> +{
>> + struct nfs_writeverf *verfp;
>> +
>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>> + hdr->data->ds_idx);
>> + WARN_ON_ONCE(verfp->committed >= 0);
>> + memcpy(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>> + WARN_ON_ONCE(verfp->committed < 0);
>> +}
>> +
>> +/*
>> + * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
>> + * @dreq - direct request possibly spanning multiple servers
>> + * @hdr - pageio header to validate against previously seen verf
>> + *
>> + * set the server's "seen" verf if not initialized.
>> + * returns result of comparison between @hdr->verf and the "seen"
>> + * verf of the server used by @hdr (DS or MDS)
>> + */
>> +static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
>> + struct nfs_pgio_header *hdr)
>> +{
>> + struct nfs_writeverf *verfp;
>> +
>> + verfp = nfs_direct_select_verf(dreq, hdr->data->ds_clp,
>> + hdr->data->ds_idx);
>> + if (verfp->committed < 0) {
>> + nfs_direct_set_hdr_verf(dreq, hdr);
>> + return 0;
>> + }
>> + return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
>> +}
>> +
>> +/*
>> + * nfs_direct_cmp_commit_data_verf - compare verifier for commit data
>> + * @dreq - direct request possibly spanning multiple servers
>> + * @data - commit data to validate against previously seen verf
>> + *
>> + * returns result of comparison between @data->verf and the verf of
>> + * the server used by @data (DS or MDS)
>> + */
>> +static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
>> + struct nfs_commit_data *data)
>> +{
>> + struct nfs_writeverf *verfp;
>> +
>> + verfp = nfs_direct_select_verf(dreq, data->ds_clp,
>> + data->ds_commit_index);
>> + WARN_ON_ONCE(verfp->committed < 0);
>> + return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
>> +}
>> +
>> /**
>> * nfs_direct_IO - NFS address space operation for direct I/O
>> * @rw: direction (read or write)
>> @@ -168,6 +255,7 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void)
>> kref_get(&dreq->kref);
>> init_completion(&dreq->completion);
>> INIT_LIST_HEAD(&dreq->mds_cinfo.list);
>> + dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */
>> INIT_WORK(&dreq->work, nfs_direct_write_schedule_work);
>> spin_lock_init(&dreq->lock);
>>
>> @@ -602,7 +690,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
>> dprintk("NFS: %5u commit failed with error %d.\n",
>> data->task.tk_pid, status);
>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>> - } else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
>> + } else if (nfs_direct_cmp_commit_data_verf(dreq, data)) {
>> dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
>> dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>> }
>> @@ -811,13 +899,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
>> if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
>> bit = NFS_IOHDR_NEED_RESCHED;
>> else if (dreq->flags == 0) {
>> - memcpy(&dreq->verf, &hdr->verf,
>> - sizeof(dreq->verf));
>> + nfs_direct_set_hdr_verf(dreq, hdr);
>> bit = NFS_IOHDR_NEED_COMMIT;
>> dreq->flags = NFS_ODIRECT_DO_COMMIT;
>> } else if (dreq->flags == NFS_ODIRECT_DO_COMMIT) {
>> - if (memcmp(&dreq->verf, &hdr->verf, sizeof(dreq->verf))) {
>> - dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
>> + if (nfs_direct_set_or_cmp_hdr_verf(dreq, hdr)) {
>> + dreq->flags =
>> + NFS_ODIRECT_RESCHED_WRITES;
>> bit = NFS_IOHDR_NEED_RESCHED;
>> } else
>> bit = NFS_IOHDR_NEED_COMMIT;
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 7a665e0..0ebc521 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -560,6 +560,7 @@ filelayout_read_pagelist(struct nfs_pgio_data *data)
>> /* No multipath support. Use first DS */
>> atomic_inc(&ds->ds_clp->cl_count);
>> data->ds_clp = ds->ds_clp;
>> + data->ds_idx = idx;
>> fh = nfs4_fl_select_ds_fh(lseg, j);
>> if (fh)
>> data->args.fh = fh;
>> @@ -603,6 +604,7 @@ filelayout_write_pagelist(struct nfs_pgio_data *data, int sync)
>> data->pgio_done_cb = filelayout_write_done_cb;
>> atomic_inc(&ds->ds_clp->cl_count);
>> data->ds_clp = ds->ds_clp;
>> + data->ds_idx = idx;
>> fh = nfs4_fl_select_ds_fh(lseg, j);
>> if (fh)
>> data->args.fh = fh;
>> @@ -875,6 +877,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>> for (i = 0; i < size; i++) {
>> INIT_LIST_HEAD(&buckets[i].written);
>> INIT_LIST_HEAD(&buckets[i].committing);
>> + /* mark direct verifier as unset */
>> + buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
>> }
>>
>> spin_lock(cinfo->lock);
>> @@ -885,6 +889,8 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
>> &buckets[i].written);
>> list_splice(&cinfo->ds->buckets[i].committing,
>> &buckets[i].committing);
>> + buckets[i].direct_verf.committed =
>> + cinfo->ds->buckets[i].direct_verf.committed;
>> buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
>> buckets[i].clseg = cinfo->ds->buckets[i].clseg;
>> }
>> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
>> index 3e794c1..610af51 100644
>> --- a/include/linux/nfs.h
>> +++ b/include/linux/nfs.h
>> @@ -46,6 +46,9 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
>> enum nfs3_stable_how {
>> NFS_UNSTABLE = 0,
>> NFS_DATA_SYNC = 1,
>> - NFS_FILE_SYNC = 2
>> + NFS_FILE_SYNC = 2,
>> +
>> + /* used by direct.c to mark verf as invalid */
>> + NFS_INVALID_STABLE_HOW = -1
>> };
>> #endif /* _LINUX_NFS_H */
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 29828c7..bb9fb88 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1111,6 +1111,7 @@ struct pnfs_commit_bucket {
>> struct list_head committing;
>> struct pnfs_layout_segment *wlseg;
>> struct pnfs_layout_segment *clseg;
>> + struct nfs_writeverf direct_verf;
>> };
>>
>> struct pnfs_ds_commit_info {
>> @@ -1294,6 +1295,7 @@ struct nfs_pgio_data {
>> __u64 mds_offset; /* Filelayout dense stripe */
>> struct nfs_page_array pages;
>> struct nfs_client *ds_clp; /* pNFS data server */
>> + int ds_idx; /* ds index if ds_clp is set */
>> };
>>
>> struct nfs_rw_header {
>


2014-04-24 18:15:37

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 01/18 v2] pnfs: fix race in filelayout commit path

Hold the lock while modifying commit info dataserver buckets.

The following oops can be reproduced by running iozone for a while against
a 2 DS pynfs filelayout server.

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache
CPU: 0 PID: 903 Comm: iozone Not tainted 3.15.0-rc1-branch-dros_testing+ #44
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
task: ffff880078164480 ti: ffff88006e972000 task.ti: ffff88006e972000
RIP: 0010:[<ffffffffa01936e1>] [<ffffffffa01936e1>] nfs_init_commit+0x22/0x
RSP: 0018:ffff88006e973d30 EFLAGS: 00010246
RAX: ffff88006e973e00 RBX: ffff88006e828800 RCX: ffff88006e973e10
RDX: 0000000000000000 RSI: ffff88006e973e00 RDI: dead4ead00000000
RBP: ffff88006e973d38 R08: ffff88006e8289d8 R09: 0000000000000000
R10: ffff88006e8289d8 R11: 0000000000016988 R12: ffff88006e973b98
R13: ffff88007a0a6648 R14: ffff88006e973e10 R15: ffff88006e828800
FS: 00007f2ce396b740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03278a1000 CR3: 0000000079043000 CR4: 00000000001407f0
Stack:
ffff88006e8289d8 ffff88006e973da8 ffffffffa00f144f ffff88006e9478c0
ffff88006e973e00 ffff88006de21080 0000000100000002 ffff880079be6c48
ffff88006e973d70 ffff88006e973d70 ffff88006e973e10 ffff88006de21080
Call Trace:
[<ffffffffa00f144f>] filelayout_commit_pagelist+0x1ae/0x34a [nfs_layout_nfsv
[<ffffffffa0194f72>] nfs_generic_commit_list+0x92/0xc4 [nfs]
[<ffffffffa0195053>] nfs_commit_inode+0xaf/0x114 [nfs]
[<ffffffffa01892bd>] nfs_file_fsync_commit+0x82/0xbe [nfs]
[<ffffffffa01ceb0d>] nfs4_file_fsync+0x59/0x9b [nfsv4]
[<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
[<ffffffff8114ee60>] vfs_fsync+0x1c/0x1e
[<ffffffffa01891c2>] nfs_file_flush+0x7f/0x84 [nfs]
[<ffffffff81127a43>] filp_close+0x3c/0x72
[<ffffffff81140e12>] __close_fd+0x82/0x9a
[<ffffffff81127a9c>] SyS_close+0x23/0x4c
[<ffffffff814acd12>] system_call_fastpath+0x16/0x1b
Code: 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8
RIP [<ffffffffa01936e1>] nfs_init_commit+0x22/0xe1 [nfs]
RSP <ffff88006e973d30>
---[ end trace 732fe6419b235e2f ]---

Suggested-by: Trond Myklebust <[email protected]>
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 7954e16..9fd7ceb 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -1067,6 +1067,7 @@ filelayout_choose_commit_list(struct nfs_page *req,
*/
j = nfs4_fl_calc_j_index(lseg, req_offset(req));
i = select_bucket_index(fl, j);
+ spin_lock(cinfo->lock);
buckets = cinfo->ds->buckets;
list = &buckets[i].written;
if (list_empty(list)) {
@@ -1080,6 +1081,7 @@ filelayout_choose_commit_list(struct nfs_page *req,
}
set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
cinfo->ds->nwritten++;
+ spin_unlock(cinfo->lock);
return list;
}

@@ -1176,6 +1178,7 @@ transfer_commit_list(struct list_head *src, struct list_head *dst,
return ret;
}

+/* Note called with cinfo->lock held. */
static int
filelayout_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
struct nfs_commit_info *cinfo,
@@ -1220,15 +1223,18 @@ static void filelayout_recover_commit_reqs(struct list_head *dst,
struct nfs_commit_info *cinfo)
{
struct pnfs_commit_bucket *b;
+ struct pnfs_layout_segment *freeme;
int i;

+restart:
spin_lock(cinfo->lock);
for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
if (transfer_commit_list(&b->written, dst, cinfo, 0)) {
- spin_unlock(cinfo->lock);
- pnfs_put_lseg(b->wlseg);
+ freeme = b->wlseg;
b->wlseg = NULL;
- spin_lock(cinfo->lock);
+ spin_unlock(cinfo->lock);
+ pnfs_put_lseg(freeme);
+ goto restart;
}
}
cinfo->ds->nwritten = 0;
@@ -1243,6 +1249,7 @@ alloc_ds_commits(struct nfs_commit_info *cinfo, struct list_head *list)
struct nfs_commit_data *data;
int i, j;
unsigned int nreq = 0;
+ struct pnfs_layout_segment *freeme;

fl_cinfo = cinfo->ds;
bucket = fl_cinfo->buckets;
@@ -1253,8 +1260,10 @@ alloc_ds_commits(struct nfs_commit_info *cinfo, struct list_head *list)
if (!data)
break;
data->ds_commit_index = i;
+ spin_lock(cinfo->lock);
data->lseg = bucket->clseg;
bucket->clseg = NULL;
+ spin_unlock(cinfo->lock);
list_add(&data->pages, list);
nreq++;
}
@@ -1264,8 +1273,11 @@ alloc_ds_commits(struct nfs_commit_info *cinfo, struct list_head *list)
if (list_empty(&bucket->committing))
continue;
nfs_retry_commit(&bucket->committing, bucket->clseg, cinfo);
- pnfs_put_lseg(bucket->clseg);
+ spin_lock(cinfo->lock);
+ freeme = bucket->clseg;
bucket->clseg = NULL;
+ spin_unlock(cinfo->lock);
+ pnfs_put_lseg(freeme);
}
/* Caller will clean up entries put on list */
return nreq;
--
1.8.5.2 (Apple Git-48)
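
The locking pattern the fix above applies in filelayout_recover_commit_reqs
and alloc_ds_commits can be sketched in plain userspace C. Names and the
refcount are invented for illustration; the kernel code uses
spin_lock(cinfo->lock) and pnfs_put_lseg():

```c
#include <assert.h>
#include <stddef.h>

/* Toy layout segment with a bare refcount (illustration only). */
struct lseg { int refcount; };

static void put_lseg(struct lseg *l)
{
	if (l && --l->refcount == 0)
		/* the real code would free the segment here */;
}

/*
 * Detach the bucket's reference while the lock is held, then release
 * it only after dropping the lock, so the put never runs under the
 * spinlock and the bucket is never seen half-modified.
 */
static void recover_bucket(struct lseg **wlseg_slot)
{
	struct lseg *freeme;

	/* spin_lock(cinfo->lock) would be taken here */
	freeme = *wlseg_slot;	/* take over the bucket's reference */
	*wlseg_slot = NULL;
	/* spin_unlock(cinfo->lock) here -- before the put */
	put_lseg(freeme);	/* release with no lock held */
}
```

The patch's "freeme" local plus goto-restart loop is this same idea:
never call pnfs_put_lseg() with cinfo->lock held, and never modify a
bucket without it.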


2014-04-24 18:15:50

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 09/18 v2] nfs: page group support in nfs_mark_uptodate

Change how nfs_mark_uptodate checks to see if writes cover a whole page.

This patch should have no effect yet, since all page groups currently
have one request, but it will come into play when pg_test functions are
modified to split pages into sub-page regions.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/write.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 67 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a3e3065..eb1ab6a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -154,18 +154,78 @@ static void nfs_set_pageerror(struct page *page)
nfs_zap_mapping(page_file_mapping(page)->host, page_file_mapping(page));
}

+/*
+ * nfs_page_group_search_locked
+ * @head - head request of page group
+ * @page_offset - offset into page
+ *
+ * Search page group with head @head to find a request that contains the
+ * page offset @page_offset.
+ *
+ * Returns a pointer to the first matching nfs request, or NULL if no
+ * match is found.
+ *
+ * Must be called with the page group lock held
+ */
+static struct nfs_page *
+nfs_page_group_search_locked(struct nfs_page *head, unsigned int page_offset)
+{
+ struct nfs_page *req;
+
+ WARN_ON_ONCE(head != head->wb_head);
+ WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_head->wb_flags));
+
+ req = head;
+ do {
+ if (page_offset >= req->wb_pgbase &&
+ page_offset < (req->wb_pgbase + req->wb_bytes))
+ return req;
+
+ req = req->wb_this_page;
+ } while (req != head);
+
+ return NULL;
+}
+
+/*
+ * nfs_page_group_covers_page
+ * @head - head request of page group
+ *
+ * Return true if the page group with head @head covers the whole page,
+ * returns false otherwise
+ */
+static bool nfs_page_group_covers_page(struct nfs_page *req)
+{
+ struct nfs_page *tmp;
+ unsigned int pos = 0;
+ unsigned int len = nfs_page_length(req->wb_page);
+
+ nfs_page_group_lock(req);
+
+ do {
+ tmp = nfs_page_group_search_locked(req->wb_head, pos);
+ if (tmp) {
+ /* no way this should happen */
+ WARN_ON_ONCE(tmp->wb_pgbase != pos);
+ pos += tmp->wb_bytes - (pos - tmp->wb_pgbase);
+ }
+ } while (tmp && pos < len);
+
+ nfs_page_group_unlock(req);
+ WARN_ON_ONCE(pos > len);
+ return pos == len;
+}
+
/* We can set the PG_uptodate flag if we see that a write request
* covers the full page.
*/
-static void nfs_mark_uptodate(struct page *page, unsigned int base, unsigned int count)
+static void nfs_mark_uptodate(struct nfs_page *req)
{
- if (PageUptodate(page))
- return;
- if (base != 0)
+ if (PageUptodate(req->wb_page))
return;
- if (count != nfs_page_length(page))
+ if (!nfs_page_group_covers_page(req))
return;
- SetPageUptodate(page);
+ SetPageUptodate(req->wb_page);
}

static int wb_priority(struct writeback_control *wbc)
@@ -796,7 +856,7 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
return PTR_ERR(req);
/* Update file length */
nfs_grow_file(page, offset, count);
- nfs_mark_uptodate(page, req->wb_pgbase, req->wb_bytes);
+ nfs_mark_uptodate(req);
nfs_mark_request_dirty(req);
nfs_unlock_and_release_request(req);
return 0;
--
1.8.5.2 (Apple Git-48)
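
The coverage walk that nfs_page_group_covers_page performs above can be
modeled in a few lines of userspace C. This is a simplified sketch with
an array instead of the circular wb_this_page list and no locking:

```c
#include <assert.h>
#include <stdbool.h>

/* One sub-page request: covers [pgbase, pgbase + bytes) of the page. */
struct subreq { unsigned int pgbase, bytes; };

/*
 * Walk forward from offset 0, looking for a group member containing
 * the current offset, until the page length is reached (fully covered)
 * or no member matches (a hole).
 */
static bool covers_page(const struct subreq *reqs, int n, unsigned int pagelen)
{
	unsigned int pos = 0;

	while (pos < pagelen) {
		int i, found = -1;

		for (i = 0; i < n; i++)
			if (pos >= reqs[i].pgbase &&
			    pos < reqs[i].pgbase + reqs[i].bytes) {
				found = i;
				break;
			}
		if (found < 0)
			return false;	/* hole: page not fully covered */
		pos = reqs[found].pgbase + reqs[found].bytes;
	}
	return true;
}
```

Only when the group covers the whole page may PG_uptodate be set, which
is exactly the condition the new nfs_mark_uptodate enforces.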


2014-04-24 18:15:57

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 13/18 v2] nfs: use > 1 request to handle bsize < PAGE_SIZE

Use the newly added support for multiple requests per page for
rsize/wsize < PAGE_SIZE, instead of allocating multiple read/write
data structures per pageio header.

This allows us to get rid of nfs_pgio_multi.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/pageio.c | 56 ++-----------------------------------------------------
fs/nfs/pagelist.c | 22 ++++++++--------------
2 files changed, 10 insertions(+), 68 deletions(-)

diff --git a/fs/nfs/pageio.c b/fs/nfs/pageio.c
index b9d06c8..1899977 100644
--- a/fs/nfs/pageio.c
+++ b/fs/nfs/pageio.c
@@ -218,50 +218,6 @@ static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
}

/*
- * Generate multiple small requests to read or write a single
- * contiguous dirty on one page.
- */
-static int nfs_pgio_multi(struct nfs_pageio_descriptor *desc,
- struct nfs_pgio_header *hdr)
-{
- struct nfs_page *req = hdr->req;
- struct page *page = req->wb_page;
- struct nfs_pgio_data *data;
- size_t wsize = desc->pg_bsize, nbytes;
- unsigned int offset;
- int requests = 0;
- struct nfs_commit_info cinfo;
-
- nfs_init_cinfo(&cinfo, desc->pg_inode, desc->pg_dreq);
-
- if ((desc->pg_ioflags & FLUSH_COND_STABLE) &&
- (desc->pg_moreio || nfs_reqs_to_commit(&cinfo) ||
- desc->pg_count > wsize))
- desc->pg_ioflags &= ~FLUSH_COND_STABLE;
-
- offset = 0;
- nbytes = desc->pg_count;
- do {
- size_t len = min(nbytes, wsize);
-
- data = nfs_pgio_data_alloc(hdr, 1);
- if (!data)
- return nfs_pgio_error(desc, hdr);
- data->pages.pagevec[0] = page;
- nfs_pgio_rpcsetup(data, len, offset, desc->pg_ioflags, &cinfo);
- list_add(&data->list, &hdr->rpc_list);
- requests++;
- nbytes -= len;
- offset += len;
- } while (nbytes != 0);
-
- nfs_list_remove_request(req);
- nfs_list_add_request(req, &hdr->pages);
- desc->pg_rpc_callops = &nfs_pgio_common_ops;
- return 0;
-}
-
-/*
* Create an RPC task for the given write request and kick it.
* The page must have been locked by the caller.
*
@@ -269,8 +225,8 @@ static int nfs_pgio_multi(struct nfs_pageio_descriptor *desc,
* This is the case if nfs_updatepage detects a conflicting request
* that has been written but not committed.
*/
-static int nfs_pgio_one(struct nfs_pageio_descriptor *desc,
- struct nfs_pgio_header *hdr)
+int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct nfs_page *req;
struct page **pages;
@@ -302,14 +258,6 @@ static int nfs_pgio_one(struct nfs_pageio_descriptor *desc,
desc->pg_rpc_callops = &nfs_pgio_common_ops;
return 0;
}
-
-int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
- struct nfs_pgio_header *hdr)
-{
- if (desc->pg_bsize < PAGE_CACHE_SIZE)
- return nfs_pgio_multi(desc, hdr);
- return nfs_pgio_one(desc, hdr);
-}
EXPORT_SYMBOL_GPL(nfs_generic_pgio);

static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6257f31..78933ac 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -440,21 +440,13 @@ nfs_wait_on_request(struct nfs_page *req)
size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
struct nfs_page *prev, struct nfs_page *req)
{
- if (!prev)
- return req->wb_bytes;
- /*
- * FIXME: ideally we should be able to coalesce all requests
- * that are not block boundary aligned, but currently this
- * is problematic for the case of bsize < PAGE_CACHE_SIZE,
- * since nfs_flush_multi and nfs_pagein_multi assume you
- * can have only one struct nfs_page.
- */
- if (desc->pg_bsize < PAGE_SIZE)
+ if (desc->pg_count > desc->pg_bsize) {
+ /* should never happen */
+ WARN_ON_ONCE(1);
return 0;
+ }

- if (desc->pg_count + req->wb_bytes <= desc->pg_bsize)
- return req->wb_bytes;
- return 0;
+ return min(desc->pg_bsize - desc->pg_count, (size_t)req->wb_bytes);
}
EXPORT_SYMBOL_GPL(nfs_generic_pg_test);

@@ -534,7 +526,9 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
return false;
}
size = pgio->pg_ops->pg_test(pgio, prev, req);
- WARN_ON_ONCE(size && size != req->wb_bytes);
+ WARN_ON_ONCE(size > req->wb_bytes);
+ if (size && size < req->wb_bytes)
+ req->wb_bytes = size;
return size > 0;
}

--
1.8.5.2 (Apple Git-48)
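
The new size-budget behavior of nfs_generic_pg_test above, together
with the trim in nfs_can_coalesce_requests, can be sketched as follows.
This is a userspace simplification using plain size_t arguments instead
of the descriptor and request structs:

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Mirrors the new nfs_generic_pg_test(): rather than all-or-nothing,
 * accept however many bytes of the request still fit in the
 * rsize/wsize budget (pg_bsize) after pg_count bytes are queued.
 */
static size_t generic_pg_test_sketch(size_t pg_bsize, size_t pg_count,
				     size_t wb_bytes)
{
	if (pg_count > pg_bsize)
		return 0;	/* should never happen */
	return min_sz(pg_bsize - pg_count, wb_bytes);
}
```

When the return value is nonzero but smaller than wb_bytes, the caller
trims req->wb_bytes to it, so the remainder of the page becomes a
follow-up request instead of requiring a second data structure per
header.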


2014-04-24 18:15:53

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 11/18 v2] nfs: allow coalescing of subpage requests

Remove the checks that requests cover a whole page.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/pagelist.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d57d675..6257f31 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -530,10 +530,6 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
!nfs_match_lock_context(req->wb_lock_context,
prev->wb_lock_context))
return false;
- if (req->wb_pgbase != 0)
- return false;
- if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
- return false;
if (req_offset(req) != req_offset(prev) + prev->wb_bytes)
return false;
}
--
1.8.5.2 (Apple Git-48)


2014-04-24 18:15:52

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 10/18 v2] pnfs: clean up filelayout_alloc_commit_info

Remove an unneeded else statement and clean up how the commit info
dataserver buckets are replaced.

Suggested-by: Trond Myklebust <[email protected]>
Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 9319427..9cea935 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -850,11 +850,15 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
{
struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
struct pnfs_commit_bucket *buckets;
- int size;
+ int size, i;

if (fl->commit_through_mds)
return 0;
- if (cinfo->ds->nbuckets != 0) {
+
+ size = (fl->stripe_type == STRIPE_SPARSE) ?
+ fl->dsaddr->ds_num : fl->dsaddr->stripe_count;
+
+ if (cinfo->ds->nbuckets >= size) {
/* This assumes there is only one IOMODE_RW lseg. What
* we really want to do is have a layout_hdr level
* dictionary of <multipath_list4, fh> keys, each
@@ -864,30 +868,32 @@ filelayout_alloc_commit_info(struct pnfs_layout_segment *lseg,
return 0;
}

- size = (fl->stripe_type == STRIPE_SPARSE) ?
- fl->dsaddr->ds_num : fl->dsaddr->stripe_count;
-
buckets = kcalloc(size, sizeof(struct pnfs_commit_bucket),
gfp_flags);
if (!buckets)
return -ENOMEM;
- else {
- int i;
+ for (i = 0; i < size; i++) {
+ INIT_LIST_HEAD(&buckets[i].written);
+ INIT_LIST_HEAD(&buckets[i].committing);
+ }

- spin_lock(cinfo->lock);
- if (cinfo->ds->nbuckets != 0)
- kfree(buckets);
- else {
- cinfo->ds->buckets = buckets;
- cinfo->ds->nbuckets = size;
- for (i = 0; i < size; i++) {
- INIT_LIST_HEAD(&buckets[i].written);
- INIT_LIST_HEAD(&buckets[i].committing);
- }
- }
- spin_unlock(cinfo->lock);
- return 0;
+ spin_lock(cinfo->lock);
+ if (cinfo->ds->nbuckets >= size)
+ goto out;
+ for (i = 0; i < cinfo->ds->nbuckets; i++) {
+ list_splice(&cinfo->ds->buckets[i].written,
+ &buckets[i].written);
+ list_splice(&cinfo->ds->buckets[i].committing,
+ &buckets[i].committing);
+ buckets[i].wlseg = cinfo->ds->buckets[i].wlseg;
+ buckets[i].clseg = cinfo->ds->buckets[i].clseg;
}
+ swap(cinfo->ds->buckets, buckets);
+ cinfo->ds->nbuckets = size;
+out:
+ spin_unlock(cinfo->lock);
+ kfree(buckets);
+ return 0;
}

static struct pnfs_layout_segment *
--
1.8.5.2 (Apple Git-48)
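
The bucket-replacement pattern the cleanup above settles on can be
sketched in userspace C: build the larger array without the lock,
splice the old contents over under the lock, swap the pointers, and
free whichever array is left over after unlocking. This simplification
uses int buckets and comments in place of list splicing and the
spinlock:

```c
#include <assert.h>
#include <stdlib.h>

/* Grow *bucketsp from *nbucketsp entries to size entries, preserving
 * the existing contents. Returns 0 on success, -1 on allocation
 * failure (-ENOMEM in the kernel). */
static int grow_buckets(int **bucketsp, int *nbucketsp, int size)
{
	int *buckets, *old;
	int i;

	if (*nbucketsp >= size)
		return 0;		/* already big enough */
	buckets = calloc(size, sizeof(int));
	if (!buckets)
		return -1;
	/* spin_lock(cinfo->lock) would be taken here */
	for (i = 0; i < *nbucketsp; i++)
		buckets[i] = (*bucketsp)[i];	/* splice old entries over */
	old = *bucketsp;
	*bucketsp = buckets;		/* swap(), as in the patch */
	*nbucketsp = size;
	/* spin_unlock(cinfo->lock) here */
	free(old);			/* leftover array freed unlocked */
	return 0;
}
```

The patch's single kfree(buckets) after the unlock handles both exits:
if another thread raced in and grew the array first, the freshly
allocated array is freed; otherwise the swapped-out old array is.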


2014-04-24 18:15:59

by Weston Andros Adamson

[permalink] [raw]
Subject: [PATCH 14/18 v2] nfs: remove list of [rw]data from pgio header

Since the ability to split pages into subpage requests has been added,
nfs_pgio_header->rpc_list only ever has one wdata/rdata.

Signed-off-by: Weston Andros Adamson <[email protected]>
---
fs/nfs/pageio.c | 35 ++++-------------------------------
fs/nfs/pnfs.c | 41 +++++++++++++++--------------------------
include/linux/nfs_xdr.h | 4 +++-
3 files changed, 22 insertions(+), 58 deletions(-)

diff --git a/fs/nfs/pageio.c b/fs/nfs/pageio.c
index 1899977..f578cc9 100644
--- a/fs/nfs/pageio.c
+++ b/fs/nfs/pageio.c
@@ -29,7 +29,6 @@ struct nfs_rw_header *nfs_rw_header_alloc(const struct nfs_rw_ops *ops)
struct nfs_pgio_header *hdr = &header->header;

INIT_LIST_HEAD(&hdr->pages);
- INIT_LIST_HEAD(&hdr->rpc_list);
spin_lock_init(&hdr->lock);
atomic_set(&hdr->refcnt, 0);
hdr->rw_ops = ops;
@@ -182,37 +181,12 @@ static int nfs_do_pgio(struct nfs_pgio_data *data,
return nfs_initiate_pgio(NFS_CLIENT(inode), data, call_ops, how, 0);
}

-int nfs_do_multiple_pgios(struct list_head *head,
- const struct rpc_call_ops *call_ops,
- int how)
-{
- struct nfs_pgio_data *data;
- int ret = 0;
-
- while (!list_empty(head)) {
- int ret2;
-
- data = list_first_entry(head, struct nfs_pgio_data, list);
- list_del_init(&data->list);
-
- ret2 = nfs_do_pgio(data, call_ops, how);
- if (ret == 0)
- ret = ret2;
- }
- return ret;
-}
-
static int nfs_pgio_error(struct nfs_pageio_descriptor *desc,
struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_data *data;
-
set_bit(NFS_IOHDR_REDO, &hdr->flags);
- while (!list_empty(&hdr->rpc_list)) {
- data = list_first_entry(&hdr->rpc_list, struct nfs_pgio_data, list);
- list_del(&data->list);
- nfs_pgio_data_release(data);
- }
+ nfs_pgio_data_release(hdr->data);
+ hdr->data = NULL;
desc->pg_completion_ops->error_cleanup(&desc->pg_list);
return -ENOMEM;
}
@@ -254,7 +228,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,

/* Set up the argument struct */
nfs_pgio_rpcsetup(data, desc->pg_count, 0, desc->pg_ioflags, &cinfo);
- list_add(&data->list, &hdr->rpc_list);
+ hdr->data = data;
desc->pg_rpc_callops = &nfs_pgio_common_ops;
return 0;
}
@@ -276,8 +250,7 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
atomic_inc(&hdr->refcnt);
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0)
- ret = nfs_do_multiple_pgios(&hdr->rpc_list,
- desc->pg_rpc_callops, 0);
+ ret = nfs_do_pgio(hdr->data, desc->pg_rpc_callops, 0);
if (atomic_dec_and_test(&hdr->refcnt))
hdr->completion_ops->completion(hdr);
return ret;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 354c53c..6ef108b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1573,23 +1573,18 @@ pnfs_try_to_write_data(struct nfs_pgio_data *wdata,
}

static void
-pnfs_do_multiple_writes(struct nfs_pageio_descriptor *desc, struct list_head *head, int how)
+pnfs_do_write(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int how)
{
- struct nfs_pgio_data *data;
+ struct nfs_pgio_data *data = hdr->data;
const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
struct pnfs_layout_segment *lseg = desc->pg_lseg;
+ enum pnfs_try_status trypnfs;

desc->pg_lseg = NULL;
- while (!list_empty(head)) {
- enum pnfs_try_status trypnfs;
-
- data = list_first_entry(head, struct nfs_pgio_data, list);
- list_del_init(&data->list);
-
- trypnfs = pnfs_try_to_write_data(data, call_ops, lseg, how);
- if (trypnfs == PNFS_NOT_ATTEMPTED)
- pnfs_write_through_mds(desc, data);
- }
+ trypnfs = pnfs_try_to_write_data(data, call_ops, lseg, how);
+ if (trypnfs == PNFS_NOT_ATTEMPTED)
+ pnfs_write_through_mds(desc, data);
pnfs_put_lseg(lseg);
}

@@ -1623,7 +1618,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
pnfs_put_lseg(desc->pg_lseg);
desc->pg_lseg = NULL;
} else
- pnfs_do_multiple_writes(desc, &hdr->rpc_list, desc->pg_ioflags);
+ pnfs_do_write(desc, hdr, desc->pg_ioflags);
if (atomic_dec_and_test(&hdr->refcnt))
hdr->completion_ops->completion(hdr);
return ret;
@@ -1731,23 +1726,17 @@ pnfs_try_to_read_data(struct nfs_pgio_data *rdata,
}

static void
-pnfs_do_multiple_reads(struct nfs_pageio_descriptor *desc, struct list_head *head)
+pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
{
- struct nfs_pgio_data *data;
+ struct nfs_pgio_data *data = hdr->data;
const struct rpc_call_ops *call_ops = desc->pg_rpc_callops;
struct pnfs_layout_segment *lseg = desc->pg_lseg;
+ enum pnfs_try_status trypnfs;

desc->pg_lseg = NULL;
- while (!list_empty(head)) {
- enum pnfs_try_status trypnfs;
-
- data = list_first_entry(head, struct nfs_pgio_data, list);
- list_del_init(&data->list);
-
- trypnfs = pnfs_try_to_read_data(data, call_ops, lseg);
- if (trypnfs == PNFS_NOT_ATTEMPTED)
- pnfs_read_through_mds(desc, data);
- }
+ trypnfs = pnfs_try_to_read_data(data, call_ops, lseg);
+ if (trypnfs == PNFS_NOT_ATTEMPTED)
+ pnfs_read_through_mds(desc, data);
pnfs_put_lseg(lseg);
}

@@ -1782,7 +1771,7 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
pnfs_put_lseg(desc->pg_lseg);
desc->pg_lseg = NULL;
} else
- pnfs_do_multiple_reads(desc, &hdr->rpc_list);
+ pnfs_do_read(desc, hdr);
if (atomic_dec_and_test(&hdr->refcnt))
hdr->completion_ops->completion(hdr);
return ret;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index dacd240..29828c7 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1255,11 +1255,13 @@ enum {
NFS_IOHDR_NEED_RESCHED,
};

+struct nfs_pgio_data;
+
struct nfs_pgio_header {
struct inode *inode;
struct rpc_cred *cred;
struct list_head pages;
- struct list_head rpc_list;
+ struct nfs_pgio_data *data;
atomic_t refcnt;
struct nfs_page *req;
struct nfs_writeverf verf; /* Used for writes */
--
1.8.5.2 (Apple Git-48)