2011-05-29 10:15:56

by Boaz Harrosh

Subject: [PATCHSET 0/8] FIXES for pnfs for 2.6.40 - version 9


Hi

I've re-rebased all the fixes into their intended places and cut a
tree on git.open-osd.org. (Some base patches changed due to
conflicts with previous SQUASHMEs.)

I have structured the above git tree as follows:
                 v2.6.39
   pnfs-submit----|
                  +---merge_and_compile
   pnfsd-exofs----|

So both the client and server branches are based on Linus' tree and are
conveniently merged in *merge_and_compile* for actual testing of the
code.

Benny, please compare your final branch with the *pnfs-submit* branch
to make sure nothing is missing.

The tree is at: git://git.open-osd.org/linux-open-osd.git
(branches: pnfs-submit, pnfsd-exofs and merge_and_compile)

[On the web:
http://git.open-osd.org/gitweb.cgi?p=linux-open-osd.git;a=shortlog;h=refs/heads/merge_and_compile
]

I have left in the SQUASHME patches (and am resending them in their
mergeable form, as replies to this set).

Here are the patches in pnfs-submit (in rebase order).
(I have removed the two server patches from the pnfs-submit branch.)

git log --oneline --reverse v2.6.39..pnfs-submit:

3b6445a NFSv4.1: fix typo in filelayout_check_layout
67d51f6 NFSv4.1: use struct nfs_client to qualify deviceid
45df3c8 pnfs: resolve header dependency in pnfs.h
a1eaecb NFSv4.1: make deviceid cache global
47a0c5c NFSv4.1: purge deviceid cache on nfs_free_client
[1/8] bee4df2 SQUASHME: into NFSv4.1: purge deviceid cache - let ver < 4.1 compile
43d3f13 pnfs: CB_NOTIFY_DEVICEID
7073887 NFSv4.1: use layout driver in global device cache
aca9487 SUNRPC: introduce xdr_init_decode_pages
9734962 pnfs: Use byte-range for layoutget
adda93b pnfs: align layoutget requests on page boundaries
d3fa95a pnfs: Use byte-range for cb_layoutrecall
fbc41a4 pnfs: client stats
cb4a216 pnfs-obj: objlayoutdriver module skeleton
0d81921 pnfs-obj: pnfs_osd XDR definitions
08758d6 pnfs-obj: pnfs_osd XDR client implementation
4e2d111 pnfs-obj: decode layout, alloc/free lseg
[2/8] 63ab8d8 SQUASHME V2: objio alloc/free lseg Bugs fixes
33e6214 NEWVERSION: pnfs-obj: objio_osd device information retrieval and caching
40981be pnfs: alloc and free layout_hdr layoutdriver methods
d868d03 pnfs-obj: define per-inode private structure
03287b4 pnfs: support for non-rpc layout drivers
459cdad pnfs-obj: osd raid engine read/write implementation
[3/8] 5f4a353 SQUASHME: objio read/write patch: Bugs fixes
a56f840 pnfs: layoutreturn
79d5152 pnfs: layoutret_on_setattr
b8b72b3 pnfs: encode_layoutreturn
[4/8] 3195a51 NEWVERSION: pnfs-obj: report errors and .encode_layoutreturn Implementation.
2ae4424 pnfs: encode_layoutcommit
9b0b09b pnfs-obj: objlayout_encode_layoutcommit implementation
aa838bf NFSv4.1: unify pnfs_pageio_init functions
[5/8] e7f243f SQUASHME: Fix BUG in: NFSv4.1: unify pnfs_pageio_init functions
d93594a NFSv4.1: change pg_test return type to bool
8630b61 NFSv4.1: use pnfs_generic_pg_test directly by layout driver
[6/8] a1f29c7 NFSv4.1: define nfs_generic_pg_test

(This one had a merge conflict with e7f243f SQUASHME, so it is different.)

[7/8] 413647e SQUASHME: Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test
(I've sent two changes to this one. This is the unified fix.)

8e88dcd pnfs-obj: pg_test check for max_io_size
[8/8] 4e3c595 SQUASHME: pnfs-obj: objio_pg_test some checkpatch love

One might also be interested in the *pnfsd-exofs* branch. It includes all the server
patches and the exofs changes, and was changed to merge cleanly with pnfs-submit.
(Please note that in its current form it does not compile before the merge, due to header
changes that are in pnfs-submit.)

With the above unified tree I am able to do pnfs-IO without crashing. I will continue
testing all day to make sure.

Thanks
Boaz



2011-05-29 10:30:23

by Boaz Harrosh

Subject: [PATCH 2/8] SQUASHME V2: objio alloc/free lseg Bugs fixes

Fix wrong allocation size and pointer arithmetic in lseg_alloc.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/objlayout/objio_osd.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 725b1df..08f1d90 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -65,7 +65,7 @@ struct objio_segment {
unsigned comps_index;
unsigned num_comps;
/* variable length */
- struct osd_dev *ods[1];
+ struct objio_dev_ent *ods[];
};

static inline struct objio_segment *
@@ -143,7 +143,6 @@ int objio_alloc_lseg(struct pnfs_layout_segment **outp,
struct pnfs_osd_layout layout;
struct pnfs_osd_object_cred *cur_comp, src_comp;
struct caps_buffers *caps_p;
-
int err;

err = pnfs_osd_xdr_decode_layout_map(&layout, &iter, xdr);
@@ -155,13 +154,15 @@ int objio_alloc_lseg(struct pnfs_layout_segment **outp,
return err;

objio_seg = kzalloc(sizeof(*objio_seg) +
+ sizeof(objio_seg->ods[0]) * layout.olo_num_comps +
sizeof(*objio_seg->comps) * layout.olo_num_comps +
sizeof(struct caps_buffers) * layout.olo_num_comps,
gfp_flags);
if (!objio_seg)
return -ENOMEM;

- cur_comp = objio_seg->comps = (void *)(objio_seg + 1);
+ objio_seg->comps = (void *)(objio_seg->ods + layout.olo_num_comps);
+ cur_comp = objio_seg->comps;
caps_p = (void *)(cur_comp + layout.olo_num_comps);
while (pnfs_osd_xdr_decode_layout_comp(&src_comp, &iter, xdr, &err))
copy_single_comp(cur_comp++, &src_comp, caps_p++);
--
1.7.2.3
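
For readers following the allocation change in this patch, here is a minimal user-space sketch of the same idea: a flexible array member followed by two more per-component arrays, all carved out of a single zeroed allocation. The names (struct segment, struct dev_ent, segment_alloc, and so on) are hypothetical stand-ins and not the kernel's types, and calloc() stands in for kzalloc(). The extra caps pointer is stored only so the layout can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; only the layout matters. */
struct dev_ent { int id; };
struct comp    { uint64_t obj_id; void *cred; };
struct caps    { unsigned char key[16]; };

struct segment {
	unsigned num_comps;
	struct comp *comps;     /* points into the same allocation */
	struct caps *caps;      /* kept only so the sketch can be inspected */
	struct dev_ent *ods[];  /* flexible array member, must be last */
};

static struct segment *segment_alloc(unsigned n)
{
	struct segment *seg;
	/* The header plus three arrays of n entries each. */
	size_t size = sizeof(*seg) +
		      sizeof(seg->ods[0]) * n +
		      sizeof(*seg->comps) * n +
		      sizeof(struct caps) * n;

	seg = calloc(1, size);
	if (!seg)
		return NULL;

	seg->num_comps = n;
	/* comps starts right after the n device pointers ... */
	seg->comps = (void *)(seg->ods + n);
	/* ... and caps right after the n components. */
	seg->caps = (void *)(seg->comps + n);

	/* The carved-up regions must not run past the allocation. */
	assert((char *)(seg->caps + n) <= (char *)seg + size);
	return seg;
}
```

Starting comps at (seg + 1) instead, as the old code did, would make it overlap the ods[] entries as soon as they are used. This sketch assumes the element types have compatible alignment when packed back to back; a layout with stricter alignment needs would have to pad between the arrays.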


2011-05-29 10:30:38

by Boaz Harrosh

Subject: [PATCH 3/8] SQUASHME: objio read/write patch: Bugs fixes

Cap BIO size at one page

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/objlayout/objio_osd.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 6925567..cc92d3b 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -397,7 +397,6 @@ int objio_alloc_io_state(struct pnfs_layout_segment *lseg,
const unsigned first_size = sizeof(*ios) +
objio_seg->num_comps * sizeof(ios->per_dev[0]);

- dprintk("%s: num_comps=%d\n", __func__, objio_seg->num_comps);
ios = kzalloc(first_size, gfp_flags);
if (unlikely(!ios))
return -ENOMEM;
@@ -561,6 +560,9 @@ static int _add_stripe_unit(struct objio_state *ios, unsigned *cur_pg,
unsigned bio_size = (ios->ol_state.nr_pages + pages_in_stripe) /
stripes;

+ if (BIO_MAX_PAGES_KMALLOC < bio_size)
+ bio_size = BIO_MAX_PAGES_KMALLOC;
+
per_dev->bio = bio_kmalloc(gfp_flags, bio_size);
if (unlikely(!per_dev->bio)) {
dprintk("Faild to allocate BIO size=%u\n", bio_size);
--
1.7.2.3


2011-05-29 10:31:40

by Boaz Harrosh

Subject: [PATCH 7/8] SQUASHME: Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test

desc->pg_bsize is negotiated with the MDS. But if we are doing
pnfs-IO it is not relevant.

While at it, clean up nfs_pageio_do_add_request(), since it now
has less to do.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/pagelist.c | 27 +++++++++++++--------------
1 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 36bb67f..624ec2c 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -206,6 +206,16 @@ nfs_wait_on_request(struct nfs_page *req)

static bool nfs_generic_pg_test(struct nfs_pageio_descriptor *desc, struct nfs_page *prev, struct nfs_page *req)
{
+ /*
+ * FIXME: ideally we should be able to coalesce all requests
+ * that are not block boundary aligned, but currently this
+ * is problematic for the case of bsize < PAGE_CACHE_SIZE,
+ * since nfs_flush_multi and nfs_pagein_multi assume you
+ * can have only one struct nfs_page.
+ */
+ if (desc->pg_bsize < PAGE_SIZE)
+ return 0;
+
return desc->pg_count + req->wb_bytes <= desc->pg_bsize;
}

@@ -279,29 +289,18 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
struct nfs_page *req)
{
- size_t newlen = req->wb_bytes;
-
if (desc->pg_count != 0) {
struct nfs_page *prev;

- /*
- * FIXME: ideally we should be able to coalesce all requests
- * that are not block boundary aligned, but currently this
- * is problematic for the case of bsize < PAGE_CACHE_SIZE,
- * since nfs_flush_multi and nfs_pagein_multi assume you
- * can have only one struct nfs_page.
- */
- if (desc->pg_bsize < PAGE_SIZE)
- return 0;
- newlen += desc->pg_count;
prev = nfs_list_entry(desc->pg_list.prev);
if (!nfs_can_coalesce_requests(prev, req, desc))
return 0;
- } else
+ } else {
desc->pg_base = req->wb_pgbase;
+ }
nfs_list_remove_request(req);
nfs_list_add_request(req, &desc->pg_list);
- desc->pg_count = newlen;
+ desc->pg_count += req->wb_bytes;
return 1;
}

--
1.7.2.3
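
The resulting division of labour can be modelled in user space roughly as follows. This is only a sketch under assumptions: struct pageio_desc, struct page_req, MODEL_PAGE_SIZE and both function names are made up for illustration, and the real descriptor carries far more state (the request list, the previous request, the layout segment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical, trimmed-down versions of the kernel structures. */
struct pageio_desc {
	size_t pg_count;  /* bytes coalesced so far */
	size_t pg_bsize;  /* limit negotiated with the MDS */
};

struct page_req {
	size_t wb_bytes;  /* bytes carried by this request */
};

/* Size policy lives in the test callback, not in the add path. */
static bool generic_pg_test(const struct pageio_desc *desc,
			    const struct page_req *req)
{
	assert(desc && req);
	/* Sub-page block sizes never coalesce. */
	if (desc->pg_bsize < MODEL_PAGE_SIZE)
		return false;
	return desc->pg_count + req->wb_bytes <= desc->pg_bsize;
}

/* The add path only asks the callback and then accumulates. */
static bool do_add_request(struct pageio_desc *desc,
			   const struct page_req *req)
{
	if (desc->pg_count != 0 && !generic_pg_test(desc, req))
		return false;
	desc->pg_count += req->wb_bytes;
	return true;
}
```

With the pg_bsize checks confined to the generic test, a layout driver that installs its own test callback is no longer bound by a limit that only applies to IO through the MDS.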


2011-05-29 10:31:57

by Boaz Harrosh

Subject: [PATCH 8/8] SQUASHME: pnfs-obj: objio_pg_test some checkpatch love

Lines too long

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/objlayout/objio_osd.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index f431c4b..9cf208d 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -995,13 +995,14 @@ ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state, bool stable)
return _write_exec(ios);
}

-static bool
-objio_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev, struct nfs_page *req)
+static bool objio_pg_test(struct nfs_pageio_descriptor *pgio,
+ struct nfs_page *prev, struct nfs_page *req)
{
if (!pnfs_generic_pg_test(pgio, prev, req))
return false;

- return pgio->pg_count + req->wb_bytes <= OBJIO_LSEG(pgio->pg_lseg)->max_io_size;
+ return pgio->pg_count + req->wb_bytes <=
+ OBJIO_LSEG(pgio->pg_lseg)->max_io_size;
}

static struct pnfs_layoutdriver_type objlayout_type = {
--
1.7.2.3


2011-05-29 10:30:05

by Boaz Harrosh

Subject: [PATCH 1/8] SQUASHME: into NFSv4.1: purge deviceid cache - let ver < 4.1 compile

In C, parameter names cannot be omitted from a function definition.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/pnfs.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9667a62..80a5d0e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -357,7 +357,7 @@ static inline int pnfs_layoutcommit_inode(struct inode *inode, bool sync)
return 0;
}

-static inline void nfs4_deviceid_purge_client(struct nfs_client *)
+static inline void nfs4_deviceid_purge_client(struct nfs_client *ncl)
{
}
#endif /* CONFIG_NFS_V4_1 */
--
1.7.2.3
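
As a small illustration of the rule behind this fix: a declaration may leave its parameters unnamed, but (before C23) a definition may not, even when the body is an empty stub for a configuration that lacks the feature. The names below (struct client, purge_client, HAVE_FEATURE) are hypothetical and only mirror the shape of the header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client structure, for illustration only. */
struct client {
	int cached_devices;
};

#ifdef HAVE_FEATURE
/* Real implementation when the feature is configured in. */
static inline void purge_client(struct client *clp)
{
	assert(clp != NULL);
	clp->cached_devices = 0;
}
#else
/* A declaration may omit the parameter name ... */
static inline void purge_client(struct client *);

/* ... but a definition must name it, even in an empty stub. */
static inline void purge_client(struct client *clp)
{
	(void)clp;  /* unused in the stub */
}
#endif
```

C++ has always accepted the unnamed form in definitions, which is likely why it is an easy slip to make in a stub.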


2011-05-29 10:31:12

by Boaz Harrosh

Subject: [PATCH 5/8] SQUASHME: Fix BUG in: NFSv4.1: unify pnfs_pageio_init functions

The call to pnfs_pageio_init() was done before the call
to nfs_pageio_init(), which would override the .pg_test set
there.

But enough is enough. One init function is more than
enough. Call pnfs_pageio_init() from within
nfs_pageio_init(). It is kept separate so the code can
compile in ver < 4.1, where the layout_driver type is not
defined.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/pagelist.c | 1 +
fs/nfs/pnfs.c | 2 +-
fs/nfs/pnfs.h | 6 ++++--
fs/nfs/read.c | 1 -
fs/nfs/write.c | 1 -
5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 0918ea8..b8704fe 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -230,6 +230,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
desc->pg_error = 0;
desc->pg_lseg = NULL;
desc->pg_test = NULL;
+ pnfs_pageio_init(desc, inode);
}

/**
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index f7a9405..568ab0e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1058,7 +1058,7 @@ pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
access_type = IOMODE_RW;
gfp_flags = GFP_NOFS;
}
-
+
if (pgio->pg_count == prev->wb_bytes) {
/* This is first coelesce call for a series of nfs_pages */
pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 4cfc494..c056688 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -292,7 +292,8 @@ static inline int pnfs_return_layout(struct inode *ino)
return 0;
}

-static inline void pnfs_pageio_init(struct nfs_pageio_descriptor *pgio, struct inode *inode)
+static inline void pnfs_pageio_init(struct nfs_pageio_descriptor *pgio,
+ struct inode *inode)
{
if (NFS_SERVER(inode)->pnfs_curr_ld)
pgio->pg_test = pnfs_generic_pg_test;
@@ -381,7 +382,8 @@ static inline void unset_pnfs_layoutdriver(struct nfs_server *s)
{
}

-static inline void pnfs_pageio_init(struct nfs_pageio_descriptor *, struct inode *)
+static inline void pnfs_pageio_init(struct nfs_pageio_descriptor *pgio,
+ struct inode *inode)
{
}

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 6bd09a8..20a7f95 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -664,7 +664,6 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
if (ret == 0)
goto read_complete; /* all pages were read */

- pnfs_pageio_init(&pgio, inode);
if (rsize < PAGE_CACHE_SIZE)
nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
else
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d81c5c0..e268e3b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1036,7 +1036,6 @@ static void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
{
size_t wsize = NFS_SERVER(inode)->wsize;

- pnfs_pageio_init(pgio, inode);
if (wsize < PAGE_CACHE_SIZE)
nfs_pageio_init(pgio, inode, nfs_flush_multi, wsize, ioflags);
else
--
1.7.2.3
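
The ordering problem described in this patch can be shown with a stripped-down model. Everything here is hypothetical (struct desc, layout_init, generic_init and the has_layout_driver flag are not kernel names); the point is only that the layout-specific hook must run after the generic reset, which is easiest to guarantee by calling it from inside the generic init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct desc;
typedef bool (*pg_test_fn)(const struct desc *);

/* Hypothetical descriptor with just the fields the sketch needs. */
struct desc {
	pg_test_fn pg_test;
	bool has_layout_driver;
};

static bool layout_pg_test(const struct desc *d)
{
	(void)d;
	return true;
}

/* Layout-specific part: installs a test only if a driver is set. */
static void layout_init(struct desc *d)
{
	if (d->has_layout_driver)
		d->pg_test = layout_pg_test;
}

/* Generic init resets the callback, then lets the layout hook run.
 * Calling layout_init() before this function, as the callers used
 * to do, would have its assignment wiped by the reset below. */
static void generic_init(struct desc *d)
{
	assert(d != NULL);
	d->pg_test = NULL;
	layout_init(d);
}
```

Callers then need a single call, and there is no order left for them to get wrong.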


2011-05-29 10:30:54

by Boaz Harrosh

Subject: [PATCH 4/8] NEWVERSION: pnfs-obj: report errors and .encode_layoutreturn Implementation.

An io_state pre-allocates an error information structure for each
possible osd-device that might error during IO. When the IO is done, if
all went well, the io_state is freed (as today). If the IO ended with an
error, the io_state is queued on a per-layout err_list. When
encode_layoutreturn() is eventually called, each error is properly
encoded into the XDR buffer, and only then is the io_state removed from
err_list and de-allocated.

It is up to the io_engine to fill in the segment that faulted and the
type of osd_error that occurred, by calling objlayout_io_set_result()
for each failing device.

In objio_osd:
* Allocate io-error descriptors space as part of io_state
* Use generic objlayout error reporting at end of io.

Signed-off-by: Boaz Harrosh <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/objlayout/objio_osd.c | 44 ++++++++-
fs/nfs/objlayout/objlayout.c | 232 +++++++++++++++++++++++++++++++++++++++++-
fs/nfs/objlayout/objlayout.h | 23 ++++
3 files changed, 297 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 4e8de3e..8bca5e1 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -396,12 +396,16 @@ int objio_alloc_io_state(struct pnfs_layout_segment *lseg,
struct objio_state *ios;
const unsigned first_size = sizeof(*ios) +
objio_seg->num_comps * sizeof(ios->per_dev[0]);
+ const unsigned sec_size = objio_seg->num_comps *
+ sizeof(ios->ol_state.ioerrs[0]);

- ios = kzalloc(first_size, gfp_flags);
+ ios = kzalloc(first_size + sec_size, gfp_flags);
if (unlikely(!ios))
return -ENOMEM;

ios->layout = objio_seg;
+ ios->ol_state.ioerrs = ((void *)ios) + first_size;
+ ios->ol_state.num_comps = objio_seg->num_comps;

*outp = &ios->ol_state;
return 0;
@@ -415,6 +419,36 @@ void objio_free_io_state(struct objlayout_io_state *ol_state)
kfree(ios);
}

+enum pnfs_osd_errno osd_pri_2_pnfs_err(enum osd_err_priority oep)
+{
+ switch (oep) {
+ case OSD_ERR_PRI_NO_ERROR:
+ return (enum pnfs_osd_errno)0;
+
+ case OSD_ERR_PRI_CLEAR_PAGES:
+ BUG_ON(1);
+ return 0;
+
+ case OSD_ERR_PRI_RESOURCE:
+ return PNFS_OSD_ERR_RESOURCE;
+ case OSD_ERR_PRI_BAD_CRED:
+ return PNFS_OSD_ERR_BAD_CRED;
+ case OSD_ERR_PRI_NO_ACCESS:
+ return PNFS_OSD_ERR_NO_ACCESS;
+ case OSD_ERR_PRI_UNREACHABLE:
+ return PNFS_OSD_ERR_UNREACHABLE;
+ case OSD_ERR_PRI_NOT_FOUND:
+ return PNFS_OSD_ERR_NOT_FOUND;
+ case OSD_ERR_PRI_NO_SPACE:
+ return PNFS_OSD_ERR_NO_SPACE;
+ default:
+ WARN_ON(1);
+ /* fallthrough */
+ case OSD_ERR_PRI_EIO:
+ return PNFS_OSD_ERR_EIO;
+ }
+}
+
static void _clear_bio(struct bio *bio)
{
struct bio_vec *bv;
@@ -461,6 +495,12 @@ static int _io_check(struct objio_state *ios, bool is_write)
continue; /* we recovered */
}
dev = ios->per_dev[i].dev;
+ objlayout_io_set_result(&ios->ol_state, dev,
+ &ios->layout->comps[dev].oc_object_id,
+ osd_pri_2_pnfs_err(osi.osd_err_pri),
+ ios->per_dev[i].offset,
+ ios->per_dev[i].length,
+ is_write);

if (osi.osd_err_pri >= oep) {
oep = osi.osd_err_pri;
@@ -977,6 +1017,8 @@ static struct pnfs_layoutdriver_type objlayout_type = {
.pg_test = objlayout_pg_test,

.free_deviceid_node = objio_free_deviceid_node,
+
+ .encode_layoutreturn = objlayout_encode_layoutreturn,
};

MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 5157ef6..f7caecf 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -50,6 +50,10 @@ objlayout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
struct objlayout *objlay;

objlay = kzalloc(sizeof(struct objlayout), gfp_flags);
+ if (objlay) {
+ spin_lock_init(&objlay->lock);
+ INIT_LIST_HEAD(&objlay->err_list);
+ }
dprintk("%s: Return %p\n", __func__, objlay);
return &objlay->pnfs_layout;
}
@@ -64,6 +68,7 @@ objlayout_free_layout_hdr(struct pnfs_layout_hdr *lo)

dprintk("%s: objlay %p\n", __func__, objlay);

+ WARN_ON(!list_empty(&objlay->err_list));
kfree(objlay);
}

@@ -183,6 +188,7 @@ objlayout_alloc_io_state(struct pnfs_layout_hdr *pnfs_layout_type,
pgbase &= ~PAGE_MASK;
}

+ INIT_LIST_HEAD(&state->err_list);
state->lseg = lseg;
state->rpcdata = rpcdata;
state->pages = pages;
@@ -213,7 +219,52 @@ objlayout_iodone(struct objlayout_io_state *state)
{
dprintk("%s: state %p status\n", __func__, state);

- objlayout_free_io_state(state);
+ if (likely(state->status >= 0)) {
+ objlayout_free_io_state(state);
+ } else {
+ struct objlayout *objlay = OBJLAYOUT(state->lseg->pls_layout);
+
+ spin_lock(&objlay->lock);
+ list_add(&objlay->err_list, &state->err_list);
+ spin_unlock(&objlay->lock);
+ }
+}
+
+/*
+ * objlayout_io_set_result - Set an osd_error code on a specific osd comp.
+ *
+ * The @index component IO failed (error returned from target). Register
+ * the error for later reporting at layout-return.
+ */
+void
+objlayout_io_set_result(struct objlayout_io_state *state, unsigned index,
+ struct pnfs_osd_objid *pooid, int osd_error,
+ u64 offset, u64 length, bool is_write)
+{
+ struct pnfs_osd_ioerr *ioerr = &state->ioerrs[index];
+
+ BUG_ON(index >= state->num_comps);
+ if (osd_error) {
+ ioerr->oer_component = *pooid;
+ ioerr->oer_comp_offset = offset;
+ ioerr->oer_comp_length = length;
+ ioerr->oer_iswrite = is_write;
+ ioerr->oer_errno = osd_error;
+
+ dprintk("%s: err[%d]: errno=%d is_write=%d dev(%llx:%llx) "
+ "par=0x%llx obj=0x%llx offset=0x%llx length=0x%llx\n",
+ __func__, index, ioerr->oer_errno,
+ ioerr->oer_iswrite,
+ _DEVID_LO(&ioerr->oer_component.oid_device_id),
+ _DEVID_HI(&ioerr->oer_component.oid_device_id),
+ ioerr->oer_component.oid_partition_id,
+ ioerr->oer_component.oid_object_id,
+ ioerr->oer_comp_offset,
+ ioerr->oer_comp_length);
+ } else {
+ /* User need not call if no error is reported */
+ ioerr->oer_errno = 0;
+ }
}

/* Function scheduled on rpc workqueue to call ->nfs_readlist_complete().
@@ -382,6 +433,185 @@ objlayout_write_pagelist(struct nfs_write_data *wdata,
return PNFS_ATTEMPTED;
}

+static int
+err_prio(u32 oer_errno)
+{
+ switch (oer_errno) {
+ case 0:
+ return 0;
+
+ case PNFS_OSD_ERR_RESOURCE:
+ return OSD_ERR_PRI_RESOURCE;
+ case PNFS_OSD_ERR_BAD_CRED:
+ return OSD_ERR_PRI_BAD_CRED;
+ case PNFS_OSD_ERR_NO_ACCESS:
+ return OSD_ERR_PRI_NO_ACCESS;
+ case PNFS_OSD_ERR_UNREACHABLE:
+ return OSD_ERR_PRI_UNREACHABLE;
+ case PNFS_OSD_ERR_NOT_FOUND:
+ return OSD_ERR_PRI_NOT_FOUND;
+ case PNFS_OSD_ERR_NO_SPACE:
+ return OSD_ERR_PRI_NO_SPACE;
+ default:
+ WARN_ON(1);
+ /* fallthrough */
+ case PNFS_OSD_ERR_EIO:
+ return OSD_ERR_PRI_EIO;
+ }
+}
+
+static void
+merge_ioerr(struct pnfs_osd_ioerr *dest_err,
+ const struct pnfs_osd_ioerr *src_err)
+{
+ u64 dest_end, src_end;
+
+ if (!dest_err->oer_errno) {
+ *dest_err = *src_err;
+ /* accumulated device must be blank */
+ memset(&dest_err->oer_component.oid_device_id, 0,
+ sizeof(dest_err->oer_component.oid_device_id));
+
+ return;
+ }
+
+ if (dest_err->oer_component.oid_partition_id !=
+ src_err->oer_component.oid_partition_id)
+ dest_err->oer_component.oid_partition_id = 0;
+
+ if (dest_err->oer_component.oid_object_id !=
+ src_err->oer_component.oid_object_id)
+ dest_err->oer_component.oid_object_id = 0;
+
+ if (dest_err->oer_comp_offset > src_err->oer_comp_offset)
+ dest_err->oer_comp_offset = src_err->oer_comp_offset;
+
+ dest_end = end_offset(dest_err->oer_comp_offset,
+ dest_err->oer_comp_length);
+ src_end = end_offset(src_err->oer_comp_offset,
+ src_err->oer_comp_length);
+ if (dest_end < src_end)
+ dest_end = src_end;
+
+ dest_err->oer_comp_length = dest_end - dest_err->oer_comp_offset;
+
+ if ((src_err->oer_iswrite == dest_err->oer_iswrite) &&
+ (err_prio(src_err->oer_errno) > err_prio(dest_err->oer_errno))) {
+ dest_err->oer_errno = src_err->oer_errno;
+ } else if (src_err->oer_iswrite) {
+ dest_err->oer_iswrite = true;
+ dest_err->oer_errno = src_err->oer_errno;
+ }
+}
+
+static void
+encode_accumulated_error(struct objlayout *objlay, __be32 *p)
+{
+ struct objlayout_io_state *state, *tmp;
+ struct pnfs_osd_ioerr accumulated_err = {.oer_errno = 0};
+
+ list_for_each_entry_safe(state, tmp, &objlay->err_list, err_list) {
+ unsigned i;
+
+ for (i = 0; i < state->num_comps; i++) {
+ struct pnfs_osd_ioerr *ioerr = &state->ioerrs[i];
+
+ if (!ioerr->oer_errno)
+ continue;
+
+ printk(KERN_ERR "%s: err[%d]: errno=%d is_write=%d "
+ "dev(%llx:%llx) par=0x%llx obj=0x%llx "
+ "offset=0x%llx length=0x%llx\n",
+ __func__, i, ioerr->oer_errno,
+ ioerr->oer_iswrite,
+ _DEVID_LO(&ioerr->oer_component.oid_device_id),
+ _DEVID_HI(&ioerr->oer_component.oid_device_id),
+ ioerr->oer_component.oid_partition_id,
+ ioerr->oer_component.oid_object_id,
+ ioerr->oer_comp_offset,
+ ioerr->oer_comp_length);
+
+ merge_ioerr(&accumulated_err, ioerr);
+ }
+ list_del(&state->err_list);
+ objlayout_free_io_state(state);
+ }
+
+ pnfs_osd_xdr_encode_ioerr(p, &accumulated_err);
+}
+
+void
+objlayout_encode_layoutreturn(struct pnfs_layout_hdr *pnfslay,
+ struct xdr_stream *xdr,
+ const struct nfs4_layoutreturn_args *args)
+{
+ struct objlayout *objlay = OBJLAYOUT(pnfslay);
+ struct objlayout_io_state *state, *tmp;
+ __be32 *start;
+
+ dprintk("%s: Begin\n", __func__);
+ start = xdr_reserve_space(xdr, 4);
+ BUG_ON(!start);
+
+ spin_lock(&objlay->lock);
+
+ list_for_each_entry_safe(state, tmp, &objlay->err_list, err_list) {
+ __be32 *last_xdr = NULL, *p;
+ unsigned i;
+ int res = 0;
+
+ for (i = 0; i < state->num_comps; i++) {
+ struct pnfs_osd_ioerr *ioerr = &state->ioerrs[i];
+
+ if (!ioerr->oer_errno)
+ continue;
+
+ dprintk("%s: err[%d]: errno=%d is_write=%d "
+ "dev(%llx:%llx) par=0x%llx obj=0x%llx "
+ "offset=0x%llx length=0x%llx\n",
+ __func__, i, ioerr->oer_errno,
+ ioerr->oer_iswrite,
+ _DEVID_LO(&ioerr->oer_component.oid_device_id),
+ _DEVID_HI(&ioerr->oer_component.oid_device_id),
+ ioerr->oer_component.oid_partition_id,
+ ioerr->oer_component.oid_object_id,
+ ioerr->oer_comp_offset,
+ ioerr->oer_comp_length);
+
+ p = pnfs_osd_xdr_ioerr_reserve_space(xdr);
+ if (unlikely(!p)) {
+ res = -E2BIG;
+ break; /* accumulated_error */
+ }
+
+ last_xdr = p;
+ pnfs_osd_xdr_encode_ioerr(p, &state->ioerrs[i]);
+ }
+
+ /* TODO: use xdr_write_pages */
+ if (unlikely(res)) {
+ /* no space for even one error descriptor */
+ BUG_ON(!last_xdr);
+
+ /* we've encountered a situation with lots and lots of
+ * errors and no space to encode them all. Use the last
+ * available slot to report the union of all the
+ * remaining errors.
+ */
+ encode_accumulated_error(objlay, last_xdr);
+ goto loop_done;
+ }
+ list_del(&state->err_list);
+ objlayout_free_io_state(state);
+ }
+loop_done:
+ spin_unlock(&objlay->lock);
+
+ *start = cpu_to_be32((xdr->p - start - 1) * 4);
+ dprintk("%s: Return\n", __func__);
+}
+
+
/*
* Get Device Info API for io engines
*/
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index 9a405e8..b0bb975 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -50,6 +50,10 @@
*/
struct objlayout {
struct pnfs_layout_hdr pnfs_layout;
+
+ /* for layout_return */
+ spinlock_t lock;
+ struct list_head err_list;
};

static inline struct objlayout *
@@ -76,6 +80,16 @@ struct objlayout_io_state {
int status; /* res */
int eof; /* res */
int committed; /* res */
+
+ /* Error reporting (layout_return) */
+ struct list_head err_list;
+ unsigned num_comps;
+ /* Pointer to array of error descriptors of size num_comps.
+ * It should contain as many entries as devices in the osd_layout
+ * that participate in the I/O. It is up to the io_engine to allocate
+ * needed space and set num_comps.
+ */
+ struct pnfs_osd_ioerr *ioerrs;
};

/*
@@ -101,6 +115,10 @@ extern ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state,
/*
* callback API
*/
+extern void objlayout_io_set_result(struct objlayout_io_state *state,
+ unsigned index, struct pnfs_osd_objid *pooid,
+ int osd_error, u64 offset, u64 length, bool is_write);
+
extern void objlayout_read_done(struct objlayout_io_state *state,
ssize_t status, bool sync);
extern void objlayout_write_done(struct objlayout_io_state *state,
@@ -131,4 +149,9 @@ extern enum pnfs_try_status objlayout_write_pagelist(
struct nfs_write_data *,
int how);

+extern void objlayout_encode_layoutreturn(
+ struct pnfs_layout_hdr *,
+ struct xdr_stream *,
+ const struct nfs4_layoutreturn_args *);
+
#endif /* _OBJLAYOUT_H */
--
1.7.2.3
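
For the fallback path, where the remaining errors are folded into one descriptor, the byte-range part of the merge amounts to taking the union of two ranges. Below is a minimal user-space sketch of just that part, with hypothetical names (struct io_err, end_of, merge_range). It leaves out the object-id blanking and the errno priority handling, and it computes both end offsets before lowering the start so that the merged range always covers both inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced error descriptor: a byte range plus an errno. */
struct io_err {
	uint64_t offset;
	uint64_t length;
	int err;
};

/* End of a range, saturating instead of wrapping around. */
static uint64_t end_of(uint64_t offset, uint64_t length)
{
	uint64_t end = offset + length;

	return end >= offset ? end : UINT64_MAX;
}

/* Grow dest so that it covers both its old range and src's range. */
static void merge_range(struct io_err *dest, const struct io_err *src)
{
	uint64_t dest_end, src_end;

	assert(src->err != 0);  /* only real errors are merged */

	if (!dest->err) {
		/* First error seen: just take it over. */
		*dest = *src;
		return;
	}

	/* Take both ends first, while dest->offset is still the old one. */
	dest_end = end_of(dest->offset, dest->length);
	src_end = end_of(src->offset, src->length);

	if (src->offset < dest->offset)
		dest->offset = src->offset;
	if (src_end > dest_end)
		dest_end = src_end;

	dest->length = dest_end - dest->offset;
}
```

For example, merging the ranges [100, 200) and [50, 60) yields offset 50 and length 150, which reports more than actually failed but never less.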


2011-05-29 10:31:26

by Boaz Harrosh

Subject: [PATCH 6/8] NFSv4.1: define nfs_generic_pg_test

By default, unless pnfs is used, coalesce pages until pg_bsize
(rsize or wsize) is reached.

pnfs layout drivers define their own pg_test methods, which use
pnfs_generic_pg_test and need to define their own I/O size
limits (e.g. based on the file stripe size).

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/pagelist.c | 17 +++++++----------
1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 5344371..36bb67f 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -204,6 +204,11 @@ nfs_wait_on_request(struct nfs_page *req)
TASK_UNINTERRUPTIBLE);
}

+static bool nfs_generic_pg_test(struct nfs_pageio_descriptor *desc, struct nfs_page *prev, struct nfs_page *req)
+{
+ return desc->pg_count + req->wb_bytes <= desc->pg_bsize;
+}
+
/**
* nfs_pageio_init - initialise a page io descriptor
* @desc: pointer to descriptor
@@ -229,7 +234,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
desc->pg_ioflags = io_flags;
desc->pg_error = 0;
desc->pg_lseg = NULL;
- desc->pg_test = NULL;
+ desc->pg_test = nfs_generic_pg_test;
pnfs_pageio_init(desc, inode);
}

@@ -260,13 +265,7 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
return false;
if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
return false;
- /*
- * Non-whole file layouts need to check that req is inside of
- * pgio->pg_lseg.
- */
- if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
- return false;
- return true;
+ return pgio->pg_test(pgio, prev, req);
}

/**
@@ -295,8 +294,6 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
if (desc->pg_bsize < PAGE_SIZE)
return 0;
newlen += desc->pg_count;
- if (newlen > desc->pg_bsize)
- return 0;
prev = nfs_list_entry(desc->pg_list.prev);
if (!nfs_can_coalesce_requests(prev, req, desc))
return 0;
--
1.7.2.3


2011-05-29 17:30:09

by Benny Halevy

Subject: Re: [PATCH 7/8] SQUASHME: Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test

On 2011-05-29 13:31, Boaz Harrosh wrote:
> desc->pg_bsize is negotiated with the MDS. But if we are doing
> pnfs-IO it is not relevent.
>
> While at it cleanup nfs_pageio_do_add_request() in light of the
> less things it needs to do.
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> fs/nfs/pagelist.c | 27 +++++++++++++--------------
> 1 files changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 36bb67f..624ec2c 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -206,6 +206,16 @@ nfs_wait_on_request(struct nfs_page *req)
>
> static bool nfs_generic_pg_test(struct nfs_pageio_descriptor *desc, struct nfs_page *prev, struct nfs_page *req)
> {
> + /*
> + * FIXME: ideally we should be able to coalesce all requests
> + * that are not block boundary aligned, but currently this
> + * is problematic for the case of bsize < PAGE_CACHE_SIZE,
> + * since nfs_flush_multi and nfs_pagein_multi assume you
> + * can have only one struct nfs_page.
> + */

nit: comment indent (will fix)

> + if (desc->pg_bsize < PAGE_SIZE)
> + return 0;
> +
> return desc->pg_count + req->wb_bytes <= desc->pg_bsize;
> }
>
> @@ -279,29 +289,18 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
> static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
> struct nfs_page *req)
> {
> - size_t newlen = req->wb_bytes;
> -
> if (desc->pg_count != 0) {
> struct nfs_page *prev;
>
> - /*
> - * FIXME: ideally we should be able to coalesce all requests
> - * that are not block boundary aligned, but currently this
> - * is problematic for the case of bsize < PAGE_CACHE_SIZE,
> - * since nfs_flush_multi and nfs_pagein_multi assume you
> - * can have only one struct nfs_page.
> - */
> - if (desc->pg_bsize < PAGE_SIZE)
> - return 0;
> - newlen += desc->pg_count;
> prev = nfs_list_entry(desc->pg_list.prev);
> if (!nfs_can_coalesce_requests(prev, req, desc))
> return 0;
> - } else
> + } else {
> desc->pg_base = req->wb_pgbase;
> + }

nit: no need to add the braces here (will remove)

> nfs_list_remove_request(req);
> nfs_list_add_request(req, &desc->pg_list);
> - desc->pg_count = newlen;
> + desc->pg_count += req->wb_bytes;

Thanks!

Benny

> return 1;
> }
>