2011-08-30 03:23:16

by Boaz Harrosh

Subject: [RFC] pnfs: Add per-LD-info to nfs_pageio_descriptor


What do you guys think? Would it be acceptable to add per-layout
private data to nfs_pageio_descriptor?

In obj-LD we have a bunch of constraints on the size of the IO that
involve some 64bit divisions and math. These calculations need to be
performed only once, when the offset of the first page is
known. Then a simple wb_bytes can cache the results for subsequent
calls. (They cannot be calculated before we know the IO's offset.)
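
To make the idea concrete, here is a minimal userspace sketch of "compute once at the first page, then only compare". It is not the obj-LD code: struct demo_pgio_desc, demo_max_io_bytes(), demo_add_page() and the "do not cross a group boundary" constraint are all hypothetical stand-ins, and only the pg_ld_private member name comes from the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nfs_pageio_descriptor (userspace only). */
struct demo_pgio_desc {
	uint64_t pg_count;	/* bytes coalesced so far */
	long pg_ld_private;	/* cached per-LD limit for this IO */
};

/*
 * Hypothetical constraint: an IO may not cross a group boundary.
 * This is the expensive part (64bit division/modulo).
 */
static uint64_t demo_max_io_bytes(uint64_t first_offset, uint64_t group_bytes)
{
	assert(group_bytes != 0);
	return group_bytes - (first_offset % group_bytes);
}

/*
 * Called for every page. The division runs only for the first page,
 * when the IO's offset becomes known; later calls are a plain compare
 * against the cached limit.
 */
static bool demo_add_page(struct demo_pgio_desc *desc, uint64_t first_offset,
			  uint64_t req_bytes, uint64_t group_bytes)
{
	if (desc->pg_count == 0)
		desc->pg_ld_private =
			(long)demo_max_io_bytes(first_offset, group_bytes);

	if (desc->pg_count + req_bytes > (uint64_t)desc->pg_ld_private)
		return false;	/* caller would start a new IO here */

	desc->pg_count += req_bytes;
	return true;
}
```

With a 16384-byte group and a first page at offset 4096, the cached limit is 12288 bytes, so three 4096-byte pages are accepted and the fourth is refused at coalesce time instead of being trimmed at done time.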

Also, I might want to allocate the io_state earlier, at the insert
of the first page instead of at the actual call to write/read_pagelist,
again for the same reason as above.

Today we get by because at the very end, if some constraint is hit
and the full IO was not performed, we just set r/wdata->res.count
to less than what was requested, and the pages that fall outside of
the IOed range get read/written as part of a future request. But
this is sub-optimal because it is done only at read/write_done time.
By then the contiguous pages have already been submitted as requests, and
the few left-over pages get submitted as their own request. This causes
additional seeky, unaligned, small IOs which, if accounted for at
coalesce time, would be avoided. With the upcoming raid5/6 code this
can cost dearly. (A single simple large contiguous IO becomes a bunch of
read-modify-write IOs.)

I can see that in filelayout_pg_test, too, there are two 64bit divisions
performed on every page insert, which could be optimized to a simple
compare.
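
A rough userspace sketch of that optimization, assuming the per-page check is "do the previous and the new request fall in the same stripe unit". The function names are made up for illustration and this is not the filelayout code itself. The compare is only equivalent to the division form when pages are inserted at offsets not below the first page's offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-insert form: two 64bit divisions for every page. */
static bool demo_same_stripe_div(uint64_t prev_off, uint64_t req_off,
				 uint64_t stripe_unit)
{
	assert(stripe_unit != 0);
	return prev_off / stripe_unit == req_off / stripe_unit;
}

/* Done once, at the first page: where does this stripe unit end? */
static uint64_t demo_stripe_end(uint64_t first_off, uint64_t stripe_unit)
{
	assert(stripe_unit != 0);
	return (first_off / stripe_unit + 1) * stripe_unit;
}

/* Every later insert: a single compare against the cached end. */
static bool demo_same_stripe_cmp(uint64_t req_off, uint64_t cached_end)
{
	return req_off < cached_end;
}
```

The value returned by demo_stripe_end() is exactly the kind of thing that would live in the per-LD private field of the descriptor.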

[BTW: Perhaps change the .write/read_pagelist() API to directly receive
the nfs_pageio_descriptor and avoid all the duplication of types and
copying of members.
]

I'm making pg_ld_private a "long" because a long is good for a pointer
as well as an integer.

Signed-off-by: Boaz Harrosh <[email protected]>

---
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index e2791a2..c86bae5 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -77,6 +77,7 @@ struct nfs_pageio_descriptor {
 	int pg_error;
 	const struct rpc_call_ops *pg_rpc_callops;
 	struct pnfs_layout_segment *pg_lseg;
+	long pg_ld_private;
 };

#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))


2011-08-31 21:27:21

by Benny Halevy

Subject: Re: [RFC] pnfs: Add per-LD-info to nfs_pageio_descriptor

On 2011-08-29 20:22, Boaz Harrosh wrote:
>
> What do you guys think? would it be acceptable to add a per-layout
> private-data to nfs_pageio_descriptor?
>
> In obj-LD we have bunch of constrains on the size of the IO that
> involves some 64bit divisions, and math. These calculations are only
> needed to be preformed once when the offset of the first page is
> known. Then a simple wb_bytes can cache the results for subsequent
> calls. (And cannot be calculated before we know the IO's offset)
>
> Also I might want to allocate the io_state earlier at the insert
> of the first page instead of at the actual call to write/read_pagelist,
> again, for the same reason above.
>
> Today we get by because at the very end, if some constraints hit
> and not the full IO was preformed then we only set r/wdata->res.count
> to less then what was requested and these pages that are outside of
> the IOed range get to be read/written as part of a future request. But
> this is sub-optimal because that is done only at read/write_done time.
> By then the contiguous pages were already submitted to requests and
> the few left-over pages get submitted as their own request. This causes
> a seeky, unaligned and additional small IOs which, if calculated for at
> coalesce time, would be spared. With the up coming raid5/6 code this
> can cost dearly. (A single simple large contiguous IO becomes bunch of
> read-modify-write IOs)
>
> I can see that also at filelayout_pg_test there are two 64bit divisions
> preformed on every page insert which could be optimized to a simple
> compare.
>
> [BTW: Perhaps change the .write/read_pagelist() API to directly receive
> the nfs_pageio_descriptor and avoid all the duplication of types and
> members copy
> ]
>
> I'm making pg_ld_private as a "long" because a long is good for a pointer
> as well as an integer.

I would really prefer it to be a void * rather than a long, for the same
reason; that is what is used in practically every other place.
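
For illustration only, a small userspace sketch of how a void * member can carry either a pointer or a small integer. The struct and helper names are hypothetical; only pg_ld_private is from the patch. The integer round trip through uintptr_t is implementation-defined in ISO C, though it behaves as expected on the usual Linux targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the member name is from the patch. */
struct demo_io_state {
	uint64_t max_bytes;
};

struct demo_desc {
	void *pg_ld_private;
};

/* Store and fetch a pointer: no casts needed with void *. */
static void demo_set_state(struct demo_desc *d, struct demo_io_state *s)
{
	assert(d != NULL);
	d->pg_ld_private = s;
}

static struct demo_io_state *demo_get_state(const struct demo_desc *d)
{
	return d->pg_ld_private;
}

/* Store and fetch an integer by casting through uintptr_t. */
static void demo_set_value(struct demo_desc *d, uintptr_t v)
{
	assert(d != NULL);
	d->pg_ld_private = (void *)v;
}

static uintptr_t demo_get_value(const struct demo_desc *d)
{
	return (uintptr_t)d->pg_ld_private;
}
```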

Benny

>
> Signed-off-by: Boaz Harrosh <[email protected]>
>
> ---
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index e2791a2..c86bae5 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -77,6 +77,7 @@ struct nfs_pageio_descriptor {
> int pg_error;
> const struct rpc_call_ops *pg_rpc_callops;
> struct pnfs_layout_segment *pg_lseg;
> + long pg_ld_private;
> };
>
> #define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))