2010-09-12 17:01:16

by Boaz Harrosh

[permalink] [raw]
Subject: [RFC 0/3] Ugliness in struct pnfs_layout_segment and possible uselessness of the device cache

Struct pnfs_layout_segment is the generic part of the pnfs-getlayout (segment) operation
and should not hold any layout specific part.

So what is the struct nfs4_deviceid *deviceid; member doing there. This is a files
only useful member that has no meaning in objects and blocks. (The later have multiple
devices per layout)

I've hacked up a possible fix. Please review. These will need more work done:
- Actual testing. They are compiled only.
- Adjusted to multiple pending changes from the changes to the layout segment allocation
to all these latest Fred changes.
- I wish the relevant parts get into the first batch of submission in the final form
presented here, instead of changing later. Note the cleanup it gives to the files
layout driver. Which can even be driven deeper. So they'll need to be integrated
down deeper. (And code rebased above it)
- So I've actually attempted using the generic cache in the objects driver. And it
should be usable once all the code changes settle down, and I fix that FIXME: which
is there now (See [RFC 3/3])

Andy? (Fred)? would you want to take the generic and file-layout parts out of my hands
and see them through. For one they need testing which I don't have readily available.
That is if you like them and they make any sense.

[RFC 1/3] SQUASHME: Generalize the device cache so it can be used by all layouts
[RFC 2/3] SQUASHME: fileslayout: Adjust to device_cache API changes
[RFC 3/3] SQUASHME: pnfs-obj: Use the generic device cache

Thanks
Boaz


2010-09-12 19:42:40

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC 1/3] SQUASHME: Generalize the device cache so it can be used by all layouts

On Sun, 2010-09-12 at 19:03 +0200, Boaz Harrosh wrote:
> Current code thinks that there can be a single device_id per
> layout segment. Change that to assume no relations between segments
> and device_ids. It's now up to the layout-driver to make any relations.
>
> Files layout driver is fixed in next patch.
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> fs/nfs/pnfs.c | 20 +++++---------------
> include/linux/nfs4_pnfs.h | 6 +-----
> 2 files changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index eb4e092..f26abc0 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1892,26 +1892,16 @@ nfs4_init_deviceid_node(struct nfs4_deviceid *d)
> }
> EXPORT_SYMBOL(nfs4_init_deviceid_node);
>
> -/* Called from layoutdriver_io_operations->alloc_lseg */
> -void
> -nfs4_set_layout_deviceid(struct pnfs_layout_segment *l, struct nfs4_deviceid *d)
> -{
> - dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
> - l->deviceid = d;
> -}
> -EXPORT_SYMBOL(nfs4_set_layout_deviceid);
> -
> /* Called from layoutdriver_io_operations->free_lseg */
> void
> -nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *l,
> - struct nfs4_deviceid *d,
> - void (*free_callback)(struct kref *))
> +nfs4_put_deviceid(struct nfs4_deviceid_cache *c,
> + struct nfs4_deviceid *d)
> {
> dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
> - l->deviceid = NULL;
> - kref_put(&d->de_kref, free_callback);
> + kref_put(&d->de_kref, c->dc_free_callback);
> + /* Do we need to return the deviceid_cache ref */
> }
> -EXPORT_SYMBOL(nfs4_put_unset_layout_deviceid);
> +EXPORT_SYMBOL(nfs4_put_deviceid);
>
> /* Find and reference a deviceid */
> struct nfs4_deviceid *
> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
> index 577cd2b..dc3410e 100644
> --- a/include/linux/nfs4_pnfs.h
> +++ b/include/linux/nfs4_pnfs.h
> @@ -111,7 +111,6 @@ struct pnfs_layout_segment {
> struct kref kref;
> bool valid;
> struct pnfs_layout_hdr *layout;
> - struct nfs4_deviceid *deviceid;
> u8 ld_data[]; /* layout driver private data */
> };
>
> @@ -287,11 +286,8 @@ extern struct nfs4_deviceid *nfs4_find_get_deviceid(
> struct pnfs_deviceid *);
> extern struct nfs4_deviceid *nfs4_add_get_deviceid(struct nfs4_deviceid_cache *,
> struct nfs4_deviceid *);
> -extern void nfs4_set_layout_deviceid(struct pnfs_layout_segment *,
> +extern void nfs4_put_deviceid(struct nfs4_deviceid_cache *,
> struct nfs4_deviceid *);
> -extern void nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *,
> - struct nfs4_deviceid *,
> - void (*free_callback)(struct kref *));
> extern void nfs4_delete_device(struct nfs4_deviceid_cache *,
> struct pnfs_deviceid *);
>

This doesn't look like it needs to be part of the initial single-layout
submission. Why is it being labelled as a squashme?

Trond

2010-09-13 09:50:32

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [RFC 1/3] SQUASHME: Generalize the device cache so it can be used by all layouts

On 09/12/2010 09:42 PM, Trond Myklebust wrote:
> On Sun, 2010-09-12 at 19:03 +0200, Boaz Harrosh wrote:
>> Current code thinks that there can be a single device_id per
>> layout segment. Change that to assume no relations between segments
>> and device_ids. It's now up to the layout-driver to make any relations.
>>
>> Files layout driver is fixed in next patch.
>>
>> Signed-off-by: Boaz Harrosh <[email protected]>
>> ---
>> fs/nfs/pnfs.c | 20 +++++---------------
>> include/linux/nfs4_pnfs.h | 6 +-----
>> 2 files changed, 6 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index eb4e092..f26abc0 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1892,26 +1892,16 @@ nfs4_init_deviceid_node(struct nfs4_deviceid *d)
>> }
>> EXPORT_SYMBOL(nfs4_init_deviceid_node);
>>
>> -/* Called from layoutdriver_io_operations->alloc_lseg */
>> -void
>> -nfs4_set_layout_deviceid(struct pnfs_layout_segment *l, struct nfs4_deviceid *d)
>> -{
>> - dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
>> - l->deviceid = d;
>> -}
>> -EXPORT_SYMBOL(nfs4_set_layout_deviceid);
>> -
>> /* Called from layoutdriver_io_operations->free_lseg */
>> void
>> -nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *l,
>> - struct nfs4_deviceid *d,
>> - void (*free_callback)(struct kref *))
>> +nfs4_put_deviceid(struct nfs4_deviceid_cache *c,
>> + struct nfs4_deviceid *d)
>> {
>> dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
>> - l->deviceid = NULL;
>> - kref_put(&d->de_kref, free_callback);
>> + kref_put(&d->de_kref, c->dc_free_callback);
>> + /* Do we need to return the deviceid_cache ref */
>> }
>> -EXPORT_SYMBOL(nfs4_put_unset_layout_deviceid);
>> +EXPORT_SYMBOL(nfs4_put_deviceid);
>>
>> /* Find and reference a deviceid */
>> struct nfs4_deviceid *
>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>> index 577cd2b..dc3410e 100644
>> --- a/include/linux/nfs4_pnfs.h
>> +++ b/include/linux/nfs4_pnfs.h
>> @@ -111,7 +111,6 @@ struct pnfs_layout_segment {
>> struct kref kref;
>> bool valid;
>> struct pnfs_layout_hdr *layout;
>> - struct nfs4_deviceid *deviceid;
>> u8 ld_data[]; /* layout driver private data */
>> };
>>
>> @@ -287,11 +286,8 @@ extern struct nfs4_deviceid *nfs4_find_get_deviceid(
>> struct pnfs_deviceid *);
>> extern struct nfs4_deviceid *nfs4_add_get_deviceid(struct nfs4_deviceid_cache *,
>> struct nfs4_deviceid *);
>> -extern void nfs4_set_layout_deviceid(struct pnfs_layout_segment *,
>> +extern void nfs4_put_deviceid(struct nfs4_deviceid_cache *,
>> struct nfs4_deviceid *);
>> -extern void nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *,
>> - struct nfs4_deviceid *,
>> - void (*free_callback)(struct kref *));
>> extern void nfs4_delete_device(struct nfs4_deviceid_cache *,
>> struct pnfs_deviceid *);
>>
>
> This doesn't look like it needs to be part of the initial single-layout
> submission. Why is it being labelled as a squashme?
>
> Trond

No, this is a device_id cache per nfs-client it is part of the initial
getdeviceinfo call. Many "single-layout" may refer to the same device_id
therefor it is cached so the long getdeviceinfo call can be done only once.

Cheers
Boaz

2010-09-13 12:45:17

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [RFC 1/3] SQUASHME: Generalize the device cache so it can be used by all layouts

On 09/13/2010 02:16 PM, Fred Isaman wrote:
> On Mon, Sep 13, 2010 at 2:50 AM, Boaz Harrosh <[email protected]> wrote:
>>> This doesn't look like it needs to be part of the initial single-layout
>>> submission. Why is it being labelled as a squashme?
>>>
>>> Trond
>>
>> No, this is a device_id cache per nfs-client it is part of the initial
>> getdeviceinfo call. Many "single-layout" may refer to the same device_id
>> therefor it is cached so the long getdeviceinfo call can be done only once.
>>
>> Cheers
>> Boaz
>
> I agree. I'll roll it into the next submission.
>
> Fred

Thanks Fred. I would like if you can report that it actually works, as I have
not tested it.

BTW. For the objects I have decided to not keep a reference on the device_id
during the lseg life span, but just keep a reference to the underlying osd_device
so the device_id can be removed from the cache but still be used by a
layout_seg/io_state. I have tested and it works and apparently it unmounts
cleanly without any reference leaks. Will post patches later once the new API
hits Benny's tree.

Thanks
Boaz

2010-09-12 17:31:21

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [RFC 3/3] SQUASHME: pnfs-obj: Use the generic device cache

On 09/12/2010 07:06 PM, Boaz Harrosh wrote:
>
> Drop the hand crafted device id cache and use the generic
> one.
>
> FIXME: This is done bad and will need some segment structures
> reorganization. I will do it once all pending changes are
> integrated and code is a bit more stable.
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> fs/nfs/objlayout/objio_osd.c | 120 ++++++++++++++++++++++-------------------
> fs/nfs/objlayout/objlayout.c | 5 +-
> fs/nfs/objlayout/objlayout.h | 4 +-
> fs/nfs/objlayout/panfs_shim.c | 4 +-
> 4 files changed, 72 insertions(+), 61 deletions(-)
>
> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
> index 1a155c3..b60b8c5 100644
> --- a/fs/nfs/objlayout/objio_osd.c
> +++ b/fs/nfs/objlayout/objio_osd.c
> @@ -63,77 +63,69 @@ struct objio_mount_type {
> };
>
> struct _dev_ent {
> - struct list_head list;
> - struct pnfs_deviceid d_id;
> + struct nfs4_deviceid nfs4_di;
> struct osd_dev *od;
> };
>
> -static void _dev_list_remove_all(struct objio_mount_type *omt)
> -{
> - spin_lock(&omt->dev_list_lock);
> -
> - while (!list_empty(&omt->dev_list)) {
> - struct _dev_ent *de = list_entry(omt->dev_list.next,
> - struct _dev_ent, list);
> -
> - list_del_init(&de->list);
> - osduld_put_device(de->od);
> - kfree(de);
> - }
> -
> - spin_unlock(&omt->dev_list_lock);
> -}
> -
> -static struct osd_dev *___dev_list_find(struct objio_mount_type *omt,
> +static struct osd_dev *_dev_list_find(struct pnfs_layout_hdr *pnfslay,
> struct pnfs_deviceid *d_id)
> {
> - struct list_head *le;
> -
> - list_for_each(le, &omt->dev_list) {
> - struct _dev_ent *de = list_entry(le, struct _dev_ent, list);
> + struct nfs4_deviceid *d;
> + struct nfs4_deviceid_cache *devid_cache =
> + NFS_SERVER(pnfslay->inode)->nfs_client->cl_devid_cache;
> + struct osd_dev *od;
>
> - if (0 == memcmp(&de->d_id, d_id, sizeof(*d_id)))
> - return de->od;
> - }
> + d = nfs4_find_get_deviceid(devid_cache, d_id);
>
> - return NULL;
> -}
> + if (!d)
> + return NULL;
>
> -static struct osd_dev *_dev_list_find(struct objio_mount_type *omt,
> - struct pnfs_deviceid *d_id)
> -{
> - struct osd_dev *od;
> + od = container_of(d, struct _dev_ent, nfs4_di)->od;
>
> - spin_lock(&omt->dev_list_lock);
> - od = ___dev_list_find(omt, d_id);
> - spin_unlock(&omt->dev_list_lock);
> + /* FIXME: A referance is taken on an added deviceid in _dev_list_add
> + * That ref is only dropped at unload when device cache is freed.
> + * Whatever prevented uninitialize_mountpoint from been called
> + * with active layouts still applies today. Later when we will
> + * Have code that caps the cache
> + */

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index b60b8c5..1acc527 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -82,11 +82,12 @@ static struct osd_dev *_dev_list_find(struct pnfs_layout_hdr *pnfslay,

od = container_of(d, struct _dev_ent, nfs4_di)->od;

- /* FIXME: A referance is taken on an added deviceid in _dev_list_add
+ /* FIXME: A reference is taken on an added deviceid in _dev_list_add
* That ref is only dropped at unload when device cache is freed.
* Whatever prevented uninitialize_mountpoint from been called
* with active layouts still applies today. Later when we will
- * Have code that caps the cache
+ * Have code that caps the cache size and starts freeing devices
+ * dynamically, this will be a bug that must be fixed.
*/
nfs4_put_deviceid(devid_cache, d);
return od;

> + nfs4_put_deviceid(devid_cache, d);
> return od;
> }
>
> -static int _dev_list_add(struct objio_mount_type *omt,
> +static struct osd_dev *_dev_list_add(struct pnfs_layout_hdr *pnfslay,
> struct pnfs_deviceid *d_id, struct osd_dev *od)
> {
> struct _dev_ent *de = kzalloc(sizeof(*de), GFP_KERNEL);
> + struct nfs4_deviceid *d;
>
> if (!de)
> - return -ENOMEM;
> + return ERR_PTR(-ENOMEM);
>
> - spin_lock(&omt->dev_list_lock);
> + d = nfs4_add_get_deviceid(
> + NFS_SERVER(pnfslay->inode)->nfs_client->cl_devid_cache,
> + &de->nfs4_di);
>
> - if (___dev_list_find(omt, d_id)) {
> - kfree(de);
> - goto out;
> + if (d != &de->nfs4_di) {
> + /* there was a race and we lost */
> + osduld_put_device(de->od);
> + de = container_of(d, struct _dev_ent, nfs4_di);
> }
>
> - de->d_id = *d_id;
> - de->od = od;
> - list_add(&de->list, &omt->dev_list);
> + return de->od;
> +}
>
> -out:
> - spin_unlock(&omt->dev_list_lock);
> - return 0;
> +void
> +objio_free_dev(struct kref *kref)
> +{
> + struct nfs4_deviceid *d_id =
> + container_of(kref, struct nfs4_deviceid, de_kref);
> + struct _dev_ent *de = container_of(d_id, struct _dev_ent, nfs4_di);
> +
> + osduld_put_device(de->od);
> + kfree(de);
> }
>
> +
> struct objio_segment {
> struct pnfs_osd_object_cred *comps;
>
> @@ -183,12 +175,11 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
> struct pnfs_deviceid *d_id;
> struct osd_dev *od;
> struct osd_dev_info odi;
> - struct objio_mount_type *omt = PNFS_NFS_SERVER(pnfslay)->pnfs_ld_data;
> int err;
>
> d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;
>
> - od = _dev_list_find(omt, d_id);
> + od = _dev_list_find(pnfslay, d_id);
> if (od)
> return od;
>
> @@ -222,7 +213,8 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
> goto out;
> }
>
> - _dev_list_add(omt, d_id, od);
> + /* od might have changed during add */
> + od = _dev_list_add(pnfslay, d_id, od);
>
> out:
> dprintk("%s: return=%d\n", __func__, err);
> @@ -347,6 +339,12 @@ free_seg:
> void objio_free_lseg(void *p)
> {
> struct objio_segment *objio_seg = p;
> +/* struct nfs_server *nfss = NFS_SERVER(PNFS_INODE(pnfslay));
> +
> + for (i = 0; i < objio_seg->num_comps; i++) {
> + nfs4_put_deviceid(nfss->nfs_client->cl_devid_cache,
> + objio_seg->devids[i]);
> + }*/
>
> kfree(objio_seg);
> }
> @@ -1044,21 +1042,33 @@ static struct pnfs_layoutdriver_type objlayout_type = {
> .ld_policy_ops = &objlayout_policy_operations,
> };
>
> -void *objio_init_mt(void)
> +void *objio_init_mt(struct nfs_server *nfss)
> {
> + int status;
> struct objio_mount_type *omt = kzalloc(sizeof(*omt), GFP_KERNEL);
>
> if (!omt)
> return ERR_PTR(-ENOMEM);
>
> - INIT_LIST_HEAD(&omt->dev_list);
> - spin_lock_init(&omt->dev_list_lock);
> + status = nfs4_alloc_init_deviceid_cache(nfss->nfs_client,
> + objio_free_dev);
> + if (status) {
> + printk(KERN_WARNING "%s: deviceid cache could not be "
> + "initialized\n", __func__);
> + kfree(omt);
> + return ERR_PTR(status);
> + }
> + dprintk("%s: deviceid cache has been initialized successfully\n",
> + __func__);
> +
> return omt;
> }
>
> -void objio_fini_mt(void *mountid)
> +void objio_fini_mt(struct nfs_server *nfss, void *mountid)
> {
> - _dev_list_remove_all(mountid);
> + if (nfss->pnfs_curr_ld && nfss->nfs_client->cl_devid_cache)
> + nfs4_put_deviceid_cache(nfss->nfs_client);
> +
> kfree(mountid);
> }
>
> diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
> index 259c616..0b541e6 100644
> --- a/fs/nfs/objlayout/objlayout.c
> +++ b/fs/nfs/objlayout/objlayout.c
> @@ -752,7 +752,7 @@ objlayout_initialize_mountpoint(struct nfs_server *server,
> {
> void *data;
>
> - data = objio_init_mt();
> + data = objio_init_mt(server);
> if (IS_ERR(data)) {
> printk(KERN_INFO "%s: objlayout lib not ready err=%ld\n",
> __func__, PTR_ERR(data));
> @@ -771,7 +771,8 @@ static int
> objlayout_uninitialize_mountpoint(struct nfs_server *server)
> {
> dprintk("%s: Begin %p\n", __func__, server->pnfs_ld_data);
> - objio_fini_mt(server->pnfs_ld_data);
> +
> + objio_fini_mt(server, server->pnfs_ld_data);
> return 0;
> }
>
> diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
> index adec7ad..a531016 100644
> --- a/fs/nfs/objlayout/objlayout.h
> +++ b/fs/nfs/objlayout/objlayout.h
> @@ -113,8 +113,8 @@ struct objlayout_io_state {
> /*
> * Raid engine I/O API
> */
> -extern void *objio_init_mt(void);
> -extern void objio_fini_mt(void *mt);
> +extern void *objio_init_mt(struct nfs_server *nfss);
> +extern void objio_fini_mt(struct nfs_server *nfss, void *mt);
>
> extern int objio_alloc_lseg(void **outp,
> struct pnfs_layout_hdr *pnfslay,
> diff --git a/fs/nfs/objlayout/panfs_shim.c b/fs/nfs/objlayout/panfs_shim.c
> index fd46e96..362d088 100644
> --- a/fs/nfs/objlayout/panfs_shim.c
> +++ b/fs/nfs/objlayout/panfs_shim.c
> @@ -51,12 +51,12 @@
> struct panfs_export_operations *panfs_export_ops;
>
> void *
> -objio_init_mt(void)
> +objio_init_mt(struct nfs_server *nfss)
> {
> return panfs_export_ops == NULL ? ERR_PTR(-EAGAIN) : NULL;
> }
>
> -void objio_fini_mt(void *mountid)
> +void objio_fini_mt(struct nfs_server *nfss, void *mountid)
> {
> }
>


2010-09-12 17:03:20

by Boaz Harrosh

[permalink] [raw]
Subject: [RFC 1/3] SQUASHME: Generalize the device cache so it can be used by all layouts


Current code thinks that there can be a single device_id per
layout segment. Change that to assume no relations between segments
and device_ids. It's now up to the layout-driver to make any relations.

Files layout driver is fixed in next patch.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/pnfs.c | 20 +++++---------------
include/linux/nfs4_pnfs.h | 6 +-----
2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index eb4e092..f26abc0 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1892,26 +1892,16 @@ nfs4_init_deviceid_node(struct nfs4_deviceid *d)
}
EXPORT_SYMBOL(nfs4_init_deviceid_node);

-/* Called from layoutdriver_io_operations->alloc_lseg */
-void
-nfs4_set_layout_deviceid(struct pnfs_layout_segment *l, struct nfs4_deviceid *d)
-{
- dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
- l->deviceid = d;
-}
-EXPORT_SYMBOL(nfs4_set_layout_deviceid);
-
/* Called from layoutdriver_io_operations->free_lseg */
void
-nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *l,
- struct nfs4_deviceid *d,
- void (*free_callback)(struct kref *))
+nfs4_put_deviceid(struct nfs4_deviceid_cache *c,
+ struct nfs4_deviceid *d)
{
dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
- l->deviceid = NULL;
- kref_put(&d->de_kref, free_callback);
+ kref_put(&d->de_kref, c->dc_free_callback);
+ /* Do we need to return the deviceid_cache ref */
}
-EXPORT_SYMBOL(nfs4_put_unset_layout_deviceid);
+EXPORT_SYMBOL(nfs4_put_deviceid);

/* Find and reference a deviceid */
struct nfs4_deviceid *
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 577cd2b..dc3410e 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -111,7 +111,6 @@ struct pnfs_layout_segment {
struct kref kref;
bool valid;
struct pnfs_layout_hdr *layout;
- struct nfs4_deviceid *deviceid;
u8 ld_data[]; /* layout driver private data */
};

@@ -287,11 +286,8 @@ extern struct nfs4_deviceid *nfs4_find_get_deviceid(
struct pnfs_deviceid *);
extern struct nfs4_deviceid *nfs4_add_get_deviceid(struct nfs4_deviceid_cache *,
struct nfs4_deviceid *);
-extern void nfs4_set_layout_deviceid(struct pnfs_layout_segment *,
+extern void nfs4_put_deviceid(struct nfs4_deviceid_cache *,
struct nfs4_deviceid *);
-extern void nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *,
- struct nfs4_deviceid *,
- void (*free_callback)(struct kref *));
extern void nfs4_delete_device(struct nfs4_deviceid_cache *,
struct pnfs_deviceid *);

--
1.7.2.2



2010-09-12 17:05:06

by Boaz Harrosh

[permalink] [raw]
Subject: [RFC 2/3] SQUASHME: fileslayout: Adjust to device_cache API changes.


Move the struct nfs4_deviceid *deviceid pointer from the
Generic part to the fileslayout specific part. The rest
is left the same as Before.

This prompts for a cleanup at the nfs4_filelayout_segment
level as now it stands out more bluntly the (possible)
duplication of members. It looks like by accessing
nfs4_file_layout_dsaddr directly from nfs4_filelayout_segment
we can drop lots of the segments members.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/nfs4filelayout.c | 14 ++++++++------
fs/nfs/nfs4filelayout.h | 5 +----
fs/nfs/nfs4filelayoutdev.c | 7 ++++---
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 46a8a03..1434163 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -113,7 +113,7 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
u64 tmp, off;
u32 unit = flseg->stripe_unit;

- stripe_width = unit * FILE_DSADDR(lseg)->stripe_count;
+ stripe_width = unit * flseg->dsaddr->stripe_count;
tmp = off = offset - flseg->pattern_offset;
do_div(tmp, stripe_width);
return tmp * unit + do_div(off, unit);
@@ -376,15 +376,14 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
nfss->wsize);
}

- nfs4_set_layout_deviceid(lseg, &dsaddr->deviceid);
+ fl->dsaddr = dsaddr;

status = 0;
out:
dprintk("--> %s returns %d\n", __func__, status);
return status;
out_put:
- nfs4_put_unset_layout_deviceid(lseg, &dsaddr->deviceid,
- nfs4_fl_free_deviceid_callback);
+ nfs4_put_deviceid(nfss->nfs_client->cl_devid_cache, &dsaddr->deviceid);
goto out;
}

@@ -507,9 +506,12 @@ _filelayout_free_lseg(struct pnfs_layout_segment *lseg)
static void
filelayout_free_lseg(struct pnfs_layout_segment *lseg)
{
+ struct nfs4_filelayout_segment *fl = LSEG_LD_DATA(lseg);
+ struct nfs_server *nfss = NFS_SERVER(PNFS_INODE(lseg->layout));
+
dprintk("--> %s\n", __func__);
- nfs4_put_unset_layout_deviceid(lseg, lseg->deviceid,
- nfs4_fl_free_deviceid_callback);
+ nfs4_put_deviceid(nfss->nfs_client->cl_devid_cache,
+ &fl->dsaddr->deviceid);
_filelayout_free_lseg(lseg);
}

diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index be6dc33..a2beac8 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -23,10 +23,6 @@
#define NFS4_PNFS_MAX_MULTI_CNT 64 /* 256 fit into a u8 stripe_index */
#define NFS4_PNFS_MAX_MULTI_DS 2

-#define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
- struct nfs4_file_layout_dsaddr, \
- deviceid))
-
enum stripetype4 {
STRIPE_SPARSE = 1,
STRIPE_DENSE = 2
@@ -62,6 +58,7 @@ struct nfs4_filelayout_segment {
u32 first_stripe_index;
u64 pattern_offset;
struct pnfs_deviceid dev_id;
+ struct nfs4_file_layout_dsaddr *dsaddr;
unsigned int num_fh;
struct nfs_fh *fh_array;
};
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 0354c61..7bdb1f8 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -573,16 +573,17 @@ _nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset)
tmp = offset - flseg->pattern_offset;
do_div(tmp, flseg->stripe_unit);
tmp += flseg->first_stripe_index;
- return do_div(tmp, FILE_DSADDR(lseg)->stripe_count);
+ return do_div(tmp, flseg->dsaddr->stripe_count);
}

u32
nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset)
{
+ struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
u32 j;

j = _nfs4_fl_calc_j_index(lseg, offset);
- return FILE_DSADDR(lseg)->stripe_indices[j];
+ return flseg->dsaddr->stripe_indices[j];
}

struct nfs_fh *
@@ -609,7 +610,7 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
struct nfs4_file_layout_dsaddr *dsaddr;

- dsaddr = FILE_DSADDR(lseg);
+ dsaddr = flseg->dsaddr;
if (dsaddr->ds_list[ds_idx] == NULL) {
printk(KERN_ERR "%s: No data server for device id (%s)!!\n",
__func__, deviceid_fmt(&flseg->dev_id));
--
1.7.2.2



2010-09-12 17:06:48

by Boaz Harrosh

[permalink] [raw]
Subject: [RFC 3/3] SQUASHME: pnfs-obj: Use the generic device cache


Drop the hand crafted device id cache and use the generic
one.

FIXME: This is done bad and will need some segment structures
reorganization. I will do it once all pending changes are
integrated and code is a bit more stable.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfs/objlayout/objio_osd.c | 120 ++++++++++++++++++++++-------------------
fs/nfs/objlayout/objlayout.c | 5 +-
fs/nfs/objlayout/objlayout.h | 4 +-
fs/nfs/objlayout/panfs_shim.c | 4 +-
4 files changed, 72 insertions(+), 61 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 1a155c3..b60b8c5 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -63,77 +63,69 @@ struct objio_mount_type {
};

struct _dev_ent {
- struct list_head list;
- struct pnfs_deviceid d_id;
+ struct nfs4_deviceid nfs4_di;
struct osd_dev *od;
};

-static void _dev_list_remove_all(struct objio_mount_type *omt)
-{
- spin_lock(&omt->dev_list_lock);
-
- while (!list_empty(&omt->dev_list)) {
- struct _dev_ent *de = list_entry(omt->dev_list.next,
- struct _dev_ent, list);
-
- list_del_init(&de->list);
- osduld_put_device(de->od);
- kfree(de);
- }
-
- spin_unlock(&omt->dev_list_lock);
-}
-
-static struct osd_dev *___dev_list_find(struct objio_mount_type *omt,
+static struct osd_dev *_dev_list_find(struct pnfs_layout_hdr *pnfslay,
struct pnfs_deviceid *d_id)
{
- struct list_head *le;
-
- list_for_each(le, &omt->dev_list) {
- struct _dev_ent *de = list_entry(le, struct _dev_ent, list);
+ struct nfs4_deviceid *d;
+ struct nfs4_deviceid_cache *devid_cache =
+ NFS_SERVER(pnfslay->inode)->nfs_client->cl_devid_cache;
+ struct osd_dev *od;

- if (0 == memcmp(&de->d_id, d_id, sizeof(*d_id)))
- return de->od;
- }
+ d = nfs4_find_get_deviceid(devid_cache, d_id);

- return NULL;
-}
+ if (!d)
+ return NULL;

-static struct osd_dev *_dev_list_find(struct objio_mount_type *omt,
- struct pnfs_deviceid *d_id)
-{
- struct osd_dev *od;
+ od = container_of(d, struct _dev_ent, nfs4_di)->od;

- spin_lock(&omt->dev_list_lock);
- od = ___dev_list_find(omt, d_id);
- spin_unlock(&omt->dev_list_lock);
+ /* FIXME: A referance is taken on an added deviceid in _dev_list_add
+ * That ref is only dropped at unload when device cache is freed.
+ * Whatever prevented uninitialize_mountpoint from been called
+ * with active layouts still applies today. Later when we will
+ * Have code that caps the cache
+ */
+ nfs4_put_deviceid(devid_cache, d);
return od;
}

-static int _dev_list_add(struct objio_mount_type *omt,
+static struct osd_dev *_dev_list_add(struct pnfs_layout_hdr *pnfslay,
struct pnfs_deviceid *d_id, struct osd_dev *od)
{
struct _dev_ent *de = kzalloc(sizeof(*de), GFP_KERNEL);
+ struct nfs4_deviceid *d;

if (!de)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);

- spin_lock(&omt->dev_list_lock);
+ d = nfs4_add_get_deviceid(
+ NFS_SERVER(pnfslay->inode)->nfs_client->cl_devid_cache,
+ &de->nfs4_di);

- if (___dev_list_find(omt, d_id)) {
- kfree(de);
- goto out;
+ if (d != &de->nfs4_di) {
+ /* there was a race and we lost */
+ osduld_put_device(de->od);
+ de = container_of(d, struct _dev_ent, nfs4_di);
}

- de->d_id = *d_id;
- de->od = od;
- list_add(&de->list, &omt->dev_list);
+ return de->od;
+}

-out:
- spin_unlock(&omt->dev_list_lock);
- return 0;
+void
+objio_free_dev(struct kref *kref)
+{
+ struct nfs4_deviceid *d_id =
+ container_of(kref, struct nfs4_deviceid, de_kref);
+ struct _dev_ent *de = container_of(d_id, struct _dev_ent, nfs4_di);
+
+ osduld_put_device(de->od);
+ kfree(de);
}

+
struct objio_segment {
struct pnfs_osd_object_cred *comps;

@@ -183,12 +175,11 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
struct pnfs_deviceid *d_id;
struct osd_dev *od;
struct osd_dev_info odi;
- struct objio_mount_type *omt = PNFS_NFS_SERVER(pnfslay)->pnfs_ld_data;
int err;

d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;

- od = _dev_list_find(omt, d_id);
+ od = _dev_list_find(pnfslay, d_id);
if (od)
return od;

@@ -222,7 +213,8 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
goto out;
}

- _dev_list_add(omt, d_id, od);
+ /* od might have changed during add */
+ od = _dev_list_add(pnfslay, d_id, od);

out:
dprintk("%s: return=%d\n", __func__, err);
@@ -347,6 +339,12 @@ free_seg:
void objio_free_lseg(void *p)
{
struct objio_segment *objio_seg = p;
+/* struct nfs_server *nfss = NFS_SERVER(PNFS_INODE(pnfslay));
+
+ for (i = 0; i < objio_seg->num_comps; i++) {
+ nfs4_put_deviceid(nfss->nfs_client->cl_devid_cache,
+ objio_seg->devids[i]);
+ }*/

kfree(objio_seg);
}
@@ -1044,21 +1042,33 @@ static struct pnfs_layoutdriver_type objlayout_type = {
.ld_policy_ops = &objlayout_policy_operations,
};

-void *objio_init_mt(void)
+void *objio_init_mt(struct nfs_server *nfss)
{
+ int status;
struct objio_mount_type *omt = kzalloc(sizeof(*omt), GFP_KERNEL);

if (!omt)
return ERR_PTR(-ENOMEM);

- INIT_LIST_HEAD(&omt->dev_list);
- spin_lock_init(&omt->dev_list_lock);
+ status = nfs4_alloc_init_deviceid_cache(nfss->nfs_client,
+ objio_free_dev);
+ if (status) {
+ printk(KERN_WARNING "%s: deviceid cache could not be "
+ "initialized\n", __func__);
+ kfree(omt);
+ return ERR_PTR(status);
+ }
+ dprintk("%s: deviceid cache has been initialized successfully\n",
+ __func__);
+
return omt;
}

-void objio_fini_mt(void *mountid)
+void objio_fini_mt(struct nfs_server *nfss, void *mountid)
{
- _dev_list_remove_all(mountid);
+ if (nfss->pnfs_curr_ld && nfss->nfs_client->cl_devid_cache)
+ nfs4_put_deviceid_cache(nfss->nfs_client);
+
kfree(mountid);
}

diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 259c616..0b541e6 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -752,7 +752,7 @@ objlayout_initialize_mountpoint(struct nfs_server *server,
{
void *data;

- data = objio_init_mt();
+ data = objio_init_mt(server);
if (IS_ERR(data)) {
printk(KERN_INFO "%s: objlayout lib not ready err=%ld\n",
__func__, PTR_ERR(data));
@@ -771,7 +771,8 @@ static int
objlayout_uninitialize_mountpoint(struct nfs_server *server)
{
dprintk("%s: Begin %p\n", __func__, server->pnfs_ld_data);
- objio_fini_mt(server->pnfs_ld_data);
+
+ objio_fini_mt(server, server->pnfs_ld_data);
return 0;
}

diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index adec7ad..a531016 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -113,8 +113,8 @@ struct objlayout_io_state {
/*
* Raid engine I/O API
*/
-extern void *objio_init_mt(void);
-extern void objio_fini_mt(void *mt);
+extern void *objio_init_mt(struct nfs_server *nfss);
+extern void objio_fini_mt(struct nfs_server *nfss, void *mt);

extern int objio_alloc_lseg(void **outp,
struct pnfs_layout_hdr *pnfslay,
diff --git a/fs/nfs/objlayout/panfs_shim.c b/fs/nfs/objlayout/panfs_shim.c
index fd46e96..362d088 100644
--- a/fs/nfs/objlayout/panfs_shim.c
+++ b/fs/nfs/objlayout/panfs_shim.c
@@ -51,12 +51,12 @@
struct panfs_export_operations *panfs_export_ops;

void *
-objio_init_mt(void)
+objio_init_mt(struct nfs_server *nfss)
{
return panfs_export_ops == NULL ? ERR_PTR(-EAGAIN) : NULL;
}

-void objio_fini_mt(void *mountid)
+void objio_fini_mt(struct nfs_server *nfss, void *mountid)
{
}

--
1.7.2.2



2010-09-13 12:16:15

by Fred Isaman

[permalink] [raw]
Subject: Re: [RFC 1/3] SQUASHME: Generalize the device cache so it can be used by all layouts

On Mon, Sep 13, 2010 at 2:50 AM, Boaz Harrosh <[email protected]> wr=
ote:
> On 09/12/2010 09:42 PM, Trond Myklebust wrote:
>> On Sun, 2010-09-12 at 19:03 +0200, Boaz Harrosh wrote:
>>> Current code thinks that there can be a single device_id per
>>> layout segment. Change that to assume no relations between segments
>>> and device_ids. It's now up to the layout-driver to make any relati=
ons.
>>>
>>> Files layout driver is fixed in next patch.
>>>
>>> Signed-off-by: Boaz Harrosh <[email protected]>
>>> ---
>>> =A0fs/nfs/pnfs.c =A0 =A0 =A0 =A0 =A0 =A0 | =A0 20 +++++------------=
---
>>> =A0include/linux/nfs4_pnfs.h | =A0 =A06 +-----
>>> =A02 files changed, 6 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index eb4e092..f26abc0 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -1892,26 +1892,16 @@ nfs4_init_deviceid_node(struct nfs4_devicei=
d *d)
>>> =A0}
>>> =A0EXPORT_SYMBOL(nfs4_init_deviceid_node);
>>>
>>> -/* Called from layoutdriver_io_operations->alloc_lseg */
>>> -void
>>> -nfs4_set_layout_deviceid(struct pnfs_layout_segment *l, struct nfs=
4_deviceid *d)
>>> -{
>>> - =A0 =A0dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.ref=
count));
>>> - =A0 =A0l->deviceid =3D d;
>>> -}
>>> -EXPORT_SYMBOL(nfs4_set_layout_deviceid);
>>> -
>>> =A0/* Called from layoutdriver_io_operations->free_lseg */
>>> =A0void
>>> -nfs4_put_unset_layout_deviceid(struct pnfs_layout_segment *l,
>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_deviceid =
*d,
>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 void (*free_callback)=
(struct kref *))
>>> +nfs4_put_deviceid(struct nfs4_deviceid_cache *c,
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_deviceid =
*d)
>>> =A0{
>>> =A0 =A0 =A0dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.r=
efcount));
>>> - =A0 =A0l->deviceid =3D NULL;
>>> - =A0 =A0kref_put(&d->de_kref, free_callback);
>>> + =A0 =A0kref_put(&d->de_kref, c->dc_free_callback);
>>> + =A0 =A0/* Do we need to return the deviceid_cache ref */
>>> =A0}
>>> -EXPORT_SYMBOL(nfs4_put_unset_layout_deviceid);
>>> +EXPORT_SYMBOL(nfs4_put_deviceid);
>>>
>>> =A0/* Find and reference a deviceid */
>>> =A0struct nfs4_deviceid *
>>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>>> index 577cd2b..dc3410e 100644
>>> --- a/include/linux/nfs4_pnfs.h
>>> +++ b/include/linux/nfs4_pnfs.h
>>> @@ -111,7 +111,6 @@ struct pnfs_layout_segment {
>>> =A0 =A0 =A0struct kref kref;
>>> =A0 =A0 =A0bool valid;
>>> =A0 =A0 =A0struct pnfs_layout_hdr *layout;
>>> - =A0 =A0struct nfs4_deviceid *deviceid;
>>> =A0 =A0 =A0u8 ld_data[]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* lay=
out driver private data */
>>> =A0};
>>>
>>> @@ -287,11 +286,8 @@ extern struct nfs4_deviceid *nfs4_find_get_dev=
iceid(
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct p=
nfs_deviceid *);
>>> =A0extern struct nfs4_deviceid *nfs4_add_get_deviceid(struct nfs4_d=
eviceid_cache *,
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct n=
fs4_deviceid *);
>>> -extern void nfs4_set_layout_deviceid(struct pnfs_layout_segment *,
>>> +extern void nfs4_put_deviceid(struct nfs4_deviceid_cache *,
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct n=
fs4_deviceid *);
>>> -extern void nfs4_put_unset_layout_deviceid(struct pnfs_layout_segm=
ent *,
>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs=
4_deviceid *,
>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0void (*fre=
e_callback)(struct kref *));
>>> =A0extern void nfs4_delete_device(struct nfs4_deviceid_cache *,
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct p=
nfs_deviceid *);
>>>
>>
>> This doesn't look like it needs to be part of the initial single-lay=
out
>> submission. Why is it being labelled as a squashme?
>>
>> Trond
>
> No, this is a device_id cache per nfs-client it is part of the initia=
l
> getdeviceinfo call. Many "single-layout" may refer to the same device=
_id
> therefor it is cached so the long getdeviceinfo call can be done only=
once.
>
> Cheers
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

I agree. I'll roll it into the next submission.

=46red