2010-07-20 17:01:53

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 0/5] pnfs-submit fix kfree under spin lock


Fix kfree under spin lock

Both put_lseg and put_layout are called under the inode i_lock where the
last reference will end up freeing structures.

I know there has been a lot of churn in this code, but free'ing under the
spin lock is a no-no.
In this patch I refactor the layout allocation, combining it with the
layout segment lookup code. The new function, pnfs_get_layout_segment takes
inode spin lock, and looks to see if the requested range is serviced by an
existing layout segment. If the layout has not been allocated, allocate it.
If a layout segement is found, reference it and give up the lock. If no
layout segment is found, reference the layout for the layoutget call and
give up the lock.

0001-SQUASHME-pnfs-submit-alloc-layout-don-t-call-put_lay.patch
0002-SQUASHME-pnfs-submit-use-atomic_dec_and_lock-for-lay.patch

Fix the put_lseg under spin lock.
0003-SQUASHME-pnfs-submit-don-t-call-put_lseg-under-spin-.patch

Cleanup.
0004-SQUASHME-pnfs-submit-pnfs_release_layout-just-use-in.patch
0005-SQUASHME-pnfs-submit-fix-has_layout-compile-error.patch

Tests:
CONFIG_NFS_V4_1 set:
Connectathon tests pass against GFS2 and pyNFS file layout servers.

CONFIG_NFS_V4_1 not set:
Connectathon tests pass.

-->Andy



2010-07-22 16:26:38

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 1/5] SQUASHME pnfs-submit alloc layout don't call put_layout under spin lock

On Jul. 20, 2010, 20:01 +0300, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Remove the pnfs_alloc_layout call to put_layout_locked under the i_lock.
>
> Combine the search for an lseg range with the potential allocation of the
> pnfs_layout_type. Return a referenced layout segment or NULL with a reference
> taken on the layout in preparation for a layoutget call
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/pnfs.c | 117 +++++++++++++++++++++------------------------------------
> 1 files changed, 43 insertions(+), 74 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index fd979ec..77e6671 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -863,40 +863,6 @@ alloc_init_layout(struct inode *ino)
> }
>
> /*
> - * Retrieve and possibly allocate the inode layout
> - *
> - * ino->i_lock must be taken by the caller.
> - */
> -static struct pnfs_layout_type *
> -pnfs_alloc_layout(struct inode *ino)
> -{
> - struct nfs_inode *nfsi = NFS_I(ino);
> - struct pnfs_layout_type *new = NULL;
> -
> - dprintk("%s Begin ino=%p layout=%p\n", __func__, ino, nfsi->layout);
> -
> - BUG_ON_UNLOCKED_INO(ino);
> - if (likely(nfsi->layout))
> - return nfsi->layout;
> -
> - spin_unlock(&ino->i_lock);
> - new = alloc_init_layout(ino);
> - spin_lock(&ino->i_lock);
> -
> - if (likely(nfsi->layout == NULL)) { /* Won the race? */
> - nfsi->layout = new;
> - } else if (new) {
> - /* Reference the layout accross i_lock release and grab */
> - get_layout(nfsi->layout);
> - spin_unlock(&ino->i_lock);
> - NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->free_layout(new);
> - spin_lock(&ino->i_lock);
> - put_layout_locked(nfsi->layout);
> - }
> - return nfsi->layout;
> -}
> -
> -/*
> * iomode matching rules:
> * range lseg match
> * ----- ----- -----
> @@ -915,28 +881,52 @@ has_matching_lseg(struct pnfs_layout_segment *lseg,
> }
>
> /*
> - * lookup range in layout
> + * Lookup range in layout. Return a referenced layout segment or
> + * NULL with a reference on the layout in preparation for a layoutget call.
> */
> static struct pnfs_layout_segment *
> -pnfs_has_layout(struct pnfs_layout_type *lo,
> - struct nfs4_pnfs_layout_segment *range)
> +pnfs_get_layout_segment(struct inode *inode,
> + struct nfs4_pnfs_layout_segment *range)
> {
> struct pnfs_layout_segment *lseg, *ret = NULL;
> + struct nfs_inode *nfsi = NFS_I(inode);
> + struct pnfs_layout_type *new = NULL;
>
> dprintk("%s:Begin\n", __func__);
> -
> - BUG_ON_UNLOCKED_LO(lo);
> - list_for_each_entry (lseg, &lo->segs, fi_list) {
> - if (has_matching_lseg(lseg, range)) {
> + spin_lock(&inode->i_lock);
> + if (nfsi->layout == NULL) {
> + spin_unlock(&inode->i_lock);
> + new = alloc_init_layout(inode);
> + if (new == NULL)
> + goto out;
> + spin_lock(&inode->i_lock);
> + if (NFS_I(inode)->layout == NULL) {
> + NFS_I(inode)->layout = new;
> + new = NULL;
> + }

I'm fine with moving the code inline, but moving it here seems like
you're trying to cram too much into this function. let it do just one thing
which is to look up the lseg while the inode is locked, and the caller
can alloc the layout however it wants

Also freeing of new at out_unlock time is unnecessarily late.
Actually, new can be defined in this block's scope only and be freed in the
(hopefully) unlikely case of losing the race and hitting layout != NULL
after allocating new and reacquiring the lock.

> + }
> + /* Is there a layout segment covering requested range? */
> + list_for_each_entry(lseg, &NFS_I(inode)->layout->segs, fi_list) {
> + if (has_matching_lseg(lseg, range) && lseg->valid) {
> ret = lseg;
> get_lseg(ret);
> - break;
> + goto out_unlock;
> }
> if (cmp_layout(range, &lseg->range) > 0)
> break;
> }
> + /*
> + * No layout segment. Reference the layout for layoutget
> + * (matched in pnfs_layout_release)
> + */
> + get_layout(NFS_I(inode)->layout);

Although this is a static function with only one user I don't like
the fact it had multiple side effects, depending on the outcome:
if the lseg is found we don't get a reference on the layout,
but we do get a reference on the lseg, otherwise, we get a reference
that's needed for the caller. This is why I think that keeping that logic
(of allocating the layout and possibly getting a refcount on it)
in the caller is more appropriate.

Benny

>
> - dprintk("%s:Return lseg %p ref %d valid %d\n",
> +out_unlock:
> + spin_unlock(&inode->i_lock);
> + if (new)
> + NFS_SERVER(inode)->pnfs_curr_ld->ld_io_ops->free_layout(new);
> +out:
> + dprintk("<-- %s lseg %p ref %d valid %d\n",
> __func__, ret, ret ? atomic_read(&ret->kref.refcount) : 0,
> ret ? ret->valid : 0);
> return ret;
> @@ -958,26 +948,10 @@ _pnfs_update_layout(struct inode *ino,
> .length = NFS4_MAX_UINT64,
> };
> struct nfs_inode *nfsi = NFS_I(ino);
> - struct pnfs_layout_type *lo;
> struct pnfs_layout_segment *lseg = NULL;
>
> *lsegpp = NULL;
> - spin_lock(&ino->i_lock);
> - lo = pnfs_alloc_layout(ino);
> - if (lo == NULL) {
> - dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
> - goto out_unlock;
> - }
> -
> - /* Check to see if the layout for the given range already exists */
> - lseg = pnfs_has_layout(lo, &arg);
> - if (lseg && !lseg->valid) {
> - put_lseg_locked(lseg);
> - /* someone is cleaning the layout */
> - lseg = NULL;
> - goto out_unlock;
> - }
> -
> + lseg = pnfs_get_layout_segment(ino, &arg);
> if (lseg) {
> dprintk("%s: Using cached lseg %p for %llu@%llu iomode %d)\n",
> __func__,
> @@ -985,35 +959,30 @@ _pnfs_update_layout(struct inode *ino,
> arg.length,
> arg.offset,
> arg.iomode);
> -
> - goto out_unlock;
> + *lsegpp = lseg;
> + goto out;
> }
>
> - /* if get layout already failed once goto out */
> + /* if layoutget failed wait pnfs_layout_suspend time to retry */
> if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
> if (unlikely(nfsi->pnfs_layout_suspend &&
> get_seconds() >= nfsi->pnfs_layout_suspend)) {
> dprintk("%s: layout_get resumed\n", __func__);
> + spin_lock(&ino->i_lock);
> clear_bit(lo_fail_bit(iomode),
> &nfsi->layout->pnfs_layout_state);
> nfsi->pnfs_layout_suspend = 0;
> - } else
> - goto out_unlock;
> + spin_unlock(&ino->i_lock);
> + } else {
> + goto out;
> + }
> }
>
> - /* Reference the layout for layoutget matched in pnfs_layout_release */
> - get_layout(lo);
> - spin_unlock(&ino->i_lock);
> -
> - send_layoutget(ino, ctx, &arg, lsegpp, lo);
> + send_layoutget(ino, ctx, &arg, lsegpp, nfsi->layout);
> out:
> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
> nfsi->layout->pnfs_layout_state, lseg);
> return;
> -out_unlock:
> - *lsegpp = lseg;
> - spin_unlock(&ino->i_lock);
> - goto out;
> }
>
> void

2010-07-20 17:01:58

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 1/5] SQUASHME pnfs-submit alloc layout don't call put_layout under spin lock

From: Andy Adamson <[email protected]>

Remove the pnfs_alloc_layout call to put_layout_locked under the i_lock.

Combine the search for an lseg range with the potential allocation of the
pnfs_layout_type. Return a referenced layout segment or NULL with a reference
taken on the layout in preparation for a layoutget call

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 117 +++++++++++++++++++++------------------------------------
1 files changed, 43 insertions(+), 74 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index fd979ec..77e6671 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -863,40 +863,6 @@ alloc_init_layout(struct inode *ino)
}

/*
- * Retrieve and possibly allocate the inode layout
- *
- * ino->i_lock must be taken by the caller.
- */
-static struct pnfs_layout_type *
-pnfs_alloc_layout(struct inode *ino)
-{
- struct nfs_inode *nfsi = NFS_I(ino);
- struct pnfs_layout_type *new = NULL;
-
- dprintk("%s Begin ino=%p layout=%p\n", __func__, ino, nfsi->layout);
-
- BUG_ON_UNLOCKED_INO(ino);
- if (likely(nfsi->layout))
- return nfsi->layout;
-
- spin_unlock(&ino->i_lock);
- new = alloc_init_layout(ino);
- spin_lock(&ino->i_lock);
-
- if (likely(nfsi->layout == NULL)) { /* Won the race? */
- nfsi->layout = new;
- } else if (new) {
- /* Reference the layout accross i_lock release and grab */
- get_layout(nfsi->layout);
- spin_unlock(&ino->i_lock);
- NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->free_layout(new);
- spin_lock(&ino->i_lock);
- put_layout_locked(nfsi->layout);
- }
- return nfsi->layout;
-}
-
-/*
* iomode matching rules:
* range lseg match
* ----- ----- -----
@@ -915,28 +881,52 @@ has_matching_lseg(struct pnfs_layout_segment *lseg,
}

/*
- * lookup range in layout
+ * Lookup range in layout. Return a referenced layout segment or
+ * NULL with a reference on the layout in preparation for a layoutget call.
*/
static struct pnfs_layout_segment *
-pnfs_has_layout(struct pnfs_layout_type *lo,
- struct nfs4_pnfs_layout_segment *range)
+pnfs_get_layout_segment(struct inode *inode,
+ struct nfs4_pnfs_layout_segment *range)
{
struct pnfs_layout_segment *lseg, *ret = NULL;
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct pnfs_layout_type *new = NULL;

dprintk("%s:Begin\n", __func__);
-
- BUG_ON_UNLOCKED_LO(lo);
- list_for_each_entry (lseg, &lo->segs, fi_list) {
- if (has_matching_lseg(lseg, range)) {
+ spin_lock(&inode->i_lock);
+ if (nfsi->layout == NULL) {
+ spin_unlock(&inode->i_lock);
+ new = alloc_init_layout(inode);
+ if (new == NULL)
+ goto out;
+ spin_lock(&inode->i_lock);
+ if (NFS_I(inode)->layout == NULL) {
+ NFS_I(inode)->layout = new;
+ new = NULL;
+ }
+ }
+ /* Is there a layout segment covering requested range? */
+ list_for_each_entry(lseg, &NFS_I(inode)->layout->segs, fi_list) {
+ if (has_matching_lseg(lseg, range) && lseg->valid) {
ret = lseg;
get_lseg(ret);
- break;
+ goto out_unlock;
}
if (cmp_layout(range, &lseg->range) > 0)
break;
}
+ /*
+ * No layout segment. Reference the layout for layoutget
+ * (matched in pnfs_layout_release)
+ */
+ get_layout(NFS_I(inode)->layout);

- dprintk("%s:Return lseg %p ref %d valid %d\n",
+out_unlock:
+ spin_unlock(&inode->i_lock);
+ if (new)
+ NFS_SERVER(inode)->pnfs_curr_ld->ld_io_ops->free_layout(new);
+out:
+ dprintk("<-- %s lseg %p ref %d valid %d\n",
__func__, ret, ret ? atomic_read(&ret->kref.refcount) : 0,
ret ? ret->valid : 0);
return ret;
@@ -958,26 +948,10 @@ _pnfs_update_layout(struct inode *ino,
.length = NFS4_MAX_UINT64,
};
struct nfs_inode *nfsi = NFS_I(ino);
- struct pnfs_layout_type *lo;
struct pnfs_layout_segment *lseg = NULL;

*lsegpp = NULL;
- spin_lock(&ino->i_lock);
- lo = pnfs_alloc_layout(ino);
- if (lo == NULL) {
- dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
- goto out_unlock;
- }
-
- /* Check to see if the layout for the given range already exists */
- lseg = pnfs_has_layout(lo, &arg);
- if (lseg && !lseg->valid) {
- put_lseg_locked(lseg);
- /* someone is cleaning the layout */
- lseg = NULL;
- goto out_unlock;
- }
-
+ lseg = pnfs_get_layout_segment(ino, &arg);
if (lseg) {
dprintk("%s: Using cached lseg %p for %llu@%llu iomode %d)\n",
__func__,
@@ -985,35 +959,30 @@ _pnfs_update_layout(struct inode *ino,
arg.length,
arg.offset,
arg.iomode);
-
- goto out_unlock;
+ *lsegpp = lseg;
+ goto out;
}

- /* if get layout already failed once goto out */
+ /* if layoutget failed wait pnfs_layout_suspend time to retry */
if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
if (unlikely(nfsi->pnfs_layout_suspend &&
get_seconds() >= nfsi->pnfs_layout_suspend)) {
dprintk("%s: layout_get resumed\n", __func__);
+ spin_lock(&ino->i_lock);
clear_bit(lo_fail_bit(iomode),
&nfsi->layout->pnfs_layout_state);
nfsi->pnfs_layout_suspend = 0;
- } else
- goto out_unlock;
+ spin_unlock(&ino->i_lock);
+ } else {
+ goto out;
+ }
}

- /* Reference the layout for layoutget matched in pnfs_layout_release */
- get_layout(lo);
- spin_unlock(&ino->i_lock);
-
- send_layoutget(ino, ctx, &arg, lsegpp, lo);
+ send_layoutget(ino, ctx, &arg, lsegpp, nfsi->layout);
out:
dprintk("%s end, state 0x%lx lseg %p\n", __func__,
nfsi->layout->pnfs_layout_state, lseg);
return;
-out_unlock:
- *lsegpp = lseg;
- spin_unlock(&ino->i_lock);
- goto out;
}

void
--
1.6.6


2010-07-20 17:02:00

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 2/5] SQUASHME pnfs-submit use atomic_dec_and_lock for layout refcounting

From: Andy Adamson <[email protected]>

Remove put_layout from under i_lock.
Note: destroy_lseg still calls put_layout under i_lock - fixed in next patch.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 69 ++++++++++++++++++++---------------------------
include/linux/nfs_fs.h | 2 +-
2 files changed, 30 insertions(+), 41 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 77e6671..2cef53e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -322,35 +322,26 @@ pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
static inline void
get_layout(struct pnfs_layout_type *lo)
{
- BUG_ON_UNLOCKED_LO(lo);
- lo->refcount++;
+ atomic_inc(&lo->refcount);
}

-static inline void
-put_layout_locked(struct pnfs_layout_type *lo)
+inline void
+put_layout(struct inode *inode)
{
- BUG_ON_UNLOCKED_LO(lo);
- BUG_ON(lo->refcount <= 0);
+ struct pnfs_layout_type *lo = NFS_I(inode)->layout;

- lo->refcount--;
- if (!lo->refcount) {
- struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
- struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
+ dprintk("--> %s i_ino %lu, lo %p\n", __func__, inode->i_ino, lo);

- dprintk("%s: freeing layout cache %p\n", __func__, lo);
- WARN_ON(!list_empty(&lo->lo_layouts));
- io_ops->free_layout(lo);
- nfsi->layout = NULL;
- }
-}
+ BUG_ON(lo == NULL);
+ if (!atomic_dec_and_lock(&lo->refcount, &inode->i_lock))
+ return;
+ WARN_ON(!list_empty(&lo->lo_layouts));

-void
-put_layout(struct inode *inode)
-{
- spin_lock(&inode->i_lock);
- put_layout_locked(NFS_I(inode)->layout);
+ NFS_I(inode)->layout = NULL;
spin_unlock(&inode->i_lock);

+ dprintk("%s: freeing layout cache %p\n", __func__, lo);
+ NFS_SERVER(inode)->pnfs_curr_ld->ld_io_ops->free_layout(lo);
}

void
@@ -362,41 +353,39 @@ pnfs_layout_release(struct pnfs_layout_type *lo,
spin_lock(&nfsi->vfs_inode.i_lock);
if (range)
pnfs_free_layout(lo, range);
+ spin_unlock(&nfsi->vfs_inode.i_lock);
/*
* Matched in _pnfs_update_layout for layoutget
* and by get_layout in _pnfs_return_layout for layoutreturn
*/
- put_layout_locked(lo);
- spin_unlock(&nfsi->vfs_inode.i_lock);
+ put_layout(lo->lo_inode);
wake_up_all(&nfsi->lo_waitq);
}

void
pnfs_destroy_layout(struct nfs_inode *nfsi)
{
- struct pnfs_layout_type *lo;
struct nfs4_pnfs_layout_segment range = {
.iomode = IOMODE_ANY,
.offset = 0,
.length = NFS4_MAX_UINT64,
};

+ if (!has_layout(nfsi))
+ return;
spin_lock(&nfsi->vfs_inode.i_lock);
- lo = nfsi->layout;
- if (lo) {
- pnfs_free_layout(lo, &range);
- WARN_ON(!list_empty(&nfsi->layout->segs));
- WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
-
- if (nfsi->layout->refcount != 1)
- printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
- __func__, nfsi->layout->refcount);
- WARN_ON(nfsi->layout->refcount != 1);
-
- /* Matched by refcount set to 1 in alloc_init_layout */
- put_layout_locked(lo);
- }
+ pnfs_free_layout(nfsi->layout, &range);
+ WARN_ON(!list_empty(&nfsi->layout->segs));
+ WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
+
+ if (atomic_read(&nfsi->layout->refcount) != 1)
+ printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
+ __func__, atomic_read(&nfsi->layout->refcount));
+
spin_unlock(&nfsi->vfs_inode.i_lock);
+
+ /* Matched by refcount set to 1 in alloc_init_layout */
+ put_layout(&nfsi->vfs_inode);
}

static inline void
@@ -416,7 +405,7 @@ destroy_lseg(struct kref *kref)

dprintk("--> %s\n", __func__);
/* Matched by get_layout in pnfs_insert_layout */
- put_layout_locked(lseg->layout);
+ put_layout(lseg->layout->lo_inode);
PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
}

@@ -854,7 +843,7 @@ alloc_init_layout(struct inode *ino)
__func__);
return NULL;
}
- lo->refcount = 1;
+ atomic_set(&lo->refcount, 1);
INIT_LIST_HEAD(&lo->lo_layouts);
INIT_LIST_HEAD(&lo->segs);
seqlock_init(&lo->seqlock);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 005b3ea..877673e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -98,7 +98,7 @@ struct nfs_delegation;
struct posix_acl;

struct pnfs_layout_type {
- int refcount;
+ atomic_t refcount;
struct list_head lo_layouts; /* other client layouts */
struct list_head segs; /* layout segments list */
int roc_iomode; /* iomode to return on close, 0=none */
--
1.6.6


2010-07-20 17:02:01

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 3/5] SQUASHME pnfs-submit don't call put_lseg under spin lock

From: Andy Adamson <[email protected]>

kfree can be called in destroy_lseg.

Move the spin_lock into pnfs_free_layout.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 89 +++++++++++++++++++--------------------------
fs/nfs/pnfs.h | 2 +-
include/linux/nfs4_pnfs.h | 2 +-
3 files changed, 40 insertions(+), 53 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2cef53e..cf64f16 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -57,7 +57,7 @@

static int pnfs_initialized;

-static void pnfs_free_layout(struct pnfs_layout_type *lo,
+static void pnfs_free_layout(struct inode *inode,
struct nfs4_pnfs_layout_segment *range);
static inline void get_layout(struct pnfs_layout_type *lo);

@@ -350,10 +350,8 @@ pnfs_layout_release(struct pnfs_layout_type *lo,
{
struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);

- spin_lock(&nfsi->vfs_inode.i_lock);
if (range)
- pnfs_free_layout(lo, range);
- spin_unlock(&nfsi->vfs_inode.i_lock);
+ pnfs_free_layout(lo->lo_inode, range);
/*
* Matched in _pnfs_update_layout for layoutget
* and by get_layout in _pnfs_return_layout for layoutreturn
@@ -373,8 +371,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)

if (!has_layout(nfsi))
return;
- spin_lock(&nfsi->vfs_inode.i_lock);
- pnfs_free_layout(nfsi->layout, &range);
+ pnfs_free_layout(&nfsi->vfs_inode, &range);
WARN_ON(!list_empty(&nfsi->layout->segs));
WARN_ON(!list_empty(&nfsi->layout->lo_layouts));

@@ -382,7 +379,6 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
__func__, atomic_read(&nfsi->layout->refcount));

- spin_unlock(&nfsi->vfs_inode.i_lock);

/* Matched by refcount set to 1 in alloc_init_layout */
put_layout(&nfsi->vfs_inode);
@@ -392,59 +388,40 @@ static inline void
init_lseg(struct pnfs_layout_type *lo, struct pnfs_layout_segment *lseg)
{
INIT_LIST_HEAD(&lseg->fi_list);
- kref_init(&lseg->kref);
+ atomic_set(&lseg->refcount, 1);
lseg->valid = true;
lseg->layout = lo;
}

static void
-destroy_lseg(struct kref *kref)
+destroy_lseg(struct pnfs_layout_segment *lseg)
{
- struct pnfs_layout_segment *lseg =
- container_of(kref, struct pnfs_layout_segment, kref);
-
dprintk("--> %s\n", __func__);
/* Matched by get_layout in pnfs_insert_layout */
put_layout(lseg->layout->lo_inode);
PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
}

-static void
-put_lseg_locked(struct pnfs_layout_segment *lseg)
-{
- bool do_wake_up;
- struct nfs_inode *nfsi;
-
- if (!lseg)
- return;
-
- dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
- atomic_read(&lseg->kref.refcount), lseg->valid);
- do_wake_up = !lseg->valid;
- nfsi = PNFS_NFS_INODE(lseg->layout);
- kref_put(&lseg->kref, destroy_lseg);
- if (do_wake_up)
- wake_up(&nfsi->lo_waitq);
-}
-
void
put_lseg(struct pnfs_layout_segment *lseg)
{
bool do_wake_up;
- struct nfs_inode *nfsi;
+ struct inode *inode;

if (!lseg)
return;

dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
- atomic_read(&lseg->kref.refcount), lseg->valid);
+ atomic_read(&lseg->refcount), lseg->valid);
do_wake_up = !lseg->valid;
- nfsi = PNFS_NFS_INODE(lseg->layout);
- spin_lock(&nfsi->vfs_inode.i_lock);
- kref_put(&lseg->kref, destroy_lseg);
- spin_unlock(&nfsi->vfs_inode.i_lock);
+ inode = lseg->layout->lo_inode;
+
+ if (!atomic_dec_and_lock(&lseg->refcount, &inode->i_lock))
+ return;
+ spin_unlock(&inode->i_lock);
+ destroy_lseg(lseg);
if (do_wake_up)
- wake_up(&nfsi->lo_waitq);
+ wake_up(&NFS_I(inode)->lo_waitq);
}
EXPORT_SYMBOL(put_lseg);

@@ -588,39 +565,49 @@ has_layout_to_return(struct pnfs_layout_type *lo,
static inline bool
_pnfs_can_return_lseg(struct pnfs_layout_segment *lseg)
{
- return atomic_read(&lseg->kref.refcount) == 1;
+ return atomic_read(&lseg->refcount) == 1;
}


static void
-pnfs_free_layout(struct pnfs_layout_type *lo,
- struct nfs4_pnfs_layout_segment *range)
+pnfs_free_layout(struct inode *inode,
+ struct nfs4_pnfs_layout_segment *range)
{
struct pnfs_layout_segment *lseg, *next;
- dprintk("%s:Begin lo %p offset %llu length %llu iomode %d\n",
- __func__, lo, range->offset, range->length, range->iomode);
+ struct pnfs_layout_type *lo = NFS_I(inode)->layout;
+ LIST_HEAD(freeme);

- BUG_ON_UNLOCKED_LO(lo);
- list_for_each_entry_safe (lseg, next, &lo->segs, fi_list) {
+ dprintk("--> %s i_ino %lu lo %p offset %llu length %llu iomode %d\n",
+ __func__, inode->i_ino, lo, range->offset, range->length,
+ range->iomode);
+
+ spin_lock(&inode->i_lock);
+ if (lo == NULL)
+ goto out_unlock;
+
+ list_for_each_entry_safe(lseg, next, &lo->segs, fi_list) {
if (!should_free_lseg(lseg, range) ||
!_pnfs_can_return_lseg(lseg))
continue;
- dprintk("%s: freeing lseg %p iomode %d "
+ dprintk("%s: put lseg %p iomode %d "
"offset %llu length %llu\n", __func__,
lseg, lseg->range.iomode, lseg->range.offset,
lseg->range.length);
- list_del(&lseg->fi_list);
- put_lseg_locked(lseg);
+ list_move(&lseg->fi_list, &freeme);
}
if (list_empty(&lo->segs)) {
- struct nfs_client *clp;
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;

- clp = PNFS_NFS_SERVER(lo)->nfs_client;
spin_lock(&clp->cl_lock);
list_del_init(&lo->lo_layouts);
spin_unlock(&clp->cl_lock);
pnfs_set_layout_stateid(lo, &zero_stateid);
}
+out_unlock:
+ spin_unlock(&inode->i_lock);
+
+ list_for_each_entry_safe(lseg, next, &freeme, fi_list)
+ put_lseg(lseg);

dprintk("%s:Return\n", __func__);
}
@@ -640,7 +627,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
if (!_pnfs_can_return_lseg(lseg)) {
dprintk("%s: wait on lseg %p refcount %d\n",
__func__, lseg,
- atomic_read(&lseg->kref.refcount));
+ atomic_read(&lseg->refcount));
ret = true;
}
}
@@ -916,7 +903,7 @@ out_unlock:
NFS_SERVER(inode)->pnfs_curr_ld->ld_io_ops->free_layout(new);
out:
dprintk("<-- %s lseg %p ref %d valid %d\n",
- __func__, ret, ret ? atomic_read(&ret->kref.refcount) : 0,
+ __func__, ret, ret ? atomic_read(&ret->refcount) : 0,
ret ? ret->valid : 0);
return ret;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1d16049..9498e4e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -84,7 +84,7 @@ static inline int lo_fail_bit(u32 iomode)

static inline void get_lseg(struct pnfs_layout_segment *lseg)
{
- kref_get(&lseg->kref);
+ atomic_inc(&lseg->refcount);
}

/* Return true if a layout driver is being used for this mountpoint */
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 52fe384..a2239a5 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -94,7 +94,7 @@ layoutcommit_needed(struct nfs_inode *nfsi)
struct pnfs_layout_segment {
struct list_head fi_list;
struct nfs4_pnfs_layout_segment range;
- struct kref kref;
+ atomic_t refcount;
bool valid;
struct pnfs_layout_type *layout;
struct nfs4_deviceid *deviceid;
--
1.6.6


2010-07-20 17:02:04

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 5/5] SQUASHME pnfs-submit fix has_layout compile error

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
include/linux/nfs4_pnfs.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index a2239a5..b0c2b0a 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -84,6 +84,12 @@ layoutcommit_needed(struct nfs_inode *nfsi)
#else /* CONFIG_NFS_V4_1 */

static inline bool
+has_layout(struct nfs_inode *nfsi)
+{
+ return false;
+}
+
+static inline bool
layoutcommit_needed(struct nfs_inode *nfsi)
{
return 0;
--
1.6.6


2010-07-20 17:02:04

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 4/5] SQUASHME pnfs-submit pnfs_release_layout just use inode

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4proc.c | 4 ++--
fs/nfs/pnfs.c | 16 +++++++---------
fs/nfs/pnfs.h | 3 ++-
3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 17b0c31..d0fa4d1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5467,7 +5467,7 @@ static void nfs4_pnfs_layoutget_release(void *calldata)
struct nfs4_pnfs_layoutget *lgp = calldata;

dprintk("--> %s\n", __func__);
- pnfs_layout_release(NFS_I(lgp->args.inode)->layout, NULL);
+ pnfs_layout_release(lgp->args.inode, NULL);
if (lgp->res.layout.buf != NULL)
free_page((unsigned long) lgp->res.layout.buf);
kfree(calldata);
@@ -5687,7 +5687,7 @@ static void nfs4_pnfs_layoutreturn_release(void *calldata)
if (lrp->args.return_type == RETURN_FILE) {
if (!lrp->res.lrs_present)
pnfs_set_layout_stateid(lo, &zero_stateid);
- pnfs_layout_release(lo, &lrp->args.lseg);
+ pnfs_layout_release(lrp->args.inode, &lrp->args.lseg);
}
kfree(calldata);
dprintk("<-- %s\n", __func__);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index cf64f16..aba3478 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -345,19 +345,17 @@ put_layout(struct inode *inode)
}

void
-pnfs_layout_release(struct pnfs_layout_type *lo,
+pnfs_layout_release(struct inode *inode,
struct nfs4_pnfs_layout_segment *range)
{
- struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
-
if (range)
- pnfs_free_layout(lo->lo_inode, range);
+ pnfs_free_layout(inode, range);
/*
* Matched in _pnfs_update_layout for layoutget
* and by get_layout in _pnfs_return_layout for layoutreturn
*/
- put_layout(lo->lo_inode);
- wake_up_all(&nfsi->lo_waitq);
+ put_layout(inode);
+ wake_up_all(&NFS_I(inode)->lo_waitq);
}

void
@@ -490,7 +488,7 @@ send_layoutget(struct inode *ino,

lgp = kzalloc(sizeof(*lgp), GFP_KERNEL);
if (lgp == NULL) {
- pnfs_layout_release(lo, NULL);
+ pnfs_layout_release(ino, NULL);
return -ENOMEM;
}
lgp->args.minlength = NFS4_MAX_UINT64;
@@ -652,7 +650,7 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
if (lrp == NULL) {
if (lo && (type == RETURN_FILE))
- pnfs_layout_release(lo, NULL);
+ pnfs_layout_release(ino, NULL);
goto out;
}
lrp->args.reclaim = 0;
@@ -730,7 +728,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
if (!stateid)
status = return_layout(ino, &arg, type, lo, wait);
else
- pnfs_layout_release(lo, &arg);
+ pnfs_layout_release(ino, &arg);
}
out:
dprintk("<-- %s status: %d\n", __func__, status);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9498e4e..b3830f5 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -60,7 +60,8 @@ void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
-void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
+void pnfs_layout_release(struct inode *,
+ struct nfs4_pnfs_layout_segment *range);
void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
const nfs4_stateid *stateid);
void pnfs_destroy_layout(struct nfs_inode *);
--
1.6.6


2010-07-20 17:30:36

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 0/5] pnfs-submit fix kfree under spin lock

Trond and Fred informed me that kfree under a spin lock is actually
OK. So, maybe these patches aren't needed.

On the other hand, with these patches we don't hold the spin lock over
function calls which to my mind is easier to read. Also, the
free_layout/free_lseg layoutdriver calls may do more than call kfree.

At least, we should document what we expect from the
free_layout/free_lseg - e.g. no side affects that conflict with being
called under a spinlock..

-->Andy

On Tue, Jul 20, 2010 at 1:01 PM, <[email protected]> wrote:
>
> Fix kfree under spin lock
>
> Both put_lseg and put_layout are called under the inode i_lock where =
the
> last reference will end up freeing structures.
>
> I know there has been a lot of churn in this code, but free'ing under=
the
> spin lock is a no-no.
> In this patch I refactor the layout allocation, combining it with the
> layout segment lookup code. The new function, pnfs_get_layout_segment=
takes
> inode spin lock, and looks to see if the requested range is serviced =
by an
> existing layout segment. If the layout has not been allocated, alloca=
te it.
> If a layout segement is found, reference it and give up the lock. If =
no
> layout segment is found, reference the layout for the layoutget call =
and
> give up the lock.
>
> 0001-SQUASHME-pnfs-submit-alloc-layout-don-t-call-put_lay.patch
> 0002-SQUASHME-pnfs-submit-use-atomic_dec_and_lock-for-lay.patch
>
> Fix the put_lseg under spin lock.
> 0003-SQUASHME-pnfs-submit-don-t-call-put_lseg-under-spin-.patch
>
> Cleanup.
> 0004-SQUASHME-pnfs-submit-pnfs_release_layout-just-use-in.patch
> 0005-SQUASHME-pnfs-submit-fix-has_layout-compile-error.patch
>
> Tests:
> CONFIG_NFS_V4_1 set:
> Connectathon tests pass against GFS2 and pyNFS file layout servers.
>
> CONFIG_NFS_V4_1 not set:
> Connectathon tests pass.
>
> -->Andy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

2010-07-20 19:10:06

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/5] pnfs-submit fix kfree under spin lock

On Tue, Jul 20, 2010 at 01:01:18PM -0400, [email protected] wrote:
>
> Fix kfree under spin lock
>
> Both put_lseg and put_layout are called under the inode i_lock where the
> last reference will end up freeing structures.
>
> I know there has been a lot of churn in this code, but free'ing under the
> spin lock is a no-no.

kfree under a spinlock is perfectly fine. That is as long as you
don't free the spinlock itself.


2010-07-20 20:37:00

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 0/5] pnfs-submit fix kfree under spin lock

On Tue, Jul 20, 2010 at 3:09 PM, Christoph Hellwig <[email protected]> wrote:
> On Tue, Jul 20, 2010 at 01:01:18PM -0400, [email protected] wrote:
>>
>> Fix kfree under spin lock
>>
>> Both put_lseg and put_layout are called under the inode i_lock where the
>> last reference will end up freeing structures.
>>
>> I know there has been a lot of churn in this code, but free'ing under the
>> spin lock is a no-no.
>
> kfree under a spinlock is perfectly fine. That is as long as you
> don't free the spinlock itself.

:)

-->Andy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-07-21 07:24:07

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 0/5] pnfs-submit fix kfree under spin lock

On Jul. 20, 2010, 20:30 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
> Trond and Fred informed me that kfree under a spin lock is actually
> OK. So, maybe these patches aren't needed.
>
> On the other hand, with these patches we don't hold the spin lock over
> function calls which to my mind is easier to read. Also, the
> free_layout/free_lseg layoutdriver calls may do more than call kfree.

I'm all for that.
Just document they must not block.
If we ever see that as a requirement we can then change the caller
to drop the spin lock before calling the free method.

Beny

>
> At least, we should document what we expect from the
> free_layout/free_lseg - e.g. no side affects that conflict with being
> called under a spinlock..
>
> -->Andy
>
> On Tue, Jul 20, 2010 at 1:01 PM, <[email protected]> wrote:
>>
>> Fix kfree under spin lock
>>
>> Both put_lseg and put_layout are called under the inode i_lock where the
>> last reference will end up freeing structures.
>>
>> I know there has been a lot of churn in this code, but free'ing under the
>> spin lock is a no-no.
>> In this patch I refactor the layout allocation, combining it with the
>> layout segment lookup code. The new function, pnfs_get_layout_segment takes
>> inode spin lock, and looks to see if the requested range is serviced by an
>> existing layout segment. If the layout has not been allocated, allocate it.
>> If a layout segement is found, reference it and give up the lock. If no
>> layout segment is found, reference the layout for the layoutget call and
>> give up the lock.
>>
>> 0001-SQUASHME-pnfs-submit-alloc-layout-don-t-call-put_lay.patch
>> 0002-SQUASHME-pnfs-submit-use-atomic_dec_and_lock-for-lay.patch
>>
>> Fix the put_lseg under spin lock.
>> 0003-SQUASHME-pnfs-submit-don-t-call-put_lseg-under-spin-.patch
>>
>> Cleanup.
>> 0004-SQUASHME-pnfs-submit-pnfs_release_layout-just-use-in.patch
>> 0005-SQUASHME-pnfs-submit-fix-has_layout-compile-error.patch
>>
>> Tests:
>> CONFIG_NFS_V4_1 set:
>> Connectathon tests pass against GFS2 and pyNFS file layout servers.
>>
>> CONFIG_NFS_V4_1 not set:
>> Connectathon tests pass.
>>
>> -->Andy
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>

2010-07-21 14:46:17

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 0/5] pnfs-submit fix kfree under spin lock

On Wed, Jul 21, 2010 at 3:23 AM, Benny Halevy <[email protected]> wro=
te:
> On Jul. 20, 2010, 20:30 +0300, "William A. (Andy) Adamson" <androsada=
[email protected]> wrote:
>> Trond and Fred informed me that kfree under a spin lock is actually
>> OK. So, maybe these patches aren't needed.
>>
>> On the other hand, with these patches we don't hold the spin lock ov=
er
>> function calls which to my mind is easier to read. Also, the
>> free_layout/free_lseg layoutdriver calls may do more than call kfree=
=2E
>
> I'm all for that.
> Just document they must not block.
> If we ever see that as a requirement we can then change the caller
> to drop the spin lock before calling the free method.

You might consider the first patch which combines looking up an lseg
and allocating the pnfs_layout_type. They are really the same job, and
I think it's more clear what is going on.

-->Andy

>
> Beny
>
>>
>> At least, we should document what we expect from the
>> free_layout/free_lseg - e.g. no side affects that conflict with bein=
g
>> called under a spinlock..
>>
>> -->Andy
>>
>> On Tue, Jul 20, 2010 at 1:01 PM, =A0<[email protected]> wrote:
>>>
>>> Fix kfree under spin lock
>>>
>>> Both put_lseg and put_layout are called under the inode i_lock wher=
e the
>>> last reference will end up freeing structures.
>>>
>>> I know there has been a lot of churn in this code, but free'ing und=
er the
>>> spin lock is a no-no.
>>> In this patch I refactor the layout allocation, combining it with t=
he
>>> layout segment lookup code. The new function, pnfs_get_layout_segme=
nt takes
>>> inode spin lock, and looks to see if the requested range is service=
d by an
>>> existing layout segment. If the layout has not been allocated, allo=
cate it.
>>> If a layout segement is found, reference it and give up the lock. I=
f no
>>> layout segment is found, reference the layout for the layoutget cal=
l and
>>> give up the lock.
>>>
>>> 0001-SQUASHME-pnfs-submit-alloc-layout-don-t-call-put_lay.patch
>>> 0002-SQUASHME-pnfs-submit-use-atomic_dec_and_lock-for-lay.patch
>>>
>>> Fix the put_lseg under spin lock.
>>> 0003-SQUASHME-pnfs-submit-don-t-call-put_lseg-under-spin-.patch
>>>
>>> Cleanup.
>>> 0004-SQUASHME-pnfs-submit-pnfs_release_layout-just-use-in.patch
>>> 0005-SQUASHME-pnfs-submit-fix-has_layout-compile-error.patch
>>>
>>> Tests:
>>> CONFIG_NFS_V4_1 set:
>>> Connectathon tests pass against GFS2 and pyNFS file layout servers.
>>>
>>> CONFIG_NFS_V4_1 not set:
>>> Connectathon tests pass.
>>>
>>> -->Andy
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs=
" in
>>> the body of a message to [email protected]
>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm=
l
>>>
>