2010-07-07 22:34:22

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 0/16] pnfs-submit fix layout allocation and reference counting



The current nfs_inode has an embedded pnfs_layout_type structure, with per
layout type private data allocated. Change nfs_inode->layout to be a pointer
to a pnfs_layout_type structure, embed the pnfs_layout_type in the per
layout type structure, and allocate both.

The current pnfs_layout_type allocation waits on a bit lock to handle
concurrent allocation attempts. Replace this with the normal form.

The current pnfs_layout_type reference counting is very un-clear, and one
instance of put_layout was called outside the i_lock which probably was
causing the intermittant pnfs_layout_type refcount bug we've been seeing.

Replace the nfs_inode->layout reference counting with the following scheme:

As in the current code, the pnfs_layout_type reference counting is always done
with the inode->i_lock held.

The nfs_inode->layout comes into existence when the first layout_segment is
cached and stays until inode is destroyed.

1) alloc nfs_inode->layout:
- Initialized to 1. This holds it around for the clp->cl_layouts
(layout->lo_layouts) list.

2) Each layoutget
layoutget GET
layoutget release PUT

3) insert lseg into nfs_inode->layout->segs GET
remove lseg from nfs_inode->layout->segs PUT

I/O - no reference (except the lseg is used and referenced in 3)

4) Each layoutcommit references the layout which keeps it around while in use
by the call which could race with layoutreturn

layoutcommit GET
layoutcommit release PUT

5) Each layoutreturn references the layout which keeps it around while in use
by the call.

layoutreturn GET
layoutreturn release PUT

6) inode destruction (usually umount)

Destroy_layout PUT to balance initial allocation where it is set to 1.

When the reference moves from 1->0 the layout is removed from the nfs_client
cl_layouts list and freed.


Change nfs_inode->layout to be a pointer to a pnfs_layout_type strucure.

0001-SQUASHME-pnfs-submit-add-state-flag-for-layoutcommit.patch
0002-SQUASHME-pnfs-submit-move-pnfs_layout_suspend-back-t.patch
0003-SQUASHME-pnfs-submit-embed-pnfs_layout_type.patch
0004-SQUASHME-pnfs-submit-filelayout-use-new-alloc-free_l.patch

Rewrite the pnfs_layout_type allocation and reference counting

0005-SQUASHME-pnfs-submit-rewrite-layout-allocation.patch
0006-SQUASHME-pnfs-submit-fix-pnfs_update_layout-referenc.patch
0007-SQUASHME-pnfs_submit-don-t-get-a-reference-on-bounda.patch
0008-SQUASHME-pnfs-submit-don-t-reference-the-layout-in-i.patch
0009-SQUASHME-pnfs-submit-pnfs_update_layout-always-refer.patch
0010-SQUASHME-pnfs-submit-reference-the-layout-when-inser.patch
0011-SQUASHME-pnfs-submit-rename-put_layout-to-put_layout.patch
0012-SQUASHME-pnfs-submit-reference-layout-across-layoutc.patch
0013-SQUASHME-pnfs-submit-reference-layout-for-layoutretu.patch
0014-SQUASHME-pnfs-submit-remove-put_layout-from-pnfs_fre.patch
0015-SQUASHME-pnfs-submit-do-not-reference-a-layout-in-de.patch
0016-SQUASHME-pnfs-submit-remove-grab_current_layout.patch

Testing;

CONFIG_NFS_V4_1 set:
Connectathon tests pass against GFS2/pNFS and pyNFS file layout server. Tested
with layout return-on-close off and on.

CONFIG_NFS_V4_1 not set;
NFSv4.0 mount passes Connectation tests.

-->Andy




2010-07-08 10:16:44

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 0/16] pnfs-submit fix layout allocation and reference counting

On 07/08/2010 01:34 AM, [email protected] wrote:

Hi Andy,

> The current nfs_inode has an embedded pnfs_layout_type structure, with per
> layout type private data allocated. Change nfs_inode->layout to be a pointer
> to a pnfs_layout_type structure, embed the pnfs_layout_type in the per
> layout type structure, and allocate both.
>

Amen

> The current pnfs_layout_type allocation waits on a bit lock to handle
> concurrent allocation attempts. Replace this with the normal form.
>

Why don't we allocate this at inode allocation. Or inode iget() once
and be done with it. Why the fights, races, and error handling?
In a normal pnfs-able mount five-9(s) percent of IO operations need
a layout_type structure. Let's optimize the fast path. On these
rare "error" cases when we could not get any lsegs and eventually
did not use the nfsi->layout, who cares that we allocated extra
20 bytes.

> The current pnfs_layout_type reference counting is very un-clear, and one
> instance of put_layout was called outside the i_lock which probably was
> causing the intermittant pnfs_layout_type refcount bug we've been seeing.
>
> Replace the nfs_inode->layout reference counting with the following scheme:
>
> As in the current code, the pnfs_layout_type reference counting is always done
> with the inode->i_lock held.
>
> The nfs_inode->layout comes into existence when the first layout_segment is
> cached and stays until inode is destroyed.
>

I see that you have thought about my proposal. I have not looked at the
patches yet, will do soon.

So I have a brave question. If nfs_inode->layout is only freed at
inode-destroy, why do we need to ref-count it. Refcounting is for
holding things so they don't go away. But since now the nfsi->layout
stays until the very end then what's the point?

If you really think it is possible that the layout is held longer then
the inode itself then, 1- this is surly a bug, I'm not sure you can do
much with a layout with a dangling inode pointer. 2- Fine then just
take the inode reference. If you equate the life time of these two
objects why not use the ref that is already there?

> 1) alloc nfs_inode->layout:
> - Initialized to 1. This holds it around for the clp->cl_layouts
> (layout->lo_layouts) list.
>
> 2) Each layoutget
> layoutget GET
> layoutget release PUT
>
> 3) insert lseg into nfs_inode->layout->segs GET
> remove lseg from nfs_inode->layout->segs PUT
>
> I/O - no reference (except the lseg is used and referenced in 3)
>
> 4) Each layoutcommit references the layout which keeps it around while in use
> by the call which could race with layoutreturn
>
> layoutcommit GET
> layoutcommit release PUT
>
> 5) Each layoutreturn references the layout which keeps it around while in use
> by the call.
>
> layoutreturn GET
> layoutreturn release PUT
>

2, 4, 5 - surly the inode ref-count is taken during RPC?
3 - inode gone while IO? I don't think so.

> 6) inode destruction (usually umount)
>
> Destroy_layout PUT to balance initial allocation where it is set to 1.
>

inode are destructed when evicted from cache and/or at last reference
drop. Also at umount, but you should start testing like I do, "git clone"
you'll see inodes start to be destroyed 27 seconds into the operation.

> When the reference moves from 1->0 the layout is removed from the nfs_client
> cl_layouts list and freed.
>

The nfs_client->cl_layouts operations can be moved to the first/last lseg
insertion/removal, as an optimization step.

>
> Change nfs_inode->layout to be a pointer to a pnfs_layout_type strucure.
>
> 0001-SQUASHME-pnfs-submit-add-state-flag-for-layoutcommit.patch
> 0002-SQUASHME-pnfs-submit-move-pnfs_layout_suspend-back-t.patch
> 0003-SQUASHME-pnfs-submit-embed-pnfs_layout_type.patch
> 0004-SQUASHME-pnfs-submit-filelayout-use-new-alloc-free_l.patch
>
> Rewrite the pnfs_layout_type allocation and reference counting
>
> 0005-SQUASHME-pnfs-submit-rewrite-layout-allocation.patch
> 0006-SQUASHME-pnfs-submit-fix-pnfs_update_layout-referenc.patch
> 0007-SQUASHME-pnfs_submit-don-t-get-a-reference-on-bounda.patch
> 0008-SQUASHME-pnfs-submit-don-t-reference-the-layout-in-i.patch
> 0009-SQUASHME-pnfs-submit-pnfs_update_layout-always-refer.patch
> 0010-SQUASHME-pnfs-submit-reference-the-layout-when-inser.patch
> 0011-SQUASHME-pnfs-submit-rename-put_layout-to-put_layout.patch
> 0012-SQUASHME-pnfs-submit-reference-layout-across-layoutc.patch
> 0013-SQUASHME-pnfs-submit-reference-layout-for-layoutretu.patch
> 0014-SQUASHME-pnfs-submit-remove-put_layout-from-pnfs_fre.patch
> 0015-SQUASHME-pnfs-submit-do-not-reference-a-layout-in-de.patch
> 0016-SQUASHME-pnfs-submit-remove-grab_current_layout.patch
>
> Testing;
>
> CONFIG_NFS_V4_1 set:
> Connectathon tests pass against GFS2/pNFS and pyNFS file layout server. Tested
> with layout return-on-close off and on.
>
> CONFIG_NFS_V4_1 not set;
> NFSv4.0 mount passes Connectation tests.
>
> -->Andy

Thanks Andy for this work. The code is really getting clearer finally.

Boaz

2010-07-12 15:33:44

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 0/16] pnfs-submit fix layout allocation and reference counting

On 07/08/2010 07:14 PM, William A. (Andy) Adamson wrote:
> On Thu, Jul 8, 2010 at 6:16 AM, Boaz Harrosh <[email protected]> wrote:
>> On 07/08/2010 01:34 AM, [email protected] wrote:
>>
>> Why don't we allocate this at inode allocation.
>
> We don't know if we need it at inode allocation. Remember, GFS2 only
> uses RO layouts.

right! optimize for a specific File System (broken at that).

I don't mind if you leave the current code placement and allocate
layout_type on first layout-request (first call to _pnfs_update_layout)
Then deallocate at inode destruction.

The few places that couple the destruction of all layout segments
and return them, with the layout_type structure deallocation should just
be de-coupled. Don't mix up between the allocation/deallocation of
layout_type and destruction of layout-segments.

> Plus, the protocol allows a file system to use more
> than one layout type and each layout type has a different private
> portion of the layout structure.

We don't support that, now, for other reasons. Lets not go there.

>
>
> Actually this is not true. pnfs_destroy_layout is not only called a
> inode destruction - it is also called in pnfs_reclaim_layout (state
> reclaim after reboot)
>
> In this case, the layout will be destroyed while the inode continues on.
>

You mean the layout-segments. But the layout_type can stay, why deallocate
it? it will be needed again.

> The use and implementation of pnfs_reclaim_layout needs a review.
>
> The question is, when should the nfs_inode->layout be freed when the
> inode is not? These are candidates.
> - reclaim state after server reboot (current code does this)
> - reclaim state after a network partition (current code does this)
> - file system migration
> - switching to a different file system replica
> - CB_LAYOUTRECALL FSID
> - CB_LAYOUTRECALL ALL
>

None of the above. These are all related to on-the-line layouts which
are layout-segments in our code. The governing structure is just an
inode-thing/per-layout-type.

If we follow my simple rules then you see that all these run-away pnfs
flags and states that where put back into nfsi can return to ->layout.

Boaz

2010-07-07 22:57:22

by Gilliam, PaulX J

[permalink] [raw]
Subject: RE: [PATCH 0/16] pnfs-submit fix layout allocation and reference counting



>-----Original Message-----
>From: [email protected] [mailto:linux-nfs-
>[email protected]] On Behalf Of [email protected]
>Sent: Wednesday, July 07, 2010 3:34 PM
>To: [email protected]
>Cc: [email protected]
>Subject: [PATCH 0/16] pnfs-submit fix layout allocation and reference
>counting
>
>
>
>The current nfs_inode has an embedded pnfs_layout_type structure, with per
>layout type private data allocated. Change nfs_inode->layout to be a
>pointer
>to a pnfs_layout_type structure, embed the pnfs_layout_type in the per
>layout type structure, and allocate both.
>
>The current pnfs_layout_type allocation waits on a bit lock to handle
>concurrent allocation attempts. Replace this with the normal form.
>
>The current pnfs_layout_type reference counting is very un-clear, and one
>instance of put_layout was called outside the i_lock which probably was
>causing the intermittant pnfs_layout_type refcount bug we've been seeing.
>
>Replace the nfs_inode->layout reference counting with the following scheme:

I am a newbee and would appreciate a little enlightenment:

I read an article from a few years ago that stated 'The "kobject" structure first made its appearance in the 2.5.45 development kernel. It was initially meant as a simple way of unifying kernel code which manages reference counted objects.'

It seems from my na?ve viewpoint that a "kobject" might be used here. Why not? Does it have something to do with the "mission creep" mentioned in the article?

Thanks for your patients.

-=# Paul Gilliam #=-



2010-07-08 16:14:08

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 0/16] pnfs-submit fix layout allocation and reference counting

On Thu, Jul 8, 2010 at 6:16 AM, Boaz Harrosh <[email protected]> wrote:
> On 07/08/2010 01:34 AM, [email protected] wrote:
>
> Hi Andy,
>
>> The current nfs_inode has an embedded pnfs_layout_type structure, with per
>> layout type private data allocated. Change nfs_inode->layout to be a pointer
>> to a pnfs_layout_type structure, embed the pnfs_layout_type in the per
>> layout type structure, and allocate both.
>>
>
> Amen
>
>> The current pnfs_layout_type allocation waits on a bit lock to handle
>> concurrent allocation attempts. Replace this with the normal form.
>>
>
> Why don't we allocate this at inode allocation.

We don't know if we need it at inode allocation. Remember, GFS2 only
uses RO layouts. Plus, the protocol allows a file system to use more
than one layout type and each layout type has a different private
portion of the layout structure.


> Or inode iget() once
> and be done with it.
> Why the fights, races, and error handling?
> In a normal pnfs-able mount five-9(s) percent of IO operations need
> a layout_type structure. Let's optimize the fast path. On these
> rare "error" cases when we could not get any lsegs and eventually
> did not use the nfsi->layout, who cares that we allocated extra
> 20 bytes.
>
>> The current pnfs_layout_type reference counting is very un-clear, and one
>> instance of put_layout was called outside the i_lock which probably was
>> causing the intermittant pnfs_layout_type refcount bug we've been seeing.
>>
>> Replace the nfs_inode->layout reference counting with the following scheme:
>>
>> As in the current code, the pnfs_layout_type reference counting is always done
>> with the inode->i_lock held.
>>
>> The nfs_inode->layout comes into existence when the first layout_segment is
>> cached and stays until inode is destroyed.

Actually this is not true. pnfs_destroy_layout is not only called a
inode destruction - it is also called in pnfs_reclaim_layout (state
reclaim after reboot)

In this case, the layout will be destroyed while the inode continues on.

The use and implementation of pnfs_reclaim_layout needs a review.

The question is, when should the nfs_inode->layout be freed when the
inode is not? These are candidates.
- reclaim state after server reboot (current code does this)
- reclaim state after a network partition (current code does this)
- file system migration
- switching to a different file system replica
- CB_LAYOUTRECALL FSID
- CB_LAYOUTRECALL ALL

>>
>
> I see that you have thought about my proposal. I have not looked at the
> patches yet, will do soon.
>
> So I have a brave question. If nfs_inode->layout is only freed at
> inode-destroy, why do we need to ref-count it. Refcounting is for
> holding things so they don't go away. But since now the nfsi->layout
> stays until the very end then what's the point?
>
> If you really think it is possible that the layout is held longer then
> the inode itself then, 1- this is surly a bug, I'm not sure you can do
> much with a layout with a dangling inode pointer. 2- Fine then just
> take the inode reference. If you equate the life time of these two
> objects why not use the ref that is already there?
>
>> 1) alloc nfs_inode->layout:
>> ? ? ? ? - Initialized to 1. This holds it around for the clp->cl_layouts
>> ? ? ? ? (layout->lo_layouts) list.
>>
>> 2) Each layoutget
>> ? ?layoutget ? ?GET
>> ? ?layoutget release PUT
>>
>> 3) insert lseg into nfs_inode->layout->segs GET
>> ? ?remove lseg from nfs_inode->layout->segs ?PUT
>>
>> I/O - no reference (except the lseg is used and referenced in 3)
>>
>> 4) Each layoutcommit references the layout which keeps it around while in use
>> ? ?by the call which could race with layoutreturn
>>
>> ? ?layoutcommit GET
>> ? ?layoutcommit release PUT
>>
>> 5) Each layoutreturn references the layout which keeps it around while in use
>> ? ?by the call.
>>
>> ? ?layoutreturn ?GET
>> ? ?layoutreturn release ?PUT
>>
>
> 2, 4, 5 - surly the inode ref-count is taken during RPC?
> 3 - inode gone while IO? I don't think so.
>
>> 6) inode destruction (usually umount)
>>
>> ? ?Destroy_layout PUT to balance initial allocation where it is set to 1.
>>
>
> inode are destructed when evicted from cache and/or at last reference
> drop. Also at umount, but you should start testing like I do, "git clone"
> you'll see inodes start to be destroyed 27 seconds into the operation.
>
>> When the reference moves from 1->0 the layout is removed from the nfs_client
>> cl_layouts list and freed.
>>
>
> The nfs_client->cl_layouts operations can be moved to the first/last lseg
> insertion/removal, as an optimization step.
>
>>
>> Change nfs_inode->layout to be a pointer to a pnfs_layout_type strucure.
>>
>> 0001-SQUASHME-pnfs-submit-add-state-flag-for-layoutcommit.patch
>> 0002-SQUASHME-pnfs-submit-move-pnfs_layout_suspend-back-t.patch
>> 0003-SQUASHME-pnfs-submit-embed-pnfs_layout_type.patch
>> 0004-SQUASHME-pnfs-submit-filelayout-use-new-alloc-free_l.patch
>>
>> Rewrite the pnfs_layout_type allocation and reference counting
>>
>> 0005-SQUASHME-pnfs-submit-rewrite-layout-allocation.patch
>> 0006-SQUASHME-pnfs-submit-fix-pnfs_update_layout-referenc.patch
>> 0007-SQUASHME-pnfs_submit-don-t-get-a-reference-on-bounda.patch
>> 0008-SQUASHME-pnfs-submit-don-t-reference-the-layout-in-i.patch
>> 0009-SQUASHME-pnfs-submit-pnfs_update_layout-always-refer.patch
>> 0010-SQUASHME-pnfs-submit-reference-the-layout-when-inser.patch
>> 0011-SQUASHME-pnfs-submit-rename-put_layout-to-put_layout.patch
>> 0012-SQUASHME-pnfs-submit-reference-layout-across-layoutc.patch
>> 0013-SQUASHME-pnfs-submit-reference-layout-for-layoutretu.patch
>> 0014-SQUASHME-pnfs-submit-remove-put_layout-from-pnfs_fre.patch
>> 0015-SQUASHME-pnfs-submit-do-not-reference-a-layout-in-de.patch
>> 0016-SQUASHME-pnfs-submit-remove-grab_current_layout.patch
>>
>> Testing;
>>
>> CONFIG_NFS_V4_1 set:
>> Connectathon tests pass against GFS2/pNFS and pyNFS file layout server. Tested
>> with layout return-on-close off and on.
>>
>> CONFIG_NFS_V4_1 not set;
>> NFSv4.0 mount passes Connectation tests.
>>
>> -->Andy
>
> Thanks Andy for this work. The code is really getting clearer finally.
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2010-07-12 18:33:53

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

On Mon, Jul 12, 2010 at 1:43 PM, Boaz Harrosh <[email protected]> wrote:
> On 07/12/2010 08:34 PM, William A. (Andy) Adamson wrote:
>> On Mon, Jul 12, 2010 at 12:29 PM, Boaz Harrosh <[email protected]> wrote:
>>>> static inline bool
>>>> layoutcommit_needed(struct nfs_inode *nfsi)
>>>> {
>>>> - return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>>>> + return has_layout(nfsi) &&
>>>> + test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>>
>>> What is the purpose of patch[1/1] ?
>>
>> To not segfault when the nfs_inode->layout field is NULL.
>>
>
> Yes that is this patch [3/16].
>
> I was asking about [PATCH 01/16] that changed it to a state_bit from lo_cred
> why do we need that one. Now that you proved that has_layout(nfsi) cannot become
> negative on us while asking?

Hmm. Looks like you're right, it's not needed. We do need to document
that the existance of the lo_cred is what signals the need for layout
commit.....

Thanks

-->Andy

>
>>>
>>> you are unlocked and asking about a layout pointer. Which according to you might
>>> go away mid-flight?
>>
>> No, it simply has not been allocated. The pnfs_layout_type reference
>> counting will be done under the inode->i_lock and all in-flight RPC's
>> will hold a reference, so there is no reason to hold a lock to call
>> has_layout().
>>
>
> I agree it is safe.
>
> Boaz
>
>> -->Andy
>>
>

2010-07-13 13:39:08

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 05/16] SQUASHME pnfs-submit: rewrite layout allocation

On Mon, Jul 12, 2010 at 4:30 AM, Benny Halevy <[email protected]> wro=
te:
> On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Don't wait on a bit if layout is not allocated. Instead, allocate,
>> check for existance, and possibly free.
>>
>> As per new layout refcounting scheme,don't take a reference on the l=
ayout.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> =A0fs/nfs/pnfs.c =A0 =A0 =A0 =A0 =A0| =A0 84 +++++++++++++----------=
------------------------
>> =A0include/linux/nfs_fs.h | =A0 =A01 -
>> =A02 files changed, 23 insertions(+), 62 deletions(-)
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 6d435ed..92b7772 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -914,7 +914,6 @@ alloc_init_layout(struct inode *ino)
>> =A0 =A0 =A0 struct pnfs_layout_type *lo;
>> =A0 =A0 =A0 struct layoutdriver_io_operations *io_ops;
>>
>> - =A0 =A0 BUG_ON(NFS_I(ino)->layout !=3D NULL);
>> =A0 =A0 =A0 io_ops =3D NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
>> =A0 =A0 =A0 lo =3D io_ops->alloc_layout(ino);
>> =A0 =A0 =A0 if (!lo) {
>> @@ -923,84 +922,47 @@ alloc_init_layout(struct inode *ino)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 __func__);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 return NULL;
>> =A0 =A0 =A0 }
>> - =A0 =A0 spin_lock(&ino->i_lock);
>> =A0 =A0 =A0 lo->refcount =3D 1;
>> =A0 =A0 =A0 INIT_LIST_HEAD(&lo->lo_layouts);
>> =A0 =A0 =A0 INIT_LIST_HEAD(&lo->segs);
>> =A0 =A0 =A0 seqlock_init(&lo->seqlock);
>> =A0 =A0 =A0 lo->lo_inode =3D ino;
>> - =A0 =A0 NFS_I(ino)->layout =3D lo;
>> - =A0 =A0 spin_unlock(&ino->i_lock);
>> =A0 =A0 =A0 return lo;
>> =A0}
>>
>> -static int pnfs_wait_schedule(void *word)
>> -{
>> - =A0 =A0 if (signal_pending(current))
>> - =A0 =A0 =A0 =A0 =A0 =A0 return -ERESTARTSYS;
>> - =A0 =A0 schedule();
>> - =A0 =A0 return 0;
>> -}
>> -
>> =A0/*
>> - * get, possibly allocate, and lock current_layout
>> + * Lock and possibly allocate the inode layout
>> =A0 *
>> - * Note: If successful, ino->i_lock is taken and the caller
>> - * must put and unlock current_layout when the returned layout is r=
eleased.
>> + * If successful, ino->i_lock is taken, and the caller must unlock.
>> =A0 */
>> =A0static struct pnfs_layout_type *
>> -get_lock_alloc_layout(struct inode *ino)
>> +nfs_lock_alloc_layout(struct inode *ino)
>> =A0{
>> - =A0 =A0 struct nfs_inode *nfsi =3D NFS_I(ino);
>> - =A0 =A0 struct pnfs_layout_type *lo;
>> - =A0 =A0 int res;
>> + =A0 =A0 struct pnfs_layout_type *new =3D NULL;
>>
>> =A0 =A0 =A0 dprintk("%s Begin\n", __func__);
>>
>> =A0 =A0 =A0 spin_lock(&ino->i_lock);
>> - =A0 =A0 while ((lo =3D grab_current_layout(nfsi)) =3D=3D NULL) {
>> + =A0 =A0 if (NFS_I(ino)->layout =3D=3D NULL) {
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&ino->i_lock);
>
> There's no real need to take the lock here just for checking for NULL=
,
> only down below before actually committing the newly allocated struct=
=2E
> [I'll make this fix)


I'm not taking the lock just to check for NULL. I'm taking the lock
because the function returns a locked i_lock - (or did before you
changed it!)

-->Andy

>
> Benny
>
>> - =A0 =A0 =A0 =A0 =A0 =A0 /* Compete against other threads on who's =
doing the allocation,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0* wait until bit is cleared if we lost =
this race.
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
>> - =A0 =A0 =A0 =A0 =A0 =A0 res =3D wait_on_bit_lock(
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 &nfsi->layout->pnfs_layout=
_state,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 NFS_INO_LAYOUT_ALLOC,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_wait_schedule, TASK_K=
ILLABLE);
>> - =A0 =A0 =A0 =A0 =A0 =A0 if (res) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lo =3D ERR_PTR(res);
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
>> - =A0 =A0 =A0 =A0 =A0 =A0 }
>> -
>> - =A0 =A0 =A0 =A0 =A0 =A0 /* Was current_layout already allocated wh=
ile we slept?
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0* If so, retry get_lock'ing it. Otherwi=
se, allocate it.
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
>> - =A0 =A0 =A0 =A0 =A0 =A0 if (nfsi->layout) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&ino->i_lock);
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 continue;
>> + =A0 =A0 =A0 =A0 =A0 =A0 new =3D alloc_init_layout(ino);
>> + =A0 =A0 =A0 =A0 =A0 =A0 if (new =3D=3D NULL)
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return NULL;
>> + =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&ino->i_lock);
>> + =A0 =A0 =A0 =A0 =A0 =A0 if (NFS_I(ino)->layout =3D=3D NULL) {
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 NFS_I(ino)->layout =3D new=
;
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new =3D NULL;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>> -
>> - =A0 =A0 =A0 =A0 =A0 =A0 lo =3D alloc_init_layout(ino);
>> - =A0 =A0 =A0 =A0 =A0 =A0 if (lo)
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&ino->i_lock);
>> - =A0 =A0 =A0 =A0 =A0 =A0 else
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lo =3D ERR_PTR(-ENOMEM);
>> -
>> - =A0 =A0 =A0 =A0 =A0 =A0 /* release the NFS_INO_LAYOUT_ALLOC bit an=
d wake up waiters */
>> - =A0 =A0 =A0 =A0 =A0 =A0 clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&nfsi->=
layout->pnfs_layout_state);
>> - =A0 =A0 =A0 =A0 =A0 =A0 wake_up_bit(&nfsi->layout->pnfs_layout_sta=
te,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 NFS_INO_LAYOUT_ALL=
OC);
>> - =A0 =A0 =A0 =A0 =A0 =A0 break;
>> =A0 =A0 =A0 }
>> -
>> -#ifdef NFS_DEBUG
>> - =A0 =A0 if (!IS_ERR(lo))
>> - =A0 =A0 =A0 =A0 =A0 =A0 dprintk("%s Return %p\n", __func__, lo);
>> - =A0 =A0 else
>> - =A0 =A0 =A0 =A0 =A0 =A0 dprintk("%s Return error %ld\n", __func__,=
PTR_ERR(lo));
>> -#endif
>> - =A0 =A0 return lo;
>> + =A0 =A0 if (new) {
>> + =A0 =A0 =A0 =A0 =A0 =A0 /* Reference the layout accross i_lock rel=
ease and grab */
>> + =A0 =A0 =A0 =A0 =A0 =A0 get_layout(NFS_I(ino)->layout);
>> + =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&ino->i_lock);
>> + =A0 =A0 =A0 =A0 =A0 =A0 NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->=
free_layout(new);
>> + =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&ino->i_lock);
>> + =A0 =A0 =A0 =A0 =A0 =A0 put_layout(NFS_I(ino)->layout);
>> + =A0 =A0 }
>> + =A0 =A0 return NFS_I(ino)->layout;
>> =A0}
>>
>> =A0/*
>> @@ -1088,8 +1050,8 @@ _pnfs_update_layout(struct inode *ino,
>>
>> =A0 =A0 =A0 if (take_ref)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 *lsegpp =3D NULL;
>> - =A0 =A0 lo =3D get_lock_alloc_layout(ino);
>> - =A0 =A0 if (IS_ERR(lo)) {
>> + =A0 =A0 lo =3D nfs_lock_alloc_layout(ino);
>> + =A0 =A0 if (lo =3D=3D NULL) {
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 dprintk("%s ERROR: can't get pnfs_layout=
_type\n", __func__);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out;
>> =A0 =A0 =A0 }
>> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
>> index 7b999a4..005b3ea 100644
>> --- a/include/linux/nfs_fs.h
>> +++ b/include/linux/nfs_fs.h
>> @@ -107,7 +107,6 @@ struct pnfs_layout_type {
>> =A0 =A0 =A0 unsigned long pnfs_layout_state;
>> =A0 =A0 =A0 #define NFS_INO_RO_LAYOUT_FAILED 0 =A0 =A0 =A0/* get ro =
layout failed stop trying */
>> =A0 =A0 =A0 #define NFS_INO_RW_LAYOUT_FAILED 1 =A0 =A0 =A0/* get rw =
layout failed stop trying */
>> - =A0 =A0 #define NFS_INO_LAYOUT_ALLOC =A0 =A0 2 =A0 =A0 =A0/* bit l=
ock for layout allocation */
>> =A0 =A0 =A0 #define NFS_INO_LAYOUTCOMMIT =A0 =A0 3 =A0 =A0 =A0/* LAY=
OUTCOMMIT needed */
>> =A0 =A0 =A0 struct rpc_cred =A0 =A0 =A0 =A0 *lo_cred; /* layoutcommi=
t credential */
>> =A0 =A0 =A0 /* DH: These vars keep track of the maximum write range
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

2010-07-13 13:50:02

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Mon, Jul 12, 2010 at 2:29 PM, Benny Halevy <[email protected]> wro=
te:
> On Jul. 12, 2010, 21:27 +0300, "William A. (Andy) Adamson" <androsada=
[email protected]> wrote:
>> On Mon, Jul 12, 2010 at 1:19 PM, Benny Halevy <[email protected]> =
wrote:
>>> On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
>>>> From: Andy Adamson <[email protected]>
>>>>
>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>> ---
>>>> =A0fs/nfs/nfs4proc.c | =A0 =A02 ++
>>>> =A0fs/nfs/pnfs.c =A0 =A0 | =A0 13 +++++++++++++
>>>> =A0fs/nfs/pnfs.h =A0 =A0 | =A0 =A01 +
>>>> =A03 files changed, 16 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index 6acebc3..f763746 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *=
lcdata)
>>>> =A0 =A0 =A0 struct pnfs_layoutcommit_data *data =3D
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 (struct pnfs_layoutcommit_data *)lcdat=
a;
>>>>
>>>> + =A0 =A0 /* Matched by get_layout in pnfs_layoutcommit_inode */
>>>> + =A0 =A0 put_layout(data->args.inode);
>>>> =A0 =A0 =A0 put_rpccred(data->cred);
>>>> =A0 =A0 =A0 pnfs_layoutcommit_free(lcdata);
>>>> =A0}
>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>> index aa16e5d..d42c5da 100644
>>>> --- a/fs/nfs/pnfs.c
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo=
)
>>>> =A0}
>>>>
>>>> =A0void
>>>> +put_layout(struct inode *inode)
>>>> +{
>>>> + =A0 =A0 spin_lock(&inode->i_lock);
>>>> + =A0 =A0 put_layout_locked(NFS_I(inode)->layout);
>>>> + =A0 =A0 spin_unlock(&inode->i_lock);
>>>> +
>>>> +}
>>>> +
>>>> +void
>>>> =A0pnfs_layout_release(struct pnfs_layout_type *lo,
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_pnfs_layout_segmen=
t *range)
>>>> =A0{
>>>> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode,=
int sync)
>>>> =A0 =A0 =A0 __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_=
layout_state);
>>>> =A0 =A0 =A0 pnfs_get_layout_stateid(&data->args.stateid, nfsi->lay=
out);
>>>>
>>>> + =A0 =A0 /* Reference for layoutcommit matched in pnfs_layoutcomm=
it_release */
>>>> + =A0 =A0 get_layout(NFS_I(inode)->layout);
>>>> +
>>>
>>> So from the 30000 foot level, I need to remind myself what do
>>> we need the refcount on the layout hdr (pnfs_layout_type) for?
>>
>>> Can we really use it detached from the inode? NO
>>> Is it only for debugging to make catch the case that the inode
>>> is released while there are references to the layout?
>>
>> When we migrate the filesystem, we need to reap the nfs_inode->layou=
t
>> while keeping the nfs_inode.
>>
>
> But how does the refcount help us with that?
> Don't we have to do this synchronously before reusing the nfs_inode?

We need to drain all slots which could have async LAYOUTCOMMITs or
async LAYOUTRETURNs still on the wire. The refcount keeps the
nfs_inode->layout allocated for the LAYOUTCOMMIT return.


-->Andy

>
> Benny
>
>> Same with server reboot, network partition, and use of a different r=
eplica.
>>
>> -->Andy
>>
>>>
>>> Benny
>>>
>>>> =A0 =A0 =A0 spin_unlock(&inode->i_lock);
>>>>
>>>> =A0 =A0 =A0 /* Set up layout commit args */
>>>> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode,=
int sync)
>>>> =A0 =A0 =A0 if (status) {
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* The layout driver failed to setup t=
he layoutcommit */
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 put_rpccred(data->cred);
>>>> + =A0 =A0 =A0 =A0 =A0 =A0 put_layout(inode);
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out_free;
>>>> =A0 =A0 =A0 }
>>>> =A0 =A0 =A0 status =3D pnfs4_proc_layoutcommit(data, sync);
>>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>>> index 9b0fed4..e04b9d4 100644
>>>> --- a/fs/nfs/pnfs.h
>>>> +++ b/fs/nfs/pnfs.h
>>>> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type=
*, struct nfs4_pnfs_layout_segm
>>>> =A0void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0const nfs4_=
stateid *stateid);
>>>> =A0void pnfs_destroy_layout(struct nfs_inode *);
>>>> +void put_layout(struct inode *inode);
>>>>
>>>> =A0#define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &=
& =A0 =A0 \
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0(srv)->pnfs_curr_ld->ld_io_ops && =A0\
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs=
" in
>>> the body of a message to [email protected]
>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm=
l
>>>
>

2010-07-07 22:34:22

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 01/16] SQUASHME pnfs-submit: add state flag for layoutcommit_needed

From: Fred Isaman <[email protected]>

layoutcommit_needed is called outside of any lock.
This avoids locking and existance of layout issues

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 3 +++
include/linux/nfs4_pnfs.h | 2 +-
include/linux/nfs_fs.h | 1 +
3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index fb9374b..5d472c6 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -154,6 +154,8 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
spin_lock(&nfsi->vfs_inode.i_lock);
if (has_layout(nfsi) && !layoutcommit_needed(nfsi)) {
nfsi->layout.lo_cred = get_rpccred(ctx->state->owner->so_cred);
+ __set_bit(NFS_INO_LAYOUTCOMMIT,
+ &nfsi->layout.pnfs_layout_state);
nfsi->change_attr++;
spin_unlock(&nfsi->vfs_inode.i_lock);
dprintk("%s: Set layoutcommit\n", __func__);
@@ -1643,6 +1645,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
nfsi->layout.pnfs_write_begin_pos = 0;
nfsi->layout.pnfs_write_end_pos = 0;
nfsi->layout.lo_cred = NULL;
+ __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
pnfs_get_layout_stateid(&data->args.stateid, &nfsi->layout);

spin_unlock(&inode->i_lock);
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 782fdd9..c5af82f 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -83,7 +83,7 @@ has_layout(struct nfs_inode *nfsi)
static inline bool
layoutcommit_needed(struct nfs_inode *nfsi)
{
- return nfsi->layout.lo_cred != NULL;
+ return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
}

#else /* CONFIG_NFS_V4_1 */
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a60d880..3394d1e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -109,6 +109,7 @@ struct pnfs_layout_type {
#define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
#define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
#define NFS_INO_LAYOUT_ALLOC 2 /* bit lock for layout allocation */
+ #define NFS_INO_LAYOUTCOMMIT 3 /* LAYOUTCOMMIT needed */
time_t pnfs_layout_suspend;
struct rpc_cred *lo_cred; /* layoutcommit credential */
/* DH: These vars keep track of the maximum write range
--
1.6.6


2010-07-07 22:34:23

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 02/16] SQUASHME: pnfs-submit: move pnfs_layout_suspend back to nfs_inode

From: Fred Isaman <[email protected]>

Prepare to embed struct pnfs_layout_tupe into layout driver private structs
and to make layout a pointer. Since a temp error on first LAYOUTGET
erases the layout, the suspend timer needs to be moved.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/inode.c | 1 +
fs/nfs/pnfs.c | 8 ++++----
include/linux/nfs_fs.h | 2 +-
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 229cdab..3d51602 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1396,6 +1396,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
init_rwsem(&nfsi->rwsem);
#ifdef CONFIG_NFS_V4_1
init_waitqueue_head(&nfsi->lo_waitq);
+ nfsi->pnfs_layout_suspend = 0;
seqlock_init(&nfsi->layout.seqlock);
INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
INIT_LIST_HEAD(&nfsi->layout.segs);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5d472c6..53c15b1 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1114,12 +1114,12 @@ _pnfs_update_layout(struct inode *ino,

/* if get layout already failed once goto out */
if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
- if (unlikely(nfsi->layout.pnfs_layout_suspend &&
- get_seconds() >= nfsi->layout.pnfs_layout_suspend)) {
+ if (unlikely(nfsi->pnfs_layout_suspend &&
+ get_seconds() >= nfsi->pnfs_layout_suspend)) {
dprintk("%s: layout_get resumed\n", __func__);
clear_bit(lo_fail_bit(iomode),
&nfsi->layout.pnfs_layout_state);
- nfsi->layout.pnfs_layout_suspend = 0;
+ nfsi->pnfs_layout_suspend = 0;
} else
goto out_put;
}
@@ -1233,7 +1233,7 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
}

/* remember that get layout failed and suspend trying */
- nfsi->layout.pnfs_layout_suspend = suspend;
+ nfsi->pnfs_layout_suspend = suspend;
set_bit(lo_fail_bit(lgp->args.lseg.iomode),
&nfsi->layout.pnfs_layout_state);
dprintk("%s: layout_get suspended until %ld\n",
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 3394d1e..b612746 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -110,7 +110,6 @@ struct pnfs_layout_type {
#define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
#define NFS_INO_LAYOUT_ALLOC 2 /* bit lock for layout allocation */
#define NFS_INO_LAYOUTCOMMIT 3 /* LAYOUTCOMMIT needed */
- time_t pnfs_layout_suspend;
struct rpc_cred *lo_cred; /* layoutcommit credential */
/* DH: These vars keep track of the maximum write range
* so the values can be used for layoutcommit.
@@ -208,6 +207,7 @@ struct nfs_inode {
#if defined(CONFIG_NFS_V4_1)
wait_queue_head_t lo_waitq;
struct pnfs_layout_type layout;
+ time_t pnfs_layout_suspend;
#endif /* CONFIG_NFS_V4_1 */
#endif /* CONFIG_NFS_V4*/
#ifdef CONFIG_NFS_FSCACHE
--
1.6.6


2010-07-07 22:34:25

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 04/16] SQUASHME pnfs-submit: filelayout: use new alloc/free_layout API

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4filelayout.c | 15 +++++++++------
fs/nfs/nfs4filelayout.h | 7 +++++++
2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 57a0010..b49ccf4 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -301,20 +301,23 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
* will use the pnfs_layout_type type to refer to the layout for this
* inode from now on.
*/
-static void *
+static struct pnfs_layout_type *
filelayout_alloc_layout(struct inode *inode)
{
+ struct nfs4_filelayout *flp;
+
dprintk("NFS_FILELAYOUT: allocating layout\n");
- return kzalloc(sizeof(struct nfs4_filelayout), GFP_KERNEL);
+ flp = kzalloc(sizeof(struct nfs4_filelayout), GFP_KERNEL);
+ return flp ? &flp->fl_layout : NULL;
}

/* Free a filelayout layout structure
*/
static void
-filelayout_free_layout(void *layoutid)
+filelayout_free_layout(struct pnfs_layout_type *lo)
{
dprintk("NFS_FILELAYOUT: freeing layout\n");
- kfree(layoutid);
+ kfree(lo);
}

/*
@@ -471,7 +474,7 @@ static struct pnfs_layout_segment *
filelayout_alloc_lseg(struct pnfs_layout_type *layoutid,
struct nfs4_pnfs_layoutget_res *lgr)
{
- struct nfs4_filelayout *flo = PNFS_LD_DATA(layoutid);
+ struct nfs4_filelayout *flo = FILE_LO(layoutid);
struct pnfs_layout_segment *lseg;
int rc;

@@ -693,7 +696,7 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
ssize_t
filelayout_get_stripesize(struct pnfs_layout_type *layoutid)
{
- struct nfs4_filelayout *flo = PNFS_LD_DATA(layoutid);
+ struct nfs4_filelayout *flo = FILE_LO(layoutid);

return flo->stripe_unit;
}
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index de8391f..fd25e76 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -68,6 +68,7 @@ struct nfs4_filelayout_segment {
};

struct nfs4_filelayout {
+ struct pnfs_layout_type fl_layout;
int uncommitted_write;
loff_t last_commit_size;
u64 layout_id;
@@ -77,6 +78,12 @@ struct nfs4_filelayout {
extern struct nfs_fh *
nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset);

+static inline struct nfs4_filelayout *
+FILE_LO(struct pnfs_layout_type *lo)
+{
+ return container_of(lo, struct nfs4_filelayout, fl_layout);
+}
+
extern struct pnfs_client_operations *pnfs_callback_ops;

extern void nfs4_fl_free_deviceid_callback(struct kref *);
--
1.6.6


2010-07-07 22:34:24

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

From: Andy Adamson <[email protected]>

Change nfs_inode layout field to a pointer

Each layoutdriver embeds the pnfs_layout_type in it's private layout cache
head structure, and allocates them both with the alloc_layout
layoutdriver_io_operation.

NOTE API change: the layoutdriver_io_operation alloc_layout and free_layout now
use pnfs_layout_type pointers instead of void*.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/callback_proc.c | 2 +-
fs/nfs/inode.c | 12 +-----
fs/nfs/nfs4proc.c | 2 +-
fs/nfs/nfs4state.c | 4 +-
fs/nfs/pnfs.c | 111 +++++++++++++++++++++++----------------------
include/linux/nfs4_pnfs.h | 19 +++-----
include/linux/nfs_fs.h | 4 +-
7 files changed, 71 insertions(+), 83 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 4766695..d999ea8 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -233,7 +233,7 @@ static int pnfs_recall_layout(void *data)
rl.cbl_seg.length = NFS4_MAX_UINT64;

if (rl.cbl_recall_type == RETURN_FILE) {
- if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout,
+ if (pnfs_is_next_layout_stateid(NFS_I(inode)->layout,
rl.cbl_stateid))
status = pnfs_return_layout(inode, &rl.cbl_seg,
&rl.cbl_stateid, RETURN_FILE,
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 3d51602..30252f4 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1397,17 +1397,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
#ifdef CONFIG_NFS_V4_1
init_waitqueue_head(&nfsi->lo_waitq);
nfsi->pnfs_layout_suspend = 0;
- seqlock_init(&nfsi->layout.seqlock);
- INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
- INIT_LIST_HEAD(&nfsi->layout.segs);
- nfsi->layout.refcount = 0;
- nfsi->layout.ld_data = NULL;
- nfsi->layout.pnfs_layout_state = 0;
- memset(&nfsi->layout.stateid, 0, NFS4_STATEID_SIZE);
- nfsi->layout.roc_iomode = 0;
- nfsi->layout.lo_cred = NULL;
- nfsi->layout.pnfs_write_begin_pos = 0;
- nfsi->layout.pnfs_write_end_pos = 0;
+ nfsi->layout = NULL;
#endif /* CONFIG_NFS_V4_1 */
#endif
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c5b8a2f..6acebc3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1086,7 +1086,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
* flag during grace period
*/
pnfs_destroy_layout(NFS_I(state->inode));
- pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
+ pnfs_set_layout_stateid(NFS_I(state->inode)->layout, &zero_stateid);
#endif /* CONFIG_NFS_V4_1 */
}

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index bfe679b..6f44cb0 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -597,10 +597,10 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state,
#ifdef CONFIG_NFS_V4_1
struct nfs_inode *nfsi = NFS_I(state->inode);

- if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
+ if (has_layout(nfsi) && nfsi->layout->roc_iomode) {
struct nfs4_pnfs_layout_segment range;

- range.iomode = nfsi->layout.roc_iomode;
+ range.iomode = nfsi->layout->roc_iomode;
range.offset = 0;
range.length = NFS4_MAX_UINT64;
pnfs_return_layout(state->inode, &range, NULL,
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 53c15b1..6d435ed 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -152,10 +152,11 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
{
dprintk("%s: has_layout=%d ctx=%p\n", __func__, has_layout(nfsi), ctx);
spin_lock(&nfsi->vfs_inode.i_lock);
- if (has_layout(nfsi) && !layoutcommit_needed(nfsi)) {
- nfsi->layout.lo_cred = get_rpccred(ctx->state->owner->so_cred);
+ if (has_layout(nfsi) && !
+ !test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state)) {
+ nfsi->layout->lo_cred = get_rpccred(ctx->state->owner->so_cred);
__set_bit(NFS_INO_LAYOUTCOMMIT,
- &nfsi->layout.pnfs_layout_state);
+ &nfsi->layout->pnfs_layout_state);
nfsi->change_attr++;
spin_unlock(&nfsi->vfs_inode.i_lock);
dprintk("%s: Set layoutcommit\n", __func__);
@@ -175,17 +176,17 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
loff_t end_pos;

spin_lock(&nfsi->vfs_inode.i_lock);
- if (offset < nfsi->layout.pnfs_write_begin_pos)
- nfsi->layout.pnfs_write_begin_pos = offset;
+ if (offset < nfsi->layout->pnfs_write_begin_pos)
+ nfsi->layout->pnfs_write_begin_pos = offset;
end_pos = offset + extent - 1; /* I'm being inclusive */
- if (end_pos > nfsi->layout.pnfs_write_end_pos)
- nfsi->layout.pnfs_write_end_pos = end_pos;
+ if (end_pos > nfsi->layout->pnfs_write_end_pos)
+ nfsi->layout->pnfs_write_end_pos = end_pos;
dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
__func__,
(unsigned long) extent,
(unsigned long) offset ,
- (unsigned long) nfsi->layout.pnfs_write_begin_pos,
- (unsigned long) nfsi->layout.pnfs_write_end_pos);
+ (unsigned long) nfsi->layout->pnfs_write_begin_pos,
+ (unsigned long) nfsi->layout->pnfs_write_end_pos);
spin_unlock(&nfsi->vfs_inode.i_lock);
}

@@ -326,10 +327,9 @@ get_layout(struct pnfs_layout_type *lo)
static inline struct pnfs_layout_type *
grab_current_layout(struct nfs_inode *nfsi)
{
- struct pnfs_layout_type *lo = &nfsi->layout;
+ struct pnfs_layout_type *lo = nfsi->layout;

- BUG_ON_UNLOCKED_LO(lo);
- if (!lo->ld_data)
+ if (!lo)
return NULL;
get_layout(lo);
return lo;
@@ -343,13 +343,13 @@ put_layout(struct pnfs_layout_type *lo)

lo->refcount--;
if (!lo->refcount) {
- struct layoutdriver_io_operations *io_ops =
- PNFS_LD_IO_OPS(lo);
+ struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
+ struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);

- dprintk("%s: freeing layout %p\n", __func__, lo->ld_data);
+ dprintk("%s: freeing layout cache %p\n", __func__, lo);
WARN_ON(!list_empty(&lo->lo_layouts));
- io_ops->free_layout(lo->ld_data);
- lo->ld_data = NULL;
+ io_ops->free_layout(lo);
+ nfsi->layout = NULL;
}
}

@@ -381,14 +381,13 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
lo = grab_current_layout(nfsi);
if (lo) {
pnfs_free_layout(lo, &range);
- WARN_ON(!list_empty(&nfsi->layout.segs));
- WARN_ON(!list_empty(&nfsi->layout.lo_layouts));
- WARN_ON(nfsi->layout.ld_data);
+ WARN_ON(!list_empty(&nfsi->layout->segs));
+ WARN_ON(!list_empty(&nfsi->layout->lo_layouts));

- if (nfsi->layout.refcount != 1)
+ if (nfsi->layout->refcount != 1)
printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
- __func__, nfsi->layout.refcount);
- WARN_ON(nfsi->layout.refcount != 1);
+ __func__, nfsi->layout->refcount);
+ WARN_ON(nfsi->layout->refcount != 1);

put_layout(lo);
}
@@ -698,7 +697,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
bool ret = false;

spin_lock(&nfsi->vfs_inode.i_lock);
- list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
+ list_for_each_entry(lseg, &nfsi->layout->segs, fi_list) {
if (!should_free_lseg(lseg, range))
continue;
lseg->valid = false;
@@ -904,29 +903,33 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
dprintk("%s:Return\n", __func__);
}

+/*
+ * Each layoutdriver embeds pnfs_layout_type as the first field in it's
+ * per-layout type layout cache structure and returns it ZEROed
+ * from layoutdriver_io_ops->alloc_layout
+ */
static struct pnfs_layout_type *
alloc_init_layout(struct inode *ino)
{
- void *ld_data;
struct pnfs_layout_type *lo;
struct layoutdriver_io_operations *io_ops;

+ BUG_ON(NFS_I(ino)->layout != NULL);
io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
- lo = &NFS_I(ino)->layout;
- ld_data = io_ops->alloc_layout(ino);
- if (!ld_data) {
+ lo = io_ops->alloc_layout(ino);
+ if (!lo) {
printk(KERN_ERR
"%s: out of memory: io_ops->alloc_layout failed\n",
__func__);
return NULL;
}
-
spin_lock(&ino->i_lock);
- BUG_ON(lo->ld_data != NULL);
- lo->ld_data = ld_data;
- memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
lo->refcount = 1;
- lo->roc_iomode = 0;
+ INIT_LIST_HEAD(&lo->lo_layouts);
+ INIT_LIST_HEAD(&lo->segs);
+ seqlock_init(&lo->seqlock);
+ lo->lo_inode = ino;
+ NFS_I(ino)->layout = lo;
spin_unlock(&ino->i_lock);
return lo;
}
@@ -961,7 +964,7 @@ get_lock_alloc_layout(struct inode *ino)
* wait until bit is cleared if we lost this race.
*/
res = wait_on_bit_lock(
- &nfsi->layout.pnfs_layout_state,
+ &nfsi->layout->pnfs_layout_state,
NFS_INO_LAYOUT_ALLOC,
pnfs_wait_schedule, TASK_KILLABLE);
if (res) {
@@ -972,7 +975,7 @@ get_lock_alloc_layout(struct inode *ino)
/* Was current_layout already allocated while we slept?
* If so, retry get_lock'ing it. Otherwise, allocate it.
*/
- if (nfsi->layout.ld_data) {
+ if (nfsi->layout) {
spin_lock(&ino->i_lock);
continue;
}
@@ -985,8 +988,8 @@ get_lock_alloc_layout(struct inode *ino)

/* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
- &nfsi->layout.pnfs_layout_state);
- wake_up_bit(&nfsi->layout.pnfs_layout_state,
+ &nfsi->layout->pnfs_layout_state);
+ wake_up_bit(&nfsi->layout->pnfs_layout_state,
NFS_INO_LAYOUT_ALLOC);
break;
}
@@ -1113,12 +1116,12 @@ _pnfs_update_layout(struct inode *ino,
}

/* if get layout already failed once goto out */
- if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
+ if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
if (unlikely(nfsi->pnfs_layout_suspend &&
get_seconds() >= nfsi->pnfs_layout_suspend)) {
dprintk("%s: layout_get resumed\n", __func__);
clear_bit(lo_fail_bit(iomode),
- &nfsi->layout.pnfs_layout_state);
+ &nfsi->layout->pnfs_layout_state);
nfsi->pnfs_layout_suspend = 0;
} else
goto out_put;
@@ -1130,7 +1133,7 @@ _pnfs_update_layout(struct inode *ino,
send_layoutget(ino, ctx, &arg, lsegpp, lo);
out:
dprintk("%s end, state 0x%lx lseg %p\n", __func__,
- nfsi->layout.pnfs_layout_state, lseg);
+ nfsi->layout->pnfs_layout_state, lseg);
return;
out_put:
if (lsegpp)
@@ -1235,12 +1238,12 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
/* remember that get layout failed and suspend trying */
nfsi->pnfs_layout_suspend = suspend;
set_bit(lo_fail_bit(lgp->args.lseg.iomode),
- &nfsi->layout.pnfs_layout_state);
+ &nfsi->layout->pnfs_layout_state);
dprintk("%s: layout_get suspended until %ld\n",
__func__, suspend);
out:
dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
- __func__, lgp->status, nfsi->layout.pnfs_layout_state, lseg);
+ __func__, lgp->status, nfsi->layout->pnfs_layout_state, lseg);
return;
}

@@ -1321,7 +1324,7 @@ pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)

pgio->pg_test = NULL;

- laytype = &NFS_I(inode)->layout;
+ laytype = NFS_I(inode)->layout;
ld = NFS_SERVER(inode)->pnfs_curr_ld;
if (!pnfs_enabled_sb(NFS_SERVER(inode)) || !laytype)
return;
@@ -1454,7 +1457,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
numpages);
wdata->pdata.lseg = lseg;
trypnfs = nfss->pnfs_curr_ld->ld_io_ops->write_pagelist(
- &nfsi->layout,
+ nfsi->layout,
args->pages,
args->pgbase,
numpages,
@@ -1507,7 +1510,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
__func__, numpages);
rdata->pdata.lseg = lseg;
trypnfs = nfss->pnfs_curr_ld->ld_io_ops->read_pagelist(
- &nfsi->layout,
+ nfsi->layout,
args->pages,
args->pgbase,
numpages,
@@ -1568,7 +1571,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
* This will be done by passing the buck to the layout driver.
*/
data->pdata.lseg = NULL;
- trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
+ trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(nfsi->layout,
sync, data);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
@@ -1639,14 +1642,14 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
/* Clear layoutcommit properties in the inode so
* new lc info can be generated
*/
- write_begin_pos = nfsi->layout.pnfs_write_begin_pos;
- write_end_pos = nfsi->layout.pnfs_write_end_pos;
- data->cred = nfsi->layout.lo_cred;
- nfsi->layout.pnfs_write_begin_pos = 0;
- nfsi->layout.pnfs_write_end_pos = 0;
- nfsi->layout.lo_cred = NULL;
- __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
- pnfs_get_layout_stateid(&data->args.stateid, &nfsi->layout);
+ write_begin_pos = nfsi->layout->pnfs_write_begin_pos;
+ write_end_pos = nfsi->layout->pnfs_write_end_pos;
+ data->cred = nfsi->layout->lo_cred;
+ nfsi->layout->pnfs_write_begin_pos = 0;
+ nfsi->layout->pnfs_write_end_pos = 0;
+ nfsi->layout->lo_cred = NULL;
+ __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
+ pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);

spin_unlock(&inode->i_lock);

diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index c5af82f..2c6ce4c 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -35,13 +35,13 @@ struct pnfs_layoutdriver_type {
static inline struct nfs_inode *
PNFS_NFS_INODE(struct pnfs_layout_type *lo)
{
- return container_of(lo, struct nfs_inode, layout);
+ return NFS_I(lo->lo_inode);
}

static inline struct inode *
PNFS_INODE(struct pnfs_layout_type *lo)
{
- return &PNFS_NFS_INODE(lo)->vfs_inode;
+ return lo->lo_inode;
}

static inline struct nfs_server *
@@ -50,12 +50,6 @@ PNFS_NFS_SERVER(struct pnfs_layout_type *lo)
return NFS_SERVER(PNFS_INODE(lo));
}

-static inline void *
-PNFS_LD_DATA(struct pnfs_layout_type *lo)
-{
- return lo->ld_data;
-}
-
static inline struct pnfs_layoutdriver_type *
PNFS_LD(struct pnfs_layout_type *lo)
{
@@ -77,13 +71,14 @@ PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
static inline bool
has_layout(struct nfs_inode *nfsi)
{
- return nfsi->layout.ld_data != NULL;
+ return nfsi->layout != NULL;
}

static inline bool
layoutcommit_needed(struct nfs_inode *nfsi)
{
- return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
+ return has_layout(nfsi) &&
+ test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
}

#else /* CONFIG_NFS_V4_1 */
@@ -149,8 +144,8 @@ struct layoutdriver_io_operations {
/* Layout information. For each inode, alloc_layout is executed once to retrieve an
* inode specific layout structure. Each subsequent layoutget operation results in
* a set_layout call to set the opaque layout in the layout driver.*/
- void * (*alloc_layout) (struct inode *inode);
- void (*free_layout) (void *layoutid);
+ struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
+ void (*free_layout) (struct pnfs_layout_type *);
struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_type *layoutid, struct nfs4_pnfs_layoutget_res *lgr);
void (*free_lseg) (struct pnfs_layout_segment *lseg);

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index b612746..7b999a4 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -104,7 +104,6 @@ struct pnfs_layout_type {
int roc_iomode; /* iomode to return on close, 0=none */
seqlock_t seqlock; /* Protects the stateid */
nfs4_stateid stateid;
- void *ld_data; /* layout driver private data */
unsigned long pnfs_layout_state;
#define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
#define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
@@ -116,6 +115,7 @@ struct pnfs_layout_type {
*/
loff_t pnfs_write_begin_pos;
loff_t pnfs_write_end_pos;
+ struct inode *lo_inode;
};

/*
@@ -206,7 +206,7 @@ struct nfs_inode {
/* pNFS layout information */
#if defined(CONFIG_NFS_V4_1)
wait_queue_head_t lo_waitq;
- struct pnfs_layout_type layout;
+ struct pnfs_layout_type *layout;
time_t pnfs_layout_suspend;
#endif /* CONFIG_NFS_V4_1 */
#endif /* CONFIG_NFS_V4*/
--
1.6.6


2010-07-07 22:34:25

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 05/16] SQUASHME pnfs-submit: rewrite layout allocation

From: Andy Adamson <[email protected]>

Don't wait on a bit if layout is not allocated. Instead, allocate,
check for existance, and possibly free.

As per new layout refcounting scheme,don't take a reference on the layout.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 84 +++++++++++++----------------------------------
include/linux/nfs_fs.h | 1 -
2 files changed, 23 insertions(+), 62 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 6d435ed..92b7772 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -914,7 +914,6 @@ alloc_init_layout(struct inode *ino)
struct pnfs_layout_type *lo;
struct layoutdriver_io_operations *io_ops;

- BUG_ON(NFS_I(ino)->layout != NULL);
io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
lo = io_ops->alloc_layout(ino);
if (!lo) {
@@ -923,84 +922,47 @@ alloc_init_layout(struct inode *ino)
__func__);
return NULL;
}
- spin_lock(&ino->i_lock);
lo->refcount = 1;
INIT_LIST_HEAD(&lo->lo_layouts);
INIT_LIST_HEAD(&lo->segs);
seqlock_init(&lo->seqlock);
lo->lo_inode = ino;
- NFS_I(ino)->layout = lo;
- spin_unlock(&ino->i_lock);
return lo;
}

-static int pnfs_wait_schedule(void *word)
-{
- if (signal_pending(current))
- return -ERESTARTSYS;
- schedule();
- return 0;
-}
-
/*
- * get, possibly allocate, and lock current_layout
+ * Lock and possibly allocate the inode layout
*
- * Note: If successful, ino->i_lock is taken and the caller
- * must put and unlock current_layout when the returned layout is released.
+ * If successful, ino->i_lock is taken, and the caller must unlock.
*/
static struct pnfs_layout_type *
-get_lock_alloc_layout(struct inode *ino)
+nfs_lock_alloc_layout(struct inode *ino)
{
- struct nfs_inode *nfsi = NFS_I(ino);
- struct pnfs_layout_type *lo;
- int res;
+ struct pnfs_layout_type *new = NULL;

dprintk("%s Begin\n", __func__);

spin_lock(&ino->i_lock);
- while ((lo = grab_current_layout(nfsi)) == NULL) {
+ if (NFS_I(ino)->layout == NULL) {
spin_unlock(&ino->i_lock);
- /* Compete against other threads on who's doing the allocation,
- * wait until bit is cleared if we lost this race.
- */
- res = wait_on_bit_lock(
- &nfsi->layout->pnfs_layout_state,
- NFS_INO_LAYOUT_ALLOC,
- pnfs_wait_schedule, TASK_KILLABLE);
- if (res) {
- lo = ERR_PTR(res);
- break;
- }
-
- /* Was current_layout already allocated while we slept?
- * If so, retry get_lock'ing it. Otherwise, allocate it.
- */
- if (nfsi->layout) {
- spin_lock(&ino->i_lock);
- continue;
+ new = alloc_init_layout(ino);
+ if (new == NULL)
+ return NULL;
+ spin_lock(&ino->i_lock);
+ if (NFS_I(ino)->layout == NULL) {
+ NFS_I(ino)->layout = new;
+ new = NULL;
}
-
- lo = alloc_init_layout(ino);
- if (lo)
- spin_lock(&ino->i_lock);
- else
- lo = ERR_PTR(-ENOMEM);
-
- /* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
- clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
- &nfsi->layout->pnfs_layout_state);
- wake_up_bit(&nfsi->layout->pnfs_layout_state,
- NFS_INO_LAYOUT_ALLOC);
- break;
}
-
-#ifdef NFS_DEBUG
- if (!IS_ERR(lo))
- dprintk("%s Return %p\n", __func__, lo);
- else
- dprintk("%s Return error %ld\n", __func__, PTR_ERR(lo));
-#endif
- return lo;
+ if (new) {
+ /* Reference the layout accross i_lock release and grab */
+ get_layout(NFS_I(ino)->layout);
+ spin_unlock(&ino->i_lock);
+ NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->free_layout(new);
+ spin_lock(&ino->i_lock);
+ put_layout(NFS_I(ino)->layout);
+ }
+ return NFS_I(ino)->layout;
}

/*
@@ -1088,8 +1050,8 @@ _pnfs_update_layout(struct inode *ino,

if (take_ref)
*lsegpp = NULL;
- lo = get_lock_alloc_layout(ino);
- if (IS_ERR(lo)) {
+ lo = nfs_lock_alloc_layout(ino);
+ if (lo == NULL) {
dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
goto out;
}
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7b999a4..005b3ea 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -107,7 +107,6 @@ struct pnfs_layout_type {
unsigned long pnfs_layout_state;
#define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
#define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
- #define NFS_INO_LAYOUT_ALLOC 2 /* bit lock for layout allocation */
#define NFS_INO_LAYOUTCOMMIT 3 /* LAYOUTCOMMIT needed */
struct rpc_cred *lo_cred; /* layoutcommit credential */
/* DH: These vars keep track of the maximum write range
--
1.6.6


2010-07-07 22:34:26

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 06/16] SQUASHME pnfs-submit; fix pnfs_update_layout reference counting

From: Andy Adamson <[email protected]>

Only take a reference on the layout if layoutget is to be called.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 92b7772..7219d2b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -362,6 +362,7 @@ pnfs_layout_release(struct pnfs_layout_type *lo,
spin_lock(&nfsi->vfs_inode.i_lock);
if (range)
pnfs_free_layout(lo, range);
+ /* Matched in _pnfs_update_layout for layoutget */
put_layout(lo);
spin_unlock(&nfsi->vfs_inode.i_lock);
wake_up_all(&nfsi->lo_waitq);
@@ -1063,7 +1064,7 @@ _pnfs_update_layout(struct inode *ino,
put_lseg_locked(lseg);
/* someone is cleaning the layout */
lseg = NULL;
- goto out_put;
+ goto out_unlock;
}

if (lseg) {
@@ -1074,7 +1075,7 @@ _pnfs_update_layout(struct inode *ino,
arg.offset,
arg.iomode);

- goto out_put;
+ goto out_unlock;
}

/* if get layout already failed once goto out */
@@ -1086,10 +1087,11 @@ _pnfs_update_layout(struct inode *ino,
&nfsi->layout->pnfs_layout_state);
nfsi->pnfs_layout_suspend = 0;
} else
- goto out_put;
+ goto out_unlock;
}

- /* Lose lock, but not reference, match this with pnfs_layout_release */
+ /* Reference the layout for layoutget matched in pnfs_layout_release */
+ get_layout(lo);
spin_unlock(&ino->i_lock);

send_layoutget(ino, ctx, &arg, lsegpp, lo);
@@ -1097,10 +1099,9 @@ out:
dprintk("%s end, state 0x%lx lseg %p\n", __func__,
nfsi->layout->pnfs_layout_state, lseg);
return;
-out_put:
+out_unlock:
if (lsegpp)
*lsegpp = lseg;
- put_layout(lo);
spin_unlock(&ino->i_lock);
goto out;
}
--
1.6.6


2010-07-07 22:34:28

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 09/16] SQUASHME pnfs-submit: pnfs_update_layout always references the lseg

From: Andy Adamson <[email protected]>

So remove test_ref

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 30 ++++++++++--------------------
1 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 118d34c..d4ffd1c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -997,9 +997,7 @@ has_matching_lseg(struct pnfs_layout_segment *lseg,
*/
static struct pnfs_layout_segment *
pnfs_has_layout(struct pnfs_layout_type *lo,
- struct nfs4_pnfs_layout_segment *range,
- bool take_ref,
- bool only_valid)
+ struct nfs4_pnfs_layout_segment *range)
{
struct pnfs_layout_segment *lseg, *ret = NULL;

@@ -1007,28 +1005,24 @@ pnfs_has_layout(struct pnfs_layout_type *lo,

BUG_ON_UNLOCKED_LO(lo);
list_for_each_entry (lseg, &lo->segs, fi_list) {
- if (has_matching_lseg(lseg, range) &&
- (lseg->valid || !only_valid)) {
+ if (has_matching_lseg(lseg, range)) {
ret = lseg;
- if (take_ref)
- get_lseg(ret);
+ get_lseg(ret);
break;
}
if (cmp_layout(range, &lseg->range) > 0)
break;
}

- dprintk("%s:Return lseg %p take_ref %d ref %d valid %d\n",
- __func__, ret, take_ref,
- ret ? atomic_read(&ret->kref.refcount) : 0,
+ dprintk("%s:Return lseg %p ref %d valid %d\n",
+ __func__, ret, ret ? atomic_read(&ret->kref.refcount) : 0,
ret ? ret->valid : 0);
return ret;
}

/* Update the file's layout for the given range and iomode.
* Layout is retreived from the server if needed.
- * If lsegpp is given, the appropriate layout segment is referenced and
- * returned to the caller.
+ * The appropriate layout segment is referenced and returned to the caller.
*/
void
_pnfs_update_layout(struct inode *ino,
@@ -1046,10 +1040,8 @@ _pnfs_update_layout(struct inode *ino,
struct nfs_inode *nfsi = NFS_I(ino);
struct pnfs_layout_type *lo;
struct pnfs_layout_segment *lseg = NULL;
- bool take_ref = (lsegpp != NULL);

- if (take_ref)
- *lsegpp = NULL;
+ *lsegpp = NULL;
lo = nfs_lock_alloc_layout(ino);
if (lo == NULL) {
dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
@@ -1057,10 +1049,9 @@ _pnfs_update_layout(struct inode *ino,
}

/* Check to see if the layout for the given range already exists */
- lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
+ lseg = pnfs_has_layout(lo, &arg);
if (lseg && !lseg->valid) {
- if (take_ref)
- put_lseg_locked(lseg);
+ put_lseg_locked(lseg);
/* someone is cleaning the layout */
lseg = NULL;
goto out_unlock;
@@ -1099,8 +1090,7 @@ out:
nfsi->layout->pnfs_layout_state, lseg);
return;
out_unlock:
- if (lsegpp)
- *lsegpp = lseg;
+ *lsegpp = lseg;
spin_unlock(&ino->i_lock);
goto out;
}
--
1.6.6


2010-07-07 22:34:30

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 11/16] SQUASHME pnfs-submit: rename put_layout to put_layout_locked

From: Andy Adamson <[email protected]>

In preparation for non-locked put_layout, follow naming convention.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 1372ae1..aa16e5d 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -336,7 +336,7 @@ grab_current_layout(struct nfs_inode *nfsi)
}

static inline void
-put_layout(struct pnfs_layout_type *lo)
+put_layout_locked(struct pnfs_layout_type *lo)
{
BUG_ON_UNLOCKED_LO(lo);
BUG_ON(lo->refcount <= 0);
@@ -363,7 +363,7 @@ pnfs_layout_release(struct pnfs_layout_type *lo,
if (range)
pnfs_free_layout(lo, range);
/* Matched in _pnfs_update_layout for layoutget */
- put_layout(lo);
+ put_layout_locked(lo);
spin_unlock(&nfsi->vfs_inode.i_lock);
wake_up_all(&nfsi->lo_waitq);
}
@@ -390,7 +390,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
__func__, nfsi->layout->refcount);
WARN_ON(nfsi->layout->refcount != 1);

- put_layout(lo);
+ put_layout_locked(lo);
}
spin_unlock(&nfsi->vfs_inode.i_lock);
}
@@ -412,7 +412,7 @@ destroy_lseg(struct kref *kref)

dprintk("--> %s\n", __func__);
/* Matched by get_layout in pnfs_insert_layout */
- put_layout(lseg->layout);
+ put_layout_locked(lseg->layout);
PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
}

@@ -684,7 +684,7 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
spin_lock(&clp->cl_lock);
list_del_init(&lo->lo_layouts);
spin_unlock(&clp->cl_lock);
- put_layout(lo);
+ put_layout_locked(lo);
}

dprintk("%s:Return\n", __func__);
@@ -961,7 +961,7 @@ nfs_lock_alloc_layout(struct inode *ino)
spin_unlock(&ino->i_lock);
NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->free_layout(new);
spin_lock(&ino->i_lock);
- put_layout(NFS_I(ino)->layout);
+ put_layout_locked(NFS_I(ino)->layout);
}
return NFS_I(ino)->layout;
}
--
1.6.6


2010-07-07 22:34:32

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 13/16] SQUASHME pnfs-submit: reference layout for layoutreturn

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 24 +++++++++++++-----------
1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d42c5da..040f9e0 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -371,7 +371,10 @@ pnfs_layout_release(struct pnfs_layout_type *lo,
spin_lock(&nfsi->vfs_inode.i_lock);
if (range)
pnfs_free_layout(lo, range);
- /* Matched in _pnfs_update_layout for layoutget */
+ /*
+ * Matched in _pnfs_update_layout for layoutget
+ * and by get_layout in _pnfs_return_layout for layoutreturn
+ */
put_layout_locked(lo);
spin_unlock(&nfsi->vfs_inode.i_lock);
wake_up_all(&nfsi->lo_waitq);
@@ -776,9 +779,8 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,

if (type == RETURN_FILE) {
spin_lock(&ino->i_lock);
- lo = grab_current_layout(nfsi);
+ lo = nfsi->layout;
if (lo && !has_layout_to_return(lo, &arg)) {
- put_layout(lo);
lo = NULL;
}
if (!lo) {
@@ -787,18 +789,15 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
goto out;
}

- /* unlock w/o put rebalanced by eventual call to
- * pnfs_layout_release
- */
+ /* Reference for layoutreturn matched in pnfs_layout_release */
+ get_layout(lo);
+
spin_unlock(&ino->i_lock);

if (pnfs_return_layout_barrier(nfsi, &arg)) {
if (stateid) { /* callback */
status = -EAGAIN;
- spin_lock(&ino->i_lock);
- put_layout(lo);
- spin_unlock(&ino->i_lock);
- goto out;
+ goto out_put;
}
dprintk("%s: waiting\n", __func__);
wait_event(nfsi->lo_waitq,
@@ -809,7 +808,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
if (stateid && !wait) { /* callback */
dprintk("%s: layoutcommit pending\n", __func__);
status = -EAGAIN;
- goto out;
+ goto out_put;
}
status = pnfs_layoutcommit_inode(ino, wait);
if (status) {
@@ -828,6 +827,9 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
+out_put:
+ put_layout(ino);
+ goto out;
}

void
--
1.6.6


2010-07-07 22:34:28

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 08/16] SQUASHME pnfs-submit: don't reference the layout in init_lseg

From: Andy Adamson <[email protected]>

The layout is referenced when the layout segment is inserted into the cache.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 18ebde9..118d34c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -402,7 +402,6 @@ init_lseg(struct pnfs_layout_type *lo, struct pnfs_layout_segment *lseg)
kref_init(&lseg->kref);
lseg->valid = true;
lseg->layout = lo;
- get_layout(lo);
}

static void
--
1.6.6


2010-07-07 22:34:27

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 07/16] SQUASHME pnfs_submit: don't get a reference on boundary calculation

From: Andy Adamson <[email protected]>

The i_lock needs to be held because in the write path, we don't know if any
of the pages served by the pgio have a layout segment.

FIXME: Review when we calculate the boundary in the write path, perhaps
delaying until we know if pNFS I/O is to occur (at least one page is
referencing a layout segment).

Signed-off-by: Andy Adamon <[email protected]>
---
fs/nfs/pnfs.c | 10 ++--------
1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7219d2b..18ebde9 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1302,8 +1302,6 @@ pnfs_getboundary(struct inode *inode)
u32 stripe_size = 0;
struct nfs_server *nfss = NFS_SERVER(inode);
struct layoutdriver_policy_operations *policy_ops;
- struct nfs_inode *nfsi;
- struct pnfs_layout_type *lo;

if (!nfss->pnfs_curr_ld)
goto out;
@@ -1316,13 +1314,9 @@ pnfs_getboundary(struct inode *inode)
if (pnfs_ld_gather_across_stripes(nfss->pnfs_curr_ld))
goto out;

- nfsi = NFS_I(inode);
spin_lock(&inode->i_lock);
- lo = grab_current_layout(nfsi);;
- if (lo) {
- stripe_size = policy_ops->get_stripesize(lo);
- put_layout(lo);
- }
+ if (NFS_I(inode)->layout)
+ stripe_size = policy_ops->get_stripesize(NFS_I(inode)->layout);
spin_unlock(&inode->i_lock);
out:
return stripe_size;
--
1.6.6


2010-07-07 22:34:32

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 14/16] SQUASHME pnfs-submit: remove put_layout from pnfs_free_layout

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 040f9e0..7631efe 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -402,6 +402,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
__func__, nfsi->layout->refcount);
WARN_ON(nfsi->layout->refcount != 1);

+ /* Matched by refcount set to 1 in alloc_init_layout */
put_layout_locked(lo);
}
spin_unlock(&nfsi->vfs_inode.i_lock);
@@ -696,7 +697,6 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
spin_lock(&clp->cl_lock);
list_del_init(&lo->lo_layouts);
spin_unlock(&clp->cl_lock);
- put_layout_locked(lo);
}

dprintk("%s:Return\n", __func__);
--
1.6.6


2010-07-07 22:34:34

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 16/16] SQUASHME pnfs-submit: remove grab_current_layout

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 11 -----------
1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 704288d..6832237 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -324,17 +324,6 @@ get_layout(struct pnfs_layout_type *lo)
lo->refcount++;
}

-static inline struct pnfs_layout_type *
-grab_current_layout(struct nfs_inode *nfsi)
-{
- struct pnfs_layout_type *lo = nfsi->layout;
-
- if (!lo)
- return NULL;
- get_layout(lo);
- return lo;
-}
-
static inline void
put_layout_locked(struct pnfs_layout_type *lo)
{
--
1.6.6


2010-07-07 22:34:33

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 15/16] SQUASHME pnfs-submit: do not reference a layout in destroy_layout.

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7631efe..704288d 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -391,7 +391,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
};

spin_lock(&nfsi->vfs_inode.i_lock);
- lo = grab_current_layout(nfsi);
+ lo = nfsi->layout;
if (lo) {
pnfs_free_layout(lo, &range);
WARN_ON(!list_empty(&nfsi->layout->segs));
--
1.6.6


2010-07-07 22:34:31

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

From: Andy Adamson <[email protected]>

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs4proc.c | 2 ++
fs/nfs/pnfs.c | 13 +++++++++++++
fs/nfs/pnfs.h | 1 +
3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6acebc3..f763746 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
struct pnfs_layoutcommit_data *data =
(struct pnfs_layoutcommit_data *)lcdata;

+ /* Matched by get_layout in pnfs_layoutcommit_inode */
+ put_layout(data->args.inode);
put_rpccred(data->cred);
pnfs_layoutcommit_free(lcdata);
}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index aa16e5d..d42c5da 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
}

void
+put_layout(struct inode *inode)
+{
+ spin_lock(&inode->i_lock);
+ put_layout_locked(NFS_I(inode)->layout);
+ spin_unlock(&inode->i_lock);
+
+}
+
+void
pnfs_layout_release(struct pnfs_layout_type *lo,
struct nfs4_pnfs_layout_segment *range)
{
@@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
__clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);

+ /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
+ get_layout(NFS_I(inode)->layout);
+
spin_unlock(&inode->i_lock);

/* Set up layout commit args */
@@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
if (status) {
/* The layout driver failed to setup the layoutcommit */
put_rpccred(data->cred);
+ put_layout(inode);
goto out_free;
}
status = pnfs4_proc_layoutcommit(data, sync);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9b0fed4..e04b9d4 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
const nfs4_stateid *stateid);
void pnfs_destroy_layout(struct nfs_inode *);
+void put_layout(struct inode *inode);

#define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
(srv)->pnfs_curr_ld->ld_io_ops && \
--
1.6.6


2010-07-07 22:34:29

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 10/16] SQUASHME pnfs-submit: reference the layout when inserted into segs list

From: Andy Adamson <[email protected]>

We take a reference on the pnfs_layout_type structure at allocation, so we
don't need to take another reference when adding it to the nfs_client cl_layouts
list.

We do need to take a referece when an lseg is inserted into the layout->segs
list to keep the pnfs_layout_type around while the layout segment is in the
list.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/pnfs.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d4ffd1c..1372ae1 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -411,6 +411,7 @@ destroy_lseg(struct kref *kref)
container_of(kref, struct pnfs_layout_segment, kref);

dprintk("--> %s\n", __func__);
+ /* Matched by get_layout in pnfs_insert_layout */
put_layout(lseg->layout);
PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
}
@@ -876,7 +877,6 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
BUG_ON(!list_empty(&lo->lo_layouts));
list_add_tail(&lo->lo_layouts, &clp->cl_layouts);
spin_unlock(&clp->cl_lock);
- get_layout(lo);
}
list_for_each_entry (lp, &lo->segs, fi_list) {
if (cmp_layout(&lp->range, &lseg->range) > 0)
@@ -899,6 +899,7 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
__func__, lseg, lseg->range.iomode,
lseg->range.offset, lseg->range.length);
}
+ get_layout(lo);

dprintk("%s:Return\n", __func__);
}
--
1.6.6


2010-07-12 08:30:43

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 05/16] SQUASHME pnfs-submit: rewrite layout allocation

On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Don't wait on a bit if layout is not allocated. Instead, allocate,
> check for existance, and possibly free.
>
> As per new layout refcounting scheme,don't take a reference on the layout.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/pnfs.c | 84 +++++++++++++----------------------------------
> include/linux/nfs_fs.h | 1 -
> 2 files changed, 23 insertions(+), 62 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 6d435ed..92b7772 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -914,7 +914,6 @@ alloc_init_layout(struct inode *ino)
> struct pnfs_layout_type *lo;
> struct layoutdriver_io_operations *io_ops;
>
> - BUG_ON(NFS_I(ino)->layout != NULL);
> io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
> lo = io_ops->alloc_layout(ino);
> if (!lo) {
> @@ -923,84 +922,47 @@ alloc_init_layout(struct inode *ino)
> __func__);
> return NULL;
> }
> - spin_lock(&ino->i_lock);
> lo->refcount = 1;
> INIT_LIST_HEAD(&lo->lo_layouts);
> INIT_LIST_HEAD(&lo->segs);
> seqlock_init(&lo->seqlock);
> lo->lo_inode = ino;
> - NFS_I(ino)->layout = lo;
> - spin_unlock(&ino->i_lock);
> return lo;
> }
>
> -static int pnfs_wait_schedule(void *word)
> -{
> - if (signal_pending(current))
> - return -ERESTARTSYS;
> - schedule();
> - return 0;
> -}
> -
> /*
> - * get, possibly allocate, and lock current_layout
> + * Lock and possibly allocate the inode layout
> *
> - * Note: If successful, ino->i_lock is taken and the caller
> - * must put and unlock current_layout when the returned layout is released.
> + * If successful, ino->i_lock is taken, and the caller must unlock.
> */
> static struct pnfs_layout_type *
> -get_lock_alloc_layout(struct inode *ino)
> +nfs_lock_alloc_layout(struct inode *ino)
> {
> - struct nfs_inode *nfsi = NFS_I(ino);
> - struct pnfs_layout_type *lo;
> - int res;
> + struct pnfs_layout_type *new = NULL;
>
> dprintk("%s Begin\n", __func__);
>
> spin_lock(&ino->i_lock);
> - while ((lo = grab_current_layout(nfsi)) == NULL) {
> + if (NFS_I(ino)->layout == NULL) {
> spin_unlock(&ino->i_lock);

There's no real need to take the lock here just for checking for NULL,
only down below before actually committing the newly allocated struct.
[I'll make this fix)

Benny

> - /* Compete against other threads on who's doing the allocation,
> - * wait until bit is cleared if we lost this race.
> - */
> - res = wait_on_bit_lock(
> - &nfsi->layout->pnfs_layout_state,
> - NFS_INO_LAYOUT_ALLOC,
> - pnfs_wait_schedule, TASK_KILLABLE);
> - if (res) {
> - lo = ERR_PTR(res);
> - break;
> - }
> -
> - /* Was current_layout already allocated while we slept?
> - * If so, retry get_lock'ing it. Otherwise, allocate it.
> - */
> - if (nfsi->layout) {
> - spin_lock(&ino->i_lock);
> - continue;
> + new = alloc_init_layout(ino);
> + if (new == NULL)
> + return NULL;
> + spin_lock(&ino->i_lock);
> + if (NFS_I(ino)->layout == NULL) {
> + NFS_I(ino)->layout = new;
> + new = NULL;
> }
> -
> - lo = alloc_init_layout(ino);
> - if (lo)
> - spin_lock(&ino->i_lock);
> - else
> - lo = ERR_PTR(-ENOMEM);
> -
> - /* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
> - clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
> - &nfsi->layout->pnfs_layout_state);
> - wake_up_bit(&nfsi->layout->pnfs_layout_state,
> - NFS_INO_LAYOUT_ALLOC);
> - break;
> }
> -
> -#ifdef NFS_DEBUG
> - if (!IS_ERR(lo))
> - dprintk("%s Return %p\n", __func__, lo);
> - else
> - dprintk("%s Return error %ld\n", __func__, PTR_ERR(lo));
> -#endif
> - return lo;
> + if (new) {
> + /* Reference the layout accross i_lock release and grab */
> + get_layout(NFS_I(ino)->layout);
> + spin_unlock(&ino->i_lock);
> + NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->free_layout(new);
> + spin_lock(&ino->i_lock);
> + put_layout(NFS_I(ino)->layout);
> + }
> + return NFS_I(ino)->layout;
> }
>
> /*
> @@ -1088,8 +1050,8 @@ _pnfs_update_layout(struct inode *ino,
>
> if (take_ref)
> *lsegpp = NULL;
> - lo = get_lock_alloc_layout(ino);
> - if (IS_ERR(lo)) {
> + lo = nfs_lock_alloc_layout(ino);
> + if (lo == NULL) {
> dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
> goto out;
> }
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 7b999a4..005b3ea 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -107,7 +107,6 @@ struct pnfs_layout_type {
> unsigned long pnfs_layout_state;
> #define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
> #define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
> - #define NFS_INO_LAYOUT_ALLOC 2 /* bit lock for layout allocation */
> #define NFS_INO_LAYOUTCOMMIT 3 /* LAYOUTCOMMIT needed */
> struct rpc_cred *lo_cred; /* layoutcommit credential */
> /* DH: These vars keep track of the maximum write range

2010-07-12 09:13:07

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 09/16] SQUASHME pnfs-submit: pnfs_update_layout always references the lseg

On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> So remove test_ref
>
> Signed-off-by: Andy Adamson <[email protected]>

OK, but this has to be reintroduced in post submit for
better synchronization with parallel layout gets and returns.

Benny

> ---
> fs/nfs/pnfs.c | 30 ++++++++++--------------------
> 1 files changed, 10 insertions(+), 20 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 118d34c..d4ffd1c 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -997,9 +997,7 @@ has_matching_lseg(struct pnfs_layout_segment *lseg,
> */
> static struct pnfs_layout_segment *
> pnfs_has_layout(struct pnfs_layout_type *lo,
> - struct nfs4_pnfs_layout_segment *range,
> - bool take_ref,
> - bool only_valid)
> + struct nfs4_pnfs_layout_segment *range)
> {
> struct pnfs_layout_segment *lseg, *ret = NULL;
>
> @@ -1007,28 +1005,24 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
>
> BUG_ON_UNLOCKED_LO(lo);
> list_for_each_entry (lseg, &lo->segs, fi_list) {
> - if (has_matching_lseg(lseg, range) &&
> - (lseg->valid || !only_valid)) {
> + if (has_matching_lseg(lseg, range)) {
> ret = lseg;
> - if (take_ref)
> - get_lseg(ret);
> + get_lseg(ret);
> break;
> }
> if (cmp_layout(range, &lseg->range) > 0)
> break;
> }
>
> - dprintk("%s:Return lseg %p take_ref %d ref %d valid %d\n",
> - __func__, ret, take_ref,
> - ret ? atomic_read(&ret->kref.refcount) : 0,
> + dprintk("%s:Return lseg %p ref %d valid %d\n",
> + __func__, ret, ret ? atomic_read(&ret->kref.refcount) : 0,
> ret ? ret->valid : 0);
> return ret;
> }
>
> /* Update the file's layout for the given range and iomode.
> * Layout is retreived from the server if needed.
> - * If lsegpp is given, the appropriate layout segment is referenced and
> - * returned to the caller.
> + * The appropriate layout segment is referenced and returned to the caller.
> */
> void
> _pnfs_update_layout(struct inode *ino,
> @@ -1046,10 +1040,8 @@ _pnfs_update_layout(struct inode *ino,
> struct nfs_inode *nfsi = NFS_I(ino);
> struct pnfs_layout_type *lo;
> struct pnfs_layout_segment *lseg = NULL;
> - bool take_ref = (lsegpp != NULL);
>
> - if (take_ref)
> - *lsegpp = NULL;
> + *lsegpp = NULL;
> lo = nfs_lock_alloc_layout(ino);
> if (lo == NULL) {
> dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
> @@ -1057,10 +1049,9 @@ _pnfs_update_layout(struct inode *ino,
> }
>
> /* Check to see if the layout for the given range already exists */
> - lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> + lseg = pnfs_has_layout(lo, &arg);
> if (lseg && !lseg->valid) {
> - if (take_ref)
> - put_lseg_locked(lseg);
> + put_lseg_locked(lseg);
> /* someone is cleaning the layout */
> lseg = NULL;
> goto out_unlock;
> @@ -1099,8 +1090,7 @@ out:
> nfsi->layout->pnfs_layout_state, lseg);
> return;
> out_unlock:
> - if (lsegpp)
> - *lsegpp = lseg;
> + *lsegpp = lseg;
> spin_unlock(&ino->i_lock);
> goto out;
> }

2010-07-12 16:29:56

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

On 07/08/2010 01:34 AM, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Change nfs_inode layout field to a pointer
>
> Each layoutdriver embeds the pnfs_layout_type in it's private layout cache
> head structure, and allocates them both with the alloc_layout
> layoutdriver_io_operation.
>
> NOTE API change: the layoutdriver_io_operation alloc_layout and free_layout now
> use pnfs_layout_type pointers instead of void*.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/callback_proc.c | 2 +-
> fs/nfs/inode.c | 12 +-----
> fs/nfs/nfs4proc.c | 2 +-
> fs/nfs/nfs4state.c | 4 +-
> fs/nfs/pnfs.c | 111 +++++++++++++++++++++++----------------------
> include/linux/nfs4_pnfs.h | 19 +++-----
> include/linux/nfs_fs.h | 4 +-
> 7 files changed, 71 insertions(+), 83 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 4766695..d999ea8 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -233,7 +233,7 @@ static int pnfs_recall_layout(void *data)
> rl.cbl_seg.length = NFS4_MAX_UINT64;
>
> if (rl.cbl_recall_type == RETURN_FILE) {
> - if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout,
> + if (pnfs_is_next_layout_stateid(NFS_I(inode)->layout,
> rl.cbl_stateid))
> status = pnfs_return_layout(inode, &rl.cbl_seg,
> &rl.cbl_stateid, RETURN_FILE,
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 3d51602..30252f4 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1397,17 +1397,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
> #ifdef CONFIG_NFS_V4_1
> init_waitqueue_head(&nfsi->lo_waitq);
> nfsi->pnfs_layout_suspend = 0;
> - seqlock_init(&nfsi->layout.seqlock);
> - INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
> - INIT_LIST_HEAD(&nfsi->layout.segs);
> - nfsi->layout.refcount = 0;
> - nfsi->layout.ld_data = NULL;
> - nfsi->layout.pnfs_layout_state = 0;
> - memset(&nfsi->layout.stateid, 0, NFS4_STATEID_SIZE);
> - nfsi->layout.roc_iomode = 0;
> - nfsi->layout.lo_cred = NULL;
> - nfsi->layout.pnfs_write_begin_pos = 0;
> - nfsi->layout.pnfs_write_end_pos = 0;
> + nfsi->layout = NULL;
> #endif /* CONFIG_NFS_V4_1 */
> #endif
> }
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index c5b8a2f..6acebc3 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1086,7 +1086,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
> * flag during grace period
> */
> pnfs_destroy_layout(NFS_I(state->inode));
> - pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
> + pnfs_set_layout_stateid(NFS_I(state->inode)->layout, &zero_stateid);
> #endif /* CONFIG_NFS_V4_1 */
> }
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index bfe679b..6f44cb0 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -597,10 +597,10 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state,
> #ifdef CONFIG_NFS_V4_1
> struct nfs_inode *nfsi = NFS_I(state->inode);
>
> - if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
> + if (has_layout(nfsi) && nfsi->layout->roc_iomode) {
> struct nfs4_pnfs_layout_segment range;
>
> - range.iomode = nfsi->layout.roc_iomode;
> + range.iomode = nfsi->layout->roc_iomode;
> range.offset = 0;
> range.length = NFS4_MAX_UINT64;
> pnfs_return_layout(state->inode, &range, NULL,
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 53c15b1..6d435ed 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -152,10 +152,11 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
> {
> dprintk("%s: has_layout=%d ctx=%p\n", __func__, has_layout(nfsi), ctx);
> spin_lock(&nfsi->vfs_inode.i_lock);
> - if (has_layout(nfsi) && !layoutcommit_needed(nfsi)) {
> - nfsi->layout.lo_cred = get_rpccred(ctx->state->owner->so_cred);
> + if (has_layout(nfsi) && !
> + !test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state)) {

Is the "!!" intended

Why did you open code layoutcommit_needed() ?

> + nfsi->layout->lo_cred = get_rpccred(ctx->state->owner->so_cred);
> __set_bit(NFS_INO_LAYOUTCOMMIT,
> - &nfsi->layout.pnfs_layout_state);
> + &nfsi->layout->pnfs_layout_state);
> nfsi->change_attr++;
> spin_unlock(&nfsi->vfs_inode.i_lock);
> dprintk("%s: Set layoutcommit\n", __func__);
> @@ -175,17 +176,17 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
> loff_t end_pos;
>
> spin_lock(&nfsi->vfs_inode.i_lock);
> - if (offset < nfsi->layout.pnfs_write_begin_pos)
> - nfsi->layout.pnfs_write_begin_pos = offset;
> + if (offset < nfsi->layout->pnfs_write_begin_pos)
> + nfsi->layout->pnfs_write_begin_pos = offset;
> end_pos = offset + extent - 1; /* I'm being inclusive */
> - if (end_pos > nfsi->layout.pnfs_write_end_pos)
> - nfsi->layout.pnfs_write_end_pos = end_pos;
> + if (end_pos > nfsi->layout->pnfs_write_end_pos)
> + nfsi->layout->pnfs_write_end_pos = end_pos;
> dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
> __func__,
> (unsigned long) extent,
> (unsigned long) offset ,
> - (unsigned long) nfsi->layout.pnfs_write_begin_pos,
> - (unsigned long) nfsi->layout.pnfs_write_end_pos);
> + (unsigned long) nfsi->layout->pnfs_write_begin_pos,
> + (unsigned long) nfsi->layout->pnfs_write_end_pos);
> spin_unlock(&nfsi->vfs_inode.i_lock);
> }
>
> @@ -326,10 +327,9 @@ get_layout(struct pnfs_layout_type *lo)
> static inline struct pnfs_layout_type *
> grab_current_layout(struct nfs_inode *nfsi)
> {
> - struct pnfs_layout_type *lo = &nfsi->layout;
> + struct pnfs_layout_type *lo = nfsi->layout;
>
> - BUG_ON_UNLOCKED_LO(lo);
> - if (!lo->ld_data)
> + if (!lo)
> return NULL;
> get_layout(lo);
> return lo;
> @@ -343,13 +343,13 @@ put_layout(struct pnfs_layout_type *lo)
>
> lo->refcount--;
> if (!lo->refcount) {
> - struct layoutdriver_io_operations *io_ops =
> - PNFS_LD_IO_OPS(lo);
> + struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
> + struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
>
> - dprintk("%s: freeing layout %p\n", __func__, lo->ld_data);
> + dprintk("%s: freeing layout cache %p\n", __func__, lo);
> WARN_ON(!list_empty(&lo->lo_layouts));
> - io_ops->free_layout(lo->ld_data);
> - lo->ld_data = NULL;
> + io_ops->free_layout(lo);
> + nfsi->layout = NULL;
> }
> }
>
> @@ -381,14 +381,13 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
> lo = grab_current_layout(nfsi);
> if (lo) {
> pnfs_free_layout(lo, &range);
> - WARN_ON(!list_empty(&nfsi->layout.segs));
> - WARN_ON(!list_empty(&nfsi->layout.lo_layouts));
> - WARN_ON(nfsi->layout.ld_data);
> + WARN_ON(!list_empty(&nfsi->layout->segs));
> + WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
>
> - if (nfsi->layout.refcount != 1)
> + if (nfsi->layout->refcount != 1)
> printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
> - __func__, nfsi->layout.refcount);
> - WARN_ON(nfsi->layout.refcount != 1);
> + __func__, nfsi->layout->refcount);
> + WARN_ON(nfsi->layout->refcount != 1);
>
> put_layout(lo);
> }
> @@ -698,7 +697,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
> bool ret = false;
>
> spin_lock(&nfsi->vfs_inode.i_lock);
> - list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
> + list_for_each_entry(lseg, &nfsi->layout->segs, fi_list) {
> if (!should_free_lseg(lseg, range))
> continue;
> lseg->valid = false;
> @@ -904,29 +903,33 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
> dprintk("%s:Return\n", __func__);
> }
>
> +/*
> + * Each layoutdriver embeds pnfs_layout_type as the first field in it's
> + * per-layout type layout cache structure and returns it ZEROed
> + * from layoutdriver_io_ops->alloc_layout
> + */
> static struct pnfs_layout_type *
> alloc_init_layout(struct inode *ino)
> {
> - void *ld_data;
> struct pnfs_layout_type *lo;
> struct layoutdriver_io_operations *io_ops;
>
> + BUG_ON(NFS_I(ino)->layout != NULL);
> io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
> - lo = &NFS_I(ino)->layout;
> - ld_data = io_ops->alloc_layout(ino);
> - if (!ld_data) {
> + lo = io_ops->alloc_layout(ino);
> + if (!lo) {
> printk(KERN_ERR
> "%s: out of memory: io_ops->alloc_layout failed\n",
> __func__);
> return NULL;
> }
> -
> spin_lock(&ino->i_lock);
> - BUG_ON(lo->ld_data != NULL);
> - lo->ld_data = ld_data;
> - memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
> lo->refcount = 1;
> - lo->roc_iomode = 0;
> + INIT_LIST_HEAD(&lo->lo_layouts);
> + INIT_LIST_HEAD(&lo->segs);
> + seqlock_init(&lo->seqlock);
> + lo->lo_inode = ino;


The spin_lock() could be moved to here, it's the single place
that actually touches the nfsi.

> + NFS_I(ino)->layout = lo;
> spin_unlock(&ino->i_lock);
> return lo;
> }
> @@ -961,7 +964,7 @@ get_lock_alloc_layout(struct inode *ino)
> * wait until bit is cleared if we lost this race.
> */
> res = wait_on_bit_lock(
> - &nfsi->layout.pnfs_layout_state,
> + &nfsi->layout->pnfs_layout_state,
> NFS_INO_LAYOUT_ALLOC,
> pnfs_wait_schedule, TASK_KILLABLE);
> if (res) {
> @@ -972,7 +975,7 @@ get_lock_alloc_layout(struct inode *ino)
> /* Was current_layout already allocated while we slept?
> * If so, retry get_lock'ing it. Otherwise, allocate it.
> */
> - if (nfsi->layout.ld_data) {
> + if (nfsi->layout) {
> spin_lock(&ino->i_lock);
> continue;
> }
> @@ -985,8 +988,8 @@ get_lock_alloc_layout(struct inode *ino)
>
> /* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
> clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
> - &nfsi->layout.pnfs_layout_state);
> - wake_up_bit(&nfsi->layout.pnfs_layout_state,
> + &nfsi->layout->pnfs_layout_state);
> + wake_up_bit(&nfsi->layout->pnfs_layout_state,
> NFS_INO_LAYOUT_ALLOC);
> break;
> }
> @@ -1113,12 +1116,12 @@ _pnfs_update_layout(struct inode *ino,
> }
>
> /* if get layout already failed once goto out */
> - if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
> + if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
> if (unlikely(nfsi->pnfs_layout_suspend &&
> get_seconds() >= nfsi->pnfs_layout_suspend)) {
> dprintk("%s: layout_get resumed\n", __func__);
> clear_bit(lo_fail_bit(iomode),
> - &nfsi->layout.pnfs_layout_state);
> + &nfsi->layout->pnfs_layout_state);
> nfsi->pnfs_layout_suspend = 0;
> } else
> goto out_put;
> @@ -1130,7 +1133,7 @@ _pnfs_update_layout(struct inode *ino,
> send_layoutget(ino, ctx, &arg, lsegpp, lo);
> out:
> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
> - nfsi->layout.pnfs_layout_state, lseg);
> + nfsi->layout->pnfs_layout_state, lseg);
> return;
> out_put:
> if (lsegpp)
> @@ -1235,12 +1238,12 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
> /* remember that get layout failed and suspend trying */
> nfsi->pnfs_layout_suspend = suspend;
> set_bit(lo_fail_bit(lgp->args.lseg.iomode),
> - &nfsi->layout.pnfs_layout_state);
> + &nfsi->layout->pnfs_layout_state);
> dprintk("%s: layout_get suspended until %ld\n",
> __func__, suspend);
> out:
> dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
> - __func__, lgp->status, nfsi->layout.pnfs_layout_state, lseg);
> + __func__, lgp->status, nfsi->layout->pnfs_layout_state, lseg);
> return;
> }
>
> @@ -1321,7 +1324,7 @@ pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
>
> pgio->pg_test = NULL;
>
> - laytype = &NFS_I(inode)->layout;
> + laytype = NFS_I(inode)->layout;
> ld = NFS_SERVER(inode)->pnfs_curr_ld;
> if (!pnfs_enabled_sb(NFS_SERVER(inode)) || !laytype)
> return;
> @@ -1454,7 +1457,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
> numpages);
> wdata->pdata.lseg = lseg;
> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->write_pagelist(
> - &nfsi->layout,
> + nfsi->layout,
> args->pages,
> args->pgbase,
> numpages,
> @@ -1507,7 +1510,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
> __func__, numpages);
> rdata->pdata.lseg = lseg;
> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->read_pagelist(
> - &nfsi->layout,
> + nfsi->layout,
> args->pages,
> args->pgbase,
> numpages,
> @@ -1568,7 +1571,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
> * This will be done by passing the buck to the layout driver.
> */
> data->pdata.lseg = NULL;
> - trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
> + trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(nfsi->layout,
> sync, data);
> dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
> return trypnfs;
> @@ -1639,14 +1642,14 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> /* Clear layoutcommit properties in the inode so
> * new lc info can be generated
> */
> - write_begin_pos = nfsi->layout.pnfs_write_begin_pos;
> - write_end_pos = nfsi->layout.pnfs_write_end_pos;
> - data->cred = nfsi->layout.lo_cred;
> - nfsi->layout.pnfs_write_begin_pos = 0;
> - nfsi->layout.pnfs_write_end_pos = 0;
> - nfsi->layout.lo_cred = NULL;
> - __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
> - pnfs_get_layout_stateid(&data->args.stateid, &nfsi->layout);
> + write_begin_pos = nfsi->layout->pnfs_write_begin_pos;
> + write_end_pos = nfsi->layout->pnfs_write_end_pos;
> + data->cred = nfsi->layout->lo_cred;
> + nfsi->layout->pnfs_write_begin_pos = 0;
> + nfsi->layout->pnfs_write_end_pos = 0;
> + nfsi->layout->lo_cred = NULL;
> + __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
> + pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>
> spin_unlock(&inode->i_lock);
>
> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
> index c5af82f..2c6ce4c 100644
> --- a/include/linux/nfs4_pnfs.h
> +++ b/include/linux/nfs4_pnfs.h
> @@ -35,13 +35,13 @@ struct pnfs_layoutdriver_type {
> static inline struct nfs_inode *
> PNFS_NFS_INODE(struct pnfs_layout_type *lo)
> {
> - return container_of(lo, struct nfs_inode, layout);
> + return NFS_I(lo->lo_inode);
> }
>
> static inline struct inode *
> PNFS_INODE(struct pnfs_layout_type *lo)
> {
> - return &PNFS_NFS_INODE(lo)->vfs_inode;
> + return lo->lo_inode;
> }
>
> static inline struct nfs_server *
> @@ -50,12 +50,6 @@ PNFS_NFS_SERVER(struct pnfs_layout_type *lo)
> return NFS_SERVER(PNFS_INODE(lo));
> }
>
> -static inline void *
> -PNFS_LD_DATA(struct pnfs_layout_type *lo)
> -{
> - return lo->ld_data;
> -}
> -
> static inline struct pnfs_layoutdriver_type *
> PNFS_LD(struct pnfs_layout_type *lo)
> {
> @@ -77,13 +71,14 @@ PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
> static inline bool
> has_layout(struct nfs_inode *nfsi)
> {
> - return nfsi->layout.ld_data != NULL;
> + return nfsi->layout != NULL;

does this mean the same thing now?

You could perhaps optimize with a bit flag that is set on
first segment insert, and reset on last segment removal.
Or some other/better scheme.

> }
>
> static inline bool
> layoutcommit_needed(struct nfs_inode *nfsi)
> {
> - return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
> + return has_layout(nfsi) &&
> + test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);

What is the purpose of patch[1/1] ?

you are unlocked and asking about a layout pointer. Which according to you might
go away mid-flight?

> }
>
> #else /* CONFIG_NFS_V4_1 */
> @@ -149,8 +144,8 @@ struct layoutdriver_io_operations {
> /* Layout information. For each inode, alloc_layout is executed once to retrieve an
> * inode specific layout structure. Each subsequent layoutget operation results in
> * a set_layout call to set the opaque layout in the layout driver.*/
> - void * (*alloc_layout) (struct inode *inode);
> - void (*free_layout) (void *layoutid);
> + struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
> + void (*free_layout) (struct pnfs_layout_type *);
> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_type *layoutid, struct nfs4_pnfs_layoutget_res *lgr);
> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index b612746..7b999a4 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -104,7 +104,6 @@ struct pnfs_layout_type {
> int roc_iomode; /* iomode to return on close, 0=none */
> seqlock_t seqlock; /* Protects the stateid */
> nfs4_stateid stateid;
> - void *ld_data; /* layout driver private data */
> unsigned long pnfs_layout_state;
> #define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
> #define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
> @@ -116,6 +115,7 @@ struct pnfs_layout_type {
> */
> loff_t pnfs_write_begin_pos;
> loff_t pnfs_write_end_pos;
> + struct inode *lo_inode;
> };
>
> /*
> @@ -206,7 +206,7 @@ struct nfs_inode {
> /* pNFS layout information */
> #if defined(CONFIG_NFS_V4_1)
> wait_queue_head_t lo_waitq;
> - struct pnfs_layout_type layout;
> + struct pnfs_layout_type *layout;
> time_t pnfs_layout_suspend;
> #endif /* CONFIG_NFS_V4_1 */
> #endif /* CONFIG_NFS_V4*/


2010-07-12 16:32:48

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 04/16] SQUASHME pnfs-submit: filelayout: use new alloc/free_layout API

On 07/08/2010 01:34 AM, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/nfs4filelayout.c | 15 +++++++++------
> fs/nfs/nfs4filelayout.h | 7 +++++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 57a0010..b49ccf4 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -301,20 +301,23 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
> * will use the pnfs_layout_type type to refer to the layout for this
> * inode from now on.
> */
> -static void *
> +static struct pnfs_layout_type *
> filelayout_alloc_layout(struct inode *inode)
> {
> + struct nfs4_filelayout *flp;
> +
> dprintk("NFS_FILELAYOUT: allocating layout\n");
> - return kzalloc(sizeof(struct nfs4_filelayout), GFP_KERNEL);
> + flp = kzalloc(sizeof(struct nfs4_filelayout), GFP_KERNEL);
> + return flp ? &flp->fl_layout : NULL;
> }
>
> /* Free a filelayout layout structure
> */
> static void
> -filelayout_free_layout(void *layoutid)
> +filelayout_free_layout(struct pnfs_layout_type *lo)
> {
> dprintk("NFS_FILELAYOUT: freeing layout\n");
> - kfree(layoutid);
> + kfree(lo);

+ kfree(FILE_LO(lo));
> }
>
> /*
> @@ -471,7 +474,7 @@ static struct pnfs_layout_segment *
> filelayout_alloc_lseg(struct pnfs_layout_type *layoutid,
> struct nfs4_pnfs_layoutget_res *lgr)
> {
> - struct nfs4_filelayout *flo = PNFS_LD_DATA(layoutid);
> + struct nfs4_filelayout *flo = FILE_LO(layoutid);
> struct pnfs_layout_segment *lseg;
> int rc;
>
> @@ -693,7 +696,7 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
> ssize_t
> filelayout_get_stripesize(struct pnfs_layout_type *layoutid)
> {
> - struct nfs4_filelayout *flo = PNFS_LD_DATA(layoutid);
> + struct nfs4_filelayout *flo = FILE_LO(layoutid);
>
> return flo->stripe_unit;
> }
> diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
> index de8391f..fd25e76 100644
> --- a/fs/nfs/nfs4filelayout.h
> +++ b/fs/nfs/nfs4filelayout.h
> @@ -68,6 +68,7 @@ struct nfs4_filelayout_segment {
> };
>
> struct nfs4_filelayout {
> + struct pnfs_layout_type fl_layout;
> int uncommitted_write;
> loff_t last_commit_size;
> u64 layout_id;
> @@ -77,6 +78,12 @@ struct nfs4_filelayout {
> extern struct nfs_fh *
> nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset);
>
> +static inline struct nfs4_filelayout *
> +FILE_LO(struct pnfs_layout_type *lo)
> +{
> + return container_of(lo, struct nfs4_filelayout, fl_layout);
> +}
> +
> extern struct pnfs_client_operations *pnfs_callback_ops;
>
> extern void nfs4_fl_free_deviceid_callback(struct kref *);

Boaz

2010-07-12 17:05:32

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

On Jul. 12, 2010, 19:29 +0300, Boaz Harrosh <[email protected]> wrote:
> On 07/08/2010 01:34 AM, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Change nfs_inode layout field to a pointer
>>
>> Each layoutdriver embeds the pnfs_layout_type in it's private layout cache
>> head structure, and allocates them both with the alloc_layout
>> layoutdriver_io_operation.
>>
>> NOTE API change: the layoutdriver_io_operation alloc_layout and free_layout now
>> use pnfs_layout_type pointers instead of void*.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> fs/nfs/callback_proc.c | 2 +-
>> fs/nfs/inode.c | 12 +-----
>> fs/nfs/nfs4proc.c | 2 +-
>> fs/nfs/nfs4state.c | 4 +-
>> fs/nfs/pnfs.c | 111 +++++++++++++++++++++++----------------------
>> include/linux/nfs4_pnfs.h | 19 +++-----
>> include/linux/nfs_fs.h | 4 +-
>> 7 files changed, 71 insertions(+), 83 deletions(-)
>>
>> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
>> index 4766695..d999ea8 100644
>> --- a/fs/nfs/callback_proc.c
>> +++ b/fs/nfs/callback_proc.c
>> @@ -233,7 +233,7 @@ static int pnfs_recall_layout(void *data)
>> rl.cbl_seg.length = NFS4_MAX_UINT64;
>>
>> if (rl.cbl_recall_type == RETURN_FILE) {
>> - if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout,
>> + if (pnfs_is_next_layout_stateid(NFS_I(inode)->layout,
>> rl.cbl_stateid))
>> status = pnfs_return_layout(inode, &rl.cbl_seg,
>> &rl.cbl_stateid, RETURN_FILE,
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index 3d51602..30252f4 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -1397,17 +1397,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
>> #ifdef CONFIG_NFS_V4_1
>> init_waitqueue_head(&nfsi->lo_waitq);
>> nfsi->pnfs_layout_suspend = 0;
>> - seqlock_init(&nfsi->layout.seqlock);
>> - INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
>> - INIT_LIST_HEAD(&nfsi->layout.segs);
>> - nfsi->layout.refcount = 0;
>> - nfsi->layout.ld_data = NULL;
>> - nfsi->layout.pnfs_layout_state = 0;
>> - memset(&nfsi->layout.stateid, 0, NFS4_STATEID_SIZE);
>> - nfsi->layout.roc_iomode = 0;
>> - nfsi->layout.lo_cred = NULL;
>> - nfsi->layout.pnfs_write_begin_pos = 0;
>> - nfsi->layout.pnfs_write_end_pos = 0;
>> + nfsi->layout = NULL;
>> #endif /* CONFIG_NFS_V4_1 */
>> #endif
>> }
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index c5b8a2f..6acebc3 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1086,7 +1086,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
>> * flag during grace period
>> */
>> pnfs_destroy_layout(NFS_I(state->inode));
>> - pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
>> + pnfs_set_layout_stateid(NFS_I(state->inode)->layout, &zero_stateid);
>> #endif /* CONFIG_NFS_V4_1 */
>> }
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index bfe679b..6f44cb0 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -597,10 +597,10 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state,
>> #ifdef CONFIG_NFS_V4_1
>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>
>> - if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>> + if (has_layout(nfsi) && nfsi->layout->roc_iomode) {
>> struct nfs4_pnfs_layout_segment range;
>>
>> - range.iomode = nfsi->layout.roc_iomode;
>> + range.iomode = nfsi->layout->roc_iomode;
>> range.offset = 0;
>> range.length = NFS4_MAX_UINT64;
>> pnfs_return_layout(state->inode, &range, NULL,
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 53c15b1..6d435ed 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -152,10 +152,11 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
>> {
>> dprintk("%s: has_layout=%d ctx=%p\n", __func__, has_layout(nfsi), ctx);
>> spin_lock(&nfsi->vfs_inode.i_lock);
>> - if (has_layout(nfsi) && !layoutcommit_needed(nfsi)) {
>> - nfsi->layout.lo_cred = get_rpccred(ctx->state->owner->so_cred);
>> + if (has_layout(nfsi) && !
>> + !test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state)) {
>
> Is the "!!" intended
>
> Why did you open code layoutcommit_needed() ?
>
>> + nfsi->layout->lo_cred = get_rpccred(ctx->state->owner->so_cred);
>> __set_bit(NFS_INO_LAYOUTCOMMIT,
>> - &nfsi->layout.pnfs_layout_state);
>> + &nfsi->layout->pnfs_layout_state);
>> nfsi->change_attr++;
>> spin_unlock(&nfsi->vfs_inode.i_lock);
>> dprintk("%s: Set layoutcommit\n", __func__);
>> @@ -175,17 +176,17 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
>> loff_t end_pos;
>>
>> spin_lock(&nfsi->vfs_inode.i_lock);
>> - if (offset < nfsi->layout.pnfs_write_begin_pos)
>> - nfsi->layout.pnfs_write_begin_pos = offset;
>> + if (offset < nfsi->layout->pnfs_write_begin_pos)
>> + nfsi->layout->pnfs_write_begin_pos = offset;
>> end_pos = offset + extent - 1; /* I'm being inclusive */
>> - if (end_pos > nfsi->layout.pnfs_write_end_pos)
>> - nfsi->layout.pnfs_write_end_pos = end_pos;
>> + if (end_pos > nfsi->layout->pnfs_write_end_pos)
>> + nfsi->layout->pnfs_write_end_pos = end_pos;
>> dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
>> __func__,
>> (unsigned long) extent,
>> (unsigned long) offset ,
>> - (unsigned long) nfsi->layout.pnfs_write_begin_pos,
>> - (unsigned long) nfsi->layout.pnfs_write_end_pos);
>> + (unsigned long) nfsi->layout->pnfs_write_begin_pos,
>> + (unsigned long) nfsi->layout->pnfs_write_end_pos);
>> spin_unlock(&nfsi->vfs_inode.i_lock);
>> }
>>
>> @@ -326,10 +327,9 @@ get_layout(struct pnfs_layout_type *lo)
>> static inline struct pnfs_layout_type *
>> grab_current_layout(struct nfs_inode *nfsi)
>> {
>> - struct pnfs_layout_type *lo = &nfsi->layout;
>> + struct pnfs_layout_type *lo = nfsi->layout;
>>
>> - BUG_ON_UNLOCKED_LO(lo);
>> - if (!lo->ld_data)
>> + if (!lo)
>> return NULL;
>> get_layout(lo);
>> return lo;
>> @@ -343,13 +343,13 @@ put_layout(struct pnfs_layout_type *lo)
>>
>> lo->refcount--;
>> if (!lo->refcount) {
>> - struct layoutdriver_io_operations *io_ops =
>> - PNFS_LD_IO_OPS(lo);
>> + struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
>> + struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
>>
>> - dprintk("%s: freeing layout %p\n", __func__, lo->ld_data);
>> + dprintk("%s: freeing layout cache %p\n", __func__, lo);
>> WARN_ON(!list_empty(&lo->lo_layouts));
>> - io_ops->free_layout(lo->ld_data);
>> - lo->ld_data = NULL;
>> + io_ops->free_layout(lo);
>> + nfsi->layout = NULL;
>> }
>> }
>>
>> @@ -381,14 +381,13 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
>> lo = grab_current_layout(nfsi);
>> if (lo) {
>> pnfs_free_layout(lo, &range);
>> - WARN_ON(!list_empty(&nfsi->layout.segs));
>> - WARN_ON(!list_empty(&nfsi->layout.lo_layouts));
>> - WARN_ON(nfsi->layout.ld_data);
>> + WARN_ON(!list_empty(&nfsi->layout->segs));
>> + WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
>>
>> - if (nfsi->layout.refcount != 1)
>> + if (nfsi->layout->refcount != 1)
>> printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
>> - __func__, nfsi->layout.refcount);
>> - WARN_ON(nfsi->layout.refcount != 1);
>> + __func__, nfsi->layout->refcount);
>> + WARN_ON(nfsi->layout->refcount != 1);
>>
>> put_layout(lo);
>> }
>> @@ -698,7 +697,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
>> bool ret = false;
>>
>> spin_lock(&nfsi->vfs_inode.i_lock);
>> - list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
>> + list_for_each_entry(lseg, &nfsi->layout->segs, fi_list) {
>> if (!should_free_lseg(lseg, range))
>> continue;
>> lseg->valid = false;
>> @@ -904,29 +903,33 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
>> dprintk("%s:Return\n", __func__);
>> }
>>
>> +/*
>> + * Each layoutdriver embeds pnfs_layout_type as the first field in it's
>> + * per-layout type layout cache structure and returns it ZEROed
>> + * from layoutdriver_io_ops->alloc_layout
>> + */
>> static struct pnfs_layout_type *
>> alloc_init_layout(struct inode *ino)
>> {
>> - void *ld_data;
>> struct pnfs_layout_type *lo;
>> struct layoutdriver_io_operations *io_ops;
>>
>> + BUG_ON(NFS_I(ino)->layout != NULL);
>> io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
>> - lo = &NFS_I(ino)->layout;
>> - ld_data = io_ops->alloc_layout(ino);
>> - if (!ld_data) {
>> + lo = io_ops->alloc_layout(ino);
>> + if (!lo) {
>> printk(KERN_ERR
>> "%s: out of memory: io_ops->alloc_layout failed\n",
>> __func__);
>> return NULL;
>> }
>> -
>> spin_lock(&ino->i_lock);
>> - BUG_ON(lo->ld_data != NULL);
>> - lo->ld_data = ld_data;
>> - memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
>> lo->refcount = 1;
>> - lo->roc_iomode = 0;
>> + INIT_LIST_HEAD(&lo->lo_layouts);
>> + INIT_LIST_HEAD(&lo->segs);
>> + seqlock_init(&lo->seqlock);
>> + lo->lo_inode = ino;
>
>
> The spin_lock() could be moved to here, it's the single place
> that actually touches the nfsi.
>
>> + NFS_I(ino)->layout = lo;
>> spin_unlock(&ino->i_lock);
>> return lo;
>> }
>> @@ -961,7 +964,7 @@ get_lock_alloc_layout(struct inode *ino)
>> * wait until bit is cleared if we lost this race.
>> */
>> res = wait_on_bit_lock(
>> - &nfsi->layout.pnfs_layout_state,
>> + &nfsi->layout->pnfs_layout_state,
>> NFS_INO_LAYOUT_ALLOC,
>> pnfs_wait_schedule, TASK_KILLABLE);
>> if (res) {
>> @@ -972,7 +975,7 @@ get_lock_alloc_layout(struct inode *ino)
>> /* Was current_layout already allocated while we slept?
>> * If so, retry get_lock'ing it. Otherwise, allocate it.
>> */
>> - if (nfsi->layout.ld_data) {
>> + if (nfsi->layout) {
>> spin_lock(&ino->i_lock);
>> continue;
>> }
>> @@ -985,8 +988,8 @@ get_lock_alloc_layout(struct inode *ino)
>>
>> /* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
>> clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
>> - &nfsi->layout.pnfs_layout_state);
>> - wake_up_bit(&nfsi->layout.pnfs_layout_state,
>> + &nfsi->layout->pnfs_layout_state);
>> + wake_up_bit(&nfsi->layout->pnfs_layout_state,
>> NFS_INO_LAYOUT_ALLOC);
>> break;
>> }
>> @@ -1113,12 +1116,12 @@ _pnfs_update_layout(struct inode *ino,
>> }
>>
>> /* if get layout already failed once goto out */
>> - if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
>> + if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
>> if (unlikely(nfsi->pnfs_layout_suspend &&
>> get_seconds() >= nfsi->pnfs_layout_suspend)) {
>> dprintk("%s: layout_get resumed\n", __func__);
>> clear_bit(lo_fail_bit(iomode),
>> - &nfsi->layout.pnfs_layout_state);
>> + &nfsi->layout->pnfs_layout_state);
>> nfsi->pnfs_layout_suspend = 0;
>> } else
>> goto out_put;
>> @@ -1130,7 +1133,7 @@ _pnfs_update_layout(struct inode *ino,
>> send_layoutget(ino, ctx, &arg, lsegpp, lo);
>> out:
>> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
>> - nfsi->layout.pnfs_layout_state, lseg);
>> + nfsi->layout->pnfs_layout_state, lseg);
>> return;
>> out_put:
>> if (lsegpp)
>> @@ -1235,12 +1238,12 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
>> /* remember that get layout failed and suspend trying */
>> nfsi->pnfs_layout_suspend = suspend;
>> set_bit(lo_fail_bit(lgp->args.lseg.iomode),
>> - &nfsi->layout.pnfs_layout_state);
>> + &nfsi->layout->pnfs_layout_state);
>> dprintk("%s: layout_get suspended until %ld\n",
>> __func__, suspend);
>> out:
>> dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
>> - __func__, lgp->status, nfsi->layout.pnfs_layout_state, lseg);
>> + __func__, lgp->status, nfsi->layout->pnfs_layout_state, lseg);
>> return;
>> }
>>
>> @@ -1321,7 +1324,7 @@ pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
>>
>> pgio->pg_test = NULL;
>>
>> - laytype = &NFS_I(inode)->layout;
>> + laytype = NFS_I(inode)->layout;
>> ld = NFS_SERVER(inode)->pnfs_curr_ld;
>> if (!pnfs_enabled_sb(NFS_SERVER(inode)) || !laytype)
>> return;
>> @@ -1454,7 +1457,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
>> numpages);
>> wdata->pdata.lseg = lseg;
>> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->write_pagelist(
>> - &nfsi->layout,
>> + nfsi->layout,
>> args->pages,
>> args->pgbase,
>> numpages,
>> @@ -1507,7 +1510,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>> __func__, numpages);
>> rdata->pdata.lseg = lseg;
>> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->read_pagelist(
>> - &nfsi->layout,
>> + nfsi->layout,
>> args->pages,
>> args->pgbase,
>> numpages,
>> @@ -1568,7 +1571,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
>> * This will be done by passing the buck to the layout driver.
>> */
>> data->pdata.lseg = NULL;
>> - trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
>> + trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(nfsi->layout,
>> sync, data);
>> dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>> return trypnfs;
>> @@ -1639,14 +1642,14 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>> /* Clear layoutcommit properties in the inode so
>> * new lc info can be generated
>> */
>> - write_begin_pos = nfsi->layout.pnfs_write_begin_pos;
>> - write_end_pos = nfsi->layout.pnfs_write_end_pos;
>> - data->cred = nfsi->layout.lo_cred;
>> - nfsi->layout.pnfs_write_begin_pos = 0;
>> - nfsi->layout.pnfs_write_end_pos = 0;
>> - nfsi->layout.lo_cred = NULL;
>> - __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>> - pnfs_get_layout_stateid(&data->args.stateid, &nfsi->layout);
>> + write_begin_pos = nfsi->layout->pnfs_write_begin_pos;
>> + write_end_pos = nfsi->layout->pnfs_write_end_pos;
>> + data->cred = nfsi->layout->lo_cred;
>> + nfsi->layout->pnfs_write_begin_pos = 0;
>> + nfsi->layout->pnfs_write_end_pos = 0;
>> + nfsi->layout->lo_cred = NULL;
>> + __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>> + pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>
>> spin_unlock(&inode->i_lock);
>>
>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>> index c5af82f..2c6ce4c 100644
>> --- a/include/linux/nfs4_pnfs.h
>> +++ b/include/linux/nfs4_pnfs.h
>> @@ -35,13 +35,13 @@ struct pnfs_layoutdriver_type {
>> static inline struct nfs_inode *
>> PNFS_NFS_INODE(struct pnfs_layout_type *lo)
>> {
>> - return container_of(lo, struct nfs_inode, layout);
>> + return NFS_I(lo->lo_inode);
>> }
>>
>> static inline struct inode *
>> PNFS_INODE(struct pnfs_layout_type *lo)
>> {
>> - return &PNFS_NFS_INODE(lo)->vfs_inode;
>> + return lo->lo_inode;
>> }
>>
>> static inline struct nfs_server *
>> @@ -50,12 +50,6 @@ PNFS_NFS_SERVER(struct pnfs_layout_type *lo)
>> return NFS_SERVER(PNFS_INODE(lo));
>> }
>>
>> -static inline void *
>> -PNFS_LD_DATA(struct pnfs_layout_type *lo)
>> -{
>> - return lo->ld_data;
>> -}
>> -
>> static inline struct pnfs_layoutdriver_type *
>> PNFS_LD(struct pnfs_layout_type *lo)
>> {
>> @@ -77,13 +71,14 @@ PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
>> static inline bool
>> has_layout(struct nfs_inode *nfsi)
>> {
>> - return nfsi->layout.ld_data != NULL;
>> + return nfsi->layout != NULL;
>
> does this mean the same thing now?
>
> You could perhaps optimize with a bit flag that is set on
> first segment insert, and reset on last segment removal.
> Or some other/better scheme.

We may want to have has_layout() as well as has_layout_segs()
which will check that the lsegs list is not empty.

Benny

>
>> }
>>
>> static inline bool
>> layoutcommit_needed(struct nfs_inode *nfsi)
>> {
>> - return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>> + return has_layout(nfsi) &&
>> + test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>
> What is the purpose of patch[1/1] ?
>
> you are unlocked and asking about a layout pointer. Which according to you might
> go away mid-flight?
>
>> }
>>
>> #else /* CONFIG_NFS_V4_1 */
>> @@ -149,8 +144,8 @@ struct layoutdriver_io_operations {
>> /* Layout information. For each inode, alloc_layout is executed once to retrieve an
>> * inode specific layout structure. Each subsequent layoutget operation results in
>> * a set_layout call to set the opaque layout in the layout driver.*/
>> - void * (*alloc_layout) (struct inode *inode);
>> - void (*free_layout) (void *layoutid);
>> + struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
>> + void (*free_layout) (struct pnfs_layout_type *);
>> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_type *layoutid, struct nfs4_pnfs_layoutget_res *lgr);
>> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>>
>> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
>> index b612746..7b999a4 100644
>> --- a/include/linux/nfs_fs.h
>> +++ b/include/linux/nfs_fs.h
>> @@ -104,7 +104,6 @@ struct pnfs_layout_type {
>> int roc_iomode; /* iomode to return on close, 0=none */
>> seqlock_t seqlock; /* Protects the stateid */
>> nfs4_stateid stateid;
>> - void *ld_data; /* layout driver private data */
>> unsigned long pnfs_layout_state;
>> #define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
>> #define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
>> @@ -116,6 +115,7 @@ struct pnfs_layout_type {
>> */
>> loff_t pnfs_write_begin_pos;
>> loff_t pnfs_write_end_pos;
>> + struct inode *lo_inode;
>> };
>>
>> /*
>> @@ -206,7 +206,7 @@ struct nfs_inode {
>> /* pNFS layout information */
>> #if defined(CONFIG_NFS_V4_1)
>> wait_queue_head_t lo_waitq;
>> - struct pnfs_layout_type layout;
>> + struct pnfs_layout_type *layout;
>> time_t pnfs_layout_suspend;
>> #endif /* CONFIG_NFS_V4_1 */
>> #endif /* CONFIG_NFS_V4*/
>

2010-07-12 17:25:20

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On 07/08/2010 01:34 AM, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 2 ++
> fs/nfs/pnfs.c | 13 +++++++++++++
> fs/nfs/pnfs.h | 1 +
> 3 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6acebc3..f763746 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
> struct pnfs_layoutcommit_data *data =
> (struct pnfs_layoutcommit_data *)lcdata;
>
> + /* Matched by get_layout in pnfs_layoutcommit_inode */
> + put_layout(data->args.inode);
> put_rpccred(data->cred);
> pnfs_layoutcommit_free(lcdata);
> }
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index aa16e5d..d42c5da 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
> }
>
> void
> +put_layout(struct inode *inode)
> +{
> + spin_lock(&inode->i_lock);
> + put_layout_locked(NFS_I(inode)->layout);
> + spin_unlock(&inode->i_lock);
> +
> +}
> +
> +void
> pnfs_layout_release(struct pnfs_layout_type *lo,
> struct nfs4_pnfs_layout_segment *range)
> {
> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>
> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
> + get_layout(NFS_I(inode)->layout);
> +

Has of your rules this should be now called get_layout_locked

> spin_unlock(&inode->i_lock);
>
> /* Set up layout commit args */
> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> if (status) {
> /* The layout driver failed to setup the layoutcommit */
> put_rpccred(data->cred);
> + put_layout(inode);

And it is really nice that put_layout takes an inode and get_layout takes an
struct layout_type.

> goto out_free;
> }
> status = pnfs4_proc_layoutcommit(data, sync);
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 9b0fed4..e04b9d4 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
> const nfs4_stateid *stateid);
> void pnfs_destroy_layout(struct nfs_inode *);
> +void put_layout(struct inode *inode);
>
> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
> (srv)->pnfs_curr_ld->ld_io_ops && \

I still think they can all go away.
What is the point of a layout without it's nfsi+inode. And who cares if we free the layout_type structure
2 milliseconds before, and in the error case?

Boaz

2010-07-12 17:26:27

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 2 ++
> fs/nfs/pnfs.c | 13 +++++++++++++
> fs/nfs/pnfs.h | 1 +
> 3 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6acebc3..f763746 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
> struct pnfs_layoutcommit_data *data =
> (struct pnfs_layoutcommit_data *)lcdata;
>
> + /* Matched by get_layout in pnfs_layoutcommit_inode */
> + put_layout(data->args.inode);
> put_rpccred(data->cred);
> pnfs_layoutcommit_free(lcdata);
> }
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index aa16e5d..d42c5da 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
> }
>
> void
> +put_layout(struct inode *inode)
> +{
> + spin_lock(&inode->i_lock);
> + put_layout_locked(NFS_I(inode)->layout);
> + spin_unlock(&inode->i_lock);
> +
> +}
> +
> +void
> pnfs_layout_release(struct pnfs_layout_type *lo,
> struct nfs4_pnfs_layout_segment *range)
> {
> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>
> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
> + get_layout(NFS_I(inode)->layout);
> +

So from the 30000 foot level, I need to remind myself what do
we need the refcount on the layout hdr (pnfs_layout_type) for?

Can we really use it detached from the inode? NO
Is it only for debugging to make catch the case that the inode
is released while there are references to the layout?

Benny

> spin_unlock(&inode->i_lock);
>
> /* Set up layout commit args */
> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> if (status) {
> /* The layout driver failed to setup the layoutcommit */
> put_rpccred(data->cred);
> + put_layout(inode);
> goto out_free;
> }
> status = pnfs4_proc_layoutcommit(data, sync);
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 9b0fed4..e04b9d4 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
> const nfs4_stateid *stateid);
> void pnfs_destroy_layout(struct nfs_inode *);
> +void put_layout(struct inode *inode);
>
> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
> (srv)->pnfs_curr_ld->ld_io_ops && \


2010-07-12 17:34:28

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

On Mon, Jul 12, 2010 at 12:29 PM, Boaz Harrosh <[email protected]> wrote:
> On 07/08/2010 01:34 AM, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Change nfs_inode layout field to a pointer
>>
>> Each layoutdriver embeds the pnfs_layout_type in it's private layout cache
>> head structure, and allocates them both with the alloc_layout
>> layoutdriver_io_operation.
>>
>> NOTE API change: the layoutdriver_io_operation alloc_layout and free_layout now
>> use pnfs_layout_type pointers instead of void*.
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> fs/nfs/callback_proc.c | 2 +-
>> fs/nfs/inode.c | 12 +-----
>> fs/nfs/nfs4proc.c | 2 +-
>> fs/nfs/nfs4state.c | 4 +-
>> fs/nfs/pnfs.c | 111 +++++++++++++++++++++++----------------------
>> include/linux/nfs4_pnfs.h | 19 +++-----
>> include/linux/nfs_fs.h | 4 +-
>> 7 files changed, 71 insertions(+), 83 deletions(-)
>>
>> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
>> index 4766695..d999ea8 100644
>> --- a/fs/nfs/callback_proc.c
>> +++ b/fs/nfs/callback_proc.c
>> @@ -233,7 +233,7 @@ static int pnfs_recall_layout(void *data)
>> rl.cbl_seg.length = NFS4_MAX_UINT64;
>>
>> if (rl.cbl_recall_type == RETURN_FILE) {
>> - if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout,
>> + if (pnfs_is_next_layout_stateid(NFS_I(inode)->layout,
>> rl.cbl_stateid))
>> status = pnfs_return_layout(inode, &rl.cbl_seg,
>> &rl.cbl_stateid, RETURN_FILE,
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index 3d51602..30252f4 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -1397,17 +1397,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
>> #ifdef CONFIG_NFS_V4_1
>> init_waitqueue_head(&nfsi->lo_waitq);
>> nfsi->pnfs_layout_suspend = 0;
>> - seqlock_init(&nfsi->layout.seqlock);
>> - INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
>> - INIT_LIST_HEAD(&nfsi->layout.segs);
>> - nfsi->layout.refcount = 0;
>> - nfsi->layout.ld_data = NULL;
>> - nfsi->layout.pnfs_layout_state = 0;
>> - memset(&nfsi->layout.stateid, 0, NFS4_STATEID_SIZE);
>> - nfsi->layout.roc_iomode = 0;
>> - nfsi->layout.lo_cred = NULL;
>> - nfsi->layout.pnfs_write_begin_pos = 0;
>> - nfsi->layout.pnfs_write_end_pos = 0;
>> + nfsi->layout = NULL;
>> #endif /* CONFIG_NFS_V4_1 */
>> #endif
>> }
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index c5b8a2f..6acebc3 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1086,7 +1086,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
>> * flag during grace period
>> */
>> pnfs_destroy_layout(NFS_I(state->inode));
>> - pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
>> + pnfs_set_layout_stateid(NFS_I(state->inode)->layout, &zero_stateid);
>> #endif /* CONFIG_NFS_V4_1 */
>> }
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index bfe679b..6f44cb0 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -597,10 +597,10 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state,
>> #ifdef CONFIG_NFS_V4_1
>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>
>> - if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>> + if (has_layout(nfsi) && nfsi->layout->roc_iomode) {
>> struct nfs4_pnfs_layout_segment range;
>>
>> - range.iomode = nfsi->layout.roc_iomode;
>> + range.iomode = nfsi->layout->roc_iomode;
>> range.offset = 0;
>> range.length = NFS4_MAX_UINT64;
>> pnfs_return_layout(state->inode, &range, NULL,
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 53c15b1..6d435ed 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -152,10 +152,11 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
>> {
>> dprintk("%s: has_layout=%d ctx=%p\n", __func__, has_layout(nfsi), ctx);
>> spin_lock(&nfsi->vfs_inode.i_lock);
>> - if (has_layout(nfsi) && !layoutcommit_needed(nfsi)) {
>> - nfsi->layout.lo_cred = get_rpccred(ctx->state->owner->so_cred);
>> + if (has_layout(nfsi) && !
>> + !test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state)) {
>
> Is the "!!" intended

No! :)

>
> Why did you open code layoutcommit_needed() ?

Because I added a has_layout test to the layoutcommit_needed, and so
can't use !layoutcommit_needed.



>
>> + nfsi->layout->lo_cred = get_rpccred(ctx->state->owner->so_cred);
>> __set_bit(NFS_INO_LAYOUTCOMMIT,
>> - &nfsi->layout.pnfs_layout_state);
>> + &nfsi->layout->pnfs_layout_state);
>> nfsi->change_attr++;
>> spin_unlock(&nfsi->vfs_inode.i_lock);
>> dprintk("%s: Set layoutcommit\n", __func__);
>> @@ -175,17 +176,17 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
>> loff_t end_pos;
>>
>> spin_lock(&nfsi->vfs_inode.i_lock);
>> - if (offset < nfsi->layout.pnfs_write_begin_pos)
>> - nfsi->layout.pnfs_write_begin_pos = offset;
>> + if (offset < nfsi->layout->pnfs_write_begin_pos)
>> + nfsi->layout->pnfs_write_begin_pos = offset;
>> end_pos = offset + extent - 1; /* I'm being inclusive */
>> - if (end_pos > nfsi->layout.pnfs_write_end_pos)
>> - nfsi->layout.pnfs_write_end_pos = end_pos;
>> + if (end_pos > nfsi->layout->pnfs_write_end_pos)
>> + nfsi->layout->pnfs_write_end_pos = end_pos;
>> dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
>> __func__,
>> (unsigned long) extent,
>> (unsigned long) offset ,
>> - (unsigned long) nfsi->layout.pnfs_write_begin_pos,
>> - (unsigned long) nfsi->layout.pnfs_write_end_pos);
>> + (unsigned long) nfsi->layout->pnfs_write_begin_pos,
>> + (unsigned long) nfsi->layout->pnfs_write_end_pos);
>> spin_unlock(&nfsi->vfs_inode.i_lock);
>> }
>>
>> @@ -326,10 +327,9 @@ get_layout(struct pnfs_layout_type *lo)
>> static inline struct pnfs_layout_type *
>> grab_current_layout(struct nfs_inode *nfsi)
>> {
>> - struct pnfs_layout_type *lo = &nfsi->layout;
>> + struct pnfs_layout_type *lo = nfsi->layout;
>>
>> - BUG_ON_UNLOCKED_LO(lo);
>> - if (!lo->ld_data)
>> + if (!lo)
>> return NULL;
>> get_layout(lo);
>> return lo;
>> @@ -343,13 +343,13 @@ put_layout(struct pnfs_layout_type *lo)
>>
>> lo->refcount--;
>> if (!lo->refcount) {
>> - struct layoutdriver_io_operations *io_ops =
>> - PNFS_LD_IO_OPS(lo);
>> + struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
>> + struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
>>
>> - dprintk("%s: freeing layout %p\n", __func__, lo->ld_data);
>> + dprintk("%s: freeing layout cache %p\n", __func__, lo);
>> WARN_ON(!list_empty(&lo->lo_layouts));
>> - io_ops->free_layout(lo->ld_data);
>> - lo->ld_data = NULL;
>> + io_ops->free_layout(lo);
>> + nfsi->layout = NULL;
>> }
>> }
>>
>> @@ -381,14 +381,13 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
>> lo = grab_current_layout(nfsi);
>> if (lo) {
>> pnfs_free_layout(lo, &range);
>> - WARN_ON(!list_empty(&nfsi->layout.segs));
>> - WARN_ON(!list_empty(&nfsi->layout.lo_layouts));
>> - WARN_ON(nfsi->layout.ld_data);
>> + WARN_ON(!list_empty(&nfsi->layout->segs));
>> + WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
>>
>> - if (nfsi->layout.refcount != 1)
>> + if (nfsi->layout->refcount != 1)
>> printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
>> - __func__, nfsi->layout.refcount);
>> - WARN_ON(nfsi->layout.refcount != 1);
>> + __func__, nfsi->layout->refcount);
>> + WARN_ON(nfsi->layout->refcount != 1);
>>
>> put_layout(lo);
>> }
>> @@ -698,7 +697,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
>> bool ret = false;
>>
>> spin_lock(&nfsi->vfs_inode.i_lock);
>> - list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
>> + list_for_each_entry(lseg, &nfsi->layout->segs, fi_list) {
>> if (!should_free_lseg(lseg, range))
>> continue;
>> lseg->valid = false;
>> @@ -904,29 +903,33 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
>> dprintk("%s:Return\n", __func__);
>> }
>>
>> +/*
>> + * Each layoutdriver embeds pnfs_layout_type as the first field in it's
>> + * per-layout type layout cache structure and returns it ZEROed
>> + * from layoutdriver_io_ops->alloc_layout
>> + */
>> static struct pnfs_layout_type *
>> alloc_init_layout(struct inode *ino)
>> {
>> - void *ld_data;
>> struct pnfs_layout_type *lo;
>> struct layoutdriver_io_operations *io_ops;
>>
>> + BUG_ON(NFS_I(ino)->layout != NULL);
>> io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
>> - lo = &NFS_I(ino)->layout;
>> - ld_data = io_ops->alloc_layout(ino);
>> - if (!ld_data) {
>> + lo = io_ops->alloc_layout(ino);
>> + if (!lo) {
>> printk(KERN_ERR
>> "%s: out of memory: io_ops->alloc_layout failed\n",
>> __func__);
>> return NULL;
>> }
>> -
>> spin_lock(&ino->i_lock);
>> - BUG_ON(lo->ld_data != NULL);
>> - lo->ld_data = ld_data;
>> - memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
>> lo->refcount = 1;
>> - lo->roc_iomode = 0;
>> + INIT_LIST_HEAD(&lo->lo_layouts);
>> + INIT_LIST_HEAD(&lo->segs);
>> + seqlock_init(&lo->seqlock);
>> + lo->lo_inode = ino;
>
>
> The spin_lock() could be moved to here, it's the single place
> that actually touches the nfsi.


True, but it gets moved in 'pnfs-submti rewrite layout allocation"

>
>> + NFS_I(ino)->layout = lo;
>> spin_unlock(&ino->i_lock);
>> return lo;
>> }
>> @@ -961,7 +964,7 @@ get_lock_alloc_layout(struct inode *ino)
>> * wait until bit is cleared if we lost this race.
>> */
>> res = wait_on_bit_lock(
>> - &nfsi->layout.pnfs_layout_state,
>> + &nfsi->layout->pnfs_layout_state,
>> NFS_INO_LAYOUT_ALLOC,
>> pnfs_wait_schedule, TASK_KILLABLE);
>> if (res) {
>> @@ -972,7 +975,7 @@ get_lock_alloc_layout(struct inode *ino)
>> /* Was current_layout already allocated while we slept?
>> * If so, retry get_lock'ing it. Otherwise, allocate it.
>> */
>> - if (nfsi->layout.ld_data) {
>> + if (nfsi->layout) {
>> spin_lock(&ino->i_lock);
>> continue;
>> }
>> @@ -985,8 +988,8 @@ get_lock_alloc_layout(struct inode *ino)
>>
>> /* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
>> clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
>> - &nfsi->layout.pnfs_layout_state);
>> - wake_up_bit(&nfsi->layout.pnfs_layout_state,
>> + &nfsi->layout->pnfs_layout_state);
>> + wake_up_bit(&nfsi->layout->pnfs_layout_state,
>> NFS_INO_LAYOUT_ALLOC);
>> break;
>> }
>> @@ -1113,12 +1116,12 @@ _pnfs_update_layout(struct inode *ino,
>> }
>>
>> /* if get layout already failed once goto out */
>> - if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
>> + if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
>> if (unlikely(nfsi->pnfs_layout_suspend &&
>> get_seconds() >= nfsi->pnfs_layout_suspend)) {
>> dprintk("%s: layout_get resumed\n", __func__);
>> clear_bit(lo_fail_bit(iomode),
>> - &nfsi->layout.pnfs_layout_state);
>> + &nfsi->layout->pnfs_layout_state);
>> nfsi->pnfs_layout_suspend = 0;
>> } else
>> goto out_put;
>> @@ -1130,7 +1133,7 @@ _pnfs_update_layout(struct inode *ino,
>> send_layoutget(ino, ctx, &arg, lsegpp, lo);
>> out:
>> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
>> - nfsi->layout.pnfs_layout_state, lseg);
>> + nfsi->layout->pnfs_layout_state, lseg);
>> return;
>> out_put:
>> if (lsegpp)
>> @@ -1235,12 +1238,12 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
>> /* remember that get layout failed and suspend trying */
>> nfsi->pnfs_layout_suspend = suspend;
>> set_bit(lo_fail_bit(lgp->args.lseg.iomode),
>> - &nfsi->layout.pnfs_layout_state);
>> + &nfsi->layout->pnfs_layout_state);
>> dprintk("%s: layout_get suspended until %ld\n",
>> __func__, suspend);
>> out:
>> dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
>> - __func__, lgp->status, nfsi->layout.pnfs_layout_state, lseg);
>> + __func__, lgp->status, nfsi->layout->pnfs_layout_state, lseg);
>> return;
>> }
>>
>> @@ -1321,7 +1324,7 @@ pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
>>
>> pgio->pg_test = NULL;
>>
>> - laytype = &NFS_I(inode)->layout;
>> + laytype = NFS_I(inode)->layout;
>> ld = NFS_SERVER(inode)->pnfs_curr_ld;
>> if (!pnfs_enabled_sb(NFS_SERVER(inode)) || !laytype)
>> return;
>> @@ -1454,7 +1457,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
>> numpages);
>> wdata->pdata.lseg = lseg;
>> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->write_pagelist(
>> - &nfsi->layout,
>> + nfsi->layout,
>> args->pages,
>> args->pgbase,
>> numpages,
>> @@ -1507,7 +1510,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>> __func__, numpages);
>> rdata->pdata.lseg = lseg;
>> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->read_pagelist(
>> - &nfsi->layout,
>> + nfsi->layout,
>> args->pages,
>> args->pgbase,
>> numpages,
>> @@ -1568,7 +1571,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
>> * This will be done by passing the buck to the layout driver.
>> */
>> data->pdata.lseg = NULL;
>> - trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
>> + trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(nfsi->layout,
>> sync, data);
>> dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>> return trypnfs;
>> @@ -1639,14 +1642,14 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>> /* Clear layoutcommit properties in the inode so
>> * new lc info can be generated
>> */
>> - write_begin_pos = nfsi->layout.pnfs_write_begin_pos;
>> - write_end_pos = nfsi->layout.pnfs_write_end_pos;
>> - data->cred = nfsi->layout.lo_cred;
>> - nfsi->layout.pnfs_write_begin_pos = 0;
>> - nfsi->layout.pnfs_write_end_pos = 0;
>> - nfsi->layout.lo_cred = NULL;
>> - __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>> - pnfs_get_layout_stateid(&data->args.stateid, &nfsi->layout);
>> + write_begin_pos = nfsi->layout->pnfs_write_begin_pos;
>> + write_end_pos = nfsi->layout->pnfs_write_end_pos;
>> + data->cred = nfsi->layout->lo_cred;
>> + nfsi->layout->pnfs_write_begin_pos = 0;
>> + nfsi->layout->pnfs_write_end_pos = 0;
>> + nfsi->layout->lo_cred = NULL;
>> + __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>> + pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>
>> spin_unlock(&inode->i_lock);
>>
>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>> index c5af82f..2c6ce4c 100644
>> --- a/include/linux/nfs4_pnfs.h
>> +++ b/include/linux/nfs4_pnfs.h
>> @@ -35,13 +35,13 @@ struct pnfs_layoutdriver_type {
>> static inline struct nfs_inode *
>> PNFS_NFS_INODE(struct pnfs_layout_type *lo)
>> {
>> - return container_of(lo, struct nfs_inode, layout);
>> + return NFS_I(lo->lo_inode);
>> }
>>
>> static inline struct inode *
>> PNFS_INODE(struct pnfs_layout_type *lo)
>> {
>> - return &PNFS_NFS_INODE(lo)->vfs_inode;
>> + return lo->lo_inode;
>> }
>>
>> static inline struct nfs_server *
>> @@ -50,12 +50,6 @@ PNFS_NFS_SERVER(struct pnfs_layout_type *lo)
>> return NFS_SERVER(PNFS_INODE(lo));
>> }
>>
>> -static inline void *
>> -PNFS_LD_DATA(struct pnfs_layout_type *lo)
>> -{
>> - return lo->ld_data;
>> -}
>> -
>> static inline struct pnfs_layoutdriver_type *
>> PNFS_LD(struct pnfs_layout_type *lo)
>> {
>> @@ -77,13 +71,14 @@ PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
>> static inline bool
>> has_layout(struct nfs_inode *nfsi)
>> {
>> - return nfsi->layout.ld_data != NULL;
>> + return nfsi->layout != NULL;
>
> does this mean the same thing now?

Yes, the pnfs_layout_type and the ld_data are allocated together.

>
> You could perhaps optimize with a bit flag that is set on
> first segment insert, and reset on last segment removal.
> Or some other/better scheme.

Are you suggesting adding a bit flag to nfs_inode, or perhaps moving
the pnfs_layout_state back to the nfs_inode?



>
>> }
>>
>> static inline bool
>> layoutcommit_needed(struct nfs_inode *nfsi)
>> {
>> - return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>> + return has_layout(nfsi) &&
>> + test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>
> What is the purpose of patch[1/1] ?

To not segfault when the nfs_inode->layout field is NULL.

>
> you are unlocked and asking about a layout pointer. Which according to you might
> go away mid-flight?

No, it simply has not been allocated. The pnfs_layout_type reference
counting will be done under the inode->i_lock and all in-flight RPC's
will hold a reference, so there is no reason to hold a lock to call
has_layout().

-->Andy

>
>> }
>>
>> #else /* CONFIG_NFS_V4_1 */
>> @@ -149,8 +144,8 @@ struct layoutdriver_io_operations {
>> /* Layout information. For each inode, alloc_layout is executed once to retrieve an
>> * inode specific layout structure. Each subsequent layoutget operation results in
>> * a set_layout call to set the opaque layout in the layout driver.*/
>> - void * (*alloc_layout) (struct inode *inode);
>> - void (*free_layout) (void *layoutid);
>> + struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
>> + void (*free_layout) (struct pnfs_layout_type *);
>> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_type *layoutid, struct nfs4_pnfs_layoutget_res *lgr);
>> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>>
>> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
>> index b612746..7b999a4 100644
>> --- a/include/linux/nfs_fs.h
>> +++ b/include/linux/nfs_fs.h
>> @@ -104,7 +104,6 @@ struct pnfs_layout_type {
>> int roc_iomode; /* iomode to return on close, 0=none */
>> seqlock_t seqlock; /* Protects the stateid */
>> nfs4_stateid stateid;
>> - void *ld_data; /* layout driver private data */
>> unsigned long pnfs_layout_state;
>> #define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
>> #define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
>> @@ -116,6 +115,7 @@ struct pnfs_layout_type {
>> */
>> loff_t pnfs_write_begin_pos;
>> loff_t pnfs_write_end_pos;
>> + struct inode *lo_inode;
>> };
>>
>> /*
>> @@ -206,7 +206,7 @@ struct nfs_inode {
>> /* pNFS layout information */
>> #if defined(CONFIG_NFS_V4_1)
>> wait_queue_head_t lo_waitq;
>> - struct pnfs_layout_type layout;
>> + struct pnfs_layout_type *layout;
>> time_t pnfs_layout_suspend;
>> #endif /* CONFIG_NFS_V4_1 */
>> #endif /* CONFIG_NFS_V4*/
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-07-12 17:38:13

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

On Mon, Jul 12, 2010 at 1:05 PM, Benny Halevy <[email protected]> wrote:
> On Jul. 12, 2010, 19:29 +0300, Boaz Harrosh <[email protected]> wrote:
>> On 07/08/2010 01:34 AM, [email protected] wrote:
>>> From: Andy Adamson <[email protected]>
>>>
>>> Change nfs_inode layout field to a pointer
>>>
>>> Each layoutdriver embeds the pnfs_layout_type in it's private layout cache
>>> head structure, and allocates them both with the alloc_layout
>>> layoutdriver_io_operation.
>>>
>>> NOTE API change: the layoutdriver_io_operation alloc_layout and free_layout now
>>> use pnfs_layout_type pointers instead of void*.
>>>
>>> Signed-off-by: Andy Adamson <[email protected]>
>>> ---
>>> fs/nfs/callback_proc.c | 2 +-
>>> fs/nfs/inode.c | 12 +-----
>>> fs/nfs/nfs4proc.c | 2 +-
>>> fs/nfs/nfs4state.c | 4 +-
>>> fs/nfs/pnfs.c | 111 +++++++++++++++++++++++----------------------
>>> include/linux/nfs4_pnfs.h | 19 +++-----
>>> include/linux/nfs_fs.h | 4 +-
>>> 7 files changed, 71 insertions(+), 83 deletions(-)
>>>
>>> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
>>> index 4766695..d999ea8 100644
>>> --- a/fs/nfs/callback_proc.c
>>> +++ b/fs/nfs/callback_proc.c
>>> @@ -233,7 +233,7 @@ static int pnfs_recall_layout(void *data)
>>> rl.cbl_seg.length = NFS4_MAX_UINT64;
>>>
>>> if (rl.cbl_recall_type == RETURN_FILE) {
>>> - if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout,
>>> + if (pnfs_is_next_layout_stateid(NFS_I(inode)->layout,
>>> rl.cbl_stateid))
>>> status = pnfs_return_layout(inode, &rl.cbl_seg,
>>> &rl.cbl_stateid, RETURN_FILE,
>>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>>> index 3d51602..30252f4 100644
>>> --- a/fs/nfs/inode.c
>>> +++ b/fs/nfs/inode.c
>>> @@ -1397,17 +1397,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
>>> #ifdef CONFIG_NFS_V4_1
>>> init_waitqueue_head(&nfsi->lo_waitq);
>>> nfsi->pnfs_layout_suspend = 0;
>>> - seqlock_init(&nfsi->layout.seqlock);
>>> - INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
>>> - INIT_LIST_HEAD(&nfsi->layout.segs);
>>> - nfsi->layout.refcount = 0;
>>> - nfsi->layout.ld_data = NULL;
>>> - nfsi->layout.pnfs_layout_state = 0;
>>> - memset(&nfsi->layout.stateid, 0, NFS4_STATEID_SIZE);
>>> - nfsi->layout.roc_iomode = 0;
>>> - nfsi->layout.lo_cred = NULL;
>>> - nfsi->layout.pnfs_write_begin_pos = 0;
>>> - nfsi->layout.pnfs_write_end_pos = 0;
>>> + nfsi->layout = NULL;
>>> #endif /* CONFIG_NFS_V4_1 */
>>> #endif
>>> }
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index c5b8a2f..6acebc3 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -1086,7 +1086,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
>>> * flag during grace period
>>> */
>>> pnfs_destroy_layout(NFS_I(state->inode));
>>> - pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
>>> + pnfs_set_layout_stateid(NFS_I(state->inode)->layout, &zero_stateid);
>>> #endif /* CONFIG_NFS_V4_1 */
>>> }
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index bfe679b..6f44cb0 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -597,10 +597,10 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state,
>>> #ifdef CONFIG_NFS_V4_1
>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>
>>> - if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>> + if (has_layout(nfsi) && nfsi->layout->roc_iomode) {
>>> struct nfs4_pnfs_layout_segment range;
>>>
>>> - range.iomode = nfsi->layout.roc_iomode;
>>> + range.iomode = nfsi->layout->roc_iomode;
>>> range.offset = 0;
>>> range.length = NFS4_MAX_UINT64;
>>> pnfs_return_layout(state->inode, &range, NULL,
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 53c15b1..6d435ed 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -152,10 +152,11 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
>>> {
>>> dprintk("%s: has_layout=%d ctx=%p\n", __func__, has_layout(nfsi), ctx);
>>> spin_lock(&nfsi->vfs_inode.i_lock);
>>> - if (has_layout(nfsi) && !layoutcommit_needed(nfsi)) {
>>> - nfsi->layout.lo_cred = get_rpccred(ctx->state->owner->so_cred);
>>> + if (has_layout(nfsi) && !
>>> + !test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state)) {
>>
>> Is the "!!" intended
>>
>> Why did you open code layoutcommit_needed() ?
>>
>>> + nfsi->layout->lo_cred = get_rpccred(ctx->state->owner->so_cred);
>>> __set_bit(NFS_INO_LAYOUTCOMMIT,
>>> - &nfsi->layout.pnfs_layout_state);
>>> + &nfsi->layout->pnfs_layout_state);
>>> nfsi->change_attr++;
>>> spin_unlock(&nfsi->vfs_inode.i_lock);
>>> dprintk("%s: Set layoutcommit\n", __func__);
>>> @@ -175,17 +176,17 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
>>> loff_t end_pos;
>>>
>>> spin_lock(&nfsi->vfs_inode.i_lock);
>>> - if (offset < nfsi->layout.pnfs_write_begin_pos)
>>> - nfsi->layout.pnfs_write_begin_pos = offset;
>>> + if (offset < nfsi->layout->pnfs_write_begin_pos)
>>> + nfsi->layout->pnfs_write_begin_pos = offset;
>>> end_pos = offset + extent - 1; /* I'm being inclusive */
>>> - if (end_pos > nfsi->layout.pnfs_write_end_pos)
>>> - nfsi->layout.pnfs_write_end_pos = end_pos;
>>> + if (end_pos > nfsi->layout->pnfs_write_end_pos)
>>> + nfsi->layout->pnfs_write_end_pos = end_pos;
>>> dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
>>> __func__,
>>> (unsigned long) extent,
>>> (unsigned long) offset ,
>>> - (unsigned long) nfsi->layout.pnfs_write_begin_pos,
>>> - (unsigned long) nfsi->layout.pnfs_write_end_pos);
>>> + (unsigned long) nfsi->layout->pnfs_write_begin_pos,
>>> + (unsigned long) nfsi->layout->pnfs_write_end_pos);
>>> spin_unlock(&nfsi->vfs_inode.i_lock);
>>> }
>>>
>>> @@ -326,10 +327,9 @@ get_layout(struct pnfs_layout_type *lo)
>>> static inline struct pnfs_layout_type *
>>> grab_current_layout(struct nfs_inode *nfsi)
>>> {
>>> - struct pnfs_layout_type *lo = &nfsi->layout;
>>> + struct pnfs_layout_type *lo = nfsi->layout;
>>>
>>> - BUG_ON_UNLOCKED_LO(lo);
>>> - if (!lo->ld_data)
>>> + if (!lo)
>>> return NULL;
>>> get_layout(lo);
>>> return lo;
>>> @@ -343,13 +343,13 @@ put_layout(struct pnfs_layout_type *lo)
>>>
>>> lo->refcount--;
>>> if (!lo->refcount) {
>>> - struct layoutdriver_io_operations *io_ops =
>>> - PNFS_LD_IO_OPS(lo);
>>> + struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
>>> + struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
>>>
>>> - dprintk("%s: freeing layout %p\n", __func__, lo->ld_data);
>>> + dprintk("%s: freeing layout cache %p\n", __func__, lo);
>>> WARN_ON(!list_empty(&lo->lo_layouts));
>>> - io_ops->free_layout(lo->ld_data);
>>> - lo->ld_data = NULL;
>>> + io_ops->free_layout(lo);
>>> + nfsi->layout = NULL;
>>> }
>>> }
>>>
>>> @@ -381,14 +381,13 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
>>> lo = grab_current_layout(nfsi);
>>> if (lo) {
>>> pnfs_free_layout(lo, &range);
>>> - WARN_ON(!list_empty(&nfsi->layout.segs));
>>> - WARN_ON(!list_empty(&nfsi->layout.lo_layouts));
>>> - WARN_ON(nfsi->layout.ld_data);
>>> + WARN_ON(!list_empty(&nfsi->layout->segs));
>>> + WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
>>>
>>> - if (nfsi->layout.refcount != 1)
>>> + if (nfsi->layout->refcount != 1)
>>> printk(KERN_WARNING "%s: layout refcount not=1 %d\n",
>>> - __func__, nfsi->layout.refcount);
>>> - WARN_ON(nfsi->layout.refcount != 1);
>>> + __func__, nfsi->layout->refcount);
>>> + WARN_ON(nfsi->layout->refcount != 1);
>>>
>>> put_layout(lo);
>>> }
>>> @@ -698,7 +697,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
>>> bool ret = false;
>>>
>>> spin_lock(&nfsi->vfs_inode.i_lock);
>>> - list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
>>> + list_for_each_entry(lseg, &nfsi->layout->segs, fi_list) {
>>> if (!should_free_lseg(lseg, range))
>>> continue;
>>> lseg->valid = false;
>>> @@ -904,29 +903,33 @@ pnfs_insert_layout(struct pnfs_layout_type *lo,
>>> dprintk("%s:Return\n", __func__);
>>> }
>>>
>>> +/*
>>> + * Each layoutdriver embeds pnfs_layout_type as the first field in it's
>>> + * per-layout type layout cache structure and returns it ZEROed
>>> + * from layoutdriver_io_ops->alloc_layout
>>> + */
>>> static struct pnfs_layout_type *
>>> alloc_init_layout(struct inode *ino)
>>> {
>>> - void *ld_data;
>>> struct pnfs_layout_type *lo;
>>> struct layoutdriver_io_operations *io_ops;
>>>
>>> + BUG_ON(NFS_I(ino)->layout != NULL);
>>> io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
>>> - lo = &NFS_I(ino)->layout;
>>> - ld_data = io_ops->alloc_layout(ino);
>>> - if (!ld_data) {
>>> + lo = io_ops->alloc_layout(ino);
>>> + if (!lo) {
>>> printk(KERN_ERR
>>> "%s: out of memory: io_ops->alloc_layout failed\n",
>>> __func__);
>>> return NULL;
>>> }
>>> -
>>> spin_lock(&ino->i_lock);
>>> - BUG_ON(lo->ld_data != NULL);
>>> - lo->ld_data = ld_data;
>>> - memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
>>> lo->refcount = 1;
>>> - lo->roc_iomode = 0;
>>> + INIT_LIST_HEAD(&lo->lo_layouts);
>>> + INIT_LIST_HEAD(&lo->segs);
>>> + seqlock_init(&lo->seqlock);
>>> + lo->lo_inode = ino;
>>
>>
>> The spin_lock() could be moved to here, it's the single place
>> that actually touches the nfsi.
>>
>>> + NFS_I(ino)->layout = lo;
>>> spin_unlock(&ino->i_lock);
>>> return lo;
>>> }
>>> @@ -961,7 +964,7 @@ get_lock_alloc_layout(struct inode *ino)
>>> * wait until bit is cleared if we lost this race.
>>> */
>>> res = wait_on_bit_lock(
>>> - &nfsi->layout.pnfs_layout_state,
>>> + &nfsi->layout->pnfs_layout_state,
>>> NFS_INO_LAYOUT_ALLOC,
>>> pnfs_wait_schedule, TASK_KILLABLE);
>>> if (res) {
>>> @@ -972,7 +975,7 @@ get_lock_alloc_layout(struct inode *ino)
>>> /* Was current_layout already allocated while we slept?
>>> * If so, retry get_lock'ing it. Otherwise, allocate it.
>>> */
>>> - if (nfsi->layout.ld_data) {
>>> + if (nfsi->layout) {
>>> spin_lock(&ino->i_lock);
>>> continue;
>>> }
>>> @@ -985,8 +988,8 @@ get_lock_alloc_layout(struct inode *ino)
>>>
>>> /* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
>>> clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
>>> - &nfsi->layout.pnfs_layout_state);
>>> - wake_up_bit(&nfsi->layout.pnfs_layout_state,
>>> + &nfsi->layout->pnfs_layout_state);
>>> + wake_up_bit(&nfsi->layout->pnfs_layout_state,
>>> NFS_INO_LAYOUT_ALLOC);
>>> break;
>>> }
>>> @@ -1113,12 +1116,12 @@ _pnfs_update_layout(struct inode *ino,
>>> }
>>>
>>> /* if get layout already failed once goto out */
>>> - if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
>>> + if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
>>> if (unlikely(nfsi->pnfs_layout_suspend &&
>>> get_seconds() >= nfsi->pnfs_layout_suspend)) {
>>> dprintk("%s: layout_get resumed\n", __func__);
>>> clear_bit(lo_fail_bit(iomode),
>>> - &nfsi->layout.pnfs_layout_state);
>>> + &nfsi->layout->pnfs_layout_state);
>>> nfsi->pnfs_layout_suspend = 0;
>>> } else
>>> goto out_put;
>>> @@ -1130,7 +1133,7 @@ _pnfs_update_layout(struct inode *ino,
>>> send_layoutget(ino, ctx, &arg, lsegpp, lo);
>>> out:
>>> dprintk("%s end, state 0x%lx lseg %p\n", __func__,
>>> - nfsi->layout.pnfs_layout_state, lseg);
>>> + nfsi->layout->pnfs_layout_state, lseg);
>>> return;
>>> out_put:
>>> if (lsegpp)
>>> @@ -1235,12 +1238,12 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
>>> /* remember that get layout failed and suspend trying */
>>> nfsi->pnfs_layout_suspend = suspend;
>>> set_bit(lo_fail_bit(lgp->args.lseg.iomode),
>>> - &nfsi->layout.pnfs_layout_state);
>>> + &nfsi->layout->pnfs_layout_state);
>>> dprintk("%s: layout_get suspended until %ld\n",
>>> __func__, suspend);
>>> out:
>>> dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
>>> - __func__, lgp->status, nfsi->layout.pnfs_layout_state, lseg);
>>> + __func__, lgp->status, nfsi->layout->pnfs_layout_state, lseg);
>>> return;
>>> }
>>>
>>> @@ -1321,7 +1324,7 @@ pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
>>>
>>> pgio->pg_test = NULL;
>>>
>>> - laytype = &NFS_I(inode)->layout;
>>> + laytype = NFS_I(inode)->layout;
>>> ld = NFS_SERVER(inode)->pnfs_curr_ld;
>>> if (!pnfs_enabled_sb(NFS_SERVER(inode)) || !laytype)
>>> return;
>>> @@ -1454,7 +1457,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
>>> numpages);
>>> wdata->pdata.lseg = lseg;
>>> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->write_pagelist(
>>> - &nfsi->layout,
>>> + nfsi->layout,
>>> args->pages,
>>> args->pgbase,
>>> numpages,
>>> @@ -1507,7 +1510,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>> __func__, numpages);
>>> rdata->pdata.lseg = lseg;
>>> trypnfs = nfss->pnfs_curr_ld->ld_io_ops->read_pagelist(
>>> - &nfsi->layout,
>>> + nfsi->layout,
>>> args->pages,
>>> args->pgbase,
>>> numpages,
>>> @@ -1568,7 +1571,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
>>> * This will be done by passing the buck to the layout driver.
>>> */
>>> data->pdata.lseg = NULL;
>>> - trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
>>> + trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(nfsi->layout,
>>> sync, data);
>>> dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>>> return trypnfs;
>>> @@ -1639,14 +1642,14 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>> /* Clear layoutcommit properties in the inode so
>>> * new lc info can be generated
>>> */
>>> - write_begin_pos = nfsi->layout.pnfs_write_begin_pos;
>>> - write_end_pos = nfsi->layout.pnfs_write_end_pos;
>>> - data->cred = nfsi->layout.lo_cred;
>>> - nfsi->layout.pnfs_write_begin_pos = 0;
>>> - nfsi->layout.pnfs_write_end_pos = 0;
>>> - nfsi->layout.lo_cred = NULL;
>>> - __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>>> - pnfs_get_layout_stateid(&data->args.stateid, &nfsi->layout);
>>> + write_begin_pos = nfsi->layout->pnfs_write_begin_pos;
>>> + write_end_pos = nfsi->layout->pnfs_write_end_pos;
>>> + data->cred = nfsi->layout->lo_cred;
>>> + nfsi->layout->pnfs_write_begin_pos = 0;
>>> + nfsi->layout->pnfs_write_end_pos = 0;
>>> + nfsi->layout->lo_cred = NULL;
>>> + __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>> + pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>>
>>> spin_unlock(&inode->i_lock);
>>>
>>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>>> index c5af82f..2c6ce4c 100644
>>> --- a/include/linux/nfs4_pnfs.h
>>> +++ b/include/linux/nfs4_pnfs.h
>>> @@ -35,13 +35,13 @@ struct pnfs_layoutdriver_type {
>>> static inline struct nfs_inode *
>>> PNFS_NFS_INODE(struct pnfs_layout_type *lo)
>>> {
>>> - return container_of(lo, struct nfs_inode, layout);
>>> + return NFS_I(lo->lo_inode);
>>> }
>>>
>>> static inline struct inode *
>>> PNFS_INODE(struct pnfs_layout_type *lo)
>>> {
>>> - return &PNFS_NFS_INODE(lo)->vfs_inode;
>>> + return lo->lo_inode;
>>> }
>>>
>>> static inline struct nfs_server *
>>> @@ -50,12 +50,6 @@ PNFS_NFS_SERVER(struct pnfs_layout_type *lo)
>>> return NFS_SERVER(PNFS_INODE(lo));
>>> }
>>>
>>> -static inline void *
>>> -PNFS_LD_DATA(struct pnfs_layout_type *lo)
>>> -{
>>> - return lo->ld_data;
>>> -}
>>> -
>>> static inline struct pnfs_layoutdriver_type *
>>> PNFS_LD(struct pnfs_layout_type *lo)
>>> {
>>> @@ -77,13 +71,14 @@ PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
>>> static inline bool
>>> has_layout(struct nfs_inode *nfsi)
>>> {
>>> - return nfsi->layout.ld_data != NULL;
>>> + return nfsi->layout != NULL;
>>
>> does this mean the same thing now?
>>
>> You could perhaps optimize with a bit flag that is set on
>> first segment insert, and reset on last segment removal.
>> Or some other/better scheme.
>
> We may want to have has_layout() as well as has_layout_segs()
> which will check that the lsegs list is not empty.

Won't be necessary when we initialize the layout when the segs list is empty.

-->Andy

>
> Benny
>
>>
>>> }
>>>
>>> static inline bool
>>> layoutcommit_needed(struct nfs_inode *nfsi)
>>> {
>>> - return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>>> + return has_layout(nfsi) &&
>>> + test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>
>> What is the purpose of patch[1/1] ?
>>
>> you are unlocked and asking about a layout pointer. Which according to you might
>> go away mid-flight?
>>
>>> }
>>>
>>> #else /* CONFIG_NFS_V4_1 */
>>> @@ -149,8 +144,8 @@ struct layoutdriver_io_operations {
>>> /* Layout information. For each inode, alloc_layout is executed once to retrieve an
>>> * inode specific layout structure. Each subsequent layoutget operation results in
>>> * a set_layout call to set the opaque layout in the layout driver.*/
>>> - void * (*alloc_layout) (struct inode *inode);
>>> - void (*free_layout) (void *layoutid);
>>> + struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
>>> + void (*free_layout) (struct pnfs_layout_type *);
>>> struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_type *layoutid, struct nfs4_pnfs_layoutget_res *lgr);
>>> void (*free_lseg) (struct pnfs_layout_segment *lseg);
>>>
>>> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
>>> index b612746..7b999a4 100644
>>> --- a/include/linux/nfs_fs.h
>>> +++ b/include/linux/nfs_fs.h
>>> @@ -104,7 +104,6 @@ struct pnfs_layout_type {
>>> int roc_iomode; /* iomode to return on close, 0=none */
>>> seqlock_t seqlock; /* Protects the stateid */
>>> nfs4_stateid stateid;
>>> - void *ld_data; /* layout driver private data */
>>> unsigned long pnfs_layout_state;
>>> #define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
>>> #define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
>>> @@ -116,6 +115,7 @@ struct pnfs_layout_type {
>>> */
>>> loff_t pnfs_write_begin_pos;
>>> loff_t pnfs_write_end_pos;
>>> + struct inode *lo_inode;
>>> };
>>>
>>> /*
>>> @@ -206,7 +206,7 @@ struct nfs_inode {
>>> /* pNFS layout information */
>>> #if defined(CONFIG_NFS_V4_1)
>>> wait_queue_head_t lo_waitq;
>>> - struct pnfs_layout_type layout;
>>> + struct pnfs_layout_type *layout;
>>> time_t pnfs_layout_suspend;
>>> #endif /* CONFIG_NFS_V4_1 */
>>> #endif /* CONFIG_NFS_V4*/
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-07-12 17:43:21

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 03/16] SQUASHME pnfs-submit embed pnfs_layout_type

On 07/12/2010 08:34 PM, William A. (Andy) Adamson wrote:
> On Mon, Jul 12, 2010 at 12:29 PM, Boaz Harrosh <[email protected]> wrote:
>>> static inline bool
>>> layoutcommit_needed(struct nfs_inode *nfsi)
>>> {
>>> - return test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout.pnfs_layout_state);
>>> + return has_layout(nfsi) &&
>>> + test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>
>> What is the purpose of patch[1/1] ?
>
> To not segfault when the nfs_inode->layout field is NULL.
>

Yes that is this patch [3/16].

I was asking about [PATCH 01/16] that changed it to a state_bit from lo_cred
why do we need that one. Now that you proved that has_layout(nfsi) cannot become
negative on us while asking?

>>
>> you are unlocked and asking about a layout pointer. Which according to you might
>> go away mid-flight?
>
> No, it simply has not been allocated. The pnfs_layout_type reference
> counting will be done under the inode->i_lock and all in-flight RPC's
> will hold a reference, so there is no reason to hold a lock to call
> has_layout().
>

I agree it is safe.

Boaz

> -->Andy
>

2010-07-12 18:09:06

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Mon, Jul 12, 2010 at 1:25 PM, Boaz Harrosh <[email protected]> wrote:
> On 07/08/2010 01:34 AM, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 2 ++
>> fs/nfs/pnfs.c | 13 +++++++++++++
>> fs/nfs/pnfs.h | 1 +
>> 3 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 6acebc3..f763746 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
>> struct pnfs_layoutcommit_data *data =
>> (struct pnfs_layoutcommit_data *)lcdata;
>>
>> + /* Matched by get_layout in pnfs_layoutcommit_inode */
>> + put_layout(data->args.inode);
>> put_rpccred(data->cred);
>> pnfs_layoutcommit_free(lcdata);
>> }
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index aa16e5d..d42c5da 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
>> }
>>
>> void
>> +put_layout(struct inode *inode)
>> +{
>> + spin_lock(&inode->i_lock);
>> + put_layout_locked(NFS_I(inode)->layout);
>> + spin_unlock(&inode->i_lock);
>> +
>> +}
>> +
>> +void
>> pnfs_layout_release(struct pnfs_layout_type *lo,
>> struct nfs4_pnfs_layout_segment *range)
>> {
>> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>
>> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
>> + get_layout(NFS_I(inode)->layout);
>> +
>
> Has of your rules this should be now called get_layout_locked

OK


>
>> spin_unlock(&inode->i_lock);
>>
>> /* Set up layout commit args */
>> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>> if (status) {
>> /* The layout driver failed to setup the layoutcommit */
>> put_rpccred(data->cred);
>> + put_layout(inode);
>
> And it is really nice that put_layout takes an inode and get_layout takes an
> struct layout_type.
>
>> goto out_free;
>> }
>> status = pnfs4_proc_layoutcommit(data, sync);
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 9b0fed4..e04b9d4 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
>> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
>> const nfs4_stateid *stateid);
>> void pnfs_destroy_layout(struct nfs_inode *);
>> +void put_layout(struct inode *inode);
>>
>> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
>> (srv)->pnfs_curr_ld->ld_io_ops && \
>
> I still think they can all go away.
> What is the point of a layout without it's nfsi+inode. And who cares if we free the layout_type structure
> 2 milliseconds before, and in the error case?
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-07-12 18:27:22

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Jul. 12, 2010, 21:09 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
> On Mon, Jul 12, 2010 at 1:25 PM, Boaz Harrosh <[email protected]> wrote:
>> On 07/08/2010 01:34 AM, [email protected] wrote:
>>> From: Andy Adamson <[email protected]>
>>>
>>> Signed-off-by: Andy Adamson <[email protected]>
>>> ---
>>> fs/nfs/nfs4proc.c | 2 ++
>>> fs/nfs/pnfs.c | 13 +++++++++++++
>>> fs/nfs/pnfs.h | 1 +
>>> 3 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 6acebc3..f763746 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
>>> struct pnfs_layoutcommit_data *data =
>>> (struct pnfs_layoutcommit_data *)lcdata;
>>>
>>> + /* Matched by get_layout in pnfs_layoutcommit_inode */
>>> + put_layout(data->args.inode);
>>> put_rpccred(data->cred);
>>> pnfs_layoutcommit_free(lcdata);
>>> }
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index aa16e5d..d42c5da 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
>>> }
>>>
>>> void
>>> +put_layout(struct inode *inode)
>>> +{
>>> + spin_lock(&inode->i_lock);
>>> + put_layout_locked(NFS_I(inode)->layout);
>>> + spin_unlock(&inode->i_lock);
>>> +
>>> +}
>>> +
>>> +void
>>> pnfs_layout_release(struct pnfs_layout_type *lo,
>>> struct nfs4_pnfs_layout_segment *range)
>>> {
>>> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>>
>>> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
>>> + get_layout(NFS_I(inode)->layout);
>>> +
>>
>> Has of your rules this should be now called get_layout_locked
>
> OK

Andy, before you crank another version, I've already merged this patchset
with obvious fixes into my tree so it might be better to fix that up.

I also fixed a couple other less trivial bugs for which I'm going
to send patches soon, and I fixed up the obj and block layout alloc/free
methods to agree with the new API.

Benny

>
>
>>
>>> spin_unlock(&inode->i_lock);
>>>
>>> /* Set up layout commit args */
>>> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>> if (status) {
>>> /* The layout driver failed to setup the layoutcommit */
>>> put_rpccred(data->cred);
>>> + put_layout(inode);
>>
>> And it is really nice that put_layout takes an inode and get_layout takes an
>> struct layout_type.
>>
>>> goto out_free;
>>> }
>>> status = pnfs4_proc_layoutcommit(data, sync);
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index 9b0fed4..e04b9d4 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
>>> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
>>> const nfs4_stateid *stateid);
>>> void pnfs_destroy_layout(struct nfs_inode *);
>>> +void put_layout(struct inode *inode);
>>>
>>> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
>>> (srv)->pnfs_curr_ld->ld_io_ops && \
>>
>> I still think they can all go away.
>> What is the point of a layout without it's nfsi+inode. And who cares if we free the layout_type structure
>> 2 milliseconds before, and in the error case?
>>
>> Boaz
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>

2010-07-12 18:27:43

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Mon, Jul 12, 2010 at 1:19 PM, Benny Halevy <[email protected]> wrote:
> On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 2 ++
>> fs/nfs/pnfs.c | 13 +++++++++++++
>> fs/nfs/pnfs.h | 1 +
>> 3 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 6acebc3..f763746 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
>> struct pnfs_layoutcommit_data *data =
>> (struct pnfs_layoutcommit_data *)lcdata;
>>
>> + /* Matched by get_layout in pnfs_layoutcommit_inode */
>> + put_layout(data->args.inode);
>> put_rpccred(data->cred);
>> pnfs_layoutcommit_free(lcdata);
>> }
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index aa16e5d..d42c5da 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
>> }
>>
>> void
>> +put_layout(struct inode *inode)
>> +{
>> + spin_lock(&inode->i_lock);
>> + put_layout_locked(NFS_I(inode)->layout);
>> + spin_unlock(&inode->i_lock);
>> +
>> +}
>> +
>> +void
>> pnfs_layout_release(struct pnfs_layout_type *lo,
>> struct nfs4_pnfs_layout_segment *range)
>> {
>> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>
>> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
>> + get_layout(NFS_I(inode)->layout);
>> +
>
> So from the 30000 foot level, I need to remind myself what do
> we need the refcount on the layout hdr (pnfs_layout_type) for?

> Can we really use it detached from the inode? NO
> Is it only for debugging to make catch the case that the inode
> is released while there are references to the layout?

When we migrate the filesystem, we need to reap the nfs_inode->layout
while keeping the nfs_inode.

Same with server reboot, network partition, and use of a different replica.

-->Andy

>
> Benny
>
>> spin_unlock(&inode->i_lock);
>>
>> /* Set up layout commit args */
>> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>> if (status) {
>> /* The layout driver failed to setup the layoutcommit */
>> put_rpccred(data->cred);
>> + put_layout(inode);
>> goto out_free;
>> }
>> status = pnfs4_proc_layoutcommit(data, sync);
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 9b0fed4..e04b9d4 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
>> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
>> const nfs4_stateid *stateid);
>> void pnfs_destroy_layout(struct nfs_inode *);
>> +void put_layout(struct inode *inode);
>>
>> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
>> (srv)->pnfs_curr_ld->ld_io_ops && \
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-07-12 18:28:57

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Mon, Jul 12, 2010 at 2:27 PM, Benny Halevy <[email protected]> wrote:
> On Jul. 12, 2010, 21:09 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>> On Mon, Jul 12, 2010 at 1:25 PM, Boaz Harrosh <[email protected]> wrote:
>>> On 07/08/2010 01:34 AM, [email protected] wrote:
>>>> From: Andy Adamson <[email protected]>
>>>>
>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>> ---
>>>> fs/nfs/nfs4proc.c | 2 ++
>>>> fs/nfs/pnfs.c | 13 +++++++++++++
>>>> fs/nfs/pnfs.h | 1 +
>>>> 3 files changed, 16 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index 6acebc3..f763746 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
>>>> struct pnfs_layoutcommit_data *data =
>>>> (struct pnfs_layoutcommit_data *)lcdata;
>>>>
>>>> + /* Matched by get_layout in pnfs_layoutcommit_inode */
>>>> + put_layout(data->args.inode);
>>>> put_rpccred(data->cred);
>>>> pnfs_layoutcommit_free(lcdata);
>>>> }
>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>> index aa16e5d..d42c5da 100644
>>>> --- a/fs/nfs/pnfs.c
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
>>>> }
>>>>
>>>> void
>>>> +put_layout(struct inode *inode)
>>>> +{
>>>> + spin_lock(&inode->i_lock);
>>>> + put_layout_locked(NFS_I(inode)->layout);
>>>> + spin_unlock(&inode->i_lock);
>>>> +
>>>> +}
>>>> +
>>>> +void
>>>> pnfs_layout_release(struct pnfs_layout_type *lo,
>>>> struct nfs4_pnfs_layout_segment *range)
>>>> {
>>>> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>>> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>>> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>>>
>>>> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
>>>> + get_layout(NFS_I(inode)->layout);
>>>> +
>>>
>>> Has of your rules this should be now called get_layout_locked
>>
>> OK
>
> Andy, before you crank another version, I've already merged this patchset
> with obvious fixes into my tree so it might be better to fix that up.
>
> I also fixed a couple other less trivial bugs for which I'm going
> to send patches soon, and I fixed up the obj and block layout alloc/free
> methods to agree with the new API.

OK, Thanks. :)

I'll grab it and we can move forward together from there.

-->Andy
>
> Benny
>
>>
>>
>>>
>>>> spin_unlock(&inode->i_lock);
>>>>
>>>> /* Set up layout commit args */
>>>> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>>> if (status) {
>>>> /* The layout driver failed to setup the layoutcommit */
>>>> put_rpccred(data->cred);
>>>> + put_layout(inode);
>>>
>>> And it is really nice that put_layout takes an inode and get_layout takes an
>>> struct layout_type.
>>>
>>>> goto out_free;
>>>> }
>>>> status = pnfs4_proc_layoutcommit(data, sync);
>>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>>> index 9b0fed4..e04b9d4 100644
>>>> --- a/fs/nfs/pnfs.h
>>>> +++ b/fs/nfs/pnfs.h
>>>> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
>>>> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
>>>> const nfs4_stateid *stateid);
>>>> void pnfs_destroy_layout(struct nfs_inode *);
>>>> +void put_layout(struct inode *inode);
>>>>
>>>> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
>>>> (srv)->pnfs_curr_ld->ld_io_ops && \
>>>
>>> I still think they can all go away.
>>> What is the point of a layout without it's nfsi+inode. And who cares if we free the layout_type structure
>>> 2 milliseconds before, and in the error case?
>>>
>>> Boaz
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>

2010-07-12 18:30:06

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 12/16] SQUASHME pnfs-submit: reference layout across layoutcommit

On Jul. 12, 2010, 21:27 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
> On Mon, Jul 12, 2010 at 1:19 PM, Benny Halevy <[email protected]> wrote:
>> On Jul. 08, 2010, 1:34 +0300, [email protected] wrote:
>>> From: Andy Adamson <[email protected]>
>>>
>>> Signed-off-by: Andy Adamson <[email protected]>
>>> ---
>>> fs/nfs/nfs4proc.c | 2 ++
>>> fs/nfs/pnfs.c | 13 +++++++++++++
>>> fs/nfs/pnfs.h | 1 +
>>> 3 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 6acebc3..f763746 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -5565,6 +5565,8 @@ static void pnfs_layoutcommit_release(void *lcdata)
>>> struct pnfs_layoutcommit_data *data =
>>> (struct pnfs_layoutcommit_data *)lcdata;
>>>
>>> + /* Matched by get_layout in pnfs_layoutcommit_inode */
>>> + put_layout(data->args.inode);
>>> put_rpccred(data->cred);
>>> pnfs_layoutcommit_free(lcdata);
>>> }
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index aa16e5d..d42c5da 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -354,6 +354,15 @@ put_layout_locked(struct pnfs_layout_type *lo)
>>> }
>>>
>>> void
>>> +put_layout(struct inode *inode)
>>> +{
>>> + spin_lock(&inode->i_lock);
>>> + put_layout_locked(NFS_I(inode)->layout);
>>> + spin_unlock(&inode->i_lock);
>>> +
>>> +}
>>> +
>>> +void
>>> pnfs_layout_release(struct pnfs_layout_type *lo,
>>> struct nfs4_pnfs_layout_segment *range)
>>> {
>>> @@ -1598,6 +1607,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>> __clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
>>> pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
>>>
>>> + /* Reference for layoutcommit matched in pnfs_layoutcommit_release */
>>> + get_layout(NFS_I(inode)->layout);
>>> +
>>
>> So from the 30000 foot level, I need to remind myself what do
>> we need the refcount on the layout hdr (pnfs_layout_type) for?
>
>> Can we really use it detached from the inode? NO
>> Is it only for debugging to make catch the case that the inode
>> is released while there are references to the layout?
>
> When we migrate the filesystem, we need to reap the nfs_inode->layout
> while keeping the nfs_inode.
>

But how does the refcount help us with that?
Don't we have to do this synchronously before reusing the nfs_inode?

Benny

> Same with server reboot, network partition, and use of a different replica.
>
> -->Andy
>
>>
>> Benny
>>
>>> spin_unlock(&inode->i_lock);
>>>
>>> /* Set up layout commit args */
>>> @@ -1606,6 +1618,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>> if (status) {
>>> /* The layout driver failed to setup the layoutcommit */
>>> put_rpccred(data->cred);
>>> + put_layout(inode);
>>> goto out_free;
>>> }
>>> status = pnfs4_proc_layoutcommit(data, sync);
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index 9b0fed4..e04b9d4 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -64,6 +64,7 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
>>> void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
>>> const nfs4_stateid *stateid);
>>> void pnfs_destroy_layout(struct nfs_inode *);
>>> +void put_layout(struct inode *inode);
>>>
>>> #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
>>> (srv)->pnfs_curr_ld->ld_io_ops && \
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>