2010-05-13 08:40:14

by Boaz Harrosh

[permalink] [raw]
Subject: SQUASHME: missing from FIXME: async layout return

I've tested the patch:
FIXME: async layout return

And there is a missing small hunk

I have tested with this patch and it is a very good patch
that should also go into 2.6.33. It is necessary in the rare
case when one inode have more then one open_context.

(For some reason I see that happening much more in 2.6.34
I don't understand why)

Boaz
---
git diff --stat -p -M
fs/nfs/nfs4state.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 15c8bc8..6dbe893 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
struct nfs_inode *nfsi = NFS_I(state->inode);

if (nfsi->layoutcommit_ctx)
- pnfs_layoutcommit_inode(state->inode, 0);
+ pnfs_layoutcommit_inode(state->inode, wait);
if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
struct nfs4_pnfs_layout_segment range;



2010-05-13 16:23:31

by Boaz Harrosh

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On 05/13/2010 07:04 PM, William A. (Andy) Adamson wrote:
> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <[email protected]> wrote:
>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <[email protected]> wrote:
>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wrote:
>>>>>> I've tested the patch:
>>>>>> FIXME: async layout return
>>>>>>
>>>>>> And there is a missing small hunk
>>>>>>
>>>>>> I have tested with this patch and it is a very good patch
>>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>>> case when one inode have more then one open_context.
>>>>>
>>>>> Do you mean more than one open context per open owner?
>>>>
>>>> What we see is one "regular" open context and one which is the layout_commit_ctx
>>>
>>> Isn't that a BUG?
>>>
>>> Here is what we're seeing:
>>>
>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context->
>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode,
>>>
>>> This is the same code that Boaz's 'missing small hunk' adds the wait
>>> to the pnfs_layout_commit_inode to. This is good, because when
>>> __nfs4_close is called with sync, every thing must be sent/returned
>>> prior to the nfs_do_close call.
>>>
>>> But there is still a problem. Here is the __nfs4_close call (with out
>>> the wait that Boaz added)
>>>
>>> if (nfsi->layoutcommit_ctx)
>>> pnfs_layoutcommit_inode(state->inode, 0);
>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>> struct nfs4_pnfs_layout_segment range;
>>>
>>> range.iomode = nfsi->layout.roc_iomode;
>>> range.offset = 0;
>>> range.length = NFS4_MAX_UINT64;
>>> pnfs_return_layout(state->inode, &range, NULL,
>>> RETURN_FILE);
>>>
>>> Note that a pnfs_return_layout which is always 'sync' (async with wait
>>> for completion). So the (currently) async LAYOUTCOMMIT call returns,
>>> and a LAYOUTRETURN is put on the wire
>>>
>>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open
>>> context is different from the open context that was put by
>>> nfs_file_release, then
>>>
>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close
>>> and the return on close LAYOUTRETURN is sent again!
>>>
>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses
>>> the same stateid as the first LAYOUTRETURN, and the reply to the
>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>>
>>> This will occur whether the LAYOUTCOMMIT is async or sync as they both
>>> call pnfs_layoutcommit_done.
>>>
>>> I need to understand why there are two open contexts. On the face of
>>> it, it seems wrong.
>>>
>>> We also add a pointer to the open context in the nfs_write_data, and
>>> in the pnfs_layoutcommit_data. Do we take a reference on the open data
>>> in theses cases? .
>>>
>>> -->Andy
>>>
>>>
>>
>> Yes on all acounts. This is what Benny's patch is fixing please try it out
>> it is most important (Add my small fix)
>
> Is this what you mean?
>
> With Benny's patch, with return-on-close set and the layoutcommit_ctx
> different from the open context that caused the __nfs4_close:
>
> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
> __nfs4_close() will complete prior to calling the return-on-close
> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
> put_nfs_open_context will be call on the layoutconmit_ctx (saved in
> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.
>
> In _this_ instance of calling __nfs4_close (from
> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so
> the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout
> is called. Since it is also a sync call, pnfs_layoutcommit_done does
> not return until it is complete. So the layout is returned, the layout
> is freed (all layout segments 'cause the range covers the whole file)
> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
> nfs_file_release)
>
> Now since return-on-close is set, pnfs_layout_return is called (in the
> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
> on the wire because the first pnfs_layout_return (from the
> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
> and all is well.
>
> Sheese.
>
> --->Andy
>

Yes I have seen this as well. I went even farther then Benny's patch
and added the following hunk which works very well BTW, but I now understand
the it is not necessary, exactly as you described above.

After all this will settle, I have a pending patch that cleans this code up
by moving the hunk from __nfs4_close() into a pnfs_close_context() function in pnfs.c
(And empty if not 4.1) this way lots of duplicate code gets thrown away
for example the double lo_commit codes

---
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 06f20b9..1740648 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -739,6 +754,11 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
}

lo = get_lock_current_layout(nfsi);
+ if (wait && atomic_read(&nfsi->layout.lretcount)) {
+ printk(KERN_ERR "%s(%lx): wait lretcount=%d\n", __func__, ino->i_ino,
+ atomic_read(&nfsi->layout.lretcount));
+ drain_layoutreturns(&nfsi->layout);
+ }
if (lo && !has_layout_to_return(lo, &arg)) {
put_unlock_current_layout(lo);
lo = NULL;

2010-05-13 16:20:16

by Benny Halevy

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <[email protected]> wrote:
>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <[email protected]> wrote:
>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wrote:
>>>>>> I've tested the patch:
>>>>>> FIXME: async layout return
>>>>>>
>>>>>> And there is a missing small hunk
>>>>>>
>>>>>> I have tested with this patch and it is a very good patch
>>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>>> case when one inode have more then one open_context.
>>>>>
>>>>> Do you mean more than one open context per open owner?
>>>>
>>>> What we see is one "regular" open context and one which is the layout_commit_ctx
>>>
>>> Isn't that a BUG?
>>>
>>> Here is what we're seeing:
>>>
>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context->
>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode,
>>>
>>> This is the same code that Boaz's 'missing small hunk' adds the wait
>>> to the pnfs_layout_commit_inode to. This is good, because when
>>> __nfs4_close is called with sync, every thing must be sent/returned
>>> prior to the nfs_do_close call.
>>>
>>> But there is still a problem. Here is the __nfs4_close call (with out
>>> the wait that Boaz added)
>>>
>>> if (nfsi->layoutcommit_ctx)
>>> pnfs_layoutcommit_inode(state->inode, 0);
>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>> struct nfs4_pnfs_layout_segment range;
>>>
>>> range.iomode = nfsi->layout.roc_iomode;
>>> range.offset = 0;
>>> range.length = NFS4_MAX_UINT64;
>>> pnfs_return_layout(state->inode, &range, NULL,
>>> RETURN_FILE);
>>>
>>> Note that a pnfs_return_layout which is always 'sync' (async with wait
>>> for completion). So the (currently) async LAYOUTCOMMIT call returns,
>>> and a LAYOUTRETURN is put on the wire
>>>
>>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open
>>> context is different from the open context that was put by
>>> nfs_file_release, then
>>>
>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close
>>> and the return on close LAYOUTRETURN is sent again!
>>>
>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses
>>> the same stateid as the first LAYOUTRETURN, and the reply to the
>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>>
>>> This will occur whether the LAYOUTCOMMIT is async or sync as they both
>>> call pnfs_layoutcommit_done.
>>>
>>> I need to understand why there are two open contexts. On the face of
>>> it, it seems wrong.
>>>
>>> We also add a pointer to the open context in the nfs_write_data, and
>>> in the pnfs_layoutcommit_data. Do we take a reference on the open data
>>> in theses cases? .
>>>
>>> -->Andy
>>>
>>>
>>
>> Yes on all acounts. This is what Benny's patch is fixing please try it out
>> it is most important (Add my small fix)
>
> Is this what you mean?
>
> With Benny's patch, with return-on-close set and the layoutcommit_ctx
> different from the open context that caused the __nfs4_close:
>
> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
> __nfs4_close() will complete prior to calling the return-on-close
> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
> put_nfs_open_context will be call on the layoutconmit_ctx (saved in
> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.
>
> In _this_ instance of calling __nfs4_close (from
> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so
> the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout
> is called. Since it is also a sync call, pnfs_layoutcommit_done does
> not return until it is complete. So the layout is returned, the layout
> is freed (all layout segments 'cause the range covers the whole file)
> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
> nfs_file_release)
>
> Now since return-on-close is set, pnfs_layout_return is called (in the
> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
> on the wire because the first pnfs_layout_return (from the
> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
> and all is well.
>
> Sheese.

Yeah :-/
I'm feeling increasingly uncomfortable with the current implementation
and we're planning to come up with the state-machine based model
for the Bakeathon and test it there. Ideally, this will be stable enough
that we can get the gist of it into pnfs-submit.

Benny

>
> --->Andy
>
>
>>
>> And about the multiple contexts I don't understand as well but the fact of it
>> is that an nfs_inode has an list_head of open_contexts and we must deal properly
>> with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in
>> prev kernels it almost never trigger.)
>>
>> Benny's patch takes care of that and I've done heavy testing of it I can see cases
>> of more then one contexts and it works correctly.
>>
>> But I too would like to understand when is it that an inode can have more then
>> one open_context
>>
>> Boaz
>>
>>>
>>>>
>>>> Benny
>>>>
>>>>>
>>>>> -->Andy
>>>>>
>>>>>>
>>>>>> (For some reason I see that happening much more in 2.6.34
>>>>>> I don't understand why)
>>>>>>
>>>>>> Boaz
>>>>>> ---
>>>>>> git diff --stat -p -M
>>>>>> fs/nfs/nfs4state.c | 2 +-
>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>> index 15c8bc8..6dbe893 100644
>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>
>>>>>> if (nfsi->layoutcommit_ctx)
>>>>>> - pnfs_layoutcommit_inode(state->inode, 0);
>>>>>> + pnfs_layoutcommit_inode(state->inode, wait);
>>>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>>>>> struct nfs4_pnfs_layout_segment range;
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>>> the body of a message to [email protected]
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>

2010-05-17 06:03:03

by Benny Halevy

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On May. 14, 2010, 6:28 +0300, Zhang Jingwang <[email protected]> wrote:
> 2010/5/14 Benny Halevy <[email protected]>:
>> On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>>> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <[email protected]> wrote:
>>>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>>>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <[email protected]> wrote:
>>>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>>>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wrote:
>>>>>>>> I've tested the patch:
>>>>>>>> FIXME: async layout return
>>>>>>>>
>>>>>>>> And there is a missing small hunk
>>>>>>>>
>>>>>>>> I have tested with this patch and it is a very good patch
>>>>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>>>>> case when one inode have more then one open_context.
>>>>>>>
>>>>>>> Do you mean more than one open context per open owner?
>>>>>>
>>>>>> What we see is one "regular" open context and one which is the layout_commit_ctx
>>>>>
>>>>> Isn't that a BUG?
>>>>>
>>>>> Here is what we're seeing:
>>>>>
>>>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context->
>>>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode,
>>>>>
>>>>> This is the same code that Boaz's 'missing small hunk' adds the wait
>>>>> to the pnfs_layout_commit_inode to. This is good, because when
>>>>> __nfs4_close is called with sync, every thing must be sent/returned
>>>>> prior to the nfs_do_close call.
>>>>>
>>>>> But there is still a problem. Here is the __nfs4_close call (with out
>>>>> the wait that Boaz added)
>>>>>
>>>>> if (nfsi->layoutcommit_ctx)
>>>>> pnfs_layoutcommit_inode(state->inode, 0);
>>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>>>> struct nfs4_pnfs_layout_segment range;
>>>>>
>>>>> range.iomode = nfsi->layout.roc_iomode;
>>>>> range.offset = 0;
>>>>> range.length = NFS4_MAX_UINT64;
>>>>> pnfs_return_layout(state->inode, &range, NULL,
>>>>> RETURN_FILE);
>>>>>
>>>>> Note that a pnfs_return_layout which is always 'sync' (async with wait
>>>>> for completion). So the (currently) async LAYOUTCOMMIT call returns,
>>>>> and a LAYOUTRETURN is put on the wire
>>>>>
>>>>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>>>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open
>>>>> context is different from the open context that was put by
>>>>> nfs_file_release, then
>>>>>
>>>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close
>>>>> and the return on close LAYOUTRETURN is sent again!
>>>>>
>>>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses
>>>>> the same stateid as the first LAYOUTRETURN, and the reply to the
>>>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>>>>
>>>>> This will occur whether the LAYOUTCOMMIT is async or sync as they both
>>>>> call pnfs_layoutcommit_done.
>>>>>
>>>>> I need to understand why there are two open contexts. On the face of
>>>>> it, it seems wrong.
>>>>>
>>>>> We also add a pointer to the open context in the nfs_write_data, and
>>>>> in the pnfs_layoutcommit_data. Do we take a reference on the open data
>>>>> in theses cases? .
>>>>>
>>>>> -->Andy
>>>>>
>>>>>
>>>>
>>>> Yes on all acounts. This is what Benny's patch is fixing please try it out
>>>> it is most important (Add my small fix)
>>>
>>> Is this what you mean?
>>>
>>> With Benny's patch, with return-on-close set and the layoutcommit_ctx
>>> different from the open context that caused the __nfs4_close:
>>>
>>> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
>>> __nfs4_close() will complete prior to calling the return-on-close
>>> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
>>> put_nfs_open_context will be call on the layoutconmit_ctx (saved in
>>> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.
>>>
>>> In _this_ instance of calling __nfs4_close (from
>>> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so
>>> the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout
>>> is called. Since it is also a sync call, pnfs_layoutcommit_done does
>>> not return until it is complete. So the layout is returned, the layout
>>> is freed (all layout segments 'cause the range covers the whole file)
>>> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
>>> nfs_file_release)
>>>
>>> Now since return-on-close is set, pnfs_layout_return is called (in the
>>> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
>>> on the wire because the first pnfs_layout_return (from the
>>> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
>>> and all is well.
>>>
>>> Sheese.
>>
>> Yeah :-/
>> I'm feeling increasingly uncomfortable with the current implementation
>> and we're planning to come up with the state-machine based model
>> for the Bakeathon and test it there. Ideally, this will be stable enough
>> that we can get the gist of it into pnfs-submit.
>>
>> Benny
>>
>
> Is there any document about the state-machine based model?

Here: http://linux-nfs.org/pipermail/pnfs/attachments/20100304/f90bede4/attachment.txt

>>>
>>> --->Andy
>>>
>>>
>>>>
>>>> And about the multiple contexts I don't understand as well but the fact of it
>>>> is that an nfs_inode has an list_head of open_contexts and we must deal properly
>>>> with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in
>>>> prev kernels it almost never trigger.)
>>>>
>>>> Benny's patch takes care of that and I've done heavy testing of it I can see cases
>>>> of more then one contexts and it works correctly.
>>>>
>>>> But I too would like to understand when is it that an inode can have more then
>>>> one open_context
>>>>
>>>> Boaz
>>>>
>>>>>
>>>>>>
>>>>>> Benny
>>>>>>
>>>>>>>
>>>>>>> -->Andy
>>>>>>>
>>>>>>>>
>>>>>>>> (For some reason I see that happening much more in 2.6.34
>>>>>>>> I don't understand why)
>>>>>>>>
>>>>>>>> Boaz
>>>>>>>> ---
>>>>>>>> git diff --stat -p -M
>>>>>>>> fs/nfs/nfs4state.c | 2 +-
>>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>>>> index 15c8bc8..6dbe893 100644
>>>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
>>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>>>
>>>>>>>> if (nfsi->layoutcommit_ctx)
>>>>>>>> - pnfs_layoutcommit_inode(state->inode, 0);
>>>>>>>> + pnfs_layoutcommit_inode(state->inode, wait);
>>>>>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>>>>>>> struct nfs4_pnfs_layout_segment range;
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>>>>> the body of a message to [email protected]
>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
>

2010-05-13 16:04:49

by Andy Adamson

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <[email protected]> wrote:
> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <[email protected]> wrote:
>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wrote:
>>>>> I've tested the patch:
>>>>> ? ? ? ?FIXME: async layout return
>>>>>
>>>>> And there is a missing small hunk
>>>>>
>>>>> I have tested with this patch and it is a very good patch
>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>> case when one inode have more then one open_context.
>>>>
>>>> Do you mean more than one open context per open owner?
>>>
>>> What we see is one "regular" open context and one which is the layout_commit_ctx
>>
>> Isn't that a BUG?
>>
>> Here is what we're seeing:
>>
>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context->
>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode,
>>
>> This is the same code that Boaz's 'missing small hunk' adds the wait
>> to the pnfs_layout_commit_inode to. This is good, because when
>> __nfs4_close is called with sync, every thing must be sent/returned
>> prior to the nfs_do_close call.
>>
>> But there is still a problem. Here is the __nfs4_close call (with out
>> the wait that Boaz added)
>>
>> ? ? ? ? ? ? ? ? if (nfsi->layoutcommit_ctx)
>> ? ? ? ? ? ? ? ? ? ? ? ? pnfs_layoutcommit_inode(state->inode, 0);
>> ? ? ? ? ? ? ? ? if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>> ? ? ? ? ? ? ? ? ? ? ? ? struct nfs4_pnfs_layout_segment range;
>>
>> ? ? ? ? ? ? ? ? ? ? ? ? range.iomode = nfsi->layout.roc_iomode;
>> ? ? ? ? ? ? ? ? ? ? ? ? range.offset = 0;
>> ? ? ? ? ? ? ? ? ? ? ? ? range.length = NFS4_MAX_UINT64;
>> ? ? ? ? ? ? ? ? ? ? ? ? pnfs_return_layout(state->inode, &range, NULL,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?RETURN_FILE);
>>
>> Note that a pnfs_return_layout which is always 'sync' (async with wait
>> for completion). So the (currently) async LAYOUTCOMMIT call returns,
>> and a LAYOUTRETURN is put on the wire
>>
>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open
>> context is different from the open context that was put by
>> nfs_file_release, then
>>
>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close
>> and the return on close LAYOUTRETURN is sent again!
>>
>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses
>> the same stateid as the first LAYOUTRETURN, and the reply to the
>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>
>> This will occur whether the LAYOUTCOMMIT is async or sync as they both
>> call pnfs_layoutcommit_done.
>>
>> I need to understand why there are two open contexts. On the face of
>> it, it seems wrong.
>>
>> We also add a pointer to the open context in the nfs_write_data, and
>> in the pnfs_layoutcommit_data. Do we take a reference on the open data
>> in theses cases? .
>>
>> -->Andy
>>
>>
>
> Yes on all acounts. This is what Benny's patch is fixing please try it out
> it is most important (Add my small fix)

Is this what you mean?

With Benny's patch, with return-on-close set and the layoutcommit_ctx
different from the open context that caused the __nfs4_close:

The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
__nfs4_close() will complete prior to calling the return-on-close
LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
put_nfs_open_context will be call on the layoutconmit_ctx (saved in
pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.

In _this_ instance of calling __nfs4_close (from
pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so
the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout
is called. Since it is also a sync call, pnfs_layoutcommit_done does
not return until it is complete. So the layout is returned, the layout
is freed (all layout segments 'cause the range covers the whole file)
and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
nfs_file_release)

Now since return-on-close is set, pnfs_layout_return is called (in the
nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
on the wire because the first pnfs_layout_return (from the
pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
and all is well.

Sheese.

--->Andy


>
> And about the multiple contexts I don't understand as well but the fact of it
> is that an nfs_inode has an list_head of open_contexts and we must deal properly
> with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in
> prev kernels it almost never trigger.)
>
> Benny's patch takes care of that and I've done heavy testing of it I can see cases
> of more then one contexts and it works correctly.
>
> But I too would like to understand when is it that an inode can have more then
> one open_context
>
> Boaz
>
>>
>>>
>>> Benny
>>>
>>>>
>>>> -->Andy
>>>>
>>>>>
>>>>> (For some reason I see that happening much more in 2.6.34
>>>>> ?I don't understand why)
>>>>>
>>>>> Boaz
>>>>> ---
>>>>> git diff --stat -p -M
>>>>> ?fs/nfs/nfs4state.c | ? ?2 +-
>>>>> ?1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>> index 15c8bc8..6dbe893 100644
>>>>> --- a/fs/nfs/nfs4state.c
>>>>> +++ b/fs/nfs/nfs4state.c
>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
>>>>> ? ? ? ? ? ? ? ?struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>
>>>>> ? ? ? ? ? ? ? ?if (nfsi->layoutcommit_ctx)
>>>>> - ? ? ? ? ? ? ? ? ? ? ? pnfs_layoutcommit_inode(state->inode, 0);
>>>>> + ? ? ? ? ? ? ? ? ? ? ? pnfs_layoutcommit_inode(state->inode, wait);
>>>>> ? ? ? ? ? ? ? ?if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>>>> ? ? ? ? ? ? ? ? ? ? ? ?struct nfs4_pnfs_layout_segment range;
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>
>
>

2010-05-13 14:07:24

by Andy Adamson

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

Hi Boaz

I've been chasing this bug. Where is the rest of the FIXME: async
layout return patch?
I can't find it.

Thanks

-->Andy

On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wr=
ote:
> I've tested the patch:
> =A0 =A0 =A0 =A0FIXME: async layout return
>
> And there is a missing small hunk
>
> I have tested with this patch and it is a very good patch
> that should also go into 2.6.33. It is necessary in the rare
> case when one inode have more then one open_context.
>
> (For some reason I see that happening much more in 2.6.34
> =A0I don't understand why)
>
> Boaz
> ---
> git diff --stat -p -M
> =A0fs/nfs/nfs4state.c | =A0 =A02 +-
> =A01 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 15c8bc8..6dbe893 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struc=
t nfs4_state *state, fmode_t fm
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I(state=
->inode);
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx)
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode=
(state->inode, 0);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode=
(state->inode, wait);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->layout.r=
oc_iomode) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs_layou=
t_segment range;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

2010-05-13 14:19:43

by Andy Adamson

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wr=
ote:
> I've tested the patch:
> =A0 =A0 =A0 =A0FIXME: async layout return
>
> And there is a missing small hunk
>
> I have tested with this patch and it is a very good patch
> that should also go into 2.6.33. It is necessary in the rare
> case when one inode have more then one open_context.

Do you mean more than one open context per open owner?

-->Andy

>
> (For some reason I see that happening much more in 2.6.34
> =A0I don't understand why)
>
> Boaz
> ---
> git diff --stat -p -M
> =A0fs/nfs/nfs4state.c | =A0 =A02 +-
> =A01 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 15c8bc8..6dbe893 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struc=
t nfs4_state *state, fmode_t fm
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I(state=
->inode);
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx)
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode=
(state->inode, 0);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode=
(state->inode, wait);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->layout.r=
oc_iomode) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs_layou=
t_segment range;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

2010-05-13 14:39:12

by Benny Halevy

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On May. 13, 2010, 17:07 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
> Hi Boaz
>
> I've been chasing this bug. Where is the rest of the FIXME: async
> layout return patch?
> I can't find it.

Sorry, it's not released yet.
I wanted to do more testing with it to make sure
it does the right thing.

Here it is:

>From 8052205dab7c8493bbd5837997ba79d95c8151cc Mon Sep 17 00:00:00 2001
From: Benny Halevy <[email protected]>
Date: Wed, 14 Apr 2010 19:05:13 +0300
Subject: [PATCH] FIXME: async layout return

FIXME: currently there's only one path that requires async layout return
to avoid blocking of the rpciod like the stack below.

We really neee the pnfs state machine instead.

rpciod/0 D 0000003d8c2d83b0 0 1029 2 0x00000000
778ccdd0 60264f00 77a6a000 778ccb20 77a6ba20 6001375b 77a6ba20 77a6a000
77a6a000 778cc8a0 77a6ba70 601b192f 77a6ba50 601b35a6 6ed537f0 77a6a000
77a6bb00 63506fd8 7b1870c8 77a6bb10 77a6ba90 7b187100 63506fd8 77a6a000
Call Trace:
77a6b9f8: [<6001375b>] _switch_to+0x5e/0xae
77a6ba28: [<601b192f>] schedule+0x1dd/0x21b
77a6ba38: [<601b35a6>] _raw_spin_unlock_irqrestore+0x18/0x1c
77a6ba60: [<7b1870c8>] rpc_wait_bit_killable+0x0/0x3c [sunrpc]
77a6ba78: [<7b187100>] rpc_wait_bit_killable+0x38/0x3c [sunrpc]
77a6ba98: [<601b1d9f>] __wait_on_bit+0x43/0x76
77a6bae8: [<601b1e43>] out_of_line_wait_on_bit+0x71/0x7c
77a6baf8: [<7b1870c8>] rpc_wait_bit_killable+0x0/0x3c [sunrpc]
77a6bb20: [<600430d6>] wake_bit_function+0x0/0x2e
77a6bb68: [<7b18709d>] __rpc_wait_for_completion_task+0x34/0x36 [sunrpc]
77a6bb78: [<7bf36048>] nfs4_wait_for_completion_rpc_task+0xb/0xd [nfs]
77a6bb88: [<7bf369ac>] pnfs4_proc_layoutreturn+0xae/0xe6 [nfs]
77a6bbc8: [<6007c8b1>] kmem_cache_alloc+0xaf/0xbe
77a6bc08: [<7bf4dc39>] _pnfs_return_layout+0x450/0x4d4 [nfs]
77a6bc38: [<7bf3cb79>] decode_attr_time+0x17/0x4d [nfs]
77a6bc58: [<7bf429d7>] decode_getfattr+0xaae/0xc02 [nfs]
77a6bcb8: [<7bf4601b>] __nfs4_close+0x15d/0x17a [nfs]
77a6bd28: [<7bf46053>] nfs4_close_state+0xb/0xd [nfs]
77a6bd38: [<7bf34183>] nfs4_close_context+0x26/0x28 [nfs]
77a6bd48: [<7bf2315c>] __put_nfs_open_context+0x79/0xa1 [nfs]
77a6bd78: [<7bf23227>] put_nfs_open_context+0xb/0xd [nfs]
77a6bd88: [<7bf4b763>] pnfs_layoutcommit_done+0xa5/0xad [nfs]
77a6bdb8: [<7bf3b12e>] pnfs_layoutcommit_rpc_done+0x39/0x56 [nfs]
77a6bde8: [<7b18712b>] rpc_exit_task+0x27/0x52 [sunrpc]
77a6be08: [<7b187817>] __rpc_execute+0x88/0x23a [sunrpc]
77a6be30: [<7b1879ee>] rpc_async_schedule+0x0/0x12 [sunrpc]
77a6be48: [<7b1879fe>] rpc_async_schedule+0x10/0x12 [sunrpc]
77a6be58: [<6003fc45>] worker_thread+0x114/0x1a5
77a6be80: [<600430a2>] autoremove_wake_function+0x0/0x34
77a6beb8: [<6003fb31>] worker_thread+0x0/0x1a5
77a6bed8: [<60042cf7>] kthread+0x8e/0x96
77a6bf48: [<60021429>] run_kernel_thread+0x41/0x4a
77a6bf58: [<60042c69>] kthread+0x0/0x96
77a6bf98: [<60021410>] run_kernel_thread+0x28/0x4a
77a6bfc8: [<600136d3>] new_thread_handler+0x71/0x9b

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfs/callback_proc.c | 7 ++++---
fs/nfs/inode.c | 2 +-
fs/nfs/nfs4proc.c | 17 ++++++++++-------
fs/nfs/nfs4state.c | 2 +-
fs/nfs/pnfs.c | 21 +++++++++++++--------
fs/nfs/pnfs.h | 9 +++++----
6 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index ebf86df..33ef5c0 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -214,7 +214,7 @@ static int pnfs_recall_layout(void *data)

if (rl.cbl_recall_type == RETURN_FILE) {
status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
- RETURN_FILE);
+ RETURN_FILE, true);
if (status)
dprintk("%s RETURN_FILE error: %d\n", __func__, status);
goto out;
@@ -226,12 +226,13 @@ static int pnfs_recall_layout(void *data)
/* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */
while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
/* XXX need to check status on pnfs_return_layout */
- pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE);
+ pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, true);
iput(ino);
}

/* send final layoutreturn */
- status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, rl.cbl_recall_type);
+ status = pnfs_return_layout(inode, &rl.cbl_seg, NULL,
+ rl.cbl_recall_type, true);
if (status)
printk(KERN_INFO "%s: ignoring pnfs_return_layout status=%d\n",
__func__, status);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f0b8676..42fac75 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1320,7 +1320,7 @@ void nfs4_clear_inode(struct inode *inode)
/* First call standard NFS clear_inode() code */
nfs_clear_inode(inode);
#ifdef CONFIG_NFS_V4_1
- pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
+ pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, true);
#endif /* CONFIG_NFS_V4_1 */
}
#endif
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 82cd2ea..c01ecd7 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1081,7 +1081,8 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
/* FIXME: send gratuitous layout commits and return with the reclaim
* flag during grace period
*/
- pnfs_return_layout(state->inode, NULL, &state->open_stateid, RETURN_FILE);
+ pnfs_return_layout(state->inode, NULL, &state->open_stateid,
+ RETURN_FILE, true);
pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
#endif /* CONFIG_NFS_V4_1 */
}
@@ -2375,7 +2376,7 @@ pnfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,

if (pnfs_enabled_sb(server) && has_layout(nfsi) &&
pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld))
- pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
+ pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, true);
return nfs4_proc_setattr(dentry, fattr, sattr);
}
#endif /* CONFIG_NFS_V4_1 */
@@ -5736,7 +5737,7 @@ static const struct rpc_call_ops nfs4_pnfs_layoutreturn_call_ops = {
.rpc_release = nfs4_pnfs_layoutreturn_release,
};

-int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp)
+int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait)
{
struct inode *ino = lrp->args.inode;
struct nfs_server *server = NFS_SERVER(ino);
@@ -5753,7 +5754,7 @@ int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp)
.callback_data = lrp,
.flags = RPC_TASK_ASYNC,
};
- int status;
+ int status = 0;

dprintk("--> %s\n", __func__);
lrp->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
@@ -5762,9 +5763,11 @@ int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp)
status = PTR_ERR(task);
goto out;
}
- status = nfs4_wait_for_completion_rpc_task(task);
- if (status == 0)
- status = task->tk_status;
+ if (wait) {
+ status = nfs4_wait_for_completion_rpc_task(task);
+ if (status == 0)
+ status = task->tk_status;
+ }
rpc_put_task(task);
out:
dprintk("<-- %s\n", __func__);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 15c8bc8..cfaa1be 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
range.offset = 0;
range.length = NFS4_MAX_UINT64;
pnfs_return_layout(state->inode, &range, NULL,
- RETURN_FILE);
+ RETURN_FILE, wait);
}
#endif /* CONFIG_NFS_V4_1 */
nfs4_do_close(path, state, wait);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3739c38..06f20b9 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -675,7 +675,8 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
static int
return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type type, struct pnfs_layout_type *lo)
+ enum pnfs_layoutreturn_type type, struct pnfs_layout_type *lo,
+ bool wait)
{
struct nfs4_pnfs_layoutreturn *lrp;
struct nfs_server *server = NFS_SERVER(ino);
@@ -700,7 +701,7 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
else if (lo)
pnfs_get_layout_stateid(&lrp->args.stateid, lo);

- status = pnfs4_proc_layoutreturn(lrp);
+ status = pnfs4_proc_layoutreturn(lrp, wait);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
@@ -709,7 +710,8 @@ out:
int
_pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type type)
+ enum pnfs_layoutreturn_type type,
+ bool wait)
{
struct pnfs_layout_type *lo = NULL;
struct nfs_inode *nfsi = NFS_I(ino);
@@ -727,7 +729,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
}
if (type == RETURN_FILE) {
if (nfsi->layoutcommit_ctx) {
- status = pnfs_layoutcommit_inode(ino, 1);
+ status = pnfs_layoutcommit_inode(ino, wait);
if (status) {
dprintk("%s: layoutcommit failed, status=%d. "
"Returning layout anyway\n",
@@ -765,7 +767,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
}
}
send_return:
- status = return_layout(ino, &arg, stateid, type, lo);
+ status = return_layout(ino, &arg, stateid, type, lo, wait);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
@@ -1680,7 +1682,8 @@ pnfs_writeback_done(struct nfs_write_data *data)
.length = data->args.count,
};
dprintk("%s: retrying\n", __func__);
- _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
+ _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
+ true);
pnfs_initiate_write(data, NFS_CLIENT(data->inode),
pdata->call_ops, pdata->how);
}
@@ -1811,7 +1814,8 @@ pnfs_read_done(struct nfs_read_data *data)
.length = data->args.count,
};
dprintk("%s: retrying\n", __func__);
- _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
+ _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
+ true);
pnfs_initiate_read(data, NFS_CLIENT(data->inode),
pdata->call_ops);
}
@@ -2037,7 +2041,8 @@ pnfs_commit_done(struct nfs_write_data *data)
.length = data->args.count,
};
dprintk("%s: retrying\n", __func__);
- _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
+ _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
+ true);
pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
pdata->call_ops, pdata->how);
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 47160c5..524a7cd 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -29,7 +29,7 @@ extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
struct pnfs_device *dev);
extern int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp);
extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data);
-extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp);
+extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait);

/* pnfs.c */
extern const nfs4_stateid zero_stateid;
@@ -40,7 +40,7 @@ int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,

int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type);
+ enum pnfs_layoutreturn_type, bool wait);
void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *mntfh, u32 id);
void unmount_pnfs_layoutdriver(struct nfs_server *);
int pnfs_use_read(struct inode *inode, ssize_t count);
@@ -247,14 +247,15 @@ static inline void pnfs_modify_new_request(struct nfs_page *req,
static inline int pnfs_return_layout(struct inode *ino,
struct nfs4_pnfs_layout_segment *lseg,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type type)
+ enum pnfs_layoutreturn_type type,
+ bool wait)
{
struct nfs_inode *nfsi = NFS_I(ino);
struct nfs_server *nfss = NFS_SERVER(ino);

if (pnfs_enabled_sb(nfss) &&
(type != RETURN_FILE || has_layout(nfsi)))
- return _pnfs_return_layout(ino, lseg, stateid, type);
+ return _pnfs_return_layout(ino, lseg, stateid, type, wait);

return 0;
}
--
1.6.6.1


2010-05-13 14:40:59

by Benny Halevy

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wrote:
>> I've tested the patch:
>> FIXME: async layout return
>>
>> And there is a missing small hunk
>>
>> I have tested with this patch and it is a very good patch
>> that should also go into 2.6.33. It is necessary in the rare
>> case when one inode have more then one open_context.
>
> Do you mean more than one open context per open owner?

What we see is one "regular" open context and one which is the layout_commit_ctx

Benny

>
> -->Andy
>
>>
>> (For some reason I see that happening much more in 2.6.34
>> I don't understand why)
>>
>> Boaz
>> ---
>> git diff --stat -p -M
>> fs/nfs/nfs4state.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 15c8bc8..6dbe893 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>
>> if (nfsi->layoutcommit_ctx)
>> - pnfs_layoutcommit_inode(state->inode, 0);
>> + pnfs_layoutcommit_inode(state->inode, wait);
>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>> struct nfs4_pnfs_layout_segment range;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>

2010-05-13 15:17:59

by Andy Adamson

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <[email protected]> wr=
ote:
> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsada=
[email protected]> wrote:
>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]>=
wrote:
>>> I've tested the patch:
>>> =A0 =A0 =A0 =A0FIXME: async layout return
>>>
>>> And there is a missing small hunk
>>>
>>> I have tested with this patch and it is a very good patch
>>> that should also go into 2.6.33. It is necessary in the rare
>>> case when one inode have more then one open_context.
>>
>> Do you mean more than one open context per open owner?
>
> What we see is one "regular" open context and one which is the layout=
_commit_ctx

Isn't that a BUG?

Here is what we're seeing:

nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_o=
pen_context->
NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_la=
youtcommit_inode,

This is the same code that Boaz's 'missing small hunk' adds the wait
to the pnfs_layout_commit_inode to. This is good, because when
__nfs4_close is called with sync, every thing must be sent/returned
prior to the nfs_do_close call.

But there is still a problem. Here is the __nfs4_close call (with out
the wait that Boaz added)

if (nfsi->layoutcommit_ctx)
pnfs_layoutcommit_inode(state->inode, 0);
if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
struct nfs4_pnfs_layout_segment range;

range.iomode =3D nfsi->layout.roc_iomode;
range.offset =3D 0;
range.length =3D NFS4_MAX_UINT64;
pnfs_return_layout(state->inode, &range, NULL,
RETURN_FILE);

Note that a pnfs_return_layout which is always 'sync' (async with wait
for completion). So the (currently) async LAYOUTCOMMIT call returns,
and a LAYOUTRETURN is put on the wire

Then the LAYOUTCOMMIT rpc_call_done routine calls
pnfs_layoutcommit_done which calls put_nfs_open_context. If the open
context is different from the open context that was put by
nfs_file_release, then

pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close
and the return on close LAYOUTRETURN is sent again!

Of couse this second LAYOUTRETURN either gets a zero stateid, or uses
the same stateid as the first LAYOUTRETURN, and the reply to the
second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....

This will occur whether the LAYOUTCOMMIT is async or sync as they both
call pnfs_layoutcommit_done.

I need to understand why there are two open contexts. On the face of
it, it seems wrong.

We also add a pointer to the open context in the nfs_write_data, and
in the pnfs_layoutcommit_data. Do we take a reference on the open data
in theses cases? .

-->Andy



>
> Benny
>
>>
>> -->Andy
>>
>>>
>>> (For some reason I see that happening much more in 2.6.34
>>> =A0I don't understand why)
>>>
>>> Boaz
>>> ---
>>> git diff --stat -p -M
>>> =A0fs/nfs/nfs4state.c | =A0 =A02 +-
>>> =A01 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index 15c8bc8..6dbe893 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, str=
uct nfs4_state *state, fmode_t fm
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I(sta=
te->inode);
>>>
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx)
>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_ino=
de(state->inode, 0);
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_ino=
de(state->inode, wait);
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->layout=
=2Eroc_iomode) {
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs_lay=
out_segment range;
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs=
" in
>>> the body of a message to [email protected]
>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm=
l
>>>
>

2010-05-13 15:31:23

by Boaz Harrosh

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <[email protected]> wrote:
>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <[email protected]> wrote:
>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <[email protected]> wrote:
>>>> I've tested the patch:
>>>> FIXME: async layout return
>>>>
>>>> And there is a missing small hunk
>>>>
>>>> I have tested with this patch and it is a very good patch
>>>> that should also go into 2.6.33. It is necessary in the rare
>>>> case when one inode have more then one open_context.
>>>
>>> Do you mean more than one open context per open owner?
>>
>> What we see is one "regular" open context and one which is the layout_commit_ctx
>
> Isn't that a BUG?
>
> Here is what we're seeing:
>
> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context->
> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode,
>
> This is the same code that Boaz's 'missing small hunk' adds the wait
> to the pnfs_layout_commit_inode to. This is good, because when
> __nfs4_close is called with sync, every thing must be sent/returned
> prior to the nfs_do_close call.
>
> But there is still a problem. Here is the __nfs4_close call (with out
> the wait that Boaz added)
>
> if (nfsi->layoutcommit_ctx)
> pnfs_layoutcommit_inode(state->inode, 0);
> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
> struct nfs4_pnfs_layout_segment range;
>
> range.iomode = nfsi->layout.roc_iomode;
> range.offset = 0;
> range.length = NFS4_MAX_UINT64;
> pnfs_return_layout(state->inode, &range, NULL,
> RETURN_FILE);
>
> Note that a pnfs_return_layout which is always 'sync' (async with wait
> for completion). So the (currently) async LAYOUTCOMMIT call returns,
> and a LAYOUTRETURN is put on the wire
>
> Then the LAYOUTCOMMIT rpc_call_done routine calls
> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open
> context is different from the open context that was put by
> nfs_file_release, then
>
> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close
> and the return on close LAYOUTRETURN is sent again!
>
> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses
> the same stateid as the first LAYOUTRETURN, and the reply to the
> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>
> This will occur whether the LAYOUTCOMMIT is async or sync as they both
> call pnfs_layoutcommit_done.
>
> I need to understand why there are two open contexts. On the face of
> it, it seems wrong.
>
> We also add a pointer to the open context in the nfs_write_data, and
> in the pnfs_layoutcommit_data. Do we take a reference on the open data
> in theses cases? .
>
> -->Andy
>
>

Yes on all acounts. This is what Benny's patch is fixing please try it out
it is most important (Add my small fix)

And about the multiple contexts I don't understand as well but the fact of it
is that an nfs_inode has an list_head of open_contexts and we must deal properly
with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in
prev kernels it almost never trigger.)

Benny's patch takes care of that and I've done heavy testing of it I can see cases
of more then one contexts and it works correctly.

But I too would like to understand when is it that an inode can have more then
one open_context

Boaz

>
>>
>> Benny
>>
>>>
>>> -->Andy
>>>
>>>>
>>>> (For some reason I see that happening much more in 2.6.34
>>>> I don't understand why)
>>>>
>>>> Boaz
>>>> ---
>>>> git diff --stat -p -M
>>>> fs/nfs/nfs4state.c | 2 +-
>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index 15c8bc8..6dbe893 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>
>>>> if (nfsi->layoutcommit_ctx)
>>>> - pnfs_layoutcommit_inode(state->inode, 0);
>>>> + pnfs_layoutcommit_inode(state->inode, wait);
>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
>>>> struct nfs4_pnfs_layout_segment range;
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2010-05-14 03:28:47

by Zhang Jingwang

[permalink] [raw]
Subject: Re: SQUASHME: missing from FIXME: async layout return

2010/5/14 Benny Halevy <[email protected]>:
> On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <androsada=
[email protected]> wrote:
>> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <[email protected]=
> wrote:
>>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy-C4P08NqkoRmIwRZHo2/[email protected]=
m> wrote:
>>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <andro=
[email protected]> wrote:
>>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.=
com> wrote:
>>>>>>> I've tested the patch:
>>>>>>> =A0 =A0 =A0 =A0FIXME: async layout return
>>>>>>>
>>>>>>> And there is a missing small hunk
>>>>>>>
>>>>>>> I have tested with this patch and it is a very good patch
>>>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>>>> case when one inode have more then one open_context.
>>>>>>
>>>>>> Do you mean more than one open context per open owner?
>>>>>
>>>>> What we see is one "regular" open context and one which is the la=
yout_commit_ctx
>>>>
>>>> Isn't that a BUG?
>>>>
>>>> Here is what we're seeing:
>>>>
>>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_=
nfs_open_context->
>>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pn=
fs_layoutcommit_inode,
>>>>
>>>> This is the same code that Boaz's 'missing small hunk' adds the wa=
it
>>>> to the pnfs_layout_commit_inode to. This is good, because when
>>>> __nfs4_close is called with sync, every thing must be sent/returne=
d
>>>> prior to the nfs_do_close call.
>>>>
>>>> But there is still a problem. Here is the __nfs4_close call (with =
out
>>>> the wait that Boaz added)
>>>>
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (nfsi->layoutcommit_ctx)
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_=
inode(state->inode, 0);
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (has_layout(nfsi) && nfsi->layo=
ut.roc_iomode) {
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_pnfs_l=
ayout_segment range;
>>>>
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.iomode =3D n=
fsi->layout.roc_iomode;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.offset =3D 0=
;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.length =3D N=
=46S4_MAX_UINT64;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_return_layout=
(state->inode, &range, NULL,
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0RETURN_FILE);
>>>>
>>>> Note that a pnfs_return_layout which is always 'sync' (async with =
wait
>>>> for completion). So the (currently) async LAYOUTCOMMIT call return=
s,
>>>> and a LAYOUTRETURN is put on the wire
>>>>
>>>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the op=
en
>>>> context is different from the open context that was put by
>>>> nfs_file_release, then
>>>>
>>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_clos=
e
>>>> and the return on close LAYOUTRETURN is sent again!
>>>>
>>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or u=
ses
>>>> the same stateid as the first LAYOUTRETURN, and the reply to the
>>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>>>
>>>> This will occur whether the LAYOUTCOMMIT is async or sync as they =
both
>>>> call pnfs_layoutcommit_done.
>>>>
>>>> I need to understand why there are two open contexts. On the face =
of
>>>> it, it seems wrong.
>>>>
>>>> We also add a pointer to the open context in the nfs_write_data, a=
nd
>>>> in the pnfs_layoutcommit_data. Do we take a reference on the open =
data
>>>> in theses cases? .
>>>>
>>>> -->Andy
>>>>
>>>>
>>>
>>> Yes on all acounts. This is what Benny's patch is fixing please try=
it out
>>> it is most important (Add my small fix)
>>
>> Is this what you mean?
>>
>> With Benny's patch, with return-on-close set and the layoutcommit_ct=
x
>> different from the open context that caused the __nfs4_close:
>>
>> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
>> __nfs4_close() will complete prior to calling the return-on-close
>> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
>> put_nfs_open_context will be call on the layoutconmit_ctx (saved in
>> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.
>>
>> In _this_ instance of calling __nfs4_close (from
>> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null =
so
>> the pnfs_layoutcommit_inode call is skipped, and =A0pnfs_return_layo=
ut
>> is called. Since it is also a sync call, pnfs_layoutcommit_done does
>> not return until it is complete. So the layout is returned, the layo=
ut
>> is freed (all layout segments 'cause the range covers the whole file=
)
>> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
>> nfs_file_release)
>>
>> Now since return-on-close is set, pnfs_layout_return is called (in t=
he
>> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
>> on the wire because the first pnfs_layout_return (from the
>> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
>> and all is well.
>>
>> Sheese.
>
> Yeah :-/
> I'm feeling increasingly uncomfortable with the current implementatio=
n
> and we're planning to come up with the state-machine based model
> for the Bakeathon and test it there. =A0Ideally, this will be stable =
enough
> that we can get the gist of it into pnfs-submit.
>
> Benny
>

Is there any document about the state-machine based model?
>>
>> --->Andy
>>
>>
>>>
>>> And about the multiple contexts I don't understand as well but the =
fact of it
>>> is that an nfs_inode has an list_head of open_contexts and we must =
deal properly
>>> with when that happens. (BTW with 2.6.34 Kernel it happens regularl=
y as in
>>> prev kernels it almost never trigger.)
>>>
>>> Benny's patch takes care of that and I've done heavy testing of it =
I can see cases
>>> of more then one contexts and it works correctly.
>>>
>>> But I too would like to understand when is it that an inode can hav=
e more then
>>> one open_context
>>>
>>> Boaz
>>>
>>>>
>>>>>
>>>>> Benny
>>>>>
>>>>>>
>>>>>> -->Andy
>>>>>>
>>>>>>>
>>>>>>> (For some reason I see that happening much more in 2.6.34
>>>>>>> =A0I don't understand why)
>>>>>>>
>>>>>>> Boaz
>>>>>>> ---
>>>>>>> git diff --stat -p -M
>>>>>>> =A0fs/nfs/nfs4state.c | =A0 =A02 +-
>>>>>>> =A01 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>>> index 15c8bc8..6dbe893 100644
>>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path,=
struct nfs4_state *state, fmode_t fm
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I=
(state->inode);
>>>>>>>
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx)
>>>>>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit=
_inode(state->inode, 0);
>>>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit=
_inode(state->inode, wait);
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->la=
yout.roc_iomode) {
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs=
_layout_segment range;
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux=
-nfs" in
>>>>>>> the body of a message to [email protected]
>>>>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info=
=2Ehtml
>>>>>>>
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nf=
s" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.ht=
ml
>>>>
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>



--=20
Zhang Jingwang
National Research Centre for High Performance Computers
Institute of Computing Technology, Chinese Academy of Sciences
No. 6, South Kexueyuan Road, Haidian District
Beijing, China