Return-Path: Received: from mail-gw0-f46.google.com ([74.125.83.46]:57119 "EHLO mail-gw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932281Ab0EMQEt convert rfc822-to-8bit (ORCPT ); Thu, 13 May 2010 12:04:49 -0400 Received: by gwj19 with SMTP id 19so704026gwj.19 for ; Thu, 13 May 2010 09:04:47 -0700 (PDT) In-Reply-To: <4BEC1B43.2040901@panasas.com> References: <4BEBBAE9.8090605@panasas.com> <4BEC0F78.5060203@panasas.com> <4BEC1B43.2040901@panasas.com> Date: Thu, 13 May 2010 12:04:46 -0400 Message-ID: Subject: Re: SQUASHME: missing from FIXME: async layout return From: "William A. (Andy) Adamson" To: Boaz Harrosh Cc: Benny Halevy , NFS list Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-nfs-owner@vger.kernel.org List-ID: MIME-Version: 1.0 On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh wrote: > On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: >> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy wrote: >>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" wrote: >>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh wrote: >>>>> I've tested the patch: >>>>> ? ? ? ?FIXME: async layout return >>>>> >>>>> And there is a missing small hunk >>>>> >>>>> I have tested with this patch and it is a very good patch >>>>> that should also go into 2.6.33. It is necessary in the rare >>>>> case when one inode have more then one open_context. >>>> >>>> Do you mean more than one open context per open owner? >>> >>> What we see is one "regular" open context and one which is the layout_commit_ctx >> >> Isn't that a BUG? >> >> Here is what we're seeing: >> >> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context-> >> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode, >> >> This is the same code that Boaz's 'missing small hunk' adds the wait >> to the pnfs_layout_commit_inode to. This is good, because when >> __nfs4_close is called with sync, every thing must be sent/returned >> prior to the nfs_do_close call. >> >> But there is still a problem. Here is the __nfs4_close call (with out >> the wait that Boaz added) >> >> ? ? ? ? ? ? ? ? if (nfsi->layoutcommit_ctx) >> ? ? ? ? ? ? ? ? ? ? ? ? pnfs_layoutcommit_inode(state->inode, 0); >> ? ? ? ? ? ? ? ? if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >> ? ? ? ? ? ? ? ? ? ? ? ? struct nfs4_pnfs_layout_segment range; >> >> ? ? ? ? ? ? ? ? ? ? ? ? range.iomode = nfsi->layout.roc_iomode; >> ? ? ? ? ? ? ? ? ? ? ? ? range.offset = 0; >> ? ? ? ? ? ? ? ? ? ? ? ? range.length = NFS4_MAX_UINT64; >> ? ? ? ? ? ? ? ? ? ? ? ? pnfs_return_layout(state->inode, &range, NULL, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?RETURN_FILE); >> >> Note that a pnfs_return_layout which is always 'sync' (async with wait >> for completion). So the (currently) async LAYOUTCOMMIT call returns, >> and a LAYOUTRETURN is put on the wire >> >> Then the LAYOUTCOMMIT rpc_call_done routine calls >> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open >> context is different from the open context that was put by >> nfs_file_release, then >> >> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close >> and the return on close LAYOUTRETURN is sent again! >> >> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses >> the same stateid as the first LAYOUTRETURN, and the reply to the >> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... >> >> This will occur whether the LAYOUTCOMMIT is async or sync as they both >> call pnfs_layoutcommit_done. >> >> I need to understand why there are two open contexts. On the face of >> it, it seems wrong. >> >> We also add a pointer to the open context in the nfs_write_data, and >> in the pnfs_layoutcommit_data. Do we take a reference on the open data >> in theses cases? . >> >> -->Andy >> >> > > Yes on all acounts. This is what Benny's patch is fixing please try it out > it is most important (Add my small fix) Is this what you mean? With Benny's patch, with return-on-close set and the layoutcommit_ctx different from the open context that caused the __nfs4_close: The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release __nfs4_close() will complete prior to calling the return-on-close LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called, put_nfs_open_context will be call on the layoutconmit_ctx (saved in pnfs_layoutcommit_data->ctx) and __nfs4_close will be called. In _this_ instance of calling __nfs4_close (from pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout is called. Since it is also a sync call, pnfs_layoutcommit_done does not return until it is complete. So the layout is returned, the layout is freed (all layout segments 'cause the range covers the whole file) and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from nfs_file_release) Now since return-on-close is set, pnfs_layout_return is called (in the nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go on the wire because the first pnfs_layout_return (from the pnfs_layoutcommit_done __nfs4_close) has removed the layout segment, and all is well. Sheese. --->Andy > > And about the multiple contexts I don't understand as well but the fact of it > is that an nfs_inode has an list_head of open_contexts and we must deal properly > with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in > prev kernels it almost never trigger.) > > Benny's patch takes care of that and I've done heavy testing of it I can see cases > of more then one contexts and it works correctly. > > But I too would like to understand when is it that an inode can have more then > one open_context > > Boaz > >> >>> >>> Benny >>> >>>> >>>> -->Andy >>>> >>>>> >>>>> (For some reason I see that happening much more in 2.6.34 >>>>> ?I don't understand why) >>>>> >>>>> Boaz >>>>> --- >>>>> git diff --stat -p -M >>>>> ?fs/nfs/nfs4state.c | ? ?2 +- >>>>> ?1 files changed, 1 insertions(+), 1 deletions(-) >>>>> >>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>>>> index 15c8bc8..6dbe893 100644 >>>>> --- a/fs/nfs/nfs4state.c >>>>> +++ b/fs/nfs/nfs4state.c >>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm >>>>> ? ? ? ? ? ? ? ?struct nfs_inode *nfsi = NFS_I(state->inode); >>>>> >>>>> ? ? ? ? ? ? ? ?if (nfsi->layoutcommit_ctx) >>>>> - ? ? ? ? ? ? ? ? ? ? ? pnfs_layoutcommit_inode(state->inode, 0); >>>>> + ? ? ? ? ? ? ? ? ? ? ? pnfs_layoutcommit_inode(state->inode, wait); >>>>> ? ? ? ? ? ? ? ?if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>>>> ? ? ? ? ? ? ? ? ? ? ? ?struct nfs4_pnfs_layout_segment range; >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html >>>>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at ?http://vger.kernel.org/majordomo-info.html >> > >