Return-Path:
Received: from mail-vx0-f174.google.com ([209.85.220.174]:62993 "EHLO
	mail-vx0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751925Ab1GZRc6 convert rfc822-to-8bit (ORCPT );
	Tue, 26 Jul 2011 13:32:58 -0400
Received: by vxh35 with SMTP id 35so485491vxh.19 for ;
	Tue, 26 Jul 2011 10:32:58 -0700 (PDT)
In-Reply-To: <2E1EB2CF9ED1CB4AA966F0EB76EAB4430A51825D@SACMVEXC2-PRD.hq.netapp.com>
References: <1309743002-1658-1-git-send-email-bergwolf@gmail.com>
	<4E18614C.4010002@tonian.com>
	<1311621204.28209.14.camel@lade.trondhjem.org>
	<2E1EB2CF9ED1CB4AA966F0EB76EAB4430A51825D@SACMVEXC2-PRD.hq.netapp.com>
From: Peng Tao
Date: Wed, 27 Jul 2011 01:32:38 +0800
Message-ID:
Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
To: "Myklebust, Trond"
Cc: tao.peng@emc.com, linux-nfs@vger.kernel.org, bhalevy@tonian.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

On Tue, Jul 26, 2011 at 11:50 PM, Myklebust, Trond wrote:
>> -----Original Message-----
>> From: Peng Tao [mailto:bergwolf@gmail.com]
>> Sent: Tuesday, July 26, 2011 11:37 AM
>> To: Myklebust, Trond
>> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com
>> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>>
>> Hi, Trond,
>>
>> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust wrote:
>> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote:
>> >> Hi, Trond,
>> >>
>> >> Any comments on this patch? I still get kernel crash when pnfs write
>> >> is attempted but fails and calls pnfs_ld_write_done(). It seems object
>> >> layout uses the same code path as well. But I don't find the patch in
>> >> either your tree or Benny's tree. Are there any concerns?
>> >>
>> >> Thanks,
>> >> Tao
>> >
>> > The whole pnfs_ld_write_done thing is bogus and needs to be replaced
>> > with something sane. It is trying to initiate a WRITE RPC call with the
>> > wrong block size, and is calling the MDS rpc_call_done() and
>> > rpc_release() with an uninitialised rpc task pointer.
>> >
>> > Ditto for pnfs_ld_read_done.
>> Thanks for your explanation. Is there any plan on how to fix
>> pnfs_ld_read/write_done? Basically, we would need an interface that
>> can redirect the IO to MDS if pnfs_error is set or do all necessary
>> cleanup work to end read/write if pnfs_error is 0. IMHO, the
>> recoalesce logic need to access nfs_pageio_descriptor but we do not
>> have that information at pnfs_ld_read/write_done.
>
> As far as I can see, the right thing to do is to mark the layout as
> invalid and then redirty the page. It should be easy to have fsync()
> re-send the pages in this case. These should be extremely rare events,
> since we expect to catch most of the pNFS failures when we do the
> actual LAYOUTGET in the ->pg_init().

Agreed. This should be easier than re-coalescing and sending to MDS at
read/write_done.

>
> My main worry is for aio/dio where there is no good mechanism for
> retrying. I'm still working on that...

For dio, we may have to send the failed pages to MDS instead of relying
on next fsync() to retry.

Thanks,
Tao

>
> Cheers
>   Trond
>
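
The recovery idea discussed above (mark the layout invalid, redirty the pages, and let a later fsync() re-send them through the MDS) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: toy_layout, toy_page, toy_write_done() and toy_fsync() are hypothetical names invented for illustration and do not correspond to any kernel structure or function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not kernel structures. */
struct toy_layout {
	bool valid;	/* false once a pNFS I/O through this layout has failed */
};

struct toy_page {
	bool dirty;	/* page still needs to be written back */
	bool via_mds;	/* last write-back went through the MDS */
};

/*
 * Model of a write completion. On success the pages are clean. On a
 * pNFS error the layout is marked invalid and every page is redirtied,
 * so nothing is re-sent from the completion path itself.
 */
static void toy_write_done(struct toy_layout *lo, struct toy_page *pages,
			   size_t n, int pnfs_error)
{
	size_t i;

	if (pnfs_error == 0) {
		for (i = 0; i < n; i++)
			pages[i].dirty = false;
		return;
	}

	lo->valid = false;
	for (i = 0; i < n; i++)
		pages[i].dirty = true;
}

/*
 * Model of an fsync()-style pass: every dirty page is written again,
 * through the MDS when the layout is no longer valid. Returns the
 * number of pages that were sent through the MDS.
 */
static size_t toy_fsync(const struct toy_layout *lo, struct toy_page *pages,
			size_t n)
{
	size_t i, sent_mds = 0;

	for (i = 0; i < n; i++) {
		if (!pages[i].dirty)
			continue;
		pages[i].via_mds = !lo->valid;
		if (pages[i].via_mds)
			sent_mds++;
		pages[i].dirty = false;
	}
	return sent_mds;
}
```

The model covers only the buffered write case. It says nothing about aio/dio, where the thread notes there is no comparable retry mechanism.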