From: David Wysochanski
Date: Tue, 29 Jun 2021 11:29:08 -0400
Subject: Re: [PATCH 4/4] NFS: Fix fscache read from NFS after cache error
To: Trond Myklebust
Cc: linux-nfs@vger.kernel.org, anna.schumaker@netapp.com

On Tue, Jun 29, 2021 at 10:54 AM Trond Myklebust wrote:
>
> On Tue, 2021-06-29 at 09:20 -0400, David Wysochanski wrote:
> > On Tue, Jun 29, 2021 at 8:46 AM Trond Myklebust wrote:
> > >
> > > On Tue, 2021-06-29 at 05:17 -0400, David Wysochanski wrote:
> > > > On Mon, Jun 28, 2021 at 8:39 PM Trond Myklebust wrote:
> > > > >
> > > > > On Mon, 2021-06-28 at 19:46 -0400, David Wysochanski wrote:
> > > > > > On Mon, Jun 28, 2021 at 5:59 PM Trond Myklebust wrote:
> > > > > > >
> > > > > > > On Mon, 2021-06-28 at 17:12 -0400, David Wysochanski wrote:
> > > > > > > > On Mon, Jun 28, 2021 at 3:09 PM Trond Myklebust wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 2021-06-28 at 13:39 -0400, Dave Wysochanski wrote:
> > > > > > > > > > Earlier commits refactored some NFS read code and removed
> > > > > > > > > > nfs_readpage_async(), but neglected to properly fix up
> > > > > > > > > > nfs_readpage_from_fscache_complete().  The code path is
> > > > > > > > > > only hit when something unusual occurs with the cachefiles
> > > > > > > > > > backing filesystem, such as an IO error or while a cookie
> > > > > > > > > > is being invalidated.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > > > > > > ---
> > > > > > > > > >  fs/nfs/fscache.c | 14 ++++++++++++--
> > > > > > > > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> > > > > > > > > > index c4c021c6ebbd..d308cb7e1dd4 100644
> > > > > > > > > > --- a/fs/nfs/fscache.c
> > > > > > > > > > +++ b/fs/nfs/fscache.c
> > > > > > > > > > @@ -381,15 +381,25 @@ static void nfs_readpage_from_fscache_complete(struct page *page,
> > > > > > > > > >  					       void *context,
> > > > > > > > > >  					       int error)
> > > > > > > > > >  {
> > > > > > > > > > +	struct nfs_readdesc desc;
> > > > > > > > > > +	struct inode *inode = page->mapping->host;
> > > > > > > > > > +
> > > > > > > > > >  	dfprintk(FSCACHE,
> > > > > > > > > >  		 "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
> > > > > > > > > >  		 page, context, error);
> > > > > > > > > >
> > > > > > > > > > -	/* if the read completes with an error, we just unlock the page and let
> > > > > > > > > > -	 * the VM reissue the readpage */
> > > > > > > > > >  	if (!error) {
> > > > > > > > > >  		SetPageUptodate(page);
> > > > > > > > > >  		unlock_page(page);
> > > > > > > > > > +	} else {
> > > > > > > > > > +		desc.ctx = context;
> > > > > > > > > > +		nfs_pageio_init_read(&desc.pgio, inode, false,
> > > > > > > > > > +				     &nfs_async_read_completion_ops);
> > > > > > > > > > +		error = readpage_async_filler(&desc, page);
> > > > > > > > > > +		if (error)
> > > > > > > > > > +			return;
> > > > > > > > >
> > > > > > > > > This code path can clearly fail too.  Why can we not fix this
> > > > > > > > > code to allow it to return that reported error, so that we
> > > > > > > > > can handle the failure case in nfs_readpage() instead of
> > > > > > > > > dead-ending here?
> > > > > > > >
> > > > > > > > Maybe the below patch is what you had in mind?  That way, if
> > > > > > > > fscache is enabled, nfs_readpage() should behave the same way
> > > > > > > > as if it's not for the case where an IO error occurs in the
> > > > > > > > NFS read completion path.
> > > > > > > >
> > > > > > > > If we call into fscache and we get back that the IO has been
> > > > > > > > submitted, wait until it is completed, so we'll catch any IO
> > > > > > > > errors in the read completion path.  This does not solve the
> > > > > > > > "catch the internal errors" case, IOW the ones that show up as
> > > > > > > > pg_error; that will probably require copying pg_error into the
> > > > > > > > nfs_open_context.error field.
> > > > > > > >
> > > > > > > > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> > > > > > > > index 78b9181e94ba..28e3318080e0 100644
> > > > > > > > --- a/fs/nfs/read.c
> > > > > > > > +++ b/fs/nfs/read.c
> > > > > > > > @@ -357,13 +357,13 @@ int nfs_readpage(struct file *file, struct page *page)
> > > > > > > >  	} else
> > > > > > > >  		desc.ctx = get_nfs_open_context(nfs_file_open_context(file));
> > > > > > > >
> > > > > > > > +	xchg(&desc.ctx->error, 0);
> > > > > > > >  	if (!IS_SYNC(inode)) {
> > > > > > > >  		ret = nfs_readpage_from_fscache(desc.ctx, inode, page);
> > > > > > > >  		if (ret == 0)
> > > > > > > > -			goto out;
> > > > > > > > +			goto out_wait;
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > -	xchg(&desc.ctx->error, 0);
> > > > > > > >  	nfs_pageio_init_read(&desc.pgio, inode, false,
> > > > > > > >  			     &nfs_async_read_completion_ops);
> > > > > > > >
> > > > > > > > @@ -373,6 +373,7 @@ int nfs_readpage(struct file *file, struct page *page)
> > > > > > > >
> > > > > > > >  	nfs_pageio_complete_read(&desc.pgio);
> > > > > > > >  	ret = desc.pgio.pg_error < 0 ? desc.pgio.pg_error : 0;
> > > > > > > > +out_wait:
> > > > > > > >  	if (!ret) {
> > > > > > > >  		ret = wait_on_page_locked_killable(page);
> > > > > > > >  		if (!PageUptodate(page) && !ret)
> > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +		nfs_pageio_complete_read(&desc.pgio);
> > > > > > > > > >  	}
> > > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Trond Myklebust
> > > > > > > > > Linux NFS client maintainer, Hammerspace
> > > > > > > > > trond.myklebust@hammerspace.com
> > > > > > >
> > > > > > > Yes, please.  This avoids the duplication of NFS read code in
> > > > > > > the fscache layer.
> > > > > >
> > > > > > If you mean patch 4, we still need that - I don't see any way to
> > > > > > avoid it.  The above will just make the fscache-enabled path wait
> > > > > > for the IO to complete, the same as the non-fscache case.
> > > > >
> > > > > With the above, you can simplify patch 4/4 to just make the page
> > > > > unlock unconditional on error, no?
> > > > >
> > > > > i.e.
> > > > > 	if (!error)
> > > > > 		SetPageUptodate(page);
> > > > > 	unlock_page(page);
> > > > >
> > > > > End result: the client just does the same check as before and lets
> > > > > the vfs/mm decide, based on the status of the PG_uptodate flag,
> > > > > what to do next.  I'm assuming that a retry won't cause fscache to
> > > > > do another bio attempt?
> > > >
> > > > Yes, I think you're right, and I'm following - let me test it and
> > > > I'll send a v2.  Then we can drop patch #3, right?
> > >
> > > Sounds good.  Thanks Dave!
> >
> > This approach works, but it differs from the original when an fscache
> > error occurs.  The original (see below) would call back into NFS to
> > read from the server, but now we just let the VM handle it.  The VM
> > will re-issue the read, but it will go back into fscache again
> > (because it's enabled), which may fail again.
>
> How about marking the page on failure, then?
> I don't believe we currently use PG_owner_priv_1 (a.k.a. PageOwnerPriv1,
> PageChecked, PagePinned, PageForeign, PageSwapCache, PageXenRemapped)
> for anything, and according to legend it is supposed to be usable by
> the fs for page cache pages.
>
> So what say we use SetPageChecked() to mark the page as having failed
> retrieval from fscache?

So this?  I confirm this patch, on top of the one I just sent, works.
Want me to merge them together and send a v3?

Author: Dave Wysochanski <dwysocha@redhat.com>
Date:   Tue Jun 29 11:10:15 2021 -0400

    NFS: Mark page with PG_checked if fscache IO completes in error

    If fscache is enabled and we try to read from fscache, but the IO
    fails, mark the page with PG_checked.  Then, when the VM re-issues
    the IO, skip over fscache and just read from the server.

    Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 0966e147e973..687e98b08994 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -404,10 +404,12 @@ static void nfs_readpage_from_fscache_complete(struct page *page,
 		 "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
 		 page, context, error);
 
-	/* if the read completes with an error, unlock the page and let
-	 * the VM reissue the readpage */
+	/* if the read completes with an error, mark the page with PG_checked,
+	 * unlock the page, and let the VM reissue the readpage */
 	if (!error)
 		SetPageUptodate(page);
+	else
+		SetPageChecked(page);
 	unlock_page(page);
 }
 
@@ -423,6 +425,11 @@ int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
 		 "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
 		 nfs_i_fscache(inode), page, page->index, page->flags, inode);
 
+	if (PageChecked(page)) {
+		ClearPageChecked(page);
+		return 1;
+	}
+
 	ret = fscache_read_or_alloc_page(nfs_i_fscache(inode),
 					 page,
 					 nfs_readpage_from_fscache_complete,
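
For reference, below is a minimal sketch of the retry flow the two
patches above combine to produce.  It targets the pre-netfs fscache API
in use here (fscache_read_or_alloc_page() and the PG_checked page flag).
The error handling is condensed, and the trailing ctx/GFP_KERNEL
arguments are an assumption since the quoted patch is truncated above,
so treat this as an illustration of the mechanism rather than the merged
kernel code.

/* Sketch, not merged code: returns 0 if the cache IO was submitted
 * (the completion callback will unlock the page), or 1 to tell
 * nfs_readpage() to fall back to a normal read from the server. */
int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
				struct inode *inode, struct page *page)
{
	int ret;

	/* Retry after a cache failure: the completion callback marked
	 * the page with PG_checked, so skip fscache this time around. */
	if (PageChecked(page)) {
		ClearPageChecked(page);
		return 1;
	}

	/* First attempt: submit the cache IO.  If it completes in
	 * error, nfs_readpage_from_fscache_complete() sets PG_checked
	 * and unlocks the page without PG_uptodate, the VM re-issues
	 * ->readpage(), and the branch above takes the server path. */
	ret = fscache_read_or_alloc_page(nfs_i_fscache(inode), page,
					 nfs_readpage_from_fscache_complete,
					 ctx, GFP_KERNEL);
	if (ret == 0)
		return 0;	/* read submitted to the cache */
	return 1;		/* cache miss or error: read from server */
}

This also answers the "retry won't cause another bio attempt" question
above: the retry never reaches fscache_read_or_alloc_page() a second
time, so a failing cache cannot loop.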