From: David Howells
To: Linus Torvalds, Vishal Moola, Steve French
Cc: dhowells@redhat.com, Andrew Morton, Jan Kara, Paulo Alcantara,
    Matthew Wilcox, Huang Ying, Baolin Wang, Xin Hao, linux-mm@kvack.org,
    mm-commits@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] MM updates for 6.3-rc1
Date: Fri, 24 Feb 2023 09:04:48 +0000
Message-ID: <2009825.1677229488@warthog.procyon.org.uk>
References: <20230220135225.91b0f28344c01d5306c31230@linux-foundation.org>

Linus Torvalds wrote:

> Yes, I saw David Howells' resolution suggestion. I think that one
> was buggy. It would wait for a page under writeback, and then go on to
> the *next* one without writing it back. I don't think that was right.

You're right.  Vishal's patch introduced it into afs and I copied it
across and didn't notice it either then or on review of Vishal's patch.

He inserted the extra for-loop as he's now extracting a batch, but kept
the continue that used to repeat the extraction - except that now it
continues the wrong loop.  So afs will need fixing too.
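To illustrate the shape of the bug (a rough sketch only, not the actual
afs code; write_one() is a stand-in for the real lock-and-write steps):

	while ((nr = filemap_get_folios_tag(mapping, &index, end,
					    PAGECACHE_TAG_DIRTY, &fbatch))) {
		for (i = 0; i < nr; i++) {
			struct folio *folio = fbatch.folios[i];

			if (folio_test_writeback(folio)) {
				folio_wait_writeback(folio);
				/* This used to redo the single-folio
				 * lookup; now it just moves on to
				 * folios[i + 1], so the folio we waited
				 * for is never written back. */
				continue;
			}
			write_one(folio);	/* stand-in */
		}
		folio_batch_release(&fbatch);
	}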
The simplest ways to fix it, I think, are to just decrement the loop
counter before continuing or to stick a goto back to the beginning of the
loop (which is what you did in cifs).  But I'm not sure that's the correct
thing to do: the previous code dropped the found folio and then repeated
the search in case the folio got truncated, migrated or punched.  I
suspect that's probably what we should do.

Also, thinking about it again, I'm not sure whether fetching a batch with
filemap_get_folios_tag() like this in {afs,cifs}_writepages_region() is
necessarily the right thing to do.  There are three cases I'm thinking of:

 (1) A single folio is returned.  This is trivial.

 (2) A run of contiguous folios is returned.  {afs,cifs}_extend_writeback()
     is likely to write them back, in which case the batch is probably not
     useful.  Note that *_extend_writeback() walks the xarray directly
     itself as it wants contiguous folios and doesn't want to extract any
     folio it's not going to use.

 (3) A list of scattered folios is returned.  Granted, this is more
     efficient if nothing else interferes - but there could be other
     writes in the gaps that we then skip over, other flushes that render
     some of our list clean, or page invalidations.  This is a change in
     behaviour, but I'm not sure that matters too much, since a flush/sync
     can only be expected to write back what's modified at the time it is
     initiated.

     Further, processing each entry in the list is potentially very slow
     because we're doing a write across the network for each one (cifs
     might bump this into the background, but it might also have to
     (re)open a file handle on the server and wait for credits first to
     even begin the transaction).  That means all of the folios in the
     batch may get pinned for a long period of time - up to 14x for the
     last folio in the batch - which could prevent things like page
     migration.

     Further, we might not get to write out all the folios in the batch as
     *_extend_writeback() might hit the wbc limit first.

> That said, I'm not at all convinced my version is right either. I
> can't test it, and that means I probably messed up. It looked sane to
> me when I did it, and it builds cleanly, but I honestly doubt myself.

It doesn't seem to work.  A write seems to end with lots of:

	CIFS: VFS: No writable handle in writepages rc=-9

being emitted.  I'll poke further into it - there's always the possibility
that some other patch is interfering.

David
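For illustration, the drop-and-repeat variant I have in mind would be
something like this (a rough, untested sketch, not the actual afs/cifs
code; sketch_writepages_region() and write_one() are made-up names and the
folio locking is elided):

	static int sketch_writepages_region(struct address_space *mapping,
					    pgoff_t start, pgoff_t end)
	{
		struct folio_batch fbatch;
		unsigned int nr, i;
		pgoff_t index = start;

		folio_batch_init(&fbatch);
	redo:
		while ((nr = filemap_get_folios_tag(mapping, &index, end,
						    PAGECACHE_TAG_DIRTY,
						    &fbatch))) {
			for (i = 0; i < nr; i++) {
				struct folio *folio = fbatch.folios[i];

				if (folio_test_writeback(folio)) {
					pgoff_t from = folio->index;

					/* The batch holds a ref, so
					 * waiting here is safe. */
					folio_wait_writeback(folio);
					folio_batch_release(&fbatch);
					/* Repeat the search from this
					 * folio in case it got truncated,
					 * migrated or punched whilst we
					 * slept. */
					index = from;
					goto redo;
				}
				write_one(folio);	/* stand-in */
			}
			folio_batch_release(&fbatch);
			cond_resched();
		}
		return 0;
	}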