Date: Mon, 9 Sep 2013 12:32:09 -0500
From: Quentin Barnes <qbarnes@gmail.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
        "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: nfs-backed mmap file results in 1000s of WRITEs per second
Message-ID: <20130909173209.GA28353@gmail.com>
References: <20130905162110.GA17920@gmail.com>
 <20130905170303.GB17330@us.ibm.com>
 <20130905191139.GA20830@gmail.com>
 <1378411320.5450.27.camel@leira.trondhjem.org>
 <20130905213649.GA21944@gmail.com>
 <1378418243.5450.29.camel@leira.trondhjem.org>
 <20130905223420.GA23192@gmail.com>
 <20130906093636.6818e7b2@corrin.poochiereds.net>
 <CAKjHkpBW+LWKuKuHnUfKNxwDZeX3SOFKv_jYeNGF8ezdAnKnvg@mail.gmail.com>
 <20130909090424.1a780b49@tlielax.poochiereds.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20130909090424.1a780b49@tlielax.poochiereds.net>
Sender: linux-nfs-owner@vger.kernel.org

On Mon, Sep 09, 2013 at 09:04:24AM -0400, Jeff Layton wrote:
> On Fri, 6 Sep 2013 11:48:45 -0500
> Quentin Barnes <qbarnes@gmail.com> wrote:
> 
> > Jeff, can you try out my test program in the base note on your
> > RHEL5.9 or later RHEL5.x kernels?
> > 
> > I reverified that running the test on a 2.6.18-348.16.1.el5 x86_64
> > kernel (latest released RHEL5.9) does not show the problem for me.
> > Based on what you and Trond have said in this thread though, I'm
> > really curious why it doesn't have the problem.
> 
> I can confirm what you see on RHEL5. One difference is that RHEL5's
> page_mkwrite handler does not do wait_on_page_writeback. That was added
> as part of the stable pages work that went in a while back, so that may 
> be the main difference. Adding that in doesn't seem to materially
> change things though.

Good to know you confirmed the behavior I saw on RHEL5 (just so that
I know it's not some random variable in play I had overlooked).

> In any case, what I see is that the initial program just ends up with
> two calls to nfs_vm_page_mkwrite(). They both push out a WRITE and then
> things settle down (likely because the page is still marked dirty).
> 
> Eventually, another write occurs and the dirty page gets pushed out to
> the server in a small flurry of WRITEs to the same range. Then, things
> settle down again until there's another small flurry of activity.
> 
> My suspicion is that there is a race condition involved here, but I'm
> unclear on where it is. I'm not 100% convinced this is a bug, but page
> fault semantics aren't my strong suit.

As a test on RHEL6, I made a trivial systemtap script for kprobing
nfs_vm_page_mkwrite() and nfs_flush_incompatible().  I wanted to
make sure this bug was limited to just the nfs module and was not a
result of some mm behavior change.

When I run the test program on a kernel without the fix,
nfs_vm_page_mkwrite() and nfs_flush_incompatible() are called
repeatedly at a very high rate (hence all the WRITEs).
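
For anyone reading without the base note handy, the kind of workload
involved is repeated stores to a shared file mapping.  The following is
only a minimal sketch of that pattern, not the actual test program from
the base note; the function name dirty_mapped_page() and its parameters
are made up for illustration.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the test program from the base note):
 * map the first page of `path` MAP_SHARED and store to it
 * `iterations` times.  Returns 0 on success, -1 on any error.
 */
int dirty_mapped_page(const char *path, long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Make sure the file is large enough to back one full page. */
    if (ftruncate(fd, (off_t)page) < 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* volatile so the compiler does not collapse the repeated stores. */
    volatile unsigned char *p = map;
    for (long i = 0; i < iterations; i++)
        p[0] = (unsigned char)i;

    int rc = munmap(map, (size_t)page);
    close(fd);
    return rc;
}
```

Calling it with a path on an NFS mount is the intended use.  Whether a
loop like this triggers the WRITE flood on a given kernel is an
assumption on my part, so treat it as a starting point only.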

After Trond's patch, the two functions are called at the program's
initialization and then only about every 30 seconds.

From the code flow, it looks to me like nfs_wb_page() must do
something that removes the need for mm to keep reinvoking
nfs_vm_page_mkwrite().  I didn't look any deeper than that for now.
Maybe the race you're thinking of is in how nfs_wb_page() updates
status?

> You may want to consider opening a "formal" RH support case if you have
> interest in getting Trond's patch backported, and/or following up on
> why RHEL5 behaves the way it does.

Yes, I'll be doing that.  When I do, I'll send you an email with the
case ticket.  Before filing it though, I want to hear back from the
group that had the original problem to make sure Trond's patch fully
addresses their problem (besides just the trivial test program).

> -- 
> Jeff Layton <jlayton@redhat.com>

Quentin