Message-ID: <1424046503.5538.23.camel@intel.com>
Subject: Re: [LKP] [SUNRPC] c4a7ca77494: +6.0% fsmark.time.involuntary_context_switches, no primary result change
From: Huang Ying
To: Trond Myklebust
Cc: LKML, LKP ML
Date: Mon, 16 Feb 2015 08:28:23 +0800
References: <1423987029.5538.16.camel@intel.com>

On Sun, 2015-02-15 at 13:02 -0500, Trond Myklebust wrote:
> Hi guys,
> 
> On Sun, Feb 15, 2015 at 2:57 AM, Huang Ying wrote:
> > FYI, we noticed the below changes on
> > 
> > commit c4a7ca774949960064dac11b326908f28407e8c3 ("SUNRPC: Allow waiting on memory allocation")
> > 
> > testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-f2fs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
> > 
> > 127b21b89f9d8ba0  c4a7ca774949960064dac11b32
> > ----------------  --------------------------
> >          %stddev      %change          %stddev
> >              \           |                 \
> >      52524 ±  0%      +6.0%      55672 ±  0%  fsmark.time.involuntary_context_switches
> >        436 ± 14%     +54.9%        676 ± 20%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >        433 ± 15%     +54.7%        670 ± 21%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >       8348 ±  7%     +27.0%      10602 ±  9%  sched_debug.cfs_rq[0]:/.min_vruntime
> >     190081 ± 13%     +32.7%     252269 ± 13%  sched_debug.cpu#0.sched_goidle
> >     205783 ± 12%     +30.2%     267903 ± 13%  sched_debug.cpu#0.ttwu_local
> >     464065 ± 11%     +26.6%     587524 ± 12%  sched_debug.cpu#0.nr_switches
> >     464278 ± 11%     +26.6%     587734 ± 12%  sched_debug.cpu#0.sched_count
> >      15807 ± 11%     +19.6%      18910 ± 12%  sched_debug.cpu#4.nr_load_updates
> >     300041 ±  8%     +20.3%     360969 ± 10%  sched_debug.cpu#0.ttwu_count
> >       1863 ±  9%     +18.1%       2201 ± 10%  sched_debug.cfs_rq[4]:/.exec_clock
> > 
> > testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-btrfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
> > 
> > 127b21b89f9d8ba0  c4a7ca774949960064dac11b32
> > ----------------  --------------------------
> >        fail:runs  %reproduction   fail:runs
> >            |            |             |
> >      52184 ±  0%      +5.6%      55122 ±  0%  fsmark.time.involuntary_context_switches
> >        557 ± 19%     +21.5%        677 ±  9%  sched_debug.cfs_rq[5]:/.blocked_load_avg
> >        217 ± 19%     -42.9%        124 ± 21%  sched_debug.cfs_rq[2]:/.load
> >      45852 ± 14%     -39.4%      27773 ± 24%  sched_debug.cpu#7.ttwu_local
> >        457 ± 18%     +50.1%        686 ± 20%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >        455 ± 18%     +46.7%        668 ± 19%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >      66605 ± 10%     -26.7%      48826 ± 14%  sched_debug.cpu#7.sched_goidle
> >      78249 ±  9%     -22.5%      60678 ± 11%  sched_debug.cpu#7.ttwu_count
> >     153506 ±  9%     -22.7%     118649 ± 12%  sched_debug.cpu#7.nr_switches
> >     153613 ±  9%     -22.7%     118755 ± 12%  sched_debug.cpu#7.sched_count
> >      15806 ±  6%     +19.2%      18833 ± 18%  sched_debug.cpu#4.nr_load_updates
> >       2171 ±  5%     +15.6%       2510 ± 13%  sched_debug.cfs_rq[4]:/.exec_clock
> >       9924 ± 11%     -27.0%       7244 ± 25%  sched_debug.cfs_rq[3]:/.min_vruntime
> >       3156 ±  4%     -13.4%       2734 ±  8%  sched_debug.cfs_rq[7]:/.min_vruntime
> > 
> > testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-ext4-nfsv4-9B-400M-fsyncBeforeClose-16d-256fpd
> > 
> > 127b21b89f9d8ba0  c4a7ca774949960064dac11b32
> > ----------------  --------------------------
> >     104802 ±  0%      +7.7%     112883 ±  0%  fsmark.time.involuntary_context_switches
> >     471755 ±  0%      -1.3%     465592 ±  0%  fsmark.time.voluntary_context_switches
> >       1977 ± 36%     +90.8%       3771 ±  8%  sched_debug.cpu#4.curr->pid
> >          2 ± 34%     +80.0%          4 ± 24%  sched_debug.cpu#6.cpu_load[1]
> >          4 ± 33%     +83.3%          8 ± 31%  sched_debug.cpu#6.cpu_load[0]
> >        193 ± 17%     +48.0%        286 ± 19%  sched_debug.cfs_rq[2]:/.blocked_load_avg
> >        196 ± 17%     +47.5%        290 ± 19%  sched_debug.cfs_rq[2]:/.tg_load_contrib
> >         96 ± 18%     +40.6%        135 ± 11%  sched_debug.cfs_rq[7]:/.load
> >         97 ± 18%     +38.5%        135 ± 11%  sched_debug.cpu#7.load
> >       2274 ±  7%     -16.5%       1898 ±  3%  proc-vmstat.pgalloc_dma
> >        319 ±  6%     -29.7%        224 ± 24%  sched_debug.cfs_rq[1]:/.tg_load_contrib
> >        314 ±  5%     -29.4%        222 ± 25%  sched_debug.cfs_rq[1]:/.blocked_load_avg
> >        621 ± 10%     +41.9%        881 ± 37%  sched_debug.cfs_rq[4]:/.avg->runnable_avg_sum
> > 
> > nhm4: Nehalem
> > Memory: 4G
> > 
> >                fsmark.time.involuntary_context_switches
> > 
> >   [ASCII time-series plot, y axis 104000..114000: bisect-good samples
> >    (*) hold steady near 105000, while bisect-bad samples (O) hold
> >    steady near 112000-113000]
> > 
> > [*] bisect-good sample
> > [O] bisect-bad sample
> > 
> > To reproduce:
> > 
> >         apt-get install ruby
> >         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> >         cd lkp-tests
> >         bin/setup-local job.yaml # the job file attached in this email
> >         bin/run-local job.yaml
> 
> So this is on a loopback NFS setup (i.e. the server resides on the
> same node as the client, which just mounts from the loopback IP
> address 127.0.0.1)?

Yes. This is on a loopback NFS setup.
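For reference, the harness exports a local filesystem and mounts it
back over 127.0.0.1 on the same box. A minimal sketch of that shape is
below; the paths and export options here are illustrative only, and the
exact configuration comes from the attached job file.

        # Illustrative loopback NFS setup (hypothetical paths/options):
        # export a local filesystem, then mount it back over 127.0.0.1.
        echo '/fs/sda1 127.0.0.1(rw,no_subtree_check)' >> /etc/exports
        exportfs -ra
        mkdir -p /nfs/sda1
        mount -t nfs -o vers=4 127.0.0.1:/fs/sda1 /nfs/sda1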
> That's a fairly quirky setup as far as memory management goes. In low
> memory situations, you have a very nasty feedback mechanism whereby
> the NFS server ends up pushing the client to write back more data,
> increasing the memory pressure on the NFS server, etc.
> It is quite possible that allowing the NFS client to block more
> aggressively in low memory situations could worsen that feedback
> mechanism; however, that's not our main target platform. We actively
> discourage people from using loopback NFS in production systems.
> 
> Is there any way you could confirm this performance change using a
> remote NFS server instead of the loopback NFS?

We are working on a remote NFS setup now :)

Best Regards,
Huang, Ying
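PS: For anyone who wants to eyeball the context switch counters outside
the LKP harness while fsmark runs, something like the sketch below
works. This is a generic observation recipe, not part of the LKP job;
pidstat comes from the sysstat package.

        # Sample voluntary (cswch/s) and involuntary (nvcswch/s) context
        # switches of the oldest running fsmark process, once per second.
        pidstat -w -p "$(pgrep -o fsmark)" 1

        # Or read the cumulative counters straight out of procfs.
        grep ctxt_switches /proc/"$(pgrep -o fsmark)"/status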