Hi guys,
On Sun, Feb 15, 2015 at 2:57 AM, Huang Ying <[email protected]> wrote:
> FYI, we noticed the below changes on
>
> commit c4a7ca774949960064dac11b326908f28407e8c3 ("SUNRPC: Allow waiting on memory allocation")
>
>
> testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-f2fs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
>
> 127b21b89f9d8ba0 c4a7ca774949960064dac11b32
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 52524 ± 0% +6.0% 55672 ± 0% fsmark.time.involuntary_context_switches
> 436 ± 14% +54.9% 676 ± 20% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 433 ± 15% +54.7% 670 ± 21% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 8348 ± 7% +27.0% 10602 ± 9% sched_debug.cfs_rq[0]:/.min_vruntime
> 190081 ± 13% +32.7% 252269 ± 13% sched_debug.cpu#0.sched_goidle
> 205783 ± 12% +30.2% 267903 ± 13% sched_debug.cpu#0.ttwu_local
> 464065 ± 11% +26.6% 587524 ± 12% sched_debug.cpu#0.nr_switches
> 464278 ± 11% +26.6% 587734 ± 12% sched_debug.cpu#0.sched_count
> 15807 ± 11% +19.6% 18910 ± 12% sched_debug.cpu#4.nr_load_updates
> 300041 ± 8% +20.3% 360969 ± 10% sched_debug.cpu#0.ttwu_count
> 1863 ± 9% +18.1% 2201 ± 10% sched_debug.cfs_rq[4]:/.exec_clock
>
> testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-btrfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
>
> 127b21b89f9d8ba0 c4a7ca774949960064dac11b32
> ---------------- --------------------------
> fail:runs %reproduction fail:runs
> | | |
> 52184 ± 0% +5.6% 55122 ± 0% fsmark.time.involuntary_context_switches
> 557 ± 19% +21.5% 677 ± 9% sched_debug.cfs_rq[5]:/.blocked_load_avg
> 217 ± 19% -42.9% 124 ± 21% sched_debug.cfs_rq[2]:/.load
> 45852 ± 14% -39.4% 27773 ± 24% sched_debug.cpu#7.ttwu_local
> 457 ± 18% +50.1% 686 ± 20% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 455 ± 18% +46.7% 668 ± 19% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 66605 ± 10% -26.7% 48826 ± 14% sched_debug.cpu#7.sched_goidle
> 78249 ± 9% -22.5% 60678 ± 11% sched_debug.cpu#7.ttwu_count
> 153506 ± 9% -22.7% 118649 ± 12% sched_debug.cpu#7.nr_switches
> 153613 ± 9% -22.7% 118755 ± 12% sched_debug.cpu#7.sched_count
> 15806 ± 6% +19.2% 18833 ± 18% sched_debug.cpu#4.nr_load_updates
> 2171 ± 5% +15.6% 2510 ± 13% sched_debug.cfs_rq[4]:/.exec_clock
> 9924 ± 11% -27.0% 7244 ± 25% sched_debug.cfs_rq[3]:/.min_vruntime
> 3156 ± 4% -13.4% 2734 ± 8% sched_debug.cfs_rq[7]:/.min_vruntime
>
> testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-ext4-nfsv4-9B-400M-fsyncBeforeClose-16d-256fpd
>
> 127b21b89f9d8ba0 c4a7ca774949960064dac11b32
> ---------------- --------------------------
> 104802 ± 0% +7.7% 112883 ± 0% fsmark.time.involuntary_context_switches
> 471755 ± 0% -1.3% 465592 ± 0% fsmark.time.voluntary_context_switches
> 1977 ± 36% +90.8% 3771 ± 8% sched_debug.cpu#4.curr->pid
> 2 ± 34% +80.0% 4 ± 24% sched_debug.cpu#6.cpu_load[1]
> 4 ± 33% +83.3% 8 ± 31% sched_debug.cpu#6.cpu_load[0]
> 193 ± 17% +48.0% 286 ± 19% sched_debug.cfs_rq[2]:/.blocked_load_avg
> 196 ± 17% +47.5% 290 ± 19% sched_debug.cfs_rq[2]:/.tg_load_contrib
> 96 ± 18% +40.6% 135 ± 11% sched_debug.cfs_rq[7]:/.load
> 97 ± 18% +38.5% 135 ± 11% sched_debug.cpu#7.load
> 2274 ± 7% -16.5% 1898 ± 3% proc-vmstat.pgalloc_dma
> 319 ± 6% -29.7% 224 ± 24% sched_debug.cfs_rq[1]:/.tg_load_contrib
> 314 ± 5% -29.4% 222 ± 25% sched_debug.cfs_rq[1]:/.blocked_load_avg
> 621 ± 10% +41.9% 881 ± 37% sched_debug.cfs_rq[4]:/.avg->runnable_avg_sum
>
> nhm4: Nehalem
> Memory: 4G
>
>
>
>
> fsmark.time.involuntary_context_switches
>
> 114000 ++-----------------------------------------------------------------+
> 113000 O+ O O O O O O O O O O O O O O O O O |
> | O O O O O
> 112000 ++ |
> 111000 ++ |
> | |
> 110000 ++ |
> 109000 ++ |
> 108000 ++ |
> | |
> 107000 ++ |
> 106000 ++ |
> | |
> 105000 *+.*..*..*..*..*..*..*..*..*..*...*..*..*..*..*..*..*..* |
> 104000 ++-----------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> apt-get install ruby
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/setup-local job.yaml # the job file attached in this email
> bin/run-local job.yaml
So this is on a loopback NFS setup (i.e. the server resides on the
same node as the client, which just mounts from the loopback IP
address 127.0.0.1)?
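(For reference, a loopback setup of that sort typically looks something like the sketch below; the export path and mount point are hypothetical, not taken from the attached job file.)

  # Hypothetical loopback NFS setup: export a local directory, then
  # mount it back over NFSv4 from 127.0.0.1 on the same machine.
  echo '/srv/export 127.0.0.1(rw,no_root_squash,fsid=0)' >> /etc/exports
  exportfs -ra
  mount -t nfs4 127.0.0.1:/ /mnt/nfs-loopback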
That's a fairly quirky setup as far as memory management goes. In low
memory situations, you have a very nasty feedback mechanism whereby
the NFS server ends up pushing the client to write back more data,
increasing the memory pressure on the NFS server, etc.
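(One way to observe that feedback while the fsmark job runs, using only standard interfaces; the one-second interval and output length are arbitrary choices:)

  # Watch dirty/writeback pages and free memory alongside NFS server
  # activity to see the client and server competing for memory.
  watch -n1 'grep -E "^(Dirty|Writeback|MemFree)" /proc/meminfo; nfsstat -s | head -n 8'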
It is quite possible that allowing the NFS client to block more
aggressively in low memory situations could worsen that feedback
mechanism; however, that's not our main target platform, and we actively
discourage people from using loopback NFS in production systems.
Is there any way you could confirm this performance change using a
remote NFS server instead of the loopback NFS?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Sun, 2015-02-15 at 13:02 -0500, Trond Myklebust wrote:
> Hi guys,
>
> On Sun, Feb 15, 2015 at 2:57 AM, Huang Ying <[email protected]> wrote:
> > [...]
>
> So this is on a loopback NFS setup (i.e. the server resides on the
> same node as the client, which just mounts from the loopback IP
> address 127.0.0.1)?
Yes. This is on a loopback NFS setup.
> That's a fairly quirky setup as far as memory management goes. In low
> memory situations, you have a very nasty feedback mechanism whereby
> the NFS server ends up pushing the client to write back more data,
> increasing the memory pressure on the NFS server, etc.
> It is quite possible that allowing the NFS client to block more
> aggressively in low memory situations could worsen that feedback
> mechanism; however, that's not our main target platform, and we actively
> discourage people from using loopback NFS in production systems.
>
> Is there any way you could confirm this performance change using a
> remote NFS server instead of the loopback NFS?
We are working on a remote NFS setup now :)
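(For comparison, the remote variant would point the same workload at another host rather than 127.0.0.1; the server hostname and mount point below are hypothetical.)

  # Hypothetical remote NFS mount for the same fsmark workload.
  mount -t nfs4 nfs-server.example.com:/ /mnt/nfs-remote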
Best Regards,
Huang, Ying