Date: Wed, 7 Dec 2016 21:19:58 +0000
From: Mel Gorman
To: Eric Dumazet
Cc: Andrew Morton, Christoph Lameter, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Jesper Dangaard Brouer, Joonsoo Kim,
	Linux-MM, Linux-Kernel
Subject: Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v7
Message-ID: <20161207211958.s3ymjva54wgakpkm@techsingularity.net>
References: <20161207101228.8128-1-mgorman@techsingularity.net>
	<1481137249.4930.59.camel@edumazet-glaptop3.roam.corp.google.com>
	<20161207194801.krhonj7yggbedpba@techsingularity.net>
	<1481141424.4930.71.camel@edumazet-glaptop3.roam.corp.google.com>
In-Reply-To: <1481141424.4930.71.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Dec 07, 2016 at 12:10:24PM -0800, Eric Dumazet wrote:
> On Wed, 2016-12-07 at 19:48 +0000, Mel Gorman wrote:
> > Interesting because it didn't match what I previously measured, but then
> > again, when I established that netperf on localhost was slab intensive,
> > it was also an older kernel. Can you tell me if SLAB or SLUB was enabled
> > in your test kernel?
> >
> > Either that or the baseline I used has since been changed from what you
> > are testing and we're not hitting the same paths.
> >
>
> lpaa6:~# uname -a
> Linux lpaa6 4.9.0-smp-DEV #429 SMP @1481125332 x86_64 GNU/Linux
>
> lpaa6:~# perf record -g ./netperf -t UDP_STREAM -l 3 -- -m 16384
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost () port 0 AF_INET
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 212992   16384   3.00       654644      0    28601.04
> 212992           3.00       654592           28598.77
>

I'm seeing parts of the disconnect. The load is slab intensive but not
necessarily page allocator intensive, depending on a variety of factors.
While the motivation for the patch was initially SLUB, any path that is
high-order page allocator intensive benefits, so:

1. If the workload is slab intensive and SLUB is used, then it may benefit
   if SLUB happens to frequently require new pages, particularly if there
   is a pattern of frequently growing and shrinking slabs.

2. If the workload is high-order page allocator intensive but bypasses
   SLUB and SLAB, then it'll benefit anyway.

So you say you don't see much slab activity for some configurations and
it's hitting the page allocator. For the purposes of this patch, that's
fine, albeit useless for a SLAB vs SLUB comparison.

Anything else I saw for the moment is probably not surprising. At small
packet sizes on localhost, I see relatively low page allocator activity
except during socket setup and other unrelated activity (khugepaged,
irqbalance, some btrfs stuff), which is curious as it's less clear why
the performance was improved in that case. I considered the possibility
that it was cache hotness of pages, but that's not a good fit: if it
were true then the first test would be slow and the rest relatively
fast, and I'm not seeing that. The other side-effect is that all the
high-order pages allocated at the start are physically close together,
but that shouldn't have that big an impact. So for now, the gain is
unexplained even though it happens consistently.
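The SLAB-vs-SLUB question above is quick to answer on a running system by
grepping the kernel build config. A minimal sketch (the pipeline feeds a
sample config fragment so it is self-contained; the real commands are in
the comments):

```shell
# On a real system, inspect the installed build config:
#   grep -E '^CONFIG_SL[AU]B=' /boot/config-"$(uname -r)"
# or, if CONFIG_IKCONFIG_PROC is enabled:
#   zcat /proc/config.gz | grep -E '^CONFIG_SL[AU]B='
# The printf below stands in for a real config file.
printf 'CONFIG_SLUB=y\nCONFIG_SLUB_DEBUG=y\n' | grep -E '^CONFIG_SL[AU]B=y'
```

Exactly one of CONFIG_SLAB=y or CONFIG_SLUB=y should match for a given
kernel, which settles which allocator the baseline was using.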
At larger message sizes to localhost, it's page allocator intensive
through paths like this:

 netperf-3887  [032] ....   393.246420: mm_page_alloc: page=ffffea0021272200 pfn=8690824 order=3 migratetype=0 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_REPEAT|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_NOTRACK
 netperf-3887  [032] ....   393.246421:
 => kmalloc_large_node+0x60/0x8d
 => __kmalloc_node_track_caller+0x245/0x280
 => __kmalloc_reserve.isra.35+0x31/0x90
 => __alloc_skb+0x7e/0x280
 => alloc_skb_with_frags+0x5a/0x1c0
 => sock_alloc_send_pskb+0x19e/0x200
 => sock_alloc_send_skb+0x18/0x20
 => __ip_append_data.isra.46+0x61d/0xa00
 => ip_make_skb+0xc2/0x110
 => udp_sendmsg+0x2c0/0xa40
 => inet_sendmsg+0x7f/0xb0
 => sock_sendmsg+0x38/0x50
 => SYSC_sendto+0x102/0x190
 => SyS_sendto+0xe/0x10
 => do_syscall_64+0x5b/0xd0
 => return_from_SYSCALL_64+0x0/0x6a

It's going through the SLUB paths but finding the allocation is too
large and hitting the page allocator instead. This is using 4.9-rc5 as a
baseline, so fixes might be missing.

If using small messages to a remote host, I again see intense page
allocator activity via:

 netperf-4326  [047] ....   994.978387: mm_page_alloc: page=ffffea0041413400 pfn=17106128 order=2 migratetype=0 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_REPEAT|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_NOTRACK
 netperf-4326  [047] ....   994.978387:
 => alloc_pages_current+0x88/0x120
 => new_slab+0x33f/0x580
 => ___slab_alloc+0x352/0x4d0
 => __slab_alloc.isra.73+0x43/0x5e
 => __kmalloc_node_track_caller+0xba/0x280
 => __kmalloc_reserve.isra.35+0x31/0x90
 => __alloc_skb+0x7e/0x280
 => alloc_skb_with_frags+0x5a/0x1c0
 => sock_alloc_send_pskb+0x19e/0x200
 => sock_alloc_send_skb+0x18/0x20
 => __ip_append_data.isra.46+0x61d/0xa00
 => ip_make_skb+0xc2/0x110
 => udp_sendmsg+0x2c0/0xa40
 => inet_sendmsg+0x7f/0xb0
 => sock_sendmsg+0x38/0x50
 => SYSC_sendto+0x102/0x190
 => SyS_sendto+0xe/0x10
 => do_syscall_64+0x5b/0xd0
 => return_from_SYSCALL_64+0x0/0x6a

This is a slab path, but at different orders.
So while the patch was motivated by SLUB, workloads that generate
intense page allocator activity still benefit.

> Maybe one day we will avoid doing order-4 (or even order-5 in extreme
> cases !) allocations for loopback as we did for af_unix :P
>
> I mean, maybe some applications are sending 64KB UDP messages over
> loopback right now...
>

Maybe, but it's clear that even running "networking" workloads does not
necessarily mean that paths interesting to this patch are hit. Not
necessarily bad, but it was always expected that the benefit of the
patch would be workload and configuration dependent.

-- 
Mel Gorman
SUSE Labs