Date: Thu, 1 Dec 2016 18:34:02 +0100
From: Jesper Dangaard Brouer
To: Mel Gorman
Cc: Andrew Morton, Christoph Lameter, Michal Hocko, Vlastimil Babka,
    Johannes Weiner, Linux-MM, Linux-Kernel, Rick Jones, Paolo Abeni,
    brouer@redhat.com, netdev@vger.kernel.org, Hannes Frederic Sowa
Subject: Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3
Message-ID: <20161201183402.2fbb8c5b@redhat.com>
In-Reply-To: <20161130163520.hg7icdflagmvarbr@techsingularity.net>
References: <20161127131954.10026-1-mgorman@techsingularity.net>
 <20161130134034.3b60c7f0@redhat.com>
 <20161130140615.3bbn7576iwbyc3op@techsingularity.net>
 <20161130160612.474ca93c@redhat.com>
 <20161130163520.hg7icdflagmvarbr@techsingularity.net>

(Cc. netdev, we might have an issue with Paolo's UDP accounting and
small socket queues)

On Wed, 30 Nov 2016 16:35:20 +0000
Mel Gorman wrote:

> > I don't quite get why you are setting the socket recv size
> > (with -- -s and -S) to such a small number, size + 256.
> >
>
> Maybe I missed something at the time I wrote that but why would it
> need to be larger?

Well, to me it is quite obvious that we need some queue to avoid
packet drops.  We have two processes, netperf and netserver, that
send packets to each other (UDP_STREAM, mostly netperf -> netserver).
These PIDs get scheduled and migrated between CPUs, and thus do not
run at the same pace, so a queue is needed to absorb the
fluctuations.

The network stack even partly catches your config "mistake" and
increases the socket queue size, so that we can at minimum handle one
max frame (due to the skb "truesize" concept, approx PAGE_SIZE +
overhead).

For localhost testing a small queue should hopefully not result in
packet drops.  Testing... oops, it does result in packet drops.

Test command extracted from mmtests, UDP_STREAM size 1024:

 netperf-2.4.5-installed/bin/netperf -t UDP_STREAM -l 60 -H 127.0.0.1 \
   -- -s 1280 -S 1280 -m 1024 -M 1024 -P 15895

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 15895 AF_INET
to 127.0.0.1 (127.0.0.1) port 15895 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

  4608    1024   60.00    50024301      0     6829.98
  2560           60.00    46133211            6298.72

Dropped packets: 50024301 - 46133211 = 3891090

To get a better drop indication, I run the following command during
the test; it shows the system-wide network counters for the last
second, so the numbers below are per second.

 $ nstat > /dev/null && sleep 1 && nstat
 #kernel
 IpInReceives                    885162             0.0
 IpInDelivers                    885161             0.0
 IpOutRequests                   885162             0.0
 UdpInDatagrams                  776105             0.0
 UdpInErrors                     109056             0.0
 UdpOutDatagrams                 885160             0.0
 UdpRcvbufErrors                 109056             0.0
 IpExtInOctets                   931190476          0.0
 IpExtOutOctets                  931189564          0.0
 IpExtInNoECTPkts                885162             0.0

So, 885Kpps are sent, but only 776Kpps are delivered, and 109Kpps are
dropped.  Notice that UdpInErrors and UdpRcvbufErrors are equal
(109056/sec).  The drops happen kernel side in __udp_queue_rcv_skb
[1], because the receiving process does not empty its queue fast
enough, see [2].
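For reference, the drop decision is a plain receive-buffer accounting
check.  A simplified sketch of what [2] boils down to (not the
verbatim kernel code):

 /* Each queued skb is charged its "truesize" (payload + skb
  * overhead) against sk_rcvbuf, so a 2560 byte receive buffer only
  * has room for roughly one queued skb at a time.
  */
 if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
     (unsigned int)sk->sk_rcvbuf) {
         atomic_inc(&sk->sk_drops);
         return -ENOMEM;  /* what [1] counts as UdpRcvbufErrors */
 }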
Note that upstream changes are coming in this area: [2] is being
replaced with __udp_enqueue_schedule_skb(), which is actually what I
tested with... hmm.  Retesting with kernel 4.7.0-baseline+ shows
something else.  Paolo, you might want to look into this.  It could
also explain why I have not seen the mentioned speedup from the
mm-change, as I have been testing this patch on top of net-next (at
93ba2222550) with Paolo's UDP changes.

 netperf-2.4.5-installed/bin/netperf -t UDP_STREAM -l 60 -H 127.0.0.1 \
   -- -s 1280 -S 1280 -m 1024 -M 1024 -P 15895

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 15895 AF_INET
to 127.0.0.1 (127.0.0.1) port 15895 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

  4608    1024   60.00    47248301      0     6450.97
  2560           60.00    47245030            6450.52

Only dropped 47248301 - 47245030 = 3271

 $ nstat > /dev/null && sleep 1 && nstat
 #kernel
 IpInReceives                    810566             0.0
 IpInDelivers                    810566             0.0
 IpOutRequests                   810566             0.0
 UdpInDatagrams                  810468             0.0
 UdpInErrors                     99                 0.0
 UdpOutDatagrams                 810566             0.0
 UdpRcvbufErrors                 99                 0.0
 IpExtInOctets                   852713328          0.0
 IpExtOutOctets                  852713328          0.0
 IpExtInNoECTPkts                810563             0.0

And nstat also looks much better, with only 99 drops/sec.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] http://lxr.free-electrons.com/source/net/ipv4/udp.c?v=4.8#L1454
[2] http://lxr.free-electrons.com/source/net/core/sock.c?v=4.8#L413

Extra: with net-next at 93ba2222550

If I use netperf's default socket queue size, then there is not a
single packet drop:

 netperf-2.4.5-installed/bin/netperf -t UDP_STREAM -l 60 -H 127.0.0.1 \
   -- -m 1024 -M 1024 -P 15895

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 15895 AF_INET
to 127.0.0.1 (127.0.0.1) port 15895 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992    1024   60.00    48485642      0     6619.91
212992          60.00    48485642            6619.91

 $ nstat > /dev/null && sleep 1 && nstat
 #kernel
 IpInReceives                    821723             0.0
 IpInDelivers                    821722             0.0
 IpOutRequests                   821723             0.0
 UdpInDatagrams                  821722             0.0
 UdpOutDatagrams                 821722             0.0
 IpExtInOctets                   864457856          0.0
 IpExtOutOctets                  864458908          0.0
 IpExtInNoECTPkts                821729             0.0
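P.S. The socket sizes netperf reports above fall out of how the
kernel applies SO_SNDBUF/SO_RCVBUF.  A simplified sketch of the
SO_RCVBUF handling in sock_setsockopt() (net/core/sock.c, not
verbatim):

 case SO_RCVBUF:
         /* The kernel doubles the requested value to make room for
          * skb overhead, and enforces a floor (SOCK_MIN_RCVBUF)
          * sized to hold one max-truesize frame.  This is the
          * "partly catching your config mistake" above: -S 1280
          * becomes 2560, while -s 1280 is lifted to the larger
          * send-side floor (SOCK_MIN_SNDBUF) and shows up as 4608.
          */
         val = min_t(u32, val, sysctl_rmem_max);
         sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
         sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
         break;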