Message-ID: <1480630668.5345.7.camel@redhat.com>
Subject: Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3
From: Paolo Abeni
To: Jesper Dangaard Brouer
Cc: Mel Gorman, Andrew Morton, Christoph Lameter, Michal Hocko,
    Vlastimil Babka, Johannes Weiner, Linux-MM, Linux-Kernel, Rick Jones,
    "netdev@vger.kernel.org", Hannes Frederic Sowa
Date: Thu, 01 Dec 2016 23:17:48 +0100
In-Reply-To: <20161201183402.2fbb8c5b@redhat.com>
References: <20161127131954.10026-1-mgorman@techsingularity.net>
 <20161130134034.3b60c7f0@redhat.com>
 <20161130140615.3bbn7576iwbyc3op@techsingularity.net>
 <20161130160612.474ca93c@redhat.com>
 <20161130163520.hg7icdflagmvarbr@techsingularity.net>
 <20161201183402.2fbb8c5b@redhat.com>

On Thu, 2016-12-01 at 18:34 +0100, Jesper Dangaard Brouer wrote:
> (Cc. netdev, we might have an issue with Paolo's UDP accounting and
> small socket queues)
>
> On Wed, 30 Nov 2016 16:35:20 +0000
> Mel Gorman wrote:
>
> > > I don't quite get why you are setting the socket recv size
> > > (with -- -s and -S) to such a small number, size + 256.
> >
> > Maybe I missed something at the time I wrote that but why would it
> > need to be larger?
>
> Well, to me it is quite obvious that we need some queue to avoid packet
> drops.  We have two processes, netperf and netserver, that are sending
> packets to each other (UDP_STREAM, mostly netperf -> netserver).  These
> PIDs are getting scheduled and migrated between CPUs and thus do not get
> executed equally fast, so a queue is needed to absorb the fluctuations.
>
> The network stack even partly catches your config "mistake" and
> increases the socket queue size, so that at minimum one max frame can be
> handled (due to the skb "truesize" concept, approx PAGE_SIZE + overhead).
>
> Hopefully, for localhost testing, a small queue should not result in
> packet drops.  Testing... oops, this does result in packet drops.
>
> Test command extracted from mmtests, UDP_STREAM size 1024:
>
>  netperf-2.4.5-installed/bin/netperf -t UDP_STREAM -l 60 -H 127.0.0.1 \
>    -- -s 1280 -S 1280 -m 1024 -M 1024 -P 15895
>
>  UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0)
>   port 15895 AF_INET to 127.0.0.1 (127.0.0.1) port 15895 AF_INET
>  Socket  Message  Elapsed      Messages
>  Size    Size     Time         Okay     Errors   Throughput
>  bytes   bytes    secs         #        #        10^6bits/sec
>
>  4608    1024     60.00        50024301 0        6829.98
>  2560             60.00        46133211          6298.72
>
> Dropped packets: 50024301 - 46133211 = 3891090
>
> To get a better drop indication, I run a command during this test that
> reports system-wide network counters from the last second, so the
> numbers below are per second.
>
> $ nstat > /dev/null && sleep 1 && nstat
> #kernel
> IpInReceives                    885162             0.0
> IpInDelivers                    885161             0.0
> IpOutRequests                   885162             0.0
> UdpInDatagrams                  776105             0.0
> UdpInErrors                     109056             0.0
> UdpOutDatagrams                 885160             0.0
> UdpRcvbufErrors                 109056             0.0
> IpExtInOctets                   931190476          0.0
> IpExtOutOctets                  931189564          0.0
> IpExtInNoECTPkts                885162             0.0
>
> So, 885Kpps are sent, but only 776Kpps are delivered and 109Kpps are
> dropped.  Note that UdpInErrors and UdpRcvbufErrors are equal
> (109056/sec).  This drop happens kernel side in __udp_queue_rcv_skb [1],
> because the receiving process didn't empty its queue fast enough, see [2].
>
> Although upstream changes are coming in this area, [2] is replaced with
> __udp_enqueue_schedule_skb, which I actually tested with... hmm.
>
> Retesting with kernel 4.7.0-baseline+ ... shows something else.
> To Paolo: you might want to look into this.  It could also explain why
> I've not seen the mentioned speedup from the mm change, as I've been
> testing this patch on top of net-next (at 93ba2222550) with Paolo's UDP
> changes.

Thank you for reporting this.

It seems that the commit 123b4a633580 ("udp: use it's own memory
accounting schema") is too strict while checking the rcvbuf. For very
small values of rcvbuf, it allows a single skb to be enqueued, while
previously we allowed two of them to enter the queue, even if the first
one's truesize exceeded rcvbuf, as in your test case.

Can you please try the following patch?

Thank you,

Paolo
---
 net/ipv4/udp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e1d0bf8..2f5dc92 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1200,19 +1200,21 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	struct sk_buff_head *list = &sk->sk_receive_queue;
 	int rmem, delta, amt, err = -ENOMEM;
 	int size = skb->truesize;
+	int limit;
 
 	/* try to avoid the costly atomic add/sub pair when the receive
 	 * queue is full; always allow at least a packet
 	 */
 	rmem = atomic_read(&sk->sk_rmem_alloc);
-	if (rmem && (rmem + size > sk->sk_rcvbuf))
+	limit = size + sk->sk_rcvbuf;
+	if (rmem > limit)
 		goto drop;
 
 	/* we drop only if the receive buf is full and the receive
 	 * queue contains some other skb
 	 */
 	rmem = atomic_add_return(size, &sk->sk_rmem_alloc);
-	if ((rmem > sk->sk_rcvbuf) && (rmem > size))
+	if (rmem > limit)
 		goto uncharge_drop;
 
 	spin_lock(&list->lock);
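
For illustration only, here is a small standalone userspace sketch (not
kernel code) that replays the two accounting checks side by side. The
rcvbuf value of 2560 matches the receive socket size reported by netperf
above (the kernel doubles the -S 1280 setting); the per-skb truesize of
2304 is only an assumed, illustrative value. With these numbers the
current check queues a single datagram before dropping, while the
patched check lets a second one in:

#include <stdio.h>

/* mirrors the two checks in the current __udp_enqueue_schedule_skb() */
static int enqueue_strict(int *rmem, int size, int rcvbuf)
{
	if (*rmem && (*rmem + size > rcvbuf))
		return 0;			/* drop */
	*rmem += size;				/* charge, as atomic_add_return() */
	if (*rmem > rcvbuf && *rmem > size) {
		*rmem -= size;			/* uncharge */
		return 0;			/* drop */
	}
	return 1;				/* queued */
}

/* mirrors the same checks with the proposed limit = size + sk_rcvbuf */
static int enqueue_relaxed(int *rmem, int size, int rcvbuf)
{
	int limit = size + rcvbuf;

	if (*rmem > limit)
		return 0;			/* drop */
	*rmem += size;				/* charge */
	if (*rmem > limit) {
		*rmem -= size;			/* uncharge */
		return 0;			/* drop */
	}
	return 1;				/* queued */
}

int main(void)
{
	/* 2560: doubled -S 1280 from the netperf run above;
	 * 2304: assumed per-skb truesize, purely illustrative
	 */
	const int rcvbuf = 2560, truesize = 2304;
	int rmem_s = 0, rmem_r = 0;
	int pkt;

	for (pkt = 1; pkt <= 4; pkt++)
		printf("pkt %d: strict=%s relaxed=%s\n", pkt,
		       enqueue_strict(&rmem_s, truesize, rcvbuf) ?
		       "queued" : "dropped",
		       enqueue_relaxed(&rmem_r, truesize, rcvbuf) ?
		       "queued" : "dropped");
	return 0;
}

Compiled with a stock gcc, this should print "queued" for one packet
under the strict variant and for two packets under the relaxed one;
everything after that is dropped in both cases, since the queue is never
drained in this toy model.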