From: Marek Majkowski
Date: Wed, 27 Nov 2019 10:50:55 +0100
Subject: Re: epoll_wait() performance
To: David Laight
Cc: linux-kernel, network dev, kernel-team, Jesper Dangaard Brouer
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 22, 2019 at 12:18 PM David Laight wrote:
> I'm trying to optimise some code that reads UDP messages (RTP and RTCP) from a lot of sockets.
> The 'normal' data pattern is that there is no data on half the sockets (RTCP) and
> one message every 20ms on the others (RTP).
> However there can be more than one message on each socket, and they all need to be read.
> Since the code processing the data runs every 10ms, the message receiving code
> also runs every 10ms (a massive gain when using poll()).

How many sockets are we talking about? More like 500 or 500k? We had a
very bad experience with connected UDP sockets: if you are using
connected UDP sockets, the RX path is super slow, with most of the time
consumed by udp_lib_lookup()

https://elixir.bootlin.com/linux/v5.4/source/net/ipv4/udp.c#L445

Then we might argue that creating thousands of unconnected UDP sockets
- like 192.0.2.1:1234, 192.0.2.2:1234, etc - creates little value. I
guess the only reasonable case for a large number of UDP sockets is
when you need a large number of source ports.

In such a case we experimented with abusing TPROXY:

https://web.archive.org/web/20191115081000/https://blog.cloudflare.com/how-we-built-spectrum/

> While using recvmmsg() to read multiple messages might seem a good idea, it is much
> slower than recv() when there is only one message (even recvmsg() is a lot slower).
> (I'm not sure why the code paths are so slow, I suspect it is all the copy_from_user()
> and faffing with the user iov[].)
>
> So using poll() we repoll the fd after calling recv() to find if there is a second message.
> However the second poll has a significant performance cost (but less than using recvmmsg()).

That sounds wrong. A single recvmmsg(), even when receiving only one
message, should be faster than two syscalls - recv() and poll().

> If we use epoll() in level triggered mode a second epoll_wait() call (after the recv()) will
> indicate that there is more data.
>
> For poll() it doesn't make much difference how many fd are supplied to each system call.
> The overall performance is much the same for 32, 64 or 500 (all the sockets).
>
> For epoll_wait() that isn't true.
> Supplying a buffer that is shorter than the list of 'ready' fds gives a massive penalty.
> With a buffer long enough for all the events epoll() is somewhat faster than poll().
> But with a 64 entry buffer it is much slower.
> I've looked at the code and can't see why splicing the unread events back is expensive.

Again, this is surprising.

> I'd like to be able to change the code so that multiple threads are reading from the epoll fd.
> This would mean I'd have to run it in edge mode and each thread reading a smallish
> block of events.
> Any suggestions on how to efficiently read the 'unusual' additional messages from
> the sockets?

Random ideas:

1. Perhaps reducing the number of sockets could help - with iptables
or TPROXY. TPROXY has some performance impact though, so be careful.

2. I played with io_submit for syscall batching, but in my experiments
I wasn't able to show a performance boost:

https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/

Perhaps the newer io_uring with networking support could help:

https://twitter.com/axboe/status/1195047335182524416

3. SO_BUSY_POLL drastically reduces latency, but I've only used it
with a single socket.

4. If you want to get the number of outstanding packets, there is
SIOCINQ and SO_MEMINFO.

My older writeups:

https://blog.cloudflare.com/how-to-receive-a-million-packets/
https://blog.cloudflare.com/how-to-achieve-low-latency/

Cheers,
    Marek

> FWIW the fastest way to read 1 RTP message every 20ms is to do non-blocking recv() every 10ms.
> The failing recv() is actually faster than either epoll() or two poll() actions.
> (Although something is needed to pick up the occasional second message.)
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
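
For illustration only, here is a minimal sketch of two ideas from the thread: the non-blocking recv() loop from the quoted message, extended to pick up the occasional second message, and the SIOCINQ query from item 4. It is written in Python for brevity and assumes Linux. The helper names drain() and next_datagram_size() are hypothetical and not taken from anyone's code in this thread. On Linux, SIOCINQ has the same value as FIONREAD, and according to udp(7) it reports the size in bytes of the next pending datagram (0 if none), not a count of queued packets.

```python
import fcntl
import socket
import struct
import termios


def drain(sock, bufsize=2048, max_msgs=16):
    """Read all datagrams currently queued on a UDP socket without blocking.

    Hypothetical helper: keeps calling recv() with MSG_DONTWAIT until the
    kernel reports EAGAIN, so an extra message is picked up without a
    second poll()/epoll_wait() call.
    """
    msgs = []
    while len(msgs) < max_msgs:
        try:
            msgs.append(sock.recv(bufsize, socket.MSG_DONTWAIT))
        except BlockingIOError:
            # Receive queue is empty: this is the cheap "failing recv()".
            break
    return msgs


def next_datagram_size(sock):
    """Return the size of the next pending datagram, or 0 if none is queued.

    Uses the FIONREAD ioctl, which is the same request as SIOCINQ on Linux.
    A pending zero-length datagram cannot be told apart from an empty queue.
    """
    raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, b"\0\0\0\0")
    return struct.unpack("i", raw)[0]
```

A caller running every 10ms would invoke drain() once per socket per tick. max_msgs is an assumed safety cap so that one busy socket cannot starve the others. Whether this beats recvmmsg() or epoll on a given kernel is something only measurement can answer.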