References: <5f4028c48a1a4673bd3b38728e8ade07@AcuMS.aculab.com>
 <20191127164821.1c41deff@carbon>
 <0b8d7447e129539aec559fa797c07047f5a6a1b2.camel@redhat.com>
 <2f1635d9300a4bec8a0422e9e9518751@AcuMS.aculab.com>
 <313204cf-69fd-ec28-a22c-61526f1dea8b@gmail.com>
 <1265e30d04484d08b86ba2abef5f5822@AcuMS.aculab.com>
User-agent: mu4e 1.1.0; emacs 26.1
From: Jakub Sitnicki
To: David Laight
Cc: Eric Dumazet, 'Paolo Abeni', Jesper Dangaard Brouer, 'Marek Majkowski', linux-kernel, network dev, kernel-team
Subject: Re: epoll_wait() performance
In-reply-to:
Date: Sat, 30 Nov 2019 14:29:41 +0100
Message-ID: <878snxo5kq.fsf@cloudflare.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Nov 30, 2019 at 02:07 AM CET, Eric Dumazet wrote:
> On 11/28/19 2:17 AM, David Laight wrote:
>> From: Eric Dumazet
>>> Sent: 27 November 2019 17:47
>> ...
>>> A QUIC server handles hundreds of thousands of 'UDP flows' all using only one UDP socket
>>> per cpu.
>>>
>>> This is really the only way to scale, and does not need kernel changes to efficiently
>>> organize millions of UDP sockets (huge memory footprint even if we get right how
>>> we manage them)
>>>
>>> Given that UDP has no state, there is really no point trying to have one UDP
>>> socket per flow, and having to deal with epoll()/poll() overhead.
>>
>> How can you do that when all the UDP flows have different destination port numbers?
>> These are message flows not idempotent requests.
>> I don't really want to collect the packets before they've been processed by IP.
>>
>> I could write a driver that uses kernel udp sockets to generate a single message queue
>> that can be efficiently processed from userspace - but it is a faff compiling it for
>> the systems kernel version.
>
> Well if destinations ports are not under your control,
> you also could use AF_PACKET sockets, no need for 'UDP sockets' to receive UDP traffic,
> especially if the rate is small.

Alternatively, you could steer UDP flows coming to a certain port range to one
UDP socket using TPROXY [0, 1].

TPROXY has the same downside as AF_PACKET, meaning that it requires at least
CAP_NET_RAW to create/set up the socket.

OTOH, with TPROXY you can gracefully co-reside with other services, filtering
on just the destination addresses you want in iptables/nftables.

Fan-out / load-balancing with reuseport to have one socket per CPU is not
possible, though. You would need to do that with Netfilter.

-Jakub

[0] https://www.kernel.org/doc/Documentation/networking/tproxy.txt
[1] https://blog.cloudflare.com/how-we-built-spectrum/
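
For illustration, a rough sketch of the userspace side of the TPROXY approach
might look like the following. It is not taken from any production setup. The
port range 5000:5999, listen port 5000, fwmark 0x1 and routing table 100 in the
comments are made-up values, and the helper names are hypothetical. The numeric
fallbacks for the socket options are the Linux values from <linux/in.h>.

```python
import socket
import struct

# Linux values from <linux/in.h>; fall back to them if this Python build
# does not export the constants.
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)
IP_RECVORIGDSTADDR = getattr(socket, "IP_RECVORIGDSTADDR", 20)
IP_ORIGDSTADDR = IP_RECVORIGDSTADDR  # same number, used as the cmsg type

# Assumed steering rules (example values only), set up separately as root:
#   iptables -t mangle -A PREROUTING -p udp --dport 5000:5999 \
#       -j TPROXY --on-port 5000 --tproxy-mark 0x1/0x1
#   ip rule add fwmark 1 lookup 100
#   ip route add local 0.0.0.0/0 dev lo table 100


def open_tproxy_udp_socket(listen_port):
    """Create the single UDP socket that the TPROXY rule redirects to.

    Setting IP_TRANSPARENT needs CAP_NET_RAW or CAP_NET_ADMIN; without it
    setsockopt() raises PermissionError.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
        # Ask the kernel to report each datagram's original destination.
        sock.setsockopt(socket.IPPROTO_IP, IP_RECVORIGDSTADDR, 1)
        sock.bind(("0.0.0.0", listen_port))
    except OSError:
        sock.close()
        raise
    return sock


def parse_orig_dst(ancdata):
    """Return (ip, port) from an IP_ORIGDSTADDR control message, or None."""
    for level, ctype, data in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_ORIGDSTADDR:
            # struct sockaddr_in: family (2 bytes), port (2 bytes, network
            # byte order), IPv4 address (4 bytes), padding (8 bytes).
            port = struct.unpack_from("!H", data, 2)[0]
            addr = socket.inet_ntoa(data[4:8])
            return addr, port
    return None


def recv_with_orig_dst(sock, bufsize=65535):
    """Receive one datagram plus the destination it was originally sent to."""
    data, ancdata, _flags, src = sock.recvmsg(bufsize, socket.CMSG_SPACE(16))
    return data, src, parse_orig_dst(ancdata)
```

With the rules in place, a process holding the capability would call
open_tproxy_udp_socket(5000) once and then loop on recv_with_orig_dst(),
demultiplexing flows on the returned original destination port.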