Message-ID: <7e7c2c21-12ba-41c1-92c4-f32a3906f3ee@gmail.com>
Date: Fri, 8 Dec 2023 20:28:15 +0000
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
To: Willem de Bruijn, Stanislav Fomichev
Cc: Mina Almasry, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linaro-mm-sig@lists.linaro.org, "David S. Miller", Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
    Arnd Bergmann, David Ahern, Shuah Khan, Sumit Semwal, Christian König,
    Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn,
    Kaiyuan Zhang
References: <20231106024413.2801438-1-almasrymina@google.com>
    <20231106024413.2801438-11-almasrymina@google.com>
From: Pavel Begunkov

On 11/6/23 22:55, Willem de Bruijn wrote:
> On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev wrote:
>>
>> On 11/06, Willem de Bruijn wrote:
>>>>> IMHO, we need a better UAPI to receive the tokens and give them back to
>>>>> the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
>>>>> but look dated and hacky :-(
>>>>>
>>>>> We should either do some kind of user/kernel shared memory queue to
>>>>> receive/return the tokens (similar to what Jonathan was doing in his
>>>>> proposal?)
>>>>
>>>> I'll take a look at Jonathan's proposal, sorry, I'm not immediately
>>>> familiar but I wanted to respond :-) But is the suggestion here to
>>>> build a new kernel-user communication channel primitive for the
>>>> purpose of passing the information in the devmem cmsg?
>>>> IMHO that seems like an overkill. Why add 100-200 lines of code to the kernel to add
>>>> something that can already be done with existing primitives? I don't
>>>> see anything concretely wrong with cmsg & setsockopt approach, and if
>>>> we switch to something I'd prefer to switch to an existing primitive
>>>> for simplicity?
>>>>
>>>> The only other existing primitive to pass data outside of the linear
>>>> buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
>>>> preferred? Any other suggestions or existing primitives I'm not aware
>>>> of?
>>>>
>>>>> or bite the bullet and switch to io_uring.
>>>>>
>>>>
>>>> IMO io_uring & socket support are orthogonal, and one doesn't preclude
>>>> the other. As you know we like to use sockets and I believe there are
>>>> issues with io_uring adoption at Google that I'm not familiar with
>>>> (and could be wrong). I'm interested in exploring io_uring support as
>>>> a follow up but I think David Wei will be interested in io_uring
>>>> support as well anyway.
>>>
>>> I also disagree that we need to replace a standard socket interface
>>> with something "faster", in quotes.
>>>
>>> This interface is not the bottleneck to the target workload.
>>>
>>> Replacing the synchronous sockets interface with something more
>>> performant for workloads where it is, is an orthogonal challenge.
>>> However we do that, I think that traditional sockets should continue
>>> to be supported.
>>>
>>> The feature may already even work with io_uring, as both recvmsg with
>>> cmsg and setsockopt have io_uring support now.
>>
>> I'm not really concerned with faster. I would prefer something cleaner :-)
>>
>> Or maybe we should just have it documented. With some kind of path
>> towards beautiful world where we can create dynamic queues..
>
> I suppose we just disagree on the elegance of the API.
>
> The concise notification API returns tokens as a range for
> compression, encoding as two 32-bit unsigned integers start + length.
> It allows for even further batching by returning multiple such ranges
> in a single call.

FWIW, nothing prevents io_uring from compressing ranges. The io_uring
zc RFC returns {offset, size} as well, though at the moment they would
lie in the same page.

> This is analogous to the MSG_ZEROCOPY notification mechanism from
> kernel to user.
>
> The synchronous socket syscall interface can be replaced by something
> asynchronous like io_uring. This already works today? Whatever

If you mean async io_uring recv, it does work. In short, internally
it polls the socket and then calls sock_recvmsg(). There is also
a feature that makes it return to polling after sock_recvmsg() and
keep looping this way.

> asynchronous ring-based API would be selected, io_uring or otherwise,
> I think the concise notification encoding would remain as is.
>
> Since this is an operation on a socket, I find a setsockopt the
> fitting interface.

--
Pavel Begunkov