Message-ID: <48bcbb79-6464-4a46-8070-b59a64018b91@gmail.com>
Date: Fri, 8 Dec 2023 20:09:44 +0000
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
To: Stanislav Fomichev, Willem de Bruijn
Cc: Mina Almasry, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linaro-mm-sig@lists.linaro.org, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
 Arnd Bergmann, David Ahern, Shuah Khan, Sumit Semwal, Christian König,
 Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn,
 Kaiyuan Zhang
References: <20231106024413.2801438-1-almasrymina@google.com>
 <20231106024413.2801438-11-almasrymina@google.com>
From: Pavel Begunkov
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/6/23 22:34, Stanislav Fomichev wrote:
> On 11/06, Willem de Bruijn wrote:
>>>> IMHO, we need a better UAPI to receive the tokens and give them back to
>>>> the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
>>>> but look dated and hacky :-(
>>>>
>>>> We should either do some kind of user/kernel shared memory queue to
>>>> receive/return the tokens (similar to what Jonathan was doing in his
>>>> proposal?)

Oops, missed the discussion.
IMHO shared rings are more elegant here. With that the app -> kernel
buffer return path doesn't need to setsockopt(), which will have to
figure out how to return buffers to pp efficiently, and then potentially
some sync on the pp allocation side. It just grabs entries from the ring
in the napi context on allocation when necessary.
But then you basically get the io_uring zc rx... just saying

>>> I'll take a look at Jonathan's proposal, sorry, I'm not immediately
>>> familiar but I wanted to respond :-) But is the suggestion here to
>>> build a new kernel-user communication channel primitive for the
>>> purpose of passing the information in the devmem cmsg? IMHO that seems
>>> like an overkill. Why add 100-200 lines of code to the kernel to add
>>> something that can already be done with existing primitives? I don't
>>> see anything concretely wrong with cmsg & setsockopt approach, and if
>>> we switch to something I'd prefer to switch to an existing primitive
>>> for simplicity?
>>>
>>> The only other existing primitive to pass data outside of the linear
>>> buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
>>> preferred? Any other suggestions or existing primitives I'm not aware
>>> of?
>>>
>>>> or bite the bullet and switch to io_uring.
>>>>
>>>
>>> IMO io_uring & socket support are orthogonal, and one doesn't preclude
>>> the other.

They don't preclude each other, but I wouldn't say they're orthogonal.
Similar approaches, some different details. FWIW, we'll be posting a
next iteration on top of the pp providers patches soon.

>>> As you know we like to use sockets and I believe there are
>>> issues with io_uring adoption at Google that I'm not familiar with
>>> (and could be wrong). I'm interested in exploring io_uring support as
>>> a follow up but I think David Wei will be interested in io_uring
>>> support as well anyway.

Well, not exactly support of devmem, but true, we definitely want
to have io_uring zerocopy, considering all the api differences.
(at the same time not duplicating net bits).

>> I also disagree that we need to replace a standard socket interface
>> with something "faster", in quotes.
>>
>> This interface is not the bottleneck to the target workload.
>>
>> Replacing the synchronous sockets interface with something more
>> performant for workloads where it is, is an orthogonal challenge.
>> However we do that, I think that traditional sockets should continue
>> to be supported.
>>
>> The feature may already even work with io_uring, as both recvmsg with
>> cmsg and setsockopt have io_uring support now.

It should, in theory, but the api wouldn't suit io_uring, internals
wouldn't be properly optimised, and we can't use it with some
important features like multishot recv because of cmsg.

> I'm not really concerned with faster. I would prefer something cleaner :-)
>
> Or maybe we should just have it documented. With some kind of path
> towards beautiful world where we can create dynamic queues..

-- 
Pavel Begunkov
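
For illustration only, the shared-ring return path argued for in this
message could look roughly like the sketch below. It is a userspace
model using C11 atomics, not kernel code. The names token_ring,
ring_return_token and ring_grab_token and the ring size are assumptions
made up for this sketch; none of them come from the devmem TCP RFC or
from the io_uring uAPI. A real implementation would share the ring
memory between kernel and application and consume it from the napi
context on allocation.

```c
/* Userspace model of an app -> kernel buffer-return ring (illustrative).
 * All identifiers are hypothetical; this is not an existing kernel uAPI. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* assumed size; must be a power of two */
static_assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0,
              "RING_ENTRIES must be a power of two");

struct token_ring {
    _Atomic uint32_t head; /* consumer index, advanced by the "kernel" side */
    _Atomic uint32_t tail; /* producer index, advanced by the application */
    uint32_t tokens[RING_ENTRIES];
};

/* Application side: hand a buffer token back without any syscall. */
static bool ring_return_token(struct token_ring *r, uint32_t token)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_ENTRIES)
        return false; /* ring full, caller has to retry later */

    r->tokens[tail & (RING_ENTRIES - 1)] = token;
    /* Release store: the token write is visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* "Kernel" side: stand-in for the allocation path picking up one
 * returned token when it needs a buffer. */
static bool ring_grab_token(struct token_ring *r, uint32_t *token)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false; /* nothing has been returned yet */

    *token = r->tokens[head & (RING_ENTRIES - 1)];
    /* Release store: the slot is free for reuse once head moves on. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The point of the single-producer/single-consumer layout is that each
side only writes its own index, so the return path needs no lock and no
setsockopt() call per batch of buffers.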