From: Stanislav Fomichev
Date: Mon, 6 Nov 2023 15:32:22 -0800
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
To: Willem de Bruijn
Cc: Mina Almasry, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Shuah Khan, Sumit Semwal, Christian König, Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang

On Mon, Nov 6, 2023 at 2:56 PM Willem de Bruijn wrote:
>
> On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev wrote:
> >
> > On 11/06, Willem de Bruijn wrote:
> > > > > IMHO, we need a better UAPI to receive the tokens and give them back to
> > > > > the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
> > > > > but look dated and hacky :-(
> > > > >
> > > > > We should either do some kind of user/kernel shared memory queue to
> > > > > receive/return the tokens (similar to what Jonathan was doing in his
> > > > > proposal?)
> > > >
> > > > I'll take a look at Jonathan's proposal, sorry, I'm not immediately
> > > > familiar but I wanted to respond :-) But is the suggestion here to
> > > > build a new kernel-user communication channel primitive for the
> > > > purpose of passing the information in the devmem cmsg? IMHO that seems
> > > > like an overkill. Why add 100-200 lines of code to the kernel to add
> > > > something that can already be done with existing primitives?
> > > > I don't see anything concretely wrong with cmsg & setsockopt approach, and if
> > > > we switch to something I'd prefer to switch to an existing primitive
> > > > for simplicity?
> > > >
> > > > The only other existing primitive to pass data outside of the linear
> > > > buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
> > > > preferred? Any other suggestions or existing primitives I'm not aware
> > > > of?
> > > >
> > > > > or bite the bullet and switch to io_uring.
> > > > >
> > > >
> > > > IMO io_uring & socket support are orthogonal, and one doesn't preclude
> > > > the other. As you know we like to use sockets and I believe there are
> > > > issues with io_uring adoption at Google that I'm not familiar with
> > > > (and could be wrong). I'm interested in exploring io_uring support as
> > > > a follow up but I think David Wei will be interested in io_uring
> > > > support as well anyway.
> > >
> > > I also disagree that we need to replace a standard socket interface
> > > with something "faster", in quotes.
> > >
> > > This interface is not the bottleneck to the target workload.
> > >
> > > Replacing the synchronous sockets interface with something more
> > > performant for workloads where it is, is an orthogonal challenge.
> > > However we do that, I think that traditional sockets should continue
> > > to be supported.
> > >
> > > The feature may already even work with io_uring, as both recvmsg with
> > > cmsg and setsockopt have io_uring support now.
> >
> > I'm not really concerned with faster. I would prefer something cleaner :-)
> >
> > Or maybe we should just have it documented. With some kind of path
> > towards beautiful world where we can create dynamic queues..
>
> I suppose we just disagree on the elegance of the API.

Yeah, I might be overly sensitive to the APIs that use get/setsockopt
for something more involved than setting a flag.
Probably because I know that bpf will (unnecessarily) trigger on these :-D
I had to implement that bpf "bypass" (or fastpath) for
TCP_ZEROCOPY_RECEIVE and it looks like this token recycle might also
benefit from something similar.

> The concise notification API returns tokens as a range for
> compression, encoding as two 32-bit unsigned integers start + length.
> It allows for even further batching by returning multiple such ranges
> in a single call.

Tangential: should tokens be u64? Otherwise we can't have more than
4GB unacknowledged. Or is that a reasonable constraint?

> This is analogous to the MSG_ZEROCOPY notification mechanism from
> kernel to user.
>
> The synchronous socket syscall interface can be replaced by something
> asynchronous like io_uring. This already works today? Whatever
> asynchronous ring-based API would be selected, io_uring or otherwise,
> I think the concise notification encoding would remain as is.
>
> Since this is an operation on a socket, I find a setsockopt the
> fitting interface.