Message-ID: <812c3233-1b64-8a0d-f820-26b98ff6642d@kernel.org>
Date: Sun, 17 Jul 2022 20:19:20 -0600
Subject: Re: [PATCH net-next v4 00/27] io_uring zerocopy send
To: Pavel Begunkov, io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com
References: <2c49d634-bd8a-5a7f-0f66-65dba22bae0d@kernel.org> <0f54508f-e819-e367-84c2-7aa0d7767097@gmail.com>
From: David Ahern
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/14/22 12:55 PM, Pavel Begunkov wrote:
>>>>> You dropped comments about TCP testing; any progress there? If not,
>>>>> can you relay any issues you are hitting?
>>>>
>>>> Not really a problem, but for me it's bottlenecked at NIC bandwidth
>>>> (~3GB/s) for both zc and non-zc and doesn't even nearly saturate a CPU.
>>>> Was actually benchmarked by my colleague quite a while ago, but can't
>>>> find numbers. Probably need to at least add localhost numbers or grab
>>>> a better server.
>>>
>>> Testing localhost TCP with a hack (see below), it doesn't include
>>> refcounting optimisations I was testing UDP with and that will be
>>> sent afterwards.
>>> Numbers are in MB/s
>>>
>>> IO size | non-zc    | zc
>>> 1200    | 4174      | 4148
>>> 4096    | 7597      | 11228
>>
>> I am surprised by the low numbers; you should be able to saturate a 100G
>> link with TCP and ZC TX API.
>
> It was a quick test with my laptop, not a super fast CPU, preemptible
> kernel, etc., and considering that the fact that it processes receives
> in the same send syscall roughly doubles the overhead, 87Gb/s
> looks ok. It's not like MSG_ZEROCOPY would look much different. Moreover,
> all sends here will be executed sequentially in io_uring,
> so no extra parallelism or so. As for 1200, I think 4GB/s is reasonable,
> it's just that the kernel overhead per byte is too high, should be the same
> with just send(2).

?

It's a stream socket so those sends are coalesced into MTU-sized packets.

>>> Because it's localhost, we also spend cycles here for the recv side.
>>> Using a real NIC with 1200 bytes, zc is worse than non-zc by ~5-10%, maybe
>>> the omitted optimisations will somewhat help. I don't consider it to be a
>>> blocker, but it would be interesting to poke into later. One thing helping
>>> non-zc is that it squeezes a number of requests into a single page,
>>> whereas zerocopy adds a new frag for every request.
>>>
>>> Can't say anything new for larger payloads, I'm still NIC-bound but
>>> looking at CPU utilisation zc doesn't drain as many cycles as non-zc.
>>> Also, I don't remember if mentioned before, but another catch is that
>>> with TCP it expects users to not be flushing notifications too much,
>>> because it forces it to allocate a new skb and lose a good chunk of
>>> benefits from using TCP.
>>
>> I had issues with TCP sockets and io_uring at the end of 2020:
>> https://www.spinics.net/lists/io-uring/msg05125.html
>>
>> I have not tried anything recent (from 2022).
>
> Haven't seen it back then.
> In general io_uring doesn't stop submitting
> requests if one request fails, at least because we're trying to execute
> requests asynchronously. And in general, requests can get executed
> out of order, so submitting a bunch of requests to a single
> TCP sock without any ordering on io_uring side is most likely a bug.

The TCP socket buffer fills, resulting in a partial send (i.e., for a given
sqe submission only part of the write/send succeeded). io_uring was not
handling that case.

I'll try to find some time to resurrect the iperf3 patch and try a
top-of-tree kernel.

> You can link io_uring requests, i.e. IOSQE_IO_LINK, guaranteeing
> execution ordering. And if you meant links in the message, I agree
> that it was not the best decision to consider len < sqe->len not
> an error and not break links, but it was later added that
> MSG_WAITALL would also change the success condition to
> len==sqe->len. But all that is relevant only if you were using linking.
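
For readers unfamiliar with the partial-send case discussed above, here is a
minimal sketch of the caller-side handling, under stated assumptions. It uses
plain non-blocking sockets in Python rather than io_uring, and an AF_UNIX
socketpair stands in for a TCP connection; the helper names send_all,
recv_exact and demo are hypothetical and come from neither the patch set nor
iperf3. The point it illustrates is only that a send on a stream socket may
accept fewer bytes than requested, so the unsent tail has to be resubmitted
from the new offset.

```python
import select
import socket
import threading


def send_all(sock, payload):
    """Send the whole payload on a non-blocking stream socket.

    Returns (bytes_sent, send_calls). Each send() may accept only part of
    the remaining data; the unsent tail is resubmitted from the new offset.
    """
    view = memoryview(payload)
    offset = 0
    calls = 0
    while offset < len(view):
        # Wait until the socket buffer has room again.
        select.select([], [sock], [])
        try:
            n = sock.send(view[offset:])
        except BlockingIOError:
            continue  # buffer filled between select() and send(); retry
        calls += 1
        offset += n  # n smaller than the remainder is a short (partial) send
    return offset, calls


def recv_exact(sock, total, out):
    """Drain `total` bytes from sock into the bytearray `out`."""
    while len(out) < total:
        chunk = sock.recv(65536)
        if not chunk:
            break
        out.extend(chunk)


def demo(size=4 * 1024 * 1024):
    # Connected stream pair; an assumption standing in for a TCP socket.
    a, b = socket.socketpair()
    a.setblocking(False)
    payload = bytes(range(256)) * (size // 256)
    received = bytearray()
    reader = threading.Thread(target=recv_exact,
                              args=(b, len(payload), received))
    reader.start()
    sent, calls = send_all(a, payload)
    reader.join()
    a.close()
    b.close()
    return sent, calls, bytes(received) == payload


if __name__ == "__main__":
    sent, calls, intact = demo()
    print(f"sent {sent} bytes in {calls} send() calls, intact={intact}")
```

With a payload larger than the socket buffer, the send count is typically
greater than one, which is the situation where a single submission reports
len < requested length and the remainder must be queued again in order.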