Date: Mon, 30 Jan 2023 14:28:08 +0000
From: Andrei Gherzan
To: Willem de Bruijn
Cc: Paolo Abeni, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Shuah Khan, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] selftests: net: udpgso_bench_tx: Introduce exponential back-off retries
References: <20230127181625.286546-1-andrei.gherzan@canonical.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 23/01/30 08:35AM, Willem de Bruijn wrote:
> On Mon, Jan 30, 2023 at 7:51 AM Andrei Gherzan wrote:
> >
> > On 23/01/30 09:26AM, Paolo Abeni wrote:
> > > On Fri, 2023-01-27 at 17:03 -0500, Willem de Bruijn wrote:
> > > > On Fri, Jan 27, 2023 at 1:16 PM Andrei Gherzan wrote:
> > > > >
> > > > > The tx and rx test programs are used in a couple of test scripts including
> > > > > "udpgro_bench.sh". Taking this as an example, when the rx/tx programs
> > > > > are invoked subsequently, there is a chance that the rx one is not ready to
> > > > > accept socket connections. This racing bug could fail the test with at
> > > > > least one of the following:
> > > > >
> > > > > ./udpgso_bench_tx: connect: Connection refused
> > > > > ./udpgso_bench_tx: sendmsg: Connection refused
> > > > > ./udpgso_bench_tx: write: Connection refused
> > > > >
> > > > > This change addresses this by adding routines that retry the socket
> > > > > operations with an exponential back off algorithm from 100ms to 2s.
> > > > >
> > > > > Fixes: 3a687bef148d ("selftests: udp gso benchmark")
> > > > > Signed-off-by: Andrei Gherzan
> > > >
> > > > Synchronizing the two processes is indeed tricky.
> > > >
> > > > Perhaps more robust is opening an initial TCP connection, with
> > > > SO_RCVTIMEO to bound the waiting time. That covers all tests in one
> > > > go.
> > >
> > > Another option would be waiting for the listener(tcp)/receiver(udp)
> > > socket to show up in 'ss' output before firing-up the client - quite
> > > alike what mptcp self-tests are doing.
> >
> > I like this idea. I have tested it and it works as expected with the
> > exception of:
> >
> > ./udpgso_bench_tx: sendmsg: No buffer space available
> >
> > Any ideas on how to handle this? I could retry and that works.
>
> This happens (also) without the zerocopy flag, right? That
>
> It might mean reaching the sndbuf limit, which can be adjusted with
> SO_SNDBUF (or SO_SNDBUFFORCE if CAP_NET_ADMIN). Though I would not
> expect this test to bump up against that limit.
>
> A few zerocopy specific reasons are captured in
> https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html#transmission.

I have dug a bit more into this, and it does look like your hint was in
the right direction. The failures I'm seeing occur only with the zerocopy
flag.

Of the reasons listed in the doc above, I can only assume the optmem
limit: I've reproduced the failure with unlimited locked pages, and the
failures are transient. Bumping the default value I have (20480) to
2048000 made the sendmsg succeed as expected. On the other hand, the
tests started to fail with something like:

./udpgso_bench_tx: Unexpected number of Zerocopy completions: 774783 expected 773707 received

This audit failure is also transient, as with the buffer limit one.

-- 
Andrei Gherzan
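
For illustration only, here is a rough sketch of the "retry" option mentioned above, using the 100ms to 2s exponential back-off from the original patch description. This is not the code from the patch: the helper names (backoff_next_ms, sleep_ms, sendmsg_retry_enobufs) and the decision to give up after the 2s step are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

#define BACKOFF_MIN_MS 100
#define BACKOFF_MAX_MS 2000

/* Double the delay, clamped to [BACKOFF_MIN_MS, BACKOFF_MAX_MS]. */
int backoff_next_ms(int cur_ms)
{
	if (cur_ms < BACKOFF_MIN_MS)
		return BACKOFF_MIN_MS;
	if (cur_ms >= BACKOFF_MAX_MS / 2)
		return BACKOFF_MAX_MS;
	return cur_ms * 2;
}

/* Sleep for the given number of milliseconds. */
void sleep_ms(int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: call sendmsg() and, while it fails with ENOBUFS,
 * retry after 100, 200, 400, 800, 1600 and 2000 ms. Any other result is
 * returned immediately. After the 2000 ms step the last failure is
 * returned to the caller.
 */
ssize_t sendmsg_retry_enobufs(int fd, const struct msghdr *msg, int flags)
{
	int delay_ms = 0;
	ssize_t ret;

	for (;;) {
		ret = sendmsg(fd, msg, flags);
		if (ret >= 0 || errno != ENOBUFS)
			return ret;
		if (delay_ms == BACKOFF_MAX_MS)
			return ret;	/* gave up; ret is -1 from the last try */
		delay_ms = backoff_next_ms(delay_ms);
		sleep_ms(delay_ms);
	}
}
```

A caller would pass the same flags it already uses (for example MSG_ZEROCOPY) in place of a direct sendmsg() call. Note that a retry only papers over the transient ENOBUFS; it does not explain the mismatch in zerocopy completions seen after raising the optmem limit.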