References: <20190820151611.10727-1-i.maximets@samsung.com> <625791af-c656-1e42-b60e-b3a5cedcb4c4@samsung.com>
From: Björn Töpel
Date: Thu, 22 Aug 2019 09:12:14 +0200
Subject: Re: [Intel-wired-lan] [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
To: Alexander Duyck
Cc: Ilya Maximets, Jakub Kicinski, Daniel Borkmann, Netdev, William Tu,
    LKML, Alexei Starovoitov, intel-wired-lan, bpf, Björn Töpel,
    "David S. Miller", Magnus Karlsson, Eelco Chaudron
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 21 Aug 2019 at 18:57, Alexander Duyck wrote:
>
> On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets wrote:
> >
> > On 21.08.2019 4:17, Alexander Duyck wrote:
> > > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets wrote:
> > > >
> > > > On 20.08.2019 18:35, Alexander Duyck wrote:
> > > > > On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets wrote:
> > > > > >
> > > > > > Tx code doesn't clear the descriptor status after cleaning.
> > > > > > So, if the budget is larger than the number of used elems in a
> > > > > > ring, some descriptors will be accounted twice and
> > > > > > xsk_umem_complete_tx will move prod_tail far beyond the
> > > > > > prod_head, breaking the completion queue ring.
> > > > > >
> > > > > > Fix that by limiting the number of descriptors to clean by the
> > > > > > number of used descriptors in the tx ring.
> > > > > >
> > > > > > Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> > > > > > Signed-off-by: Ilya Maximets
> > > > >
> > > > > I'm not sure this is the best way to go. My preference would be
> > > > > to have something in the ring that would prevent us from racing,
> > > > > which I don't think this really addresses. I am pretty sure this
> > > > > code is safe on x86, but I would be worried about weakly ordered
> > > > > systems such as PowerPC.
> > > > >
> > > > > It might make sense to look at adding the eop_desc logic like we
> > > > > have in the regular path, with a proper barrier before we write
> > > > > it and after we read it. So for example we could hold off on
> > > > > writing the bytecount value until the end of an iteration and
> > > > > call smp_wmb before we write it. Then on the cleanup we could
> > > > > read it and, if it is non-zero, take an smp_rmb before proceeding
> > > > > further to process the Tx descriptor and clearing the value.
> > > > > Otherwise this code is going to just keep popping up with issues.
> > > >
> > > > But, unlike the regular case, xdp zero-copy xmit and clean for a
> > > > particular tx ring always happen in the same NAPI context and even
> > > > on the same CPU core.
> > > >
> > > > I saw the 'eop_desc' manipulations in the regular case and yes, we
> > > > could use the 'next_to_watch' field just as a flag of descriptor
> > > > existence, but it seems unnecessarily complicated. Am I missing
> > > > something?
> > >
> > > So is it always in the same NAPI context? I forgot, I was thinking
> > > that somehow the socket could possibly make use of XDP for transmit.
> >
> > An AF_XDP socket only triggers a tx interrupt on ndo_xsk_async_xmit(),
> > which is used in zero-copy mode. The real xmit happens inside
> >   ixgbe_poll()
> >     -> ixgbe_clean_xdp_tx_irq()
> >     -> ixgbe_xmit_zc()
> >
> > It should not be possible to bind another XDP socket to the same
> > netdev queue.
> >
> > It is also possible to xmit frames in the xdp_ring while performing
> > XDP_TX/REDIRECT actions. REDIRECT could happen from a different netdev
> > with a different NAPI context, but this operation is bound to a
> > specific CPU core and each core has its own xdp_ring.
> >
> > However, I'm not an expert here.
> > Björn, maybe you could comment on this?
> >
> > > As far as the logic to use, I would be good with just using a value
> > > you are already setting, such as the bytecount value. All that would
> > > need to happen is to guarantee that the value is cleared in the Tx
> > > path. So if you clear the bytecount in ixgbe_clean_xdp_tx_irq you
> > > could theoretically just use that as well to flag that a descriptor
> > > has been populated and is ready to be cleaned. Assuming the logic
> > > about this all being in the same NAPI context anyway, you wouldn't
> > > need to mess with the barrier stuff I mentioned before.
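
(For illustration only, the "bytecount as a ready flag" idea described
above could look roughly like the toy sketch below. This is not the
actual ixgbe implementation: the types are made-up stand-ins, and a real
cleanup path would also check the descriptor's DD/status bit and unmap
DMA before completing a slot.)

struct tx_buffer {
	unsigned int bytecount;		/* 0 means "slot not populated" */
};

struct tx_ring {
	unsigned int count;		/* number of descriptors in the ring */
	unsigned int next_to_clean;
	unsigned int next_to_use;
	struct tx_buffer *buf;		/* count entries */
};

/* Clean at most 'budget' slots, but stop at the first slot whose
 * bytecount was never set (or was already cleared by an earlier pass),
 * so the same descriptor can never be completed twice. */
static unsigned int clean_tx_sketch(struct tx_ring *ring, unsigned int budget)
{
	unsigned int ntc = ring->next_to_clean;
	unsigned int completed = 0;

	while (budget--) {
		struct tx_buffer *tx_buf = &ring->buf[ntc];

		if (!tx_buf->bytecount)	/* never populated: nothing to clean */
			break;

		/* a real driver would also check the descriptor's DD bit here */

		tx_buf->bytecount = 0;	/* clearing the flag prevents a double clean */
		completed++;

		if (++ntc == ring->count)
			ntc = 0;
	}

	ring->next_to_clean = ntc;
	return completed;	/* would be handed to xsk_umem_complete_tx() */
}

Clearing the flag in the clean path is what guarantees a slot is only
ever counted once toward xsk_umem_complete_tx().
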
> >
> > Checking the number of used descs, i.e. next_to_use - next_to_clean,
> > makes the iteration in this function logically equal to the iteration
> > inside 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the
> > latter function too to follow the same 'bytecount' approach? I don't
> > like having two different ways to determine the number of used
> > descriptors in the same file.
> >
> > Best regards, Ilya Maximets.
>
> As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I
> would say that if you got rid of budget and framed things more like how
> ixgbe_xsk_clean_tx_ring was framed, with the ntc != ntu being obvious,
> I would prefer to see us go that route.
>
> Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you
> are going to be working with a static ntu value, since you will only
> ever process one iteration through the ring anyway. It might make more
> sense if you just went through and got rid of budget and i, and instead
> used ntc and ntu like what was done in ixgbe_xsk_clean_tx_ring().
>

+1. I'd prefer this as well!


Cheers,
Björn

> Thanks.
>
> - Alex
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
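
(Again purely for illustration, here is a toy version of the budget-free
ntc/ntu loop the thread converges on, framed the way
ixgbe_xsk_clean_tx_ring() is framed. The real ixgbe_clean_xdp_tx_irq()
additionally checks hardware completion status, unmaps DMA and reports
the completed frames with xsk_umem_complete_tx(); none of that is shown
here, and the types are the same made-up stand-ins as in the earlier
sketch.)

/* same toy types as in the earlier sketch, repeated so this stands alone */
struct tx_buffer { unsigned int bytecount; };
struct tx_ring {
	unsigned int count, next_to_clean, next_to_use;
	struct tx_buffer *buf;
};

static void clean_xdp_tx_sketch(struct tx_ring *ring)
{
	unsigned int ntc = ring->next_to_clean;
	unsigned int ntu = ring->next_to_use;	/* snapshot; same NAPI context */
	unsigned int xsk_frames = 0;

	/* Only the descriptors that were actually submitted are visited,
	 * so no descriptor can be accounted twice no matter how large the
	 * caller's NAPI budget is. */
	while (ntc != ntu) {
		ring->buf[ntc].bytecount = 0;
		xsk_frames++;

		if (++ntc == ring->count)
			ntc = 0;
	}

	ring->next_to_clean = ntc;

	/* here the real code would report xsk_frames completions back to
	 * the AF_XDP completion queue via xsk_umem_complete_tx() */
	(void)xsk_frames;
}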