From: Magnus Karlsson
Date: Mon, 14 Dec 2020 09:58:23 +0100
Subject: Re: [PATCH bpf-next] xsk: save the undone skb
References: <1607762670.6417274-1-xuanzhuo@linux.alibaba.com>
In-Reply-To: <1607762670.6417274-1-xuanzhuo@linux.alibaba.com>
To: Xuan Zhuo
Cc: "Karlsson, Magnus", Björn Töpel, Jonathan Lemon, "David S. Miller",
 Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 open list:XDP SOCKETS, open list:XDP SOCKETS, open list
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 12, 2020 at 9:47 AM Xuan Zhuo wrote:
>
> On Fri, 11 Dec 2020 16:32:06 +0100, Magnus Karlsson wrote:
> > On Fri, Dec 11, 2020 at 2:12 PM Xuan Zhuo wrote:
> > >
> > > We can save the skb when sending fails, whether because of
> > > NETDEV_TX_BUSY or because xskq_prod_reserve fails. As long as the skb
> > > has been successfully generated and configured, we can save it if we
> > > encounter an error later and retry it on the next send.
> > >
> > > Especially in the NETDEV_TX_BUSY case, this avoids having to deal
> > > with the fact that xskq_prod_reserve has already been updated.
> > >
> > > Signed-off-by: Xuan Zhuo
> > > ---
> > >  include/net/xdp_sock.h |  3 +++
> > >  net/xdp/xsk.c          | 36 +++++++++++++++++++++++++++---------
> > >  2 files changed, 30 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> > > index 4f4e93b..fead0c9 100644
> > > --- a/include/net/xdp_sock.h
> > > +++ b/include/net/xdp_sock.h
> > > @@ -76,6 +76,9 @@ struct xdp_sock {
> > >         struct mutex mutex;
> > >         struct xsk_queue *fq_tmp; /* Only as tmp storage before bind */
> > >         struct xsk_queue *cq_tmp; /* Only as tmp storage before bind */
> > > +
> > > +       struct sk_buff *skb_undone;
> > > +       bool skb_undone_reserve;
> > >  };
> > >
> > >  #ifdef CONFIG_XDP_SOCKETS
> > >
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index e28c682..1051024 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -435,6 +435,19 @@ static int xsk_generic_xmit(struct sock *sk)
> > >         if (xs->queue_id >= xs->dev->real_num_tx_queues)
> > >                 goto out;
> > >
> > > +       if (xs->skb_undone) {
> > > +               if (xs->skb_undone_reserve) {
> > > +                       if (xskq_prod_reserve(xs->pool->cq))
> > > +                               goto out;
> > > +
> > > +                       xs->skb_undone_reserve = false;
> > > +               }
> > > +
> > > +               skb = xs->skb_undone;
> > > +               xs->skb_undone = NULL;
> > > +               goto xmit;
> > > +       }
> > > +
> > >         while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > >                 char *buffer;
> > >                 u64 addr;
> > > @@ -454,12 +467,7 @@ static int xsk_generic_xmit(struct sock *sk)
> > >                 addr = desc.addr;
> > >                 buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > >                 err = skb_store_bits(skb, 0, buffer, len);
> > > -               /* This is the backpressure mechanism for the Tx path.
> > > -                * Reserve space in the completion queue and only proceed
> > > -                * if there is space in it. This avoids having to implement
> > > -                * any buffering in the Tx path.
> > > -                */
> > > -               if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > > +               if (unlikely(err)) {
> > >                         kfree_skb(skb);
> > >                         goto out;
> > >                 }
> > > @@ -470,12 +478,22 @@ static int xsk_generic_xmit(struct sock *sk)
> > >                 skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> > >                 skb->destructor = xsk_destruct_skb;
> > >
> > > +               /* This is the backpressure mechanism for the Tx path.
> > > +                * Reserve space in the completion queue and only proceed
> > > +                * if there is space in it. This avoids having to implement
> > > +                * any buffering in the Tx path.
> > > +                */
> > > +               if (xskq_prod_reserve(xs->pool->cq)) {
> > > +                       xs->skb_undone_reserve = true;
> > > +                       xs->skb_undone = skb;
> > > +                       goto out;
> > > +               }
> > > +
> > > +xmit:
> >
> > This will not work in the general case, since we cannot guarantee that
> > the application does not replace the packet in the Tx ring before it
> > calls send() again. That is fully legal. I also do not like to
> > introduce state between calls. It is much simpler to keep this
> > stateless, which makes it less error prone.
> >
> > On the positive side, I will submit a patch that improves the
> > performance of this transmit function by using the new batch
> > interfaces I introduced a month ago. With that patch I get a
> > throughput improvement of between 15 and 25% for the txpush benchmark
> > in xdpsock. That is much more than you will get from this patch, and
> > it also avoids the problem you are addressing here completely. I will
> > submit it next week, after the bug fix in this code has trickled down
> > to bpf-next. I hope you will like the throughput improvement it
> > provides.
>
> In fact, we could also call xskq_cons_release before saving the undone
> skb and exiting this function, so there is no need to worry about the
> user modifying the data in the Tx ring. But of course, I understand
> that you want to keep this stateless, and I agree with that. I will
> give up this idea for now.
>
> But here, in the case of NETDEV_TX_BUSY, xskq_prod_reserve has already
> been called, yet the skb is released directly without going through
> xsk_destruct_skb. This should be a bug.

Yes, you are correct. In this case the reservation in the cq is not
rolled back, so we really do make the ring one entry smaller. After
enough of these errors, the ring will be of size zero and the socket
will stop working for transmit. Thank you so much for spotting this.

I believe this was introduced when I tried to make NETDEV_TX_BUSY not
drop packets and instead give the user a chance to just resend them.
This part seems to be in dire need of some solid tests contributed to
the new xsk selftests.

I will bundle this fix as patch 2 in a patch set with the other race
that you found. I need the locking of that one to be able to safely
back out the reservation.

> > >                 err = __dev_direct_xmit(skb, xs->queue_id);
> > >                 if (err == NETDEV_TX_BUSY) {
> > >                         /* Tell user-space to retry the send */
> > > -                       skb->destructor = sock_wfree;
> > > -                       /* Free skb without triggering the perf drop trace */
> > > -                       consume_skb(skb);
> > > +                       xs->skb_undone = skb;
> > >                         err = -EAGAIN;
> > >                         goto out;
> > >                 }
> > > --
> > > 1.8.3.1
> > >