References: <1615603667-22568-1-git-send-email-linyunsheng@huawei.com>
 <1615777818-13969-1-git-send-email-linyunsheng@huawei.com>
 <20210315115332.1647e92b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
 <3838b7c2-c32f-aeda-702a-5cb8f712ec0c@huawei.com>
In-Reply-To: <3838b7c2-c32f-aeda-702a-5cb8f712ec0c@huawei.com>
From: Eric Dumazet
Date: Tue, 16 Mar 2021 09:15:11 +0100
Subject: Re: [RFC v2] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
To: Yunsheng Lin
Cc: Jakub Kicinski, David Miller, Vladimir Oltean, Alexei Starovoitov,
 Daniel Borkmann, Andrii Nakryiko, Wei Wang, Cong Wang, Taehee Yoo,
 netdev, LKML, linuxarm@openeuler.org, Marc Kleine-Budde,
 linux-can@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 16, 2021 at 1:35 AM Yunsheng Lin wrote:
>
> On 2021/3/16 2:53, Jakub Kicinski wrote:
> > On Mon, 15 Mar 2021 11:10:18 +0800 Yunsheng Lin wrote:
> >> @@ -606,6 +623,11 @@ static const u8 prio2band[TC_PRIO_MAX + 1] = {
> >>   */
> >>  struct pfifo_fast_priv {
> >>  	struct skb_array q[PFIFO_FAST_BANDS];
> >> +
> >> +	/* protect against data race between enqueue/dequeue and
> >> +	 * qdisc->empty setting
> >> +	 */
> >> +	spinlock_t lock;
> >>  };
> >>
> >>  static inline struct skb_array *band2list(struct pfifo_fast_priv *priv,
> >> @@ -623,7 +645,10 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
> >>  	unsigned int pkt_len = qdisc_pkt_len(skb);
> >>  	int err;
> >>
> >> -	err = skb_array_produce(q, skb);
> >> +	spin_lock(&priv->lock);
> >> +	err = __ptr_ring_produce(&q->ring, skb);
> >> +	WRITE_ONCE(qdisc->empty, false);
> >> +	spin_unlock(&priv->lock);
> >>
> >>  	if (unlikely(err)) {
> >>  		if (qdisc_is_percpu_stats(qdisc))
> >> @@ -642,6 +667,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
> >>  	struct sk_buff *skb = NULL;
> >>  	int band;
> >>
> >> +	spin_lock(&priv->lock);
> >>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >>  		struct skb_array *q = band2list(priv, band);
> >>
> >> @@ -655,6 +681,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
> >>  	} else {
> >>  		WRITE_ONCE(qdisc->empty, true);
> >>  	}
> >> +	spin_unlock(&priv->lock);
> >>
> >>  	return skb;
> >>  }
> >
> > I thought pfifo was supposed to be "lockless" and this change
> > re-introduces a lock between producer and consumer, no?
>
> Yes, the lock breaks the "lockless" property of the lockless qdisc for
> now. I do not know how to solve the data race below locklessly:
>
>         CPU1:                           CPU2:
>       dequeue skb                        .
>           .                              .
>           .                          enqueue skb
>           .                              .
>           .                 WRITE_ONCE(qdisc->empty, false);
>           .                              .
>           .                              .
> WRITE_ONCE(qdisc->empty, true);

Maybe it is time to fully document/explain how this can possibly work.

A lockless qdisc used concurrently by multiple CPUs, using
WRITE_ONCE() and READ_ONCE()?

Just say no to this.

>
> If the above happens, qdisc->empty is true even though the qdisc still
> holds some skb, which may cause out-of-order delivery or a stuck packet.
>
> It seems we may need to update the ptr_ring's status (empty or not)
> atomically while enqueuing/dequeuing in the ptr_ring implementation.
>
> Any better idea?