Message-ID: <95c5a697932e19ebd6577b5dac4d7052fe8c4255.camel@redhat.com>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc
From: Paolo Abeni
To: Jonas Bonn, netdev@vger.kernel.org, LKML, David S. Miller, John Fastabend
Date: Wed, 09 Oct 2019 21:14:07 +0200

On Wed, 2019-10-09 at 08:46 +0200, Jonas Bonn wrote:
> Hi,
>
> The lockless pfifo_fast qdisc has an issue with packets getting stuck in
> the queue. What appears to happen is:
>
> i) Thread 1 holds the 'seqlock' on the qdisc and dequeues packets.
> ii) Thread 1 dequeues the last packet in the queue.
> iii) Thread 1 iterates through the qdisc->dequeue function again and
> determines that the queue is empty.
>
> iv) Thread 2 queues up a packet. Since 'seqlock' is busy, it just
> assumes the packet will be dequeued by whoever is holding the lock.
>
> v) Thread 1 releases 'seqlock'.
>
> After v), nobody will check whether there are packets in the queue until
> a new packet is enqueued. The packet enqueued by Thread 2 may thereby be
> delayed indefinitely.

I think you are right. It looks like this possible race has been present
since the initial lockless implementation - commit 6b3ba9146fe6 ("net:
sched: allow qdiscs to handle locking").

Anyhow, the race window looks quite tiny - I have never observed this
issue in my tests. Do you have a working reproducer?

Something like the following code - completely untested - could possibly
address the issue, but it's a bit rough and I would prefer not to add
additional complexity to the lockless qdiscs. Can you please give it a
spin?
Thanks,

Paolo
---
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6a70845bd9ab..65a1c03330d6 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -113,18 +113,23 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		     struct net_device *dev, struct netdev_queue *txq,
 		     spinlock_t *root_lock, bool validate);
 
-void __qdisc_run(struct Qdisc *q);
+int __qdisc_run(struct Qdisc *q);
 
 static inline void qdisc_run(struct Qdisc *q)
 {
+	int quota = 0;
+
 	if (qdisc_run_begin(q)) {
 		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
 		 * to avoid racing with dev_qdisc_reset()
 		 */
 		if (!(q->flags & TCQ_F_NOLOCK) ||
 		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
-			__qdisc_run(q);
+			quota = __qdisc_run(q);
 		qdisc_run_end(q);
+
+		if (quota > 0 && q->flags & TCQ_F_NOLOCK && q->ops->peek(q))
+			__netif_schedule(q);
 	}
 }
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 17bd8f539bc7..013480f6a794 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -376,7 +376,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
 	return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
 }
 
-void __qdisc_run(struct Qdisc *q)
+int __qdisc_run(struct Qdisc *q)
 {
 	int quota = dev_tx_weight;
 	int packets;
@@ -390,9 +390,10 @@ void __qdisc_run(struct Qdisc *q)
 		quota -= packets;
 		if (quota <= 0 || need_resched()) {
 			__netif_schedule(q);
-			break;
+			return 0;
 		}
 	}
+	return quota;
 }
 
 unsigned long dev_trans_start(struct net_device *dev)