Message-ID: <1446612381.4184.7.camel@edumazet-glaptop2.roam.corp.google.com>
Subject: Re: kernel panic in 4.2.3, rb_erase in sch_fq
From: Eric Dumazet
To: Denys Fedoryshchenko
Cc: Cong Wang, Jamal Hadi Salim, "David S. Miller", netdev, linux-kernel@vger.kernel.org
Date: Tue, 03 Nov 2015 20:46:21 -0800
In-Reply-To: <0705e2b76150c28341d7e1915433450d@visp.net.lb>

On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> On 2015-11-04 00:06, Cong Wang wrote:
> > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko wrote:
> >> Hi!
> >>
> >> Actually seems i was getting this panic for a while (once per week) on
> >> loaded pppoe server, but just now was able to get full panic message.
> >> After checking commit logs on sch_fq.c i didnt seen any fixes, so probably
> >> upgrading to newer kernel wont help?
> >
> >
> > Can you share your `tc qdisc show dev xxxx` with us? And how to reproduce
> > it? I tried to setup htb+fq and then flip the interface back and forth
> > but I don't see any crash.
> My guess it wont be easy to reproduce, it is happening on box with 4.5k
> interfaces, that constantly create/delete interfaces,
> and even with that this problem may happen once per day, or may not
> happen for 1 week.
>
> Here is script that is being fired after new ppp interface detected.
> But pppoe process are independent from
> process that are "establishing" shapers.

It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.

diff --git a/net/core/dev.c b/net/core/dev.c
index 8ce3f74cd6b9..bf136103bc7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2880,6 +2880,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		spin_lock(&q->busylock);
 
 	spin_lock(root_lock);
+	if (unlikely(q != rcu_dereference_bh(txq->qdisc))) {
+		pr_err_ratelimited("Arg, qdisc changed ! state %lx\n", q->state);
+		kfree_skb(skb);
+		rc = NET_XMIT_DROP;
+		goto end;
+	}
 	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
 		kfree_skb(skb);
 		rc = NET_XMIT_DROP;
@@ -2913,6 +2919,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 			__qdisc_run(q);
 		}
 	}
+end:
 	spin_unlock(root_lock);
 	if (unlikely(contended))
 		spin_unlock(&q->busylock);
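
For readers following the thread, the check the patch adds can be illustrated outside the kernel. The sketch below is only a userspace toy under stated assumptions: toy_qdisc, toy_txq, toy_xmit and toy_demo are hypothetical names, the atomic_flag spinlock merely stands in for root_lock, and an atomic pointer load stands in for rcu_dereference_bh(). It shows the general "take the lock, then re-check that the pointer you sampled is still the attached one" pattern, not the actual __dev_xmit_skb() logic.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; these are NOT the kernel's Qdisc/netdev_queue. */
struct toy_qdisc {
	atomic_flag lock;	/* plays the role of the qdisc root lock */
	int queued;		/* number of packets accepted */
};

struct toy_txq {
	_Atomic(struct toy_qdisc *) qdisc;	/* currently attached qdisc */
};

enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_DROP = 1 };

static void toy_lock(struct toy_qdisc *q)
{
	while (atomic_flag_test_and_set(&q->lock))
		;	/* spin until the flag is released */
}

static void toy_unlock(struct toy_qdisc *q)
{
	atomic_flag_clear(&q->lock);
}

/*
 * 'q' was sampled by the caller before locking. After taking the lock,
 * re-read the attached qdisc; if it changed in between, drop the packet
 * instead of enqueueing into a qdisc that is being torn down.
 */
static int toy_xmit(struct toy_txq *txq, struct toy_qdisc *q)
{
	int rc;

	toy_lock(q);
	if (q != atomic_load(&txq->qdisc)) {
		rc = TOY_XMIT_DROP;
		goto end;
	}
	q->queued++;
	rc = TOY_XMIT_SUCCESS;
end:
	toy_unlock(q);
	return rc;
}

/* Single-threaded walk-through of the two cases. */
static void toy_demo(void)
{
	struct toy_qdisc old_q = { ATOMIC_FLAG_INIT, 0 };
	struct toy_qdisc new_q = { ATOMIC_FLAG_INIT, 0 };
	struct toy_txq txq;
	struct toy_qdisc *q;

	atomic_init(&txq.qdisc, &old_q);

	/* Normal case: the sampled qdisc is still attached. */
	q = atomic_load(&txq.qdisc);
	assert(toy_xmit(&txq, q) == TOY_XMIT_SUCCESS);

	/* Race case: the qdisc is replaced after the sender sampled it. */
	q = atomic_load(&txq.qdisc);
	atomic_store(&txq.qdisc, &new_q);
	assert(toy_xmit(&txq, q) == TOY_XMIT_DROP);
	assert(old_q.queued == 1 && new_q.queued == 0);
}
```

Calling toy_demo() exercises both paths. The toy replaces the pointer between the sample and the lock on purpose, which is the window the pr_err_ratelimited() message in the patch is meant to expose; whether that window is the real cause of the reported panic is not something this sketch can show.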