Subject: Re: net: deadlock between ip_expire/sch_direct_xmit
From: Eric Dumazet
To: Dmitry Vyukov
Cc: Cong Wang, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris,
    Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim,
    syzkaller
Date: Mon, 20 Mar 2017 05:43:19 -0700

On Mon, 2017-03-20 at 10:59 +0100, Dmitry Vyukov wrote:
> On Tue, Mar 14, 2017 at 5:41 PM, Cong Wang wrote:
> > On Tue, Mar 14, 2017 at 7:56 AM, Eric Dumazet wrote:
> >> On Tue, Mar 14, 2017 at 7:46 AM, Dmitry Vyukov wrote:
> >>
> >>> I am confused. Lockdep has observed both of these stacks:
> >>>
> >>>        CPU0                    CPU1
> >>>        ----                    ----
> >>>   lock(&(&q->lock)->rlock);
> >>>                                lock(_xmit_ETHER#2);
> >>>                                lock(&(&q->lock)->rlock);
> >>>   lock(_xmit_ETHER#2);
> >>>
> >>>
> >>> So it somehow happened. Or what do you mean?
> >>>
> >>
> >> Lockdep said "possible circular locking dependency detected".
> >> It is not an actual deadlock, but lockdep machinery firing.
> >>
> >> For a deadlock to happen, this would require that the ICMP message
> >> sent by ip_expire() is itself fragmented and reassembled.
> >> This cannot be, because ICMP messages are not candidates for
> >> fragmentation, but lockdep cannot know that, of course...
> >
> > It doesn't have to be ICMP: as long as we get the same hash for
> > the inet_frag_queue, we will need to take the same lock and
> > deadlock will happen.
> >
> >   hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
> >
> > So it is really up to this hash function.
> >
>
> Is the following the same issue?
> It mentions dev->qdisc_tx_busylock, but I am not sure if it's relevant
> if there is already a cycle between _xmit_ETHER#2 -->
> &(&q->lock)->rlock#2.

False positive again.

veth needs to use netdev_lockdep_set_classes(), assuming you use veth?

I will provide a patch, thanks.

Commit cf515802043cccecfe9ab75065f8fc71e6ec9bab missed a few drivers.