From: Dan Streetman
Date: Fri, 13 Apr 2018 08:43:40 -0400
Subject: Re: net: hang in unregister_netdevice: waiting for lo to become free
To: Dmitry Vyukov
Cc: Tommi Rantala, Neil Horman, Xin Long, David Ahern, Daniel Borkmann,
    Cong Wang, David Miller, Eric Dumazet, Willem de Bruijn, Jakub Kicinski,
    Rasmus Villemoes, netdev, LKML, Alexey Kuznetsov, Hideaki YOSHIFUJI,
    syzkaller, Dan Streetman, "Eric W. Biederman", Alexey Kodanev,
    Marcelo Ricardo Leitner, linux-sctp@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 12, 2018 at 8:15 AM, Dmitry Vyukov wrote:
> On Wed, Feb 21, 2018 at 3:53 PM, Tommi Rantala wrote:
>> On 20.02.2018 18:26, Neil Horman wrote:
>>>
>>> On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
>>>>
>>>> On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala wrote:
>>>>>
>>>>> On 19.02.2018 20:59, Dmitry Vyukov wrote:
>>>>>>
>>>>>> Is this meant to be fixed already? I am still seeing this on the
>>>>>> latest upstream tree.
>>>>>>
>>>>>
>>>>> These two commits are in v4.16-rc1:
>>>>>
>>>>> commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
>>>>> Author: Tommi Rantala
>>>>> Date: Mon Feb 5 21:48:14 2018 +0200
>>>>>
>>>>>     sctp: fix dst refcnt leak in sctp_v4_get_dst
>>>>>     ...
>>>>>     Fixes: 410f03831 ("sctp: add routing output fallback")
>>>>>     Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses")
>>>>>
>>>>>
>>>>> commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
>>>>> Author: Alexey Kodanev
>>>>> Date: Mon Feb 5 15:10:35 2018 +0300
>>>>>
>>>>>     sctp: fix dst refcnt leak in sctp_v6_get_dst()
>>>>>     ...
>>>>>     Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6")
>>>>>
>>>>>
>>>>> I guess we missed something if it's still reproducible.
>>>>>
>>>>> I can check it later this week, unless someone else beat me to it.
>>>>
>>>>
>>>> Hi Tommi,
>>>>
>>>> Hmmm, I can't claim that it's exactly the same bug. Perhaps it's
>>>> another one then. But I am still seeing these:
>>>>
>>>> [ 58.799130] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 4
>>>> [ 60.847138] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 4
>>>> [ 62.895093] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 4
>>>> [ 64.943103] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 4
>>>>
>>>> on upstream tree pulled ~12 hours ago.
>>>>
>>> Can you write a systemtap script to probe dev_hold, and dev_put, printing out a
>>> backtrace if the device name matches "lo". That should tell us definitively if
>>> the problem is in the same location or not
>>
>>
>> Hi Dmitry, I tested with the reproducer and the kernel .config file that you
>> sent in the first email in this thread:
>>
>> With 4.16-rc2 unable to reproduce.
>>
>> With 4.15-rc9 bug reproducible, and I get "unregister_netdevice: waiting for
>> lo to become free. Usage count = 3"
>>
>> With 4.15-rc9 and Alexey's "sctp: fix dst refcnt leak in sctp_v6_get_dst()"
>> cherry-picked on top, unable to reproduce.
>>
>>
>> Is syzkaller doing something else now to trigger the bug...?
>> Can you still trigger the bug with the same reproducer?
>
> Hi Neil, Tommi,
>
> Reviving this old thread about "unregister_netdevice: waiting for lo
> to become free. Usage count = 3" hangs.
> I still did not have time to deep dive into what happens there (too
> many bugs coming from syzbot). But this still actively happens and I
> suspect accounts to a significant portion of various hang reports,
> which are quite unpleasant.
>
> One idea that could make it all simpler:
>
> Is this wait loop in netdev_wait_allrefs() supposed to wait for any
> prolonged periods of time under any non-buggy conditions? E.g. more
> than 1-2 minutes?
> If it only supposed to wait briefly for things that already supposed
> to be shutting down, and we add a WARNING there after some timeout,
> then syzbot will report all info how/when it happens, hopefully
> extracting reproducers, and all the nice things.
> But this WARNING should not have any false positives under any
> realistic conditions (e.g.
> waiting for arrival of remote packets with
> large timeouts).
>
> Looking at some task hung reports, it seems that this code holds some
> mutexes, takes workqueue thread and prevents any progress with
> destruction of other devices (and net namespace creation/destruction),
> so I guess it should not wait for any indefinite periods of time?

I'm working on this currently:
https://bugs.launchpad.net/ubuntu/zesty/+source/linux/+bug/1711407

I added a summary of what I've found to be the cause (or at least, one
possible cause) of this:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/comments/72

I'm working on a patch to work around the main side-effect of this,
which is hanging while holding the global net mutex. Hangs will still
happen (e.g. if a dst leaks) but should not affect anything else, other
than a leak of the dst and its net namespace.

Fixing the dst leaks is important too, of course, but a dst leak (or
other cause) shouldn't break the entire system.
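
For illustration only, the warn-after-timeout idea quoted above can be modelled in user space. This is a rough sketch, not kernel code and not the patch mentioned in this message: `wait_allrefs`, `FakeClock`, `warn_after`, `poll_interval` and `give_up_after` are all hypothetical names, and the 120 second threshold is just an assumed value in the "1-2 minutes" range from the quoted question. The loop polls a reference count, emits a single warning once the wait exceeds the threshold, and (unlike a wait that never returns) can optionally give up so the caller is not blocked forever.

```python
import time


def wait_allrefs(get_refcnt, warn_after=120.0, poll_interval=0.25,
                 give_up_after=None, clock=time.monotonic,
                 sleep=time.sleep, log=print):
    """Poll get_refcnt() until it returns 0.

    Emits one warning via log() once the wait has lasted warn_after
    seconds. If give_up_after is set, stops waiting after that many
    seconds. Returns (reached_zero, warned).
    """
    start = clock()
    warned = False
    while True:
        refcnt = get_refcnt()
        if refcnt == 0:
            return True, warned
        waited = clock() - start
        if not warned and waited >= warn_after:
            # One-shot warning: the signal a fuzzer could pick up.
            log(f"WARNING: waited {waited:.0f}s for device to become free, "
                f"usage count = {refcnt}")
            warned = True
        if give_up_after is not None and waited >= give_up_after:
            return False, warned
        sleep(poll_interval)


class FakeClock:
    """Deterministic clock so the model runs instantly (assumption: test aid only)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    fake = FakeClock()
    # A leaked reference: the count never drops, so the loop warns once
    # and then gives up instead of blocking forever.
    result = wait_allrefs(lambda: 3, give_up_after=300.0,
                          clock=fake, sleep=fake.sleep)
    print(result)
```

With a leaked reference the model warns exactly once and then returns, which mirrors the two separate goals in this thread: making the leak visible, and keeping it from stalling everything else.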