From: Lukas Bulwahn
Date: Mon, 27 Jan 2020 22:16:04 +0100 (CET)
To: Jouni Högander
Cc: Lukas Bulwahn, Greg Kroah-Hartman, open list, Andrew Morton,
    Ben Hutchings, linux-stable, Netdev, Al Viro,
    linux-fsdevel@vger.kernel.org, Eric Dumazet, "David S. Miller",
    syzkaller@googlegroups.com
Subject: Re: [PATCH 4.19 000/306] 4.19.87-stable review
In-Reply-To: <87h80h2suv.fsf@unikie.com>
References: <20191127203114.766709977@linuxfoundation.org>
    <20191128073623.GE3317872@kroah.com>
    <20191129085800.GF3584430@kroah.com>
    <87sgk8szhc.fsf@unikie.com>
    <87h80h2suv.fsf@unikie.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="8323329-600720457-1580159776=:2951"
X-Mailing-List: linux-kernel@vger.kernel.org

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--8323329-600720457-1580159776=:2951
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT

On Mon, 27 Jan 2020, Jouni Högander wrote:

> Lukas Bulwahn writes:
>
> > On Wed, 22 Jan 2020, Jouni Högander wrote:
> >
> >> Greg Kroah-Hartman writes:
> >> >> > Now queued up, I'll push out -rc2 versions with this fix.
> >> >> >
> >> >> > greg k-h
> >> >>
> >> >> We have also been informed about another regression these two commits
> >> >> are causing:
> >> >>
> >> >> https://lore.kernel.org/lkml/ace19af4-7cae-babd-bac5-cd3505dcd874@I-love.SAKURA.ne.jp/
> >> >>
> >> >> I suggest to drop these two patches from this queue, and give us a
> >> >> week to shake out the regressions of the change, and once ready, we
> >> >> can include the complete set of fixes to stable (probably in a week or
> >> >> two).
> >> >
> >> > Ok, thanks for the information, I've now dropped them from all of the
> >> > queues that had them in them.
> >> >
> >> > greg k-h
> >>
> >> I have now run more extensive Syzkaller testing on following patches:
> >>
> >> cb626bf566eb net-sysfs: Fix reference count leak
> >> ddd9b5e3e765 net-sysfs: Call dev_hold always in rx_queue_add_kobject
> >> e0b60903b434 net-sysfs: Call dev_hold always in netdev_queue_add_kobje
> >> 48a322b6f996 net-sysfs: fix netdev_queue_add_kobject() breakage
> >> b8eb718348b8 net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject
> >>
> >> These patches are fixing couple of memory leaks including this one found
> >> by Syzbot: https://syzkaller.appspot.com/bug?extid=ad8ca40ecd77896d51e2
> >>
> >> I can reproduce these memory leaks in following stable branches: 4.14,
> >> 4.19, and 5.4.
> >>
> >> These are all now merged into net/master tree and based on my testing
> >> they are ready to be taken into stable branches as well.
> >>
> >
> > + syzkaller list
> > Jouni et al., please drop Linus in further responses; Linus, it was wrong
> > to add you to this thread in the first place (reason is explained below)
> >
> > Jouni, thanks for investigating.
> >
> > It raises the following questions and comments:
> >
> > - Does the memory leak NOT appear on 4.9 and earlier LTS branches (or did
> > you not check that)? If it does not appear, can you bisect it with the
> > reproducer to the commit between 4.14 and 4.9?
>
> I tested and these memory leaks are not reproducible in 4.9 and earlier.
>
> >
> > - Do the reproducers you found with your syzkaller testing show the same
> > behaviour (same bisection) as the reproducers from syzbot?
>
> Yes, they are same.
>
> >
> > - I fear syzbot's automatic bisection on is wrong, and Linus' commit
> > 0e034f5c4bc4 ("iwlwifi: fix mis-merge that breaks the driver") is not to
> > blame here; that commit did not cause the memory leak, but fixed some
> > unrelated issue that simply confuses syzbot's automatic bisection.
> >
> > Just FYI: Dmitry Vyukov's evaluation of the syzbot bisection shows that
> > about 50% are wrong, e.g., due to multiple bugs being triggered with one
> > reproducer and the difficulty of automatically identifying them as being
> > different due to different root causes (despite the smart heuristics of
> > syzkaller & syzbot). So, to identify the actual commit on which the memory
> > leak first appeared, you need to bisect manually with your own judgement
> > if the reported bug stack trace fits to the issue you are investigating. Or
> > you use syzbot's automatic bisection but then with a reduced kernel config
> > that cannot be confused by other issues. You might possibly also hit a
> > "beginning of time" in your bisection, where KASAN was simply not
> > supported; then the initially causing commit simply cannot be determined by
> > bisection with the reproducer and needs some code inspection and
> > archaeology with git.
> > Can you go ahead and try to identify the correct commit
> > for this issue?
>
> These two commits (that are not in 4.9 and earlier) are introducing these leaks:
>
> commit e331c9066901dfe40bea4647521b86e9fb9901bb
> Author: YueHaibing
> Date:   Tue Mar 19 10:16:53 2019 +0800
>
>     net-sysfs: call dev_hold if kobject_init_and_add success
>
>     [ Upstream commit a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e ]
>
>     In netdev_queue_add_kobject and rx_queue_add_kobject,
>     if sysfs_create_group failed, kobject_put will call
>     netdev_queue_release to decrease dev refcont, however
>     dev_hold has not be called. So we will see this while
>     unregistering dev:
>
>     unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
>
>     Reported-by: Hulk Robot
>     Fixes: d0d668371679 ("net: don't decrement kobj reference count on init failure")
>     Signed-off-by: YueHaibing
>     Signed-off-by: David S. Miller
>     Signed-off-by: Greg Kroah-Hartman
>
> commit d0d6683716791b2a2761a1bb025c613eb73da6c3
> Author: stephen hemminger
> Date:   Fri Aug 18 13:46:19 2017 -0700
>
>     net: don't decrement kobj reference count on init failure
>
>     If kobject_init_and_add failed, then the failure path would
>     decrement the reference count of the queue kobject whose reference
>     count was already zero.
>
>     Fixes: 114cf5802165 ("bql: Byte queue limits")
>     Signed-off-by: Stephen Hemminger
>     Signed-off-by: David S. Miller
>

But it seems that we now just have a long sequence of fix patches.
This commit from 2011 seems to be the initial buggy one:

commit 114cf5802165ee93e3ab461c9c505cd94a08b800
Author: Tom Herbert
Date:   Mon Nov 28 16:33:09 2011 +0000

    bql: Byte queue limits

And then we just have fixes over fixes:

114cf5802165ee93e3ab461c9c505cd94a08b800
  fixed by d0d6683716791b2a2761a1bb025c613eb73da6c3
  fixed by a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e
  fixed by the sequence of your five patches, mentioned above

If that is right, we should be able to find a reproducer with syzkaller
on the versions before d0d668371679 ("net: don't decrement kobj
reference count on init failure"), either with fault injection enabled
or with a fault injected manually by modifying the source code so that
init always fails. That should really trigger the init failure and let
us see the reference count go below zero.

All further issues should also have reproducers found with syzkaller.

If we have a good feeling about the reproducers, and this series of
fixes has really fixed the issue for all cases, we should suggest
backporting all of the fixes to 4.4 and 4.9.

We should NOT just have Greg pick up a subset of the patches and
backport them to 4.4 and 4.9; that will likely break more than it fixes.

Jouni, did you see Greg's bot inform you that he would pick up your
latest patch for 4.4 and 4.9? Please respond to those emails to make
sure that a complete set of patches is picked up, one which we have
tested with all those intermediate reproducers and an extensive
syzkaller run hitting the net-sysfs interface (e.g., by configuring the
corpus and checking coverage).

If you cannot do this testing for 4.4 and 4.9 quickly (you potentially
have less than 24 hours), we should hold those new patches back for 4.4
and 4.9, as none of the fixes seem to be applied there at all right now
and users have not yet complained on 4.4 and 4.9. Once testing of the
whole fix sequence is done, we request backporting all patches at once
for 4.4 and 4.9.

Lukas
--8323329-600720457-1580159776=:2951--
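
To make the imbalance described in the quoted commit message of
a3e23f719f5c concrete, here is a minimal userspace sketch in Python. It
is a toy model and not kernel code: ToyDevice, toy_queue_add_kobject,
usage_count_after_failure, hold_early and group_fails are hypothetical
names chosen for illustration. The only behaviour it assumes is what the
commit message states, namely that dropping the queue kobject after a
failed sysfs_create_group ends in a release callback that drops one
device reference. The group_fails flag plays the role of a manually
injected fault.

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    """Hypothetical stand-in for a net device; only the usage count is modelled."""
    name: str
    refcnt: int = 0

    def hold(self) -> None:
        # Models taking a device reference (dev_hold in the commit message).
        self.refcnt += 1

    def put(self) -> None:
        # Models dropping a device reference from the release callback.
        self.refcnt -= 1


def toy_queue_add_kobject(dev: ToyDevice, *, hold_early: bool, group_fails: bool) -> int:
    """Toy model of the queue kobject setup path.

    hold_early=False takes the device reference only at the very end,
    the ordering that the commit message describes as broken.
    hold_early=True takes it right after the kobject has been set up.
    group_fails=True injects a failure of the attribute-group step.
    Returns 0 on success and -12 (assumed error code) on failure.
    """
    # Step 1: kobject setup is assumed to succeed in this sketch.
    if hold_early:
        dev.hold()

    # Step 2: attribute-group creation, optionally forced to fail.
    if group_fails:
        # Error path: dropping the kobject runs the release callback,
        # which drops one device reference.
        dev.put()
        return -12

    if not hold_early:
        dev.hold()
    return 0


def usage_count_after_failure(hold_early: bool) -> int:
    """Run the setup path with an injected failure and report the usage count."""
    dev = ToyDevice("toy0")
    toy_queue_add_kobject(dev, hold_early=hold_early, group_fails=True)
    return dev.refcnt


if __name__ == "__main__":
    print("late hold :", usage_count_after_failure(hold_early=False))
    print("early hold:", usage_count_after_failure(hold_early=True))
```

With the late hold, the injected failure leaves the toy usage count at
-1, matching the "Usage count = -1" message quoted above; with the early
hold it stays at 0. The sketch does not model the later leaks that the
five patches address; it only shows why each reproducer in the chain
needs the failure path to actually be taken.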