Received: by 2002:a05:6a10:6744:0:0:0:0 with SMTP id w4csp5380683pxu; Thu, 22 Oct 2020 00:09:03 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyheT52FKxIj6cMNxqfKyd4gY2f/3SJmTFJcEdo53TuEQIXdeTG4jJ1Xl9EB+XLX7usxW2C X-Received: by 2002:a17:906:e292:: with SMTP id gg18mr961967ejb.278.1603350543567; Thu, 22 Oct 2020 00:09:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1603350543; cv=none; d=google.com; s=arc-20160816; b=SxCfEE7YX7tq6SemKthztQvoC/icpITUfXS8LBUNPjZgWjtzZaiMHV5+g195y7mYGU /Y0iXsUzwUH+8V9k1mrRWkjLX/HBhs2URTuNMOyHkwIMYSlpN9Gnv5+XKhpZbeoBSGpW yrp7wThIq6FHGCQBmsih84/344Ztm4H7NRpx/W6iCtUxCQVI02nE8TVQtpl0LizY434G oyrTtfCjBdjxaBfbyfIZ5AU7ll5NHzSSjEstpknNlVul33mKau8566qfYsl5zu+/wd0p lXG34Cm0f2+j44tAEUfvMP8ESFh6dag+kPdnk8J8trg9QKb8PbYD7WZKUHkRD0D/ZYgt P+6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=AieU9TT4ooG8IhHcQMgp++3eebf4OkcrIycb0MLJFJ8=; b=oXHy5x1H/RpbBl/dUknYnPcNMTy+Vd1laie1Ci+F3d2/uoKi5O+mU7Qw6BFtxy0zq2 exrfot4E7I3jDzSGMWc1HhTkJvZIb9y9Uc/4wCV1cgoHJRPUCawPs/Bybi/ZsMhAJ+pM JV4bt/GGusnHcNGMcUyoBPu2+2m4EYaY2Vv81ZtUKBZwqQXX5uNOI+uJfk3YXKcVnqtm ILtQm4rvuVdyCcsMipjoBYrZUWCu1sX607yHen4wi0XOZrgbPUVLzmVP0+fWJOtqhkeo 0xhiOUTzIzBrUbB+UUdwdRb3PXB8Q5UERziEgou1zpsUrcsChwuKXdPg4iy/GGk2FIh3 LiFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=LWkQ3BmX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dg11si425501edb.246.2020.10.22.00.08.40; Thu, 22 Oct 2020 00:09:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=LWkQ3BmX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2505698AbgJUVCp (ORCPT + 99 others); Wed, 21 Oct 2020 17:02:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2439588AbgJUVCp (ORCPT ); Wed, 21 Oct 2020 17:02:45 -0400 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37349C0613CE; Wed, 21 Oct 2020 14:02:44 -0700 (PDT) Received: by mail-qk1-x744.google.com with SMTP id a23so3998881qkg.13; Wed, 21 Oct 2020 14:02:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=AieU9TT4ooG8IhHcQMgp++3eebf4OkcrIycb0MLJFJ8=; b=LWkQ3BmXCFBlgRMVEmpfuWCH2qPqVO3fohsqtbsJ5fcAymSsi7hVIVfbmgtSMyV7yj /TuKaAgB/5oBzqnroZSZNWXOb5w7nAvrU2c/LcqtwF/PaojuIjIeZfC2HMeEsVU/yNP8 +4GIXNjuPswdjGBj41/n03S1+nJGcWN61xuQHahtgBuAg02c1fFDXxTiM07tV0YdBjW2 ly300NjbORSBetK2Rwi9tMpt2iGSdzYO5yLj31Et+bIJhKeysuhqZyBc8lmBjh+lIvpB xugJ/zgBGeYbloJFr5iWvwRyzwYIgwsZLCUT2BMHex8wWYby3CzgGKUwofRjJq6zY+XW YnDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=AieU9TT4ooG8IhHcQMgp++3eebf4OkcrIycb0MLJFJ8=; b=gQsgx3eNlJz6KQ3nB7sB1HrgfIvhyfXKAI/E1U4URxKQkhNu5icii2x+EuSXvXVUFb yscIiACTNlilhV8agn3R1uMdMRlsP7tlzQaSjvCUbOck9h0RFEbDjiaUhUeerjciFpbh XqZCMEc2UhCoV+4M6/shP7Ptpblok0m4EZ0xiFpmIFn2nKg1TyP5zQ4yz11YbBhS4nQF hb3FnujrqEQNQvdBIKYh+gBQDSbBEqm4wn+YJcWFb/QqYtxOyTjpAJUm6UJ6fIbkxvu9 iFyU45SILx+xpsUdQqvJIf+myY86oI60MdHflC9mMikJklQZc6wizjRIdMu62lyoEUAY 0qew== X-Gm-Message-State: AOAM533cZ9/hHlN9mOnwshz3H1ZH7jk6U43ZwP1OgHRK4ARGIz8aYFjC u9fIqnf0hipOhI9ysm9Hqos= X-Received: by 2002:a05:620a:100e:: with SMTP id z14mr4879083qkj.278.1603314163369; Wed, 21 Oct 2020 14:02:43 -0700 (PDT) Received: from [192.168.1.49] (c-67-187-90-124.hsd1.tn.comcast.net. [67.187.90.124]) by smtp.gmail.com with ESMTPSA id g38sm2154683qtb.25.2020.10.21.14.02.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 21 Oct 2020 14:02:42 -0700 (PDT) Subject: Re: [RFC PATCH 0/3] Fix errors on DT overlay removal with devlinks To: Michael Auchter , devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, saravanak@google.com Cc: robh+dt@kernel.org, gregkh@linuxfoundation.org, rafael@kernel.org References: <20201014193615.1045792-1-michael.auchter@ni.com> From: Frank Rowand Message-ID: Date: Wed, 21 Oct 2020 16:02:41 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20201014193615.1045792-1-michael.auchter@ni.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Saravana, Michael found an issue related to the removal of a devicetree node which involves devlinks: On 10/14/20 2:36 PM, Michael Auchter wrote: > After updating to v5.9, I've started seeing errors in the kernel log > when using device tree overlays. Specifically, the problem seems to > happen when removing a device tree overlay that contains two devices > with some dependency between them (e.g., a device that provides a clock > and a device that consumes that clock). Removing such an overlay results > in: > > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy > > followed by hitting some REFCOUNT_WARNs in refcount.c > > In the first patch, I've included a unittest that can be used to > reproduce this when built with CONFIG_OF_UNITTEST [1]. > > I believe the issue is caused by the cleanup performed when releasing > the devlink device that's created to represent the dependency between > devices. The devlink device has references to the consumer and supplier > devices, which it drops in device_link_free; the devlink device's > release callback calls device_link_free via call_srcu. > > When the overlay is being removed, all devices are removed, and > eventually the release callback for the devlink device run, and > schedules cleanup using call_srcu. Before device_link_free can and call > put_device on the consumer/supplier, the rest of the overlay removal > process runs, resulting in the error traces above. When a devicetree node in an overlay is removed, the remove code expects all previous users of the related device to have done the appropriate put of the device and to have no later references. As Michael described above, the devlink release callback defers the put_device(). The cleanup via srcu was implemented in commit 843e600b8a2b01463c4d873a90b2c2ea8033f1f6 "driver core: Fix sleeping in invalid context during device link deletion" to solve yet another issue. > > Patches 2 and 3 are an attempt at fixing this: call srcu_barrier to wait > for any pending device_link_free's to execute before continuing on with > the removal process. > > These patches resolve the issue, but probably not in the best way. In > particular, it seems strange to need to leak details of devlinks into > the device tree overlay code. So, I'd be curious to get some feedback or > hear any other ideas for how to resolve this issue. I agree with Michael that adding an indirect call of srcu_barrier(&device_links_srcu) into the devicetree overlay code is not an appropriate solution. Is there some other way to fix the problem that 843e600b8a2b solves without deferring the put_device() done by the devlink release callback? -Frank > > Thanks, > Michael > > 1. Note that this isn't a very good unit test: it will report a "pass" > even if it fails with the aforementioned errors, as these errors > aren't propogated. > > Michael Auchter (3): > of: unittest: add test of overlay with devlinks > driver core: add device_links_barrier > of: dynamic: add device links barrier before detach > > drivers/base/core.c | 10 ++++++++++ > drivers/of/dynamic.c | 3 +++ > drivers/of/unittest-data/Makefile | 1 + > drivers/of/unittest-data/overlay_16.dts | 26 +++++++++++++++++++++++++ > drivers/of/unittest.c | 16 +++++++++++++++ > include/linux/device.h | 1 + > 6 files changed, 57 insertions(+) > create mode 100644 drivers/of/unittest-data/overlay_16.dts >