Received: by 2002:a05:6a10:1d13:0:0:0:0 with SMTP id pp19csp418153pxb; Mon, 16 Aug 2021 08:18:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyXzbout1trzA+lEoVcHLN/dlAqal/fViLg6vhJFD+MqViy50mLl8jrdjGipRDg4UbudGk6 X-Received: by 2002:a17:906:405:: with SMTP id d5mr16497797eja.189.1629127117726; Mon, 16 Aug 2021 08:18:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1629127117; cv=none; d=google.com; s=arc-20160816; b=xgvwm/Xu/QYjiT2w6Uxh/+wtYm+F7hUVfyMumpFDrN9vJe6raT0VHVc1S0SRKGPYBY Mn2Ag0N4BiTDZxGXAZTfh3kIFpunbigxDUmphZPzrEwxp+eQP8UUkUcWPVoKMj92FGRI 0XXEIjX7D9od8OqKi8+o1V2Qh6p2ERl2c26bzqJQaBxBf86oINJ6MI/v7EM32dpsA1XA nuujJRK9NuNYVEVoFfLHVIyUaD6WoCIjF8S1DT10nuoXO/1dRHz754hf9Gqgoi92enqs G9W5AGDqoClKXu652vSopna/whc0IaGifDSvPpdutiefrzQlTErZJgEuE39sIdfaoK7y Lyow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=PCGu6LKv/RVKdtYv9khNj4fVhn3c3G222WeZkDQYghA=; b=XOMwqfRoQKo4AeFagM9ze3wQbcEurvEAd5qq0avNwa2NwRrBTIz3zNeoR/s3AA1pms DWojtBk2YalMY+wtH4LmoLFLexWEm0ebrZGTGxG42fmQIDwSwLdRr14CCT/IsRR3Qka0 8oh2MTgJ5AJY47w/PqWTdmmxSJsgzM93agz4ui4GruGtjaIiVuI80xgr1OufTrbsM7Xr 4LswqflXMWIPPsoRGS0aGvCrlpdn2tW7+uTzVH4o5TfbC8a1qiFzDWEUMKw5K6QpO1E9 Z8UvuoCbzM6TrGwTBXH3KfqYUkVsfErGQ4hTRzK2Xo6qcBs9WvPn5K1UvalnpYWtLt+3 4O6g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=CISces9T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m12si9908905ejg.274.2021.08.16.08.18.11; Mon, 16 Aug 2021 08:18:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=CISces9T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232841AbhHPPO4 (ORCPT + 99 others); Mon, 16 Aug 2021 11:14:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47068 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232574AbhHPPOz (ORCPT ); Mon, 16 Aug 2021 11:14:55 -0400 Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48F0AC0613C1; Mon, 16 Aug 2021 08:14:24 -0700 (PDT) Received: by mail-qk1-x72c.google.com with SMTP id t190so6044899qke.7; Mon, 16 Aug 2021 08:14:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=PCGu6LKv/RVKdtYv9khNj4fVhn3c3G222WeZkDQYghA=; b=CISces9TDTtyNHUhb+Dzk9UPoZjkkFXWYex7qH35eoRxGkQVLgR/1DTqaRkfd78IuQ xpRbqNcaPxnjmxKfMPsl97t8z5ktu5ohAzxkVsEyke1kmjRaLwCC4bQxurQJX+scFQUa xMM6XOi+foF8nmOqGYxv8ZoWEyfaMniSFzs1s03DYzWXAE9vqHIG9g7YVvlYPGws3vpT 6yioiJMXf3ojQrLqCPUqabZWVIkq2LJxyYYkkWffzSbNWvvzzjkuw9rldmpJMEtICRWP rXdcOrs16jzuzo4rGDTodQBL7ocHJ9hFU14MZNNq4073wGUxuvxL9GtbvYpe9T6cwVEV Clgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=PCGu6LKv/RVKdtYv9khNj4fVhn3c3G222WeZkDQYghA=; b=Ihkm6RMAxIdD35JgUKbeJb+S5/XcAfVZN1uxcqRN94KNAyRKJl4EUGpHQNKfe8F4Mk D5iYzzE5qzryqDO288fVsCdb0lBBhpfzqGxusDupYBwKkQfXlC/a3s3gu2XndagpfyBR rbJSskndQ+c5Ym0JS3roe9jZn5GfbcqtZdnujxn/bWmL0k60uPQqhFL5tZ0nM/z49gim 7pCAJaC++HaZYvNe661FrbC9xGj7f2OB77xpkJuG+D/t1ZaHLhOwPamXGijdZi5hmQsD pGbFDmOsZuw0YTbEq4F/Tnh0ytAj7bThh33WmXEJ29dfnAebBrgktu6+cGQlRhOKQmSZ 5lCQ== X-Gm-Message-State: AOAM5301guf6ysAatA1XuP47AY3XJZogeQer+hDYTMXSHq0C4QZFVUuw 83P40FZFPFG70AWKfmrZ7pI= X-Received: by 2002:a37:d4c:: with SMTP id 73mr16227882qkn.188.1629126863427; Mon, 16 Aug 2021 08:14:23 -0700 (PDT) Received: from [192.168.1.49] (c-67-187-90-124.hsd1.ky.comcast.net. [67.187.90.124]) by smtp.gmail.com with ESMTPSA id y9sm4771547qtw.51.2021.08.16.08.14.22 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 16 Aug 2021 08:14:23 -0700 (PDT) Subject: Re: of_node_put() usage is buggy all over drivers/of/base.c?! To: Vladimir Oltean Cc: Rob Herring , Sascha Hauer , devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org References: <20210814010139.kzryimmp4rizlznt@skbuf> <9accd63a-961c-4dab-e167-9e2654917a94@gmail.com> <20210816144622.tgslast6sbblclda@skbuf> From: Frank Rowand Message-ID: <4cad28e0-d6b4-800d-787b-936ffaca7be3@gmail.com> Date: Mon, 16 Aug 2021 10:14:22 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210816144622.tgslast6sbblclda@skbuf> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 8/16/21 9:46 AM, Vladimir Oltean wrote: > Hi Frank, > > On Mon, Aug 16, 2021 at 09:33:03AM -0500, Frank Rowand wrote: >> Hi Vladimir, >> >> On 8/13/21 8:01 PM, Vladimir Oltean wrote: >>> Hi, >>> >>> I was debugging an RCU stall which happened during the probing of a >>> driver. Activating lock debugging, I see: >> >> I took a quick look at sja1105_mdiobus_register() in v5.14-rc1 and v5.14-rc6. >> >> Looking at the following stack trace, I did not see any calls to >> of_find_compatible_node() in sja1105_mdiobus_register(). I am >> guessing that maybe there is an inlined function that calls >> of_find_compatible_node(). This would likely be either >> sja1105_mdiobus_base_tx_register() or sja1105_mdioux_base_t1_register(). > > Yes, it is sja1105_mdiobus_base_t1_register which is inlined. > >>> >>> [ 101.710694] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938 >>> [ 101.719119] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1534, name: sh >>> [ 101.726763] INFO: lockdep is turned off. >>> [ 101.730674] irq event stamp: 0 >>> [ 101.733716] hardirqs last enabled at (0): [<0000000000000000>] 0x0 >>> [ 101.739973] hardirqs last disabled at (0): [] copy_process+0xa78/0x1a98 >>> [ 101.748146] softirqs last enabled at (0): [] copy_process+0xa78/0x1a98 >>> [ 101.756313] softirqs last disabled at (0): [<0000000000000000>] 0x0 >>> [ 101.762569] CPU: 4 PID: 1534 Comm: sh Not tainted 5.14.0-rc5+ #272 >>> [ 101.774558] Call trace: >>> [ 101.794734] __might_sleep+0x50/0x88 >>> [ 101.798297] __mutex_lock+0x60/0x938 >>> [ 101.801863] mutex_lock_nested+0x38/0x50 >>> [ 101.805775] kernfs_remove+0x2c/0x50 <---- this takes mutex_lock(&kernfs_mutex); >>> [ 101.809341] sysfs_remove_dir+0x54/0x70 >> >> The __kobject_del() occurs only if the refcount on the node >> becomes zero. This should never be true when of_find_compatible_node() >> calls of_node_put() unless a "from" node is passed to of_find_compatible_node(). > > I figured that was the assumption, that the of_node_put would never > trigger a sysfs file / kobject deletion from there. > >> In both sja1105_mdiobus_base_tx_register() and sja1105_mdioux_base_t1_register() >> a from node ("mdio") is passed to of_find_compatible_node() without first doing an >> of_node_get(mdio). If you add the of_node_get() calls the problem should be fixed. > > The answer seems simple enough, but stupid question, but why does > of_find_compatible_node call of_node_put on "from" in the first place? Actually a good question. I do not know why of_find_compatible_node() calls of_node_put() instead of making the caller of of_find_compatible_node() responsible. That pattern was created long before I was involved in devicetree and I have not gone back to read the review comments of when that code was created. -Frank