Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp2414825imm; Thu, 14 Jun 2018 14:01:06 -0700 (PDT) X-Google-Smtp-Source: ADUXVKLUxT5Kd0hQ172fTlwIen9y7s6jLoYE9grdlrGOHZA+FzIPOoti8ealJWzuiwO+m3G0C1RX X-Received: by 2002:a17:902:1004:: with SMTP id b4-v6mr4835763pla.82.1529010066332; Thu, 14 Jun 2018 14:01:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529010066; cv=none; d=google.com; s=arc-20160816; b=oKcCwjWLptWB3kEsrwrmrnsCVhPM57lOBEbo9lAzzclGrAMDei1OW7YZgcL7iLdp4G hSBmxwvN972JJ1c3G9tyYvucomosQP1ABRfOiTqjTwsr/M5sqnkMTudZCIdsR8nu2+53 lWiwxVJLSy8MfnFPY0s/9UWtBE+fK5DA+rgQktb3dA4ivZhMpBLgvSOrN3558cgdNPI8 pL3L9gdsgHTf3dTVhL0MX8I9/H9qab41Bp8Lneo0cMdHri0xMCZlgMpQeE1nhj3vcQHF KYImBPD4Agk1qkInqcCW5RLilfJsE2jzbTs5HVSSE6j2zI9RV1DsBh3L0QXTtPmyUNaN 2a7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:references:in-reply-to:mime-version :dkim-signature:arc-authentication-results; bh=hK6vvQkaWax0e0E6NysRFPHb8uXPnkK7PG4SUcyGIHM=; b=Mr/C/XjAWbD1bmhtdTAAkNU2ZY9YVWU4yySgfSdUCY4QS96Cr2LPu6rgTiA+6Vmoev JiKqqvYhGx4S5XS9yhLrGkUM+bw1Km66Qvb1muoHT1JvxOoMBs2bIxIhaXH3x+gCrvkA bBstSaDashBqGLsOQPusrQ0ijjRL5Avtfd3lD+fPHX43txmabV5knyQ4cmB4bPAuyCth /pwihZlODppKYzEuvQ4Iqdhj5Odh7Netr+jeVkPYSzr0TkSLRVKrGbqDh4cxURhnw3BO DAbwNfQM5DzOLktEE4Mt2dDOstJw9ky8/PBAwKBiIyu2NTmRNGFiEwmS9dMeKqEGxt4B Bsfg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="nW/vi0Sp"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 89-v6si6375169plb.154.2018.06.14.14.00.39; Thu, 14 Jun 2018 14:01:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="nW/vi0Sp"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755537AbeFNVAD (ORCPT + 99 others); Thu, 14 Jun 2018 17:00:03 -0400 Received: from mail.kernel.org ([198.145.29.99]:34050 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755406AbeFNU76 (ORCPT ); Thu, 14 Jun 2018 16:59:58 -0400 Received: from mail-yb0-f175.google.com (mail-yb0-f175.google.com [209.85.213.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DDC90208D7; Thu, 14 Jun 2018 20:59:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1529009998; bh=cIFdctODhYXkgIdyNBMuOXwZLpKnia7PgsfKJp6GXW8=; h=In-Reply-To:References:From:Date:Subject:To:Cc:From; b=nW/vi0Spj/LuYxgsRtdsSaRqurqWaovc0MY7zdMfEX2HbAuFQaYidjkgzO3Jhztlu YSA4SCDkoUaMQNAkzJPSmcepnstpKrjkACXJogpg2jDeGRz63HiS+4yeQMcJdzsb9G csSGZGCz+BMt7hfN2ln0OZ1j3+L/f+DAS1TBynAM= Received: by mail-yb0-f175.google.com with SMTP id n23-v6so2760554ybg.1; Thu, 14 Jun 2018 13:59:57 -0700 (PDT) X-Gm-Message-State: APt69E29GPVHKIWKvzR2y11S2AuogD/WmsVT3PWNQM8+YyPWgiHVfVeH NeplxwhkKf3jfYiQAYf6AJx4nbNf8ZQ7Mtv7SV4= X-Received: by 2002:a25:40c2:: with SMTP id n185-v6mr2412882yba.122.1529009997060; Thu, 14 Jun 2018 13:59:57 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a5b:b87:0:0:0:0:0 with HTTP; Thu, 14 Jun 2018 13:59:16 -0700 (PDT) In-Reply-To: References: <1520208889-3908-1-git-send-email-frowand.list@gmail.com> <1520208889-3908-2-git-send-email-frowand.list@gmail.com> From: Alan Tull Date: Thu, 14 Jun 2018 15:59:16 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v5 1/3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle() To: Frank Rowand Cc: Rob Herring , cpandya@codeaurora.org, "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" , linux-kernel , linux-fpga@vger.kernel.org, Moritz Fischer Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 13, 2018 at 4:47 PM, Frank Rowand wrot= e: > On 06/13/18 07:42, Alan Tull wrote: >> On Tue, Jun 12, 2018 at 1:16 PM, Alan Tull wrote: >>> On Sun, Mar 4, 2018 at 6:14 PM, wrote: >>> >>> Hi Frank, >>> >>> I'm investigating a refcount use-after-free warning that happens after >>> overlays are applied, removed, reapplied a few (typically three) times >>> (see below). This is new in v4.17, didn't happen in v4.16. As I was >>> investigating I found that rebuilding the phandle_cache after overlays >>> are applied or removed seems to help. >> >> I was probably wrong about this. The more I look at the phandle_cache c= ode, >> the more it looks looks good and straightforward. Probably disabling >> phandle_cache is 'fixing' things through some weird side effect. I'll >> keep investigating. Sorry for the noise. > > I suspect that you have found an issue, even if it is not the cause of > the refcount issue. I noted in a reply to v4 of the patch: > > >> +static void of_populate_phandle_cache(void) > >> +{ > >> +=C3=82 =C3=82 =C3=82 unsigned long flags; > >> +=C3=82 =C3=82 =C3=82 u32 cache_entries; > >> +=C3=82 =C3=82 =C3=82 struct device_node *np; > >> +=C3=82 =C3=82 =C3=82 u32 phandles =3D 0; > >> + > >> +=C3=82 =C3=82 =C3=82 raw_spin_lock_irqsave(&devtree_lock, flags); > >> + > >> +=C3=82 =C3=82 =C3=82 kfree(phandle_cache); > > > > I couldn't understood this. Everything else looks good to me. > > I will be adding a call to of_populate_phandle_cache() from the > devicetree overlay code. I put the kfree here so that the previous > cache memory is freed when a new cache is created. > > Adding the call from the overlay code is not done in this > series because I have a patch series modifying overlays and > I do not want to create a conflict or ordering between that > series and that patch. The lack of the call from overlay > code means that overlay code will gain some of the overhead > reduction from this patch, but possibly not the entire reduction. > > Sorry I'm not giving a link to the archive of this message - I have > a class I have to go to so I don't have enough time to find it. I understand, that's cool. I've found the email chain, thanks for pointing me to it! > The > email was > > Subject: Re: [PATCH v3] of: cache phandle nodes to reduce cost of > of_find_node_by_phandle() > Date: Fri, 16 Feb 2018 14:20:22 -0800 > Message-ID: <46d5fc76-33e3-d54a-26b8-e9bb8332924d@gmail.com> > > Quickly looking at the current code, I don't see the overlay patch > that I mentioned. I have to dig into what happened to that. > > Leaving a phandle from an overlay in the phandle cache after the > overlay is removed would clearly be a bug. Yes that's totally it, when I get a match of a phandle and the np address is stale, then comes the kernel warning. I added more debug code to print out the results of looking up dynamic nodes only (nodes > the highest node added when the cache was originally built). Also I added debug code to also force the lookup code to go back and look up dynamic nodes the slow, uncached way so I could see if it got the correct np address. I'm unloading and reloading an overlay, so sometimes the new np (after the overlay was removed and reapplied) is at the same address, but sometimes it isn't and then I get the kernel warning. Just for entertainment value, here is my prints of the np from the cache and the correct np that' at a different address. The '88' is the phandle, after that is the np address, then the np=3D. Note also the corruption in the cached node's full_name OF: of_find_node_by_phandle line 1108 found dynamic node in cache 88 ee518680 np=3D/soc/base_fpga_region/\300\252~\356ze_controller@0x100000450 OF: of_find_node_by_phandle line 1119 found the slow way: 88 eeaf2d00 np=3D/soc/base_fpga_region/freeze_controller@0x100000450 I'm just scratching my head and wondering: should we be zeroing out some things in memory after an overlay is removed? Then this bug would led to a pointer that would have thrown an oops. If it's useful for you, next week I can send a cleaned up version of my patch that rebuilds the cache after overlays are applied or removed. Alan > > -Frank