Received: by 10.223.185.116 with SMTP id b49csp6537627wrg; Wed, 28 Feb 2018 11:06:49 -0800 (PST) X-Google-Smtp-Source: AH8x226z7yssFuHB3+B5vFemCmgP3f1at3YhMY1tERunfL8WYc2pJf1aMGcdUCQkulL31W18iBYA X-Received: by 10.101.87.199 with SMTP id q7mr15230656pgr.215.1519844809734; Wed, 28 Feb 2018 11:06:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519844809; cv=none; d=google.com; s=arc-20160816; b=bSUEYm2mzlnsL+MKIvLvQLDx7ltBWXDme8ae6BWIqUB579uBQ2GDLMeg9Fxtf2Yn2P Zw14ai3AuBV1I2isy3NpQtn3cYLFQ3ENPlEsskW1GcQjqx2Is0CYPT4IwO83CBGwsbwo ncRz1Nfdx8RmK0Bs5ODQodi3KwSHzDC0ckEpD6Vt5igNPlN8mD5YH0VFVPoadfg1L6ze zUMf3Zt6HRGZshLN5/dgFpciRr5hvWBOHuzZQafAek/9WhI3sqfpTrGMRLyJ6rT+JnZz Mc6caZI1EM6xgp5ct7xTzvbMjDRE//94/pvdYdxA1P95s0IUgJP+LTJsGwTR//CqfXyR fMeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=OYCaXxKCi8AeqympEkqleJ96BpPvwDzsF8UmVi1E57A=; b=GfHE0Ng0RBa0xvFtmqIHq6qGVwNAYSz0dwc6LI9xzoysEsbYf8IerxPhXRROdKlPGp aWRxoyvdcRQizZ3PVou9DxcG+H5OJf6wW9/2r5pRcSaQX7w2z9F3W402mHyW1Ur+ia2X TkKIo0tE8I2bNFXqBwTGKAA65K+gJy98ZjwiMPorJP0S8rnhdxLRtW1AtKUiFI/AM1C6 H1KqLjbK0YFCAg+cmbpMwavSpuEyZ+MfAL7FgoHEEEa3OQirPn9dRA/0o2hFv28AnKHS WUN+3xGjMSTZxABf+4ov+iLnE+/S0zY3hHyIcCL6iQhisdhUcuAoeJA4T4u3Fx2B8PTR bdkw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=S270zvit; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 69-v6si1669102plc.814.2018.02.28.11.06.29; Wed, 28 Feb 2018 11:06:49 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=S270zvit; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933185AbeB1TFb (ORCPT + 99 others); Wed, 28 Feb 2018 14:05:31 -0500 Received: from mail-pf0-f196.google.com ([209.85.192.196]:34781 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932709AbeB1TFF (ORCPT ); Wed, 28 Feb 2018 14:05:05 -0500 Received: by mail-pf0-f196.google.com with SMTP id j20so1387562pfi.1; Wed, 28 Feb 2018 11:05:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=OYCaXxKCi8AeqympEkqleJ96BpPvwDzsF8UmVi1E57A=; b=S270zvitWSXgJbZvpCSHQJpB3it8+g/xjR+7hK2d8weLuBxRF3UKsMXYl6lmQVBOmU pXfnDgGx+Gd06hjaSW4PKCiqtd34FfVz0b01TYsIhk8jJ9LQcC7aj8Q1je8sgmP5fk7K 5OOpKaEBkYNG9Cm3Ewyma1g69R4veInpZ8K8xIOpAo/Rm3hM4NcNoaRfuAiRc2SV2S1W yai2QR4hdruTbjOkPIJlGO0IC4tuFjsbiAVUoVBxEKsljllx+KuYQ13JFKfwxZPhjcwv s+myJnvdDAwX40HRenn42cX2kK6+bZDB8hPfDWtJQBywiDWqWHE2zUADL3la24a3SDhk m8uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=OYCaXxKCi8AeqympEkqleJ96BpPvwDzsF8UmVi1E57A=; b=jTHQU+w2ZsqFbWGdK47ntMH9DYTqlDX635OjOqGaa+iS4WCvru92W4go7LeqkqJZgw GBqFl9UJ+h+3mms67P4NuCDgHdDTIgKGfyjPmiCna2CvpoCgs4LomDKHCJpF3VqkCSRS /7VpdcfUpxf94X79GSVyLw5ESdQ6q7JFuHidIIRiwxNdLjQxVAw94BiPU2zN9MHpsoJ9 wHg8k5ck6cnRsK0HF2BLSQPAhwAjxY/0PTR3/dkgTNR/J21JLBLVUcrwPVr2NKG9BVmi Bo2DsSgPjFar6RQfXPeLTFXSpGIia398SDZXVbfYeY+lY5b6VXOL+0qyySR1iXhI8Nt0 I5kA== X-Gm-Message-State: APf1xPB/APY4lhlpptbzyWh1qHAe8si1t/rWyoPtE2O+YufZi8WUXHwp Qi3nVDNYPBsVvCoDeiXQCTU= X-Received: by 10.99.126.14 with SMTP id z14mr15404705pgc.429.1519844705051; Wed, 28 Feb 2018 11:05:05 -0800 (PST) Received: from localhost.localdomain (c-73-93-215-6.hsd1.ca.comcast.net. [73.93.215.6]) by smtp.gmail.com with ESMTPSA id m22sm5027462pfg.188.2018.02.28.11.05.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 28 Feb 2018 11:05:04 -0800 (PST) From: frowand.list@gmail.com To: Rob Herring , cpandya@codeaurora.org Cc: devicetree@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 1/2] of: cache phandle nodes to reduce cost of of_find_node_by_phandle() Date: Wed, 28 Feb 2018 11:04:15 -0800 Message-Id: <1519844656-16443-2-git-send-email-frowand.list@gmail.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1519844656-16443-1-git-send-email-frowand.list@gmail.com> References: <1519844656-16443-1-git-send-email-frowand.list@gmail.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Frank Rowand Create a cache of the nodes that contain a phandle property. Use this cache to find the node for a given phandle value instead of scanning the devicetree to find the node. If the phandle value is not found in the cache, of_find_node_by_phandle() will fall back to the tree scan algorithm. The cache is initialized in of_core_init(). The cache is freed via a late_initcall_sync() if modules are not enabled. If the devicetree is created by the dtc compiler, with all phandle property values auto generated, then the size required by the cache could be 4 * (1 + number of phandles) bytes. This results in an O(1) node lookup cost for a given phandle value. Due to a concern that the phandle property values might not be consistent with what is generated by the dtc compiler, a mask has been added to the cache lookup algorithm. To maintain the O(1) node lookup cost, the size of the cache has been increased by rounding the number of entries up to the next power of two. The overhead of finding the devicetree node containing a given phandle value has been noted by several people in the recent past, in some cases with a patch to add a hashed index of devicetree nodes, based on the phandle value of the node. One concern with this approach is the extra space added to each node. This patch takes advantage of the phandle property values auto generated by the dtc compiler, which begin with one and monotonically increase by one, resulting in a range of 1..n for n phandle values. This implementation should also provide a good reduction of overhead for any range of phandle values that are mostly in a monotonic range. Performance measurements by Chintan Pandya of several implementations of patches that are similar to this one suggest an expected reduction of boot time by ~400ms for his test system. If the cache size was decreased to 64 entries, the boot time was reduced by ~340 ms. The measurements were on a 4.9.73 kernel for arch/arm64/boot/dts/qcom/sda670-mtp.dts, contains 2371 nodes and 814 phandle values. Reported-by: Chintan Pandya Signed-off-by: Frank Rowand --- Changes since v3: - of_populate_phandle_cache(): add check for failed memory allocation Changes since v2: - add mask to calculation of phandle cache entry - which results in better overhead reduction for devicetrees with phandle properties not allocated in the monotonically increasing range of 1..n - due to mask, number of entries in cache potentially increased to next power of two - minor fixes as suggested by reviewers - no longer using live_tree_max_phandle() so do not move it from drivers/of/resolver.c to drivers/of/base.c Changes since v1: - change short description from of: cache phandle nodes to reduce cost of of_find_node_by_phandle() - rebase on v4.16-rc1 - reorder new functions in base.c to avoid forward declaration - add locking around kfree(phandle_cache) for memory ordering - add explicit check for non-null of phandle_cache in of_find_node_by_phandle(). There is already a check for !handle, which prevents accessing a null phandle_cache, but that dependency is not obvious, so this check makes it more apparent. - do not free phandle_cache if modules are enabled, so that cached phandles will be available when modules are loaded drivers/of/base.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++--- drivers/of/of_private.h | 3 ++ drivers/of/resolver.c | 3 -- 3 files changed, 85 insertions(+), 7 deletions(-) diff --git a/drivers/of/base.c b/drivers/of/base.c index ad28de96e13f..e71d157d7149 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -91,10 +91,72 @@ int __weak of_node_to_nid(struct device_node *np) } #endif +static struct device_node **phandle_cache; +static u32 phandle_cache_mask; + +/* + * Assumptions behind phandle_cache implementation: + * - phandle property values are in a contiguous range of 1..n + * + * If the assumptions do not hold, then + * - the phandle lookup overhead reduction provided by the cache + * will likely be less + */ +static void of_populate_phandle_cache(void) +{ + unsigned long flags; + u32 cache_entries; + struct device_node *np; + u32 phandles = 0; + + raw_spin_lock_irqsave(&devtree_lock, flags); + + kfree(phandle_cache); + phandle_cache = NULL; + + for_each_of_allnodes(np) + if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) + phandles++; + + cache_entries = roundup_pow_of_two(phandles); + phandle_cache_mask = cache_entries - 1; + + phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache), + GFP_ATOMIC); + if (!phandle_cache) + goto out; + + for_each_of_allnodes(np) + if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) + phandle_cache[np->phandle & phandle_cache_mask] = np; + +out: + raw_spin_unlock_irqrestore(&devtree_lock, flags); +} + +#ifndef CONFIG_MODULES +static int __init of_free_phandle_cache(void) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&devtree_lock, flags); + + kfree(phandle_cache); + phandle_cache = NULL; + + raw_spin_unlock_irqrestore(&devtree_lock, flags); + + return 0; +} +late_initcall_sync(of_free_phandle_cache); +#endif + void __init of_core_init(void) { struct device_node *np; + of_populate_phandle_cache(); + /* Create the kset, and register existing nodes */ mutex_lock(&of_mutex); of_kset = kset_create_and_add("devicetree", NULL, firmware_kobj); @@ -1021,16 +1083,32 @@ int of_modalias_node(struct device_node *node, char *modalias, int len) */ struct device_node *of_find_node_by_phandle(phandle handle) { - struct device_node *np; + struct device_node *np = NULL; unsigned long flags; + phandle masked_handle; if (!handle) return NULL; raw_spin_lock_irqsave(&devtree_lock, flags); - for_each_of_allnodes(np) - if (np->phandle == handle) - break; + + masked_handle = handle & phandle_cache_mask; + + if (phandle_cache) { + if (phandle_cache[masked_handle] && + handle == phandle_cache[masked_handle]->phandle) + np = phandle_cache[masked_handle]; + } + + if (!np) { + for_each_of_allnodes(np) + if (np->phandle == handle) { + if (phandle_cache) + phandle_cache[masked_handle] = np; + break; + } + } + of_node_get(np); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h index 0c609e7d0334..fa70650136b4 100644 --- a/drivers/of/of_private.h +++ b/drivers/of/of_private.h @@ -131,6 +131,9 @@ extern void __of_update_property_sysfs(struct device_node *np, extern void __of_sysfs_remove_bin_file(struct device_node *np, struct property *prop); +/* illegal phandle value (set when unresolved) */ +#define OF_PHANDLE_ILLEGAL 0xdeadbeef + /* iterators for transactions, used for overlays */ /* forward iterator */ #define for_each_transaction_entry(_oft, _te) \ diff --git a/drivers/of/resolver.c b/drivers/of/resolver.c index 740d19bde601..b2ca8185c8c6 100644 --- a/drivers/of/resolver.c +++ b/drivers/of/resolver.c @@ -19,9 +19,6 @@ #include "of_private.h" -/* illegal phandle value (set when unresolved) */ -#define OF_PHANDLE_ILLEGAL 0xdeadbeef - static phandle live_tree_max_phandle(void) { struct device_node *node; -- Frank Rowand