of_find_node_by_phandle() takes a lot of time (1ms per
call) to find the right node when the intended device is
deep in the fdt, because we search for each device
serially in the fdt. See this:
struct device_node *__of_find_all_nodes(struct device_node *prev)
{
        struct device_node *np;

        if (!prev) {
                np = of_root;
        } else if (prev->child) {
                np = prev->child;
        } else {
                /* Walk back up looking for a sibling, or the end of the structure */
                np = prev;
                while (np->parent && !np->sibling)
                        np = np->parent;
                np = np->sibling; /* Might be null at the end of the tree */
        }
        return np;
}
#define for_each_of_allnodes_from(from, dn) \
for (dn = __of_find_all_nodes(from); dn; dn = __of_find_all_nodes(dn))
#define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)
Implement the device-phandle relation in a hash table so
that lookup is faster, irrespective of where the device
is defined in the DT.
There are ~6.7k calls to of_find_node_by_phandle(), and
the total improvement observed during boot is 400ms.
Signed-off-by: Chintan Pandya <[email protected]>
---
drivers/of/base.c | 8 ++++++--
drivers/of/fdt.c | 18 ++++++++++++++++++
include/linux/of.h | 6 ++++++
3 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 26618ba..bfbfa99 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1005,10 +1005,14 @@ struct device_node *of_find_node_by_phandle(phandle handle)
if (!handle)
return NULL;
- raw_spin_lock_irqsave(&devtree_lock, flags);
- for_each_of_allnodes(np)
+ spin_lock(&dt_hash_spinlock);
+ hash_for_each_possible(dt_hash_table, np, hash, handle)
if (np->phandle == handle)
break;
+
+ spin_unlock(&dt_hash_spinlock);
+
+ raw_spin_lock_irqsave(&devtree_lock, flags);
of_node_get(np);
raw_spin_unlock_irqrestore(&devtree_lock, flags);
return np;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4675e5a..62a9a4c 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -33,6 +33,10 @@
#include "of_private.h"
+static bool dt_hash_needs_init = true;
+DECLARE_HASHTABLE(dt_hash_table, DT_HASH_BITS);
+DEFINE_SPINLOCK(dt_hash_spinlock);
+
/*
* of_fdt_limit_memory - limit the number of regions in the /memory node
* @limit: maximum entries
@@ -242,6 +246,20 @@ static void populate_properties(const void *blob,
pprev = &pp->next;
}
+ /*
+ * In 'dryrun = true' cases, np is some non-NULL junk. So, protect
+ * against those cases.
+ */
+ if (!dryrun && np->phandle) {
+ spin_lock(&dt_hash_spinlock);
+ if (dt_hash_needs_init) {
+ dt_hash_needs_init = false;
+ hash_init(dt_hash_table);
+ }
+ hash_add(dt_hash_table, &np->hash, np->phandle);
+ spin_unlock(&dt_hash_spinlock);
+ }
+
/* With version 0x10 we may not have the name property,
* recreate it here from the unit name if absent
*/
diff --git a/include/linux/of.h b/include/linux/of.h
index d3dea1d..2e3ba84 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -25,6 +25,7 @@
#include <linux/notifier.h>
#include <linux/property.h>
#include <linux/list.h>
+#include <linux/hashtable.h>
#include <asm/byteorder.h>
#include <asm/errno.h>
@@ -69,6 +70,7 @@ struct device_node {
#endif
unsigned long _flags;
void *data;
+ struct hlist_node hash;
#if defined(CONFIG_SPARC)
const char *path_component_name;
unsigned int unique_id;
@@ -76,6 +78,10 @@ struct device_node {
#endif
};
+#define DT_HASH_BITS 6
+extern DECLARE_HASHTABLE(dt_hash_table, DT_HASH_BITS);
+extern spinlock_t dt_hash_spinlock;
+
#define MAX_PHANDLE_ARGS 16
struct of_phandle_args {
struct device_node *np;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On 2018-01-26 09:31, Chintan Pandya wrote:
> Implement the device-phandle relation in a hash table so
> that lookup is faster, irrespective of where the device
> is defined in the DT.
>
> There are ~6.7k calls to of_find_node_by_phandle(), and
> the total improvement observed during boot is 400ms.
I'm probably missing something obvious, but: Aren't phandles in practice
small consecutive integers assigned by dtc? If so, why not just have a
smallish static array mapping the small phandle values directly to
device node, instead of adding a pointer to every struct device_node? Or
one could determine the size of the array dynamically (largest seen
phandle value, capping at something sensible, e.g. 1024).
In either case, one would still need to keep the code doing the
whole-tree traversal for handling large phandle values, but I think the
above should make lookup O(1) in most cases.
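Concretely, I'm thinking of something along these lines (a completely
untested sketch; phandle_map is a made-up name, and the devtree_lock
handling is elided):

#define PHANDLE_MAP_SZ 1024
static struct device_node *phandle_map[PHANDLE_MAP_SZ];

struct device_node *of_find_node_by_phandle(phandle handle)
{
        struct device_node *np;

        if (!handle)
                return NULL;

        /* O(1) fast path for the small phandle values dtc assigns */
        if (handle < PHANDLE_MAP_SZ && phandle_map[handle])
                return of_node_get(phandle_map[handle]);

        /* keep the whole-tree walk for large/unknown phandle values */
        for_each_of_allnodes(np)
                if (np->phandle == handle)
                        break;
        return of_node_get(np);
}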
Alternatively, one could just count the number of nodes with a phandle,
allocate an array of that many pointers (so the memory use is certainly
no more than if adding a pointer to each device_node), and sort it by
phandle, so one can do lookup using a binary search.
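The lookup for that variant would then be a plain binary search, roughly
(again untested; phandle_list and phandle_count are made-up names, filled
in and sorted once at boot):

/* phandle_list[0..phandle_count-1], sorted by ascending phandle */
static struct device_node **phandle_list;
static int phandle_count;

static struct device_node *phandle_lookup(phandle handle)
{
        int lo = 0, hi = phandle_count - 1;

        while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;

                if (phandle_list[mid]->phandle == handle)
                        return phandle_list[mid];
                if (phandle_list[mid]->phandle < handle)
                        lo = mid + 1;
                else
                        hi = mid - 1;
        }
        return NULL;
}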
Rasmus
> I'm probably missing something obvious, but: Aren't phandles in practice
> small consecutive integers assigned by dtc? If so, why not just have a
> smallish static array mapping the small phandle values directly to
> device node, instead of adding a pointer to every struct device_node? Or
> one could determine the size of the array dynamically (largest seen
> phandle value, capping at something sensible, e.g. 1024).
I hadn't noticed this earlier! If the following is known or true, we can
avoid using a hash table and save the per-device_node hlist_node.

1. How do we know the max phandle value without traversing the tree
   once? In my case, the max is 1263.
2. Although I haven't observed a collision, is every device_node
   associated with a unique phandle value?
> In either case, one would still need to keep the code doing the
> whole-tree traversal for handling large phandle values, but I think the
> above should make lookup O(1) in most cases.
I would refrain from doing this because it would make this API's lookup
time inconsistent across nodes. I have seen people change where their
device is placed in the DT, and that changes the time taken by the of_*
APIs for them while also affecting others.
> Alternatively, one could just count the number of nodes with a phandle,
> allocate an array of that many pointers (so the memory use is certainly
> no more than if adding a pointer to each device_node), and sort it by
> phandle, so one can do lookup using a binary search.
>
> Rasmus
This is certainly doable if the current approach is not welcome due to
the addition of an hlist_node in device_node.
Chintan Pandya
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On Fri, Jan 26, 2018 at 9:14 AM, Chintan Pandya <[email protected]> wrote:
>
>> I'm probably missing something obvious, but: Aren't phandles in practice
>> small consecutive integers assigned by dtc? If so, why not just have a
>> smallish static array mapping the small phandle values directly to
>> device node, instead of adding a pointer to every struct device_node? Or
>> one could determine the size of the array dynamically (largest seen
>> phandle value, capping at something sensible, e.g. 1024).
I do have some concerns that this is a bit fragile and dependent on dtc's
implementation. However, I guess we already kind of are with overlay
phandles.
> I hadn't noticed this earlier! If the following is known or true, we can
> avoid using a hash table and save the per-device_node hlist_node.
>
> 1. How do we know the max phandle value without traversing the tree
>    once? In my case, the max is 1263.
We already have to know it for overlays. Plus unflattening has to
handle phandles specially already, so it would be easy to do there if
we aren't already.
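Tracking it during unflattening would be a one-liner, something like this
(untested sketch; of_max_phandle is a made-up name):

static u32 of_max_phandle;

        /* e.g. in drivers/of/fdt.c, right after populate_properties()
         * has parsed the "phandle"/"linux,phandle" property of a node
         */
        if (!dryrun && np->phandle > of_max_phandle)
                of_max_phandle = np->phandle;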
Then there is the question of what to do with overlays. For now, we can
probably assume there are too few phandles to matter.
> 2. Although I haven't observed a collision, is every device_node
>    associated with a unique phandle value?
Yes, phandles must be unique.
>> In either case, one would still need to keep the code doing the
>> whole-tree traversal for handling large phandle values, but I think the
>> above should make lookup O(1) in most cases.
>
> I would refrain from doing this because it would make this API's lookup
> time inconsistent across nodes. I have seen people change where their
> device is placed in the DT, and that changes the time taken by the of_*
> APIs for them while also affecting others.
Who cares. It's the total time that matters. It's obviously not a
problem until you have 1000s of lookups as no one cared until recently
(though you aren't the first with a hash table lookup).
>> Alternatively, one could just count the number of nodes with a phandle,
>> allocate an array of that many pointers (so the memory use is certainly
>> no more than if adding a pointer to each device_node), and sort it by
>> phandle, so one can do lookup using a binary search.
>>
>> Rasmus
>
> This is certainly doable if the current approach is not welcome due to
> the addition of an hlist_node in device_node.
I certainly prefer an out-of-band approach, as that's easier to turn off
if we want to save memory.
Still, I'd like to see some data on a cache-based approach and reasons
why that won't work.
Rob
On Fri, Jan 26, 2018 at 09:34:59AM -0600, Rob Herring wrote:
> < snip >
>
> Still, I'd like to see some data on a cache-based approach and reasons
> why that won't work.
I was curious, so I implemented it. It ends up being similar to Rasmus's
1st suggestion. The difference is we don't try to store all entries, but
rather implement a hash table that doesn't handle collisions. Relying on
the fact that phandles are just linearly allocated from 0, we just mask
the high bits of the phandle to get the index.
Using the unittest (modified to run twice), the time for 192 calls goes
from ~50us to ~30us. Can you try it out on your setup with different
array sizes?
8<----------------------------------------------------------------------
diff --git a/drivers/of/base.c b/drivers/of/base.c
index dd0b4201f1cc..cb88adf4f3db 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -986,6 +986,13 @@ int of_modalias_node(struct device_node *node, char *modalias, int len)
}
EXPORT_SYMBOL_GPL(of_modalias_node);
+static ktime_t elapsed;
+static int call_count;
+
+#define PHANDLE_CACHE_SZ 64
+#define PHANDLE_CACHE_HASH(x) ((x) & (PHANDLE_CACHE_SZ - 1))
+static struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
/**
* of_find_node_by_phandle - Find a node given a phandle
* @handle: phandle of the node to find
@@ -997,16 +1004,30 @@ struct device_node *of_find_node_by_phandle(phandle handle)
{
struct device_node *np;
unsigned long flags;
+ ktime_t t1, t2;
if (!handle)
return NULL;
raw_spin_lock_irqsave(&devtree_lock, flags);
- for_each_of_allnodes(np)
- if (np->phandle == handle)
- break;
+ t1 = ktime_get();
+ np = phandle_cache[PHANDLE_CACHE_HASH(handle)];
+ if (!np || np->phandle != handle) {
+ for_each_of_allnodes(np)
+ if (np->phandle == handle) {
+ phandle_cache[PHANDLE_CACHE_HASH(handle)] = np;
+ break;
+ }
+
+ }
+
of_node_get(np);
+ t2 = ktime_get();
+ elapsed = ktime_add(elapsed, ktime_sub(t2, t1));
+ call_count++;
raw_spin_unlock_irqrestore(&devtree_lock, flags);
+ if (!(call_count % 32))
+ printk("of_find_node_by_phandle called %d times totalling %lld ns", call_count, elapsed);
return np;
}
EXPORT_SYMBOL(of_find_node_by_phandle);
Hi Chintan,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on robh/for-next]
[also build test ERROR on v4.15-rc9 next-20180126]
[cannot apply to glikely/devicetree/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Chintan-Pandya/of-use-hash-based-search-in-of_find_node_by_phandle/20180127-043814
base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
config: sparc64-defconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64
All errors (new ones prefixed by >>):
drivers/of/base.o: In function `of_find_node_by_phandle':
>> base.c:(.text+0x5ec): undefined reference to `dt_hash_spinlock'
base.c:(.text+0x5f4): undefined reference to `dt_hash_spinlock'
base.c:(.text+0x5fc): undefined reference to `dt_hash_table'
base.c:(.text+0x604): undefined reference to `dt_hash_table'
base.c:(.text+0x61c): undefined reference to `dt_hash_spinlock'
base.c:(.text+0x628): undefined reference to `dt_hash_spinlock'
base.c:(.text+0x64c): undefined reference to `dt_hash_spinlock'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Chintan,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on robh/for-next]
[also build test ERROR on v4.15-rc9 next-20180126]
[cannot apply to glikely/devicetree/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Chintan-Pandya/of-use-hash-based-search-in-of_find_node_by_phandle/20180127-043814
base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc
All errors (new ones prefixed by >>):
drivers/of/base.o: In function `of_find_node_by_phandle':
>> base.c:(.text+0x1ac): undefined reference to `dt_hash_table'
base.c:(.text+0x1b4): undefined reference to `dt_hash_table'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
> I was curious, so I implemented it. It ends up being similar to Rasmus's
> 1st suggestion. The difference is we don't try to store all entries, but
> rather implement a hash table that doesn't handle collisions. Relying on
> the fact that phandles are just linearly allocated from 0, we just mask
> the high bits of the phandle to get the index.
I think this is the most resourceful way.
> Can you try it out on your setup with different
> array sizes?
Here are my test results. However, I simply considered overall boot time to
compare different scenarios because profiling of_find_node_by_phandle() in
early boot fails.
Scenarios:
[1] Cache size 1024 + early cache build up [Small change in your cache
    patch, see the patch below]
[2] Hash 64 approach [my original v2 patch]
[3] Cache size 64
[4] Cache size 128
[5] Cache size 256
[6] Base build
Result (boot to shell in sec):
[1] 14.292498 14.370994 14.313537 --> 850ms avg gain
[2] 14.340981 14.395900 14.398149 --> 800ms avg gain
[3] 14.546429 14.488783 14.468694 --> 680ms avg gain
[4] 14.506007 14.497487 14.523062 --> 670ms avg gain
[5] 14.671100 14.643344 14.731853 --> 500ms avg gain
[6] 15.263386 15.246155 15.041847 --> 0
The previously reported 400ms gain for [2] was from a different setup.
These tests and the new data are from my own debug setup. When we take any
of these patches to production, the results might deviate accordingly.
Chintan Pandya
Patch for the case [1]
<--------------------------------------------------------------------------->
diff --git a/drivers/of/base.c b/drivers/of/base.c
index a0bccb5..671e7cf 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1084,6 +1084,8 @@ int of_modalias_node(struct device_node *node, char *modalias, int len)
}
EXPORT_SYMBOL_GPL(of_modalias_node);
+struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
/**
* of_find_node_by_phandle - Find a node given a phandle
* @handle: phandle of the node to find
@@ -1100,9 +1102,15 @@ struct device_node *of_find_node_by_phandle(phandle handle)
return NULL;
raw_spin_lock_irqsave(&devtree_lock, flags);
- for_each_of_allnodes(np)
- if (np->phandle == handle)
- break;
+ np = phandle_cache[PHANDLE_CACHE_HASH(handle)];
+ if (!np || np->phandle != handle) {
+ for_each_of_allnodes(np)
+ if (np->phandle == handle) {
+ phandle_cache[PHANDLE_CACHE_HASH(handle)] = np;
+ break;
+ }
+
+ }
of_node_get(np);
raw_spin_unlock_irqrestore(&devtree_lock, flags);
return np;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c0914fb..e90a458 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -227,6 +227,9 @@ static void populate_properties(const void *blob,
pprev = &pp->next;
}
+ if (!dryrun && np->phandle)
+ phandle_cache[PHANDLE_CACHE_HASH(np->phandle)] = np;
+
/* With version 0x10 we may not have the name property,
* recreate it here from the unit name if absent
*/
diff --git a/include/linux/of.h b/include/linux/of.h
index 299aeb1..b1200be 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -92,6 +92,10 @@ struct of_phandle_iterator {
struct device_node *node;
};
+#define PHANDLE_CACHE_SZ 1024
+#define PHANDLE_CACHE_HASH(x) ((x) & (PHANDLE_CACHE_SZ - 1))
+extern struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
struct of_reconfig_data {
struct device_node *dn;
struct property *prop;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On Mon, Jan 29, 2018 at 1:34 AM, Chintan Pandya <[email protected]> wrote:
>
>> I was curious, so I implemented it. It ends up being similar to Rasmus's
>> 1st suggestion. The difference is we don't try to store all entries, but
>> rather implement a hash table that doesn't handle collisions. Relying on
>> the fact that phandles are just linearly allocated from 0, we just mask
>> the high bits of the phandle to get the index.
>
> I think this is the most resourceful way.
>>
>> Can you try it out on your setup with different
>> array sizes?
>
> Here are my test results. However, I simply considered overall boot time to
> compare different scenarios because profiling of_find_node_by_phandle() in
> early boot fails.
>
> Scenarios:
> [1] Cache size 1024 + early cache build up [Small change in your cache
>     patch, see the patch below]
> [2] Hash 64 approach [my original v2 patch]
> [3] Cache size 64
> [4] Cache size 128
> [5] Cache size 256
> [6] Base build
>
> Result (boot to shell in sec):
> [1] 14.292498 14.370994 14.313537 --> 850ms avg gain
> [2] 14.340981 14.395900 14.398149 --> 800ms avg gain
> [3] 14.546429 14.488783 14.468694 --> 680ms avg gain
> [4] 14.506007 14.497487 14.523062 --> 670ms avg gain
> [5] 14.671100 14.643344 14.731853 --> 500ms avg gain
It's strange that bigger sizes are slower. Based on this data, I'd pick [3].
How many phandles do you have? I thought it was hundreds, so 1024
entries would be more than enough and you should see some curve to a
max gain as cache size approaches # of phandles.
Rob
>> Scenarios:
>> [1] Cache size 1024 + early cache build up [Small change in your cache
>>     patch, see the patch below]
>> [2] Hash 64 approach [my original v2 patch]
>> [3] Cache size 64
>> [4] Cache size 128
>> [5] Cache size 256
>> [6] Base build
>>
>> Result (boot to shell in sec):
>> [1] 14.292498 14.370994 14.313537 --> 850ms avg gain
>> [2] 14.340981 14.395900 14.398149 --> 800ms avg gain
>> [3] 14.546429 14.488783 14.468694 --> 680ms avg gain
>> [4] 14.506007 14.497487 14.523062 --> 670ms avg gain
>> [5] 14.671100 14.643344 14.731853 --> 500ms avg gain
> It's strange that bigger sizes are slower. Based on this data, I'd pick [3].
>
> How many phandles do you have? I thought it was hundreds, so 1024
> entries would be more than enough and you should see some curve to a
> max gain as cache size approaches # of phandles.
>
There are 1063 phandles for my device. In one of the previous mails, I
estimated it to be a few hundred, but I was well short of the actual
number. However, 1063 still doesn't justify why [4] and [5] are not
better than [3].
I would still be interested in finding a way to dynamically allocate an
array sized close to the total # of phandles, with the mapping
pre-populated, and to free that array once we are done with it. But at
present I have no idea how to achieve this. If you can share any pointers
on this, that would help!
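Something like the below is what I have in mind (untested sketch;
of_populate_phandle_cache, of_free_phandle_cache and phandle_cache_sz are
made-up names, and locking is ignored):

static struct device_node **phandle_cache;
static u32 phandle_cache_sz;

/* call once the tree is unflattened and the max phandle is known */
static void of_populate_phandle_cache(u32 max_phandle)
{
        struct device_node *np;

        /* size to the next power of two so the mask below works;
         * one might want to cap this at something sensible
         */
        phandle_cache_sz = roundup_pow_of_two(max_phandle + 1);
        phandle_cache = kcalloc(phandle_cache_sz, sizeof(*phandle_cache),
                                GFP_KERNEL);
        if (!phandle_cache)
                return;

        for_each_of_allnodes(np)
                if (np->phandle)
                        phandle_cache[np->phandle &
                                      (phandle_cache_sz - 1)] = np;
}

/* and once boot-time lookups are done with: */
static void of_free_phandle_cache(void)
{
        kfree(phandle_cache);
        phandle_cache = NULL;
}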
Thanks,
Chintan Pandya
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Hi Chintan,
On 01/26/18 00:31, Chintan Pandya wrote:
> of_find_node_by_phandle() takes a lot of time (1ms per
> call) to find the right node when the intended device is
> deep in the fdt, because we search for each device
> serially in the fdt. See this:
>
> struct device_node *__of_find_all_nodes(struct device_node *prev)
> {
>         struct device_node *np;
>
>         if (!prev) {
>                 np = of_root;
>         } else if (prev->child) {
>                 np = prev->child;
>         } else {
>                 /* Walk back up looking for a sibling, or the end of the structure */
>                 np = prev;
>                 while (np->parent && !np->sibling)
>                         np = np->parent;
>                 np = np->sibling; /* Might be null at the end of the tree */
>         }
>         return np;
> }
>
> #define for_each_of_allnodes_from(from, dn) \
> for (dn = __of_find_all_nodes(from); dn; dn = __of_find_all_nodes(dn))
> #define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)
>
> Implement the device-phandle relation in a hash table so
> that lookup is faster, irrespective of where the device
> is defined in the DT.
>
> There are ~6.7k calls to of_find_node_by_phandle(), and
> the total improvement observed during boot is 400ms.
>
> Signed-off-by: Chintan Pandya <[email protected]>
> ---
> drivers/of/base.c | 8 ++++++--
< snip >
I asked some questions in the version 1 thread and did not get
answers. I am copying the 3 questions here.
(1)
>>>
>> Please give me a pointer to the code that is doing
>> this search.
>>
>> -Frank
> You can refer include/linux/of.h
>
> #define for_each_of_allnodes_from(from, dn) \
> for (dn = __of_find_all_nodes(from); dn; dn = __of_find_all_nodes(dn))
> #define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)
>
> where __of_find_all_nodes() does
>
> struct device_node *__of_find_all_nodes(struct device_node *prev)
> {
>         struct device_node *np;
>
>         if (!prev) {
>                 np = of_root;
>         } else if (prev->child) {
>                 np = prev->child;
>         } else {
>                 /* Walk back up looking for a sibling, or the end of the structure */
>                 np = prev;
>                 while (np->parent && !np->sibling)
>                         np = np->parent;
>                 np = np->sibling; /* Might be null at the end of the tree */
>         }
>         return np;
> }
>
Let me restate my question.
Can you point me to the driver code that is invoking
the search?
(2)
And also the .dts devicetree source file that you are seeing
large overhead with.
(3) -- this one is less important, but if the info is easily
available to you
Sorry about dribbling out questions instead of all at once....
What is the hardware you are testing this on?
Processor?
Cache size?
Memory size?
Processor frequency?
Any other attribute of the system that will help me understand
the boot performance you are seeing?
Thanks,
Frank
> (1)
>
> Can you point me to the driver code that is invoking
> the search?
There are many locations. A few of them are:
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/of/irq.c?h=msm-4.9#n214
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/irqchip/irq-gic-v3.c?h=msm-4.9#n1107
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/clk/msm/msm-clock-controller.c?h=msm-4.9#n492
>
> (2)
>
> And also the .dts devicetree source file that you are seeing
> large overhead with.
SDM670 DTS tree starts here.
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/arch/arm64/boot/dts/qcom/sdm670.dtsi?h=msm-4.9
>
>
> (3) -- this one is less important, but if the info is easily
> available to you
>
> Sorry about dribbling out questions instead of all at once....
>
> What is the hardware you are testing this on?
SDM670
> Processor?
Kryo-300 Silver
> Cache size?
From DT,
L1 32KB (per CPU)
L2 128KB (per CPU)
L3 1MB (total)
> Memory size?
6GB
> Processor frequency?
Max 1.7GHz for core 0. Not sure about boot time frequency.
> Any other attribute of the system that will help me understand
> the boot performance you are seeing?
I'm not able to profile of_find_node_by_phandle() specifically, as timers
are not up by then, so I'm just observing overall boot time for comparison.

My recent results were taken on debug_defconfig, which has a lot of
performance-slowing code, so the gap between the base build and the test
patches is larger than it would be on an actual production build.
Thanks,
Chintan
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On 01/30/18 00:04, Chintan Pandya wrote:
>> (1)
>>
>> Can you point me to the driver code that is invoking
>> the search?
> There are many locations. Few of them being,
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/of/irq.c?h=msm-4.9#n214
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/irqchip/irq-gic-v3.c?h=msm-4.9#n1107
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/clk/msm/msm-clock-controller.c?h=msm-4.9#n492
>>
>> (2)
>>
>> And also the .dts devicetree source file that you are seeing
>> large overhead with.
> SDM670 DTS tree starts here.
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/arch/arm64/boot/dts/qcom/sdm670.dtsi?h=msm-4.9
Thanks, I'm starting to get a picture of what you are facing.
The file you point at is a .dtsi, not a .dts. There are quite a
few sdm670*.dts files in that tree. Which one corresponds to
the system you are working on?
I picked a random one (sdm670-cdp.dts), but I need to be looking
at the .dts that you are using to be able to ask reasonable
questions instead of poking around in the dark.
Also, when I clone that git tree there is not a valid HEAD. I
picked what looked like a reasonable commit to checkout, but I
would prefer to be looking at the same exact source that you
are working with. Can you give me the commit id of the branch
that you are working on?
-Frank
>>
>>
>> (3) -- this one is less important, but if the info is easily
>> available to you
>>
>> Sorry about dribbling out questions instead of all at once....
>>
>> What is the hardware you are testing this on?
> SDM670
>> Processor?
> Kryo-300 Silver
>> Cache size?
> From DT,
> L1 32KB (per CPU)
> L2 128KB (per CPU)
> L3 1MB (total)
>> Memory size?
> 6GB
>> Processor frequency?
> Max 1.7GHz for core 0. Not sure about boot time frequency.
>> Any other attribute of the system that will help me understand
>> the boot performance you are seeing?
> I'm not able to profile of_find_node_by_phandle() specifically, as timers
> are not up by then, so I'm just observing overall boot time for comparison.
>
> My recent results were taken on debug_defconfig, which has a lot of
> performance-slowing code, so the gap between the base build and the test
> patches is larger than it would be on an actual production build.
>
> Thanks,
> Chintan
>