2021-08-31 21:50:27

by David Hildenbrand

Subject: [PATCH v3 1/3] kernel/resource: clean up and optimize iomem_is_exclusive()

We end up traversing subtrees of ranges we are not interested in; let's
optimize this case, skipping such subtrees, cleaning up the function a bit.

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index ca9f5198a01f..2999f57da38c 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -73,6 +73,18 @@ static struct resource *next_resource(struct resource *p)
 	return p->sibling;
 }

+static struct resource *next_resource_skip_children(struct resource *p)
+{
+	while (!p->sibling && p->parent)
+		p = p->parent;
+	return p->sibling;
+}
+
+#define for_each_resource(_root, _p, _skip_children) \
+	for ((_p) = (_root)->child; (_p); \
+	     (_p) = (_skip_children) ? next_resource_skip_children(_p) : \
+				       next_resource(_p))
+
 static void *r_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct resource *p = v;
@@ -1712,10 +1724,9 @@ static int strict_iomem_checks;
  */
 bool iomem_is_exclusive(u64 addr)
 {
-	struct resource *p = &iomem_resource;
-	bool err = false;
-	loff_t l;
+	bool skip_children = false, err = false;
 	int size = PAGE_SIZE;
+	struct resource *p;

 	if (!strict_iomem_checks)
 		return false;
@@ -1723,15 +1734,19 @@ bool iomem_is_exclusive(u64 addr)
 	addr = addr & PAGE_MASK;

 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for_each_resource(&iomem_resource, p, skip_children) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
 		 */
 		if (p->start >= addr + size)
 			break;
-		if (p->end < addr)
+		if (p->end < addr) {
+			skip_children = true;
 			continue;
+		}
+		skip_children = false;
+
 		/*
 		 * A resource is exclusive if IORESOURCE_EXCLUSIVE is set
 		 * or CONFIG_IO_STRICT_DEVMEM is enabled and the
--
2.31.1
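
For readers without a kernel tree at hand, here is a minimal user-space
sketch of the two successor functions and the iterator macro from the
patch. Everything in it is an illustrative assumption rather than kernel
code: struct res is a stripped-down stand-in for struct resource, the
body of next_res() is assumed to match the kernel's next_resource()
(only its tail is visible in the hunk), and link_demo_tree() and
count_nodes() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the kernel's struct resource. */
struct res {
	const char *name;
	struct res *parent, *sibling, *child;
};

/* Pre-order successor: descend into children before moving on. */
static struct res *next_res(struct res *p)
{
	if (p->child)
		return p->child;
	while (!p->sibling && p->parent)
		p = p->parent;
	return p->sibling;
}

/* Successor that never descends: next sibling of p or of an ancestor. */
static struct res *next_res_skip_children(struct res *p)
{
	while (!p->sibling && p->parent)
		p = p->parent;
	return p->sibling;
}

/* The flag is re-read on every step, so the loop body may change it. */
#define for_each_res(_root, _p, _skip_children) \
	for ((_p) = (_root)->child; (_p); \
	     (_p) = (_skip_children) ? next_res_skip_children(_p) : \
				       next_res(_p))

/* Tiny demo tree: root -> { a -> { a1 }, b }. */
static struct res root, a, a1, b;

static void link_demo_tree(void)
{
	root = (struct res){ .name = "root", .child = &a };
	a = (struct res){ .name = "a", .parent = &root,
			  .sibling = &b, .child = &a1 };
	a1 = (struct res){ .name = "a1", .parent = &a };
	b = (struct res){ .name = "b", .parent = &root };
}

/* Count visited nodes, pruning the subtree below the node named `prune`. */
static int count_nodes(const char *prune)
{
	bool skip_children = false;
	struct res *p;
	int n = 0;

	for_each_res(&root, p, skip_children) {
		n++;
		skip_children = prune && strcmp(p->name, prune) == 0;
	}
	return n;
}
```

After link_demo_tree(), count_nodes(NULL) walks a, a1 and b, while
count_nodes("a") goes from a straight to b without ever dereferencing
a1. Because the macro's step expression reads the flag after the loop
body has run, setting it right before "continue" affects exactly the
next step.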


2021-09-01 22:37:44

by Dan Williams

Subject: Re: [PATCH v3 1/3] kernel/resource: clean up and optimize iomem_is_exclusive()

On Tue, 2021-08-31 at 22:21 +0200, David Hildenbrand wrote:
> We end up traversing subtrees of ranges we are not interested in; let's
> optimize this case, skipping such subtrees, cleaning up the function a bit.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
>  kernel/resource.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)

That diffstat does not come across as "cleanup", and the skip_children
flag changing values mid-iteration feels tricky. Is there a win here,
the same number of entries still need to be accessed, right?

BTW, I had to pull this from lore to reply to it, looks like the
intended Cc's were missing?

2021-09-02 09:59:43

by David Hildenbrand

Subject: Re: [PATCH v3 1/3] kernel/resource: clean up and optimize iomem_is_exclusive()

On 01.09.21 21:43, Williams, Dan J wrote:
> On Tue, 2021-08-31 at 22:21 +0200, David Hildenbrand wrote:
>> We end up traversing subtrees of ranges we are not interested in; let's
>> optimize this case, skipping such subtrees, cleaning up the function a bit.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>>  kernel/resource.c | 25 ++++++++++++++++++++-----
>>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> That diffstat does not come across as "cleanup", and the skip_children
> flag changing values mid-iteration feels tricky. Is there a win here,
> the same number of entries still need to be accessed, right?

Right, most of the patch's changes fall under "optimize". The cleanup is
using for_each_resource() and not using r_next(NULL, p, &l). Sure, I
could have split this up but then I'd just introduce for_each_resource()
to modify it immediately again.


Let's take a look at /proc/iomem on my notebook:

00000000-00000fff : Reserved
00001000-00057fff : System RAM
00058000-00058fff : Reserved
00059000-0009cfff : System RAM
0009d000-000fffff : Reserved
  000a0000-000bffff : PCI Bus 0000:00
  000c0000-000c3fff : PCI Bus 0000:00
  000c4000-000c7fff : PCI Bus 0000:00
  000c8000-000cbfff : PCI Bus 0000:00
  000cc000-000cffff : PCI Bus 0000:00
  000d0000-000d3fff : PCI Bus 0000:00
  000d4000-000d7fff : PCI Bus 0000:00
  000d8000-000dbfff : PCI Bus 0000:00
  000dc000-000dffff : PCI Bus 0000:00
  000e0000-000e3fff : PCI Bus 0000:00
  000e4000-000e7fff : PCI Bus 0000:00
  000e8000-000ebfff : PCI Bus 0000:00
  000ec000-000effff : PCI Bus 0000:00
  000f0000-000fffff : PCI Bus 0000:00
    000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
40000000-403fffff : Reserved
  40000000-403fffff : pnp 00:00
40400000-80a79fff : System RAM
...

Why should we take a look at any children of "0009d000-000fffff :
Reserved" if we can just skip these 15 items directly because the parent
range is not of interest?
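
To put numbers on this, the excerpt above can be replayed in user space.
The following is only a sketch under stated assumptions: struct res,
struct row, build_tree(), next_res() and count_visited() are
hypothetical names, the nesting depths are inferred from range
containment in the listing, the page size is taken to be 0x1000, and the
exclusivity checks that follow in iomem_is_exclusive() are left out.
The two successor helpers from the patch are folded into a single
function with a flag purely for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's struct resource. */
struct res {
	unsigned long long start, end;
	struct res *parent, *sibling, *child;
};

/* One listing row: range plus nesting depth (0 = directly under root). */
struct row { unsigned long long start, end; int depth; };

static const struct row rows[] = {
	{ 0x00000000, 0x00000fff, 0 },	/* Reserved */
	{ 0x00001000, 0x00057fff, 0 },	/* System RAM */
	{ 0x00058000, 0x00058fff, 0 },	/* Reserved */
	{ 0x00059000, 0x0009cfff, 0 },	/* System RAM */
	{ 0x0009d000, 0x000fffff, 0 },	/* Reserved: parent of the next 15 */
	{ 0x000a0000, 0x000bffff, 1 }, { 0x000c0000, 0x000c3fff, 1 },
	{ 0x000c4000, 0x000c7fff, 1 }, { 0x000c8000, 0x000cbfff, 1 },
	{ 0x000cc000, 0x000cffff, 1 }, { 0x000d0000, 0x000d3fff, 1 },
	{ 0x000d4000, 0x000d7fff, 1 }, { 0x000d8000, 0x000dbfff, 1 },
	{ 0x000dc000, 0x000dffff, 1 }, { 0x000e0000, 0x000e3fff, 1 },
	{ 0x000e4000, 0x000e7fff, 1 }, { 0x000e8000, 0x000ebfff, 1 },
	{ 0x000ec000, 0x000effff, 1 }, { 0x000f0000, 0x000fffff, 1 },
	{ 0x000f0000, 0x000fffff, 2 },	/* System ROM */
	{ 0x00100000, 0x3fffffff, 0 },	/* System RAM */
	{ 0x40000000, 0x403fffff, 0 },	/* Reserved */
	{ 0x40000000, 0x403fffff, 1 },	/* pnp 00:00 */
	{ 0x40400000, 0x80a79fff, 0 },	/* System RAM */
};
#define NROWS (sizeof(rows) / sizeof(rows[0]))

static struct res root, nodes[NROWS];

/* Link the rows (already in pre-order) into a parent/sibling/child tree. */
static void build_tree(void)
{
	struct res *last[8] = { NULL };	/* most recent node at each depth */
	size_t i;

	root = (struct res){ .start = 0, .end = ~0ULL };
	for (i = 0; i < NROWS; i++) {
		int d = rows[i].depth;
		struct res *parent;

		assert(d >= 0 && d < 8);
		parent = d ? last[d - 1] : &root;
		assert(parent);
		nodes[i] = (struct res){ rows[i].start, rows[i].end,
					 parent, NULL, NULL };
		if (last[d] && last[d]->parent == parent)
			last[d]->sibling = &nodes[i];	/* next sibling */
		else
			parent->child = &nodes[i];	/* first child */
		last[d] = &nodes[i];
	}
}

/* Pre-order successor that optionally refuses to descend into p. */
static struct res *next_res(struct res *p, bool skip_children)
{
	if (!skip_children && p->child)
		return p->child;
	while (!p->sibling && p->parent)
		p = p->parent;
	return p->sibling;
}

/* Count entries the range checks look at for the page starting at addr. */
static int count_visited(unsigned long long addr, unsigned long long size,
			 bool allow_skip)
{
	bool skip_children = false;
	int visited = 0;
	struct res *p;

	for (p = root.child; p; p = next_res(p, skip_children)) {
		visited++;
		if (p->start >= addr + size)
			break;
		if (p->end < addr) {
			skip_children = allow_skip;
			continue;
		}
		skip_children = false;
	}
	return visited;
}
```

After build_tree(), a lookup for the page at 0x00100000 (the first page
of the large System RAM range) visits 22 entries without skipping and 7
with skipping; the difference is exactly the 15 entries below
"0009d000-000fffff : Reserved". For an address inside that Reserved
range, such as 0x000f0000, both variants visit the same entries,
because the parent range is of interest and has to be descended into.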


It gets even worse for some PCI buses:

8f800000-f7ffffff : PCI Bus 0000:00
8f800000-8f9fffff : PCI Bus 0000:01
8fa00000-8fbfffff : PCI Bus 0000:01
90000000-b1ffffff : PCI Bus 0000:04
90000000-b1ffffff : PCI Bus 0000:05
90000000-b1ffffff : PCI Bus 0000:07
90000000-b1ffffff : PCI Bus 0000:08
90000000-b1ffffff : PCI Bus 0000:0d
c0000000-cfffffff : 0000:00:02.0
d0000000-e60fffff : PCI Bus 0000:04
d0000000-e60fffff : PCI Bus 0000:05
d0000000-e5efffff : PCI Bus 0000:07
d0000000-e5efffff : PCI Bus 0000:08
d0000000-d00fffff : PCI Bus 0000:09
d0000000-d000ffff : 0000:09:00.0
d0000000-d000ffff : xhci-hcd
d0010000-d0010fff : 0000:09:00.0
d0011000-d0011fff : 0000:09:00.0
d0100000-d01fffff : PCI Bus 0000:0b
d0100000-d010ffff : 0000:0b:00.0
d0100000-d010ffff : xhci-hcd
d0110000-d0110fff : 0000:0b:00.0
d0111000-d0111fff : 0000:0b:00.0
d0200000-e5efffff : PCI Bus 0000:0d
e5f00000-e5ffffff : PCI Bus 0000:3c
e5f00000-e5f0ffff : 0000:3c:00.0
e5f00000-e5f0ffff : xhci-hcd
e6000000-e60fffff : PCI Bus 0000:06
e6000000-e603ffff : 0000:06:00.0
e6000000-e603ffff : thunderbolt
e6040000-e6040fff : 0000:06:00.0
e7000000-e7ffffff : 0000:00:02.0
e8000000-e80fffff : PCI Bus 0000:3e
e8000000-e8003fff : 0000:3e:00.0
e8000000-e8003fff : nvme
e8100000-e81fffff : PCI Bus 0000:3d
e8100000-e8101fff : 0000:3d:00.0
e8100000-e8101fff : iwlwifi
e8200000-e821ffff : 0000:00:1f.6
e8200000-e821ffff : e1000e
e8220000-e822ffff : 0000:00:14.0
e8220000-e822ffff : xhci-hcd
e8228070-e822846f : intel_xhci_usb_sw
e8230000-e823ffff : 0000:00:1f.3
e8230000-e823ffff : ICH HD audio
e8240000-e8247fff : 0000:00:04.0
e8240000-e8247fff : proc_thermal
e8248000-e824bfff : 0000:00:1f.3
e8248000-e824bfff : ICH HD audio
e824c000-e824ffff : 0000:00:1f.2
e8250000-e8250fff : 0000:00:08.0
e8251000-e8251fff : 0000:00:14.2
e8251000-e8251fff : Intel PCH thermal driver
e8252000-e8252fff : 0000:00:15.0
e8252000-e82521ff : lpss_dev
e8252000-e82521ff : i2c_designware.0 lpss_dev
e8252200-e82522ff : lpss_priv
e8252800-e8252fff : idma64.0
e8252800-e8252fff : idma64.0 idma64.0
e8253000-e8253fff : 0000:00:16.0
e8253000-e8253fff : mei_me
e8254000-e82540ff : 0000:00:1f.4
f7fe0000-f7ffffff : pnp 00:08
f7fe0000-f7ffffff : pnp 00:0a

I didn't count how many entries these are, but there are certainly more
entries in that subtree than I have directly under the root, meaning that
in my setup we end up looking at at least 50% fewer entries (actually,
far fewer than that).

>
> BTW, I had to pull this from lore to reply to it, looks like the
> intended Cc's were missing?

Yes, I messed up this time, sorry -- I forgot "--cover-cc" w ... I will
resend the patches so everybody has them without going through extra trouble.


--
Thanks,

David / dhildenb

2021-09-03 00:33:09

by Dan Williams

Subject: Re: [PATCH v3 1/3] kernel/resource: clean up and optimize iomem_is_exclusive()

On Thu, Sep 2, 2021 at 12:52 AM David Hildenbrand <[email protected]> wrote:
>
> On 01.09.21 21:43, Williams, Dan J wrote:
> > On Tue, 2021-08-31 at 22:21 +0200, David Hildenbrand wrote:
> >> We end up traversing subtrees of ranges we are not interested in; let's
> >> optimize this case, skipping such subtrees, cleaning up the function a bit.
> >>
> >> Signed-off-by: David Hildenbrand <[email protected]>
> >> ---
> >> kernel/resource.c | 25 ++++++++++++++++++++-----
> >> 1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > That diffstat does not come across as "cleanup", and the skip_children
> > flag changing values mid-iteration feels tricky. Is there a win here,
> > the same number of entries still need to be accessed, right?
>
> Right, most of the patch's changes fall under "optimize". The cleanup is
> using for_each_resource() and not using r_next(NULL, p, &l). Sure, I
> could have split this up but then I'd just introduce for_each_resource()
> to modify it immediately again.
>
>
> Let's take a look at /proc/iomem on my notebook:
>
> 00000000-00000fff : Reserved
> 00001000-00057fff : System RAM
> 00058000-00058fff : Reserved
> 00059000-0009cfff : System RAM
> 0009d000-000fffff : Reserved
>   000a0000-000bffff : PCI Bus 0000:00
>   000c0000-000c3fff : PCI Bus 0000:00
>   000c4000-000c7fff : PCI Bus 0000:00
>   000c8000-000cbfff : PCI Bus 0000:00
>   000cc000-000cffff : PCI Bus 0000:00
>   000d0000-000d3fff : PCI Bus 0000:00
>   000d4000-000d7fff : PCI Bus 0000:00
>   000d8000-000dbfff : PCI Bus 0000:00
>   000dc000-000dffff : PCI Bus 0000:00
>   000e0000-000e3fff : PCI Bus 0000:00
>   000e4000-000e7fff : PCI Bus 0000:00
>   000e8000-000ebfff : PCI Bus 0000:00
>   000ec000-000effff : PCI Bus 0000:00
>   000f0000-000fffff : PCI Bus 0000:00
>     000f0000-000fffff : System ROM
> 00100000-3fffffff : System RAM
> 40000000-403fffff : Reserved
>   40000000-403fffff : pnp 00:00
> 40400000-80a79fff : System RAM
> ...
>
> Why should we take a look at any children of "0009d000-000fffff :
> Reserved" if we can just skip these 15 items directly because the parent
> range is not of interest?

Oh I misread, it never loads the child entries into cache, so it's a
true skip and not a continue.

You can add:

Reviewed-by: Dan Williams <[email protected]>

...I was going to say it should be named for_each_top_resource(), but
we can cross that bridge when / if something needs an iterator that
includes children.