2018-03-22 03:38:53

by Baoquan He

[permalink] [raw]
Subject: [PATCH 0/2] Kexec_file: Load kernel at top of system ram

The current kexec_file ignores kexec_buf.top_down value when call
arch_kexec_walk_mem() to allocate memory for loading kernel/initrd
stuffs. This is not supposed to be what kexec_buf.top_down is used
for.

In patch 0001, introduce a new function walk_system_ram_res_rev()
which is a variant of walk_system_ram_res(), walks through resources
of System RAM from top to down. And patch 0001 is picked from AKASHI's
patchset which adds arm64 kexec_file support. His next round of post
won't need walk_system_ram_res_rev any more, so I take it into this
patchset and use it in patch 0002.

In patch 0002, check kexec_buf.top_down in arch_kexec_walk_mem(),
if its value is 'true', call walk_system_ram_res_rev(). Otherwise
call walk_system_ram_res().

AKASHI Takahiro (1):
resource: add walk_system_ram_res_rev()

Baoquan He (1):
kexec_file: Load kernel at top of system RAM if required

include/linux/ioport.h | 3 +++
kernel/kexec_file.c | 2 ++
kernel/resource.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+)

--
2.13.6



2018-03-22 03:39:02

by Baoquan He

[permalink] [raw]
Subject: [PATCH 1/2] resource: add walk_system_ram_res_rev()

From: AKASHI Takahiro <[email protected]>

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file code.

Signed-off-by: AKASHI Takahiro <[email protected]>
Signed-off-by: Baoquan He <[email protected]>

---
include/linux/ioport.h | 3 +++
kernel/resource.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 66 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..f12d95fe038b 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -277,6 +277,9 @@ extern int
walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+ int (*func)(struct resource *, void *));
+extern int
walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
void *arg, int (*func)(struct resource *, void *));

diff --git a/kernel/resource.c b/kernel/resource.c
index e270b5048988..f456fc95f1b2 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,8 @@
#include <linux/pfn.h>
#include <linux/mm.h>
#include <linux/resource_ext.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
#include <asm/io.h>


@@ -470,6 +472,67 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
}

/*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+ int (*func)(struct resource *, void *))
+{
+ struct resource res, *rams;
+ int rams_size = 16, i;
+ int ret = -1;
+
+ /* create a list */
+ rams = vmalloc(sizeof(struct resource) * rams_size);
+ if (!rams)
+ return ret;
+
+ res.start = start;
+ res.end = end;
+ res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ i = 0;
+ while ((res.start < res.end) &&
+ (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
+ if (i >= rams_size) {
+ /* re-alloc */
+ struct resource *rams_new;
+ int rams_new_size;
+
+ rams_new_size = rams_size + 16;
+ rams_new = vmalloc(sizeof(struct resource)
+ * rams_new_size);
+ if (!rams_new)
+ goto out;
+
+ memcpy(rams_new, rams,
+ sizeof(struct resource) * rams_size);
+ vfree(rams);
+ rams = rams_new;
+ rams_size = rams_new_size;
+ }
+
+ rams[i].start = res.start;
+ rams[i++].end = res.end;
+
+ res.start = res.end + 1;
+ res.end = end;
+ }
+
+ /* go reverse */
+ for (i--; i >= 0; i--) {
+ ret = (*func)(&rams[i], arg);
+ if (ret)
+ break;
+ }
+
+out:
+ vfree(rams);
+ return ret;
+}
+
+/*
* This function calls the @func callback against all memory ranges, which
* are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
*/
--
2.13.6


2018-03-22 03:39:04

by Baoquan He

[permalink] [raw]
Subject: [PATCH 2/2] kexec_file: Load kernel at top of system RAM if required

For kexec_file loading, if kexec_buf.top_down is 'true', the memory which
is used to load kernel/initrd/purgatory is supposed to be allocated from
top to down. This is also consistent with the old kexec loading interface.

However, the current arch_kexec_walk_mem() doesn't do like this. It ignores
checking kexec_buf.top_down, bug calls walk_system_ram_res() directly to go
through all resources of System RAM, to try to find memory region which can
contain the specific kexec buffer, then call locate_mem_hole_callback() to
allocate memory in that found memory region from top to down. This is not
right.

Here add checking if kexec_buf.top_down is 'true' in arch_kexec_walk_mem(),
if yes, call the newly added walk_system_ram_res_rev() to find memory region
from top to down to load kernel.

Signed-off-by: Baoquan He <[email protected]>
---
kernel/kexec_file.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 57ec39995b23..76e6307f8971 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -445,6 +445,8 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
crashk_res.start, crashk_res.end,
kbuf, func);
+ else if (kbuf->top_down)
+ return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);
else
return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
}
--
2.13.6


2018-03-22 22:31:02

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On Thu, 22 Mar 2018 11:37:21 +0800 Baoquan He <[email protected]> wrote:

> From: AKASHI Takahiro <[email protected]>
>
> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reversed order, i.e., from higher to lower.
>
> It will be used in kexec_file code.
>
> ...
>
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -23,6 +23,8 @@
> #include <linux/pfn.h>
> #include <linux/mm.h>
> #include <linux/resource_ext.h>
> +#include <linux/string.h>
> +#include <linux/vmalloc.h>
> #include <asm/io.h>
>
>
> @@ -470,6 +472,67 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> }
>
> /*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */

This should document the return value, as should walk_system_ram_res().
Why does it return -1 on error rather than an errno (ENOMEM)?

> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(struct resource *, void *))
> +{
> + struct resource res, *rams;
> + int rams_size = 16, i;
> + int ret = -1;
> +
> + /* create a list */
> + rams = vmalloc(sizeof(struct resource) * rams_size);
> + if (!rams)
> + return ret;
> +
> + res.start = start;
> + res.end = end;
> + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> + i = 0;
> + while ((res.start < res.end) &&
> + (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> + if (i >= rams_size) {
> + /* re-alloc */
> + struct resource *rams_new;
> + int rams_new_size;
> +
> + rams_new_size = rams_size + 16;
> + rams_new = vmalloc(sizeof(struct resource)
> + * rams_new_size);
> + if (!rams_new)
> + goto out;
> +
> + memcpy(rams_new, rams,
> + sizeof(struct resource) * rams_size);
> + vfree(rams);
> + rams = rams_new;
> + rams_size = rams_new_size;
> + }
> +
> + rams[i].start = res.start;
> + rams[i++].end = res.end;
> +
> + res.start = res.end + 1;
> + res.end = end;
> + }
> +
> + /* go reverse */
> + for (i--; i >= 0; i--) {
> + ret = (*func)(&rams[i], arg);
> + if (ret)
> + break;
> + }

erk, this is pretty nasty. Isn't there a better way :(

> +out:
> + vfree(rams);
> + return ret;
> +}


2018-03-22 22:39:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/2] Kexec_file: Load kernel at top of system ram

On Thu, 22 Mar 2018 11:37:20 +0800 Baoquan He <[email protected]> wrote:

> The current kexec_file ignores kexec_buf.top_down value when call
> arch_kexec_walk_mem() to allocate memory for loading kernel/initrd
> stuffs. This is not supposed to be what kexec_buf.top_down is used
> for.

OK, but why is this a problem? When fixing a bug, please fully
describe the user-visible effects of that bug. That helps others
understand the importance of the fix, whether it should be backported
into various kernels, etc.



2018-03-23 01:02:00

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

Hi Andrew,

Thanks a lot for your reviewing!

On 03/22/18 at 03:29pm, Andrew Morton wrote:
> > /*
> > + * This function, being a variant of walk_system_ram_res(), calls the @func
> > + * callback against all memory ranges of type System RAM which are marked as
> > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> > + * higher to lower.
> > + */
>
> This should document the return value, as should walk_system_ram_res().
> Why does it return -1 on error rather than an errno (ENOMEM)?

OK, will add sentences to tell this. So for walk_system_ram_res() only
'-1' indicates the failure of finding, '0' the success. While in
walk_system_ram_res_rev(), add '-ENOMEM' to indicate failure of vmalloc
allocation.

>
> > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> > + int (*func)(struct resource *, void *))
> > +{
> > + struct resource res, *rams;
> > + int rams_size = 16, i;
> > + int ret = -1;
> > +
> > + /* create a list */
> > + rams = vmalloc(sizeof(struct resource) * rams_size);
> > + if (!rams)
> > + return ret;
> > +
> > + res.start = start;
> > + res.end = end;
> > + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > + i = 0;
> > + while ((res.start < res.end) &&
> > + (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> > + if (i >= rams_size) {
> > + /* re-alloc */
> > + struct resource *rams_new;
> > + int rams_new_size;
> > +
> > + rams_new_size = rams_size + 16;
> > + rams_new = vmalloc(sizeof(struct resource)
> > + * rams_new_size);
> > + if (!rams_new)
> > + goto out;
> > +
> > + memcpy(rams_new, rams,
> > + sizeof(struct resource) * rams_size);
> > + vfree(rams);
> > + rams = rams_new;
> > + rams_size = rams_new_size;
> > + }
> > +
> > + rams[i].start = res.start;
> > + rams[i++].end = res.end;
> > +
> > + res.start = res.end + 1;
> > + res.end = end;
> > + }
> > +
> > + /* go reverse */
> > + for (i--; i >= 0; i--) {
> > + ret = (*func)(&rams[i], arg);
> > + if (ret)
> > + break;
> > + }
>
> erk, this is pretty nasty. Isn't there a better way :(

Yes, this is not efficient.

In struct resource{}, ->sibling list is a singly linked list. I ever
thought about changing it to doubly linked list, yet not very sure if
it will have effect since struct resource is a core data structure.
AKASHI's method is more acceptable, and currently only kexec has this
requirement.

>
> > +out:
> > + vfree(rams);
> > + return ret;
> > +}
>

2018-03-23 02:10:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <[email protected]> wrote:

> > erk, this is pretty nasty. Isn't there a better way :(
>
> Yes, this is not efficient.
>
> In struct resource{}, ->sibling list is a singly linked list. I ever
> thought about changing it to doubly linked list, yet not very sure if
> it will have effect since struct resource is a core data structure.

Switching to a list_head sounds OK. The only issue really is memory
consumption and surely we don't have tens of thousands of struct
resources floating about(?). Or if we do have a lot, the machine is
presumably huge (hope?).

> AKASHI's method is more acceptable, and currently only kexec has this
> requirement.

What method is that?

2018-03-23 03:11:41

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On 03/22/18 at 07:06pm, Andrew Morton wrote:
> On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <[email protected]> wrote:
>
> > > erk, this is pretty nasty. Isn't there a better way :(
> >
> > Yes, this is not efficient.
> >
> > In struct resource{}, ->sibling list is a singly linked list. I ever
> > thought about changing it to doubly linked list, yet not very sure if
> > it will have effect since struct resource is a core data structure.
>
> Switching to a list_head sounds OK. The only issue really is memory
> consumption and surely we don't have tens of thousands of struct
> resources floating about(?). Or if we do have a lot, the machine is
> presumably huge (hope?).

Yes. It doubles the memory consumption.

AFAIK, the biggest number of resrouces I heard of possibly is mentioned
in this user space kexec_tools commit. In this commit, Xunlei told on
SGI system with 64TB RAM, the array which we have been using to store
"System RAM"|"Reserved"|"ACPI **" regions is not big enough. In that
case, we need extra 8Byte*2048=16KB at most. With my understanding, this
increase is system wide, since each resource instance only needs its own
list_head member, right?

commit 4a6d67d9e938a7accf128aff23f8ad4bda67f729
Author: Xunlei Pang <[email protected]>
Date: Thu Mar 23 19:16:59 2017 +0800

x86: Support large number of memory ranges

We got a problem on one SGI 64TB machine, the current kexec-tools
failed to work due to the insufficient ranges(MAX_MEMORY_RANGES)
allowed which is defined as 1024(less than the ranges on the machine).
The kcore header is insufficient due to the same reason as well.

To solve this, this patch simply doubles "MAX_MEMORY_RANGES" and
"KCORE_ELF_HEADERS_SIZE".

Signed-off-by: Xunlei Pang <[email protected]>
Tested-by: Frank Ramsay <[email protected]>
Signed-off-by: Simon Horman <[email protected]>

diff --git a/kexec/arch/i386/kexec-x86.h b/kexec/arch/i386/kexec-x86.h
index 33df352..51855f8 100644
--- a/kexec/arch/i386/kexec-x86.h
+++ b/kexec/arch/i386/kexec-x86.h
@@ -1,7 +1,7 @@
#ifndef KEXEC_X86_H
#define KEXEC_X86_H

-#define MAX_MEMORY_RANGES 1024
+#define MAX_MEMORY_RANGES 2048


>
> > AKASHI's method is more acceptable, and currently only kexec has this
> > requirement.
>
> What method is that?

I meant this patch is made by AKASHI, he posted a patchset to add
kexec_file support for arm64. Among those patches this one is used to
arm64 kernel at top of system RAM. Later they change a different way to
search memory region to load kernel, so he dropped this patch.


2018-03-23 08:40:23

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 0/2] Kexec_file: Load kernel at top of system ram

On 03/22/18 at 03:38pm, Andrew Morton wrote:
> On Thu, 22 Mar 2018 11:37:20 +0800 Baoquan He <[email protected]> wrote:
>
> > The current kexec_file ignores kexec_buf.top_down value when call
> > arch_kexec_walk_mem() to allocate memory for loading kernel/initrd
> > stuffs. This is not supposed to be what kexec_buf.top_down is used
> > for.
>
> OK, but why is this a problem? When fixing a bug, please fully
> describe the user-visible effects of that bug. That helps others
> understand the importance of the fix, whether it should be backported
> into various kernels, etc.

The problem is the current kexec file loading has different behaviour
with the old kexec loading. In kexec loading, user space kexec_tools
utility does most of job, it searches in /proc/iomem to try to find a
memory region from top to down to load kernel. Therefore with the kexec
loading, x86_64 bzImage kernel are all loaded at top of System RAM.
However, the kexec file loading just searches for available memory
region in iomem resource from bottom to top, then try to allocate from
top to down in that region. E.g on my testing system with 2G memory as
below, the kexec loading will put kernel near 0x000000013fffffff, while
kexec file loading will put kernel near 0x000000003ffddfff. There's no
bug reported yet, just we need consider it when take care of the kexec
collaboration with other kernel components like kaslr/hotplug etc, and
also the consistentency between the different kexec interface.

[Mar23 15:13] Linux version 4.16.0-rc3+ ([email protected]) (gcc version
[ +0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.16.0-rc3+ root=UUID=be8f8e3a-9
[ +0.000000] x86/fpu: x87 FPU will use FXSAVE
[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003ffddfff] usable
[ +0.000000] BIOS-e820: [mem 0x000000003ffde000-0x000000003fffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

I searched on internet and found the original patches posted for adding
bzImage 64 support into the old kexec loading, which is located in user
space kexec_tools utility made by Yinghai, and Vivek and hpa reviewed
patches. Still I didn't found out why kernel has to be put at top of
system RAM. I guess low memory are reserved for system usage mostly,
putting kexec kernel at top is safer and no need to exclude those resereved
regions by system or firmware which we may not find out all of them, but
not very sure about it.

Hi Yinghai, Vivek, hpa,

Do you know why we load kexec kernel at top of system RAM when do
x86_64 bzImage loading?

Thanks
Baoquan

2018-03-23 20:07:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On Fri, 23 Mar 2018 11:10:13 +0800 Baoquan He <[email protected]> wrote:

> On 03/22/18 at 07:06pm, Andrew Morton wrote:
> > On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <[email protected]> wrote:
> >
> > > > erk, this is pretty nasty. Isn't there a better way :(
> > >
> > > Yes, this is not efficient.
> > >
> > > In struct resource{}, ->sibling list is a singly linked list. I ever
> > > thought about changing it to doubly linked list, yet not very sure if
> > > it will have effect since struct resource is a core data structure.
> >
> > Switching to a list_head sounds OK. The only issue really is memory
> > consumption and surely we don't have tens of thousands of struct
> > resources floating about(?). Or if we do have a lot, the machine is
> > presumably huge (hope?).
>
> Yes. It doubles the memory consumption.
>
> AFAIK, the biggest number of resrouces I heard of possibly is mentioned
> in this user space kexec_tools commit. In this commit, Xunlei told on
> SGI system with 64TB RAM, the array which we have been using to store
> "System RAM"|"Reserved"|"ACPI **" regions is not big enough. In that
> case, we need extra 8Byte*2048=16KB at most. With my understanding, this
> increase is system wide, since each resource instance only needs its own
> list_head member, right?

Yes. That sounds perfectly acceptable.

It would be interesting to see what this approach looks like, if you
have time to toss something together?


2018-03-24 13:35:17

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On 03/23/18 at 01:06pm, Andrew Morton wrote:
> On Fri, 23 Mar 2018 11:10:13 +0800 Baoquan He <[email protected]> wrote:
>
> > On 03/22/18 at 07:06pm, Andrew Morton wrote:
> > > On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <[email protected]> wrote:
> > >
> > > > > erk, this is pretty nasty. Isn't there a better way :(
> > > >
> > > > Yes, this is not efficient.
> > > >
> > > > In struct resource{}, ->sibling list is a singly linked list. I ever
> > > > thought about changing it to doubly linked list, yet not very sure if
> > > > it will have effect since struct resource is a core data structure.
> > >
> > > Switching to a list_head sounds OK. The only issue really is memory
> > > consumption and surely we don't have tens of thousands of struct
> > > resources floating about(?). Or if we do have a lot, the machine is
> > > presumably huge (hope?).
> >
> > Yes. It doubles the memory consumption.
> >
> > AFAIK, the biggest number of resrouces I heard of possibly is mentioned
> > in this user space kexec_tools commit. In this commit, Xunlei told on
> > SGI system with 64TB RAM, the array which we have been using to store
> > "System RAM"|"Reserved"|"ACPI **" regions is not big enough. In that
> > case, we need extra 8Byte*2048=16KB at most. With my understanding, this
> > increase is system wide, since each resource instance only needs its own
> > list_head member, right?
>
> Yes. That sounds perfectly acceptable.
>
> It would be interesting to see what this approach looks like, if you
> have time to toss something together?

OK, will make patches for reviewing. Thanks!

2018-03-24 16:14:59

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On Sat, Mar 24, 2018 at 09:33:30PM +0800, Baoquan He wrote:
>>
>> Yes. That sounds perfectly acceptable.
>>
>> It would be interesting to see what this approach looks like, if you
>> have time to toss something together?
>
>OK, will make patches for reviewing. Thanks!

Hi, Baoquan, Andrew

I have come up with an implementation for top-down search the ram resources.
Hope this would meet your need.

From b36d50487f1d4e4d6a5103965a27101b3121e0ea Mon Sep 17 00:00:00 2001
From: Wei Yang <[email protected]>
Date: Sat, 24 Mar 2018 23:25:46 +0800
Subject: [PATCH] kernel/resource: add walk_system_ram_res_rev()

As discussed on https://patchwork.kernel.org/patch/10300819/, this patch
comes up with a variant implementation of walk_system_ram_res_rev(), which
uses iteration instead of allocating array to store those resources.

Signed-off-by: Wei Yang <[email protected]>
---
include/linux/ioport.h | 3 ++
kernel/resource.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 116 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..473f1d9cb97e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -277,6 +277,9 @@ extern int
walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+ int (*func)(struct resource *, void *));
+extern int
walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
void *arg, int (*func)(struct resource *, void *));

diff --git a/kernel/resource.c b/kernel/resource.c
index 769109f20fb7..ddf6b4c41498 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -73,6 +73,38 @@ static struct resource *next_resource(struct resource *p, bool sibling_only)
return p->sibling;
}

+static struct resource *prev_resource(struct resource *p, bool sibling_only)
+{
+ struct resource *prev;
+ if (NULL == iomem_resource.child)
+ return NULL;
+
+ if (p == NULL) {
+ prev = iomem_resource.child;
+ while (prev->sibling)
+ prev = prev->sibling;
+ } else {
+ if (p->parent->child == p) {
+ return p->parent;
+ }
+
+ for (prev = p->parent->child; prev->sibling != p;
+ prev = prev->sibling) {}
+ }
+
+ /* Caller wants to traverse through siblings only */
+ if (sibling_only)
+ return prev;
+
+ for (;prev->child;) {
+ prev = prev->child;
+
+ while (prev->sibling)
+ prev = prev->sibling;
+ }
+ return prev;
+}
+
static void *r_next(struct seq_file *m, void *v, loff_t *pos)
{
struct resource *p = v;
@@ -401,6 +433,47 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
return 0;
}

+/*
+ * Finds the highest iomem resource existing within [res->start.res->end).
+ * The caller must specify res->start, res->end, res->flags, and optionally
+ * desc. If found, returns 0, res is overwritten, if not found, returns -1.
+ * This function walks the whole tree and not just first level children until
+ * and unless first_level_children_only is true.
+ */
+static int find_prev_iomem_res(struct resource *res, unsigned long desc,
+ bool first_level_children_only)
+{
+ struct resource *p;
+
+ BUG_ON(!res);
+ BUG_ON(res->start >= res->end);
+
+ read_lock(&resource_lock);
+
+ for (p = prev_resource(NULL, first_level_children_only); p;
+ p = prev_resource(p, first_level_children_only)) {
+ if ((p->flags & res->flags) != res->flags)
+ continue;
+ if ((desc != IORES_DESC_NONE) && (desc != p->desc))
+ continue;
+ if (p->end < res->start) {
+ p = NULL;
+ break;
+ }
+ if ((p->end >= res->start) && (p->start < res->end))
+ break;
+ }
+
+ read_unlock(&resource_lock);
+ if (!p)
+ return -1;
+ /* copy data */
+ resource_clip(res, p->start, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ return 0;
+}
+
static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
bool first_level_children_only,
void *arg,
@@ -422,6 +495,27 @@ static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
return ret;
}

+static int __walk_iomem_res_rev_desc(struct resource *res, unsigned long desc,
+ bool first_level_children_only,
+ void *arg,
+ int (*func)(struct resource *, void *))
+{
+ u64 orig_start = res->start;
+ int ret = -1;
+
+ while ((res->start < res->end) &&
+ !find_prev_iomem_res(res, desc, first_level_children_only)) {
+ ret = (*func)(res, arg);
+ if (ret)
+ break;
+
+ res->end = res->start - 1;
+ res->start = orig_start;
+ }
+
+ return ret;
+}
+
/*
* Walks through iomem resources and calls func() with matching resource
* ranges. This walks through whole tree and not just first level children.
@@ -468,6 +562,25 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
arg, func);
}

+/*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+ int (*func)(struct resource *, void *))
+{
+ struct resource res;
+
+ res.start = start;
+ res.end = end;
+ res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ return __walk_iomem_res_rev_desc(&res, IORES_DESC_NONE, true,
+ arg, func);
+}
+
/*
* This function calls the @func callback against all memory ranges, which
* are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
--
2.15.1


--
Wei Yang
Help you, Help me

2018-03-26 14:31:46

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

Hi Wei Yang,

On 03/25/18 at 12:13am, Wei Yang wrote:
> On Sat, Mar 24, 2018 at 09:33:30PM +0800, Baoquan He wrote:
> >>
> >> Yes. That sounds perfectly acceptable.
> >>
> >> It would be interesting to see what this approach looks like, if you
> >> have time to toss something together?
> >
> >OK, will make patches for reviewing. Thanks!
>
> Hi, Baoquan, Andrew
>
> I have come up with an implementation for top-down search the ram resources.
> Hope this would meet your need.

Thanks for telling and your effort. Glad to know
I am not the only buyer of walk_system_ram_res_rev. I am fine with other
ways to make it, people can compare them and know which one is better.
I am working to use the list_head instead, the doubly linked list way
as Andrew suggested. Andrew and other people can help make a choice. It
won't be long.

Thanks
Baoquan

>
> From b36d50487f1d4e4d6a5103965a27101b3121e0ea Mon Sep 17 00:00:00 2001
> From: Wei Yang <[email protected]>
> Date: Sat, 24 Mar 2018 23:25:46 +0800
> Subject: [PATCH] kernel/resource: add walk_system_ram_res_rev()
>
> As discussed on https://patchwork.kernel.org/patch/10300819/, this patch
> comes up with a variant implementation of walk_system_ram_res_rev(), which
> uses iteration instead of allocating array to store those resources.
>
> Signed-off-by: Wei Yang <[email protected]>
> ---
> include/linux/ioport.h | 3 ++
> kernel/resource.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 116 insertions(+)
>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..473f1d9cb97e 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -277,6 +277,9 @@ extern int
> walk_system_ram_res(u64 start, u64 end, void *arg,
> int (*func)(struct resource *, void *));
> extern int
> +walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(struct resource *, void *));
> +extern int
> walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
> void *arg, int (*func)(struct resource *, void *));
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 769109f20fb7..ddf6b4c41498 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -73,6 +73,38 @@ static struct resource *next_resource(struct resource *p, bool sibling_only)
> return p->sibling;
> }
>
> +static struct resource *prev_resource(struct resource *p, bool sibling_only)
> +{
> + struct resource *prev;
> + if (NULL == iomem_resource.child)
> + return NULL;
> +
> + if (p == NULL) {
> + prev = iomem_resource.child;
> + while (prev->sibling)
> + prev = prev->sibling;
> + } else {
> + if (p->parent->child == p) {
> + return p->parent;
> + }
> +
> + for (prev = p->parent->child; prev->sibling != p;
> + prev = prev->sibling) {}
> + }
> +
> + /* Caller wants to traverse through siblings only */
> + if (sibling_only)
> + return prev;
> +
> + for (;prev->child;) {
> + prev = prev->child;
> +
> + while (prev->sibling)
> + prev = prev->sibling;
> + }
> + return prev;
> +}
> +
> static void *r_next(struct seq_file *m, void *v, loff_t *pos)
> {
> struct resource *p = v;
> @@ -401,6 +433,47 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
> return 0;
> }
>
> +/*
> + * Finds the highest iomem resource existing within [res->start.res->end).
> + * The caller must specify res->start, res->end, res->flags, and optionally
> + * desc. If found, returns 0, res is overwritten, if not found, returns -1.
> + * This function walks the whole tree and not just first level children until
> + * and unless first_level_children_only is true.
> + */
> +static int find_prev_iomem_res(struct resource *res, unsigned long desc,
> + bool first_level_children_only)
> +{
> + struct resource *p;
> +
> + BUG_ON(!res);
> + BUG_ON(res->start >= res->end);
> +
> + read_lock(&resource_lock);
> +
> + for (p = prev_resource(NULL, first_level_children_only); p;
> + p = prev_resource(p, first_level_children_only)) {
> + if ((p->flags & res->flags) != res->flags)
> + continue;
> + if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> + continue;
> + if (p->end < res->start) {
> + p = NULL;
> + break;
> + }
> + if ((p->end >= res->start) && (p->start < res->end))
> + break;
> + }
> +
> + read_unlock(&resource_lock);
> + if (!p)
> + return -1;
> + /* copy data */
> + resource_clip(res, p->start, p->end);
> + res->flags = p->flags;
> + res->desc = p->desc;
> + return 0;
> +}
> +
> static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
> bool first_level_children_only,
> void *arg,
> @@ -422,6 +495,27 @@ static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
> return ret;
> }
>
> +static int __walk_iomem_res_rev_desc(struct resource *res, unsigned long desc,
> + bool first_level_children_only,
> + void *arg,
> + int (*func)(struct resource *, void *))
> +{
> + u64 orig_start = res->start;
> + int ret = -1;
> +
> + while ((res->start < res->end) &&
> + !find_prev_iomem_res(res, desc, first_level_children_only)) {
> + ret = (*func)(res, arg);
> + if (ret)
> + break;
> +
> + res->end = res->start - 1;
> + res->start = orig_start;
> + }
> +
> + return ret;
> +}
> +
> /*
> * Walks through iomem resources and calls func() with matching resource
> * ranges. This walks through whole tree and not just first level children.
> @@ -468,6 +562,25 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> arg, func);
> }
>
> +/*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(struct resource *, void *))
> +{
> + struct resource res;
> +
> + res.start = start;
> + res.end = end;
> + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> + return __walk_iomem_res_rev_desc(&res, IORES_DESC_NONE, true,
> + arg, func);
> +}
> +
> /*
> * This function calls the @func callback against all memory ranges, which
> * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> --
> 2.15.1
>
>
> --
> Wei Yang
> Help you, Help me

2018-03-26 15:07:11

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

On Mon, Mar 26, 2018 at 10:30:16PM +0800, Baoquan He wrote:
>Hi Wei Yang,
>
>On 03/25/18 at 12:13am, Wei Yang wrote:
>> On Sat, Mar 24, 2018 at 09:33:30PM +0800, Baoquan He wrote:
>> >>
>> >> Yes. That sounds perfectly acceptable.
>> >>
>> >> It would be interesting to see what this approach looks like, if you
>> >> have time to toss something together?
>> >
>> >OK, will make patches for reviewing. Thanks!
>>
>> Hi, Baoquan, Andrew
>>
>> I have come up with an implementation for top-down search the ram resources.
>> Hope this would meet your need.
>
>Thanks for telling and your effort. Glad to know
>I am not the only buyer of walk_system_ram_res_rev. I am fine with other
>ways to make it, people can compare them and know which one is better.
>I am working to use the list_head instead, the doubly linked list way
>as Andrew suggested. Andrew and other people can help make a choice. It
>won't be long.
>

Sure,

Look forward your approach.

>Thanks
>Baoquan
>

--
Wei Yang
Help you, Help me