2021-03-22 16:04:21

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 0/3] kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree

Playing with kdump+virtio-mem I noticed that kexec_file_load() does not
consider System RAM added via dax/kmem and virtio-mem when preparing the
elf header for kdump. Looking into the details, the logic used in
walk_system_ram_res() and walk_mem_res() seems to be outdated.

walk_system_ram_range() already does the right thing, let's change
walk_system_ram_res() and walk_mem_res(), and clean up.

Loading a kdump kernel via "kexec -p -s" ... will result in the kdump
kernel to also dump dax/kmem and virtio-mem added System RAM now.

Note: kexec-tools on x86-64 also have to be updated to consider this
memory in the kexec_load() case when processing /proc/iomem.

Against next-20210322.

David Hildenbrand (3):
kernel/resource: make walk_system_ram_res() find all busy
IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM
resources
kernel/resource: remove first_lvl / siblings_only logic

kernel/resource.c | 45 ++++++++++++---------------------------------
1 file changed, 12 insertions(+), 33 deletions(-)

--
2.29.2


2021-03-22 16:04:37

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

It used to be true that we can have busy system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
lower levels.

We have two users of walk_system_ram_res(), which currently only
consideres the first level:
a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
locate_mem_hole_callback(), so even after this change, we won't be
placing kexec images onto dax/kmem and virtio-mem added memory. No
change.
b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
not adding relevant ranges to the crash elf info, resulting in them
not getting dumped via kdump.

This change fixes loading a crashkernel via kexec_file_load() and including
dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
that e.g,, arm64 relies on memblock data and, therefore, always considers
all added System RAM already.

Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
behave like walk_system_ram_range().

Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..4efd6e912279 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
arg, func);
}

--
2.29.2

2021-03-22 16:04:45

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 2/3] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources

It used to be true that we can have system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., dax/kmem and virtio-mem).

The function walk_mem_res() only consideres the first level and is
used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
fail to identify System RAM added by dax/kmem and virtio-mem as
"IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
"normal RAM" in __ioremap_caller().

Let's find all busy IORESOURCE_MEM resources, making the function
behave similar to walk_system_ram_res().

Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 4efd6e912279..16e0c7e8ed24 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -470,7 +470,7 @@ int walk_mem_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;

- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
arg, func);
}

--
2.29.2

2021-03-22 16:04:47

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 3/3] kernel/resource: remove first_lvl / siblings_only logic

All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
whole resource tree, not just the first level. Let's drop the unused
first_lvl / siblings_only logic.

All functions properly search the whole tree, so remove documentation
that indicates that some functions behave differently.

Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 45 ++++++++++++---------------------------------
1 file changed, 12 insertions(+), 33 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 16e0c7e8ed24..7e00239a023a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
static struct resource *bootmem_resource_free;
static DEFINE_SPINLOCK(bootmem_resource_lock);

-static struct resource *next_resource(struct resource *p, bool sibling_only)
+static struct resource *next_resource(struct resource *p)
{
- /* Caller wants to traverse through siblings only */
- if (sibling_only)
- return p->sibling;
-
if (p->child)
return p->child;
while (!p->sibling && p->parent)
@@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
{
struct resource *p = v;
(*pos)++;
- return (void *)next_resource(p, false);
+ return (void *)next_resource(p);
}

#ifdef CONFIG_PROC_FS
@@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
* of the resource that's within [@start..@end]; if none is found, returns
* -ENODEV. Returns -EINVAL for invalid parameters.
*
- * This function walks the whole tree and not just first level children
- * unless @first_lvl is true.
- *
* @start: start address of the resource searched for
* @end: end address of same resource
* @flags: flags which the resource must have
* @desc: descriptor the resource must have
- * @first_lvl: walk only the first level children, if set
* @res: return ptr, if resource found
*
* The caller must specify @start, @end, @flags, and @desc
@@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
*/
static int find_next_iomem_res(resource_size_t start, resource_size_t end,
unsigned long flags, unsigned long desc,
- bool first_lvl, struct resource *res)
+ struct resource *res)
{
- bool siblings_only = true;
struct resource *p;

if (!res)
@@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,

read_lock(&resource_lock);

- for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
+ for (p = iomem_resource.child; p; p = next_resource(p)) {
/* If we passed the resource we are looking for, stop */
if (p->start > end) {
p = NULL;
@@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
if (p->end < start)
continue;

- /*
- * Now that we found a range that matches what we look for,
- * check the flags and the descriptor. If we were not asked to
- * use only the first level, start looking at children as well.
- */
- siblings_only = first_lvl;
-
if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
@@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,

static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
unsigned long flags, unsigned long desc,
- bool first_lvl, void *arg,
+ void *arg,
int (*func)(struct resource *, void *))
{
struct resource res;
int ret = -EINVAL;

while (start < end &&
- !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
+ !find_next_iomem_res(start, end, flags, desc, &res)) {
ret = (*func)(&res, arg);
if (ret)
break;
@@ -431,7 +415,6 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
* @arg: function argument for the callback @func
* @func: callback function that is called for each qualifying resource area
*
- * This walks through whole tree and not just first level children.
* All the memory ranges which overlap start,end and also match flags and
* desc are valid candidates.
*
@@ -441,7 +424,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
u64 end, void *arg, int (*func)(struct resource *, void *))
{
- return __walk_iomem_res_desc(start, end, flags, desc, false, arg, func);
+ return __walk_iomem_res_desc(start, end, flags, desc, arg, func);
}
EXPORT_SYMBOL_GPL(walk_iomem_res_desc);

@@ -457,8 +440,8 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
- arg, func);
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
+ func);
}

/*
@@ -470,17 +453,14 @@ int walk_mem_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;

- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
- arg, func);
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
+ func);
}

/*
* This function calls the @func callback against all memory ranges of type
* System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
* It is to be used only for System RAM.
- *
- * This will find System RAM ranges that are children of top-level resources
- * in addition to top-level System RAM resources.
*/
int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *))
@@ -495,8 +475,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
while (start < end &&
- !find_next_iomem_res(start, end, flags, IORES_DESC_NONE,
- false, &res)) {
+ !find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res)) {
pfn = PFN_UP(res.start);
end_pfn = PFN_DOWN(res.end + 1);
if (end_pfn > pfn)
--
2.29.2

2021-03-22 16:13:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources

On Mon, Mar 22, 2021 at 9:02 AM David Hildenbrand <[email protected]> wrote:
>
> It used to be true that we can have system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., dax/kmem and virtio-mem).
>
> The function walk_mem_res() only consideres the first level and is
> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
> fail to identify System RAM added by dax/kmem and virtio-mem as
> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
> "normal RAM" in __ioremap_caller().
>
> Let's find all busy IORESOURCE_MEM resources, making the function
> behave similar to walk_system_ram_res().

Looks good,

Reviewed-by: Dan Williams <[email protected]>

2021-03-22 16:15:16

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On Mon, Mar 22, 2021 at 9:02 AM David Hildenbrand <[email protected]> wrote:
>
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);

Looks good,

Reviewed-by: Dan Williams <[email protected]>

2021-03-22 16:16:27

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v1 3/3] kernel/resource: remove first_lvl / siblings_only logic

On Mon, Mar 22, 2021 at 9:03 AM David Hildenbrand <[email protected]> wrote:
>
> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
> whole resource tree, not just the first level. Let's drop the unused
> first_lvl / siblings_only logic.
>
> All functions properly search the whole tree, so remove documentation
> that indicates that some functions behave differently.

Looks good, and the staging of the potential regressions as standalone
lead-in patches makes sense.

Reviewed-by: Dan Williams <[email protected]>

2021-03-23 09:42:38

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On 22.03.21 17:01, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>

^ My copy-paste action when creating the cc list slipped in a duplicate
SO in all 3 patches. I can resend if desired.

--
Thanks,

David / dhildenb

2021-03-23 11:09:14

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree

On Mon, Mar 22, 2021 at 05:01:57PM +0100, David Hildenbrand wrote:
> Playing with kdump+virtio-mem I noticed that kexec_file_load() does not
> consider System RAM added via dax/kmem and virtio-mem when preparing the
> elf header for kdump. Looking into the details, the logic used in
> walk_system_ram_res() and walk_mem_res() seems to be outdated.
>
> walk_system_ram_range() already does the right thing, let's change
> walk_system_ram_res() and walk_mem_res(), and clean up.
>
> Loading a kdump kernel via "kexec -p -s" ... will result in the kdump
> kernel to also dump dax/kmem and virtio-mem added System RAM now.
>
> Note: kexec-tools on x86-64 also have to be updated to consider this
> memory in the kexec_load() case when processing /proc/iomem.


> Against next-20210322.

Hint: `git format-patch --base ...`

>
> David Hildenbrand (3):
> kernel/resource: make walk_system_ram_res() find all busy
> IORESOURCE_SYSTEM_RAM resources
> kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM
> resources
> kernel/resource: remove first_lvl / siblings_only logic
>
> kernel/resource.c | 45 ++++++++++++---------------------------------
> 1 file changed, 12 insertions(+), 33 deletions(-)
>
> --
> 2.29.2
>

--
With Best Regards,
Andy Shevchenko


2021-03-23 11:11:18

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On Tue, Mar 23, 2021 at 10:40:33AM +0100, David Hildenbrand wrote:
> On 22.03.21 17:01, David Hildenbrand wrote:
> > It used to be true that we can have busy system RAM only on the first level
> > in the resourc tree. However, this is no longer holds for driver-managed
> > system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> > lower levels.
> >
> > We have two users of walk_system_ram_res(), which currently only
> > consideres the first level:
> > a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> > IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> > locate_mem_hole_callback(), so even after this change, we won't be
> > placing kexec images onto dax/kmem and virtio-mem added memory. No
> > change.
> > b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> > not adding relevant ranges to the crash elf info, resulting in them
> > not getting dumped via kdump.
> >
> > This change fixes loading a crashkernel via kexec_file_load() and including
> > dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> > that e.g,, arm64 relies on memblock data and, therefore, always considers
> > all added System RAM already.
> >
> > Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> > behave like walk_system_ram_range().
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: Greg Kroah-Hartman <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: Andy Shevchenko <[email protected]>
> > Cc: Mauro Carvalho Chehab <[email protected]>
> > Cc: Signed-off-by: David Hildenbrand <[email protected]>
>
> ^ My copy-paste action when creating the cc list slipped in a duplicate SO
> in all 3 patches. I can resend if desired.

I think to address my comments you will need to resend anyway (as v2).

--
With Best Regards,
Andy Shevchenko


2021-03-23 11:12:49

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including

"...fixes..." effectively means to me that Fixes tag should be provided.

> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
> }
>
> --
> 2.29.2
>

--
With Best Regards,
Andy Shevchenko


2021-03-23 11:13:25

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources

On Mon, Mar 22, 2021 at 05:01:59PM +0100, David Hildenbrand wrote:
> It used to be true that we can have system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., dax/kmem and virtio-mem).
>
> The function walk_mem_res() only consideres the first level and is
> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
> fail to identify System RAM added by dax/kmem and virtio-mem as
> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
> "normal RAM" in __ioremap_caller().

Here I dunno, but consider to add Fixes tag if it fixes known bad behaviour.

> Let's find all busy IORESOURCE_MEM resources, making the function
> behave similar to walk_system_ram_res().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 4efd6e912279..16e0c7e8ed24 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -470,7 +470,7 @@ int walk_mem_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
> }
>
> --
> 2.29.2
>

--
With Best Regards,
Andy Shevchenko


2021-03-23 11:14:31

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v1 3/3] kernel/resource: remove first_lvl / siblings_only logic

On Mon, Mar 22, 2021 at 05:02:00PM +0100, David Hildenbrand wrote:
> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
> whole resource tree, not just the first level. Let's drop the unused
> first_lvl / siblings_only logic.
>
> All functions properly search the whole tree, so remove documentation
> that indicates that some functions behave differently.


Like this clean up!
Reviewed-by: Andy Shevchenko <[email protected]>

Although a few nit-picks below.

> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 45 ++++++++++++---------------------------------
> 1 file changed, 12 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 16e0c7e8ed24..7e00239a023a 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
> static struct resource *bootmem_resource_free;
> static DEFINE_SPINLOCK(bootmem_resource_lock);
>
> -static struct resource *next_resource(struct resource *p, bool sibling_only)
> +static struct resource *next_resource(struct resource *p)
> {
> - /* Caller wants to traverse through siblings only */
> - if (sibling_only)
> - return p->sibling;
> -
> if (p->child)
> return p->child;
> while (!p->sibling && p->parent)
> @@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
> {
> struct resource *p = v;
> (*pos)++;
> - return (void *)next_resource(p, false);
> + return (void *)next_resource(p);
> }
>
> #ifdef CONFIG_PROC_FS
> @@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
> * of the resource that's within [@start..@end]; if none is found, returns
> * -ENODEV. Returns -EINVAL for invalid parameters.
> *
> - * This function walks the whole tree and not just first level children
> - * unless @first_lvl is true.
> - *
> * @start: start address of the resource searched for
> * @end: end address of same resource
> * @flags: flags which the resource must have
> * @desc: descriptor the resource must have
> - * @first_lvl: walk only the first level children, if set
> * @res: return ptr, if resource found
> *
> * The caller must specify @start, @end, @flags, and @desc
> @@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
> */
> static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> unsigned long flags, unsigned long desc,
> - bool first_lvl, struct resource *res)
> + struct resource *res)
> {
> - bool siblings_only = true;
> struct resource *p;
>
> if (!res)
> @@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>
> read_lock(&resource_lock);
>
> - for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
> + for (p = iomem_resource.child; p; p = next_resource(p)) {
> /* If we passed the resource we are looking for, stop */
> if (p->start > end) {
> p = NULL;
> @@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> if (p->end < start)
> continue;
>
> - /*
> - * Now that we found a range that matches what we look for,
> - * check the flags and the descriptor. If we were not asked to
> - * use only the first level, start looking at children as well.
> - */
> - siblings_only = first_lvl;
> -
> if ((p->flags & flags) != flags)
> continue;
> if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> @@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>
> static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> unsigned long flags, unsigned long desc,
> - bool first_lvl, void *arg,

> + void *arg,
> int (*func)(struct resource *, void *))

Can it be one line?

> {
> struct resource res;
> int ret = -EINVAL;
>
> while (start < end &&
> - !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
> + !find_next_iomem_res(start, end, flags, desc, &res)) {
> ret = (*func)(&res, arg);
> if (ret)
> break;
> @@ -431,7 +415,6 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> * @arg: function argument for the callback @func
> * @func: callback function that is called for each qualifying resource area
> *
> - * This walks through whole tree and not just first level children.
> * All the memory ranges which overlap start,end and also match flags and
> * desc are valid candidates.
> *
> @@ -441,7 +424,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
> u64 end, void *arg, int (*func)(struct resource *, void *))
> {
> - return __walk_iomem_res_desc(start, end, flags, desc, false, arg, func);
> + return __walk_iomem_res_desc(start, end, flags, desc, arg, func);
> }
> EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
>
> @@ -457,8 +440,8 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> - arg, func);

> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
> + func);

I guess you may do it on one line.

> }
>
> /*
> @@ -470,17 +453,14 @@ int walk_mem_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> - arg, func);
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
> + func);

Ditto.

> }
>
> /*
> * This function calls the @func callback against all memory ranges of type
> * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
> * It is to be used only for System RAM.
> - *
> - * This will find System RAM ranges that are children of top-level resources
> - * in addition to top-level System RAM resources.
> */
> int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> void *arg, int (*func)(unsigned long, unsigned long, void *))
> @@ -495,8 +475,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
> flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> while (start < end &&
> - !find_next_iomem_res(start, end, flags, IORES_DESC_NONE,
> - false, &res)) {
> + !find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res)) {
> pfn = PFN_UP(res.start);
> end_pfn = PFN_DOWN(res.end + 1);
> if (end_pfn > pfn)
> --
> 2.29.2
>

--
With Best Regards,
Andy Shevchenko


2021-03-23 11:20:57

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 3/3] kernel/resource: remove first_lvl / siblings_only logic

On 23.03.21 12:11, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:02:00PM +0100, David Hildenbrand wrote:
>> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
>> whole resource tree, not just the first level. Let's drop the unused
>> first_lvl / siblings_only logic.
>>
>> All functions properly search the whole tree, so remove documentation
>> that indicates that some functions behave differently.
>
>
> Like this clean up!
> Reviewed-by: Andy Shevchenko <[email protected]>
>
> Although a few nit-picks below.
>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Cc: Andy Shevchenko <[email protected]>
>> Cc: Mauro Carvalho Chehab <[email protected]>
>> Cc: Signed-off-by: David Hildenbrand <[email protected]>
>> Cc: Dave Young <[email protected]>
>> Cc: Baoquan He <[email protected]>
>> Cc: Vivek Goyal <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Keith Busch <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Qian Cai <[email protected]>
>> Cc: Oscar Salvador <[email protected]>
>> Cc: Eric Biederman <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: Tom Lendacky <[email protected]>
>> Cc: Brijesh Singh <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> kernel/resource.c | 45 ++++++++++++---------------------------------
>> 1 file changed, 12 insertions(+), 33 deletions(-)
>>
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index 16e0c7e8ed24..7e00239a023a 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
>> static struct resource *bootmem_resource_free;
>> static DEFINE_SPINLOCK(bootmem_resource_lock);
>>
>> -static struct resource *next_resource(struct resource *p, bool sibling_only)
>> +static struct resource *next_resource(struct resource *p)
>> {
>> - /* Caller wants to traverse through siblings only */
>> - if (sibling_only)
>> - return p->sibling;
>> -
>> if (p->child)
>> return p->child;
>> while (!p->sibling && p->parent)
>> @@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
>> {
>> struct resource *p = v;
>> (*pos)++;
>> - return (void *)next_resource(p, false);
>> + return (void *)next_resource(p);
>> }
>>
>> #ifdef CONFIG_PROC_FS
>> @@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
>> * of the resource that's within [@start..@end]; if none is found, returns
>> * -ENODEV. Returns -EINVAL for invalid parameters.
>> *
>> - * This function walks the whole tree and not just first level children
>> - * unless @first_lvl is true.
>> - *
>> * @start: start address of the resource searched for
>> * @end: end address of same resource
>> * @flags: flags which the resource must have
>> * @desc: descriptor the resource must have
>> - * @first_lvl: walk only the first level children, if set
>> * @res: return ptr, if resource found
>> *
>> * The caller must specify @start, @end, @flags, and @desc
>> @@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
>> */
>> static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>> unsigned long flags, unsigned long desc,
>> - bool first_lvl, struct resource *res)
>> + struct resource *res)
>> {
>> - bool siblings_only = true;
>> struct resource *p;
>>
>> if (!res)
>> @@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>>
>> read_lock(&resource_lock);
>>
>> - for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
>> + for (p = iomem_resource.child; p; p = next_resource(p)) {
>> /* If we passed the resource we are looking for, stop */
>> if (p->start > end) {
>> p = NULL;
>> @@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>> if (p->end < start)
>> continue;
>>
>> - /*
>> - * Now that we found a range that matches what we look for,
>> - * check the flags and the descriptor. If we were not asked to
>> - * use only the first level, start looking at children as well.
>> - */
>> - siblings_only = first_lvl;
>> -
>> if ((p->flags & flags) != flags)
>> continue;
>> if ((desc != IORES_DESC_NONE) && (desc != p->desc))
>> @@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>>
>> static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>> unsigned long flags, unsigned long desc,
>> - bool first_lvl, void *arg,
>
>> + void *arg,
>> int (*func)(struct resource *, void *))
>
> Can it be one line?
>
>> {
>> struct resource res;
>> int ret = -EINVAL;
>>
>> while (start < end &&
>> - !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
>> + !find_next_iomem_res(start, end, flags, desc, &res)) {
>> ret = (*func)(&res, arg);
>> if (ret)
>> break;
>> @@ -431,7 +415,6 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>> * @arg: function argument for the callback @func
>> * @func: callback function that is called for each qualifying resource area
>> *
>> - * This walks through whole tree and not just first level children.
>> * All the memory ranges which overlap start,end and also match flags and
>> * desc are valid candidates.
>> *
>> @@ -441,7 +424,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>> int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
>> u64 end, void *arg, int (*func)(struct resource *, void *))
>> {
>> - return __walk_iomem_res_desc(start, end, flags, desc, false, arg, func);
>> + return __walk_iomem_res_desc(start, end, flags, desc, arg, func);
>> }
>> EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
>>
>> @@ -457,8 +440,8 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>> {
>> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>
>> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>> - arg, func);
>
>> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
>> + func);
>
> I guess you may do it on one line.
>
>> }
>>
>> /*
>> @@ -470,17 +453,14 @@ int walk_mem_res(u64 start, u64 end, void *arg,
>> {
>> unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>
>> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>> - arg, func);
>> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
>> + func);
>
> Ditto.

To all your comments:

"The preferred limit on the length of a single line is 80 columns."

"Statements longer than 80 columns should be broken into sensible chunks
... unless exceeding 80 columns significantly increases readability"

I don't think it significantly increases readability.


--
Thanks,

David / dhildenb

2021-03-23 11:28:18

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On 23.03.21 12:06, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have busy system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
>> lower levels.
>>
>> We have two users of walk_system_ram_res(), which currently only
>> consideres the first level:
>> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>> locate_mem_hole_callback(), so even after this change, we won't be
>> placing kexec images onto dax/kmem and virtio-mem added memory. No
>> change.
>> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>> not adding relevant ranges to the crash elf info, resulting in them
>> not getting dumped via kdump.
>>
>> This change fixes loading a crashkernel via kexec_file_load() and including
>
> "...fixes..." effectively means to me that Fixes tag should be provided.

We can certainly add, although it doesn't really affect the running
kernel, but only crashdumps taken in the kdump kernel:

Fixes: ebf71552bb0e ("virtio-mem: Add parent resource for all added
"System RAM"")
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use
like normal RAM")

Thanks

--
Thanks,

David / dhildenb

2021-03-23 11:28:43

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources

On 23.03.21 12:08, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:01:59PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., dax/kmem and virtio-mem).
>>
>> The function walk_mem_res() only consideres the first level and is
>> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
>> fail to identify System RAM added by dax/kmem and virtio-mem as
>> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
>> "normal RAM" in __ioremap_caller().
>
> Here I dunno, but consider to add Fixes tag if it fixes known bad behaviour.

Haven't observed it in the wild. We can add the same fixes tags as to
patch #1.

--
Thanks,

David / dhildenb

2021-03-23 14:35:34

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On 03/22/21 at 05:01pm, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);

Thanks, David, this is a good fix.

Acked-by: Baoquan He <[email protected]>

> }
>
> --
> 2.29.2
>

2021-03-24 11:57:07

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.

Let me ask some rookie questions:

What does "busy" term stand for here?
Why resources coming from virtio-mem are added at a lower levels?

>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
> }
>
> --
> 2.29.2
>
>

--
Oscar Salvador
SUSE L3

2021-03-25 02:57:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources

On 24.03.21 12:18, Oscar Salvador wrote:
> On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have busy system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
>> lower levels.
>
> Let me ask some rookie questions:
>
> What does "busy" term stand for here?

IORESOURCE_BUSY - here: actually added, not just some reserved range / container.


> Why resources coming from virtio-mem are added at a lower levels?

Some information can be had from ebf71552bb0e690cad523ad175e8c4c89a33c333

commit ebf71552bb0e690cad523ad175e8c4c89a33c333
Author: David Hildenbrand <[email protected]>
Date: Thu May 7 16:01:35 2020 +0200

virtio-mem: Add parent resource for all added "System RAM"

Let's add a parent resource, named after the virtio device (inspired by
drivers/dax/kmem.c). This allows user space to identify which memory
belongs to which virtio-mem device.

With this change and two virtio-mem devices:
:/# cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
[...]
140000000-333ffffff : virtio0
140000000-147ffffff : System RAM
148000000-14fffffff : System RAM
150000000-157ffffff : System RAM
[...]
334000000-3033ffffff : virtio1
338000000-33fffffff : System RAM
340000000-347ffffff : System RAM
348000000-34fffffff : System RAM
[...]



For dax/kmem it comes naturally due to the "Persistent Memory" and
device parent resources like:

140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM (kmem)
3280000000-32ffffffff : PCI Bus 0000:00


Thanks

--
Thanks,

David / dhildenb