2023-06-14 14:30:46

by Song Shuai

Subject: [PATCH] memblock: Add error message when memblock_can_resize is not ready

Callers of the memblock APIs usually assume they always succeed and don't
check the return code. But a failure caused by memblock_can_resize not being
set yet is hard to recognize without the return code, as in this log:

```
[ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
[ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
[ 0.000000] Oops - store (or AMO) access fault [#1]
```

So add an error message for this kind of failure:

```
[ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
[ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
[ 0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
[ 0.000000] Oops - store (or AMO) access fault [#1]
```

Signed-off-by: Song Shuai <[email protected]>
---
mm/memblock.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 3feafea06ab2..ab952a164f62 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
 	/* We don't allow resizing until we know about the reserved regions
 	 * of memory that aren't suitable for allocation
 	 */
-	if (!memblock_can_resize)
+	if (!memblock_can_resize) {
+		pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
+		       type->name, &new_area_start, (unsigned long)new_area_size);
 		return -1;
+	}

 	/* Calculate new doubled size */
 	old_size = type->max * sizeof(struct memblock_region);
--
2.20.1



2023-06-14 16:30:26

by Mike Rapoport

Subject: Re: [PATCH] memblock: Add error message when memblock_can_resize is not ready

Hi,

On Wed, Jun 14, 2023 at 09:17:46PM +0800, Song Shuai wrote:
> The memblock APIs are always correct, thus the callers usually don't
> handle the return code. But the failure caused by unready memblock_can_resize
> is hard to recognize without the return code. Like this piece of log:

Please make it clear that the failure is in memblock_double_array(), e.g.

But when memblock_double_array() is called before memblock_can_resize
is true, it is hard to understand the actual reason for the failure.

>
> ```
> [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> [ 0.000000] Oops - store (or AMO) access fault [#1]
> ```
>
> So add an error message for this kind of failure:
>
> ```
> [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> [ 0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
> [ 0.000000] Oops - store (or AMO) access fault [#1]
> ```
>
> Signed-off-by: Song Shuai <[email protected]>
> ---
> mm/memblock.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 3feafea06ab2..ab952a164f62 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
> /* We don't allow resizing until we know about the reserved regions
> * of memory that aren't suitable for allocation
> */
> - if (!memblock_can_resize)
> + if (!memblock_can_resize) {
> + pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
> + type->name, &new_area_start, (unsigned long)new_area_size);

Most of the time memblock uses %llu and cast to u64 to print size, please
make this consistent.
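
For illustration, a minimal userspace sketch of the suggested format. In the kernel `u64` is `unsigned long long`, so casting to `u64` and printing with `%llu` is correct whatever the underlying width of `phys_addr_t`. The helper name `format_double_failure` and the typedefs are assumptions made for this sketch; `%pa` is a kernel-only printk extension, so the start address is printed with `%llx` here instead.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for kernel types (assumption: illustration only). */
typedef unsigned long long u64;
typedef uint64_t phys_addr_t;   /* may be plain unsigned long on 64-bit hosts */

/*
 * Hypothetical helper: formats the failure message into buf.
 * The (u64) casts make the arguments match %llx and %llu exactly.
 */
static int format_double_failure(char *buf, size_t len, const char *name,
                                 phys_addr_t start, phys_addr_t size)
{
    return snprintf(buf, len,
                    "memblock: Can't double %s array for area start 0x%016llx size %llu\n",
                    name, (u64)start, (u64)size);
}
```

In the kernel itself the same idea would be a direct `pr_err()` call with `%pa` for the address and `%llu` plus a `(u64)` cast for the size.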

> return -1;
> + }
>
> /* Calculate new doubled size */
> old_size = type->max * sizeof(struct memblock_region);
> --
> 2.20.1
>
>

--
Sincerely yours,
Mike.

2023-06-20 07:14:04

by Song Shuai

Subject: Re: [PATCH] memblock: Add error message when memblock_can_resize is not ready

Sorry for not replying to you in time

On 2023/6/15 00:07, Mike Rapoport wrote:
> Hi,
>
> On Wed, Jun 14, 2023 at 09:17:46PM +0800, Song Shuai wrote:
>> The memblock APIs are always correct, thus the callers usually don't
>> handle the return code. But the failure caused by unready memblock_can_resize
>> is hard to recognize without the return code. Like this piece of log:
>
> Please make it clear that failure is in memblock_double_array(), e.g.
>

Having numerous memblock reservations at early boot, while
memblock_can_resize is still unset, may exhaust the
INIT_MEMBLOCK_REGIONS-sized memblock.reserved array. memblock then tries
to double the array via memblock_double_array(), which fails and returns
-1 to the caller.
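
The exhaustion described above can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: all `sim_*` names and the capacity of 4 are assumptions for illustration, the real memblock keeps regions sorted and merges neighbours, and the real initial array is statically allocated rather than obtained from `malloc()`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative capacity only; stands in for INIT_MEMBLOCK_REGIONS. */
#define SIM_INIT_REGIONS 4

struct sim_region { unsigned long long base, size; };

struct sim_type {
    const char *name;
    size_t cnt;                 /* regions in use */
    size_t max;                 /* current capacity */
    struct sim_region *regions;
};

/* Plays the role of memblock_can_resize: false during "early boot". */
static bool sim_can_resize;

static int sim_init(struct sim_type *t, const char *name)
{
    t->name = name;
    t->cnt = 0;
    t->max = SIM_INIT_REGIONS;
    t->regions = malloc(t->max * sizeof(*t->regions));
    return t->regions ? 0 : -1;
}

/* Doubles the capacity, but refuses while resizing is not yet allowed. */
static int sim_double_array(struct sim_type *t)
{
    struct sim_region *bigger;

    if (!sim_can_resize)
        return -1;
    bigger = realloc(t->regions, 2 * t->max * sizeof(*bigger));
    if (!bigger)
        return -1;
    t->regions = bigger;
    t->max *= 2;
    return 0;
}

/* Appends a region; fails with -1 when the array is full and cannot grow. */
static int sim_reserve(struct sim_type *t, unsigned long long base,
                       unsigned long long size)
{
    if (t->cnt == t->max && sim_double_array(t) < 0)
        return -1;
    t->regions[t->cnt].base = base;
    t->regions[t->cnt].size = size;
    t->cnt++;
    return 0;
}
```

With `sim_can_resize` left false, the first `SIM_INIT_REGIONS` calls to `sim_reserve()` succeed and the next one returns -1 without saying why, which is the situation the patch wants to make visible.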

You can find such numerous memblock reservations reported by commit
24cc61d8cb5a ("arm64: memblock: don't permit memblock resizing until
linear mapping is up").
A similar test scenario can be simulated with a constructed dtb containing
numerous discrete /memreserve/ or /reserved-memory regions.

> But when memblock_double_array() is called before memblock_can_resize
> is true, it is hard to understand the actual reason for the failure.
>
>>
>> ```
>> [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
>> [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
>> [ 0.000000] Oops - store (or AMO) access fault [#1]
>> ```
>>
>> So add an error message for this kind of failure:
>>
>> ```
>> [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
>> [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
>> [ 0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
>> [ 0.000000] Oops - store (or AMO) access fault [#1]
>> ```
>>
>> Signed-off-by: Song Shuai <[email protected]>
>> ---
>> mm/memblock.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 3feafea06ab2..ab952a164f62 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
>> /* We don't allow resizing until we know about the reserved regions
>> * of memory that aren't suitable for allocation
>> */
>> - if (!memblock_can_resize)
>> + if (!memblock_can_resize) {
>> + pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
>> + type->name, &new_area_start, (unsigned long)new_area_size);
>
> Most of the time memblock uses %llu and cast to u64 to print size, please
> make this consistent.
I will fix it in the next version if the above description is OK with you.
>
>> return -1;
>> + }
>>
>> /* Calculate new doubled size */
>> old_size = type->max * sizeof(struct memblock_region);
>> --
>> 2.20.1
>>
>>
>

--
Thanks
Song Shuai


2023-06-21 16:08:06

by Mike Rapoport

Subject: Re: [PATCH] memblock: Add error message when memblock_can_resize is not ready

On Tue, Jun 20, 2023 at 03:04:55PM +0800, Song Shuai wrote:
> Sorry for not replying to you in time
>
> On 2023/6/15 00:07, Mike Rapoport wrote:
> > Hi,
> >
> > On Wed, Jun 14, 2023 at 09:17:46PM +0800, Song Shuai wrote:
> > > The memblock APIs are always correct, thus the callers usually don't
> > > handle the return code. But the failure caused by unready memblock_can_resize
> > > is hard to recognize without the return code. Like this piece of log:
> >
> > Please make it clear that failure is in memblock_double_array(), e.g.
> >
>
> Having numerous memblock reservations at early boot, while
> memblock_can_resize is still unset, may exhaust the
> INIT_MEMBLOCK_REGIONS-sized memblock.reserved array. memblock then tries
> to double the array via memblock_double_array(), which fails and returns
> -1 to the caller.
>
> You can find such numerous memblock reservations reported by commit
> 24cc61d8cb5a ("arm64: memblock: don't permit memblock resizing until linear
> mapping is up").
> A similar test scenario can be simulated with a constructed dtb containing
> numerous discrete /memreserve/ or /reserved-memory regions.

Ideally, the callers of memblock_reserve() should check the return value
and panic with a meaningful message if it fails. Still, for now something
like this patch is an improvement.
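
As a rough userspace sketch of what such a caller-side check could look like: `stub_reserve()` is a hypothetical stand-in for memblock_reserve() that fails once its slots run out, and `reserve_checked()` is an assumed wrapper name. A kernel caller would call panic() at the point where this sketch only reports to stderr and returns the error.

```c
#include <stdio.h>

/* Hypothetical stand-in for memblock_reserve(): fails when no slots remain. */
static int slots_left = 2;

static int stub_reserve(unsigned long long base, unsigned long long size)
{
    (void)base;
    (void)size;
    if (slots_left == 0)
        return -1;
    slots_left--;
    return 0;
}

/*
 * Caller-side check: report what was being reserved and why it matters.
 * A kernel caller would panic() here instead of returning the error.
 */
static int reserve_checked(const char *what, unsigned long long base,
                           unsigned long long size)
{
    int ret = stub_reserve(base, size);

    if (ret)
        fprintf(stderr, "failed to reserve %s [0x%llx-0x%llx]: %d\n",
                what, base, base + size - 1, ret);
    return ret;
}
```

The benefit over the current behaviour is that the message names the reservation that failed, instead of leaving only a later access fault in the log.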

How about we make the changelog something like:

Subject: memblock: report failures when memblock_can_resize is not set

The callers of memblock_reserve() do not check the return value presuming
that memblock_reserve() always succeeds, but there are cases where it may
fail.

Having numerous memblock reservations at early boot where
memblock_can_resize is unset may exhaust the INIT_MEMBLOCK_REGIONS sized
memblock.reserved regions array and an attempt to double this array via
memblock_double_array() will fail and will return -1 to the caller.

When this happens the system crashes anyway, but it's hard to identify the
reason for the crash.

Add a panic message to memblock_double_array() to aid debugging of the
cases when too many regions are reserved before memblock can resize the
memblock.reserved array.

> > But when memblock_double_array() is called before memblock_can_resize
> > is true, it is hard to understand the actual reason for the failure.
> >
> > >
> > > ```
> > > [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> > > [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> > > [ 0.000000] Oops - store (or AMO) access fault [#1]
> > > ```
> > >
> > > So add an error message for this kind of failure:
> > >
> > > ```
> > > [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> > > [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> > > [ 0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
> > > [ 0.000000] Oops - store (or AMO) access fault [#1]
> > > ```
> > >
> > > Signed-off-by: Song Shuai <[email protected]>
> > > ---
> > > mm/memblock.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > index 3feafea06ab2..ab952a164f62 100644
> > > --- a/mm/memblock.c
> > > +++ b/mm/memblock.c
> > > @@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
> > > /* We don't allow resizing until we know about the reserved regions
> > > * of memory that aren't suitable for allocation
> > > */
> > > - if (!memblock_can_resize)
> > > + if (!memblock_can_resize) {
> > > + pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
> > > + type->name, &new_area_start, (unsigned long)new_area_size);

The system will crash anyway if we get here, so why not use panic()?
Also, dumping new_area_start here does not add any information and is
rather confusing. How about

panic("memblock: cannot resize %s array\n", type->name);

> >
> > Most of the time memblock uses %llu and cast to u64 to print size, please
> > make this consistent.
> I will fix it in the next version if the above description is OK with you.
> >
> > > return -1;
> > > + }
> > > /* Calculate new doubled size */
> > > old_size = type->max * sizeof(struct memblock_region);
>
> --
> Thanks
> Song Shuai
>
>

--
Sincerely yours,
Mike.