2021-06-24 03:26:44

by Lianjie Zhang

Subject: [PATCH] mm: Fix the problem of mips architecture Oops

The cause of the problem is as follows:
1. Running cat /sys/devices/system/memory/memory0/valid_zones calls
   test_pages_in_a_zone().
2. test_pages_in_a_zone() looks up the zone starting from start_pfn = 0.
   On the affected MIPS machine the smallest pfn of the NUMA node is 128,
   and the struct pages for pfn 0~127 are not initialized
   (page->flags is 0xFFFFFFFF).
3. The nid and zonenum obtained from page_zone(pfn_to_page(0)) are
   therefore out of bounds for the arrays indexed in
   &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)].
   Accessing the members of that out-of-bounds zone results in an Oops.
Therefore, reserve the pages between pfn 0 and the node's minimum pfn
to prevent the Oops.

Signed-off-by: zhanglianjie <[email protected]>
---
 arch/mips/kernel/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 23a140327a0b..f1da2b2ba5e9 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -653,6 +653,8 @@ static void __init arch_mem_init(char **cmdline_p)
 	 */
 	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
 
+	memblock_reserve(0, PAGE_SIZE * NODE_DATA(0)->node_start_pfn);
+
 	mips_reserve_vmcore();
 
 	mips_parse_crashkernel();
--
2.20.1




2021-06-25 13:42:50

by Thomas Bogendoerfer

Subject: Re: [PATCH] mm: Fix the problem of mips architecture Oops

On Thu, Jun 24, 2021 at 11:22:12AM +0800, zhanglianjie wrote:
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 23a140327a0b..f1da2b2ba5e9 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -653,6 +653,8 @@ static void __init arch_mem_init(char **cmdline_p)
>  	 */
>  	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
>
> +	memblock_reserve(0, PAGE_SIZE * NODE_DATA(0)->node_start_pfn);
> +

Which platform needs this? This looks like it would be better fixed in
the platform memory registration code.

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2021-06-28 01:14:45

by Lianjie Zhang

Subject: Re: [PATCH] mm: Fix the problem of mips architecture Oops



On 2021-06-25 21:39, Thomas Bogendoerfer wrote:
> Which platform needs this? This looks like it would be better fixed in
> the platform memory registration code.
>
> Thomas.
>

I am seeing the problem on the Loongson platform.

--
Regards,
Zhang Lianjie


2021-06-28 01:20:52

by Jiaxun Yang

Subject: Re: [PATCH] mm: Fix the problem of mips architecture Oops


On 2021/6/28 9:07 AM, zhanglianjie wrote:
>
>
> On 2021-06-25 21:39, Thomas Bogendoerfer wrote:
>>
>> Which platform needs this? This looks like it would be better fixed in
>> the platform memory registration code.
>>
>> Thomas.
>>
>
> I am seeing the problem on the Loongson platform.

I checked a Loongson 3A4000 board (Lemote-A1901) with UEFI firmware,
and there the region is reserved by the firmware.

Hmm, you'd better contact the vendor to get the firmware fixed. If that's
not possible, then work around it in arch/mips/loongson64/numa.c

Thanks.

- Jiaxun


2021-06-28 05:54:55

by Lianjie Zhang

Subject: Re: [PATCH] mm: Fix the problem of mips architecture Oops



On 2021-06-28 09:17, Jiaxun Yang wrote:
> I checked a Loongson 3A4000 board (Lemote-A1901) with UEFI firmware,
> and there the region is reserved by the firmware.
>
> Hmm, you'd better contact the vendor to get the firmware fixed. If that's
> not possible, then work around it in arch/mips/loongson64/numa.c
>
> Thanks.
>
> - Jiaxun

I will try to contact the manufacturer, but they cannot be reached at
the moment. In the meantime I have resubmitted a patch following your
suggestion. Thank you very much for your help.

May I ask how you checked that the region is reserved by the UEFI
firmware?

The machines I tested are as follows:
1. Lemote board
   - hardware information:
     Loongson 3A4000 board LEMOTE-LS3A4000-7A1000-1w-V01-pc
   - page size is 16K
2. THTF board
   - hardware information:
     Loongson 3A4000 board THTF-LS3A4000-7A1000-1W-VB1-ML4A
   - page size is 16K



--
Regards,
Zhang Lianjie