This patch refers to the older thread:
https://marc.info/?t=153658306400001&r=1&w=2
As suggested by Michal Hocko, instead of adjusting memory_hotplug paths,
I have changed memmap_init_zone() to initialize struct pages beyond the
zone end (if zone end is not aligned with the section boundary).
Mikhail Zaslonko (1):
mm, memory_hotplug: Initialize struct pages for the full memory
section
mm/page_alloc.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
--
2.16.4
If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered by
memory_hotplug sysfs handlers:
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
[<00000000008f15c4>] show_valid_zones+0x5c/0x190
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fix the problem by initializing the last memory section of the highest zone
in memmap_init_zone() till the very end, even if it goes beyond the zone
end.
Signed-off-by: Mikhail Zaslonko <[email protected]>
Reviewed-by: Gerald Schaefer <[email protected]>
Cc: <[email protected]>
---
mm/page_alloc.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc407216..41ef5508e5f1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
cond_resched();
}
}
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * If there is no zone spanning the rest of the section
+ * then we should at least initialize those pages. Otherwise we
+ * could blow up on a poisoned page in some paths which depend
+ * on full sections being initialized (e.g. memory hotplug).
+ */
+ if (end_pfn == max_pfn) {
+ while (end_pfn % PAGES_PER_SECTION) {
+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
+ nid);
+ end_pfn++;
+ }
+ }
+#endif
}
#ifdef CONFIG_ZONE_DEVICE
--
2.16.4
On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote:
> If memory end is not aligned with the sparse memory section boundary, the
> mapping of such a section is only partly initialized.
It would be great to mention how you can end up in the situation like
this(a user provided memmap or a strange HW).
> This may lead to
> VM_BUG_ON due to uninitialized struct page access from
> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
> memory_hotplug sysfs handlers:
>
> page:000003d082008000 is uninitialized and poisoned
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> Call Trace:
> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
> [<00000000003e4194>] seq_read+0x204/0x480
> [<00000000003b53ea>] __vfs_read+0x32/0x178
> [<00000000003b55b2>] vfs_read+0x82/0x138
> [<00000000003b5be2>] ksys_read+0x5a/0xb0
> [<0000000000b86ba0>] system_call+0xdc/0x2d8
> Last Breaking-Event-Address:
> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
> Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> Fix the problem by initializing the last memory section of the highest zone
> in memmap_init_zone() till the very end, even if it goes beyond the zone
> end.
Why do we need to restrict this to the highest zone? In other words, why
cannot we do what I was suggesting earlier [1]. What does prevent other
zones to have an incomplete section boundary?
[1] http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Mikhail Zaslonko <[email protected]>
> Reviewed-by: Gerald Schaefer <[email protected]>
> Cc: <[email protected]>
> ---
> mm/page_alloc.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2ec9cc407216..41ef5508e5f1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> cond_resched();
> }
> }
> +#ifdef CONFIG_SPARSEMEM
> + /*
> + * If there is no zone spanning the rest of the section
> + * then we should at least initialize those pages. Otherwise we
> + * could blow up on a poisoned page in some paths which depend
> + * on full sections being initialized (e.g. memory hotplug).
> + */
> + if (end_pfn == max_pfn) {
> + while (end_pfn % PAGES_PER_SECTION) {
> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
> + nid);
> + end_pfn++;
> + }
> + }
> +#endif
> }
>
> #ifdef CONFIG_ZONE_DEVICE
> --
> 2.16.4
--
Michal Hocko
SUSE Labs
On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote:
>If memory end is not aligned with the sparse memory section boundary, the
>mapping of such a section is only partly initialized. This may lead to
>VM_BUG_ON due to uninitialized struct page access from
>is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>memory_hotplug sysfs handlers:
>
> page:000003d082008000 is uninitialized and poisoned
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> Call Trace:
> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
> [<00000000003e4194>] seq_read+0x204/0x480
> [<00000000003b53ea>] __vfs_read+0x32/0x178
> [<00000000003b55b2>] vfs_read+0x82/0x138
> [<00000000003b5be2>] ksys_read+0x5a/0xb0
> [<0000000000b86ba0>] system_call+0xdc/0x2d8
> Last Breaking-Event-Address:
> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
> Kernel panic - not syncing: Fatal exception: panic_on_oops
>
>Fix the problem by initializing the last memory section of the highest zone
>in memmap_init_zone() till the very end, even if it goes beyond the zone
>end.
>
>Signed-off-by: Mikhail Zaslonko <[email protected]>
>Reviewed-by: Gerald Schaefer <[email protected]>
>Cc: <[email protected]>
>---
> mm/page_alloc.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 2ec9cc407216..41ef5508e5f1 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> cond_resched();
> }
> }
>+#ifdef CONFIG_SPARSEMEM
>+ /*
>+ * If there is no zone spanning the rest of the section
>+ * then we should at least initialize those pages. Otherwise we
>+ * could blow up on a poisoned page in some paths which depend
>+ * on full sections being initialized (e.g. memory hotplug).
>+ */
>+ if (end_pfn == max_pfn) {
>+ while (end_pfn % PAGES_PER_SECTION) {
>+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
>+ nid);
>+ end_pfn++;
>+ }
>+ }
>+#endif
If my understanding is correct, end_pfn is not a valid range.
memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I
am afraid this will break the syntax.
And max_pfn is also not a valid one. For example, on x86,
update_end_of_memory_vars() will update max_pfn, which is calculated by:
end_pfn = PFN_UP(start + size);
BTW, as you mentioned this apply to hotplug case. And then why this couldn't
happen during boot up? What differ these two cases?
> }
>
> #ifdef CONFIG_ZONE_DEVICE
>--
>2.16.4
--
Wei Yang
Help you, Help me
On Mon 10-12-18 16:45:37, Zaslonko Mikhail wrote:
> Hello,
>
> On 10.12.2018 14:24, Michal Hocko wrote:
[...]
> > Why do we need to restrict this to the highest zone? In other words, why
> > cannot we do what I was suggesting earlier [1]. What does prevent other
> > zones to have an incomplete section boundary?
>
> Well, as you were also suggesting earlier: 'If we do not have a zone which
> spans the rest of the section'. I'm not sure how else we should verify that.
I am not sure I follow here. Why cannot we simply drop end_pfn check and
keep the rest?
> Moreover, I was able to recreate the problem only with the highest zone
> (memory end is not on the section boundary).
What exactly prevents exactmap memmap to generate these unfinished zones?
> > [1] http://lkml.kernel.org/r/[email protected]
> >
> >> Signed-off-by: Mikhail Zaslonko <[email protected]>
> >> Reviewed-by: Gerald Schaefer <[email protected]>
> >> Cc: <[email protected]>
> >> ---
> >> mm/page_alloc.c | 15 +++++++++++++++
> >> 1 file changed, 15 insertions(+)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 2ec9cc407216..41ef5508e5f1 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >> cond_resched();
> >> }
> >> }
> >> +#ifdef CONFIG_SPARSEMEM
> >> + /*
> >> + * If there is no zone spanning the rest of the section
> >> + * then we should at least initialize those pages. Otherwise we
> >> + * could blow up on a poisoned page in some paths which depend
> >> + * on full sections being initialized (e.g. memory hotplug).
> >> + */
> >> + if (end_pfn == max_pfn) {
> >> + while (end_pfn % PAGES_PER_SECTION) {
> >> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
> >> + nid);
> >> + end_pfn++;
> >> + }
> >> + }
> >> +#endif
> >> }
> >>
> >> #ifdef CONFIG_ZONE_DEVICE
> >> --
> >> 2.16.4
> >
>
> Thanks,
> Mikhail Zaslonko
--
Michal Hocko
SUSE Labs
On Mon, Dec 10, 2018 at 05:14:36PM +0100, Zaslonko Mikhail wrote:
>Hello,
>
>On 10.12.2018 16:10, Wei Yang wrote:
>> On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote:
>>> If memory end is not aligned with the sparse memory section boundary, the
>>> mapping of such a section is only partly initialized. This may lead to
>>> VM_BUG_ON due to uninitialized struct page access from
>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>>> memory_hotplug sysfs handlers:
>>>
>>> page:000003d082008000 is uninitialized and poisoned
>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>> Call Trace:
>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>> [<00000000003e4194>] seq_read+0x204/0x480
>>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>>> [<00000000003b55b2>] vfs_read+0x82/0x138
>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>> Last Breaking-Event-Address:
>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>
>>> Fix the problem by initializing the last memory section of the highest zone
>>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>>> end.
>>>
>>> Signed-off-by: Mikhail Zaslonko <[email protected]>
>>> Reviewed-by: Gerald Schaefer <[email protected]>
>>> Cc: <[email protected]>
>>> ---
>>> mm/page_alloc.c | 15 +++++++++++++++
>>> 1 file changed, 15 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 2ec9cc407216..41ef5508e5f1 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>> cond_resched();
>>> }
>>> }
>>> +#ifdef CONFIG_SPARSEMEM
>>> + /*
>>> + * If there is no zone spanning the rest of the section
>>> + * then we should at least initialize those pages. Otherwise we
>>> + * could blow up on a poisoned page in some paths which depend
>>> + * on full sections being initialized (e.g. memory hotplug).
>>> + */
>>> + if (end_pfn == max_pfn) {
>>> + while (end_pfn % PAGES_PER_SECTION) {
>>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
>>> + nid);
>>> + end_pfn++;
>>> + }
>>> + }
>>> +#endif
>>
>> If my understanding is correct, end_pfn is not a valid range.
>>
>> memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I
>> am afraid this will break the syntax.
>>
>> And max_pfn is also not a valid one. For example, on x86,
>I used pfn_max here to check for the highest zone. What would be a better way?
>
>> update_end_of_memory_vars() will update max_pfn, which is calculated by:
>>
>> end_pfn = PFN_UP(start + size);
>>
>> BTW, as you mentioned this apply to hotplug case. And then why this couldn't
>> happen during boot up? What differ these two cases?
>
>Well, the pages left uninitialized during bootup (initial problem), but the panic itself takes
>place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers).
>You can find more details in the original thread:
>https://marc.info/?t=153658306400001&r=1&w=2
>
Thanks.
I took a look into the original thread and try to reproduce this on x86.
My step is:
1. config page_poisoning
2. use kernel parameter mem=3075M
3. cat the last memory block device sysfs file removable
eg. when mem is 3075, cat memory9/removable
I don't see the Call trace. Do I miss something to reproduce it?
>>
>>> }
>>>
>>> #ifdef CONFIG_ZONE_DEVICE
>>> --
>>> 2.16.4
>>
>
>Thanks,
>Mikhail Zaslonko
--
Wei Yang
Help you, Help me
On Mon, Dec 10, 2018 at 02:24:51PM +0100, Michal Hocko wrote:
>On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote:
>> If memory end is not aligned with the sparse memory section boundary, the
>> mapping of such a section is only partly initialized.
>
>It would be great to mention how you can end up in the situation like
>this(a user provided memmap or a strange HW).
>
>> This may lead to
>> VM_BUG_ON due to uninitialized struct page access from
>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>> memory_hotplug sysfs handlers:
>>
>> page:000003d082008000 is uninitialized and poisoned
>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>> Call Trace:
>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>> [<00000000003e4194>] seq_read+0x204/0x480
>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>> [<00000000003b55b2>] vfs_read+0x82/0x138
>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>> Last Breaking-Event-Address:
>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> Fix the problem by initializing the last memory section of the highest zone
>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>> end.
>
>Why do we need to restrict this to the highest zone? In other words, why
>cannot we do what I was suggesting earlier [1]. What does prevent other
>zones to have an incomplete section boundary?
>
>[1] http://lkml.kernel.org/r/[email protected]
>
I tried to go through the original list and make myself familiar with
the bug.
Confused why initialize the *last* end_pfn could fix this, since
is_mem_section_removable() will iterate on each page of a section. This
means we need to initialize all the pages left in the section.
One way to fix this in my mind is to record the last pfn in mem_section.
This could be done in memory_preset(), since after that we may assume
the section is full. Not sure whether you would like it.
--
Wei Yang
Help you, Help me
On Tue, Dec 11, 2018 at 04:23:05PM +0100, Zaslonko Mikhail wrote:
>Hello,
>
>On 11.12.2018 02:50, Wei Yang wrote:
>> On Mon, Dec 10, 2018 at 05:14:36PM +0100, Zaslonko Mikhail wrote:
>>> Hello,
>>>
>>> On 10.12.2018 16:10, Wei Yang wrote:
>>>> On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote:
>>>>> If memory end is not aligned with the sparse memory section boundary, the
>>>>> mapping of such a section is only partly initialized. This may lead to
>>>>> VM_BUG_ON due to uninitialized struct page access from
>>>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>>>>> memory_hotplug sysfs handlers:
>>>>>
>>>>> page:000003d082008000 is uninitialized and poisoned
>>>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>>> Call Trace:
>>>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>>> [<00000000003e4194>] seq_read+0x204/0x480
>>>>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>>> [<00000000003b55b2>] vfs_read+0x82/0x138
>>>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>>> Last Breaking-Event-Address:
>>>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>>
>>>>> Fix the problem by initializing the last memory section of the highest zone
>>>>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>>>>> end.
>>>>>
>>>>> Signed-off-by: Mikhail Zaslonko <[email protected]>
>>>>> Reviewed-by: Gerald Schaefer <[email protected]>
>>>>> Cc: <[email protected]>
>>>>> ---
>>>>> mm/page_alloc.c | 15 +++++++++++++++
>>>>> 1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 2ec9cc407216..41ef5508e5f1 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>>> cond_resched();
>>>>> }
>>>>> }
>>>>> +#ifdef CONFIG_SPARSEMEM
>>>>> + /*
>>>>> + * If there is no zone spanning the rest of the section
>>>>> + * then we should at least initialize those pages. Otherwise we
>>>>> + * could blow up on a poisoned page in some paths which depend
>>>>> + * on full sections being initialized (e.g. memory hotplug).
>>>>> + */
>>>>> + if (end_pfn == max_pfn) {
>>>>> + while (end_pfn % PAGES_PER_SECTION) {
>>>>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
>>>>> + nid);
>>>>> + end_pfn++;
>>>>> + }
>>>>> + }
>>>>> +#endif
>>>>
>>>> If my understanding is correct, end_pfn is not a valid range.
>>>>
>>>> memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I
>>>> am afraid this will break the syntax.
>>>>
>>>> And max_pfn is also not a valid one. For example, on x86,
>>> I used pfn_max here to check for the highest zone. What would be a better way?
>>>
>>>> update_end_of_memory_vars() will update max_pfn, which is calculated by:
>>>>
>>>> end_pfn = PFN_UP(start + size);
>>>>
>>>> BTW, as you mentioned this apply to hotplug case. And then why this couldn't
>>>> happen during boot up? What differ these two cases?
>>>
>>> Well, the pages left uninitialized during bootup (initial problem), but the panic itself takes
>>> place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers).
>>> You can find more details in the original thread:
>>> https://marc.info/?t=153658306400001&r=1&w=2
>>>
>>
>> Thanks.
>>
>> I took a look into the original thread and try to reproduce this on x86.
>>
>> My step is:
>>
>> 1. config page_poisoning
>> 2. use kernel parameter mem=3075M
>> 3. cat the last memory block device sysfs file removable
>> eg. when mem is 3075, cat memory9/removable
>>
>> I don't see the Call trace. Do I miss something to reproduce it?
>>
>
>No you don't. I guess there might be deviations depending on the architecture (I am on s390).
>As I understand, memory block size is 384 Mb on your system and memory9 is the last block on the list?
Sorry, my calculation is not correct. The last memory_block is 23
instead of 9.
>BTW, do you have CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_PGFLAGS on?
>
Yes, I have set it:
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
And the kernel cmdline is:
BOOT_IMAGE=/vmlinuz-4.20.0-rc5+ root=UUID=98aa84d6-9ba6-4033-ab91-9ca6fe3dd74f ro \
resume=UUID=b7c21053-d9c1-4e58-8488-7d385f8ee107 console=ttyS0 \
LANG=en_US.UTF-8 mem=3075M
>
>>>>
>>>>> }
>>>>>
>>>>> #ifdef CONFIG_ZONE_DEVICE
>>>>> --
>>>>> 2.16.4
>>>>
>>>
>>> Thanks,
>>> Mikhail Zaslonko
>>
--
Wei Yang
Help you, Help me
On Tue, Dec 11, 2018 at 04:17:34PM +0100, Zaslonko Mikhail wrote:
>
>
>On 11.12.2018 10:49, Wei Yang wrote:
>> On Mon, Dec 10, 2018 at 02:24:51PM +0100, Michal Hocko wrote:
>>> On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote:
>>>> If memory end is not aligned with the sparse memory section boundary, the
>>>> mapping of such a section is only partly initialized.
>>>
>>> It would be great to mention how you can end up in the situation like
>>> this(a user provided memmap or a strange HW).
>>>
>>>> This may lead to
>>>> VM_BUG_ON due to uninitialized struct page access from
>>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>>>> memory_hotplug sysfs handlers:
>>>>
>>>> page:000003d082008000 is uninitialized and poisoned
>>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>> Call Trace:
>>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>> [<00000000003e4194>] seq_read+0x204/0x480
>>>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>> [<00000000003b55b2>] vfs_read+0x82/0x138
>>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>> Last Breaking-Event-Address:
>>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>
>>>> Fix the problem by initializing the last memory section of the highest zone
>>>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>>>> end.
>>>
>>> Why do we need to restrict this to the highest zone? In other words, why
>>> cannot we do what I was suggesting earlier [1]. What does prevent other
>>> zones to have an incomplete section boundary?
>>>
>>> [1] http://lkml.kernel.org/r/[email protected]
>>>
>>
>> I tried to go through the original list and make myself familiar with
>> the bug.
>>
>> Confused why initialize the *last* end_pfn could fix this, since
>> is_mem_section_removable() will iterate on each page of a section. This
>> means we need to initialize all the pages left in the section.
>That's exactly what the fix does. We initialize all the pages left in
>the section.
>
You are right, I misunderstand your code.
>>
>> One way to fix this in my mind is to record the last pfn in mem_section.
>Do you mean last initialized pfn? I guess we have agreed upon that the
>entire section should be initialized.
Well, that would be great.
>
>> This could be done in memory_preset(), since after that we may assume
>> the section is full. Not sure whether you would like it.
>>
--
Wei Yang
Help you, Help me