2020-10-09 13:55:42

by Rahul Gopakumar

[permalink] [raw]
Subject: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

As part of VMware's performance regression testing for Linux Kernel
upstream releases,?we identified boot time increase when comparing
Linux 5.8 kernel against Linux 5.7 kernel.?Increase in boot time is
noticeable on VM with a **large amount of memory**.
?
In our test cases, it's noticeable with memory 1TB and more, whereas
there was no major?difference noticed in testcases with <1TB.
?
On bisecting between 5.7 and 5.8, we found the following commit from?
?Baoquan He? to be?the cause of boot time increase in big VM test cases.
?
-------------------------------------
?
commit 73a6e474cb376921a311786652782155eac2fdf0
Author: Baoquan He <[email protected]>
Date: Wed Jun 3 15:57:55 2020 -0700
?
mm: memmap_init: iterate over memblock regions rather that check each PFN
?
When called during boot the memmap_init_zone() function checks if each PFN
is valid and actually belongs to the node being initialized using
early_pfn_valid() and early_pfn_in_nid().
?
Each such check may cost up to O(log(n)) where n is the number of memory
banks, so for large amount of memory overall time spent in early_pfn*()
becomes substantial.
?
-------------------------------------
?
For boot time test, we used RHEL 8.1 as the guest OS.
VM config is 84 vcpu and 1TB vRAM.
?
Here are the actual performance numbers.
?
5.7 GA - 18.17 secs
Baoquan's commit - 21.6 secs (-16% increase in time)
?
From dmesg logs, we can see significant time delay around memmap.
?
Refer below logs.
?
Good commit
?
[0.033176] Normal zone: 1445888 pages used for memmap
[0.033176] Normal zone: 89391104 pages, LIFO batch:63
[0.035851] ACPI: PM-Timer IO Port: 0x448
?
Problem commit
?
[0.026874] Normal zone: 1445888 pages used for memmap
[0.026875] Normal zone: 89391104 pages, LIFO batch:63
[2.028450] ACPI: PM-Timer IO Port: 0x448
?
We did some analysis, and it looks like with the problem commit it's
not deferring the memory initialization to a later stage and it's
initializing the huge chunk of memory in serial - during the boot-up
time. ?Whereas with the good commit, it was able to defer the
initialization of the memory when it could be done in parallel.


Rahul Gopakumar
Performance?Engineering
VMware, Inc.


2020-10-10 08:45:53

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 10/09/20 at 01:15pm, Rahul Gopakumar wrote:
> As part of VMware's performance regression testing for Linux Kernel
> upstream releases, we identified boot time increase when comparing
> Linux 5.8 kernel against Linux 5.7 kernel. Increase in boot time is
> noticeable on VM with a **large amount of memory**.
>  
> In our test cases, it's noticeable with memory 1TB and more, whereas
> there was no major difference noticed in testcases with <1TB.
>  
> On bisecting between 5.7 and 5.8, we found the following commit from 
> “Baoquan He” to be the cause of boot time increase in big VM test cases.
>  
> -------------------------------------
>  
> commit 73a6e474cb376921a311786652782155eac2fdf0
> Author: Baoquan He <[email protected]>
> Date: Wed Jun 3 15:57:55 2020 -0700
>  
> mm: memmap_init: iterate over memblock regions rather that check each PFN
>  
> When called during boot the memmap_init_zone() function checks if each PFN
> is valid and actually belongs to the node being initialized using
> early_pfn_valid() and early_pfn_in_nid().
>  
> Each such check may cost up to O(log(n)) where n is the number of memory
> banks, so for large amount of memory overall time spent in early_pfn*()
> becomes substantial.
>  
> -------------------------------------
>  
> For boot time test, we used RHEL 8.1 as the guest OS.
> VM config is 84 vcpu and 1TB vRAM.
>  
> Here are the actual performance numbers.
>  
> 5.7 GA - 18.17 secs
> Baoquan's commit - 21.6 secs (-16% increase in time)
>  
> From dmesg logs, we can see significant time delay around memmap.
>  
> Refer below logs.
>  
> Good commit
>  
> [0.033176] Normal zone: 1445888 pages used for memmap
> [0.033176] Normal zone: 89391104 pages, LIFO batch:63
> [0.035851] ACPI: PM-Timer IO Port: 0x448
>  
> Problem commit
>  
> [0.026874] Normal zone: 1445888 pages used for memmap
> [0.026875] Normal zone: 89391104 pages, LIFO batch:63
> [2.028450] ACPI: PM-Timer IO Port: 0x448

Could you add memblock=debug to kernel cmdline and paste the boot logs of
system w and w/o the commit?

>  
> We did some analysis, and it looks like with the problem commit it's
> not deferring the memory initialization to a later stage and it's
> initializing the huge chunk of memory in serial - during the boot-up
> time.  Whereas with the good commit, it was able to defer the
> initialization of the memory when it could be done in parallel.
>
>
> Rahul Gopakumar
> Performance Engineering
> VMware, Inc.
>

2020-10-13 08:14:10

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

Attached collected dmesg logs for with and without
commit after adding memblock=debug to kernel cmdline.

________________________________________
From: [email protected] <[email protected]>
Sent: 10 October 2020 11:41 AM
To: Rahul Gopakumar
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 10/09/20 at 01:15pm, Rahul Gopakumar wrote:
> As part of VMware's performance regression testing for Linux Kernel
> upstream releases, we identified boot time increase when comparing
> Linux 5.8 kernel against Linux 5.7 kernel. Increase in boot time is
> noticeable on VM with a **large amount of memory**.
>
> In our test cases, it's noticeable with memory 1TB and more, whereas
> there was no major difference noticed in testcases with <1TB.
>
> On bisecting between 5.7 and 5.8, we found the following commit from
> ?Baoquan He? to be the cause of boot time increase in big VM test cases.
>
> -------------------------------------
>
> commit 73a6e474cb376921a311786652782155eac2fdf0
> Author: Baoquan He <[email protected]>
> Date: Wed Jun 3 15:57:55 2020 -0700
>
> mm: memmap_init: iterate over memblock regions rather that check each PFN
>
> When called during boot the memmap_init_zone() function checks if each PFN
> is valid and actually belongs to the node being initialized using
> early_pfn_valid() and early_pfn_in_nid().
>
> Each such check may cost up to O(log(n)) where n is the number of memory
> banks, so for large amount of memory overall time spent in early_pfn*()
> becomes substantial.
>
> -------------------------------------
>
> For boot time test, we used RHEL 8.1 as the guest OS.
> VM config is 84 vcpu and 1TB vRAM.
>
> Here are the actual performance numbers.
>
> 5.7 GA - 18.17 secs
> Baoquan's commit - 21.6 secs (-16% increase in time)
>
> From dmesg logs, we can see significant time delay around memmap.
>
> Refer below logs.
>
> Good commit
>
> [0.033176] Normal zone: 1445888 pages used for memmap
> [0.033176] Normal zone: 89391104 pages, LIFO batch:63
> [0.035851] ACPI: PM-Timer IO Port: 0x448
>
> Problem commit
>
> [0.026874] Normal zone: 1445888 pages used for memmap
> [0.026875] Normal zone: 89391104 pages, LIFO batch:63
> [2.028450] ACPI: PM-Timer IO Port: 0x448

Could you add memblock=debug to kernel cmdline and paste the boot logs of
system w and w/o the commit?

>
> We did some analysis, and it looks like with the problem commit it's
> not deferring the memory initialization to a later stage and it's
> initializing the huge chunk of memory in serial - during the boot-up
> time. Whereas with the good commit, it was able to defer the
> initialization of the memory when it could be done in parallel.
>
>
> Rahul Gopakumar
> Performance Engineering
> VMware, Inc.
>


Attachments:
with_commit_dmesg.log (189.92 kB)
with_commit_dmesg.log
without_commit_dmesg.log (189.91 kB)
without_commit_dmesg.log
Download all attachments

2020-10-13 11:30:12

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 10/12/20 at 05:21pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> Attached collected dmesg logs for with and without
> commit after adding memblock=debug to kernel cmdline.

Thanks, I have got the root cause, will make a patch for your testing
soon.

2020-10-13 16:33:51

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Rahul,

On 10/12/20 at 05:21pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> Attached collected dmesg logs for with and without
> commit after adding memblock=debug to kernel cmdline.

Can you test below draft patch and see if it works for you?

From a2ea6caef3c73ad9efb2dd2b48039065fe430bb2 Mon Sep 17 00:00:00 2001
From: Baoquan He <[email protected]>
Date: Tue, 13 Oct 2020 20:05:30 +0800
Subject: [PATCH] mm: make memmap defer init only take effect per zone

Deferred struct page init is designed to work per zone. However since
commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN"), the handling is mistakenly done in all memory
ranges inside one zone. Especially in those unmovable zones of multiple nodes,
memblock reservation split them into many memory ranges. This makes
initialized struct page more than expected in early stage, then increases
much boot time.

Let's fix it to make the memmap defer init handled in zone wide, but not in
memory range of one zone.

Signed-off-by: Baoquan He <[email protected]>
---
arch/ia64/mm/init.c | 4 ++--
include/linux/mm.h | 5 +++--
mm/memory_hotplug.c | 2 +-
mm/page_alloc.c | 6 +++---
4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index ef12e097f318..27ca549ff47e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)

if (map_start < map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
- args->nid, args->zone, page_to_pfn(map_start),
+ args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
return 0;
}
@@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
unsigned long start_pfn)
{
if (!vmem_map) {
- memmap_init_zone(size, nid, zone, start_pfn,
+ memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
} else {
struct page *start;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe70aaf..5f9fc61d5be2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2439,8 +2439,9 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
#endif

extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
- enum meminit_context, struct vmem_altmap *, int migratetype);
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+ unsigned long, unsigned long, enum meminit_context,
+ struct vmem_altmap *, int migratetype);
extern void setup_per_zone_wmarks(void);
extern int __meminit init_per_zone_wmark_min(void);
extern void mem_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b44d4c7ba73b..f9a37e6abc1c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -732,7 +732,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
* expects the zone spans the pfn range. All the pages in the range
* are reserved so nobody should be touching them so we should be safe
*/
- memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
+ memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype);

set_zone_contiguous(zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ebf9ddafa3a..e8b19fdd18ec 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6044,7 +6044,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
* zone stats (e.g., nr_isolate_pageblock) are touched.
*/
void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
- unsigned long start_pfn,
+ unsigned long start_pfn, unsigned long zone_end_pfn,
enum meminit_context context,
struct vmem_altmap *altmap, int migratetype)
{
@@ -6080,7 +6080,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (context == MEMINIT_EARLY) {
if (overlap_memmap_init(zone, &pfn))
continue;
- if (defer_init(nid, pfn, end_pfn))
+ if (defer_init(nid, pfn, zone_end_pfn))
break;
}

@@ -6194,7 +6194,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,

if (end_pfn > start_pfn) {
size = end_pfn - start_pfn;
- memmap_init_zone(size, nid, zone, start_pfn,
+ memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
}
}
--
2.17.2

2020-10-21 06:14:24

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.

Thanks for your feedback. From the code, I am sure what the problem is,
but I didn't test it on system with huge memory. Forget mentioning my
draft patch is based on akpm/master branch since it's a mm issue, it
might be a little different with linus's mainline kernel, sorry for the
inconvenience.

I will test and debug this on a server with 4T memory in our lab, and
update if any progress.

>
> Patch on latest commit - 20.161 secs
> Vanilla latest commit - 19.50 secs

Here, do you mean it even cost more time with the patch applied?

>
> Here is the draft patch we tried
>
> ------------------------
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 8e7b8c6c576e..ff5fa4c3889e 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -537,7 +537,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
> ?
> ? ? ? ? ?if (map_start < map_end)
> ? ? ? ? ? ? ? ? ?memmap_init_zone((unsigned long)(map_end - map_start),
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?args->nid, args->zone, page_to_pfn(map_start),
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
> ? ? ? ? ?return 0;
> ?}
> @@ -547,7 +547,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
> ? ? ? ? ? ? ? unsigned long start_pfn)
> ?{
> ? ? ? ? ?if (!vmem_map) {
> - ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn,
> + ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
> ? ? ? ? ?} else {
> ? ? ? ? ? ? ? ? ?struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..65e34b370e33 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2416,7 +2416,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
> ?
> ?extern void set_dma_reserve(unsigned long new_dma_reserve);
> ?extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> - ? ? ? ? ? ? ? enum meminit_context, struct vmem_altmap *);
> + ? ? ? ? ? ? ? unsigned long, enum meminit_context, struct vmem_altmap *);
> ?extern void setup_per_zone_wmarks(void);
> ?extern int __meminit init_per_zone_wmark_min(void);
> ?extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ce3e73e3a5c1..03fddd8f4b11 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -728,7 +728,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> ? ? ? ? ? * expects the zone spans the pfn range. All the pages in the range
> ? ? ? ? ? * are reserved so nobody should be touching them so we should be safe
> ? ? ? ? ? */
> - ? ? ? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> + ? ? ? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
> ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_HOTPLUG, altmap);
> ?
> ? ? ? ? ?set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 780c8f023b28..fe80055ea59c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5989,8 +5989,8 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> ? * done. Non-atomic initialization, single-pass.
> ? */
> ?void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> - ? ? ? ? ? ? ? unsigned long start_pfn, enum meminit_context context,
> - ? ? ? ? ? ? ? struct vmem_altmap *altmap)
> + ? ? ? ? ? ? ? unsigned long start_pfn, unsigned long zone_end_pfn,
> + ? ? ? ? ? ? ? enum meminit_context context, struct vmem_altmap *altmap)
> ?{
> ? ? ? ? ?unsigned long pfn, end_pfn = start_pfn + size;
> ? ? ? ? ?struct page *page;
> @@ -6024,7 +6024,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> ? ? ? ? ? ? ? ? ?if (context == MEMINIT_EARLY) {
> ? ? ? ? ? ? ? ? ? ? ? ? ?if (overlap_memmap_init(zone, &pfn))
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?continue;
> - ? ? ? ? ? ? ? ? ? ? ? if (defer_init(nid, pfn, end_pfn))
> + ? ? ? ? ? ? ? ? ? ? ? if (defer_init(nid, pfn, zone_end_pfn))
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?break;
> ? ? ? ? ? ? ? ? ?}
> ?
> @@ -6150,7 +6150,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
> ?
> ? ? ? ? ? ? ? ? ?if (end_pfn > start_pfn) {
> ? ? ? ? ? ? ? ? ? ? ? ? ?size = end_pfn - start_pfn;
> - ? ? ? ? ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn,
> + ? ? ? ? ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
> ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ?}
>
>
> ------------------------
>
> We have attached default dmesg logs and also dmesg logs collected with memblock=debug kernel cmdline for both vanilla and patched kernels. Let me know if you need more info.
>
>
>
> From: [email protected] <[email protected]>
> Sent: 13 October 2020 6:47 PM
> To: Rahul Gopakumar <[email protected]>
> Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
> ?
> Hi Rahul,
>
> On 10/12/20 at 05:21pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > Attached collected dmesg logs for with and without
> > commit after adding memblock=debug to kernel cmdline.
>
> Can you test below draft patch and see if it works for you?
>
> From a2ea6caef3c73ad9efb2dd2b48039065fe430bb2 Mon Sep 17 00:00:00 2001
> From: Baoquan He <[email protected]>
> Date: Tue, 13 Oct 2020 20:05:30 +0800
> Subject: [PATCH] mm: make memmap defer init only take effect per zone
>
> Deferred struct page init is designed to work per zone. However since
> commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions
> rather that check each PFN"), the handling is mistakenly done in all memory
> ranges inside one zone. Especially in those unmovable zones of multiple nodes,
> memblock reservation split them into many memory ranges. This makes
> initialized struct page more than expected in early stage, then increases
> much boot time.
>
> Let's fix it to make the memmap defer init handled in zone wide, but not in
> memory range of one zone.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> ?arch/ia64/mm/init.c | 4 ++--
> ?include/linux/mm.h? | 5 +++--
> ?mm/memory_hotplug.c | 2 +-
> ?mm/page_alloc.c???? | 6 +++---
> ?4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index ef12e097f318..27ca549ff47e 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
> ?
> ???????? if (map_start < map_end)
> ???????????????? memmap_init_zone((unsigned long)(map_end - map_start),
> -??????????????????????????????? args->nid, args->zone, page_to_pfn(map_start),
> +??????????????????????????????? args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
> ????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> ???????? return 0;
> ?}
> @@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
> ????????????? unsigned long start_pfn)
> ?{
> ???????? if (!vmem_map) {
> -?????????????? memmap_init_zone(size, nid, zone, start_pfn,
> +?????????????? memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
> ????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> ???????? } else {
> ???????????????? struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360fe70aaf..5f9fc61d5be2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2439,8 +2439,9 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
> ?#endif
> ?
> ?extern void set_dma_reserve(unsigned long new_dma_reserve);
> -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> -?????????????? enum meminit_context, struct vmem_altmap *, int migratetype);
> +extern void memmap_init_zone(unsigned long, int, unsigned long,
> +?????????????? unsigned long, unsigned long, enum meminit_context,
> +?????????????? struct vmem_altmap *, int migratetype);
> ?extern void setup_per_zone_wmarks(void);
> ?extern int __meminit init_per_zone_wmark_min(void);
> ?extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b44d4c7ba73b..f9a37e6abc1c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -732,7 +732,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> ????????? * expects the zone spans the pfn range. All the pages in the range
> ????????? * are reserved so nobody should be touching them so we should be safe
> ????????? */
> -?????? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> +?????? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
> ????????????????????????? MEMINIT_HOTPLUG, altmap, migratetype);
> ?
> ???????? set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2ebf9ddafa3a..e8b19fdd18ec 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6044,7 +6044,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> ? * zone stats (e.g., nr_isolate_pageblock) are touched.
> ? */
> ?void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> -?????????????? unsigned long start_pfn,
> +?????????????? unsigned long start_pfn, unsigned long zone_end_pfn,
> ???????????????? enum meminit_context context,
> ???????????????? struct vmem_altmap *altmap, int migratetype)
> ?{
> @@ -6080,7 +6080,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> ???????????????? if (context == MEMINIT_EARLY) {
> ???????????????????????? if (overlap_memmap_init(zone, &pfn))
> ???????????????????????????????? continue;
> -?????????????????????? if (defer_init(nid, pfn, end_pfn))
> +?????????????????????? if (defer_init(nid, pfn, zone_end_pfn))
> ???????????????????????????????? break;
> ???????????????? }
> ?
> @@ -6194,7 +6194,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
> ?
> ???????????????? if (end_pfn > start_pfn) {
> ???????????????????????? size = end_pfn - start_pfn;
> -?????????????????????? memmap_init_zone(size, nid, zone, start_pfn,
> +?????????????????????? memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
> ????????????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> ???????????????? }
> ???????? }
> --
> 2.17.2





2020-10-21 06:24:00

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

>> Here, do you mean it even cost more time with the patch applied?

Yes, we ran it multiple times and it looks like there is a
very minor increase with the patch.


From: [email protected] <[email protected]>
Sent: 20 October 2020 8:48 PM
To: Rahul Gopakumar <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
?
On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.

Thanks for your feedback. From the code, I am sure what the problem is,
but I didn't test it on system with huge memory. Forget mentioning my
draft patch is based on akpm/master branch since it's a mm issue, it
might be a little different with linus's mainline kernel, sorry for the
inconvenience.

I will test and debug this on a server with 4T memory in our lab, and
update if any progress.

>
> Patch on latest commit - 20.161 secs
> Vanilla latest commit - 19.50 secs

Here, do you mean it even cost more time with the patch applied?

>
> Here is the draft patch we tried
>
> ------------------------
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 8e7b8c6c576e..ff5fa4c3889e 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -537,7 +537,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
> ?
> ? ? ? ? ?if (map_start < map_end)
> ? ? ? ? ? ? ? ? ?memmap_init_zone((unsigned long)(map_end - map_start),
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?args->nid, args->zone, page_to_pfn(map_start),
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
> ? ? ? ? ?return 0;
> ?}
> @@ -547,7 +547,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
> ? ? ? ? ? ? ? unsigned long start_pfn)
> ?{
> ? ? ? ? ?if (!vmem_map) {
> - ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn,
> + ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
> ? ? ? ? ?} else {
> ? ? ? ? ? ? ? ? ?struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..65e34b370e33 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2416,7 +2416,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
> ?
> ?extern void set_dma_reserve(unsigned long new_dma_reserve);
> ?extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> - ? ? ? ? ? ? ? enum meminit_context, struct vmem_altmap *);
> + ? ? ? ? ? ? ? unsigned long, enum meminit_context, struct vmem_altmap *);
> ?extern void setup_per_zone_wmarks(void);
> ?extern int __meminit init_per_zone_wmark_min(void);
> ?extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ce3e73e3a5c1..03fddd8f4b11 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -728,7 +728,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> ? ? ? ? ? * expects the zone spans the pfn range. All the pages in the range
> ? ? ? ? ? * are reserved so nobody should be touching them so we should be safe
> ? ? ? ? ? */
> - ? ? ? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> + ? ? ? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
> ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_HOTPLUG, altmap);
> ?
> ? ? ? ? ?set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 780c8f023b28..fe80055ea59c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5989,8 +5989,8 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> ? * done. Non-atomic initialization, single-pass.
> ? */
> ?void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> - ? ? ? ? ? ? ? unsigned long start_pfn, enum meminit_context context,
> - ? ? ? ? ? ? ? struct vmem_altmap *altmap)
> + ? ? ? ? ? ? ? unsigned long start_pfn, unsigned long zone_end_pfn,
> + ? ? ? ? ? ? ? enum meminit_context context, struct vmem_altmap *altmap)
> ?{
> ? ? ? ? ?unsigned long pfn, end_pfn = start_pfn + size;
> ? ? ? ? ?struct page *page;
> @@ -6024,7 +6024,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> ? ? ? ? ? ? ? ? ?if (context == MEMINIT_EARLY) {
> ? ? ? ? ? ? ? ? ? ? ? ? ?if (overlap_memmap_init(zone, &pfn))
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?continue;
> - ? ? ? ? ? ? ? ? ? ? ? if (defer_init(nid, pfn, end_pfn))
> + ? ? ? ? ? ? ? ? ? ? ? if (defer_init(nid, pfn, zone_end_pfn))
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?break;
> ? ? ? ? ? ? ? ? ?}
> ?
> @@ -6150,7 +6150,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
> ?
> ? ? ? ? ? ? ? ? ?if (end_pfn > start_pfn) {
> ? ? ? ? ? ? ? ? ? ? ? ? ?size = end_pfn - start_pfn;
> - ? ? ? ? ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn,
> + ? ? ? ? ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
> ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ?}
>
>
> ------------------------
>
> We have attached default dmesg logs and also dmesg logs collected with memblock=debug kernel cmdline for both vanilla and patched kernels. Let me know if you need more info.
>
>
>
> From: [email protected] <[email protected]>
> Sent: 13 October 2020 6:47 PM
> To: Rahul Gopakumar <[email protected]>
> Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
> ?
> Hi Rahul,
>
> On 10/12/20 at 05:21pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > Attached collected dmesg logs for with and without
> > commit after adding memblock=debug to kernel cmdline.
>
> Can you test below draft patch and see if it works for you?
>
> From a2ea6caef3c73ad9efb2dd2b48039065fe430bb2 Mon Sep 17 00:00:00 2001
> From: Baoquan He <[email protected]>
> Date: Tue, 13 Oct 2020 20:05:30 +0800
> Subject: [PATCH] mm: make memmap defer init only take effect per zone
>
> Deferred struct page init is designed to work per zone. However since
> commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions
> rather that check each PFN"), the handling is mistakenly done in all memory
> ranges inside one zone. Especially in those unmovable zones of multiple nodes,
> memblock reservation split them into many memory ranges. This makes
> initialized struct page more than expected in early stage, then increases
> much boot time.
>
> Let's fix it to make the memmap defer init handled in zone wide, but not in
> memory range of one zone.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> ?arch/ia64/mm/init.c | 4 ++--
> ?include/linux/mm.h? | 5 +++--
> ?mm/memory_hotplug.c | 2 +-
> ?mm/page_alloc.c???? | 6 +++---
> ?4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index ef12e097f318..27ca549ff47e 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
> ?
> ???????? if (map_start < map_end)
> ???????????????? memmap_init_zone((unsigned long)(map_end - map_start),
> -??????????????????????????????? args->nid, args->zone, page_to_pfn(map_start),
> +??????????????????????????????? args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
> ????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> ???????? return 0;
> ?}
> @@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
> ????????????? unsigned long start_pfn)
> ?{
> ???????? if (!vmem_map) {
> -?????????????? memmap_init_zone(size, nid, zone, start_pfn,
> +?????????????? memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
> ????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> ???????? } else {
> ???????????????? struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360fe70aaf..5f9fc61d5be2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2439,8 +2439,9 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
> ?#endif
> ?
> ?extern void set_dma_reserve(unsigned long new_dma_reserve);
> -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> -?????????????? enum meminit_context, struct vmem_altmap *, int migratetype);
> +extern void memmap_init_zone(unsigned long, int, unsigned long,
> +?????????????? unsigned long, unsigned long, enum meminit_context,
> +?????????????? struct vmem_altmap *, int migratetype);
> ?extern void setup_per_zone_wmarks(void);
> ?extern int __meminit init_per_zone_wmark_min(void);
> ?extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b44d4c7ba73b..f9a37e6abc1c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -732,7 +732,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> ????????? * expects the zone spans the pfn range. All the pages in the range
> ????????? * are reserved so nobody should be touching them so we should be safe
> ????????? */
> -?????? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> +?????? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
> ????????????????????????? MEMINIT_HOTPLUG, altmap, migratetype);
> ?
> ???????? set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2ebf9ddafa3a..e8b19fdd18ec 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6044,7 +6044,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> ? * zone stats (e.g., nr_isolate_pageblock) are touched.
> ? */
> ?void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> -?????????????? unsigned long start_pfn,
> +?????????????? unsigned long start_pfn, unsigned long zone_end_pfn,
> ???????????????? enum meminit_context context,
> ???????????????? struct vmem_altmap *altmap, int migratetype)
> ?{
> @@ -6080,7 +6080,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> ???????????????? if (context == MEMINIT_EARLY) {
> ???????????????????????? if (overlap_memmap_init(zone, &pfn))
> ???????????????????????????????? continue;
> -?????????????????????? if (defer_init(nid, pfn, end_pfn))
> +?????????????????????? if (defer_init(nid, pfn, zone_end_pfn))
> ???????????????????????????????? break;
> ???????????????? }
> ?
> @@ -6194,7 +6194,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
> ?
> ???????????????? if (end_pfn > start_pfn) {
> ???????????????????????? size = end_pfn - start_pfn;
> -?????????????????????? memmap_init_zone(size, nid, zone, start_pfn,
> +?????????????????????? memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
> ????????????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> ???????????????? }
> ???????? }
> --
> 2.17.2




2020-10-21 06:28:46

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.

Patch on latest commit - 20.161 secs
Vanilla latest commit - 19.50 secs

Here is the draft patch we tried

------------------------

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 8e7b8c6c576e..ff5fa4c3889e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -537,7 +537,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
?
? ? ? ? ?if (map_start < map_end)
? ? ? ? ? ? ? ? ?memmap_init_zone((unsigned long)(map_end - map_start),
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?args->nid, args->zone, page_to_pfn(map_start),
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
? ? ? ? ?return 0;
?}
@@ -547,7 +547,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
? ? ? ? ? ? ? unsigned long start_pfn)
?{
? ? ? ? ?if (!vmem_map) {
- ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn,
+ ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
? ? ? ? ?} else {
? ? ? ? ? ? ? ? ?struct page *start;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..65e34b370e33 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2416,7 +2416,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
?
?extern void set_dma_reserve(unsigned long new_dma_reserve);
?extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
- ? ? ? ? ? ? ? enum meminit_context, struct vmem_altmap *);
+ ? ? ? ? ? ? ? unsigned long, enum meminit_context, struct vmem_altmap *);
?extern void setup_per_zone_wmarks(void);
?extern int __meminit init_per_zone_wmark_min(void);
?extern void mem_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ce3e73e3a5c1..03fddd8f4b11 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -728,7 +728,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
? ? ? ? ? * expects the zone spans the pfn range. All the pages in the range
? ? ? ? ? * are reserved so nobody should be touching them so we should be safe
? ? ? ? ? */
- ? ? ? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
+ ? ? ? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_HOTPLUG, altmap);
?
? ? ? ? ?set_zone_contiguous(zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 780c8f023b28..fe80055ea59c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5989,8 +5989,8 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
? * done. Non-atomic initialization, single-pass.
? */
?void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
- ? ? ? ? ? ? ? unsigned long start_pfn, enum meminit_context context,
- ? ? ? ? ? ? ? struct vmem_altmap *altmap)
+ ? ? ? ? ? ? ? unsigned long start_pfn, unsigned long zone_end_pfn,
+ ? ? ? ? ? ? ? enum meminit_context context, struct vmem_altmap *altmap)
?{
? ? ? ? ?unsigned long pfn, end_pfn = start_pfn + size;
? ? ? ? ?struct page *page;
@@ -6024,7 +6024,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
? ? ? ? ? ? ? ? ?if (context == MEMINIT_EARLY) {
? ? ? ? ? ? ? ? ? ? ? ? ?if (overlap_memmap_init(zone, &pfn))
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?continue;
- ? ? ? ? ? ? ? ? ? ? ? if (defer_init(nid, pfn, end_pfn))
+ ? ? ? ? ? ? ? ? ? ? ? if (defer_init(nid, pfn, zone_end_pfn))
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?break;
? ? ? ? ? ? ? ? ?}
?
@@ -6150,7 +6150,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
?
? ? ? ? ? ? ? ? ?if (end_pfn > start_pfn) {
? ? ? ? ? ? ? ? ? ? ? ? ?size = end_pfn - start_pfn;
- ? ? ? ? ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn,
+ ? ? ? ? ? ? ? ? ? ? ? memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MEMINIT_EARLY, NULL);
? ? ? ? ? ? ? ? ?}
? ? ? ? ?}


------------------------

We have attached default dmesg logs and also dmesg logs collected with memblock=debug kernel cmdline for both vanilla and patched kernels. Let me know if you need more info.



From: [email protected] <[email protected]>
Sent: 13 October 2020 6:47 PM
To: Rahul Gopakumar <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
?
Hi Rahul,

On 10/12/20 at 05:21pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> Attached collected dmesg logs for with and without
> commit after adding memblock=debug to kernel cmdline.

Can you test below draft patch and see if it works for you?

From a2ea6caef3c73ad9efb2dd2b48039065fe430bb2 Mon Sep 17 00:00:00 2001
From: Baoquan He <[email protected]>
Date: Tue, 13 Oct 2020 20:05:30 +0800
Subject: [PATCH] mm: make memmap defer init only take effect per zone

Deferred struct page init is designed to work per zone. However since
commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN"), the handling is mistakenly done in all memory
ranges inside one zone. Especially in those unmovable zones of multiple nodes,
memblock reservation split them into many memory ranges. This makes
initialized struct page more than expected in early stage, then increases
much boot time.

Let's fix it to make the memmap defer init handled in zone wide, but not in
memory range of one zone.

Signed-off-by: Baoquan He <[email protected]>
---
?arch/ia64/mm/init.c | 4 ++--
?include/linux/mm.h? | 5 +++--
?mm/memory_hotplug.c | 2 +-
?mm/page_alloc.c???? | 6 +++---
?4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index ef12e097f318..27ca549ff47e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
?
???????? if (map_start < map_end)
???????????????? memmap_init_zone((unsigned long)(map_end - map_start),
-??????????????????????????????? args->nid, args->zone, page_to_pfn(map_start),
+??????????????????????????????? args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
???????? return 0;
?}
@@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
????????????? unsigned long start_pfn)
?{
???????? if (!vmem_map) {
-?????????????? memmap_init_zone(size, nid, zone, start_pfn,
+?????????????? memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
???????? } else {
???????????????? struct page *start;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe70aaf..5f9fc61d5be2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2439,8 +2439,9 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
?#endif
?
?extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
-?????????????? enum meminit_context, struct vmem_altmap *, int migratetype);
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+?????????????? unsigned long, unsigned long, enum meminit_context,
+?????????????? struct vmem_altmap *, int migratetype);
?extern void setup_per_zone_wmarks(void);
?extern int __meminit init_per_zone_wmark_min(void);
?extern void mem_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b44d4c7ba73b..f9a37e6abc1c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -732,7 +732,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
????????? * expects the zone spans the pfn range. All the pages in the range
????????? * are reserved so nobody should be touching them so we should be safe
????????? */
-?????? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
+?????? memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
????????????????????????? MEMINIT_HOTPLUG, altmap, migratetype);
?
???????? set_zone_contiguous(zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ebf9ddafa3a..e8b19fdd18ec 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6044,7 +6044,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
? * zone stats (e.g., nr_isolate_pageblock) are touched.
? */
?void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-?????????????? unsigned long start_pfn,
+?????????????? unsigned long start_pfn, unsigned long zone_end_pfn,
???????????????? enum meminit_context context,
???????????????? struct vmem_altmap *altmap, int migratetype)
?{
@@ -6080,7 +6080,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
???????????????? if (context == MEMINIT_EARLY) {
???????????????????????? if (overlap_memmap_init(zone, &pfn))
???????????????????????????????? continue;
-?????????????????????? if (defer_init(nid, pfn, end_pfn))
+?????????????????????? if (defer_init(nid, pfn, zone_end_pfn))
???????????????????????????????? break;
???????????????? }
?
@@ -6194,7 +6194,7 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
?
???????????????? if (end_pfn > start_pfn) {
???????????????????????? size = end_pfn - start_pfn;
-?????????????????????? memmap_init_zone(size, nid, zone, start_pfn,
+?????????????????????? memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
????????????????????????????????????????? MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
???????????????? }
???????? }
--
2.17.2


Attachments:
patch_dmesg_memblock_debug.log (250.19 kB)
patch_dmesg_memblock_debug.log
patch_dmesg.log (195.57 kB)
patch_dmesg.log
vanilla_dmesg_memblock_debug.log (190.59 kB)
vanilla_dmesg_memblock_debug.log
vanilla_dmesg.log (136.06 kB)
vanilla_dmesg.log
Download all attachments

2020-10-22 08:23:36

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Rahul,

On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> >> Here, do you mean it even cost more time with the patch applied?
>
> Yes, we ran it multiple times and it looks like there is a
> very minor increase with the patch.
>
......?
> On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.
>
> Thanks for your feedback. From the code, I am sure what the problem is,
> but I didn't test it on system with huge memory. Forget mentioning my
> draft patch is based on akpm/master branch since it's a mm issue, it
> might be a little different with linus's mainline kernel, sorry for the
> inconvenience.
>
> I will test and debug this on a server with 4T memory in our lab, and
> update if any progress.
>
> >
> > Patch on latest commit - 20.161 secs
> > Vanilla latest commit - 19.50 secs
>

Can you tell how you measure the boot time? I checked the boot logs you
attached, E.g in below two logs, I saw patch_dmesg.log even has less
time during memmap init. Now I have got a machine with 1T memory for
testing, but didn't see obvious time cost increase. At above, you said
"Patch on latest commit - 20.161 secs", could you tell where this 20.161
secs comes from, so that I can investigate and reproduce on my system?

patch_dmesg.log:
[ 0.023126] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[ 0.023128] On node 1 totalpages: 89128960
[ 0.023129] Normal zone: 1392640 pages used for memmap
[ 0.023130] Normal zone: 89128960 pages, LIFO batch:63
[ 0.023893] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[ 0.023895] On node 2 totalpages: 89391104
[ 0.023896] Normal zone: 1445888 pages used for memmap
[ 0.023897] Normal zone: 89391104 pages, LIFO batch:63
[ 0.026744] ACPI: PM-Timer IO Port: 0x448
[ 0.026747] ACPI: Local APIC address 0xfee00000

vanilla_dmesg.log:
[ 0.024295] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[ 0.024298] On node 1 totalpages: 89128960
[ 0.024299] Normal zone: 1392640 pages used for memmap
[ 0.024299] Normal zone: 89128960 pages, LIFO batch:63
[ 0.025289] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[ 0.025291] On node 2 totalpages: 89391104
[ 0.025292] Normal zone: 1445888 pages used for memmap
[ 0.025293] Normal zone: 89391104 pages, LIFO batch:63
[ 2.096982] ACPI: PM-Timer IO Port: 0x448
[ 2.096987] ACPI: Local APIC address 0xfee00000

2020-10-22 18:20:51

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi?Baoquan,

>>?Can you tell how you measure the boot time??

Our test is actually boothalt, time reported by this test
includes both boot-up and shutdown time.

>> At above, you said "Patch on latest commit - 20.161 secs",
>> could you tell where this 20.161 secs comes from,

So this time is boot-up time + shutdown time.

From the dmesg.log it looks like during the memmap_init
it's taking less time in the patch. Let me take a closer look to
confirm this and also to find where the 1-sec delay in the patch
run is coming from.


From: [email protected] <[email protected]>
Sent: 22 October 2020 9:34 AM
To: Rahul Gopakumar <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
?
Hi Rahul,

On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> >> Here, do you mean it even cost more time with the patch applied?
>
> Yes, we ran it multiple times and it looks like there is a
> very minor increase with the patch.
>
......?
> On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.
>
> Thanks for your feedback. From the code, I am sure what the problem is,
> but I didn't test it on system with huge memory. Forget mentioning my
> draft patch is based on akpm/master branch since it's a mm issue, it
> might be a little different with linus's mainline kernel, sorry for the
> inconvenience.
>
> I will test and debug this on a server with 4T memory in our lab, and
> update if any progress.
>
> >
> > Patch on latest commit - 20.161 secs
> > Vanilla latest commit - 19.50 secs
>

Can you tell how you measure the boot time? I checked the boot logs you
attached, E.g in below two logs, I saw patch_dmesg.log even has less
time during memmap init. Now I have got a machine with 1T memory for
testing, but didn't see obvious time cost increase. At above, you said
"Patch on latest commit - 20.161 secs", could you tell where this 20.161
secs comes from, so that I can investigate and reproduce on my system?

patch_dmesg.log:
[??? 0.023126] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[??? 0.023128] On node 1 totalpages: 89128960
[??? 0.023129]?? Normal zone: 1392640 pages used for memmap
[??? 0.023130]?? Normal zone: 89128960 pages, LIFO batch:63
[??? 0.023893] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[??? 0.023895] On node 2 totalpages: 89391104
[??? 0.023896]?? Normal zone: 1445888 pages used for memmap
[??? 0.023897]?? Normal zone: 89391104 pages, LIFO batch:63
[??? 0.026744] ACPI: PM-Timer IO Port: 0x448
[??? 0.026747] ACPI: Local APIC address 0xfee00000

vanilla_dmesg.log:
[??? 0.024295] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[??? 0.024298] On node 1 totalpages: 89128960
[??? 0.024299]?? Normal zone: 1392640 pages used for memmap
[??? 0.024299]?? Normal zone: 89128960 pages, LIFO batch:63
[??? 0.025289] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[??? 0.025291] On node 2 totalpages: 89391104
[??? 0.025292]?? Normal zone: 1445888 pages used for memmap
[??? 0.025293]?? Normal zone: 89391104 pages, LIFO batch:63
[??? 2.096982] ACPI: PM-Timer IO Port: 0x448
[??? 2.096987] ACPI: Local APIC address 0xfee00000

2020-11-02 14:17:31

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

There could still be some memory initialization problem with
the draft patch. I see a lot of page corruption errors.

BUG: Bad page state in process swapper pfn:ab0803c

Here is the call trace

[ 0.262826] dump_stack+0x57/0x6a
[ 0.262827] bad_page.cold.119+0x63/0x93
[ 0.262828] __free_pages_ok+0x31f/0x330
[ 0.262829] memblock_free_all+0x153/0x1bf
[ 0.262830] mem_init+0x23/0x1f2
[ 0.262831] start_kernel+0x299/0x57a
[ 0.262832] secondary_startup_64_no_verify+0xb8/0xbb

I don't see this in dmesg log with vanilla kernel.

It looks like the overhead due to this initialization problem
is around 3 secs.

[ 0.262831] start_kernel+0x299/0x57a
[ 0.262832] secondary_startup_64_no_verify+0xb8/0xbb
[ 3.758185] Memory: 3374072K/1073740756K available (12297K kernel code, 5778Krwdata, 4376K rodata, 2352K init, 6480K bss, 16999716K reserved, 0K cma-reserved)

But the draft patch is fixing the initial problem
reported around 2 secs (log snippet below) hence the total
delay of 1 sec.

[ 0.024752] Normal zone: 1445888 pages used for memmap
[ 0.024753] Normal zone: 89391104 pages, LIFO batch:63
[ 0.027379] ACPI: PM-Timer IO Port: 0x448


________________________________________
From: Rahul Gopakumar <[email protected]>
Sent: 22 October 2020 10:51 PM
To: [email protected]
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

>>?Can you tell how you measure the boot time?

Our test is actually boothalt, time reported by this test
includes both boot-up and shutdown time.

>> At above, you said "Patch on latest commit - 20.161 secs",
>> could you tell where this 20.161 secs comes from,

So this time is boot-up time + shutdown time.

From the dmesg.log it looks like during the memmap_init
it's taking less time in the patch. Let me take a closer look to
confirm this and also to find where the 1-sec delay in the patch
run is coming from.


From: [email protected] <[email protected]>
Sent: 22 October 2020 9:34 AM
To: Rahul Gopakumar <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Rahul,

On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> >> Here, do you mean it even cost more time with the patch applied?
>
> Yes, we ran it multiple times and it looks like there is a
> very minor increase with the patch.
>
......
> On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.
>
> Thanks for your feedback. From the code, I am sure what the problem is,
> but I didn't test it on system with huge memory. Forget mentioning my
> draft patch is based on akpm/master branch since it's a mm issue, it
> might be a little different with linus's mainline kernel, sorry for the
> inconvenience.
>
> I will test and debug this on a server with 4T memory in our lab, and
> update if any progress.
>
> >
> > Patch on latest commit - 20.161 secs
> > Vanilla latest commit - 19.50 secs
>

Can you tell how you measure the boot time? I checked the boot logs you
attached, E.g in below two logs, I saw patch_dmesg.log even has less
time during memmap init. Now I have got a machine with 1T memory for
testing, but didn't see obvious time cost increase. At above, you said
"Patch on latest commit - 20.161 secs", could you tell where this 20.161
secs comes from, so that I can investigate and reproduce on my system?

patch_dmesg.log:
[??? 0.023126] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[??? 0.023128] On node 1 totalpages: 89128960
[??? 0.023129]?? Normal zone: 1392640 pages used for memmap
[??? 0.023130]?? Normal zone: 89128960 pages, LIFO batch:63
[??? 0.023893] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[??? 0.023895] On node 2 totalpages: 89391104
[??? 0.023896]?? Normal zone: 1445888 pages used for memmap
[??? 0.023897]?? Normal zone: 89391104 pages, LIFO batch:63
[??? 0.026744] ACPI: PM-Timer IO Port: 0x448
[??? 0.026747] ACPI: Local APIC address 0xfee00000

vanilla_dmesg.log:
[??? 0.024295] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[??? 0.024298] On node 1 totalpages: 89128960
[??? 0.024299]?? Normal zone: 1392640 pages used for memmap
[??? 0.024299]?? Normal zone: 89128960 pages, LIFO batch:63
[??? 0.025289] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[??? 0.025291] On node 2 totalpages: 89391104
[??? 0.025292]?? Normal zone: 1445888 pages used for memmap
[??? 0.025293]?? Normal zone: 89391104 pages, LIFO batch:63
[??? 2.096982] ACPI: PM-Timer IO Port: 0x448
[??? 2.096987] ACPI: Local APIC address 0xfee00000

2020-11-02 14:32:47

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/02/20 at 02:15pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> There could still be some memory initialization problem with
> the draft patch. I see a lot of page corruption errors.
>
> BUG: Bad page state in process swapper pfn:ab0803c
>
> Here is the call trace
>
> [ 0.262826] dump_stack+0x57/0x6a
> [ 0.262827] bad_page.cold.119+0x63/0x93
> [ 0.262828] __free_pages_ok+0x31f/0x330
> [ 0.262829] memblock_free_all+0x153/0x1bf
> [ 0.262830] mem_init+0x23/0x1f2
> [ 0.262831] start_kernel+0x299/0x57a
> [ 0.262832] secondary_startup_64_no_verify+0xb8/0xbb
>
> I don't see this in dmesg log with vanilla kernel.
>
> It looks like the overhead due to this initialization problem
> is around 3 secs.
>
> [ 0.262831] start_kernel+0x299/0x57a
> [ 0.262832] secondary_startup_64_no_verify+0xb8/0xbb
> [ 3.758185] Memory: 3374072K/1073740756K available (12297K kernel code, 5778Krwdata, 4376K rodata, 2352K init, 6480K bss, 16999716K reserved, 0K cma-reserved)
>
> But the draft patch is fixing the initial problem
> reported around 2 secs (log snippet below) hence the total
> delay of 1 sec.
>
> [ 0.024752] Normal zone: 1445888 pages used for memmap
> [ 0.024753] Normal zone: 89391104 pages, LIFO batch:63
> [ 0.027379] ACPI: PM-Timer IO Port: 0x448

So, you mean with the draft patch applied, the initial performance
regression goes away, just many page corruption errors with call trace
are seen, right? And the performance regression is about 2sec delay in
your system?

Could you tell how you setup vmware VM so that I can ask our QA for
help to create a vmware VM for me to test? I tested the draft patch on
bare metal system with more than 1T memory, didn't see the page
corruption call trace, need reproduce and have a look.

>
>
> ________________________________________
> From: Rahul Gopakumar <[email protected]>
> Sent: 22 October 2020 10:51 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> Hi Baoquan,
>
> >>?Can you tell how you measure the boot time?
>
> Our test is actually boothalt, time reported by this test
> includes both boot-up and shutdown time.
>
> >> At above, you said "Patch on latest commit - 20.161 secs",
> >> could you tell where this 20.161 secs comes from,
>
> So this time is boot-up time + shutdown time.
>
> From the dmesg.log it looks like during the memmap_init
> it's taking less time in the patch. Let me take a closer look to
> confirm this and also to find where the 1-sec delay in the patch
> run is coming from.
>
>
> From: [email protected] <[email protected]>
> Sent: 22 October 2020 9:34 AM
> To: Rahul Gopakumar <[email protected]>
> Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> Hi Rahul,
>
> On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> > >> Here, do you mean it even cost more time with the patch applied?
> >
> > Yes, we ran it multiple times and it looks like there is a
> > very minor increase with the patch.
> >
> ......
> > On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > > Hi Baoquan,
> > >
> > > We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.
> >
> > Thanks for your feedback. From the code, I am sure what the problem is,
> > but I didn't test it on system with huge memory. Forget mentioning my
> > draft patch is based on akpm/master branch since it's a mm issue, it
> > might be a little different with linus's mainline kernel, sorry for the
> > inconvenience.
> >
> > I will test and debug this on a server with 4T memory in our lab, and
> > update if any progress.
> >
> > >
> > > Patch on latest commit - 20.161 secs
> > > Vanilla latest commit - 19.50 secs
> >
>
> Can you tell how you measure the boot time? I checked the boot logs you
> attached, E.g in below two logs, I saw patch_dmesg.log even has less
> time during memmap init. Now I have got a machine with 1T memory for
> testing, but didn't see obvious time cost increase. At above, you said
> "Patch on latest commit - 20.161 secs", could you tell where this 20.161
> secs comes from, so that I can investigate and reproduce on my system?
>
> patch_dmesg.log:
> [??? 0.023126] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
> [??? 0.023128] On node 1 totalpages: 89128960
> [??? 0.023129]?? Normal zone: 1392640 pages used for memmap
> [??? 0.023130]?? Normal zone: 89128960 pages, LIFO batch:63
> [??? 0.023893] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
> [??? 0.023895] On node 2 totalpages: 89391104
> [??? 0.023896]?? Normal zone: 1445888 pages used for memmap
> [??? 0.023897]?? Normal zone: 89391104 pages, LIFO batch:63
> [??? 0.026744] ACPI: PM-Timer IO Port: 0x448
> [??? 0.026747] ACPI: Local APIC address 0xfee00000
>
> vanilla_dmesg.log:
> [??? 0.024295] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
> [??? 0.024298] On node 1 totalpages: 89128960
> [??? 0.024299]?? Normal zone: 1392640 pages used for memmap
> [??? 0.024299]?? Normal zone: 89128960 pages, LIFO batch:63
> [??? 0.025289] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
> [??? 0.025291] On node 2 totalpages: 89391104
> [??? 0.025292]?? Normal zone: 1445888 pages used for memmap
> [??? 0.025293]?? Normal zone: 89391104 pages, LIFO batch:63
> [??? 2.096982] ACPI: PM-Timer IO Port: 0x448
> [??? 2.096987] ACPI: Local APIC address 0xfee00000
>

2020-11-03 12:36:20

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

>> So, you mean with the draft patch applied, the initial performance
regression goes away, just many page corruption errors with call trace
are seen, right?

Yes, that's right.

>> And the performance regression is about 2sec delay in
your system?

The delay due to this new page corruption issue is about
3 secs.

Here is the summary

* Initial problem - 2 secs
* Draft patch - Fixes initial problem (recovers 2 secs) but
brings in new page corruption issue (3 secs)

>> Could you tell how you setup vmware VM so that I can ask our QA for
help to create a vmware VM for me to test?

* Use vSphere ESXi 6.7 or 7.0 GA.
* Create VM using vSphere Web Client and specify 1TB VM Memory.
* Install RHEL 8.1, that's the guest used in this test.

With draft patch, you should be able to reproduce the issue.
Let me know if you need more details.

________________________________________
From: [email protected] <[email protected]>
Sent: 02 November 2020 8:00 PM
To: Rahul Gopakumar
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/02/20 at 02:15pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> There could still be some memory initialization problem with
> the draft patch. I see a lot of page corruption errors.
>
> BUG: Bad page state in process swapper pfn:ab0803c
>
> Here is the call trace
>
> [ 0.262826] dump_stack+0x57/0x6a
> [ 0.262827] bad_page.cold.119+0x63/0x93
> [ 0.262828] __free_pages_ok+0x31f/0x330
> [ 0.262829] memblock_free_all+0x153/0x1bf
> [ 0.262830] mem_init+0x23/0x1f2
> [ 0.262831] start_kernel+0x299/0x57a
> [ 0.262832] secondary_startup_64_no_verify+0xb8/0xbb
>
> I don't see this in dmesg log with vanilla kernel.
>
> It looks like the overhead due to this initialization problem
> is around 3 secs.
>
> [ 0.262831] start_kernel+0x299/0x57a
> [ 0.262832] secondary_startup_64_no_verify+0xb8/0xbb
> [ 3.758185] Memory: 3374072K/1073740756K available (12297K kernel code, 5778Krwdata, 4376K rodata, 2352K init, 6480K bss, 16999716K reserved, 0K cma-reserved)
>
> But the draft patch is fixing the initial problem
> reported around 2 secs (log snippet below) hence the total
> delay of 1 sec.
>
> [ 0.024752] Normal zone: 1445888 pages used for memmap
> [ 0.024753] Normal zone: 89391104 pages, LIFO batch:63
> [ 0.027379] ACPI: PM-Timer IO Port: 0x448

So, you mean with the draft patch applied, the initial performance
regression goes away, just many page corruption errors with call trace
are seen, right? And the performance regression is about 2sec delay in
your system?

Could you tell how you setup vmware VM so that I can ask our QA for
help to create a vmware VM for me to test? I tested the draft patch on
bare metal system with more than 1T memory, didn't see the page
corruption call trace, need reproduce and have a look.

>
>
> ________________________________________
> From: Rahul Gopakumar <[email protected]>
> Sent: 22 October 2020 10:51 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> Hi Baoquan,
>
> >>?Can you tell how you measure the boot time?
>
> Our test is actually boothalt, time reported by this test
> includes both boot-up and shutdown time.
>
> >> At above, you said "Patch on latest commit - 20.161 secs",
> >> could you tell where this 20.161 secs comes from,
>
> So this time is boot-up time + shutdown time.
>
> From the dmesg.log it looks like during the memmap_init
> it's taking less time in the patch. Let me take a closer look to
> confirm this and also to find where the 1-sec delay in the patch
> run is coming from.
>
>
> From: [email protected] <[email protected]>
> Sent: 22 October 2020 9:34 AM
> To: Rahul Gopakumar <[email protected]>
> Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> Hi Rahul,
>
> On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> > >> Here, do you mean it even cost more time with the patch applied?
> >
> > Yes, we ran it multiple times and it looks like there is a
> > very minor increase with the patch.
> >
> ......
> > On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > > Hi Baoquan,
> > >
> > > We had some trouble applying the patch to problem commit and the latest upstream commit. Steven (CC'ed) helped us by providing the updated draft patch. We applied it on the latest commit (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like improving the performance numbers.
> >
> > Thanks for your feedback. From the code, I am sure what the problem is,
> > but I didn't test it on system with huge memory. Forget mentioning my
> > draft patch is based on akpm/master branch since it's a mm issue, it
> > might be a little different with linus's mainline kernel, sorry for the
> > inconvenience.
> >
> > I will test and debug this on a server with 4T memory in our lab, and
> > update if any progress.
> >
> > >
> > > Patch on latest commit - 20.161 secs
> > > Vanilla latest commit - 19.50 secs
> >
>
> Can you tell how you measure the boot time? I checked the boot logs you
> attached, E.g in below two logs, I saw patch_dmesg.log even has less
> time during memmap init. Now I have got a machine with 1T memory for
> testing, but didn't see obvious time cost increase. At above, you said
> "Patch on latest commit - 20.161 secs", could you tell where this 20.161
> secs comes from, so that I can investigate and reproduce on my system?
>
> patch_dmesg.log:
> [??? 0.023126] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
> [??? 0.023128] On node 1 totalpages: 89128960
> [??? 0.023129]?? Normal zone: 1392640 pages used for memmap
> [??? 0.023130]?? Normal zone: 89128960 pages, LIFO batch:63
> [??? 0.023893] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
> [??? 0.023895] On node 2 totalpages: 89391104
> [??? 0.023896]?? Normal zone: 1445888 pages used for memmap
> [??? 0.023897]?? Normal zone: 89391104 pages, LIFO batch:63
> [??? 0.026744] ACPI: PM-Timer IO Port: 0x448
> [??? 0.026747] ACPI: Local APIC address 0xfee00000
>
> vanilla_dmesg.log:
> [??? 0.024295] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
> [??? 0.024298] On node 1 totalpages: 89128960
> [??? 0.024299]?? Normal zone: 1392640 pages used for memmap
> [??? 0.024299]?? Normal zone: 89128960 pages, LIFO batch:63
> [??? 0.025289] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
> [??? 0.025291] On node 2 totalpages: 89391104
> [??? 0.025292]?? Normal zone: 1445888 pages used for memmap
> [??? 0.025293]?? Normal zone: 89391104 pages, LIFO batch:63
> [??? 2.096982] ACPI: PM-Timer IO Port: 0x448
> [??? 2.096987] ACPI: Local APIC address 0xfee00000
>

2020-11-03 14:06:39

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/03/20 at 12:34pm, Rahul Gopakumar wrote:
> >> So, you mean with the draft patch applied, the initial performance
> regression goes away, just many page corruption errors with call trace
> are seen, right?
>
> Yes, that's right.
>
> >> And the performance regression is about 2sec delay in
> your system?
>
> The delay due to this new page corruption issue is about
> 3 secs.
>
> Here is the summary
>
> * Initial problem - 2 secs
> * Draft patch - Fixes initial problem (recovers 2 secs) but
> brings in new page corruption issue (3 secs)
>
> >> Could you tell how you setup vmware VM so that I can ask our QA for
> help to create a vmware VM for me to test?
>
> * Use vSphere ESXi 6.7 or 7.0 GA.
> * Create VM using vSphere Web Client and specify 1TB VM Memory.
> * Install RHEL 8.1, that's the guest used in this test.

OK, I see. The draft patch fix the original issue, seems some boundary
of memory region is not handled correctly. Thanks for confirmation.

The memory layout is important in this case. Not sure if making a VM gesut
as you suggested can also create a system with below memory layout.

[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]


>
> With draft patch, you should be able to reproduce the issue.
> Let me know if you need more details.

2020-11-12 14:53:52

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/03/20 at 12:34pm, Rahul Gopakumar wrote:
> >> So, you mean with the draft patch applied, the initial performance
> regression goes away, just many page corruption errors with call trace
> are seen, right?
>
> Yes, that's right.
>
> >> And the performance regression is about 2sec delay in
> your system?
>
> The delay due to this new page corruption issue is about
> 3 secs.
>
> Here is the summary
>
> * Initial problem - 2 secs
> * Draft patch - Fixes initial problem (recovers 2 secs) but
> brings in new page corruption issue (3 secs)
>
> >> Could you tell how you setup vmware VM so that I can ask our QA for
> help to create a vmware VM for me to test?
>
> * Use vSphere ESXi 6.7 or 7.0 GA.
> * Create VM using vSphere Web Client and specify 1TB VM Memory.
> * Install RHEL 8.1, that's the guest used in this test.

Can you try the attached draft patch?

>
> With draft patch, you should be able to reproduce the issue.
> Let me know if you need more details.


Attachments:
(No filename) (0.99 kB)
0001-mm-make-memmap-defer-init-only-take-effect-per-zone.patch (4.66 kB)
Download all attachments

2020-11-20 03:14:26

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

To which commit should we apply the draft patch. We tried applying
the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
(the one we used for applying the previous patch) but it fails.


From: [email protected] <[email protected]>
Sent: 12 November 2020 8:21 PM
To: Rahul Gopakumar <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
?
On 11/03/20 at 12:34pm, Rahul Gopakumar wrote:
> >> So, you mean with the draft patch applied, the initial performance
> regression goes away, just many page corruption errors with call trace
> are seen, right?
>
> Yes, that's right.
>
> >> And the performance regression is about 2sec delay in
> your system?
>
> The delay due to this new page corruption issue is about
> 3 secs.
>
> Here is the summary
>
> * Initial problem - 2 secs
> * Draft patch - Fixes initial problem (recovers 2 secs) but
> brings in new page corruption issue (3 secs)
>
> >> Could you tell how you setup vmware VM so that I can ask our QA for
> help to create a vmware VM for me to test?
>
> * Use vSphere ESXi 6.7 or 7.0 GA.
> * Create VM using vSphere Web Client and specify 1TB VM Memory.
> * Install RHEL 8.1, that's the guest used in this test.

Can you try the attached draft patch?

>
> With draft patch, you should be able to reproduce the issue.
> Let me know if you need more details.

2020-11-22 01:13:42

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> To which commit should we apply the draft patch. We tried applying
> the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> (the one we used for applying the previous patch) but it fails.

I tested on 5.10-rc3+. You can append below change to the old patch in
your testing kernel.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa6076e1a840..5e5b74e88d69 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
return false;

+ if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
+ return true;
/*
* We start only with one section of pages, more pages are added as
* needed until the rest of deferred pages are initialized.

2020-11-24 15:13:30

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

We applied the new patch to 5.10 rc3 and tested it. We are still
observing the same page corruption issue which we saw with the
old patch. This is causing 3 secs delay in boot time.

Attached dmesg log from the new patch and also from vanilla
5.10 rc3 kernel.

There are multiple lines like below in the dmesg log of the
new patch.

"BUG: Bad page state in process swapper pfn:ab08001"

________________________________________
From: [email protected] <[email protected]>
Sent: 22 November 2020 6:38 AM
To: Rahul Gopakumar
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> To which commit should we apply the draft patch. We tried applying
> the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> (the one we used for applying the previous patch) but it fails.

I tested on 5.10-rc3+. You can append below change to the old patch in
your testing kernel.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa6076e1a840..5e5b74e88d69 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
return false;

+ if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
+ return true;
/*
* We start only with one section of pages, more pages are added as
* needed until the rest of deferred pages are initialized.


Attachments:
new-patch-dmesg.log (196.66 kB)
new-patch-dmesg.log
vanilla-dmesg.log (137.40 kB)
vanilla-dmesg.log
Download all attachments

2020-11-30 17:00:40

by Mike Rapoport

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Rahul,

On Tue, Nov 24, 2020 at 03:03:40PM +0000, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> We applied the new patch to 5.10 rc3 and tested it. We are still
> observing the same page corruption issue which we saw with the
> old patch. This is causing 3 secs delay in boot time.
>
> Attached dmesg log from the new patch and also from vanilla
> 5.10 rc3 kernel.
>
> There are multiple lines like below in the dmesg log of the
> new patch.
>
> "BUG: Bad page state in process swapper pfn:ab08001"

Can you please run your test with the below patch and send output of

dmesg | grep defer


diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eaa227a479e4..ce7ec660c777 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -455,6 +455,7 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
nr_initialised++;
if ((nr_initialised > PAGES_PER_SECTION) &&
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
+ pr_info("=> %s: nid: %d pfn: %lx\n", __func__, nid, pfn);
NODE_DATA(nid)->first_deferred_pfn = pfn;
return true;
}

> ________________________________________
> From: [email protected] <[email protected]>
> Sent: 22 November 2020 6:38 AM
> To: Rahul Gopakumar
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > To which commit should we apply the draft patch. We tried applying
> > the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> > (the one we used for applying the previous patch) but it fails.
>
> I tested on 5.10-rc3+. You can append below change to the old patch in
> your testing kernel.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fa6076e1a840..5e5b74e88d69 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
> return false;
>
> + if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
> + return true;
> /*
> * We start only with one section of pages, more pages are added as
> * needed until the rest of deferred pages are initialized.
>




--
Sincerely yours,
Mike.

2020-12-12 19:35:42

by Rahul Gopakumar

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

We re-evaluated your last patch and it seems to be fixing the
initial performance bug reported. During our previous testing,
we did not apply the patch rightly hence it was reporting
some issues.

Here is the dmesg log confirming no delay in the draft patch.

Vanilla (5.10 rc3)
------------------

[ 0.024011] On node 2 totalpages: 89391104
[ 0.024012] Normal zone: 1445888 pages used for memmap
[ 0.024012] Normal zone: 89391104 pages, LIFO batch:63
[ 2.054646] ACPI: PM-Timer IO Port: 0x448 --------------> 2 secs delay

Patch
------

[ 0.024166] On node 2 totalpages: 89391104
[ 0.024167] Normal zone: 1445888 pages used for memmap
[ 0.024167] Normal zone: 89391104 pages, LIFO batch:63
[ 0.026694] ACPI: PM-Timer IO Port: 0x448 --------------> No delay

Attached dmesg logs. Let me know if anything is needed from our end.



From: Rahul Gopakumar <[email protected]>
Sent: 24 November 2020 8:33 PM
To: [email protected] <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
?
Hi Baoquan,

We applied the new patch to 5.10 rc3 and tested it. We are still
observing the same page corruption issue which we saw with the
old patch. This is causing 3 secs delay in boot time.

Attached dmesg log from the new patch and also from vanilla
5.10 rc3 kernel.

There are multiple lines like below in the dmesg log of the
new patch.

"BUG: Bad page state in process swapper? pfn:ab08001"

________________________________________
From: [email protected] <[email protected]>
Sent: 22 November 2020 6:38 AM
To: Rahul Gopakumar
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> To which commit should we apply the draft patch. We tried applying
> the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> (the one we used for applying the previous patch) but it fails.

I tested on 5.10-rc3+. You can append below change to the old patch in
your testing kernel.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa6076e1a840..5e5b74e88d69 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
??????? if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
??????????????? return false;

+?????? if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
+?????????????? return true;
??????? /*
???????? * We start only with one section of pages, more pages are added as
???????? * needed until the rest of deferred pages are initialized.


Attachments:
patch-dmesg.log (136.83 kB)
patch-dmesg.log
vanilla-dmesg.log (136.84 kB)
vanilla-dmesg.log
Download all attachments

2020-12-13 18:52:36

by Baoquan He

[permalink] [raw]
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 12/11/20 at 04:16pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> We re-evaluated your last patch and it seems to be fixing the
> initial performance bug reported. During our previous testing,
> we did not apply the patch rightly hence it was reporting
> some issues.
>
> Here is the dmesg log confirming no delay in the draft patch.
>
> Vanilla (5.10 rc3)
> ------------------
>
> [ 0.024011] On node 2 totalpages: 89391104
> [ 0.024012] Normal zone: 1445888 pages used for memmap
> [ 0.024012] Normal zone: 89391104 pages, LIFO batch:63
> [ 2.054646] ACPI: PM-Timer IO Port: 0x448 --------------> 2 secs delay
>
> Patch
> ------
>
> [ 0.024166] On node 2 totalpages: 89391104
> [ 0.024167] Normal zone: 1445888 pages used for memmap
> [ 0.024167] Normal zone: 89391104 pages, LIFO batch:63
> [ 0.026694] ACPI: PM-Timer IO Port: 0x448 --------------> No delay
>
> Attached dmesg logs. Let me know if anything is needed from our end.

I posted formal patchset to fix this issue. The patch 1 is doing the
fix, and almost the same as the draft v2 patch I attached in this thread.
Please feel free to help test and add your Tested-by: tag in the patch
thread if possible.

>
>
>
> From: Rahul Gopakumar <[email protected]>
> Sent: 24 November 2020 8:33 PM
> To: [email protected] <[email protected]>
> Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Rajender M <[email protected]>; Yiu Cho Lau <[email protected]>; Peter Jonasson <[email protected]>; Venkatesh Rajaram <[email protected]>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
> ?
> Hi Baoquan,
>
> We applied the new patch to 5.10 rc3 and tested it. We are still
> observing the same page corruption issue which we saw with the
> old patch. This is causing 3 secs delay in boot time.
>
> Attached dmesg log from the new patch and also from vanilla
> 5.10 rc3 kernel.
>
> There are multiple lines like below in the dmesg log of the
> new patch.
>
> "BUG: Bad page state in process swapper? pfn:ab08001"
>
> ________________________________________
> From: [email protected] <[email protected]>
> Sent: 22 November 2020 6:38 AM
> To: Rahul Gopakumar
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > To which commit should we apply the draft patch. We tried applying
> > the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> > (the one we used for applying the previous patch) but it fails.
>
> I tested on 5.10-rc3+. You can append below change to the old patch in
> your testing kernel.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fa6076e1a840..5e5b74e88d69 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> ??????? if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
> ??????????????? return false;
>
> +?????? if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
> +?????????????? return true;
> ??????? /*
> ???????? * We start only with one section of pages, more pages are added as
> ???????? * needed until the rest of deferred pages are initialized.