2012-10-08 15:17:05

by Mel Gorman

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
> memmap_init_zone() loops through every Page Frame Number (pfn),
> including pfn values that are within the gaps between existing
> memory sections. The unneeded looping will become a boot
> performance issue when machines configure larger memory ranges
> that will contain larger and more numerous gaps.
>
> The code will skip across invalid sections to reduce the
> number of loops executed.
>
> Signed-off-by: Mike Yoknis <[email protected]>

This only helps SPARSEMEM and changes more headers than should be
necessary. It would have been easier to do something simple like

if (!early_pfn_valid(pfn)) {
	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
	continue;
}

because that would obey the expectation that pages within a
MAX_ORDER_NR_PAGES-aligned range are all valid or all invalid (ARM is the
exception that breaks this rule). It would be less efficient on
SPARSEMEM than what you're trying to merge but I do not see the need for
the additional complexity unless you can show it makes a big difference
to boot times.

--
Mel Gorman
SUSE Labs
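
To make the skip arithmetic concrete, here is a minimal standalone sketch in
ordinary userspace C (not the kernel code itself); the MAX_ORDER_NR_PAGES value
and the hole layout are invented for illustration, early_pfn_valid() is a
stand-in, and ALIGN() is the kernel-style round-up macro:

#include <stdio.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* illustrative: 1 << (MAX_ORDER - 1) with MAX_ORDER == 11 */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))	/* round up to a power-of-two multiple */

/* Stand-in for early_pfn_valid(): pretend only the first block is backed by memory. */
static int early_pfn_valid(unsigned long pfn)
{
	return pfn < MAX_ORDER_NR_PAGES;
}

int main(void)
{
	unsigned long pfn, end_pfn = 3 * MAX_ORDER_NR_PAGES;
	unsigned long iterations = 0;

	for (pfn = 0; pfn < end_pfn; pfn++) {
		iterations++;
		if (!early_pfn_valid(pfn)) {
			/* Jump to the last pfn of the next aligned block;
			 * the loop's pfn++ then lands on the block boundary. */
			pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
			continue;
		}
		/* ... this is where memmap_init_zone() would initialise the struct page ... */
	}

	/* Prints 1026: 1024 valid pfns visited one by one, plus one iteration per skipped block. */
	printf("loop iterations: %lu (out of %lu pfns)\n", iterations, end_pfn);
	return 0;
}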


2012-10-09 00:42:30

by Ni zhan Chen

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On 10/08/2012 11:16 PM, Mel Gorman wrote:
> On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
>> memmap_init_zone() loops through every Page Frame Number (pfn),
>> including pfn values that are within the gaps between existing
>> memory sections. The unneeded looping will become a boot
>> performance issue when machines configure larger memory ranges
>> that will contain larger and more numerous gaps.
>>
>> The code will skip across invalid sections to reduce the
>> number of loops executed.
>>
>> Signed-off-by: Mike Yoknis <[email protected]>
> This only helps SPARSEMEM and changes more headers than should be
> necessary. It would have been easier to do something simple like
>
> if (!early_pfn_valid(pfn)) {
> 	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
> 	continue;
> }

So can a present memory section in sparsemem contain a
MAX_ORDER_NR_PAGES-aligned range that is entirely invalid?
If the answer is yes, when would that happen?

>
> because that would obey the expectation that pages within a
> MAX_ORDER_NR_PAGES-aligned range are all valid or all invalid (ARM is the
> exception that breaks this rule). It would be less efficient on
> SPARSEMEM than what you're trying to merge but I do not see the need for
> the additional complexity unless you can show it makes a big difference
> to boot times.
>

2012-10-09 14:56:59

by Mike Yoknis

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Mon, 2012-10-08 at 16:16 +0100, Mel Gorman wrote:
> On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
> > memmap_init_zone() loops through every Page Frame Number (pfn),
> > including pfn values that are within the gaps between existing
> > memory sections. The unneeded looping will become a boot
> > performance issue when machines configure larger memory ranges
> > that will contain larger and more numerous gaps.
> >
> > The code will skip across invalid sections to reduce the
> > number of loops executed.
> >
> > Signed-off-by: Mike Yoknis <[email protected]>
>
> This only helps SPARSEMEM and changes more headers than should be
> necessary. It would have been easier to do something simple like
>
> if (!early_pfn_valid(pfn)) {
> 	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
> 	continue;
> }
>
> because that would obey the expectation that pages within a
> MAX_ORDER_NR_PAGES-aligned range are all valid or all invalid (ARM is the
> exception that breaks this rule). It would be less efficient on
> SPARSEMEM than what you're trying to merge but I do not see the need for
> the additional complexity unless you can show it makes a big difference
> to boot times.
>

Mel,
I, too, was concerned that pfn_valid() was defined in so many header
files. But, I did not feel that it was appropriate for me to try to
restructure things to consolidate those definitions just to add this one
new function. Being a kernel newbie I did not believe that I had a good
enough understanding of what combinations and permutations of CONFIG and
architecture may have made all of those different definitions necessary,
so I left them in.

Yes, indeed, this fix is targeted for systems that have holes in memory.
That is where we see the problem. We are creating large computer
systems and we would like for those machines to perform well, including
boot times.

Let me pass along the numbers I have. We have what we call an
"architectural simulator". It is a computer program that pretends that
it is a computer system. We use it to test the firmware before real
hardware is available. We have booted Linux on our simulator. As you
would expect it takes longer to boot on the simulator than it does on
real hardware.

With my patch - boot time 41 minutes
Without patch - boot time 94 minutes

These numbers do not scale linearly to real hardware. But they indicate to
me a place where Linux can be improved.

Mike Yoknis

2012-10-19 19:53:27

by Mike Yoknis

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Tue, 2012-10-09 at 08:56 -0600, Mike Yoknis wrote:
> On Mon, 2012-10-08 at 16:16 +0100, Mel Gorman wrote:
> > On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
> > > memmap_init_zone() loops through every Page Frame Number (pfn),
> > > including pfn values that are within the gaps between existing
> > > memory sections. The unneeded looping will become a boot
> > > performance issue when machines configure larger memory ranges
> > > that will contain larger and more numerous gaps.
> > >
> > > The code will skip across invalid sections to reduce the
> > > number of loops executed.
> > >
> > > Signed-off-by: Mike Yoknis <[email protected]>
> >
> > I do not see the need for
> > the additional complexity unless you can show it makes a big difference
> > to boot times.
> >
>
> Mel,
>
> Let me pass along the numbers I have. We have what we call an
> "architectural simulator". It is a computer program that pretends that
> it is a computer system. We use it to test the firmware before real
> hardware is available. We have booted Linux on our simulator. As you
> would expect it takes longer to boot on the simulator than it does on
> real hardware.
>
> With my patch - boot time 41 minutes
> Without patch - boot time 94 minutes
>
> These numbers do not scale linearly to real hardware. But they indicate to
> me a place where Linux can be improved.
>
> Mike Yoknis
>
Mel,
I finally got access to prototype hardware.
It is a relatively small machine with only 64GB of RAM.

I put in a time measurement by reading the TSC register.
I booted both with and without my patch -

Without patch -
[ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
[ 0.000000] memmap_init_zone() enter 1404184834218
[ 0.000000] memmap_init_zone() exit 1411174884438 diff = 6990050220

With patch -
[ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
[ 0.000000] memmap_init_zone() enter 1555530050778
[ 0.000000] memmap_init_zone() exit 1559379204643 diff = 3849153865

This shows that without the patch the routine spends 45%
of its time spinning unnecessarily.

Mike Yoknis
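
A quick check of the 45% figure from the two TSC deltas quoted above:

(6990050220 - 3849153865) / 6990050220 = 3140896355 / 6990050220 ≈ 0.45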

2012-10-20 08:36:49

by Mel Gorman

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Fri, Oct 19, 2012 at 01:53:18PM -0600, Mike Yoknis wrote:
> On Tue, 2012-10-09 at 08:56 -0600, Mike Yoknis wrote:
> > On Mon, 2012-10-08 at 16:16 +0100, Mel Gorman wrote:
> > > On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
> > > > memmap_init_zone() loops through every Page Frame Number (pfn),
> > > > including pfn values that are within the gaps between existing
> > > > memory sections. The unneeded looping will become a boot
> > > > performance issue when machines configure larger memory ranges
> > > > that will contain larger and more numerous gaps.
> > > >
> > > > The code will skip across invalid sections to reduce the
> > > > number of loops executed.
> > > >
> > > > Signed-off-by: Mike Yoknis <[email protected]>
> > >
> > > I do not see the need for
> > > the additional complexity unless you can show it makes a big difference
> > > to boot times.
> > >
> >
> > Mel,
> >
> > Let me pass along the numbers I have. We have what we call an
> > "architectural simulator". It is a computer program that pretends that
> > it is a computer system. We use it to test the firmware before real
> > hardware is available. We have booted Linux on our simulator. As you
> > would expect it takes longer to boot on the simulator than it does on
> > real hardware.
> >
> > With my patch - boot time 41 minutes
> > Without patch - boot time 94 minutes
> >
> > These numbers do not scale linearly to real hardware. But they indicate to
> > me a place where Linux can be improved.
> >
> > Mike Yoknis
> >
> Mel,
> I finally got access to prototype hardware.
> It is a relatively small machine with only 64GB of RAM.
>
> I put in a time measurement by reading the TSC register.
> I booted both with and without my patch -
>
> Without patch -
> [ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
> [ 0.000000] memmap_init_zone() enter 1404184834218
> [ 0.000000] memmap_init_zone() exit 1411174884438 diff = 6990050220
>
> With patch -
> [ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
> [ 0.000000] memmap_init_zone() enter 1555530050778
> [ 0.000000] memmap_init_zone() exit 1559379204643 diff = 3849153865
>
> This shows that without the patch the routine spends 45%
> of its time spinning unnecessarily.
>

I'm travelling at the moment so apologies that I have not followed up on
this. My problem is still the same with the patch - it changes more
headers than is necessary and it is sparsemem specific. At minimum, try
the suggestion of

if (!early_pfn_valid(pfn)) {
	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
	continue;
}

and see how much it gains you as it should work on all memory models. If
it turns out that you really need to skip whole sections then the stride
could be MAX_ORDER_NR_PAGES on all memory models except sparsemem, where the
stride would be PAGES_PER_SECTION.

--
Mel Gorman
SUSE Labs
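
To put rough numbers on the stride idea above, here is a small standalone
sketch in userspace C (not kernel code); the MAX_ORDER_NR_PAGES and
PAGES_PER_SECTION values are illustrative x86_64-style defaults, not values
taken from this thread:

#include <stdio.h>

#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))	/* kernel-style round-up */
#define MAX_ORDER_NR_PAGES	1024UL		/* illustrative: 4MB blocks with 4K pages */
#define PAGES_PER_SECTION	32768UL		/* illustrative: 128MB sparsemem sections with 4K pages */

/* Count the loop iterations needed to cross a hole of 'hole' invalid pfns
 * when the loop skips forward with the given stride. */
static unsigned long cross_hole(unsigned long hole, unsigned long stride)
{
	unsigned long pfn, iterations = 0;

	for (pfn = 0; pfn < hole; pfn++) {
		iterations++;
		pfn = ALIGN(pfn + stride, stride) - 1;	/* skip to the end of this block/section */
	}
	return iterations;
}

int main(void)
{
	unsigned long hole = 64UL * PAGES_PER_SECTION;	/* an 8GB gap with 4K pages */

	printf("iterations with a MAX_ORDER_NR_PAGES stride: %lu\n",
	       cross_hole(hole, MAX_ORDER_NR_PAGES));	/* 2048 */
	printf("iterations with a PAGES_PER_SECTION stride:  %lu\n",
	       cross_hole(hole, PAGES_PER_SECTION));	/* 64 */
	return 0;
}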

2012-10-24 15:47:57

by Mike Yoknis

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Sat, 2012-10-20 at 09:29 +0100, Mel Gorman wrote:
> On Fri, Oct 19, 2012 at 01:53:18PM -0600, Mike Yoknis wrote:
> > On Tue, 2012-10-09 at 08:56 -0600, Mike Yoknis wrote:
> > > On Mon, 2012-10-08 at 16:16 +0100, Mel Gorman wrote:
> > > > On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
> > > > > memmap_init_zone() loops through every Page Frame Number (pfn),
> > > > > including pfn values that are within the gaps between existing
> > > > > memory sections. The unneeded looping will become a boot
> > > > > performance issue when machines configure larger memory ranges
> > > > > that will contain larger and more numerous gaps.
> > > > >
> > > > > The code will skip across invalid sections to reduce the
> > > > > number of loops executed.
> > > > >
> > > > > Signed-off-by: Mike Yoknis <[email protected]>
> > > >
> > > > I do not see the need for
> > > > the additional complexity unless you can show it makes a big difference
> > > > to boot times.
> > > >
> > >
> > > Mel,
> > >
> > > Let me pass along the numbers I have. We have what we call an
> > > "architectural simulator". It is a computer program that pretends that
> > > it is a computer system. We use it to test the firmware before real
> > > hardware is available. We have booted Linux on our simulator. As you
> > > would expect it takes longer to boot on the simulator than it does on
> > > real hardware.
> > >
> > > With my patch - boot time 41 minutes
> > > Without patch - boot time 94 minutes
> > >
> > > These numbers do not scale linearly to real hardware. But they indicate to
> > > me a place where Linux can be improved.
> > >
> > > Mike Yoknis
> > >
> > Mel,
> > I finally got access to prototype hardware.
> > It is a relatively small machine with only 64GB of RAM.
> >
> > I put in a time measurement by reading the TSC register.
> > I booted both with and without my patch -
> >
> > Without patch -
> > [ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
> > [ 0.000000] memmap_init_zone() enter 1404184834218
> > [ 0.000000] memmap_init_zone() exit 1411174884438 diff = 6990050220
> >
> > With patch -
> > [ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
> > [ 0.000000] memmap_init_zone() enter 1555530050778
> > [ 0.000000] memmap_init_zone() exit 1559379204643 diff = 3849153865
> >
> > This shows that without the patch the routine spends 45%
> > of its time spinning unnecessarily.
> >
>
> I'm travelling at the moment so apologies that I have not followed up on
> this. My problem is still the same with the patch - it changes more
> headers than is necessary and it is sparsemem specific. At minimum, try
> the suggestion of
>
> if (!early_pfn_valid(pfn)) {
> 	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
> 	continue;
> }
>
> and see how much it gains you as it should work on all memory models. If
> it turns out that you really need to skip whole sections then the stride
> could be MAX_ORDER_NR_PAGES on all memory models except sparsemem, where the
> stride would be PAGES_PER_SECTION.
>
Mel,
I tried your suggestion. I re-ran all 3 methods on our latest firmware.

The following are TSC difference numbers (*10^6) to execute
memmap_init_zone() -

No patch    - 7010
Mel's patch - 3918
My patch    - 3847

The incremental improvement of my method is not significant vs. yours.

If you believe your suggested change is worthwhile I will create a v2
patch.
Mike Y

2012-10-25 09:51:45

by Mel Gorman

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Wed, Oct 24, 2012 at 09:47:47AM -0600, Mike Yoknis wrote:
> On Sat, 2012-10-20 at 09:29 +0100, Mel Gorman wrote:
> > On Fri, Oct 19, 2012 at 01:53:18PM -0600, Mike Yoknis wrote:
> > > On Tue, 2012-10-09 at 08:56 -0600, Mike Yoknis wrote:
> > > > On Mon, 2012-10-08 at 16:16 +0100, Mel Gorman wrote:
> > > > > On Wed, Oct 03, 2012 at 08:56:14AM -0600, Mike Yoknis wrote:
> > > > > > memmap_init_zone() loops through every Page Frame Number (pfn),
> > > > > > including pfn values that are within the gaps between existing
> > > > > > memory sections. The unneeded looping will become a boot
> > > > > > performance issue when machines configure larger memory ranges
> > > > > > that will contain larger and more numerous gaps.
> > > > > >
> > > > > > The code will skip across invalid sections to reduce the
> > > > > > number of loops executed.
> > > > > >
> > > > > > Signed-off-by: Mike Yoknis <[email protected]>
> > > > >
> > > > > I do not see the need for
> > > > > the additional complexity unless you can show it makes a big difference
> > > > > to boot times.
> > > > >
> > > >
> > > > Mel,
> > > >
> > > > Let me pass along the numbers I have. We have what we call an
> > > > "architectural simulator". It is a computer program that pretends that
> > > > it is a computer system. We use it to test the firmware before real
> > > > hardware is available. We have booted Linux on our simulator. As you
> > > > would expect it takes longer to boot on the simulator than it does on
> > > > real hardware.
> > > >
> > > > With my patch - boot time 41 minutes
> > > > Without patch - boot time 94 minutes
> > > >
> > > > These numbers do not scale linearly to real hardware. But they indicate to
> > > > me a place where Linux can be improved.
> > > >
> > > > Mike Yoknis
> > > >
> > > Mel,
> > > I finally got access to prototype hardware.
> > > It is a relatively small machine with only 64GB of RAM.
> > >
> > > I put in a time measurement by reading the TSC register.
> > > I booted both with and without my patch -
> > >
> > > Without patch -
> > > [ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
> > > [ 0.000000] memmap_init_zone() enter 1404184834218
> > > [ 0.000000] memmap_init_zone() exit 1411174884438 diff = 6990050220
> > >
> > > With patch -
> > > [ 0.000000] Normal zone: 13400064 pages, LIFO batch:31
> > > [ 0.000000] memmap_init_zone() enter 1555530050778
> > > [ 0.000000] memmap_init_zone() exit 1559379204643 diff = 3849153865
> > >
> > > This shows that without the patch the routine spends 45%
> > > of its time spinning unnecessarily.
> > >
> >
> > I'm travelling at the moment so apologies that I have not followed up on
> > this. My problem is still the same with the patch - it changes more
> > headers than is necessary and it is sparsemem specific. At minimum, try
> > the suggestion of
> >
> > if (!early_pfn_valid(pfn)) {
> > 	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
> > 	continue;
> > }
> >
> > and see how much it gains you as it should work on all memory models. If
> > it turns out that you really need to skip whole sections then the stride
> > could be MAX_ORDER_NR_PAGES on all memory models except sparsemem, where the
> > stride would be PAGES_PER_SECTION.
> >
> Mel,
> I tried your suggestion. I re-ran all 3 methods on our latest firmware.
>
> The following are TSC difference numbers (*10^6) to execute
> memmap_init_zone() -
>
> No patch - 7010
> Mel's patch- 3918
> My patch - 3847
>
> The incremental improvement of my method is not significant vs. yours.
>
> If you believe your suggested change is worthwhile I will create a v2
> patch.

I think it is a reasonable change and I prefer my suggestion because it
should work for all memory models. Please do a V2 of the patch. I'm still
travelling at the moment (writing this from an airport) but I'll be back
online next Tuesday and will review it when I can.

--
Mel Gorman
SUSE Labs

2012-10-26 22:47:58

by Mike Yoknis

Subject: [PATCH v2] mm: memmap_init_zone() performance improvement

memmap_init_zone() loops through every Page Frame Number (pfn),
including pfn values that are within the gaps between existing
memory sections. The unneeded looping will become a boot
performance issue when machines configure larger memory ranges
that will contain larger and more numerous gaps.

The code will skip across invalid pfn values to reduce the
number of loops executed.

Signed-off-by: Mike Yoknis <[email protected]>
---
mm/page_alloc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 45c916b..9f9c1a6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3857,8 +3857,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		 * exist on hotplugged memory.
 		 */
 		if (context == MEMMAP_EARLY) {
-			if (!early_pfn_valid(pfn))
+			if (!early_pfn_valid(pfn)) {
+				pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES,
+						MAX_ORDER_NR_PAGES) - 1;
 				continue;
+			}
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 		}
--
1.7.11.3

2012-10-30 15:23:28

by Dave Hansen

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On 10/20/2012 01:29 AM, Mel Gorman wrote:
> I'm travelling at the moment so apologies that I have not followed up on
> this. My problem is still the same with the patch - it changes more
> headers than is necessary and it is sparsemem specific. At minimum, try
> the suggestion of
>
> if (!early_pfn_valid(pfn)) {
> 	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
> 	continue;
> }

Sorry I didn't catch this until v2...

Is that ALIGN() correct? If pfn=3, then it would expand to:

(3+MAX_ORDER_NR_PAGES+MAX_ORDER_NR_PAGES-1) & ~(MAX_ORDER_NR_PAGES-1)

You would end up skipping the current MAX_ORDER_NR_PAGES area, and then
one _extra_ because ALIGN() aligns up, and you're adding
MAX_ORDER_NR_PAGES too. It doesn't matter unless you run into a
!early_pfn_valid() in the middle of a MAX_ORDER area, I guess.

I think this would work, plus be a bit smaller:

pfn = ALIGN(pfn + 1, MAX_ORDER_NR_PAGES) - 1;
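
To see the difference between the two expressions concretely, here is a small
standalone check in userspace C, with an illustrative MAX_ORDER_NR_PAGES of
1024: for an aligned pfn both forms give the same result, while for a pfn in
the middle of a block Mel's form skips one extra block.

#include <stdio.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* illustrative value */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long pfns[] = { 0, 3 };	/* an aligned pfn and a mid-block pfn */
	unsigned int i;

	for (i = 0; i < 2; i++) {
		unsigned long pfn = pfns[i];
		unsigned long mels = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
		unsigned long daves = ALIGN(pfn + 1, MAX_ORDER_NR_PAGES) - 1;

		/* pfn 0: both print 1023; pfn 3: Mel's form prints 2047, Dave's prints 1023. */
		printf("pfn %lu: Mel's expression -> %lu, Dave's -> %lu\n",
		       pfn, mels, daves);
	}
	return 0;
}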

2012-10-30 22:32:01

by Andrew Morton

Subject: Re: [PATCH v2] mm: memmap_init_zone() performance improvement

On Fri, 26 Oct 2012 16:47:47 -0600
Mike Yoknis <[email protected]> wrote:

> memmap_init_zone() loops through every Page Frame Number (pfn),
> including pfn values that are within the gaps between existing
> memory sections. The unneeded looping will become a boot
> performance issue when machines configure larger memory ranges
> that will contain larger and more numerous gaps.
>
> The code will skip across invalid pfn values to reduce the
> number of loops executed.
>

So I was wondering how much difference this makes. Then I see Mel
already asked and was answered. The lesson: please treat a reviewer
question as a sign that the changelog needs more information! I added
this text to the changelog:

: We have what we call an "architectural simulator". It is a computer
: program that pretends that it is a computer system. We use it to test the
: firmware before real hardware is available. We have booted Linux on our
: simulator. As you would expect it takes longer to boot on the simulator
: than it does on real hardware.
:
: With my patch - boot time 41 minutes
: Without patch - boot time 94 minutes
:
: These numbers do not scale linearly to real hardware. But they indicate to me
: a place where Linux can be improved.

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3857,8 +3857,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  		 * exist on hotplugged memory.
>  		 */
>  		if (context == MEMMAP_EARLY) {
> -			if (!early_pfn_valid(pfn))
> +			if (!early_pfn_valid(pfn)) {
> +				pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES,
> +						MAX_ORDER_NR_PAGES) - 1;
>  				continue;
> +			}
>  			if (!early_pfn_in_nid(pfn, nid))
>  				continue;
>  		}

So what is the assumption here? That each zone's first page has a pfn
which is a multiple of MAX_ORDER_NR_PAGES?

That seems reasonable, but is it actually true, for all architectures
and for all time? Where did this come from?

2012-11-06 16:03:41

by Mike Yoknis

Subject: Re: [PATCH] mm: memmap_init_zone() performance improvement

On Tue, 2012-10-30 at 09:14 -0600, Dave Hansen wrote:
> On 10/20/2012 01:29 AM, Mel Gorman wrote:
> > I'm travelling at the moment so apologies that I have not followed up on
> > this. My problem is still the same with the patch - it changes more
> > headers than is necessary and it is sparsemem specific. At minimum, try
> > the suggestion of
> >
> > if (!early_pfn_valid(pfn)) {
> > 	pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES, MAX_ORDER_NR_PAGES) - 1;
> > 	continue;
> > }
>
> Sorry I didn't catch this until v2...
>
> Is that ALIGN() correct? If pfn=3, then it would expand to:
>
> (3+MAX_ORDER_NR_PAGES+MAX_ORDER_NR_PAGES-1) & ~(MAX_ORDER_NR_PAGES-1)
>
> You would end up skipping the current MAX_ORDER_NR_PAGES area, and then
> one _extra_ because ALIGN() aligns up, and you're adding
> MAX_ORDER_NR_PAGES too. It doesn't matter unless you run into a
> !early_pfn_valid() in the middle of a MAX_ORDER area, I guess.
>
> I think this would work, plus be a bit smaller:
>
> pfn = ALIGN(pfn + 1, MAX_ORDER_NR_PAGES) - 1;
>
Dave,
I see your point about "rounding-up". But, I favor the way Mel
suggested it. It more clearly shows the intent, which is to move up by
MAX_ORDER_NR_PAGES. The "pfn+1" may suggest that there is some
significance to the next pfn, but there is not.
I find Mel's way easier to understand.
Mike Y