2019-12-05 02:17:20

by Wei Yang

[permalink] [raw]
Subject: [Patch v2 0/6] Refactor split_mem_range with proper helper and loop

split_mem_range is used to prepare range before mapping kernel page table.

After first version, Thomas suggested some brilliant idea to re-write the
logic.

Wei split the big patch into pieces and did some tests. To verify the
functionality, Wei abstract the code into userland and did following test
cases:

* ranges fits only 4K
* ranges fits only 2M
* ranges fits only 1G
* ranges fits 4K and 2M
* ranges fits 2M and 1G
* ranges fits 4K, 2M and 1G
* ranges fits 4K, 2M and 1G but w/o 1G size
* ranges fits 4K, 2M and 1G with only 4K size

Below is the test result:

### Split [4K, 16K][0x00001000-0x00004000]:
[mem 0x00001000-0x00003fff] page size 4K
### Split [4M, 64M][0x00400000-0x04000000]:
[mem 0x00400000-0x03ffffff] page size 2M
### Split [0G, 2G][0000000000-0x80000000]:
[mem 0000000000-0x7fffffff] page size 1G
### Split [16K, 4M + 16K][0x00004000-0x00404000]:
[mem 0x00004000-0x001fffff] page size 4K
[mem 0x00200000-0x003fffff] page size 2M
[mem 0x00400000-0x00403fff] page size 4K
### Split [4M, 2G + 2M][0x00400000-0x80200000]:
[mem 0x00400000-0x3fffffff] page size 2M
[mem 0x40000000-0x7fffffff] page size 1G
[mem 0x80000000-0x801fffff] page size 2M
### Split [4M - 16K, 2G + 2M + 16K][0x003fc000-0x80204000]:
[mem 0x003fc000-0x003fffff] page size 4K
[mem 0x00400000-0x3fffffff] page size 2M
[mem 0x40000000-0x7fffffff] page size 1G
[mem 0x80000000-0x801fffff] page size 2M
[mem 0x80200000-0x80203fff] page size 4K
### Split w/o 1G size [4M - 16K, 2G + 2M + 16K][0x003fc000-0x80204000]:
[mem 0x003fc000-0x003fffff] page size 4K
[mem 0x00400000-0x801fffff] page size 2M
[mem 0x80200000-0x80203fff] page size 4K
### Split w/ only 4K [4M - 16K, 2G + 2M + 16K][0x003fc000-0x80204000]:
[mem 0x003fc000-0x80203fff] page size 4K

Thomas Gleixner (1):
x86/mm: Refactor split_mem_range with proper helper and loop

Wei Yang (5):
x86/mm: Remove second argument of split_mem_range()
x86/mm: Add attribute __ro_after_init to after_bootmem
x86/mm: Make page_size_mask unsigned int clearly
x86/mm: Refine debug print string retrieval function
x86/mm: Use address directly in split_mem_range()

arch/x86/mm/init.c | 259 ++++++++++++++++++---------------------------
1 file changed, 103 insertions(+), 156 deletions(-)

--
2.17.1


2019-12-05 02:17:30

by Wei Yang

[permalink] [raw]
Subject: [Patch v2 4/6] x86/mm: Refine debug print string retrieval function

Generally, the mapping page size are:

4K, 2M, 1G

except in case 32-bit without PAE, the mapping page size are:

4K, 4M

Based on PG_LEVEL_X definition and mr->page_size_mask, we can calculate
the mapping page size from a predefined string array.

Signed-off-by: Wei Yang <[email protected]>
---
arch/x86/mm/init.c | 39 +++++++++++++--------------------------
1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 0eb5edb63fa2..ded58a31c679 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -308,29 +308,20 @@ static void __ref adjust_range_page_size_mask(struct map_range *mr,
}
}

-static const char *page_size_string(struct map_range *mr)
+static void __meminit mr_print(struct map_range *mr, unsigned int maxidx)
{
- static const char str_1g[] = "1G";
- static const char str_2m[] = "2M";
- static const char str_4m[] = "4M";
- static const char str_4k[] = "4k";
-
- if (mr->page_size_mask & (1U<<PG_LEVEL_1G))
- return str_1g;
- /*
- * 32-bit without PAE has a 4M large page size.
- * PG_LEVEL_2M is misnamed, but we can at least
- * print out the right size in the string.
- */
- if (IS_ENABLED(CONFIG_X86_32) &&
- !IS_ENABLED(CONFIG_X86_PAE) &&
- mr->page_size_mask & (1U<<PG_LEVEL_2M))
- return str_4m;
-
- if (mr->page_size_mask & (1U<<PG_LEVEL_2M))
- return str_2m;
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
+ static const char *sz[2] = { "4K", "4M" };
+#else
+ static const char *sz[4] = { "4K", "2M", "1G", "" };
+#endif
+ unsigned int idx, s;

- return str_4k;
+ for (idx = 0; idx < maxidx; idx++, mr++) {
+ s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
+ pr_debug(" [mem %#010lx-%#010lx] page size %s\n",
+ mr->start, mr->end - 1, sz[s]);
+ }
}

static int __meminit split_mem_range(struct map_range *mr,
@@ -425,11 +416,7 @@ static int __meminit split_mem_range(struct map_range *mr,
nr_range--;
}

- for (i = 0; i < nr_range; i++)
- pr_debug(" [mem %#010lx-%#010lx] page %s\n",
- mr[i].start, mr[i].end - 1,
- page_size_string(&mr[i]));
-
+ mr_print(mr, nr_range);
return nr_range;
}

--
2.17.1

2019-12-05 02:18:04

by Wei Yang

[permalink] [raw]
Subject: [Patch v2 5/6] x86/mm: Use address directly in split_mem_range()

This is not necessary to convert address to pfn to split range. And
finally, convert back to address and store it to map_range.

Signed-off-by: Wei Yang <[email protected]>
---
arch/x86/mm/init.c | 73 +++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 37 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ded58a31c679..5fe3f645f02c 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -259,14 +259,14 @@ static void setup_pcid(void)
#endif

static int __meminit save_mr(struct map_range *mr, int nr_range,
- unsigned long start_pfn, unsigned long end_pfn,
+ unsigned long start, unsigned long end,
unsigned long page_size_mask)
{
if (start_pfn < end_pfn) {
if (nr_range >= NR_RANGE_MR)
panic("run out of range for init_memory_mapping\n");
- mr[nr_range].start = start_pfn<<PAGE_SHIFT;
- mr[nr_range].end = end_pfn<<PAGE_SHIFT;
+ mr[nr_range].start = start_pfn;
+ mr[nr_range].end = end_pfn;
mr[nr_range].page_size_mask = page_size_mask;
nr_range++;
}
@@ -328,14 +328,13 @@ static int __meminit split_mem_range(struct map_range *mr,
unsigned long start,
unsigned long end)
{
- unsigned long start_pfn, end_pfn, limit_pfn;
- unsigned long pfn;
+ unsigned long addr, limit;
int i, nr_range = 0;

- limit_pfn = PFN_DOWN(end);
+ limit = end;

/* head if not big page alignment ? */
- pfn = start_pfn = PFN_DOWN(start);
+ addr = start;
#ifdef CONFIG_X86_32
/*
* Don't use a large page for the first 2/4MB of memory
@@ -343,61 +342,61 @@ static int __meminit split_mem_range(struct map_range *mr,
* and overlapping MTRRs into large pages can cause
* slowdowns.
*/
- if (pfn == 0)
- end_pfn = PFN_DOWN(PMD_SIZE);
+ if (addr == 0)
+ end = PMD_SIZE;
else
- end_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
+ end = round_up(addr, PMD_SIZE);
#else /* CONFIG_X86_64 */
- end_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
+ end = round_up(addr, PMD_SIZE);
#endif
- if (end_pfn > limit_pfn)
- end_pfn = limit_pfn;
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);
- pfn = end_pfn;
+ if (end > limit)
+ end = limit;
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end, 0);
+ addr = end;
}

/* big page (2M) range */
- start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
+ start = round_up(addr, PMD_SIZE);
#ifdef CONFIG_X86_32
- end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
+ end = round_down(limit, PMD_SIZE);
#else /* CONFIG_X86_64 */
- end_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
- if (end_pfn > round_down(limit_pfn, PFN_DOWN(PMD_SIZE)))
- end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
+ end = round_up(addr, PUD_SIZE);
+ if (end > round_down(limit, PMD_SIZE))
+ end = round_down(limit, PMD_SIZE);
#endif

- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end,
page_size_mask & (1U<<PG_LEVEL_2M));
- pfn = end_pfn;
+ addr = end;
}

#ifdef CONFIG_X86_64
/* big page (1G) range */
- start_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
- end_pfn = round_down(limit_pfn, PFN_DOWN(PUD_SIZE));
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
+ start = round_up(addr, PUD_SIZE);
+ end = round_down(limit, PUD_SIZE);
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end,
page_size_mask &
((1U<<PG_LEVEL_2M)|(1U<<PG_LEVEL_1G)));
- pfn = end_pfn;
+ addr = end;
}

/* tail is not big page (1G) alignment */
- start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
- end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
+ start = round_up(addr, PMD_SIZE);
+ end = round_down(limit, PMD_SIZE);
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end,
page_size_mask & (1U<<PG_LEVEL_2M));
- pfn = end_pfn;
+ addr = end;
}
#endif

/* tail is not big page (2M) alignment */
- start_pfn = pfn;
- end_pfn = limit_pfn;
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);
+ start = addr;
+ end = limit;
+ nr_range = save_mr(mr, nr_range, start, end, 0);

if (!after_bootmem)
adjust_range_page_size_mask(mr, nr_range);
--
2.17.1

2019-12-05 09:14:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch v2 4/6] x86/mm: Refine debug print string retrieval function

On Thu, Dec 05, 2019 at 10:14:01AM +0800, Wei Yang wrote:
> Generally, the mapping page size are:
>
> 4K, 2M, 1G
>
> except in case 32-bit without PAE, the mapping page size are:
>
> 4K, 4M
>
> Based on PG_LEVEL_X definition and mr->page_size_mask, we can calculate
> the mapping page size from a predefined string array.
>
> Signed-off-by: Wei Yang <[email protected]>
> ---
> arch/x86/mm/init.c | 39 +++++++++++++--------------------------
> 1 file changed, 13 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 0eb5edb63fa2..ded58a31c679 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -308,29 +308,20 @@ static void __ref adjust_range_page_size_mask(struct map_range *mr,
> }
> }
>
> +static void __meminit mr_print(struct map_range *mr, unsigned int maxidx)
> {
> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
> + static const char *sz[2] = { "4K", "4M" };
> +#else
> + static const char *sz[4] = { "4K", "2M", "1G", "" };
> +#endif
> + unsigned int idx, s;
>
> + for (idx = 0; idx < maxidx; idx++, mr++) {
> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);

Is it at all possible for !PAE to have 1G here, if you use the sz[4]
definition unconditionally?

> + pr_debug(" [mem %#010lx-%#010lx] page size %s\n",
> + mr->start, mr->end - 1, sz[s]);
> + }
> }

2019-12-06 01:53:23

by Wei Yang

[permalink] [raw]
Subject: Re: [Patch v2 4/6] x86/mm: Refine debug print string retrieval function

On Thu, Dec 05, 2019 at 10:13:11AM +0100, Peter Zijlstra wrote:
>On Thu, Dec 05, 2019 at 10:14:01AM +0800, Wei Yang wrote:
>> Generally, the mapping page size are:
>>
>> 4K, 2M, 1G
>>
>> except in case 32-bit without PAE, the mapping page size are:
>>
>> 4K, 4M
>>
>> Based on PG_LEVEL_X definition and mr->page_size_mask, we can calculate
>> the mapping page size from a predefined string array.
>>
>> Signed-off-by: Wei Yang <[email protected]>
>> ---
>> arch/x86/mm/init.c | 39 +++++++++++++--------------------------
>> 1 file changed, 13 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index 0eb5edb63fa2..ded58a31c679 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -308,29 +308,20 @@ static void __ref adjust_range_page_size_mask(struct map_range *mr,
>> }
>> }
>>
>> +static void __meminit mr_print(struct map_range *mr, unsigned int maxidx)
>> {
>> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
>> + static const char *sz[2] = { "4K", "4M" };
>> +#else
>> + static const char *sz[4] = { "4K", "2M", "1G", "" };
>> +#endif
>> + unsigned int idx, s;
>>
>> + for (idx = 0; idx < maxidx; idx++, mr++) {
>> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
>
>Is it at all possible for !PAE to have 1G here, if you use the sz[4]
>definition unconditionally?
>

You mean remove the ifdef and use sz[4] for both condition?

Then how to differentiate 4M and 2M?

>> + pr_debug(" [mem %#010lx-%#010lx] page size %s\n",
>> + mr->start, mr->end - 1, sz[s]);
>> + }
>> }

--
Wei Yang
Help you, Help me

2019-12-06 10:29:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch v2 4/6] x86/mm: Refine debug print string retrieval function

On Fri, Dec 06, 2019 at 09:51:26AM +0800, Wei Yang wrote:

> >> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
> >> + static const char *sz[2] = { "4K", "4M" };
> >> +#else
> >> + static const char *sz[4] = { "4K", "2M", "1G", "" };
> >> +#endif
> >> + unsigned int idx, s;
> >>
> >> + for (idx = 0; idx < maxidx; idx++, mr++) {
> >> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
> >
> >Is it at all possible for !PAE to have 1G here, if you use the sz[4]
> >definition unconditionally?
> >
>
> You mean remove the ifdef and use sz[4] for both condition?
>
> Then how to differentiate 4M and 2M?

Argh, I'm blind.. I failed to spot that. N/m then.

2019-12-06 15:19:55

by Wei Yang

[permalink] [raw]
Subject: Re: [Patch v2 4/6] x86/mm: Refine debug print string retrieval function

On Fri, Dec 06, 2019 at 11:27:46AM +0100, Peter Zijlstra wrote:
>On Fri, Dec 06, 2019 at 09:51:26AM +0800, Wei Yang wrote:
>
>> >> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
>> >> + static const char *sz[2] = { "4K", "4M" };
>> >> +#else
>> >> + static const char *sz[4] = { "4K", "2M", "1G", "" };
>> >> +#endif
>> >> + unsigned int idx, s;
>> >>
>> >> + for (idx = 0; idx < maxidx; idx++, mr++) {
>> >> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
>> >
>> >Is it at all possible for !PAE to have 1G here, if you use the sz[4]
>> >definition unconditionally?
>> >
>>
>> You mean remove the ifdef and use sz[4] for both condition?
>>
>> Then how to differentiate 4M and 2M?
>
>Argh, I'm blind.. I failed to spot that. N/m then.

Never mind. I always do the same thing :-(

--
Wei Yang
Help you, Help me

2019-12-07 03:37:55

by kernel test robot

[permalink] [raw]
Subject: Re: [Patch v2 5/6] x86/mm: Use address directly in split_mem_range()

Hi Wei,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/mm]
[also build test ERROR on tip/auto-latest linus/master v5.4 next-20191206]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7f264dab5b60343358e788d4c939c166c22ea4a2
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-1) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

Note: the linux-review/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345 HEAD 7f535395f79354bfa29cca182dd203525bcb4237 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

arch/x86/mm/init.c: In function 'save_mr':
>> arch/x86/mm/init.c:265:6: error: 'start_pfn' undeclared (first use in this function); did you mean 'start'?
if (start_pfn < end_pfn) {
^~~~~~~~~
start
arch/x86/mm/init.c:265:6: note: each undeclared identifier is reported only once for each function it appears in
>> arch/x86/mm/init.c:265:18: error: 'end_pfn' undeclared (first use in this function); did you mean 'pgd_pfn'?
if (start_pfn < end_pfn) {
^~~~~~~
pgd_pfn

vim +265 arch/x86/mm/init.c

f765090a2617b8 Pekka Enberg 2009-03-05 260
dc9dd5cc854cde Jan Beulich 2009-03-12 261 static int __meminit save_mr(struct map_range *mr, int nr_range,
51c6d529efdc86 Wei Yang 2019-12-05 262 unsigned long start, unsigned long end,
f765090a2617b8 Pekka Enberg 2009-03-05 263 unsigned long page_size_mask)
f765090a2617b8 Pekka Enberg 2009-03-05 264 {
f765090a2617b8 Pekka Enberg 2009-03-05 @265 if (start_pfn < end_pfn) {
f765090a2617b8 Pekka Enberg 2009-03-05 266 if (nr_range >= NR_RANGE_MR)
f765090a2617b8 Pekka Enberg 2009-03-05 267 panic("run out of range for init_memory_mapping\n");
51c6d529efdc86 Wei Yang 2019-12-05 268 mr[nr_range].start = start_pfn;
51c6d529efdc86 Wei Yang 2019-12-05 269 mr[nr_range].end = end_pfn;
f765090a2617b8 Pekka Enberg 2009-03-05 270 mr[nr_range].page_size_mask = page_size_mask;
f765090a2617b8 Pekka Enberg 2009-03-05 271 nr_range++;
f765090a2617b8 Pekka Enberg 2009-03-05 272 }
f765090a2617b8 Pekka Enberg 2009-03-05 273
f765090a2617b8 Pekka Enberg 2009-03-05 274 return nr_range;
f765090a2617b8 Pekka Enberg 2009-03-05 275 }
f765090a2617b8 Pekka Enberg 2009-03-05 276

:::::: The code at line 265 was first introduced by commit
:::::: f765090a2617b8d9cb73b71e0aa850c29460d8be x86: move init_memory_mapping() to common mm/init.c

:::::: TO: Pekka Enberg <[email protected]>
:::::: CC: Ingo Molnar <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation


Attachments:
(No filename) (3.30 kB)
.config.gz (7.05 kB)
Download all attachments

2019-12-07 07:19:35

by Wei Yang

[permalink] [raw]
Subject: Re: [Patch v2 5/6] x86/mm: Use address directly in split_mem_range()

On Sat, Dec 07, 2019 at 11:36:10AM +0800, kbuild test robot wrote:
>Hi Wei,
>
>Thank you for the patch! Yet something to improve:
>
>[auto build test ERROR on tip/x86/mm]
>[also build test ERROR on tip/auto-latest linus/master v5.4 next-20191206]
>[if your patch is applied to the wrong git tree, please drop us a note to help
>improve the system. BTW, we also suggest to use '--base' option to specify the
>base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
>url: https://github.com/0day-ci/linux/commits/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345
>base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7f264dab5b60343358e788d4c939c166c22ea4a2
>config: i386-tinyconfig (attached as .config)
>compiler: gcc-7 (Debian 7.5.0-1) 7.5.0
>reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
>If you fix the issue, kindly add following tag
>Reported-by: kbuild test robot <[email protected]>
>
>Note: the linux-review/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345 HEAD 7f535395f79354bfa29cca182dd203525bcb4237 builds fine.
> It only hurts bisectibility.
>
>All errors (new ones prefixed by >>):
>
> arch/x86/mm/init.c: In function 'save_mr':
>>> arch/x86/mm/init.c:265:6: error: 'start_pfn' undeclared (first use in this function); did you mean 'start'?
> if (start_pfn < end_pfn) {
> ^~~~~~~~~
> start
> arch/x86/mm/init.c:265:6: note: each undeclared identifier is reported only once for each function it appears in
>>> arch/x86/mm/init.c:265:18: error: 'end_pfn' undeclared (first use in this function); did you mean 'pgd_pfn'?
> if (start_pfn < end_pfn) {
> ^~~~~~~
> pgd_pfn
>

Oops, introduced an error after resolving a conflict. Should be start and end.

Will correct it in next version.


--
Wei Yang
Help you, Help me