split_mem_range is used to prepare range before mapping kernel page table.
After first version, Thomas suggested some brilliant idea to re-write the
logic.
Wei split the big patch into pieces and did some tests. To verify the
functionality, Wei abstract the code into userland and did following test
cases:
* ranges fits only 4K
* ranges fits only 2M
* ranges fits only 1G
* ranges fits 4K and 2M
* ranges fits 2M and 1G
* ranges fits 4K, 2M and 1G
* ranges fits 4K, 2M and 1G but w/o 1G size
* ranges fits 4K, 2M and 1G with only 4K size
Below is the test result:
### Split [4K, 16K][0x00001000-0x00004000]:
[mem 0x00001000-0x00003fff] page size 4K
### Split [4M, 64M][0x00400000-0x04000000]:
[mem 0x00400000-0x03ffffff] page size 2M
### Split [0G, 2G][0000000000-0x80000000]:
[mem 0000000000-0x7fffffff] page size 1G
### Split [16K, 4M + 16K][0x00004000-0x00404000]:
[mem 0x00004000-0x001fffff] page size 4K
[mem 0x00200000-0x003fffff] page size 2M
[mem 0x00400000-0x00403fff] page size 4K
### Split [4M, 2G + 2M][0x00400000-0x80200000]:
[mem 0x00400000-0x3fffffff] page size 2M
[mem 0x40000000-0x7fffffff] page size 1G
[mem 0x80000000-0x801fffff] page size 2M
### Split [4M - 16K, 2G + 2M + 16K][0x003fc000-0x80204000]:
[mem 0x003fc000-0x003fffff] page size 4K
[mem 0x00400000-0x3fffffff] page size 2M
[mem 0x40000000-0x7fffffff] page size 1G
[mem 0x80000000-0x801fffff] page size 2M
[mem 0x80200000-0x80203fff] page size 4K
### Split w/o 1G size [4M - 16K, 2G + 2M + 16K][0x003fc000-0x80204000]:
[mem 0x003fc000-0x003fffff] page size 4K
[mem 0x00400000-0x801fffff] page size 2M
[mem 0x80200000-0x80203fff] page size 4K
### Split w/ only 4K [4M - 16K, 2G + 2M + 16K][0x003fc000-0x80204000]:
[mem 0x003fc000-0x80203fff] page size 4K
Thomas Gleixner (1):
x86/mm: Refactor split_mem_range with proper helper and loop
Wei Yang (5):
x86/mm: Remove second argument of split_mem_range()
x86/mm: Add attribute __ro_after_init to after_bootmem
x86/mm: Make page_size_mask unsigned int clearly
x86/mm: Refine debug print string retrieval function
x86/mm: Use address directly in split_mem_range()
arch/x86/mm/init.c | 259 ++++++++++++++++++---------------------------
1 file changed, 103 insertions(+), 156 deletions(-)
--
2.17.1
Generally, the mapping page size are:
4K, 2M, 1G
except in case 32-bit without PAE, the mapping page size are:
4K, 4M
Based on PG_LEVEL_X definition and mr->page_size_mask, we can calculate
the mapping page size from a predefined string array.
Signed-off-by: Wei Yang <[email protected]>
---
arch/x86/mm/init.c | 39 +++++++++++++--------------------------
1 file changed, 13 insertions(+), 26 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 0eb5edb63fa2..ded58a31c679 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -308,29 +308,20 @@ static void __ref adjust_range_page_size_mask(struct map_range *mr,
}
}
-static const char *page_size_string(struct map_range *mr)
+static void __meminit mr_print(struct map_range *mr, unsigned int maxidx)
{
- static const char str_1g[] = "1G";
- static const char str_2m[] = "2M";
- static const char str_4m[] = "4M";
- static const char str_4k[] = "4k";
-
- if (mr->page_size_mask & (1U<<PG_LEVEL_1G))
- return str_1g;
- /*
- * 32-bit without PAE has a 4M large page size.
- * PG_LEVEL_2M is misnamed, but we can at least
- * print out the right size in the string.
- */
- if (IS_ENABLED(CONFIG_X86_32) &&
- !IS_ENABLED(CONFIG_X86_PAE) &&
- mr->page_size_mask & (1U<<PG_LEVEL_2M))
- return str_4m;
-
- if (mr->page_size_mask & (1U<<PG_LEVEL_2M))
- return str_2m;
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
+ static const char *sz[2] = { "4K", "4M" };
+#else
+ static const char *sz[4] = { "4K", "2M", "1G", "" };
+#endif
+ unsigned int idx, s;
- return str_4k;
+ for (idx = 0; idx < maxidx; idx++, mr++) {
+ s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
+ pr_debug(" [mem %#010lx-%#010lx] page size %s\n",
+ mr->start, mr->end - 1, sz[s]);
+ }
}
static int __meminit split_mem_range(struct map_range *mr,
@@ -425,11 +416,7 @@ static int __meminit split_mem_range(struct map_range *mr,
nr_range--;
}
- for (i = 0; i < nr_range; i++)
- pr_debug(" [mem %#010lx-%#010lx] page %s\n",
- mr[i].start, mr[i].end - 1,
- page_size_string(&mr[i]));
-
+ mr_print(mr, nr_range);
return nr_range;
}
--
2.17.1
This is not necessary to convert address to pfn to split range. And
finally, convert back to address and store it to map_range.
Signed-off-by: Wei Yang <[email protected]>
---
arch/x86/mm/init.c | 73 +++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 37 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ded58a31c679..5fe3f645f02c 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -259,14 +259,14 @@ static void setup_pcid(void)
#endif
static int __meminit save_mr(struct map_range *mr, int nr_range,
- unsigned long start_pfn, unsigned long end_pfn,
+ unsigned long start, unsigned long end,
unsigned long page_size_mask)
{
if (start_pfn < end_pfn) {
if (nr_range >= NR_RANGE_MR)
panic("run out of range for init_memory_mapping\n");
- mr[nr_range].start = start_pfn<<PAGE_SHIFT;
- mr[nr_range].end = end_pfn<<PAGE_SHIFT;
+ mr[nr_range].start = start_pfn;
+ mr[nr_range].end = end_pfn;
mr[nr_range].page_size_mask = page_size_mask;
nr_range++;
}
@@ -328,14 +328,13 @@ static int __meminit split_mem_range(struct map_range *mr,
unsigned long start,
unsigned long end)
{
- unsigned long start_pfn, end_pfn, limit_pfn;
- unsigned long pfn;
+ unsigned long addr, limit;
int i, nr_range = 0;
- limit_pfn = PFN_DOWN(end);
+ limit = end;
/* head if not big page alignment ? */
- pfn = start_pfn = PFN_DOWN(start);
+ addr = start;
#ifdef CONFIG_X86_32
/*
* Don't use a large page for the first 2/4MB of memory
@@ -343,61 +342,61 @@ static int __meminit split_mem_range(struct map_range *mr,
* and overlapping MTRRs into large pages can cause
* slowdowns.
*/
- if (pfn == 0)
- end_pfn = PFN_DOWN(PMD_SIZE);
+ if (addr == 0)
+ end = PMD_SIZE;
else
- end_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
+ end = round_up(addr, PMD_SIZE);
#else /* CONFIG_X86_64 */
- end_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
+ end = round_up(addr, PMD_SIZE);
#endif
- if (end_pfn > limit_pfn)
- end_pfn = limit_pfn;
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);
- pfn = end_pfn;
+ if (end > limit)
+ end = limit;
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end, 0);
+ addr = end;
}
/* big page (2M) range */
- start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
+ start = round_up(addr, PMD_SIZE);
#ifdef CONFIG_X86_32
- end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
+ end = round_down(limit, PMD_SIZE);
#else /* CONFIG_X86_64 */
- end_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
- if (end_pfn > round_down(limit_pfn, PFN_DOWN(PMD_SIZE)))
- end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
+ end = round_up(addr, PUD_SIZE);
+ if (end > round_down(limit, PMD_SIZE))
+ end = round_down(limit, PMD_SIZE);
#endif
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end,
page_size_mask & (1U<<PG_LEVEL_2M));
- pfn = end_pfn;
+ addr = end;
}
#ifdef CONFIG_X86_64
/* big page (1G) range */
- start_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
- end_pfn = round_down(limit_pfn, PFN_DOWN(PUD_SIZE));
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
+ start = round_up(addr, PUD_SIZE);
+ end = round_down(limit, PUD_SIZE);
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end,
page_size_mask &
((1U<<PG_LEVEL_2M)|(1U<<PG_LEVEL_1G)));
- pfn = end_pfn;
+ addr = end;
}
/* tail is not big page (1G) alignment */
- start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
- end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
- if (start_pfn < end_pfn) {
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
+ start = round_up(addr, PMD_SIZE);
+ end = round_down(limit, PMD_SIZE);
+ if (start < end) {
+ nr_range = save_mr(mr, nr_range, start, end,
page_size_mask & (1U<<PG_LEVEL_2M));
- pfn = end_pfn;
+ addr = end;
}
#endif
/* tail is not big page (2M) alignment */
- start_pfn = pfn;
- end_pfn = limit_pfn;
- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);
+ start = addr;
+ end = limit;
+ nr_range = save_mr(mr, nr_range, start, end, 0);
if (!after_bootmem)
adjust_range_page_size_mask(mr, nr_range);
--
2.17.1
On Thu, Dec 05, 2019 at 10:14:01AM +0800, Wei Yang wrote:
> Generally, the mapping page size are:
>
> 4K, 2M, 1G
>
> except in case 32-bit without PAE, the mapping page size are:
>
> 4K, 4M
>
> Based on PG_LEVEL_X definition and mr->page_size_mask, we can calculate
> the mapping page size from a predefined string array.
>
> Signed-off-by: Wei Yang <[email protected]>
> ---
> arch/x86/mm/init.c | 39 +++++++++++++--------------------------
> 1 file changed, 13 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 0eb5edb63fa2..ded58a31c679 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -308,29 +308,20 @@ static void __ref adjust_range_page_size_mask(struct map_range *mr,
> }
> }
>
> +static void __meminit mr_print(struct map_range *mr, unsigned int maxidx)
> {
> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
> + static const char *sz[2] = { "4K", "4M" };
> +#else
> + static const char *sz[4] = { "4K", "2M", "1G", "" };
> +#endif
> + unsigned int idx, s;
>
> + for (idx = 0; idx < maxidx; idx++, mr++) {
> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
Is it at all possible for !PAE to have 1G here, if you use the sz[4]
definition unconditionally?
> + pr_debug(" [mem %#010lx-%#010lx] page size %s\n",
> + mr->start, mr->end - 1, sz[s]);
> + }
> }
On Thu, Dec 05, 2019 at 10:13:11AM +0100, Peter Zijlstra wrote:
>On Thu, Dec 05, 2019 at 10:14:01AM +0800, Wei Yang wrote:
>> Generally, the mapping page size are:
>>
>> 4K, 2M, 1G
>>
>> except in case 32-bit without PAE, the mapping page size are:
>>
>> 4K, 4M
>>
>> Based on PG_LEVEL_X definition and mr->page_size_mask, we can calculate
>> the mapping page size from a predefined string array.
>>
>> Signed-off-by: Wei Yang <[email protected]>
>> ---
>> arch/x86/mm/init.c | 39 +++++++++++++--------------------------
>> 1 file changed, 13 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index 0eb5edb63fa2..ded58a31c679 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -308,29 +308,20 @@ static void __ref adjust_range_page_size_mask(struct map_range *mr,
>> }
>> }
>>
>> +static void __meminit mr_print(struct map_range *mr, unsigned int maxidx)
>> {
>> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
>> + static const char *sz[2] = { "4K", "4M" };
>> +#else
>> + static const char *sz[4] = { "4K", "2M", "1G", "" };
>> +#endif
>> + unsigned int idx, s;
>>
>> + for (idx = 0; idx < maxidx; idx++, mr++) {
>> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
>
>Is it at all possible for !PAE to have 1G here, if you use the sz[4]
>definition unconditionally?
>
You mean remove the ifdef and use sz[4] for both condition?
Then how to differentiate 4M and 2M?
>> + pr_debug(" [mem %#010lx-%#010lx] page size %s\n",
>> + mr->start, mr->end - 1, sz[s]);
>> + }
>> }
--
Wei Yang
Help you, Help me
On Fri, Dec 06, 2019 at 09:51:26AM +0800, Wei Yang wrote:
> >> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
> >> + static const char *sz[2] = { "4K", "4M" };
> >> +#else
> >> + static const char *sz[4] = { "4K", "2M", "1G", "" };
> >> +#endif
> >> + unsigned int idx, s;
> >>
> >> + for (idx = 0; idx < maxidx; idx++, mr++) {
> >> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
> >
> >Is it at all possible for !PAE to have 1G here, if you use the sz[4]
> >definition unconditionally?
> >
>
> You mean remove the ifdef and use sz[4] for both condition?
>
> Then how to differentiate 4M and 2M?
Argh, I'm blind.. I failed to spot that. N/m then.
On Fri, Dec 06, 2019 at 11:27:46AM +0100, Peter Zijlstra wrote:
>On Fri, Dec 06, 2019 at 09:51:26AM +0800, Wei Yang wrote:
>
>> >> +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
>> >> + static const char *sz[2] = { "4K", "4M" };
>> >> +#else
>> >> + static const char *sz[4] = { "4K", "2M", "1G", "" };
>> >> +#endif
>> >> + unsigned int idx, s;
>> >>
>> >> + for (idx = 0; idx < maxidx; idx++, mr++) {
>> >> + s = (mr->page_size_mask >> PG_LEVEL_2M) & (ARRAY_SIZE(sz) - 1);
>> >
>> >Is it at all possible for !PAE to have 1G here, if you use the sz[4]
>> >definition unconditionally?
>> >
>>
>> You mean remove the ifdef and use sz[4] for both condition?
>>
>> Then how to differentiate 4M and 2M?
>
>Argh, I'm blind.. I failed to spot that. N/m then.
Never mind. I always do the same thing :-(
--
Wei Yang
Help you, Help me
Hi Wei,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on tip/x86/mm]
[also build test ERROR on tip/auto-latest linus/master v5.4 next-20191206]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7f264dab5b60343358e788d4c939c166c22ea4a2
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-1) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>
Note: the linux-review/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345 HEAD 7f535395f79354bfa29cca182dd203525bcb4237 builds fine.
It only hurts bisectibility.
All errors (new ones prefixed by >>):
arch/x86/mm/init.c: In function 'save_mr':
>> arch/x86/mm/init.c:265:6: error: 'start_pfn' undeclared (first use in this function); did you mean 'start'?
if (start_pfn < end_pfn) {
^~~~~~~~~
start
arch/x86/mm/init.c:265:6: note: each undeclared identifier is reported only once for each function it appears in
>> arch/x86/mm/init.c:265:18: error: 'end_pfn' undeclared (first use in this function); did you mean 'pgd_pfn'?
if (start_pfn < end_pfn) {
^~~~~~~
pgd_pfn
vim +265 arch/x86/mm/init.c
f765090a2617b8 Pekka Enberg 2009-03-05 260
dc9dd5cc854cde Jan Beulich 2009-03-12 261 static int __meminit save_mr(struct map_range *mr, int nr_range,
51c6d529efdc86 Wei Yang 2019-12-05 262 unsigned long start, unsigned long end,
f765090a2617b8 Pekka Enberg 2009-03-05 263 unsigned long page_size_mask)
f765090a2617b8 Pekka Enberg 2009-03-05 264 {
f765090a2617b8 Pekka Enberg 2009-03-05 @265 if (start_pfn < end_pfn) {
f765090a2617b8 Pekka Enberg 2009-03-05 266 if (nr_range >= NR_RANGE_MR)
f765090a2617b8 Pekka Enberg 2009-03-05 267 panic("run out of range for init_memory_mapping\n");
51c6d529efdc86 Wei Yang 2019-12-05 268 mr[nr_range].start = start_pfn;
51c6d529efdc86 Wei Yang 2019-12-05 269 mr[nr_range].end = end_pfn;
f765090a2617b8 Pekka Enberg 2009-03-05 270 mr[nr_range].page_size_mask = page_size_mask;
f765090a2617b8 Pekka Enberg 2009-03-05 271 nr_range++;
f765090a2617b8 Pekka Enberg 2009-03-05 272 }
f765090a2617b8 Pekka Enberg 2009-03-05 273
f765090a2617b8 Pekka Enberg 2009-03-05 274 return nr_range;
f765090a2617b8 Pekka Enberg 2009-03-05 275 }
f765090a2617b8 Pekka Enberg 2009-03-05 276
:::::: The code at line 265 was first introduced by commit
:::::: f765090a2617b8d9cb73b71e0aa850c29460d8be x86: move init_memory_mapping() to common mm/init.c
:::::: TO: Pekka Enberg <[email protected]>
:::::: CC: Ingo Molnar <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
On Sat, Dec 07, 2019 at 11:36:10AM +0800, kbuild test robot wrote:
>Hi Wei,
>
>Thank you for the patch! Yet something to improve:
>
>[auto build test ERROR on tip/x86/mm]
>[also build test ERROR on tip/auto-latest linus/master v5.4 next-20191206]
>[if your patch is applied to the wrong git tree, please drop us a note to help
>improve the system. BTW, we also suggest to use '--base' option to specify the
>base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
>url: https://github.com/0day-ci/linux/commits/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345
>base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7f264dab5b60343358e788d4c939c166c22ea4a2
>config: i386-tinyconfig (attached as .config)
>compiler: gcc-7 (Debian 7.5.0-1) 7.5.0
>reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
>If you fix the issue, kindly add following tag
>Reported-by: kbuild test robot <[email protected]>
>
>Note: the linux-review/Wei-Yang/x86-mm-Remove-second-argument-of-split_mem_range/20191207-061345 HEAD 7f535395f79354bfa29cca182dd203525bcb4237 builds fine.
> It only hurts bisectibility.
>
>All errors (new ones prefixed by >>):
>
> arch/x86/mm/init.c: In function 'save_mr':
>>> arch/x86/mm/init.c:265:6: error: 'start_pfn' undeclared (first use in this function); did you mean 'start'?
> if (start_pfn < end_pfn) {
> ^~~~~~~~~
> start
> arch/x86/mm/init.c:265:6: note: each undeclared identifier is reported only once for each function it appears in
>>> arch/x86/mm/init.c:265:18: error: 'end_pfn' undeclared (first use in this function); did you mean 'pgd_pfn'?
> if (start_pfn < end_pfn) {
> ^~~~~~~
> pgd_pfn
>
Oops, introduced an error after resolving a conflict. Should be start and end.
Will correct it in next version.
--
Wei Yang
Help you, Help me