2014-04-04 06:27:32

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH V2 0/2] FAULT_AROUND_ORDER patchset performance data for powerpc

Kirill A. Shutemov with faultaround patchset introduced
vm_ops->map_pages() for mapping easy accessible pages around
fault address in hope to reduce number of minor page faults.

This patchset creates infrastructure to move the FAULT_AROUND_ORDER
to arch/ using Kconfig. This will enable architecture maintainers
to decide on suitable FAULT_AROUND_ORDER value based on
performance data for that architecture. First patch also adds
FAULT_AROUND_ORDER Kconfig element for X86. Second patch list
out the performance numbers for powerpc (platform pseries) and
adds FAULT_AROUND_ORDER Kconfig element for powerpc.

V2 Changes:
Created Kconfig parameter for FAULT_AROUND_ORDER
Added check in do_read_fault to handle FAULT_AROUND_ORDER value of 0
Made changes in commit messages.

Madhavan Srinivasan (2):
mm: move FAULT_AROUND_ORDER to arch/
mm: add FAULT_AROUND_ORDER Kconfig paramater for powerpc

arch/powerpc/platforms/pseries/Kconfig | 5 +++++
arch/x86/Kconfig | 4 ++++
include/linux/mm.h | 9 +++++++++
mm/memory.c | 12 +++++-------
4 files changed, 23 insertions(+), 7 deletions(-)

--
1.7.10.4


2014-04-04 06:27:38

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

Kirill A. Shutemov with faultaround patchset introduced
vm_ops->map_pages() for mapping easy accessible pages around
fault address in hope to reduce number of minor page faults.

This patch creates infrastructure to move the FAULT_AROUND_ORDER
to arch/ using Kconfig. This will enable architecture maintainers
to decide on suitable FAULT_AROUND_ORDER value based on
performance data for that architecture. Patch also adds
FAULT_AROUND_ORDER Kconfig element in arch/X86.

Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/x86/Kconfig | 4 ++++
include/linux/mm.h | 9 +++++++++
mm/memory.c | 12 +++++-------
3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9c0a657..5833f22 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1177,6 +1177,10 @@ config DIRECT_GBPAGES
support it. This can improve the kernel's performance a tiny bit by
reducing TLB pressure. If in doubt, say "Y".

+config FAULT_AROUND_ORDER
+ int
+ default "4"
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0bd4359..b93c1c3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -26,6 +26,15 @@ struct file_ra_state;
struct user_struct;
struct writeback_control;

+/*
+ * Fault around order is a control knob to decide the fault around pages.
+ * Default value is set to 0UL (disabled), but the arch can override it as
+ * desired.
+ */
+#ifndef CONFIG_FAULT_AROUND_ORDER
+#define CONFIG_FAULT_AROUND_ORDER 0
+#endif
+
#ifndef CONFIG_NEED_MULTIPLE_NODES /* Don't use mapnrs, do it properly */
extern unsigned long max_mapnr;

diff --git a/mm/memory.c b/mm/memory.c
index b02c584..22a4a89 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3358,10 +3358,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
update_mmu_cache(vma, address, pte);
}

-#define FAULT_AROUND_ORDER 4
-
#ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
+static unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;

static int fault_around_order_get(void *data, u64 *val)
{
@@ -3371,7 +3369,7 @@ static int fault_around_order_get(void *data, u64 *val)

static int fault_around_order_set(void *data, u64 val)
{
- BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
+ BUILD_BUG_ON((1UL << CONFIG_FAULT_AROUND_ORDER) > PTRS_PER_PTE);
if (1UL << val > PTRS_PER_PTE)
return -EINVAL;
fault_around_order = val;
@@ -3406,14 +3404,14 @@ static inline unsigned long fault_around_pages(void)
{
unsigned long nr_pages;

- nr_pages = 1UL << FAULT_AROUND_ORDER;
+ nr_pages = 1UL << CONFIG_FAULT_AROUND_ORDER;
BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
return nr_pages;
}

static inline unsigned long fault_around_mask(void)
{
- return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
+ return ~((1UL << (PAGE_SHIFT + CONFIG_FAULT_AROUND_ORDER)) - 1);
}
#endif

@@ -3471,7 +3469,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
* if page by the offset is not ready to be mapped (cold cache or
* something).
*/
- if (vma->vm_ops->map_pages) {
+ if ((vma->vm_ops->map_pages) && (fault_around_pages() > 1)) {
pte = pte_offset_map_lock(mm, pmd, address, &ptl);
do_fault_around(vma, address, pte, pgoff, flags);
if (!pte_same(*pte, orig_pte))
--
1.7.10.4

2014-04-04 06:27:42

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH V2 2/2] mm: add FAULT_AROUND_ORDER Kconfig paramater for powerpc

Performance data for different FAULT_AROUND_ORDER values from 4 socket
Power7 system (128 Threads and 128GB memory) is below. perf stat with
repeat of 5 is used to get the stddev values. This patch create
FAULT_AROUND_ORDER Kconfig parameter and defaults it to 3 based on the
performance data.

FAULT_AROUND_ORDER Baseline 1 3 4 5 7

Linux build (make -j64)
minor-faults 7184385 5874015 4567289 4318518 4193815 4159193
times in seconds 61.433776136 60.865935292 59.245368038 60.630675011 60.56587624 59.828271924
stddev for time ( +- 1.18% ) ( +- 1.78% ) ( +- 0.44% ) ( +- 2.03% ) ( +- 1.66% ) ( +- 1.45% )

Linux rebuild (make -j64)
minor-faults 303018 226392 146170 132480 126878 126236
times in seconds 5.659819172 5.723996942 5.591238319 5.622533357 5.878811995 5.550133096
stddev for time ( +- 0.71% ) ( +- 0.78% ) ( +- 0.62% ) ( +- 0.45% ) ( +- 1.55% ) ( +- 0.29% )

Two synthetic tests: access every word in file in sequential/random order.
Marginal Performance gains seen for FAO value of 3 when compared to value
of 4.

Sequential access 16GiB file
FAULT_AROUND_ORDER Baseline 1 3 4 5 7
1 thread
minor-faults 262302 131192 32873 16486 8291 2351
times in seconds 53.071497352 52.945826882 52.931417302 52.928577184 52.859285439 53.116800539
stddev for time ( +- 0.01% ) ( +- 0.02% ) ( +- 0.02% ) ( +- 0.04% ) ( +- 0.04% ) ( +- 0.01% )
8 threads
minor-faults 2097314 1051046 263336 131715 66098 16653
times in seconds 54.385698561 54.603652339 54.771282004 54.488565674 54.496701531 54.962142189
stddev for time ( +- 0.05% ) ( +- 0.02% ) ( +- 0.37% ) ( +- 0.08% ) ( +- 0.07% ) ( +- 0.51% )
32 threads
minor-faults 8389267 4218595 1059961 531319 266463 67271
times in seconds 60.61715047 60.827964038 60.46412673 60.266045885 60.492398315 60.24531921
stddev for time ( +- 0.65% ) ( +- 0.21% ) ( +- 0.25% ) ( +- 0.29% ) ( +- 0.19% ) ( +- 0.35% )
64 threads
minor-faults 16777455 8485998 2178582 1092106 544302 137693
times in seconds 86.471334554 84.412415735 85.208303832 84.331473392 85.598793479 84.695469266
stddev for time ( +- 0.60% ) ( +- 1.47% ) ( +- 0.74% ) ( +- 1.55% ) ( +- 0.92% ) ( +- 1.16% )
128 threads
minor-faults 33555267 17734522 4710107 2380821 1182707 292077
times in seconds 117.535385569 114.291359037 112.593908276 113.081807611 114.358686588 114.491043011
stddev for time ( +- 2.24% ) ( +- 0.92% ) ( +- 0.36% ) ( +- 0.53% ) ( +- 0.70% ) ( +- 0.53% )

Random access 1GiB file
FAULT_AROUND_ORDER Baseline 1 3 4 5 7
1 thread
minor-faults 16503 8664 2149 1126 610 437
times in seconds 43.843573808 48.042069805 50.580779682 54.282884593 52.641739876 51.803302129
stddev for time ( +- 1.30% ) ( +- 2.25% ) ( +- 2.92% ) ( +- 1.44% ) ( +- 4.49% ) ( +- 3.78% )
8 threads
minor-faults 131201 70916 17760 8665 4250 1149
times in seconds 46.262626804 55.942851041 56.629191584 57.97044714 55.417557594 56.019709166
stddev for time ( +- 4.66% ) ( +- 1.52% ) ( +- 1.43% ) ( +- 1.61% ) ( +- 0.65% ) ( +- 1.27% )
32 threads
minor-faults 524959 265980 67282 33601 16930 4316
times in seconds 67.754175928 69.85012331 71.750338061 71.053074643 68.90728294 71.250103217
stddev for time ( +- 3.79% ) ( +- 0.77% ) ( +- 1.15% ) ( +- 1.08% ) ( +- 2.14% ) ( +- 1.17% )
64 threads
minor-faults 1048831 528829 133256 66700 33428 8776
times in seconds 96.674025305 93.109961822 87.441777715 91.986332028 88.686748472 93.101434306
stddev for time ( +- 2.85% ) ( +- 2.42% ) ( +- 0.42% ) ( +- 1.58% ) ( +- 1.29% ) ( +- 2.01% )
128 threads
minor-faults 2098043 1053224 266271 133702 66966 17276
times in seconds 156.525792044 152.117971403 147.523673243 148.560226602 148.596575663 149.389288429
stddev for time ( +- 2.39% ) ( +- 0.71% ) ( +- 0.72% ) ( +- 0.35% ) ( +- 0.41% ) ( +- 1.04% )

Worst case scenario: we touch one page every 16M to demonstrate overhead.

Touch only one page in page table in 16GiB file
FAULT_AROUND_ORDER Baseline 1 3 4 5 7
1 thread
minor-faults 1077 1064 1051 1048 1046 1045
times in seconds 0.00615347 0.008327379 0.019775282 0.034444003 0.05905971 0.220863339
stddev for time ( +- 3.12% ) ( +- 2.29% ) ( +- 1.68% ) ( +- 1.40% ) ( +- 1.68% ) ( +- 1.53% )
8 threads
minor-faults 8252 8239 8226 8223 8220 8224
times in seconds 0.04387392 0.059859294 0.113897648 0.199707764 0.361585762 1.343366843
stddev for time ( +- 3.66% ) ( +- 2.18% ) ( +- 1.44% ) ( +- 2.40% ) ( +- 2.08% ) ( +- 1.75% )
32 threads
minor-faults 32852 32841 32825 32826 32824 32828
times in seconds 0.191404544 0.21907773 0.433207123 0.72430447 1.334983196 4.97727449
stddev for time ( +- 5.08% ) ( +- 3.19% ) ( +- 4.45% ) ( +- 2.18% ) ( +- 2.75% ) ( +- 1.52% )
64 threads
minor-faults 65652 65642 65629 65622 65623 65634
times in seconds 0.402140429 0.510806718 0.854288645 1.412329805 2.556707704 8.711074863
stddev for time ( +- 2.90% ) ( +- 3.04% ) ( +- 1.83% ) ( +- 2.75% ) ( +- 2.49% ) ( +- 0.68% )
128 threads
minor-faults 131255 131239 131228 131228 131229 131243
times in seconds 0.817782148 1.124631348 2.023730928 3.184792382 5.331392072 17.309524609
stddev for time ( +- 3.35% ) ( +- 11.74% ) ( +- 4.55% ) ( +- 2.00% ) ( +- 1.31% ) ( +- 1.30% )

Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/platforms/pseries/Kconfig | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 2cb8b77..2246d9f 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -132,3 +132,8 @@ config DTL
which are accessible through a debugfs file.

Say N if you are unsure.
+
+config FAULT_AROUND_ORDER
+ int
+ default "3"
+
--
1.7.10.4

2014-04-04 07:03:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm: add FAULT_AROUND_ORDER Kconfig paramater for powerpc


* Madhavan Srinivasan <[email protected]> wrote:

> Performance data for different FAULT_AROUND_ORDER values from 4 socket
> Power7 system (128 Threads and 128GB memory) is below. perf stat with
> repeat of 5 is used to get the stddev values. This patch create
> FAULT_AROUND_ORDER Kconfig parameter and defaults it to 3 based on the
> performance data.
>
> FAULT_AROUND_ORDER Baseline 1 3 4 5 7
>
> Linux build (make -j64)
> minor-faults 7184385 5874015 4567289 4318518 4193815 4159193
> times in seconds 61.433776136 60.865935292 59.245368038 60.630675011 60.56587624 59.828271924
> stddev for time ( +- 1.18% ) ( +- 1.78% ) ( +- 0.44% ) ( +- 2.03% ) ( +- 1.66% ) ( +- 1.45% )

Ok, this is better, but it is still rather incomplete statistically,
please also calculate the percentage difference to baseline, so that
the stddev becomes meaningful and can be compared to something!

As an example I did this for the first line of measurements (all
errors in the numbers are mine, this was done manually), and it gives:

> stddev for time ( +- 1.18% ) ( +- 1.78% ) ( +- 0.44% ) ( +- 2.03% ) ( +- 1.66% ) ( +- 1.45% )
+0.9% +3.5% +1.3% +1.4% +2.6%

This shows that there is probably a statistically significant
(positiv) effect from the change, but from these numbers alone I would
not draw any quantitative (sizing, tuning) conclusions, because in 3
out of 5 cases the stddev was larger than the effect, so the resulting
percentages are not comparable.

Please do this calculation for all the other lines as well and also
close all the numbers with a conclusion section where you *analyze*
the results, outline the statistics and compare the various workloads
and how the tuning affects them and don't force the readers of the
commit guess what it all means and how significant it all is!

Thanks,

Ingo

2014-04-04 07:11:09

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm: add FAULT_AROUND_ORDER Kconfig paramater for powerpc


* Ingo Molnar <[email protected]> wrote:

> * Madhavan Srinivasan <[email protected]> wrote:
>
> > Performance data for different FAULT_AROUND_ORDER values from 4
> > socket Power7 system (128 Threads and 128GB memory) is below. perf
> > stat with repeat of 5 is used to get the stddev values. This patch
> > create FAULT_AROUND_ORDER Kconfig parameter and defaults it to 3
> > based on the performance data.
> >
> > FAULT_AROUND_ORDER Baseline 1 3 4 5 7
> >
> > Linux build (make -j64)
> > minor-faults 7184385 5874015 4567289 4318518 4193815 4159193
> > times in seconds 61.433776136 60.865935292 59.245368038 60.630675011 60.56587624 59.828271924
> > stddev for time ( +- 1.18% ) ( +- 1.78% ) ( +- 0.44% ) ( +- 2.03% ) ( +- 1.66% ) ( +- 1.45% )
>
> Ok, this is better, but it is still rather incomplete statistically,
> please also calculate the percentage difference to baseline, so that
> the stddev becomes meaningful and can be compared to something!
>
> As an example I did this for the first line of measurements (all
> errors in the numbers are mine, this was done manually), and it
> gives:
>
> > stddev for time ( +- 1.18% ) ( +- 1.78% ) ( +- 0.44% ) ( +- 2.03% ) ( +- 1.66% ) ( +- 1.45% )
> +0.9% +3.5% +1.3% +1.4% +2.6%
>
> This shows that there is probably a statistically significant
> (positiv) effect from the change, but from these numbers alone I
> would not draw any quantitative (sizing, tuning) conclusions,
> because in 3 out of 5 cases the stddev was larger than the effect,
> so the resulting percentages are not comparable.

Also note that because we calculate the percentage by dividing result
with baseline, the stddev of the two values roughly adds up. So for
example the second column the true noise is around 1.5%, not 0.4%

So for good sizing decisions the stddev must be 'comfortably' below
the effect. (or sizing should be done based on the other workloads yu
tested, I have not checked them.)

It also makes sense to run more measurements to reduce the stddev of
the baseline. So if each measurement is run 3 times then it makes
sense to run the baseline 6 times, this gives a ~30% improvement in
the confidence of our result, at just a small increase in test time.

[ For such cases it might also make sense to script all of that,
combined with a debug patch that puts the tuned fault-around value
into a dynamic knob in /proc/sys/, so that you can run the full
measurement in a single pass, with no reboot and with no human
intervention. ]

Thanks,

Ingo

2014-04-04 13:20:55

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Fri, Apr 04, 2014 at 11:57:14AM +0530, Madhavan Srinivasan wrote:
> Kirill A. Shutemov with faultaround patchset introduced
> vm_ops->map_pages() for mapping easy accessible pages around
> fault address in hope to reduce number of minor page faults.
>
> This patch creates infrastructure to move the FAULT_AROUND_ORDER
> to arch/ using Kconfig. This will enable architecture maintainers
> to decide on suitable FAULT_AROUND_ORDER value based on
> performance data for that architecture. Patch also adds
> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Signed-off-by: Madhavan Srinivasan <[email protected]>
> ---
> arch/x86/Kconfig | 4 ++++
> include/linux/mm.h | 9 +++++++++
> mm/memory.c | 12 +++++-------
> 3 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9c0a657..5833f22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1177,6 +1177,10 @@ config DIRECT_GBPAGES
> support it. This can improve the kernel's performance a tiny bit by
> reducing TLB pressure. If in doubt, say "Y".
>
> +config FAULT_AROUND_ORDER
> + int
> + default "4"
> +
> # Common NUMA Features
> config NUMA
> bool "Numa Memory Allocation and Scheduler Support"
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0bd4359..b93c1c3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -26,6 +26,15 @@ struct file_ra_state;
> struct user_struct;
> struct writeback_control;
>
> +/*
> + * Fault around order is a control knob to decide the fault around pages.
> + * Default value is set to 0UL (disabled), but the arch can override it as
> + * desired.
> + */
> +#ifndef CONFIG_FAULT_AROUND_ORDER
> +#define CONFIG_FAULT_AROUND_ORDER 0
> +#endif
> +

I don't think it should be in header file: nobody except mm/memory.c cares.
Just put it instead '#define FAULT_AROUND_ORDER'.

> #ifndef CONFIG_NEED_MULTIPLE_NODES /* Don't use mapnrs, do it properly */
> extern unsigned long max_mapnr;
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b02c584..22a4a89 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3358,10 +3358,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
> update_mmu_cache(vma, address, pte);
> }
>
> -#define FAULT_AROUND_ORDER 4
> -
> #ifdef CONFIG_DEBUG_FS
> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
> +static unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
>
> static int fault_around_order_get(void *data, u64 *val)
> {
> @@ -3371,7 +3369,7 @@ static int fault_around_order_get(void *data, u64 *val)
>
> static int fault_around_order_set(void *data, u64 val)
> {
> - BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
> + BUILD_BUG_ON((1UL << CONFIG_FAULT_AROUND_ORDER) > PTRS_PER_PTE);
> if (1UL << val > PTRS_PER_PTE)
> return -EINVAL;
> fault_around_order = val;
> @@ -3406,14 +3404,14 @@ static inline unsigned long fault_around_pages(void)
> {
> unsigned long nr_pages;
>
> - nr_pages = 1UL << FAULT_AROUND_ORDER;
> + nr_pages = 1UL << CONFIG_FAULT_AROUND_ORDER;
> BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
> return nr_pages;
> }
>
> static inline unsigned long fault_around_mask(void)
> {
> - return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
> + return ~((1UL << (PAGE_SHIFT + CONFIG_FAULT_AROUND_ORDER)) - 1);
> }
> #endif
>
> @@ -3471,7 +3469,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> * if page by the offset is not ready to be mapped (cold cache or
> * something).
> */
> - if (vma->vm_ops->map_pages) {
> + if ((vma->vm_ops->map_pages) && (fault_around_pages() > 1)) {

if (vma->vm_ops->map_pages && fault_around_pages()) {

> pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> do_fault_around(vma, address, pte, pgoff, flags);
> if (!pte_same(*pte, orig_pte))
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Kirill A. Shutemov

2014-04-04 16:20:13

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
> This patch creates infrastructure to move the FAULT_AROUND_ORDER
> to arch/ using Kconfig. This will enable architecture maintainers
> to decide on suitable FAULT_AROUND_ORDER value based on
> performance data for that architecture. Patch also adds
> FAULT_AROUND_ORDER Kconfig element in arch/X86.

Please don't do it this way.

In mm/Kconfig, put

config FAULT_AROUND_ORDER
int
default 1234 if POWERPC
default 4

The way you have it now, every single architecture that needs to enable
this has to go put that in their Kconfig. That's madness. This way,
you only put it in one place, and folks only have to care if they want
to change the default to be something other than 4.

2014-04-04 17:49:24

by David Miller

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

From: Dave Hansen <[email protected]>
Date: Fri, 04 Apr 2014 09:18:43 -0700

> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>> to arch/ using Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also adds
>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Please don't do it this way.
>
> In mm/Kconfig, put
>
> config FAULT_AROUND_ORDER
> int
> default 1234 if POWERPC
> default 4
>
> The way you have it now, every single architecture that needs to enable
> this has to go put that in their Kconfig. That's madness. This way,
> you only put it in one place, and folks only have to care if they want
> to change the default to be something other than 4.

It looks more like it's necessary only to change the default, not
to enable it. Unless I read his patch wrong...

2014-04-07 07:06:08

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Fri, 2014-04-04 at 09:18 -0700, Dave Hansen wrote:
> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
> > This patch creates infrastructure to move the FAULT_AROUND_ORDER
> > to arch/ using Kconfig. This will enable architecture maintainers
> > to decide on suitable FAULT_AROUND_ORDER value based on
> > performance data for that architecture. Patch also adds
> > FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Please don't do it this way.
>
> In mm/Kconfig, put
>
> config FAULT_AROUND_ORDER
> int
> default 1234 if POWERPC
> default 4
>
> The way you have it now, every single architecture that needs to enable
> this has to go put that in their Kconfig. That's madness. This way,
> you only put it in one place, and folks only have to care if they want
> to change the default to be something other than 4.

Also does it have to be a constant ? Maddy here tested on our POWER
servers. The "Sweet spot" value might be VERY different on an embedded
chip or even on a future generation of server chip.

Cheers,
Ben.

2014-04-09 01:14:56

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Friday 04 April 2014 06:47 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 04, 2014 at 11:57:14AM +0530, Madhavan Srinivasan wrote:
>> Kirill A. Shutemov with faultaround patchset introduced
>> vm_ops->map_pages() for mapping easy accessible pages around
>> fault address in hope to reduce number of minor page faults.
>>
>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>> to arch/ using Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also adds
>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>>
>> Signed-off-by: Madhavan Srinivasan <[email protected]>
>> ---
>> arch/x86/Kconfig | 4 ++++
>> include/linux/mm.h | 9 +++++++++
>> mm/memory.c | 12 +++++-------
>> 3 files changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 9c0a657..5833f22 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1177,6 +1177,10 @@ config DIRECT_GBPAGES
>> support it. This can improve the kernel's performance a tiny bit by
>> reducing TLB pressure. If in doubt, say "Y".
>>
>> +config FAULT_AROUND_ORDER
>> + int
>> + default "4"
>> +
>> # Common NUMA Features
>> config NUMA
>> bool "Numa Memory Allocation and Scheduler Support"
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 0bd4359..b93c1c3 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -26,6 +26,15 @@ struct file_ra_state;
>> struct user_struct;
>> struct writeback_control;
>>
>> +/*
>> + * Fault around order is a control knob to decide the fault around pages.
>> + * Default value is set to 0UL (disabled), but the arch can override it as
>> + * desired.
>> + */
>> +#ifndef CONFIG_FAULT_AROUND_ORDER
>> +#define CONFIG_FAULT_AROUND_ORDER 0
>> +#endif
>> +
>
> I don't think it should be in header file: nobody except mm/memory.c cares.
> Just put it instead '#define FAULT_AROUND_ORDER'.
>

Ok. Will do this change.

>> #ifndef CONFIG_NEED_MULTIPLE_NODES /* Don't use mapnrs, do it properly */
>> extern unsigned long max_mapnr;
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b02c584..22a4a89 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3358,10 +3358,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>> update_mmu_cache(vma, address, pte);
>> }
>>
>> -#define FAULT_AROUND_ORDER 4
>> -
>> #ifdef CONFIG_DEBUG_FS
>> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
>> +static unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
>>
>> static int fault_around_order_get(void *data, u64 *val)
>> {
>> @@ -3371,7 +3369,7 @@ static int fault_around_order_get(void *data, u64 *val)
>>
>> static int fault_around_order_set(void *data, u64 val)
>> {
>> - BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>> + BUILD_BUG_ON((1UL << CONFIG_FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>> if (1UL << val > PTRS_PER_PTE)
>> return -EINVAL;
>> fault_around_order = val;
>> @@ -3406,14 +3404,14 @@ static inline unsigned long fault_around_pages(void)
>> {
>> unsigned long nr_pages;
>>
>> - nr_pages = 1UL << FAULT_AROUND_ORDER;
>> + nr_pages = 1UL << CONFIG_FAULT_AROUND_ORDER;
>> BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
>> return nr_pages;
>> }
>>
>> static inline unsigned long fault_around_mask(void)
>> {
>> - return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
>> + return ~((1UL << (PAGE_SHIFT + CONFIG_FAULT_AROUND_ORDER)) - 1);
>> }
>> #endif
>>
>> @@ -3471,7 +3469,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>> * if page by the offset is not ready to be mapped (cold cache or
>> * something).
>> */
>> - if (vma->vm_ops->map_pages) {
>> + if ((vma->vm_ops->map_pages) && (fault_around_pages() > 1)) {
>
> if (vma->vm_ops->map_pages && fault_around_pages()) {
>
For a fault around value of 0, fault_around_pages() will return 1 and
that is reason for checking it greater than 1. Also, using debug fs,
fault around value can be zeroed.

With regards
Maddy
>> pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>> do_fault_around(vma, address, pte, pgoff, flags);
>> if (!pte_same(*pte, orig_pte))
>> --
>> 1.7.10.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>

2014-04-09 01:32:38

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Friday 04 April 2014 09:48 PM, Dave Hansen wrote:
> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>> to arch/ using Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also adds
>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Please don't do it this way.
>
> In mm/Kconfig, put
>
> config FAULT_AROUND_ORDER
> int
> default 1234 if POWERPC
> default 4
>
> The way you have it now, every single architecture that needs to enable
> this has to go put that in their Kconfig. That's madness. This way,

I though about it and decided not to do this way because, in future,
sub platforms of the architecture may decide to change the values. Also,
adding an if line for each architecture with different sub platforms
oring to it will look messy.

With regards
Maddy

> you only put it in one place, and folks only have to care if they want
> to change the default to be something other than 4.
>

2014-04-09 01:45:00

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Friday 04 April 2014 11:20 PM, David Miller wrote:
> From: Dave Hansen <[email protected]>
> Date: Fri, 04 Apr 2014 09:18:43 -0700
>
>> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
>>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>>> to arch/ using Kconfig. This will enable architecture maintainers
>>> to decide on suitable FAULT_AROUND_ORDER value based on
>>> performance data for that architecture. Patch also adds
>>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>>
>> Please don't do it this way.
>>
>> In mm/Kconfig, put
>>
>> config FAULT_AROUND_ORDER
>> int
>> default 1234 if POWERPC
>> default 4
>>
>> The way you have it now, every single architecture that needs to enable
>> this has to go put that in their Kconfig. That's madness. This way,
>> you only put it in one place, and folks only have to care if they want
>> to change the default to be something other than 4.
>
> It looks more like it's necessary only to change the default, not
> to enable it. Unless I read his patch wrong...
>
Yes. With current patch, you only need to change the default by which
you enable it.

With regards
Maddy
>

2014-04-09 08:20:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Wed, Apr 09, 2014 at 07:02:02AM +0530, Madhavan Srinivasan wrote:
> On Friday 04 April 2014 09:48 PM, Dave Hansen wrote:
> > On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
> >> This patch creates infrastructure to move the FAULT_AROUND_ORDER
> >> to arch/ using Kconfig. This will enable architecture maintainers
> >> to decide on suitable FAULT_AROUND_ORDER value based on
> >> performance data for that architecture. Patch also adds
> >> FAULT_AROUND_ORDER Kconfig element in arch/X86.
> >
> > Please don't do it this way.
> >
> > In mm/Kconfig, put
> >
> > config FAULT_AROUND_ORDER
> > int
> > default 1234 if POWERPC
> > default 4
> >
> > The way you have it now, every single architecture that needs to enable
> > this has to go put that in their Kconfig. That's madness. This way,
>
> I though about it and decided not to do this way because, in future,
> sub platforms of the architecture may decide to change the values. Also,
> adding an if line for each architecture with different sub platforms
> oring to it will look messy.

This still misses out on Ben's objection that its impossible to get this
right at compile time for many kernels, since they can boot and run on
many different subarchs.

2014-04-09 15:51:05

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On 04/08/2014 06:32 PM, Madhavan Srinivasan wrote:
>> > In mm/Kconfig, put
>> >
>> > config FAULT_AROUND_ORDER
>> > int
>> > default 1234 if POWERPC
>> > default 4
>> >
>> > The way you have it now, every single architecture that needs to enable
>> > this has to go put that in their Kconfig. That's madness. This way,
> I though about it and decided not to do this way because, in future,
> sub platforms of the architecture may decide to change the values. Also,
> adding an if line for each architecture with different sub platforms
> oring to it will look messy.

I'm not sure why I'm trying here any more. You do seem quite content to
add as much cruft to ppc and every other architecture as possible. If
your theoretical scenario pops up, you simply do this in ppc:

config ARCH_FAULT_AROUND_ORDER
int
default 999
default 888 if OTHER_SILLY_POWERPC_SUBARCH

But *ONLY* in the architectures that care about doing that stuff. You
leave every other architecture on the planet alone. Then, in mm/Kconfig:

config FAULT_AROUND_ORDER
int
default ARCH_FAULT_AROUND_ORDER if ARCH_FAULT_AROUND_ORDER
default 4

Your way still requires going and individually touching every single
architecture's Kconfig that wants to enable fault around. That's not an
acceptable solution.

2014-04-09 15:52:00

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On 04/09/2014 01:20 AM, Peter Zijlstra wrote:
> This still misses out on Ben's objection that its impossible to get this
> right at compile time for many kernels, since they can boot and run on
> many different subarchs.

Completely agree. The Kconfig-time stuff should probably just be a knob
to turn it off completely, if anything.

2014-04-10 08:29:43

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

On Wednesday 09 April 2014 09:18 PM, Dave Hansen wrote:
> On 04/09/2014 01:20 AM, Peter Zijlstra wrote:
>> This still misses out on Ben's objection that its impossible to get this
>> right at compile time for many kernels, since they can boot and run on
>> many different subarchs.
>
> Completely agree. The Kconfig-time stuff should probably just be a knob
> to turn it off completely, if anything.
>

ok. Here is my thought. So to address Ben's concern, it would be better
to have this as a variable with a default value (and the platform can
override ride it). And a mm/Kconfig to disable it?
Kindly let me know whether this will work.

Thanks for review comments.
With regards
Maddy

2014-04-22 07:39:27

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH V2 1/2] mm: move FAULT_AROUND_ORDER to arch/

Dave Hansen <[email protected]> writes:
> On 04/08/2014 06:32 PM, Madhavan Srinivasan wrote:
>>> > In mm/Kconfig, put
>>> >
>>> > config FAULT_AROUND_ORDER
>>> > int
>>> > default 1234 if POWERPC
>>> > default 4
>>> >
>>> > The way you have it now, every single architecture that needs to enable
>>> > this has to go put that in their Kconfig. That's madness. This way,
>> I though about it and decided not to do this way because, in future,
>> sub platforms of the architecture may decide to change the values. Also,
>> adding an if line for each architecture with different sub platforms
>> oring to it will look messy.
>
> I'm not sure why I'm trying here any more. You do seem quite content to
> add as much cruft to ppc and every other architecture as possible. If
> your theoretical scenario pops up, you simply do this in ppc:
>
> config ARCH_FAULT_AROUND_ORDER
> int
> default 999
> default 888 if OTHER_SILLY_POWERPC_SUBARCH
>
> But *ONLY* in the architectures that care about doing that stuff. You
> leave every other architecture on the planet alone. Then, in mm/Kconfig:
>
> config FAULT_AROUND_ORDER
> int
> default ARCH_FAULT_AROUND_ORDER if ARCH_FAULT_AROUND_ORDER
> default 4
>
> Your way still requires going and individually touching every single
> architecture's Kconfig that wants to enable fault around. That's not an
> acceptable solution.

Why bother with Kconfig at all? It seems like a weird indirection.

And talking about future tuning seems like a separate issue, if and when
someone does the work. For the moment, let's keep it simple (as below).

If you really want Kconfig, then just go straight from
ARCH_FAULT_AROUND_ORDER, ie:

#ifdef CONFIG_ARCH_FAULT_AROUND_ORDER
#define FAULT_AROUND_ORDER CONFIG_ARCH_FAULT_AROUND_ORDER
#else
#define FAULT_AROUND_ORDER 4
#endif

Then powerpc's Kconfig defines CONFIG_ARCH_FAULT_AROUND_ORDER, and
we're done.

Cheers,
Rusty.

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 32e4e212b9c1..b519c5c53cfc 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -412,4 +412,7 @@ typedef struct page *pgtable_t;
#include <asm-generic/memory_model.h>
#endif /* __ASSEMBLY__ */

+/* Measured on a 4 socket Power7 system (128 Threads and 128GB memory) */
+#define FAULT_AROUND_ORDER 3
+
#endif /* _ASM_POWERPC_PAGE_H */
diff --git a/mm/memory.c b/mm/memory.c
index d0f0bef3be48..9aa47e9ec7ba 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3382,7 +3382,10 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
update_mmu_cache(vma, address, pte);
}

+/* Archs can override, but this seems to work for x86. */
+#ifndef FAULT_AROUND_ORDER
#define FAULT_AROUND_ORDER 4
+#endif

#ifdef CONFIG_DEBUG_FS
static unsigned int fault_around_order = FAULT_AROUND_ORDER;