LinuxLists.cc - [PATCH v2 0/2] mm, thp: Fix unnecessary resource consumption in swapin

2016-03-13 09:29:21

Subject: [PATCH v2 0/2] mm, thp: Fix unnecessary resource consumption in swapin

This patch series fixes unnecessary resource consumption
in khugepaged swapin and introduces a new function to
calculate the value of a specific vm event.

Ebru Akagunduz (2):
mm, vmstat: calculate particular vm event
mm, thp: avoid unnecessary swapin in khugepaged

include/linux/vmstat.h | 2 ++
mm/huge_memory.c | 13 +++++++++++--
mm/vmstat.c | 12 ++++++++++++
3 files changed, 25 insertions(+), 2 deletions(-)

--
1.9.1

2016-03-13 09:29:33

by Ebru Akagunduz

Subject: [PATCH v2 1/2] mm, vmstat: calculate particular vm event

Currently, vmstat can calculate a specific vm event only with
all_vm_events(), which requires an array of all vm events,
allocated on the stack. This patch introduces a helper that sums
the value of a specific vm event over all online CPUs, without
loading all the events.

Signed-off-by: Ebru Akagunduz <[email protected]>
---
Changes in v2:
- this patch is newly created in this version
- create a sum event function to
calculate a particular vm event (Kirill A. Shutemov)

include/linux/vmstat.h | 2 ++
mm/vmstat.c | 12 ++++++++++++
2 files changed, 14 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 73fae8c..add0cc1 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -53,6 +53,8 @@ static inline void count_vm_events(enum vm_event_item item, long delta)

extern void all_vm_events(unsigned long *);

+extern unsigned long sum_vm_event(enum vm_event_item item);
+
extern void vm_events_fold_cpu(int cpu);

#else
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5e43004..b76d664 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -34,6 +34,18 @@
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
EXPORT_PER_CPU_SYMBOL(vm_event_states);

+unsigned long sum_vm_event(enum vm_event_item item)
+{
+ int cpu;
+ unsigned long ret = 0;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ ret += per_cpu(vm_event_states, cpu).event[item];
+ put_online_cpus();
+ return ret;
+}
+
static void sum_vm_events(unsigned long *ret)
{
int cpu;
--
1.9.1
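
The difference between the existing all_vm_events() path and the new helper can be sketched in userspace. This is a simplified model, not kernel code: the per-CPU vm_event_states data is modelled as a plain array over a fixed, hypothetical number of CPUs (NR_CPUS_MODEL), the event list is cut down to two entries, and the CPU-hotplug protection (get_online_cpus()/put_online_cpus()) is omitted.

```c
/* Cut-down stand-in for enum vm_event_item (two events only). */
enum vm_event_item { PGFAULT, ALLOCSTALL, NR_VM_EVENT_ITEMS };

/* Hypothetical CPU count for the model. */
#define NR_CPUS_MODEL 4

struct vm_event_state {
	unsigned long event[NR_VM_EVENT_ITEMS];
};

/* One counter block per modelled CPU, like the per-CPU vm_event_states. */
static struct vm_event_state vm_event_states[NR_CPUS_MODEL];

/* Existing approach: fold every event into a caller-supplied array with
 * one slot per event type (on the caller's stack in the kernel). */
static void sum_vm_events(unsigned long *ret)
{
	int cpu, i;

	for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
		ret[i] = 0;
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
			ret[i] += vm_event_states[cpu].event[i];
}

/* New approach: sum a single event over all CPUs; no array needed. */
static unsigned long sum_vm_event(enum vm_event_item item)
{
	unsigned long ret = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		ret += vm_event_states[cpu].event[item];
	return ret;
}
```

The single-event version needs no caller-supplied array, which is the point of the patch; both versions still walk every CPU.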

2016-03-13 09:29:45

by Ebru Akagunduz

Subject: [PATCH v2 2/2] mm, thp: avoid unnecessary swapin in khugepaged

Currently, khugepaged performs swapin readahead to improve
the THP collapse rate. This patch checks vm statistics
to avoid the swapin workload when it is unnecessary, so
that khugepaged does not consume resources on swapin
while the system is under pressure.

The patch was tested with a test program that allocates
800MB of memory, writes to it, and then sleeps. The system
was forced to swap out all of it. Afterwards, the test program
touches the area by writing to it, skipping one page in every
20 pages of the area. While the remaining part was waiting for
swapin readahead, the system was forced to be busy doing page
reclaim. Although there was enough free memory during the
test, khugepaged did not perform swapin readahead because the
system was busy with reclaim.
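
The test program itself is not included in the thread. The access pattern described above can be sketched as follows; this is a hypothetical reconstruction, and the helper names alloc_and_fill() and touch_skipping() are illustrative. The forced swap-out between the two phases happens outside the program and is not shown.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The description says one page in every 20 is skipped. */
#define SKIP_EVERY 20

/* Phase 1: allocate the region and write to all of it so that every
 * page is backed by anonymous memory (800MB in the described test). */
static char *alloc_and_fill(size_t len)
{
	char *buf = malloc(len);

	if (buf)
		memset(buf, 0x5a, len);
	return buf;
}

/* Phase 2, run after the region has been swapped out: write one byte to
 * every page except one in every SKIP_EVERY pages, leaving the skipped
 * pages in swap for khugepaged's swapin readahead.
 * Returns the number of pages written. */
static size_t touch_skipping(char *buf, size_t len, size_t page_size)
{
	size_t npages = len / page_size, touched = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		if (i % SKIP_EVERY == 0)
			continue;	/* skipped page stays swapped out */
		buf[i * page_size] = 1;
		touched++;
	}
	return touched;
}
```

For the run described above, len would be 800MB and page_size would come from sysconf(_SC_PAGESIZE); the program sleeps between the two phases while the area is swapped out.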

Test results:

After swap-out
------------------------------------------------------------------
              | Anonymous | AnonHugePages | Swap      | Fraction |
------------------------------------------------------------------
With patch    | 325784 kB | 325632 kB     | 474216 kB | 99%      |
------------------------------------------------------------------
Without patch | 351308 kB | 350208 kB     | 448692 kB | 99%      |
------------------------------------------------------------------

After swap-in (waiting 10 minutes)
------------------------------------------------------------------
              | Anonymous | AnonHugePages | Swap      | Fraction |
------------------------------------------------------------------
With patch    | 714164 kB | 489472 kB     | 85836 kB  | 68%      |
------------------------------------------------------------------
Without patch | 586816 kB | 464896 kB     | 213184 kB | 79%      |
------------------------------------------------------------------
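
The thread does not define the Fraction column. The figures are consistent with Fraction being AnonHugePages as a percentage of Anonymous, truncated to an integer, and with Anonymous + Swap equalling 800000 kB in every row (the size of the test area). The sketch below checks both readings against the table; the struct and function names are illustrative only.

```c
/* Figures (kB) copied from the tables above. */
struct meminfo_row {
	const char *label;
	unsigned long anon_kb;	/* Anonymous */
	unsigned long thp_kb;	/* AnonHugePages */
	unsigned long swap_kb;	/* Swap */
};

/* Assumed definition of "Fraction": AnonHugePages as a percentage of
 * Anonymous, truncated by integer division. */
static unsigned long thp_fraction(const struct meminfo_row *r)
{
	return r->thp_kb * 100 / r->anon_kb;
}

/* Resident anonymous memory plus swapped-out memory. */
static unsigned long total_kb(const struct meminfo_row *r)
{
	return r->anon_kb + r->swap_kb;
}

static const struct meminfo_row rows[] = {
	{ "swap-out, with patch",    325784, 325632, 474216 },
	{ "swap-out, without patch", 351308, 350208, 448692 },
	{ "swap-in, with patch",     714164, 489472,  85836 },
	{ "swap-in, without patch",  586816, 464896, 213184 },
};
```

Rounding to nearest instead of truncating would give 100% for both swap-out rows and 69% for the patched swap-in row, so truncation appears to be what produced the table.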

Signed-off-by: Ebru Akagunduz <[email protected]>
Fixes: 363cd76e5b11c ("mm: make swapin readahead to improve thp collapse rate")
---
Changes in v2:
- Add a Fixes: tag to specify which patch is fixed (Ebru Akagunduz)
- Fix commit subject line (Ebru Akagunduz)

mm/huge_memory.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86e9666..4a60035 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -102,6 +102,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
*/
static unsigned int khugepaged_max_ptes_none __read_mostly;
static unsigned int khugepaged_max_ptes_swap __read_mostly;
+static unsigned long int allocstall = 0;

static int khugepaged(void *none);
static int khugepaged_slab_init(void);
@@ -2438,7 +2439,7 @@ static void collapse_huge_page(struct mm_struct *mm,
struct page *new_page;
spinlock_t *pmd_ptl, *pte_ptl;
int isolated = 0, result = 0;
- unsigned long hstart, hend;
+ unsigned long hstart, hend, swap = 0, curr_allocstall = 0;
struct mem_cgroup *memcg;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
@@ -2493,7 +2494,14 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out;
}

- __collapse_huge_page_swapin(mm, vma, address, pmd);
+ swap = get_mm_counter(mm, MM_SWAPENTS);
+ curr_allocstall = sum_vm_event(ALLOCSTALL);
+ /*
+ * When system under pressure, don't swapin readahead.
+ * So that avoid unnecessary resource consuming.
+ */
+ if (allocstall == curr_allocstall && swap != 0)
+ __collapse_huge_page_swapin(mm, vma, address, pmd);

anon_vma_lock_write(vma->anon_vma);

@@ -2790,6 +2798,7 @@ skip:
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
+ allocstall = sum_vm_event(ALLOCSTALL);
ret = khugepaged_scan_pmd(mm, vma,
khugepaged_scan.address,
hpage);
--
1.9.1
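
The decision this patch adds can be modelled in userspace as follows. This is a sketch, not the kernel code: allocstall_events is a hypothetical global standing in for the value sum_vm_event(ALLOCSTALL) returns, take_snapshot() stands for the assignment the patch adds just before khugepaged_scan_pmd(), and the swap_entries parameter stands for get_mm_counter(mm, MM_SWAPENTS).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the system-wide ALLOCSTALL count. */
static unsigned long allocstall_events;

/* Snapshot, like the static 'allocstall' variable the patch adds. */
static unsigned long allocstall;

/* Taken in the scan loop, before khugepaged_scan_pmd(). */
static void take_snapshot(void)
{
	allocstall = allocstall_events;
}

/* Mirrors the patch's condition: swap in only if no ALLOCSTALL event
 * was recorded since the snapshot and the mm has swap entries. */
static bool should_swapin(unsigned long swap_entries)
{
	unsigned long curr_allocstall = allocstall_events;

	return allocstall == curr_allocstall && swap_entries != 0;
}
```

Because the counter is system-wide, any ALLOCSTALL event between the snapshot and the check, whichever task caused it, suppresses the swapin.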

2016-03-13 09:46:19

by kernel test robot

Subject: Re: [PATCH v2 2/2] mm, thp: avoid unnecessary swapin in khugepaged

Hi Ebru,

[auto build test ERROR on next-20160311]
[cannot apply to v4.5-rc7 v4.5-rc6 v4.5-rc5 v4.5-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Ebru-Akagunduz/mm-vmstat-calculate-particular-vm-event/20160313-173055
config: i386-randconfig-x009-201611 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

mm/huge_memory.c: In function 'collapse_huge_page':
>> mm/huge_memory.c:2498:20: error: implicit declaration of function 'sum_vm_event' [-Werror=implicit-function-declaration]
curr_allocstall = sum_vm_event(ALLOCSTALL);
^
cc1: some warnings being treated as errors

vim +/sum_vm_event +2498 mm/huge_memory.c

2492 if (!pmd) {
2493 result = SCAN_PMD_NULL;
2494 goto out;
2495 }
2496
2497 swap = get_mm_counter(mm, MM_SWAPENTS);
> 2498 curr_allocstall = sum_vm_event(ALLOCSTALL);
2499 /*
2500 * When system under pressure, don't swapin readahead.
2501 * So that avoid unnecessary resource consuming.

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation

Attachments:

(No filename) (1.31 kB)
.config.gz (26.05 kB)

2016-03-13 23:08:19

by Kirill A. Shutemov

Subject: Re: [PATCH v2 1/2] mm, vmstat: calculate particular vm event

On Sun, Mar 13, 2016 at 11:28:54AM +0200, Ebru Akagunduz wrote:
> Currently, vmstat can calculate specific vm event with all_vm_events()
> however it allocates all vm events to stack. This patch introduces
> a helper to sum value of a specific vm event over all cpu, without
> loading all the events.
>
> Signed-off-by: Ebru Akagunduz <[email protected]>
> ---
> Changes in v2:
> - this patch newly created in this version
> - create sum event function to
> calculate particular vm event (Kirill A. Shutemov)
>
> include/linux/vmstat.h | 2 ++
> mm/vmstat.c | 12 ++++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 73fae8c..add0cc1 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -53,6 +53,8 @@ static inline void count_vm_events(enum vm_event_item item, long delta)
>
> extern void all_vm_events(unsigned long *);
>
> +extern unsigned long sum_vm_event(enum vm_event_item item);
> +
> extern void vm_events_fold_cpu(int cpu);
>
> #else

You need a dummy definition of the function for the !CONFIG_VM_EVENT_COUNTERS
case here. Otherwise the build will fail. See the 0-day report.

Otherwise looks good to me:

Acked-by: Kirill A. Shutemov <[email protected]>

> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 5e43004..b76d664 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -34,6 +34,18 @@
> DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
> EXPORT_PER_CPU_SYMBOL(vm_event_states);
>
> +unsigned long sum_vm_event(enum vm_event_item item)
> +{
> + int cpu;
> + unsigned long ret = 0;
> +
> + get_online_cpus();
> + for_each_online_cpu(cpu)
> + ret += per_cpu(vm_event_states, cpu).event[item];
> + put_online_cpus();
> + return ret;
> +}
> +
> static void sum_vm_events(unsigned long *ret)
> {
> int cpu;
> --
> 1.9.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: [email protected]

--
Kirill A. Shutemov
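
A minimal sketch of the dummy definition requested above, modelled in userspace. It assumes the stub follows the same no-op pattern as the other helpers in the !CONFIG_VM_EVENT_COUNTERS branch of include/linux/vmstat.h and returns 0; the two-entry enum is a stand-in for the real enum vm_event_item.

```c
/* Cut-down stand-in for enum vm_event_item. */
enum vm_event_item { PGFAULT, ALLOCSTALL, NR_VM_EVENT_ITEMS };

#ifdef CONFIG_VM_EVENT_COUNTERS
extern unsigned long sum_vm_event(enum vm_event_item item);
#else
/* Counters compiled out: nothing was counted, so report 0. */
static inline unsigned long sum_vm_event(enum vm_event_item item)
{
	(void)item;	/* unused when counters are disabled */
	return 0;
}
#endif
```

With a stub like this, both the stored allocstall value and curr_allocstall in patch 2/2 are always 0, so the pressure check never fires and swapin depends only on the swap-entry count.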

2016-03-13 23:33:12

by Kirill A. Shutemov

Subject: Re: [PATCH v2 2/2] mm, thp: avoid unnecessary swapin in khugepaged

On Sun, Mar 13, 2016 at 11:28:55AM +0200, Ebru Akagunduz wrote:
> Currently khugepaged makes swapin readahead to improve
> THP collapse rate. This patch checks vm statistics
> to avoid workload of swapin, if unnecessary. So that
> when system under pressure, khugepaged won't consume
> resources to swapin.
>
> [...]
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86e9666..4a60035 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -102,6 +102,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
> */
> static unsigned int khugepaged_max_ptes_none __read_mostly;
> static unsigned int khugepaged_max_ptes_swap __read_mostly;
> +static unsigned long int allocstall = 0;

No need to zero it out. The variable is in .bss.

> static int khugepaged(void *none);
> static int khugepaged_slab_init(void);
> @@ -2438,7 +2439,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> struct page *new_page;
> spinlock_t *pmd_ptl, *pte_ptl;
> int isolated = 0, result = 0;
> - unsigned long hstart, hend;
> + unsigned long hstart, hend, swap = 0, curr_allocstall = 0;

No need to zero these out either, because you always initialize them anyway.

> struct mem_cgroup *memcg;
> unsigned long mmun_start; /* For mmu_notifiers */
> unsigned long mmun_end; /* For mmu_notifiers */
> @@ -2493,7 +2494,14 @@ static void collapse_huge_page(struct mm_struct *mm,
> goto out;
> }
>
> - __collapse_huge_page_swapin(mm, vma, address, pmd);
> + swap = get_mm_counter(mm, MM_SWAPENTS);
> + curr_allocstall = sum_vm_event(ALLOCSTALL);
> + /*
> + * When system under pressure, don't swapin readahead.
> + * So that avoid unnecessary resource consuming.
> + */
> + if (allocstall == curr_allocstall && swap != 0)
> + __collapse_huge_page_swapin(mm, vma, address, pmd);

So, between these two points, where do new ALLOCSTALL events come from?

I would guess that in most cases they would come from the allocation of the
huge page itself (if khugepaged defrag is enabled). So we are willing to pay
for allocating a new huge page, but not for swapping in.

I wonder if it was wise to allocate the huge page in the first place?

Or shouldn't we at least have consistent behaviour on swap-in vs.
allocation wrt the khugepaged defragmentation option?

Or am I wrong, and ALLOCSTALLs aren't caused by khugepaged?

> anon_vma_lock_write(vma->anon_vma);
>
> @@ -2790,6 +2798,7 @@ skip:
> VM_BUG_ON(khugepaged_scan.address < hstart ||
> khugepaged_scan.address + HPAGE_PMD_SIZE >
> hend);
> + allocstall = sum_vm_event(ALLOCSTALL);
> ret = khugepaged_scan_pmd(mm, vma,
> khugepaged_scan.address,
> hpage);
> --
> 1.9.1
>

--
Kirill A. Shutemov

2016-03-14 01:15:58

by Rik van Riel

Subject: Re: [PATCH v2 2/2] mm, thp: avoid unnecessary swapin in khugepaged

On Mon, 2016-03-14 at 02:33 +0300, Kirill A. Shutemov wrote:
> On Sun, Mar 13, 2016 at 11:28:55AM +0200, Ebru Akagunduz wrote:
> >
> > @@ -2493,7 +2494,14 @@ static void collapse_huge_page(struct mm_struct *mm,
> > goto out;
> > }
> >
> > - __collapse_huge_page_swapin(mm, vma, address, pmd);
> > + swap = get_mm_counter(mm, MM_SWAPENTS);
> > + curr_allocstall = sum_vm_event(ALLOCSTALL);
> > + /*
> > + * When system under pressure, don't swapin readahead.
> > + * So that avoid unnecessary resource consuming.
> > + */
> > + if (allocstall == curr_allocstall && swap != 0)
> > + __collapse_huge_page_swapin(mm, vma, address, pmd);
>
> So, between these two points, where do new ALLOCSTALL events come from?
>
> I would guess that in most cases they would come from the allocation of
> the huge page itself (if khugepaged defrag is enabled). So we are willing
> to pay for allocating a new huge page, but not for swapping in.
>
> I wonder if it was wise to allocate the huge page in the first place?
>
> Or shouldn't we at least have consistent behaviour on swap-in vs.
> allocation wrt the khugepaged defragmentation option?
>
> Or am I wrong, and ALLOCSTALLs aren't caused by khugepaged?

It could be caused by khugepaged, but it could just as well
be caused by any other task running in the system.

Khugepaged stores the allocstall value when it goes to sleep,
and checks it before deciding whether to call __collapse_huge_page_swapin.

--
All Rights Reversed.
