2011-02-25 20:05:07

by Mel Gorman

Subject: [PATCH 0/2] Reduce the amount of time compaction disables IRQs for V2

Changelog since V1
o Fix initialisation of isolated (Andrea)

The following two patches are aimed at reducing the amount of time IRQs are
disabled. It was reported by some ALSA people that transparent hugepages was
causing slowdowns on MIDI playback but I strongly suspect that compaction
running for smaller orders was also a factor. The theory was that IRQs
were being disabled for too long and sure enough, compaction was found to
be disabling IRQs for a long time. The patches reduce the length of time
IRQs are disabled when scanning for free pages and for pages to migrate.

It's late in the cycle but the IRQs disabled times are sufficiently bad
that it would be desirable to have these merged for 2.6.38 if possible.

mm/compaction.c | 35 ++++++++++++++++++++++++++++++-----
1 files changed, 30 insertions(+), 5 deletions(-)

--
1.7.2.3


2011-02-25 20:05:09

by Mel Gorman

Subject: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

From: Andrea Arcangeli <[email protected]>

compaction_alloc() isolates pages for migration in isolate_migratepages. While
it's scanning, IRQs are disabled on the mistaken assumption the scanning
should be short. Tests show this to be true for the most part but
contention times on the LRU lock can be increased. Before this patch,
the IRQ disabled times for a simple test looked like

Total sampled time IRQs off (not real total time): 5493
Event shrink_inactive_list..shrink_zone 1596 us count 1
Event shrink_inactive_list..shrink_zone 1530 us count 1
Event shrink_inactive_list..shrink_zone 956 us count 1
Event shrink_inactive_list..shrink_zone 541 us count 1
Event shrink_inactive_list..shrink_zone 531 us count 1
Event split_huge_page..add_to_swap 232 us count 1
Event save_args..call_softirq 36 us count 1
Event save_args..call_softirq 35 us count 2
Event __wake_up..__wake_up 1 us count 1

This patch reduces the worst-case IRQs-disabled latencies by releasing the
lock every SWAP_CLUSTER_MAX pages that are scanned and releasing the CPU if
necessary. The cost of this is that the process performing compaction will
be slower but IRQs being disabled for too long a time has worse consequences
as the following report shows;

Total sampled time IRQs off (not real total time): 4367
Event shrink_inactive_list..shrink_zone 881 us count 1
Event shrink_inactive_list..shrink_zone 875 us count 1
Event shrink_inactive_list..shrink_zone 868 us count 1
Event shrink_inactive_list..shrink_zone 555 us count 1
Event split_huge_page..add_to_swap 495 us count 1
Event compact_zone..compact_zone_order 269 us count 1
Event split_huge_page..add_to_swap 266 us count 1
Event shrink_inactive_list..shrink_zone 85 us count 1
Event save_args..call_softirq 36 us count 2
Event __wake_up..__wake_up 1 us count 1

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
---
mm/compaction.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 11d88a2..ec9eb0f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -279,9 +279,27 @@ static unsigned long isolate_migratepages(struct zone *zone,
}

/* Time to isolate some pages for migration */
+ cond_resched();
spin_lock_irq(&zone->lru_lock);
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
+ bool unlocked = false;
+
+ /* give a chance to irqs before checking need_resched() */
+ if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+ spin_unlock_irq(&zone->lru_lock);
+ unlocked = true;
+ }
+ if (need_resched() || spin_is_contended(&zone->lru_lock)) {
+ if (!unlocked)
+ spin_unlock_irq(&zone->lru_lock);
+ cond_resched();
+ spin_lock_irq(&zone->lru_lock);
+ if (fatal_signal_pending(current))
+ break;
+ } else if (unlocked)
+ spin_lock_irq(&zone->lru_lock);
+
if (!pfn_valid_within(low_pfn))
continue;
nr_scanned++;
--
1.7.2.3

2011-02-25 20:05:08

by Mel Gorman

Subject: [PATCH 1/2] mm: compaction: Minimise the time IRQs are disabled while isolating free pages

compaction_alloc() isolates free pages to be used as migration targets.
While it's scanning, IRQs are disabled on the mistaken assumption the scanning
should be short. Analysis showed that IRQs were in fact being disabled for
substantial time. A simple test was run using large anonymous mappings with
transparent hugepage support enabled to trigger frequent compactions. A
monitor sampled what the worst IRQ-off latencies were and a post-processing
tool found the following;

Total sampled time IRQs off (not real total time): 22355
Event compaction_alloc..compaction_alloc 8409 us count 1
Event compaction_alloc..compaction_alloc 7341 us count 1
Event compaction_alloc..compaction_alloc 2463 us count 1
Event compaction_alloc..compaction_alloc 2054 us count 1
Event shrink_inactive_list..shrink_zone 1864 us count 1
Event shrink_inactive_list..shrink_zone 88 us count 1
Event save_args..call_softirq 36 us count 1
Event save_args..call_softirq 35 us count 2
Event __make_request..__blk_run_queue 24 us count 1
Event __alloc_pages_nodemask..__alloc_pages_nodemask 6 us count 1

i.e. compaction was disabling IRQs for a prolonged period of time - 8ms in
one instance. The full report generated by the tool can be found at
http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-vanilla-micro.report .
This patch reduces the time IRQs are disabled by simply disabling IRQs
at the last possible minute. An updated IRQs-off summary report then
looks like;

Total sampled time IRQs off (not real total time): 5493
Event shrink_inactive_list..shrink_zone 1596 us count 1
Event shrink_inactive_list..shrink_zone 1530 us count 1
Event shrink_inactive_list..shrink_zone 956 us count 1
Event shrink_inactive_list..shrink_zone 541 us count 1
Event shrink_inactive_list..shrink_zone 531 us count 1
Event split_huge_page..add_to_swap 232 us count 1
Event save_args..call_softirq 36 us count 1
Event save_args..call_softirq 35 us count 2
Event __wake_up..__wake_up 1 us count 1

A full report is again available at
http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-minimiseirq-free-v1r4-micro.report .
As should be obvious, IRQ-disabled latencies due to compaction are
almost eliminated for this particular test.

[[email protected]: Fix initialisation of isolated]
Signed-off-by: Mel Gorman <[email protected]>
---
mm/compaction.c | 18 +++++++++++++-----
1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8be430b8..11d88a2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -155,7 +155,6 @@ static void isolate_freepages(struct zone *zone,
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- spin_lock_irqsave(&zone->lock, flags);
for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
pfn -= pageblock_nr_pages) {
unsigned long isolated;
@@ -178,9 +177,19 @@ static void isolate_freepages(struct zone *zone,
if (!suitable_migration_target(page))
continue;

- /* Found a block suitable for isolating free pages from */
- isolated = isolate_freepages_block(zone, pfn, freelist);
- nr_freepages += isolated;
+ /*
+ * Found a block suitable for isolating free pages from. Now
+ * we disabled interrupts, double check things are ok and
+ * isolate the pages. This is to minimise the time IRQs
+ * are disabled
+ */
+ isolated = 0;
+ spin_lock_irqsave(&zone->lock, flags);
+ if (suitable_migration_target(page)) {
+ isolated = isolate_freepages_block(zone, pfn, freelist);
+ nr_freepages += isolated;
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);

/*
* Record the highest PFN we isolated pages from. When next
@@ -190,7 +199,6 @@ static void isolate_freepages(struct zone *zone,
if (isolated)
high_pfn = max(high_pfn, pfn);
}
- spin_unlock_irqrestore(&zone->lock, flags);

/* split_free_page does not map the pages */
list_for_each_entry(page, freelist, lru) {
--
1.7.2.3

2011-02-25 20:08:27

by Andrea Arcangeli

Subject: Re: [PATCH 0/2] Reduce the amount of time compaction disables IRQs for V2

On Fri, Feb 25, 2011 at 08:04:57PM +0000, Mel Gorman wrote:
> Changelog since V1
> o Fix initialisation of isolated (Andrea)
>
> The following two patches are aimed at reducing the amount of time IRQs are
> disabled. It was reported by some ALSA people that transparent hugepages was
> causing slowdowns on MIDI playback but I strongly suspect that compaction
> running for smaller orders was also a factor. The theory was that IRQs
> were being disabled for too long and sure enough, compaction was found to
> be disabling IRQs for a long time. The patches reduce the length of time
> IRQs are disabled when scanning for free pages and for pages to migrate.
>
> It's late in the cycle but the IRQs disabled times are sufficiently bad
> that it would be desirable to have these merged for 2.6.38 if possible.
>
> mm/compaction.c | 35 ++++++++++++++++++++++++++++++-----
> 1 files changed, 30 insertions(+), 5 deletions(-)

both patches:

Acked-by: Andrea Arcangeli <[email protected]>

2011-02-25 22:32:35

by Johannes Weiner

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Fri, Feb 25, 2011 at 08:04:59PM +0000, Mel Gorman wrote:
> From: Andrea Arcangeli <[email protected]>
>
> compaction_alloc() isolates pages for migration in isolate_migratepages. While
> it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Tests show this to be true for the most part but
> contention times on the LRU lock can be increased. Before this patch,
> the IRQ disabled times for a simple test looked like
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> This patch reduces the worst-case IRQs-disabled latencies by releasing the
> lock every SWAP_CLUSTER_MAX pages that are scanned and releasing the CPU if
> necessary. The cost of this is that the process performing compaction will
> be slower but IRQs being disabled for too long a time has worse consequences
> as the following report shows;
>
> Total sampled time IRQs off (not real total time): 4367
> Event shrink_inactive_list..shrink_zone 881 us count 1
> Event shrink_inactive_list..shrink_zone 875 us count 1
> Event shrink_inactive_list..shrink_zone 868 us count 1
> Event shrink_inactive_list..shrink_zone 555 us count 1
> Event split_huge_page..add_to_swap 495 us count 1
> Event compact_zone..compact_zone_order 269 us count 1
> Event split_huge_page..add_to_swap 266 us count 1
> Event shrink_inactive_list..shrink_zone 85 us count 1
> Event save_args..call_softirq 36 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/compaction.c | 18 ++++++++++++++++++
> 1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 11d88a2..ec9eb0f 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -279,9 +279,27 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Time to isolate some pages for migration */
> + cond_resched();
> spin_lock_irq(&zone->lru_lock);
> for (; low_pfn < end_pfn; low_pfn++) {
> struct page *page;
> + bool unlocked = false;
> +
> + /* give a chance to irqs before checking need_resched() */
> + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
> + spin_unlock_irq(&zone->lru_lock);
> + unlocked = true;
> + }
> + if (need_resched() || spin_is_contended(&zone->lru_lock)) {
> + if (!unlocked)
> + spin_unlock_irq(&zone->lru_lock);
> + cond_resched();
> + spin_lock_irq(&zone->lru_lock);
> + if (fatal_signal_pending(current))
> + break;
> + } else if (unlocked)
> + spin_lock_irq(&zone->lru_lock);
> +

I don't understand why this conditional is broken up like this.
cond_resched() will have the right checks anyway. Okay, you would
save fatal_signal_pending() in the 'did one cluster' case. Is it that
expensive? Couldn't this be simpler like

did_cluster = ((low_pfn + 1) % SWAP_CLUSTER_MAX) == 0;
lock_contended = spin_is_contended(&zone->lru_lock);
if (did_cluster || lock_contended || need_resched()) {
spin_unlock_irq(&zone->lru_lock);
cond_resched();
spin_lock_irq(&zone->lru_lock);
if (fatal_signal_pending(current))
break;
}

instead?
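
[As a purely illustrative userspace sketch, not kernel code: the did_cluster term alone already bounds how long the lock is held. Assuming SWAP_CLUSTER_MAX is 32 and that the lock is never contended and need_resched() never fires, the combined condition drops and retakes the lock once per cluster over a pageblock-sized scan:]

```c
#include <stdbool.h>

#define SWAP_CLUSTER_MAX 32	/* assumed value for illustration */

/*
 * Count how often the proposed combined condition would drop and
 * retake the lock across a scan of [start_pfn, end_pfn), assuming
 * lock_contended and need_resched() stay false so that only the
 * did_cluster term ever triggers.
 */
static unsigned int count_lock_drops(unsigned long start_pfn,
				     unsigned long end_pfn)
{
	unsigned int drops = 0;
	unsigned long low_pfn;

	for (low_pfn = start_pfn; low_pfn < end_pfn; low_pfn++) {
		bool did_cluster = ((low_pfn + 1) % SWAP_CLUSTER_MAX) == 0;

		if (did_cluster)
			drops++;	/* unlock, cond_resched(), relock */
	}
	return drops;
}
```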

2011-02-25 22:35:14

by Johannes Weiner

Subject: Re: [PATCH 1/2] mm: compaction: Minimise the time IRQs are disabled while isolating free pages

On Fri, Feb 25, 2011 at 08:04:58PM +0000, Mel Gorman wrote:
> compaction_alloc() isolates free pages to be used as migration targets.
> While it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Analysis showed that IRQs were in fact being disabled for
> substantial time. A simple test was run using large anonymous mappings with
> transparent hugepage support enabled to trigger frequent compactions. A
> monitor sampled what the worst IRQ-off latencies were and a post-processing
> tool found the following;
>
> Total sampled time IRQs off (not real total time): 22355
> Event compaction_alloc..compaction_alloc 8409 us count 1
> Event compaction_alloc..compaction_alloc 7341 us count 1
> Event compaction_alloc..compaction_alloc 2463 us count 1
> Event compaction_alloc..compaction_alloc 2054 us count 1
> Event shrink_inactive_list..shrink_zone 1864 us count 1
> Event shrink_inactive_list..shrink_zone 88 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __make_request..__blk_run_queue 24 us count 1
> Event __alloc_pages_nodemask..__alloc_pages_nodemask 6 us count 1
>
> i.e. compaction was disabling IRQs for a prolonged period of time - 8ms in
> one instance. The full report generated by the tool can be found at
> http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-vanilla-micro.report .
> This patch reduces the time IRQs are disabled by simply disabling IRQs
> at the last possible minute. An updated IRQs-off summary report then
> looks like;
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> A full report is again available at
> http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-minimiseirq-free-v1r4-micro.report .
> As should be obvious, IRQ-disabled latencies due to compaction are
> almost eliminated for this particular test.
>
> [[email protected]: Fix initialisation of isolated]
> Signed-off-by: Mel Gorman <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

2011-02-26 00:16:53

by Andrea Arcangeli

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Fri, Feb 25, 2011 at 11:32:04PM +0100, Johannes Weiner wrote:
> I don't understand why this conditional is broken up like this.
> cond_resched() will have the right checks anyway. Okay, you would
> save fatal_signal_pending() in the 'did one cluster' case. Is it that
> expensive? Couldn't this be simpler like
>
> did_cluster = ((low_pfn + 1) % SWAP_CLUSTER_MAX) == 0;
> lock_contended = spin_is_contended(&zone->lru_lock);
> if (did_cluster || lock_contended || need_resched()) {
> spin_unlock_irq(&zone->lru_lock);
> cond_resched();
> spin_lock_irq(&zone->lru_lock);
> if (fatal_signal_pending(current))
> break;
> }
>
> instead?

If we don't release irqs first, how can need_resched get set in the
first place if the local apic irq can't run? I guess that's why
there's no cond_resched_lock_irq. BTW, I never liked too much clearing
interrupts for locks that can't ever be taken from irq (it's a
scalability boost to reduce contention but it makes things like the above
confusing and it increases irq latency a bit).

2011-02-28 02:01:53

by Kamezawa Hiroyuki

Subject: Re: [PATCH 1/2] mm: compaction: Minimise the time IRQs are disabled while isolating free pages

On Fri, 25 Feb 2011 20:04:58 +0000
Mel Gorman <[email protected]> wrote:

> compaction_alloc() isolates free pages to be used as migration targets.
> While it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Analysis showed that IRQs were in fact being disabled for
> substantial time. A simple test was run using large anonymous mappings with
> transparent hugepage support enabled to trigger frequent compactions. A
> monitor sampled what the worst IRQ-off latencies were and a post-processing
> tool found the following;
>
> Total sampled time IRQs off (not real total time): 22355
> Event compaction_alloc..compaction_alloc 8409 us count 1
> Event compaction_alloc..compaction_alloc 7341 us count 1
> Event compaction_alloc..compaction_alloc 2463 us count 1
> Event compaction_alloc..compaction_alloc 2054 us count 1
> Event shrink_inactive_list..shrink_zone 1864 us count 1
> Event shrink_inactive_list..shrink_zone 88 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __make_request..__blk_run_queue 24 us count 1
> Event __alloc_pages_nodemask..__alloc_pages_nodemask 6 us count 1
>
> i.e. compaction was disabling IRQs for a prolonged period of time - 8ms in
> one instance. The full report generated by the tool can be found at
> http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-vanilla-micro.report .
> This patch reduces the time IRQs are disabled by simply disabling IRQs
> at the last possible minute. An updated IRQs-off summary report then
> looks like;
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> A full report is again available at
> http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-minimiseirq-free-v1r4-micro.report .
> As should be obvious, IRQ-disabled latencies due to compaction are
> almost eliminated for this particular test.
>
> [[email protected]: Fix initialisation of isolated]
> Signed-off-by: Mel Gorman <[email protected]>

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2011-02-28 02:24:07

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Fri, 25 Feb 2011 20:04:59 +0000
Mel Gorman <[email protected]> wrote:

> From: Andrea Arcangeli <[email protected]>
>
> compaction_alloc() isolates pages for migration in isolate_migratepages. While
> it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Tests show this to be true for the most part but
> contention times on the LRU lock can be increased. Before this patch,
> the IRQ disabled times for a simple test looked like
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> This patch reduces the worst-case IRQs-disabled latencies by releasing the
> lock every SWAP_CLUSTER_MAX pages that are scanned and releasing the CPU if
> necessary. The cost of this is that the process performing compaction will
> be slower but IRQs being disabled for too long a time has worse consequences
> as the following report shows;
>
> Total sampled time IRQs off (not real total time): 4367
> Event shrink_inactive_list..shrink_zone 881 us count 1
> Event shrink_inactive_list..shrink_zone 875 us count 1
> Event shrink_inactive_list..shrink_zone 868 us count 1
> Event shrink_inactive_list..shrink_zone 555 us count 1
> Event split_huge_page..add_to_swap 495 us count 1
> Event compact_zone..compact_zone_order 269 us count 1
> Event split_huge_page..add_to_swap 266 us count 1
> Event shrink_inactive_list..shrink_zone 85 us count 1
> Event save_args..call_softirq 36 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/compaction.c | 18 ++++++++++++++++++
> 1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 11d88a2..ec9eb0f 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -279,9 +279,27 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Time to isolate some pages for migration */
> + cond_resched();
> spin_lock_irq(&zone->lru_lock);
> for (; low_pfn < end_pfn; low_pfn++) {
> struct page *page;
> + bool unlocked = false;
> +
> + /* give a chance to irqs before checking need_resched() */
> + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
> + spin_unlock_irq(&zone->lru_lock);
> + unlocked = true;
> + }
> + if (need_resched() || spin_is_contended(&zone->lru_lock)) {
> + if (!unlocked)
> + spin_unlock_irq(&zone->lru_lock);
> + cond_resched();
> + spin_lock_irq(&zone->lru_lock);
> + if (fatal_signal_pending(current))
> + break;
> + } else if (unlocked)
> + spin_lock_irq(&zone->lru_lock);
> +
> if (!pfn_valid_within(low_pfn))
> continue;
> nr_scanned++;

Hmm.... I don't like this kind of complicated locking and 'give-a-chance' logic.

BTW, I forget why we always take zone->lru_lock with IRQ disabled....
rotate_lru_page() is a bad thing ?
If so, I think it can be implemented in other way...


Thanks,
-Kame

2011-02-28 05:49:31

by Andrea Arcangeli

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Mon, Feb 28, 2011 at 11:17:46AM +0900, KAMEZAWA Hiroyuki wrote:
> BTW, I forget why we always take zone->lru_lock with IRQ disabled....

To decrease lock contention in SMP to deliver overall better
performance (not sure how much it helps though). It was supposed to be
hold for a very short time (PAGEVEC_SIZE) to avoid giving irq latency
problems.

2011-02-28 06:00:36

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Mon, 28 Feb 2011 06:48:18 +0100
Andrea Arcangeli <[email protected]> wrote:

> On Mon, Feb 28, 2011 at 11:17:46AM +0900, KAMEZAWA Hiroyuki wrote:
> > BTW, I forget why we always take zone->lru_lock with IRQ disabled....
>
> To decrease lock contention in SMP to deliver overall better
> performance (not sure how much it helps though). It was supposed to be
> held for a very short time (PAGEVEC_SIZE) to avoid giving irq latency
> problems.
>

memory hotplug uses MIGRATE_ISOLATED migrate types for scanning pfn range
without lru_lock. I wonder whether we can make use of it (the function
which memory hotplug may need rework for the compaction but migrate_type can
be used, I think).

Hmm.
-Kame

2011-02-28 09:28:44

by Mel Gorman

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Mon, Feb 28, 2011 at 02:54:02PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 28 Feb 2011 06:48:18 +0100
> Andrea Arcangeli <[email protected]> wrote:
>
> > On Mon, Feb 28, 2011 at 11:17:46AM +0900, KAMEZAWA Hiroyuki wrote:
> > > BTW, I forget why we always take zone->lru_lock with IRQ disabled....
> >
> > To decrease lock contention in SMP to deliver overall better
> > performance (not sure how much it helps though). It was supposed to be
> > held for a very short time (PAGEVEC_SIZE) to avoid giving irq latency
> > problems.
> >
>
> memory hotplug uses MIGRATE_ISOLATED migrate types for scanning pfn range
> without lru_lock. I wonder whether we can make use of it (the function
> which memory hotplug may need rework for the compaction but migrate_type can
> be used, I think).
>

I don't see how migrate_type would be of any benefit here particularly
as compaction does not directly affect the migratetype of a pageblock. I
have not checked closely which part of hotplug you are on about but if
you're talking about when pages actually get offlined, the zone lock is
not necessary there because the pages are not on the LRU. In compactions
case, they are. Did I misunderstand?

That said, a certain amount of lockless scanning could be done here if the lock
hold times were shown to be still high. Specifically, scan until an LRU page
is found, take the lock and hold the lock for a maximum of SWAP_CLUSTER_MAX
scanned pages before releasing again. I don't think it would be a massive
improvement though.
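
[As a hypothetical userspace sketch of that batching scheme, for illustration only: `is_lru_candidate()` is a stand-in predicate, SWAP_CLUSTER_MAX is assumed to be 32, and the "lock" is simulated by a flag with comments marking where spin_lock_irq()/spin_unlock_irq() would sit in the kernel:]

```c
#include <stdbool.h>

#define SWAP_CLUSTER_MAX 32	/* assumed value for illustration */

/* Stand-in for "this pfn holds an LRU page worth isolating". */
static bool is_lru_candidate(unsigned long pfn)
{
	return (pfn % 5) == 0;	/* pretend every 5th pfn qualifies */
}

/*
 * Scan [start, end) locklessly until a candidate is found, then hold
 * the (simulated) lock for at most SWAP_CLUSTER_MAX scanned pages
 * before dropping it again.  Returns how often the lock would be
 * taken; *held_max reports the longest locked stretch in pages.
 */
static unsigned int scan_batched(unsigned long start, unsigned long end,
				 unsigned long *held_max)
{
	unsigned int lock_acquisitions = 0;
	unsigned long pfn, held = 0;
	bool locked = false;

	*held_max = 0;
	for (pfn = start; pfn < end; pfn++) {
		if (!locked) {
			if (!is_lru_candidate(pfn))
				continue;	/* lockless skip */
			locked = true;		/* spin_lock_irq() here */
			lock_acquisitions++;
			held = 0;
		}
		held++;				/* page scanned under lock */
		if (held > *held_max)
			*held_max = held;
		if (held == SWAP_CLUSTER_MAX)
			locked = false;		/* spin_unlock_irq() here */
	}
	return lock_acquisitions;
}
```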

--
Mel Gorman
SUSE Labs

2011-02-28 09:48:55

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Mon, 28 Feb 2011 09:28:14 +0000
Mel Gorman <[email protected]> wrote:

> On Mon, Feb 28, 2011 at 02:54:02PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 28 Feb 2011 06:48:18 +0100
> > Andrea Arcangeli <[email protected]> wrote:
> >
> > > On Mon, Feb 28, 2011 at 11:17:46AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > BTW, I forget why we always take zone->lru_lock with IRQ disabled....
> > >
> > > To decrease lock contention in SMP to deliver overall better
> > > performance (not sure how much it helps though). It was supposed to be
> > > held for a very short time (PAGEVEC_SIZE) to avoid giving irq latency
> > > problems.
> > >
> >
> > memory hotplug uses MIGRATE_ISOLATED migrate types for scanning pfn range
> > without lru_lock. I wonder whether we can make use of it (the function
> > which memory hotplug may need rework for the compaction but migrate_type can
> > be used, I think).
> >
>
> I don't see how migrate_type would be of any benefit here particularly
> as compaction does not directly affect the migratetype of a pageblock. I
> have not checked closely which part of hotplug you are on about but if
> you're talking about when pages actually get offlined, the zone lock is
> not necessary there because the pages are not on the LRU. In compactions
> case, they are. Did I misunderstand?
>

memory offline code doesn't take big lru_lock (and call isolate_lru_page())
at picking up migration target pages from LRU. While this happens, allocation
from the zone is still allowed. memory offline is done per mem_section unit.

memory offline does:

1. making a whole section as MIGRATETYPE_ISOLATED.
2. scan pfn within section.
3. find a page on LRU
4. isolate_lru_page() -> take/release lru_lock. ----(*)
5. migrate it.
6. making all pages in the range as RESERVED.

During this, by marking the pageblock as MIGRATETYPE_ISOLATED,

- new allocations will never pick up a page in the range.
- newly freed pages in the range will never be allocated and never in pcp.
- page type of the range will never change.

then, memory offline succeeds.

If (*) seems too heavy anyway and would be no help even with some batching
such as isolate_lru_page_pagevec(), okay, please forget offlining.


BTW, can't we drop disable_irq() from all lru_lock related codes ?

Thanks,
-Kame

2011-02-28 10:18:56

by Mel Gorman

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Mon, Feb 28, 2011 at 06:42:30PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 28 Feb 2011 09:28:14 +0000
> Mel Gorman <[email protected]> wrote:
>
> > On Mon, Feb 28, 2011 at 02:54:02PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 28 Feb 2011 06:48:18 +0100
> > > Andrea Arcangeli <[email protected]> wrote:
> > >
> > > > On Mon, Feb 28, 2011 at 11:17:46AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > BTW, I forget why we always take zone->lru_lock with IRQ disabled....
> > > >
> > > > To decrease lock contention in SMP to deliver overall better
> > > > performance (not sure how much it helps though). It was supposed to be
> > > > held for a very short time (PAGEVEC_SIZE) to avoid giving irq latency
> > > > problems.
> > > >
> > >
> > > memory hotplug uses MIGRATE_ISOLATED migrate types for scanning pfn range
> > > without lru_lock. I wonder whether we can make use of it (the function
> > > which memory hotplug may need rework for the compaction but migrate_type can
> > > be used, I think).
> > >
> >
> > I don't see how migrate_type would be of any benefit here particularly
> > as compaction does not directly affect the migratetype of a pageblock. I
> > have not checked closely which part of hotplug you are on about but if
> > you're talking about when pages actually get offlined, the zone lock is
> > not necessary there because the pages are not on the LRU. In compactions
> > case, they are. Did I misunderstand?
> >
>
> memory offline code doesn't take big lru_lock (and call isolate_lru_page())
> at picking up migration target pages from LRU.

Right - the scanning doesn't hold the lock but you get and release the lock
for every LRU page encountered. This is fairly expensive but considered ok
during memory hotplug. It's less ideal in compaction which is expected
to be a more frequent operation than memory hotplug.

> While this, allocation from
> the zone is allowed. memory offline is done by mem_section unit.
>
> memory offline does.
>
> 1. making a whole section as MIGRATETYPE_ISOLATED.
> 2. scan pfn within section.
> 3. find a page on LRU
> 4. isolate_lru_page() -> take/release lru_lock. ----(*)
> 5. migrate it.
> 6. making all pages in the range as RESERVED.
>
> During this, by marking the pageblock as MIGRATETYPE_ISOLATED,
>
> - new allocations will never pick up a page in the range.

Overkill for compaction though. To use MIGRATE_ISOLATE properly, all pages
on the freelist for that block have to be moved as well. It also operates
at the granularity of a pageblock which is potentially far higher than the
allocation target. It might make some sense for transparent hugepage support.

> - newly freed pages in the range will never be allocated and never in pcp.
> - page type of the range will never change.
>
> then, memory offline succeeds.
>
> If (*) seems too heavy anyway and would be no help even with some batching
> such as isolate_lru_page_pagevec(), okay, please forget offlining.

(*) is indeed too heavy but if IRQ disabled times are found to be too high
it could be batched like I described in my previous mail. Right now, the
interrupt disabled times are not showing up as very high.

> BTW, can't we drop disable_irq() from all lru_lock related codes ?
>

I don't think so - at least not right now. Some LRU operations such as LRU
pagevec draining are run from IPI which is running from an interrupt so
minimally spin_lock_irq is necessary.

--
Mel Gorman
SUSE Labs

2011-02-28 22:08:16

by Minchan Kim

Subject: Re: [PATCH 1/2] mm: compaction: Minimise the time IRQs are disabled while isolating free pages

On Fri, Feb 25, 2011 at 08:04:58PM +0000, Mel Gorman wrote:
> compaction_alloc() isolates free pages to be used as migration targets.
> While it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Analysis showed that IRQs were in fact being disabled for
> substantial time. A simple test was run using large anonymous mappings with
> transparent hugepage support enabled to trigger frequent compactions. A
> monitor sampled what the worst IRQ-off latencies were and a post-processing
> tool found the following;
>
> Total sampled time IRQs off (not real total time): 22355
> Event compaction_alloc..compaction_alloc 8409 us count 1
> Event compaction_alloc..compaction_alloc 7341 us count 1
> Event compaction_alloc..compaction_alloc 2463 us count 1
> Event compaction_alloc..compaction_alloc 2054 us count 1
> Event shrink_inactive_list..shrink_zone 1864 us count 1
> Event shrink_inactive_list..shrink_zone 88 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __make_request..__blk_run_queue 24 us count 1
> Event __alloc_pages_nodemask..__alloc_pages_nodemask 6 us count 1
>
> i.e. compaction disables IRQs for a prolonged period of time - 8ms in
> one instance. The full report generated by the tool can be found at
> http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-vanilla-micro.report .
> This patch reduces the time IRQs are disabled by simply disabling IRQs
> at the last possible minute. An updated IRQs-off summary report then
> looks like;
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> A full report is again available at
> http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-minimiseirq-free-v1r4-micro.report .
> As should be obvious, IRQ-disabled latencies due to compaction are
> almost eliminated for this particular test.
>
> [[email protected]: Fix initialisation of isolated]
> Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

--
Kind regards,
Minchan Kim

2011-02-28 23:01:40

by Minchan Kim

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Fri, Feb 25, 2011 at 08:04:59PM +0000, Mel Gorman wrote:
> From: Andrea Arcangeli <[email protected]>
>
> compaction_alloc() isolates pages for migration in isolate_migratepages. While
> it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Tests show this to be true for the most part but
> contention times on the LRU lock can be increased. Before this patch,
> the IRQ disabled times for a simple test looked like
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> This patch reduces the worst-case IRQs-disabled latencies by releasing the
> lock every SWAP_CLUSTER_MAX pages that are scanned and releasing the CPU if
> necessary. The cost of this is that the processing performing compaction will
> be slower but IRQs being disabled for too long a time has worse consequences
> as the following report shows;
>
> Total sampled time IRQs off (not real total time): 4367
> Event shrink_inactive_list..shrink_zone 881 us count 1
> Event shrink_inactive_list..shrink_zone 875 us count 1
> Event shrink_inactive_list..shrink_zone 868 us count 1
> Event shrink_inactive_list..shrink_zone 555 us count 1
> Event split_huge_page..add_to_swap 495 us count 1
> Event compact_zone..compact_zone_order 269 us count 1
> Event split_huge_page..add_to_swap 266 us count 1
> Event shrink_inactive_list..shrink_zone 85 us count 1
> Event save_args..call_softirq 36 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/compaction.c | 18 ++++++++++++++++++
> 1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 11d88a2..ec9eb0f 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -279,9 +279,27 @@ static unsigned long isolate_migratepages(struct zone *zone,
> 	}
>
> 	/* Time to isolate some pages for migration */
> +	cond_resched();
> 	spin_lock_irq(&zone->lru_lock);
> 	for (; low_pfn < end_pfn; low_pfn++) {
> 		struct page *page;
> +		bool unlocked = false;
> +
> +		/* give a chance to irqs before checking need_resched() */
> +		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
> +			spin_unlock_irq(&zone->lru_lock);
> +			unlocked = true;
> +		}
> +		if (need_resched() || spin_is_contended(&zone->lru_lock)) {

I am not sure it's a good idea to release the lock unconditionally whenever
lru_lock is contended. There are many kinds of lru_lock operations (add to LRU,
delete from LRU, isolation, reclaim, activation, deactivation and so on).

Do we really need to release the lock whenever any of those operations is
contending? I think what we need is just something like
spin_is_contended_irqcontext. Otherwise, please write down a comment justifying it.

> +			if (!unlocked)
> +				spin_unlock_irq(&zone->lru_lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lru_lock);
> +			if (fatal_signal_pending(current))
> +				break;

This patch is for reducing IRQ latency, but do we have to check for pending
signals while IRQs are disabled?

> +		} else if (unlocked)
> +			spin_lock_irq(&zone->lru_lock);
> +
> 		if (!pfn_valid_within(low_pfn))
> 			continue;
> 		nr_scanned++;
> --
> 1.7.2.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Kind regards,
Minchan Kim

2011-02-28 23:07:45

by Andrea Arcangeli

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Tue, Mar 01, 2011 at 08:01:31AM +0900, Minchan Kim wrote:
> I am not sure it's good if we release the lock whenever lru->lock was contended
> unconditionally? There are many kinds of lru_lock operations(add to lru,
> del from lru, isolation, reclaim, activation, deactivation and so on).

This is mostly to mirror cond_resched_lock (which actually uses
spin_needbreak but it's ok to have it also when preempt is off). I
doubt it makes a big difference but I tried to mirror
cond_resched_lock.

> Do we really need to release the lock whenever all such operations were contened?
> I think what we need is just spin_is_contended_irqcontext.
> Otherwise, please write down the comment for justifying for it.

What is spin_is_contended_irqcontext?

> This patch is for reducing for irq latency but do we have to check signal
> in irq hold time?

I think it's a good idea to check for the signal in case the loop is very
long and this runs in direct compaction context.

2011-02-28 23:25:18

by Minchan Kim

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Tue, Mar 01, 2011 at 12:07:12AM +0100, Andrea Arcangeli wrote:
> On Tue, Mar 01, 2011 at 08:01:31AM +0900, Minchan Kim wrote:
> > I am not sure it's good if we release the lock whenever lru->lock was contended
> > unconditionally? There are many kinds of lru_lock operations(add to lru,
> > del from lru, isolation, reclaim, activation, deactivation and so on).
>
> This is mostly to mirror cond_resched_lock (which actually uses
> spin_needbreak but it's ok to have it also when preempt is off). I
> doubt it makes a big difference but I tried to mirror
> cond_resched_lock.

But what's the benefit of releasing the lock here when the contention comes
from something like activate_page, for example?
>
> > Do we really need to release the lock whenever all such operations were contened?
> > I think what we need is just spin_is_contended_irqcontext.
> > Otherwise, please write down the comment for justifying for it.
>
> What is spin_is_contended_irqcontext?

I thought the function we need is one that checks whether the lock contention
comes from IRQ context only, for the sake of short IRQ latency.

>
> > This patch is for reducing for irq latency but do we have to check signal
> > in irq hold time?
>
> I think it's good idea to check the signal in case the loop is very
> long and this is run in direct compaction context.

I don't oppose the signal check.
I am just not sure why we should trigger it merely on a sign of lru_lock
contention.

How about something more coarse-grained, like this?

/* give a chance to irqs before checking need_resched() */
if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
	if (fatal_signal_pending(current))
		break;
	spin_unlock_irq(&zone->lru_lock);
	unlocked = true;
}
if (need_resched() || spin_is_contended(&zone->lru_lock)) {
	if (!unlocked)
		spin_unlock_irq(&zone->lru_lock);
	cond_resched();
	spin_lock_irq(&zone->lru_lock);
} else if (unlocked)
	spin_lock_irq(&zone->lru_lock);


>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Kind regards,
Minchan Kim

2011-02-28 23:48:37

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Mon, 28 Feb 2011 10:18:27 +0000
Mel Gorman <[email protected]> wrote:

> > BTW, can't we drop disable_irq() from all lru_lock related codes ?
> >
>
> I don't think so - at least not right now. Some LRU operations such as LRU
> pagevec draining are run from IPI which is running from an interrupt so
> minimally spin_lock_irq is necessary.
>

pagevec draining is done by a workqueue (schedule_on_each_cpu()).
I think the only racy case is LRU rotation after writeback.

Thanks,
-Kame

2011-03-01 04:11:59

by Minchan Kim

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Tue, Mar 01, 2011 at 08:42:09AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 28 Feb 2011 10:18:27 +0000
> Mel Gorman <[email protected]> wrote:
>
> > > BTW, can't we drop disable_irq() from all lru_lock related codes ?
> > >
> >
> > I don't think so - at least not right now. Some LRU operations such as LRU
> > pagevec draining are run from IPI which is running from an interrupt so
> > minimally spin_lock_irq is necessary.
> >
>
> pagevec draining is done by workqueue(schedule_on_each_cpu()).
> I think only racy case is just lru rotation after writeback.

put_page() still needs IRQs disabled.


>
> Thanks,
> -Kame
>

--
Kind regards,
Minchan Kim

2011-03-01 04:56:21

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Tue, 1 Mar 2011 13:11:46 +0900
Minchan Kim <[email protected]> wrote:

> On Tue, Mar 01, 2011 at 08:42:09AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 28 Feb 2011 10:18:27 +0000
> > Mel Gorman <[email protected]> wrote:
> >
> > > > BTW, can't we drop disable_irq() from all lru_lock related codes ?
> > > >
> > >
> > > I don't think so - at least not right now. Some LRU operations such as LRU
> > > pagevec draining are run from IPI which is running from an interrupt so
> > > minimally spin_lock_irq is necessary.
> > >
> >
> > pagevec draining is done by workqueue(schedule_on_each_cpu()).
> > I think only racy case is just lru rotation after writeback.
>
> put_page still need irq disable.
>

Aha, ok. put_page() removes a page from the LRU via __page_cache_release().
Then we may need to remove a page from the LRU in IRQ context.
Hmm...

Thanks,
-Kame

2011-03-01 15:37:14

by Minchan Kim

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Tue, Mar 01, 2011 at 01:49:25PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 1 Mar 2011 13:11:46 +0900
> Minchan Kim <[email protected]> wrote:
>
> > On Tue, Mar 01, 2011 at 08:42:09AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 28 Feb 2011 10:18:27 +0000
> > > Mel Gorman <[email protected]> wrote:
> > >
> > > > > BTW, can't we drop disable_irq() from all lru_lock related codes ?
> > > > >
> > > >
> > > > I don't think so - at least not right now. Some LRU operations such as LRU
> > > > pagevec draining are run from IPI which is running from an interrupt so
> > > > minimally spin_lock_irq is necessary.
> > > >
> > >
> > > pagevec draining is done by workqueue(schedule_on_each_cpu()).
> > > I think only racy case is just lru rotation after writeback.
> >
> > put_page still need irq disable.
> >
>
> Aha..ok. put_page() removes a page from LRU via __page_cache_release().
> Then, we may need to remove a page from LRU under irq context.
> Hmm...

But as __page_cache_release's comment says, the VM doesn't normally release
pages in IRQ context, so it would be rare.
If we can remove that case, could we change all of the spin_lock_irqsave calls
to plain spin_lock? If that is right, I think it's very desirable for reducing
IRQ latency.

How about this? It's just a quick implementation and totally untested.
I want to hear your opinions on whether the work is valuable before going ahead.

== CUT_HERE ==

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c71c487..5d17de6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1353,7 +1353,7 @@ extern void si_meminfo_node(struct sysinfo *val, int nid);
 extern int after_bootmem;
 
 extern void setup_per_cpu_pageset(void);
-
+extern void setup_per_cpu_defer_free_pages(void);
 extern void zone_pcp_update(struct zone *zone);
 
 /* nommu.c */
diff --git a/init/main.c b/init/main.c
index 3627bb3..9c35fad 100644
--- a/init/main.c
+++ b/init/main.c
@@ -583,6 +583,7 @@ asmlinkage void __init start_kernel(void)
 	kmemleak_init();
 	debug_objects_mem_init();
 	setup_per_cpu_pageset();
+	setup_per_cpu_defer_free_pages();
 	numa_policy_init();
 	if (late_time_init)
 		late_time_init();
diff --git a/mm/swap.c b/mm/swap.c
index 1b9e4eb..62a9f3b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,10 +37,23 @@
 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
 
+/*
+ * This structure is to free pages which are deallocated
+ * by interrupt context for reducing irq disable time.
+ */
+#define DEFER_FREE_PAGES_THRESH_HOLD 32
+struct defer_free_pages_pcp {
+	struct page *next;
+	unsigned long count;
+	struct work_struct work;
+};
+
+static DEFINE_PER_CPU(struct defer_free_pages_pcp, defer_free_pages);
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 
+static void __put_single_page(struct page *page);
 /*
  * This path almost never happens for VM activity - pages are normally
  * freed via pagevecs. But it gets used by networking.
@@ -59,10 +72,76 @@ static void __page_cache_release(struct page *page)
 	}
 }
 
+void init_defer_free_pages(struct defer_free_pages_pcp *free_pages)
+{
+	free_pages->count = 0;
+	free_pages->next = NULL;
+}
+
+static void drain_cpu_defer_free_pages(int cpu)
+{
+	struct page *page;
+	struct defer_free_pages_pcp *free_pages = &per_cpu(defer_free_pages, cpu);
+	local_irq_disable();
+	while ((page = free_pages->next)) {
+		free_pages->next = (struct page *)page_private(page);
+		__put_single_page(page);
+		free_pages->count--;
+	}
+	local_irq_enable();
+}
+
+static void defer_free_pages_drain(struct work_struct *dummy)
+{
+	drain_cpu_defer_free_pages(get_cpu());
+	put_cpu();
+}
+
+void __init setup_per_cpu_defer_free_pages(void)
+{
+	int cpu;
+	struct defer_free_pages_pcp *free_pages;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		free_pages = &per_cpu(defer_free_pages, cpu);
+		INIT_WORK(&free_pages->work, defer_free_pages_drain);
+		init_defer_free_pages(free_pages);
+	}
+	put_online_cpus();
+}
+
+static void defer_free(struct page *page, int cpu)
+{
+	struct defer_free_pages_pcp *free_pages = &per_cpu(defer_free_pages, cpu);
+
+	set_page_private(page, (unsigned long)free_pages->next);
+	free_pages->next = page;
+	free_pages->count++;
+
+	if (free_pages->count >= DEFER_FREE_PAGES_THRESH_HOLD) {
+		schedule_work_on(cpu, &free_pages->work);
+		flush_work(&free_pages->work);
+	}
+}
+
+static void __page_cache_release_defer(struct page *page)
+{
+	static int i = 0;
+	defer_free(page, get_cpu());
+	put_cpu();
+}
+
 static void __put_single_page(struct page *page)
 {
-	__page_cache_release(page);
-	free_hot_cold_page(page, 0);
+	if (in_irq())
+		__page_cache_release_defer(page);
+	else {
+		__page_cache_release(page);
+		free_hot_cold_page(page, 0);
+	}
 }
 
 static void __put_compound_page(struct page *page)


>
> Thanks,
> -Kame
>

--
Kind regards,
Minchan Kim

2011-03-01 16:20:06

by Andrea Arcangeli

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Wed, Mar 02, 2011 at 12:35:58AM +0900, Minchan Kim wrote:
> On Tue, Mar 01, 2011 at 01:49:25PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 1 Mar 2011 13:11:46 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> > > On Tue, Mar 01, 2011 at 08:42:09AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Mon, 28 Feb 2011 10:18:27 +0000
> > > > Mel Gorman <[email protected]> wrote:
> > > >
> > > > > > BTW, can't we drop disable_irq() from all lru_lock related codes ?
> > > > > >
> > > > >
> > > > > I don't think so - at least not right now. Some LRU operations such as LRU
> > > > > pagevec draining are run from IPI which is running from an interrupt so
> > > > > minimally spin_lock_irq is necessary.
> > > > >
> > > >
> > > > pagevec draining is done by workqueue(schedule_on_each_cpu()).
> > > > I think only racy case is just lru rotation after writeback.
> > >
> > > put_page still need irq disable.
> > >
> >
> > Aha..ok. put_page() removes a page from LRU via __page_cache_release().
> > Then, we may need to remove a page from LRU under irq context.
> > Hmm...
>
> But as __page_cache_release's comment said, normally vm doesn't release page in
> irq context. so it would be rare.
> If we can remove it, could we change all of spin_lock_irqsave with spin_lock?
> If it is right, I think it's very desirable to reduce irq latency.
>
> How about this? It's totally a quick implementation and untested.
> I just want to hear opinions of you guys if the work is valuable or not before
> going ahead.

pages freed from irq shouldn't be PageLRU.

deferring the freeing to a workqueue doesn't look ok. firewall loads run
only from irq and this will cause some more work and a delay in the
freeing. I doubt it's worthwhile, especially for the lru_lock.

2011-03-01 22:15:53

by Andrew Morton

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Fri, 25 Feb 2011 20:04:59 +0000
Mel Gorman <[email protected]> wrote:

> From: Andrea Arcangeli <[email protected]>
>
> compaction_alloc() isolates pages for migration in isolate_migratepages. While
> it's scanning, IRQs are disabled on the mistaken assumption the scanning
> should be short. Tests show this to be true for the most part but
> contention times on the LRU lock can be increased. Before this patch,
> the IRQ disabled times for a simple test looked like
>
> Total sampled time IRQs off (not real total time): 5493
> Event shrink_inactive_list..shrink_zone 1596 us count 1
> Event shrink_inactive_list..shrink_zone 1530 us count 1
> Event shrink_inactive_list..shrink_zone 956 us count 1
> Event shrink_inactive_list..shrink_zone 541 us count 1
> Event shrink_inactive_list..shrink_zone 531 us count 1
> Event split_huge_page..add_to_swap 232 us count 1
> Event save_args..call_softirq 36 us count 1
> Event save_args..call_softirq 35 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> This patch reduces the worst-case IRQs-disabled latencies by releasing the
> lock every SWAP_CLUSTER_MAX pages that are scanned and releasing the CPU if
> necessary. The cost of this is that the processing performing compaction will
> be slower but IRQs being disabled for too long a time has worse consequences
> as the following report shows;
>
> Total sampled time IRQs off (not real total time): 4367
> Event shrink_inactive_list..shrink_zone 881 us count 1
> Event shrink_inactive_list..shrink_zone 875 us count 1
> Event shrink_inactive_list..shrink_zone 868 us count 1
> Event shrink_inactive_list..shrink_zone 555 us count 1
> Event split_huge_page..add_to_swap 495 us count 1
> Event compact_zone..compact_zone_order 269 us count 1
> Event split_huge_page..add_to_swap 266 us count 1
> Event shrink_inactive_list..shrink_zone 85 us count 1
> Event save_args..call_softirq 36 us count 2
> Event __wake_up..__wake_up 1 us count 1
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/compaction.c | 18 ++++++++++++++++++
> 1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 11d88a2..ec9eb0f 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -279,9 +279,27 @@ static unsigned long isolate_migratepages(struct zone *zone,
> 	}
>
> 	/* Time to isolate some pages for migration */
> +	cond_resched();
> 	spin_lock_irq(&zone->lru_lock);
> 	for (; low_pfn < end_pfn; low_pfn++) {
> 		struct page *page;
> +		bool unlocked = false;
> +
> +		/* give a chance to irqs before checking need_resched() */
> +		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
> +			spin_unlock_irq(&zone->lru_lock);
> +			unlocked = true;
> +		}
> +		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
> +			if (!unlocked)
> +				spin_unlock_irq(&zone->lru_lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lru_lock);
> +			if (fatal_signal_pending(current))
> +				break;
> +		} else if (unlocked)
> +			spin_lock_irq(&zone->lru_lock);
> +
> 		if (!pfn_valid_within(low_pfn))
> 			continue;
> 		nr_scanned++;

That was somewhat ghastly.

I think it can be made only 99% as ghastly by renaming "unlocked" to
"locked" and eliminating the double-negative logic.

--- a/mm/compaction.c~mm-compaction-minimise-the-time-irqs-are-disabled-while-isolating-pages-for-migration
+++ a/mm/compaction.c
@@ -279,9 +279,27 @@ static unsigned long isolate_migratepage
 	}
 
 	/* Time to isolate some pages for migration */
+	cond_resched();
 	spin_lock_irq(&zone->lru_lock);
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
+		bool locked = true;
+
+		/* give a chance to irqs before checking need_resched() */
+		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+			spin_unlock_irq(&zone->lru_lock);
+			locked = false;
+		}
+		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
+			if (locked)
+				spin_unlock_irq(&zone->lru_lock);
+			cond_resched();
+			spin_lock_irq(&zone->lru_lock);
+			if (fatal_signal_pending(current))
+				break;
+		} else if (!locked)
+			spin_lock_irq(&zone->lru_lock);
+
 		if (!pfn_valid_within(low_pfn))
 			continue;
 		nr_scanned++;

2011-03-01 22:22:35

by Minchan Kim

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Wed, Mar 2, 2011 at 1:19 AM, Andrea Arcangeli <[email protected]> wrote:
> On Wed, Mar 02, 2011 at 12:35:58AM +0900, Minchan Kim wrote:
>> On Tue, Mar 01, 2011 at 01:49:25PM +0900, KAMEZAWA Hiroyuki wrote:
>> > On Tue, 1 Mar 2011 13:11:46 +0900
>> > Minchan Kim <[email protected]> wrote:
>> >
>> > > On Tue, Mar 01, 2011 at 08:42:09AM +0900, KAMEZAWA Hiroyuki wrote:
>> > > > On Mon, 28 Feb 2011 10:18:27 +0000
>> > > > Mel Gorman <[email protected]> wrote:
>> > > >
>> > > > > > BTW, can't we drop disable_irq() from all lru_lock related codes ?
>> > > > > >
>> > > > >
>> > > > > I don't think so - at least not right now. Some LRU operations such as LRU
>> > > > > pagevec draining are run from IPI which is running from an interrupt so
>> > > > > minimally spin_lock_irq is necessary.
>> > > > >
>> > > >
>> > > > pagevec draining is done by workqueue(schedule_on_each_cpu()).
>> > > > I think only racy case is just lru rotation after writeback.
>> > >
>> > > put_page still need irq disable.
>> > >
>> >
>> > Aha..ok. put_page() removes a page from LRU via __page_cache_release().
>> > Then, we may need to remove a page from LRU under irq context.
>> > Hmm...
>>
>> But as __page_cache_release's comment said, normally vm doesn't release page in
>> irq context. so it would be rare.
>> If we can remove it, could we change all of spin_lock_irqsave with spin_lock?
>> If it is right, I think it's very desirable to reduce irq latency.
>>
>> How about this? It's totally a quick implementation and untested.
>> I just want to hear opinions of you guys if the work is valuable or not before
>> going ahead.
>
> pages freed from irq shouldn't be PageLRU.

Hmm..
Looking at the code, it seems to be no problem and I didn't see any comment
about such a rule. It should have been written down in __page_cache_release.
Just out of curiosity:
what kind of problem happens if we release an LRU page in IRQ context?

>
> deferring freeing to workqueue doesn't look ok. firewall loads runs
> only from irq and this will cause some more work and a delay in the
> freeing. I doubt it's worhwhile especially for the lru_lock.
>

As you said, if it is there to decrease lock contention on SMP and deliver
better overall performance, maybe we need to check again how much it helps.
If it doesn't help much, could we remove the irq save/restore around lru_lock?
Do you know of any benchmark that proved it was a benefit at the time, or
any thread on lkml discussing it?

Thanks, Andrea.

--
Kind regards,
Minchan Kim

2011-03-01 22:35:29

by Andrew Morton

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Wed, 2 Mar 2011 07:22:33 +0900
Minchan Kim <[email protected]> wrote:

> On Wed, Mar 2, 2011 at 1:19 AM, Andrea Arcangeli <[email protected]> wrote:
> > On Wed, Mar 02, 2011 at 12:35:58AM +0900, Minchan Kim wrote:
> >> On Tue, Mar 01, 2011 at 01:49:25PM +0900, KAMEZAWA Hiroyuki wrote:
> >> > On Tue, 1 Mar 2011 13:11:46 +0900
> >> > Minchan Kim <[email protected]> wrote:
> >> >
>
> ...
>
> > pages freed from irq shouldn't be PageLRU.
>
> Hmm..
> As looking code, it seems to be no problem and I didn't see the any
> comment about such rule. It should have been written down in
> __page_cache_release.
> Just out of curiosity.
> What kinds of problem happen if we release lru page in irq context?

put_page() from irq context has been permissible for ten years. I
expect there are a number of sites which do this (via subtle code
paths, often). It might get messy.

> >
> > deferring freeing to workqueue doesn't look ok. firewall loads runs
> > only from irq and this will cause some more work and a delay in the
> > freeing. I doubt it's worhwhile especially for the lru_lock.
> >
>
> As you said, if it is for decreasing lock contention in SMP to deliver
> overall better performance, maybe we need to check again how much it
> helps.
> If it doesn't help much, could we remove irq_save/restore of lru_lock?
> Do you know any benchmark to prove it had a benefit at that time or
> any thread discussing about that in lkml?


: commit b10a82b195d63575958872de5721008b0e9bef2d
: Author: akpm <akpm>
: Date: Thu Aug 15 18:21:05 2002 +0000
:
: [PATCH] make pagemap_lru_lock irq-safe
:
: It is expensive for a CPU to take an interrupt while holding the page
: LRU lock, because other CPUs will pile up on the lock while the
: interrupt runs.
:
: Disabling interrupts while holding the lock reduces contention by an
: additional 30% on 4-way. This is when the only source of interrupts is
: disk completion. The improvement will be higher with more CPUs and it
: will be higher if there is networking happening.
:
: The maximum hold time of this lock is 17 microseconds on 500 MHx PIII,
: which is well inside the kernel's maximum interrupt latency (which was
: 100 usecs when I last looked, a year ago).
:
: This optimisation is not needed on uniprocessor, but the patch disables
: IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe
: spinlock, and pages can be moved from the LRU in interrupt context.
:
: pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any
: missed uses, and to reliably break any out-of-tree patches which may be
: using the old semantics.
:
: BKrev: 3d5bf1110yfdAAur4xqJfiLBDJ2Cqw


Ancient stuff, and not a lot of detail. But I did measure it. I
measured everything ;) And, as mentioned, I'd expect that the
contention problems would worsen on higher CPU machines and higher
interrupt frequencies.

I expect we could eliminate the irqsave requirement from
rotate_reclaimable_page() simply by switching to a trylock. Some pages
will end up at the wrong end of the LRU but the effects may be
negligible. Or perhaps they may not - disk seeks are costly.

2011-03-01 22:57:53

by Minchan Kim

Subject: Re: [PATCH 2/2] mm: compaction: Minimise the time IRQs are disabled while isolating pages for migration

On Wed, Mar 2, 2011 at 7:34 AM, Andrew Morton <[email protected]> wrote:
> On Wed, 2 Mar 2011 07:22:33 +0900
> Minchan Kim <[email protected]> wrote:
>
>> On Wed, Mar 2, 2011 at 1:19 AM, Andrea Arcangeli <[email protected]> wrote:
>> > On Wed, Mar 02, 2011 at 12:35:58AM +0900, Minchan Kim wrote:
>> >> On Tue, Mar 01, 2011 at 01:49:25PM +0900, KAMEZAWA Hiroyuki wrote:
>> >> > On Tue, 1 Mar 2011 13:11:46 +0900
>> >> > Minchan Kim <[email protected]> wrote:
>> >> >
>>
>> ...
>>
>> > pages freed from irq shouldn't be PageLRU.
>>
>> Hmm..
>> As looking code, it seems to be no problem and I didn't see the any
>> comment about such rule. It should have been written down in
>> __page_cache_release.
>> Just out of curiosity.
>> What kinds of problem happen if we release lru page in irq context?
>
> put_page() from irq context has been permissible for ten years.  I
> expect there are a number of sites which do this (via subtle code
> paths, often).  It might get messy.
>
>> >
>> > deferring freeing to workqueue doesn't look ok. firewall loads runs
>> > only from irq and this will cause some more work and a delay in the
>> > freeing. I doubt it's worhwhile especially for the lru_lock.
>> >
>>
>> As you said, if it is there to reduce lock contention on SMP and
>> deliver better overall performance, maybe we need to re-check how much
>> it actually helps.
>> If it doesn't help much, could we remove the irq_save/restore around
>> the lru_lock?
>> Do you know of any benchmark that proved its benefit at the time, or
>> any lkml thread discussing it?
>
>
> : commit b10a82b195d63575958872de5721008b0e9bef2d
> : Author: akpm <akpm>
> : Date:   Thu Aug 15 18:21:05 2002 +0000
> :
> :     [PATCH] make pagemap_lru_lock irq-safe
> :
> :     It is expensive for a CPU to take an interrupt while holding the page
> :     LRU lock, because other CPUs will pile up on the lock while the
> :     interrupt runs.
> :
> :     Disabling interrupts while holding the lock reduces contention by an
> :     additional 30% on 4-way.  This is when the only source of interrupts is
> :     disk completion.  The improvement will be higher with more CPUs and it
> :     will be higher if there is networking happening.
> :
> :     The maximum hold time of this lock is 17 microseconds on 500 MHz PIII,
> :     which is well inside the kernel's maximum interrupt latency (which was
> :     100 usecs when I last looked, a year ago).
> :
> :     This optimisation is not needed on uniprocessor, but the patch disables
> :     IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe
> :     spinlock, and pages can be moved from the LRU in interrupt context.
> :
> :     pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any
> :     missed uses, and to reliably break any out-of-tree patches which may be
> :     using the old semantics.
> :
> :     BKrev: 3d5bf1110yfdAAur4xqJfiLBDJ2Cqw
>
>
> Ancient stuff, and not a lot of detail.  But I did measure it.  I
> measured everything ;) And, as mentioned, I'd expect that the
> contention problems would worsen on higher CPU machines and higher
> interrupt frequencies.

Thanks for giving the important information.

>
> I expect we could eliminate the irqsave requirement from
> rotate_reclaimable_page() simply by switching to a trylock.  Some pages
> will end up at the wrong end of the LRU but the effects may be
> negligible.  Or perhaps they may not - disk seeks are costly.
>
>

Releasing 14 pages should not cost much in interrupt latency, and it's
a general approach we have long used. If it really is a problem, I
think it would be better to reduce PAGEVEC_SIZE than to change
rotate_reclaimable_page.

--
Kind regards,
Minchan Kim