LinuxLists.cc - Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP

2011-02-22 07:43:44

Subject: Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0

Arthur Marsh wrote:
> I'm experiencing MIDI playback slow-downs when I'm observing kswapd0
> active (a few percent of cpu in top output) in recent kernels.
>
> I git-bisected the problem down to:
>
> commit 5a03b051ed87e72b959f32a86054e1142ac4cf55
> Author: Andrea Arcangeli <[email protected]>
> Date: Thu Jan 13 15:47:11 2011 -0800
>
> thp: use compaction in kswapd for GFP_ATOMIC order > 0
>
> This takes advantage of memory compaction to properly generate pages of
> order > 0 if regular page reclaim fails and priority level becomes more
> severe and we don't reach the proper watermarks.
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> I ran git-bisect over the weekend, building and installing ALSA 1.0.24
> with each kernel. After identifying the above commit, I rebuilt the 2.6
> head with that commit reverted and verified that the problem was no
> longer present.

Apparently, huge page compaction disables interrupts for much too long.

> MIDI playback was via aplaymidi -p 17:0 to a Soundblaster Audigy 2 ZS
> (SB0350) wavetable.

The ALSA sequencer uses either the system timer or an HR timer at 1 kHz
to deliver MIDI commands (notes); the wavetable driver requires its own
interrupts in regular 5.3 ms intervals.

Regards,
Clemens

2011-02-22 07:46:56

On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote:
> OK, these patches applied together against upstream didn't cause a crash
> but I did observe:
>
> significant slowdowns of MIDI playback (moreso than in previous cases,
> and with less than 20 Meg of swap file in use);
>
> kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent).
>
> If I should try only one of the patches or something else entirely,
> please let me know.

Yes, with irq off, schedule won't run and need_resched won't get set.

So let's try this.

In case this doesn't fix I definitely give it up with compaction in
kswapd as GFP_ATOMIC will likely not get an huge benefit out of
compaction anyway because 1) the allocations from GFP_ATOMIC are
likely short lived, 2) the cost of the compaction scan loop +
migration may be higher than the benefit we get from jumbo frames

So if this doesn't fix it, I'll already post a definitive fix that
removes compaction from kswapd (but leaves it enabled for direct
reclaim for all order sizes including kernel stack). I already
verified that this solves not just the midi latency issue but the
other server benchmark that I'm dealing with. Then we can think at
compaction+kswapd later for 2.6.39 but I think we need to close this
kswapd issue in time for 2.6.38. So if the below isn't enough the next
patch should be applied. I'll get those two patches tested in the
server load too, and I'll let you know how the results are when it
finishes.

Please apply also the attached "kswapd-high-wmark" before the below
one.

====
Subject: compaction: fix high latencies created by kswapd

From: Andrea Arcangeli <[email protected]>

We need to proper spin_unlock_irq/cond_resched in the compaction loop to avoid
hurting latencies. We must also stop being too aggressive insisting with
compaction within kswapd if compaction don't make progress in every zone.

Signed-off-by: Andrea Arcangeli <[email protected]>
---
include/linux/compaction.h | 6 +++++
mm/compaction.c | 53 +++++++++++++++++++++++++++++++++++++--------
mm/vmscan.c | 39 +++++++++++++++++++--------------
3 files changed, 73 insertions(+), 25 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -271,9 +271,27 @@ static unsigned long isolate_migratepage
}

/* Time to isolate some pages for migration */
+ cond_resched();
spin_lock_irq(&zone->lru_lock);
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
+ int unlocked = 0;
+
+ /* give a chance to irqs before checking need_resched() */
+ if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+ spin_unlock_irq(&zone->lru_lock);
+ unlocked = 1;
+ }
+ if (need_resched() || spin_is_contended(&zone->lru_lock)) {
+ if (fatal_signal_pending(current))
+ break;
+ if (!unlocked)
+ spin_unlock_irq(&zone->lru_lock);
+ cond_resched();
+ spin_lock_irq(&zone->lru_lock);
+ } else if (unlocked)
+ spin_lock_irq(&zone->lru_lock);
+
if (!pfn_valid_within(low_pfn))
continue;
nr_scanned++;
@@ -436,6 +454,28 @@ static int compact_finished(struct zone
return COMPACT_CONTINUE;
}

+static unsigned long compaction_watermark(struct zone *zone, int order)
+{
+ /*
+ * Watermarks for order-0 must be met for compaction. Note the 2UL.
+ * This is because during migration, copies of pages need to be
+ * allocated and for a short time, the footprint is higher
+ */
+ return low_wmark_pages(zone) + (2UL << order);
+}
+
+static int __compaction_need_reclaim(struct zone *zone, int order,
+ unsigned long watermark)
+{
+ return !zone_watermark_ok(zone, 0, watermark, 0, 0);
+}
+
+int compaction_need_reclaim(struct zone *zone, int order)
+{
+ return __compaction_need_reclaim(zone, order,
+ compaction_watermark(zone, order));
+}
+
/*
* compaction_suitable: Is this suitable to run compaction on this zone now?
* Returns
@@ -449,21 +489,16 @@ unsigned long compaction_suitable(struct
unsigned long watermark;

/*
- * Watermarks for order-0 must be met for compaction. Note the 2UL.
- * This is because during migration, copies of pages need to be
- * allocated and for a short time, the footprint is higher
- */
- watermark = low_wmark_pages(zone) + (2UL << order);
- if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
- return COMPACT_SKIPPED;
-
- /*
* order == -1 is expected when compacting via
* /proc/sys/vm/compact_memory
*/
if (order == -1)
return COMPACT_CONTINUE;

+ watermark = compaction_watermark(zone, order);
+ if (__compaction_need_reclaim(zone, order, watermark))
+ return COMPACT_SKIPPED;
+
/*
* fragmentation index determines if allocation failures are due to
* low memory or external fragmentation
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2385,9 +2385,9 @@ loop_again:
* cause too much scanning of the lower zones.
*/
for (i = 0; i <= end_zone; i++) {
- int compaction;
struct zone *zone = pgdat->node_zones + i;
int nr_slab;
+ int local_order;

if (!populated_zone(zone))
continue;
@@ -2407,20 +2407,21 @@ loop_again:
* We put equal pressure on every zone, unless one
* zone has way too many pages free already.
*/
- if (!zone_watermark_ok_safe(zone, order,
- high_wmark_pages(zone), end_zone, 0))
+ if (!zone_watermark_ok_safe(zone, 0,
+ high_wmark_pages(zone), end_zone, 0) ||
+ compaction_need_reclaim(zone, order)) {
shrink_zone(priority, zone, &sc);
- reclaim_state->reclaimed_slab = 0;
- nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
- lru_pages);
- sc.nr_reclaimed += reclaim_state->reclaimed_slab;
- total_scanned += sc.nr_scanned;
+ reclaim_state->reclaimed_slab = 0;
+ nr_slab = shrink_slab(sc.nr_scanned,
+ GFP_KERNEL,
+ lru_pages);
+ sc.nr_reclaimed += reclaim_state->reclaimed_slab;
+ total_scanned += sc.nr_scanned;
+ }

- compaction = 0;
+ local_order = order;
if (order &&
- zone_watermark_ok(zone, 0,
- high_wmark_pages(zone),
- end_zone, 0) &&
+ !compaction_need_reclaim(zone, order) &&
!zone_watermark_ok(zone, order,
high_wmark_pages(zone),
end_zone, 0)) {
@@ -2428,12 +2429,18 @@ loop_again:
order,
sc.gfp_mask, false,
COMPACT_MODE_KSWAPD);
- compaction = 1;
+ /*
+ * Let kswapd stop immediately even if
+ * compaction didn't succeed to
+ * generate "high_wmark_pages" of the
+ * max order wanted in every zone.
+ */
+ local_order = 0;
}

if (zone->all_unreclaimable)
continue;
- if (!compaction && nr_slab == 0 &&
+ if (nr_slab == 0 &&
!zone_reclaimable(zone))
zone->all_unreclaimable = 1;
/*
@@ -2445,7 +2452,7 @@ loop_again:
total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
sc.may_writepage = 1;

- if (!zone_watermark_ok_safe(zone, order,
+ if (!zone_watermark_ok_safe(zone, local_order,
high_wmark_pages(zone), end_zone, 0)) {
all_zones_ok = 0;
/*
@@ -2453,7 +2460,7 @@ loop_again:
* means that we have a GFP_ATOMIC allocation
* failure risk. Hurry up!
*/
- if (!zone_watermark_ok_safe(zone, order,
+ if (!zone_watermark_ok_safe(zone, local_order,
min_wmark_pages(zone), end_zone, 0))
has_under_min_watermark_zone = 1;
} else {
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -26,6 +26,7 @@ extern int fragmentation_index(struct zo
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
bool sync);
+extern int compaction_need_reclaim(struct zone *zone, int order);
extern unsigned long compaction_suitable(struct zone *zone, int order);
extern unsigned long compact_zone_order(struct zone *zone, int order,
gfp_t gfp_mask, bool sync,
@@ -68,6 +69,11 @@ static inline unsigned long try_to_compa
return COMPACT_CONTINUE;
}

+static inline int compaction_need_reclaim(struct zone *zone, int order)
+{
+ return 0;
+}
+
static inline unsigned long compaction_suitable(struct zone *zone, int order)
{
return COMPACT_SKIPPED;

Attachments:

(No filename) (7.94 kB)
kswapd-high_wmark (1.81 kB)
Download all attachments

Andrea Arcangeli wrote, on 26/02/11 17:13:
> On Thu, Feb 24, 2011 at 02:54:05AM +0100, Andrea Arcangeli wrote:
>> Ok so tomorrow I'll get all results on these 3 kernels (
>> compaction-kswapd-3+compaction_alloc_lowlat-2 vs
>> compaction-no-kswapd-3+compaction_alloc_lowlat-2 vs
>> compaction_alloc_lowlat2) on network server load, where throughout is
>> measured in addition to latency. Then we'll have a better picture to
>
> Latency is still lowest with compaction-no-kswapd-3. compaction_alloc
> still is at the top of the profiling with
> compaction-kswapd-3+compaction_alloc_lowlat-2. However
> compaction-kswapd-3 reduced the overhead somewhat but not enough to be
> as fast as with compaction-no-kswapd-3 (even if much better than
> before).
>
> So we should apply compaction-no-kswapd-3 to 2.6.38 I think.

I built a kernel based against git head of around 0800 UTC Saturday
2011-02-26 plus compaction-no-kswapd-3 (Andrea's) and
compaction_alloc_lowlat-2 (Mel's) patches.

The the latency performance for MIDI playback is excellent and kswap0
has reported 100 seconds of CPU time in 22 hours of the kernel kernel
running.

However, I am now battling against the kernel freeing up memory when
using icedove/thunderbird. top is reporting 74 MiB resident, 418 MiB
virtual, and as I pause typing to watch top output, free RAM has gone up
to 50 MiB and I'm stuck waiting for parts of icedove to swap back in so
that I can get a response from my input (either scrolling a message when
reading it or composing a message, where I am left waiting for the text
that I typed to appear).

Normally in this machine with only 384 MiB RAM and (say) 2.6.35-2.6.37
kernels, free RAM will hover around the 5 MiB mark when lots of
applications are open. I may wait when switching from one running
application to another for it to respond, but once I am using that
particular application it sufficiently responsive not to be annoying.

The aggressive reclamation of RAM by the kernel I built around 0800 UTC
Saturday is working against the application currently in focus.

I will try Andrea's and Mel's patches built against 2.6.36-rc6 over the
next several hours to see how that compares.

Arthur.