2014-11-04 08:55:06

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On 11/04/2014 08:26 AM, P. Christeas wrote:
> TL;DR: I'm testing Linus's 3.18-rcX in my desktop (x86_64, full load),
> experiencing mm races about every day. Current -rc starves the canary of
> stablity
>
> Will keep testing (should I try some -mm tree, please? ) , provide you
> feedback about the issue.

Hello,

Please do keep testing (and see below what we need), and don't try
another tree - it's 3.18 we need to fix!

> Not an active kernel-developer.
>
> Long:
>
> Since 26 Oct. upgraded my everything-on-it laptop to new distro (systemd -
> based, all new glibc etc.) and switched from 3.17 to 3.18-pre . First time in
> years, kernel got unstable.
>
> This machine is occasionaly under heavy load, doing I/O and serving random
> desktop applications. (machine is Intel x86_64, dual core, mechanical SATA
> disk).
> Now, I have a race about once a day, have narrowed them down (guess) to:
>
> [<ffffffff813b1025>] preempt_schedule_irq+0x3c/0x59
> [<ffffffff813b4810>] retint_kernel+0x20/0x30
> [<ffffffff810d7481>] ? __zone_watermark_ok+0x77/0x85
> [<ffffffff810d8256>] zone_watermark_ok+0x1a/0x1c
> [<ffffffff810eee56>] compact_zone+0x215/0x4b2
> [<ffffffff810ef13f>] compact_zone_order+0x4c/0x5f
> [<ffffffff810ef2fe>] try_to_compact_pages+0xc4/0x1e8
> [<ffffffff813ad7f8>] __alloc_pages_direct_compact+0x61/0x1bf
> [<ffffffff810da299>] __alloc_pages_nodemask+0x409/0x799
> [<ffffffff8110d3fd>] new_slab+0x5f/0x21c
> ...

I'm not sure what you mean by "race" here and your snippet is
unfortunately just a small portion of the output which could be a BUG,
OOPS, lockdep, soft-lockup, hardlock and possibly many other things. But
the backtrace itself is not enough, please send the whole error output
(it should stard and end with something like:
-----[ cut here ]------
Thanks in advance.

> Sometimes is a less critical process, that I can safely kill, otherwise I have
> to drop everything and reboot.

OK so the process is not dead due to the problem? That probably rules
out some kinds of errors but we still need the full output. Thanks in
advance.

> Unless you are already aware of this case, please accept this feedback.
> I'm pulling from Linus, should I also try some of your trees for an early
> solution?

I'm not aware of this, CCing lkml for wider coverage.


2014-11-04 09:51:45

by P. Christeas

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Tuesday 04 November 2014, Vlastimil Babka wrote:
> Please do keep testing (and see below what we need), and don't try
> another tree - it's 3.18 we need to fix!
Let me apologize/warn you about the poor quality of this report (and debug
data).
It is on a system meant for everyday desktop usage, not kernel development.
Thus, it is tuned to be "slightly" debuggable ; mostly for performance.

> I'm not sure what you mean by "race" here and your snippet is
> unfortunately just a small portion of the output ...

It is a shot in the dark. System becomes non-responsive (narrowed to desktop
apps waiting each other, or the X+kwin blocking), I can feel the CPU heating
and /sometimes/ disk I/O.

No BUG, Oops or any kernel message. (is printk level 4 adequate? )

Then, I try to drop to a console and collect as much data as possible with
SysRq.

The snippet I'd sent you is from all-cpus-backtrace (l), trying to see which
traces appear consistently during the lockup. There is also the huge traces of
"task-states" (t), but I reckon they are too noisy.
That trace also matches the usage profile, because AFAICG[uess] the issue
appears when allocating during I/O load.

After turning on full-preemption, I have been able to terminate/kill all tasks
and continue with same kernel but new userspace.

> OK so the process is not dead due to the problem? That probably rules
> out some kinds of errors but we still need the full output. Thanks in
> advance.
> I'm not aware of this, CCing lkml for wider coverage.

Thank you. As I've told in the first mail, this is an early report of possible
3.18 regression. I'm trying to narrow down the case and make it reproducible
or get a good trace.

Attached is my current .config



Attachments:
config-3.18.gz (34.68 kB)

2014-11-05 15:26:25

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On 11/04/2014 10:36 AM, P. Christeas wrote:
> On Tuesday 04 November 2014, Vlastimil Babka wrote:
>> Please do keep testing (and see below what we need), and don't try
>> another tree - it's 3.18 we need to fix!
> Let me apologize/warn you about the poor quality of this report (and debug
> data).
> It is on a system meant for everyday desktop usage, not kernel development.
> Thus, it is tuned to be "slightly" debuggable ; mostly for performance.
>
>> I'm not sure what you mean by "race" here and your snippet is
>> unfortunately just a small portion of the output ...
>
> It is a shot in the dark. System becomes non-responsive (narrowed to desktop
> apps waiting each other, or the X+kwin blocking), I can feel the CPU heating
> and /sometimes/ disk I/O.
>
> No BUG, Oops or any kernel message. (is printk level 4 adequate? )
>
> Then, I try to drop to a console and collect as much data as possible with
> SysRq.
>
> The snippet I'd sent you is from all-cpus-backtrace (l), trying to see which
> traces appear consistently during the lockup. There is also the huge traces of
> "task-states" (t), but I reckon they are too noisy.
> That trace also matches the usage profile, because AFAICG[uess] the issue
> appears when allocating during I/O load.
>
> After turning on full-preemption, I have been able to terminate/kill all tasks
> and continue with same kernel but new userspace.
>
>> OK so the process is not dead due to the problem? That probably rules
>> out some kinds of errors but we still need the full output. Thanks in
>> advance.
>> I'm not aware of this, CCing lkml for wider coverage.
>
> Thank you. As I've told in the first mail, this is an early report of possible
> 3.18 regression. I'm trying to narrow down the case and make it reproducible
> or get a good trace.

I see. I've tried to reproduce such issues with 3.18-rc3 but wasn't successful.
But I noticed a possible issue that could lead to your problem.
Can you please try the following patch?

--------8<-------
>From fe9c963cc665cdab50abb41f3babb5b19d08fab1 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Wed, 5 Nov 2014 14:19:18 +0100
Subject: [PATCH] mm, compaction: do not reset deferred compaction
optimistically

In try_to_compact_pages() we reset deferred compaction for a zone where we
think compaction has succeeded. Although this action does not reset the
counters affecting deferred compaction period, just bumping the deferred order
means that another compaction attempt will be able to pass the check in
compaction_deferred() and proceed with compaction.

This is a problem when try_to_compact_pages() thinks compaction was successful
just because the watermark check is missing proper classzone_idx parameter,
but then the allocation attempt itself will fail due to its watermark check
having the proper value. Although __alloc_pages_direct_compact() will re-defer
compaction in such case, this happens only in the case of sync compaction.
Async compaction will leave the zone open for another compaction attempt which
may reset the deferred order again. This could possibly explain what
P. Christeas reported - a system where many processes include the following
backtrace:

[<ffffffff813b1025>] preempt_schedule_irq+0x3c/0x59
[<ffffffff813b4810>] retint_kernel+0x20/0x30
[<ffffffff810d7481>] ? __zone_watermark_ok+0x77/0x85
[<ffffffff810d8256>] zone_watermark_ok+0x1a/0x1c
[<ffffffff810eee56>] compact_zone+0x215/0x4b2
[<ffffffff810ef13f>] compact_zone_order+0x4c/0x5f
[<ffffffff810ef2fe>] try_to_compact_pages+0xc4/0x1e8
[<ffffffff813ad7f8>] __alloc_pages_direct_compact+0x61/0x1bf
[<ffffffff810da299>] __alloc_pages_nodemask+0x409/0x799
[<ffffffff8110d3fd>] new_slab+0x5f/0x21c

The issue has been made visible by commit 53853e2d2bfb ("mm, compaction: defer
each zone individually instead of preferred zone"), since before the commit,
deferred compaction for fallback zones (where classzone_idx matters) was not
considered separately.

Although work is underway to fix the underlying zone watermark check mismatch,
this patch fixes the immediate problem by removing the optimistic defer reset
completely. Its usefulness is questionable anyway, since if the allocation
really succeeds, a full defer reset (including the period counters) follows.

Fixes: 53853e2d2bfb748a8b5aa2fd1de15699266865e0
Reported-by: P. Christeas <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ec74cf0..f0335f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
alloc_flags)) {
*candidate_zone = zone;
/*
- * We think the allocation will succeed in this zone,
- * but it is not certain, hence the false. The caller
- * will repeat this with true if allocation indeed
- * succeeds in this zone.
- */
- compaction_defer_reset(zone, order, false);
- /*
* It is possible that async compaction aborted due to
* need_resched() and the watermarks were ok thanks to
* somebody else freeing memory. The allocation can
--
2.1.2

2014-11-05 16:03:54

by P. Christeas

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Wednesday 05 November 2014, Vlastimil Babka wrote:
> I see. I've tried to reproduce such issues with 3.18-rc3 but wasn't
> successful. But I noticed a possible issue that could lead to your problem.
> Can you please try the following patch?

OK, I can give it a try.

FYI, the "stability canary" is still alive, my system is on for 28hours, under
avg. load >=3 all this time, HEAD=980d0d51b1c9617a4

/me goes busy fire-proofing your patch...


2014-11-06 19:23:59

by P. Christeas

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Wednesday 05 November 2014, Vlastimil Babka wrote:
> Can you please try the following patch?
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist
> - compaction_defer_reset(zone, order, false);

NACK :(

I just got again into a state that some task was spinning out of control, and
blocking the rest of the desktop.

You will see me trying a few things, apparently the first OOM managed to
unblock something, but a few seconds later the system "stepped" on some other
blocking task.

See attached log, it may only give you some hint; the problem could well be in
some other part of the kernel.

In the meanwhile, I'm pulling linus/master ...


Attachments:
kcrash.log (38.98 kB)

2014-11-06 21:38:10

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On 11/06/2014 08:23 PM, P. Christeas wrote:
> On Wednesday 05 November 2014, Vlastimil Babka wrote:
>> Can you please try the following patch?
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist
>> - compaction_defer_reset(zone, order, false);
>
> NACK :(

Sigh.

> I just got again into a state that some task was spinning out of control, and
> blocking the rest of the desktop.

Well this is similar to reports [1] and [2] except [1] points to
isolate_freepages_block() and your traces only go as deep as compact_zone. Which
probably inlines isolate_migratepages* but I would expect it cannot inline
isolate_freepages* due to invocation via pointer.

> You will see me trying a few things, apparently the first OOM managed to
> unblock something, but a few seconds later the system "stepped" on some other
> blocking task.
>
> See attached log, it may only give you some hint; the problem could well be in
> some other part of the kernel.

Well I doubt that but I'd like to be surprised :)

> In the meanwhile, I'm pulling linus/master ...

Could you perhaps bisect the most suspicious part? It's not a lot of commits
and you seem to be reproducing this quite easily?

commit 447f05bb488bff4282088259b04f47f0f9f76760 should be good
commit 6d7ce55940b6ecd463ca044ad241f0122d913293 should be bad

If that's true, then bisection should find the cause rather quickly.

Oh and did I ask in this thread for /proc/zoneinfo yet? :)

Thanks.

> kcrash.log
>

[1]
http://article.gmane.org/gmane.linux.kernel.mm/124451/match=isolate_freepages_block+very+high+intermittent+overhead

[2] https://lkml.org/lkml/2014/11/4/904

2014-11-08 13:12:30

by P. Christeas

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Thursday 06 November 2014, Vlastimil Babka wrote:
> > On Wednesday 05 November 2014, Vlastimil Babka wrote:
> >> Can you please try the following patch?
> >> - compaction_defer_reset(zone, order, false);
> Oh and did I ask in this thread for /proc/zoneinfo yet? :)

Using that same kernel[1], got again into a race, gathered a few more data.

This time, I had 1x "urpmq" process [2] hung at 100% CPU , when "kwin" got
apparently blocked (100% CPU, too) trying to resize a GUI window. I suppose
the resizing operation would mean heavy memory alloc/free.

The rest of the system was responsive, I could easily get a console, login,
gather the files.. Then, I have *killed* -9 the "urpmq" process, which solved
the race and my system is still alive! "kwin" is still running, returned to
regular CPU load.

Attached is traces from SysRq+l (pressed a few times, wanted to "snapshot" the
stack) and /proc/zoneinfo + /proc/vmstat

Bisection is not yet meaningful, IMHO, because I cannot be sure that "good"
points are really free from this issue. I'd estimate that each test would take
+3days, unless I really find a deterministic way to reproduce the issue .


Thank you, again.


[1] linus's didn't have any -mm changes, so I haven't compiled anything yet.
This means it also contains the "- compaction_defer_reset()" change

[2] urpmq is a Mandrake distro Perl script for querying the RPM database. It
does some disk I/O , loads data into allocated Perl structs and sorts that,
FYI.


Attachments:
zoneinfo.log (4.88 kB)
kcrash2.log (38.60 kB)
Download all attachments

2014-11-08 22:18:42

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On 11/08/2014 02:11 PM, P. Christeas wrote:
> On Thursday 06 November 2014, Vlastimil Babka wrote:
>>> On Wednesday 05 November 2014, Vlastimil Babka wrote:
>>>> Can you please try the following patch?
>>>> - compaction_defer_reset(zone, order, false);
>> Oh and did I ask in this thread for /proc/zoneinfo yet? :)
>
> Using that same kernel[1], got again into a race, gathered a few more data.
>
> This time, I had 1x "urpmq" process [2] hung at 100% CPU , when "kwin" got
> apparently blocked (100% CPU, too) trying to resize a GUI window. I suppose
> the resizing operation would mean heavy memory alloc/free.
>
> The rest of the system was responsive, I could easily get a console, login,
> gather the files.. Then, I have *killed* -9 the "urpmq" process, which solved
> the race and my system is still alive! "kwin" is still running, returned to
> regular CPU load.
>
> Attached is traces from SysRq+l (pressed a few times, wanted to "snapshot" the
> stack) and /proc/zoneinfo + /proc/vmstat
>
> Bisection is not yet meaningful, IMHO, because I cannot be sure that "good"
> points are really free from this issue. I'd estimate that each test would take
> +3days, unless I really find a deterministic way to reproduce the issue .

Hi,

I think I finally found the cause by staring into the code... CCing
people from all 4 separate threads I know about this issue.
The problem with finding the cause was that the first report I got from
Markus was about isolate_freepages_block() overhead, and later Norbert
reported that reverting a patch for isolate_freepages* helped. But the
problem seems to be that although the loop in isolate_migratepages exits
because the scanners almost meet (they are within same pageblock), they
don't truly meet, therefore compact_finished() decides to continue, but
isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
made this situation possible, as free scaner pfn can now point to a
middle of pageblock.

So I hope the attached patch will fix the soft-lockup issues in
compact_zone. Please apply on 3.18-rc3 or later without any other reverts,
and test. It probably won't help Markus and his isolate_freepages_block()
overhead though...

Thanks,
Vlastimil

------8<------
>From fbf8eb0bcd2897090312e23da6a31bad9cc6b337 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Sat, 8 Nov 2014 22:20:43 +0100
Subject: [PATCH] mm, compaction: prevent endless loop in migrate scanner

---
mm/compaction.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ec74cf0..1b7a1be 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1029,8 +1029,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}

acct_isolated(zone, cc);
- /* Record where migration scanner will be restarted */
- cc->migrate_pfn = low_pfn;
+ /*
+ * Record where migration scanner will be restarted. If we end up in
+ * the same pageblock as the free scanner, make the scanners fully
+ * meet so that compact_finished() terminates compaction.
+ */
+ cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;

return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
--
2.1.2

2014-11-09 08:27:52

by Pavel Machek

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

Hi!

> >> Oh and did I ask in this thread for /proc/zoneinfo yet? :)
> >
> > Using that same kernel[1], got again into a race, gathered a few more data.
> >
> > This time, I had 1x "urpmq" process [2] hung at 100% CPU , when "kwin" got
> > apparently blocked (100% CPU, too) trying to resize a GUI window. I suppose
> > the resizing operation would mean heavy memory alloc/free.
> >
> > The rest of the system was responsive, I could easily get a console, login,
> > gather the files.. Then, I have *killed* -9 the "urpmq" process, which solved
> > the race and my system is still alive! "kwin" is still running, returned to
> > regular CPU load.
> >
> > Attached is traces from SysRq+l (pressed a few times, wanted to "snapshot" the
> > stack) and /proc/zoneinfo + /proc/vmstat
> >
> > Bisection is not yet meaningful, IMHO, because I cannot be sure that "good"
> > points are really free from this issue. I'd estimate that each test would take
> > +3days, unless I really find a deterministic way to reproduce the issue .
>
> Hi,
>
> I think I finally found the cause by staring into the code... CCing
> people from all 4 separate threads I know about this issue.
> The problem with finding the cause was that the first report I got from
> Markus was about isolate_freepages_block() overhead, and later Norbert
> reported that reverting a patch for isolate_freepages* helped. But the
> problem seems to be that although the loop in isolate_migratepages exits
> because the scanners almost meet (they are within same pageblock), they
> don't truly meet, therefore compact_finished() decides to continue, but
> isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
> made this situation possible, as free scaner pfn can now point to a
> middle of pageblock.

Ok, it seems it happened second time now, again shortly after
resume. I guess I should apply your patch after all.

(Or... instead it should go to Linus ASAP -- it fixes known problem
that is affected people, and we want it in soon in case it is not
complete fix.)

Dmesg is in the attachment, perhaps it helps.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (2.21 kB)
delme.gz (18.00 kB)
Download all attachments

2014-11-09 09:43:15

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On 11/09/2014 09:27 AM, Pavel Machek wrote:
> Hi!
>
>>>> Oh and did I ask in this thread for /proc/zoneinfo yet? :)
>>>
>>> Using that same kernel[1], got again into a race, gathered a few more data.
>>>
>>> This time, I had 1x "urpmq" process [2] hung at 100% CPU , when "kwin" got
>>> apparently blocked (100% CPU, too) trying to resize a GUI window. I suppose
>>> the resizing operation would mean heavy memory alloc/free.
>>>
>>> The rest of the system was responsive, I could easily get a console, login,
>>> gather the files.. Then, I have *killed* -9 the "urpmq" process, which solved
>>> the race and my system is still alive! "kwin" is still running, returned to
>>> regular CPU load.
>>>
>>> Attached is traces from SysRq+l (pressed a few times, wanted to "snapshot" the
>>> stack) and /proc/zoneinfo + /proc/vmstat
>>>
>>> Bisection is not yet meaningful, IMHO, because I cannot be sure that "good"
>>> points are really free from this issue. I'd estimate that each test would take
>>> +3days, unless I really find a deterministic way to reproduce the issue .
>>
>> Hi,
>>
>> I think I finally found the cause by staring into the code... CCing
>> people from all 4 separate threads I know about this issue.
>> The problem with finding the cause was that the first report I got from
>> Markus was about isolate_freepages_block() overhead, and later Norbert
>> reported that reverting a patch for isolate_freepages* helped. But the
>> problem seems to be that although the loop in isolate_migratepages exits
>> because the scanners almost meet (they are within same pageblock), they
>> don't truly meet, therefore compact_finished() decides to continue, but
>> isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
>> made this situation possible, as free scaner pfn can now point to a
>> middle of pageblock.
>
> Ok, it seems it happened second time now, again shortly after
> resume. I guess I should apply your patch after all.

Thanks.

> (Or... instead it should go to Linus ASAP -- it fixes known problem
> that is affected people, and we want it in soon in case it is not
> complete fix.)

I don't want to send untested fix, and wasn't able to reproduce the bug
myself. I think Norbert could do it rather quickly so I hope he can tell
us soon.

> Dmesg is in the attachment, perhaps it helps.
> Pavel

It looks the same as before, so no surprises there, which is good.

2014-11-09 22:32:34

by Norbert Preining

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

Hi Vlastimil, hi all,

On Sun, 09 Nov 2014, Vlastimil Babka wrote:
> I don't want to send untested fix, and wasn't able to reproduce the bug
> myself. I think Norbert could do it rather quickly so I hope he can tell
> us soon.

Sorry, weekend means I am away from my laptop for extended times,
and I wanted to give it a bit of stress testing.

No problems till now, no hangs, all working as expected with
your latest patch.

Thanks a lot

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------

2014-11-10 06:05:24

by Joonsoo Kim

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Sat, Nov 08, 2014 at 11:18:37PM +0100, Vlastimil Babka wrote:
> On 11/08/2014 02:11 PM, P. Christeas wrote:
> > On Thursday 06 November 2014, Vlastimil Babka wrote:
> >>> On Wednesday 05 November 2014, Vlastimil Babka wrote:
> >>>> Can you please try the following patch?
> >>>> - compaction_defer_reset(zone, order, false);
> >> Oh and did I ask in this thread for /proc/zoneinfo yet? :)
> >
> > Using that same kernel[1], got again into a race, gathered a few more data.
> >
> > This time, I had 1x "urpmq" process [2] hung at 100% CPU , when "kwin" got
> > apparently blocked (100% CPU, too) trying to resize a GUI window. I suppose
> > the resizing operation would mean heavy memory alloc/free.
> >
> > The rest of the system was responsive, I could easily get a console, login,
> > gather the files.. Then, I have *killed* -9 the "urpmq" process, which solved
> > the race and my system is still alive! "kwin" is still running, returned to
> > regular CPU load.
> >
> > Attached is traces from SysRq+l (pressed a few times, wanted to "snapshot" the
> > stack) and /proc/zoneinfo + /proc/vmstat
> >
> > Bisection is not yet meaningful, IMHO, because I cannot be sure that "good"
> > points are really free from this issue. I'd estimate that each test would take
> > +3days, unless I really find a deterministic way to reproduce the issue .
>
> Hi,
>
> I think I finally found the cause by staring into the code... CCing
> people from all 4 separate threads I know about this issue.
> The problem with finding the cause was that the first report I got from
> Markus was about isolate_freepages_block() overhead, and later Norbert
> reported that reverting a patch for isolate_freepages* helped. But the
> problem seems to be that although the loop in isolate_migratepages exits
> because the scanners almost meet (they are within same pageblock), they
> don't truly meet, therefore compact_finished() decides to continue, but
> isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
> made this situation possible, as free scaner pfn can now point to a
> middle of pageblock.

Indeed.

>
> So I hope the attached patch will fix the soft-lockup issues in
> compact_zone. Please apply on 3.18-rc3 or later without any other reverts,
> and test. It probably won't help Markus and his isolate_freepages_block()
> overhead though...

Yes, I found this bug too, but, it can't explain
isolate_freepages_block() overhead. Anyway, I can't find another bug
related to isolate_freepages_block(). :/

> Thanks,
> Vlastimil
>
> ------8<------
> >From fbf8eb0bcd2897090312e23da6a31bad9cc6b337 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Sat, 8 Nov 2014 22:20:43 +0100
> Subject: [PATCH] mm, compaction: prevent endless loop in migrate scanner
>
> ---
> mm/compaction.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ec74cf0..1b7a1be 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1029,8 +1029,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> }
>
> acct_isolated(zone, cc);
> - /* Record where migration scanner will be restarted */
> - cc->migrate_pfn = low_pfn;
> + /*
> + * Record where migration scanner will be restarted. If we end up in
> + * the same pageblock as the free scanner, make the scanners fully
> + * meet so that compact_finished() terminates compaction.
> + */
> + cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
>
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }

IMHO, proper fix is not to change this logic, but, to change decision
logic in compact_finished() and in compact_zone(). Maybe helper
function would be good for readability.

Thanks.

2014-11-10 07:53:45

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On 11/10/2014 07:07 AM, Joonsoo Kim wrote:
> On Sat, Nov 08, 2014 at 11:18:37PM +0100, Vlastimil Babka wrote:
>> On 11/08/2014 02:11 PM, P. Christeas wrote:
>>
>> Hi,
>>
>> I think I finally found the cause by staring into the code... CCing
>> people from all 4 separate threads I know about this issue.
>> The problem with finding the cause was that the first report I got from
>> Markus was about isolate_freepages_block() overhead, and later Norbert
>> reported that reverting a patch for isolate_freepages* helped. But the
>> problem seems to be that although the loop in isolate_migratepages exits
>> because the scanners almost meet (they are within same pageblock), they
>> don't truly meet, therefore compact_finished() decides to continue, but
>> isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
>> made this situation possible, as free scaner pfn can now point to a
>> middle of pageblock.
>
> Indeed.
>
>>
>> So I hope the attached patch will fix the soft-lockup issues in
>> compact_zone. Please apply on 3.18-rc3 or later without any other reverts,
>> and test. It probably won't help Markus and his isolate_freepages_block()
>> overhead though...
>
> Yes, I found this bug too, but, it can't explain
> isolate_freepages_block() overhead. Anyway, I can't find another bug
> related to isolate_freepages_block(). :/

Thanks for checking.

>> Thanks,
>> Vlastimil
>>
>> ------8<------
>> >From fbf8eb0bcd2897090312e23da6a31bad9cc6b337 Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <[email protected]>
>> Date: Sat, 8 Nov 2014 22:20:43 +0100
>> Subject: [PATCH] mm, compaction: prevent endless loop in migrate scanner
>>
>> ---
>> mm/compaction.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index ec74cf0..1b7a1be 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1029,8 +1029,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>> }
>>
>> acct_isolated(zone, cc);
>> - /* Record where migration scanner will be restarted */
>> - cc->migrate_pfn = low_pfn;
>> + /*
>> + * Record where migration scanner will be restarted. If we end up in
>> + * the same pageblock as the free scanner, make the scanners fully
>> + * meet so that compact_finished() terminates compaction.
>> + */
>> + cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
>>
>> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
>> }
>
> IMHO, proper fix is not to change this logic, but, to change decision
> logic in compact_finished() and in compact_zone(). Maybe helper
> function would be good for readability.

OK but I would think that to fix 3.18 ASAP and not introduce more
regressions, go with the patch above first, as it is the minimal fix and
people already test it. Then we can implement your suggestion later as a
cleanup for 3.19?

Vlastimil

> Thanks.
>

2014-11-10 08:04:03

by Joonsoo Kim

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Mon, Nov 10, 2014 at 08:53:38AM +0100, Vlastimil Babka wrote:
> On 11/10/2014 07:07 AM, Joonsoo Kim wrote:
> >On Sat, Nov 08, 2014 at 11:18:37PM +0100, Vlastimil Babka wrote:
> >>On 11/08/2014 02:11 PM, P. Christeas wrote:
> >>
> >>Hi,
> >>
> >>I think I finally found the cause by staring into the code... CCing
> >>people from all 4 separate threads I know about this issue.
> >>The problem with finding the cause was that the first report I got from
> >>Markus was about isolate_freepages_block() overhead, and later Norbert
> >>reported that reverting a patch for isolate_freepages* helped. But the
> >>problem seems to be that although the loop in isolate_migratepages exits
> >>because the scanners almost meet (they are within same pageblock), they
> >>don't truly meet, therefore compact_finished() decides to continue, but
> >>isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
> >>made this situation possible, as free scaner pfn can now point to a
> >>middle of pageblock.
> >
> >Indeed.
> >
> >>
> >>So I hope the attached patch will fix the soft-lockup issues in
> >>compact_zone. Please apply on 3.18-rc3 or later without any other reverts,
> >>and test. It probably won't help Markus and his isolate_freepages_block()
> >>overhead though...
> >
> >Yes, I found this bug too, but, it can't explain
> >isolate_freepages_block() overhead. Anyway, I can't find another bug
> >related to isolate_freepages_block(). :/
>
> Thanks for checking.
>
> >>Thanks,
> >>Vlastimil
> >>
> >>------8<------
> >>>From fbf8eb0bcd2897090312e23da6a31bad9cc6b337 Mon Sep 17 00:00:00 2001
> >>From: Vlastimil Babka <[email protected]>
> >>Date: Sat, 8 Nov 2014 22:20:43 +0100
> >>Subject: [PATCH] mm, compaction: prevent endless loop in migrate scanner
> >>
> >>---
> >> mm/compaction.c | 8 ++++++--
> >> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/mm/compaction.c b/mm/compaction.c
> >>index ec74cf0..1b7a1be 100644
> >>--- a/mm/compaction.c
> >>+++ b/mm/compaction.c
> >>@@ -1029,8 +1029,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >> }
> >>
> >> acct_isolated(zone, cc);
> >>- /* Record where migration scanner will be restarted */
> >>- cc->migrate_pfn = low_pfn;
> >>+ /*
> >>+ * Record where migration scanner will be restarted. If we end up in
> >>+ * the same pageblock as the free scanner, make the scanners fully
> >>+ * meet so that compact_finished() terminates compaction.
> >>+ */
> >>+ cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> >>
> >> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> >> }
> >
> >IMHO, proper fix is not to change this logic, but, to change decision
> >logic in compact_finished() and in compact_zone(). Maybe helper
> >function would be good for readability.
>
> OK but I would think that to fix 3.18 ASAP and not introduce more
> regressions, go with the patch above first, as it is the minimal fix
> and people already test it. Then we can implement your suggestion
> later as a cleanup for 3.19?

Yeap. Agreed.

Thanks.

2014-11-10 08:14:37

by P. Christeas

[permalink] [raw]
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

On Saturday 08 November 2014, Vlastimil Babka wrote:
> >From fbf8eb0bcd2897090312e23da6a31bad9cc6b337 Mon Sep 17 00:00:00 2001
>
> From: Vlastimil Babka <[email protected]>
> Date: Sat, 8 Nov 2014 22:20:43 +0100
> Subject: [PATCH] mm, compaction: prevent endless loop in migrate scanner

After 30hrs uptime, I also mark this test as PASSED .

:)