2014-11-04 23:20:30

by Norbert Preining

[permalink] [raw]
Subject: khugepaged / firefox going wild in 3.18-rc

Dear all,

(please Cc)

since 3.18-rc started, regularly, but unpredictably, firefox
(Debian iceweasel) goes completely boom to 100% CPU, and there
is also a kernel task khugepaged doing the same.

I need to kill firefox, or better the X session to get it back to normal.

THe logs don't show any strange output

top's top process:

27021 norbert 20 0 1351536 449868 88788 R 100.4 5.6 4:46.16 iceweasel
35 root 39 19 0 0 0 R 100.0 0.0 2:41.10 khugepaged

Switching back to 3.17 does Not* exhibit the same behaviour.

Is this a bug in Firefox/Iceweasel, or is there something luring in the
kernel.


All the best

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------


2014-11-05 00:00:42

by David Rientjes

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

On Wed, 5 Nov 2014, Norbert Preining wrote:

> Dear all,
>
> (please Cc)
>
> since 3.18-rc started, regularly, but unpredictably, firefox
> (Debian iceweasel) goes completely boom to 100% CPU, and there
> is also a kernel task khugepaged doing the same.
>
> I need to kill firefox, or better the X session to get it back to normal.
>
> THe logs don't show any strange output
>
> top's top process:
>
> 27021 norbert 20 0 1351536 449868 88788 R 100.4 5.6 4:46.16 iceweasel
> 35 root 39 19 0 0 0 R 100.0 0.0 2:41.10 khugepaged
>
> Switching back to 3.17 does Not* exhibit the same behaviour.
>
> Is this a bug in Firefox/Iceweasel, or is there something luring in the
> kernel.
>

If you have the ability to kill your X session, then you presumably are
able to capture /proc/pid/stack for these pids to see where it is busy.
It might also be helpful to see how the grep ^thp_ /proc/vmstat and
grep ^compact_ /proc/vmstat counters change over time.

2014-11-05 00:10:30

by Norbert Preining

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

Hi David,

thanks for your answer.

On Tue, 04 Nov 2014, David Rientjes wrote:
> If you have the ability to kill your X session, then you presumably are
> able to capture /proc/pid/stack for these pids to see where it is busy.

Yes I can do that, I can even reproduce it on *every* start of iceweasel
after it happened the first time.

There is no /proc/PID/stack, though .....

> It might also be helpful to see how the grep ^thp_ /proc/vmstat and
> grep ^compact_ /proc/vmstat counters change over time.

I captured that over timne, as well as the contents of
/proc/PID/stat
for iceweasel and khugepaged. There are some numbers steadily
increasing.

Here is what I got:

START
thp_fault_alloc 9092
thp_fault_fallback 87
thp_collapse_alloc 5003
thp_collapse_alloc_failed 19
thp_split 2889
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 400301
compact_free_scanned 8634053
compact_isolated 687993
compact_stall 567
compact_fail 101
compact_success 466


thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===


/proc/PID/stat for iceweasel over some time:
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 14849 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 2 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15130 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15460 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15724 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0


/proc/PID/stat for khugepaged over some time:
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 39779 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40104 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40516 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40797 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0



Hope that helps

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------

2014-11-05 00:12:46

by Norbert Preining

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

Hi David,

one more thing, attached dmesg output with some page faults,
maybe this is connected?

On Wed, 05 Nov 2014, Norbert Preining wrote:
> Hi David,
>
> thanks for your answer.
>
> On Tue, 04 Nov 2014, David Rientjes wrote:
> > If you have the ability to kill your X session, then you presumably are
> > able to capture /proc/pid/stack for these pids to see where it is busy.
>
> Yes I can do that, I can even reproduce it on *every* start of iceweasel
> after it happened the first time.
>
> There is no /proc/PID/stack, though .....
>
> > It might also be helpful to see how the grep ^thp_ /proc/vmstat and
> > grep ^compact_ /proc/vmstat counters change over time.
>
> I captured that over timne, as well as the contents of
> /proc/PID/stat
> for iceweasel and khugepaged. There are some numbers steadily
> increasing.
>
> Here is what I got:
>
> START
> thp_fault_alloc 9092
> thp_fault_fallback 87
> thp_collapse_alloc 5003
> thp_collapse_alloc_failed 19
> thp_split 2889
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 400301
> compact_free_scanned 8634053
> compact_isolated 687993
> compact_stall 567
> compact_fail 101
> compact_success 466
>
>
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
>
>
> /proc/PID/stat for iceweasel over some time:
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 14849 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 2 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15130 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15460 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15724 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
>
>
> /proc/PID/stat for khugepaged over some time:
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 39779 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40104 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40516 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40797 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------


Attachments:
(No filename) (5.11 kB)
dmesg.txt (5.07 kB)
Download all attachments

2014-11-05 00:20:13

by David Rientjes

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

On Wed, 5 Nov 2014, Norbert Preining wrote:

> Hi David,
>
> one more thing, attached dmesg output with some page faults,
> maybe this is connected?
>

Hmm, I'm not aware of any mm->mmap_sem starvation issues in 3.18-rc, maybe
this is a duplicate of another issue that someone has reported that I
haven't seen. The lengthy output of echo t > /proc/sysrq-trigger should
give a clue as to what is holding it or perhaps this is a more generic
rwsem issue.

2014-11-06 12:25:52

by Vlastimil Babka

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

On 11/05/2014 01:20 AM, David Rientjes wrote:
> On Wed, 5 Nov 2014, Norbert Preining wrote:
>
>> Hi David,
>>
>> one more thing, attached dmesg output with some page faults,
>> maybe this is connected?
>>
>
> Hmm, I'm not aware of any mm->mmap_sem starvation issues in 3.18-rc, maybe
> this is a duplicate of another issue that someone has reported that I
> haven't seen. The lengthy output of echo t > /proc/sysrq-trigger should
> give a clue as to what is holding it or perhaps this is a more generic
> rwsem issue.

Could be that another task holds the mmap_sem during THP allocation attempt on
its own page fault, and compaction goes in some kind of infinite loop. There are
two other threads that look similar:

http://article.gmane.org/gmane.linux.kernel.mm/124451/match=isolate_freepages_block+very+high+intermittent+overhead

https://lkml.org/lkml/2014/11/4/144

I suggested testing a commit revert in one thread, and a possible fix in the
other. If you can reproduce this well, that would be very useful.

khugepaged using CPU also points to either the address space scanning, or
compaction going wrong. Since 8b1645685ac it shouldn't hold mmap_sem during
compaction, but that still leaves page faulters to possibly hold it.

So yeah we would need the stacks of processes that do hog the CPU's, not those
that sleep. As David suggested, a /proc/pid/stack could work. Also can you
please provide /proc/zoneinfo ?

Thanks,
Vlastimil

2014-11-06 12:39:12

by Norbert Preining

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

Hi Vlastimil

thanks for your answer.

In the meantime I have tried rc3, too, with the same effects.

Interestingly, once it goes into a bad state, every future approach
does the same. I started shotwell (photo organizer) and it went into the
same state (khugepaged / shotwell each using about 100% of CPu time).

On Thu, 06 Nov 2014, Vlastimil Babka wrote:
> Could be that another task holds the mmap_sem during THP allocation attempt on
> its own page fault, and compaction goes in some kind of infinite loop. There are

My feeling somehow is about the plugin-container in firefox ...
(flashplayer or something similar, but I might be wrong!). With shotwell,
I have no idea why.

> I suggested testing a commit revert in one thread, and a possible fix in the
> other. If you can reproduce this well, that would be very useful.

Which commit are you talking about? I can easily revert some/all of what you
want and do test runs.

> khugepaged using CPU also points to either the address space scanning, or
> compaction going wrong. Since 8b1645685ac it shouldn't hold mmap_sem during
> compaction, but that still leaves page faulters to possibly hold it.

So, do you mean I should try reverting 8b1645685ac?

> So yeah we would need the stacks of processes that do hog the CPU's, not those
> that sleep. As David suggested, a /proc/pid/stack could work. Also can you
> please provide /proc/zoneinfo ?

Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
this depending on some specific kerenl option?

/proc/zoneinfo I have and can send you the next time.

Thanks a lot and all the best

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------

2014-11-06 13:03:40

by Vlastimil Babka

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

On 11/06/2014 01:39 PM, Norbert Preining wrote:
> Hi Vlastimil
>
> thanks for your answer.
>
> In the meantime I have tried rc3, too, with the same effects.
>
> Interestingly, once it goes into a bad state, every future approach
> does the same. I started shotwell (photo organizer) and it went into the
> same state (khugepaged / shotwell each using about 100% of CPu time).
>
> On Thu, 06 Nov 2014, Vlastimil Babka wrote:
>> Could be that another task holds the mmap_sem during THP allocation attempt on
>> its own page fault, and compaction goes in some kind of infinite loop. There are
>
> My feeling somehow is about the plugin-container in firefox ...
> (flashplayer or something similar, but I might be wrong!). With shotwell,
> I have no idea why.

plugin-container is different process than firefox, so it should show its CPU
consumption separately. If you see firefox, it's firefox binary itself.

>> I suggested testing a commit revert in one thread, and a possible fix in the
>> other. If you can reproduce this well, that would be very useful.
>
> Which commit are you talking about? I can easily revert some/all of what you
> want and do test runs.

OK, one possibility is to do (it should apply cleanly)
git revert e14c720efdd73c6d69cd8d07fa894bcd11fe1973

Then there's a patch at the end of this e-mail, which however I doubt would fix
the symptoms you describe.


>> khugepaged using CPU also points to either the address space scanning, or
>> compaction going wrong. Since 8b1645685ac it shouldn't hold mmap_sem during
>> compaction, but that still leaves page faulters to possibly hold it.
>
> So, do you mean I should try reverting 8b1645685ac?

No, not that one. That one should actually reduce the problem you see.

>> So yeah we would need the stacks of processes that do hog the CPU's, not those
>> that sleep. As David suggested, a /proc/pid/stack could work. Also can you
>> please provide /proc/zoneinfo ?
>
> Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
> this depending on some specific kerenl option?

Ah I missed that. Should be CONFIG_STACKTRACE to enable that.

> /proc/zoneinfo I have and can send you the next time.
>
> Thanks a lot and all the best

Great, thank you!
Vlastimil

> Norbert
>
> ------------------------------------------------------------------------
> PREINING, Norbert http://www.preining.info
> JAIST, Japan TeX Live & Debian Developer
> GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
> ------------------------------------------------------------------------
>

------8<------
>From fe9c963cc665cdab50abb41f3babb5b19d08fab1 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Wed, 5 Nov 2014 14:19:18 +0100
Subject: [PATCH] mm, compaction: do not reset deferred compaction
optimistically

In try_to_compact_pages() we reset deferred compaction for a zone where we
think compaction has succeeded. Although this action does not reset the
counters affecting deferred compaction period, just bumping the deferred order
means that another compaction attempt will be able to pass the check in
compaction_deferred() and proceed with compaction.

This is a problem when try_to_compact_pages() thinks compaction was successful
just because the watermark check is missing proper classzone_idx parameter,
but then the allocation attempt itself will fail due to its watermark check
having the proper value. Although __alloc_pages_direct_compact() will re-defer
compaction in such case, this happens only in the case of sync compaction.
Async compaction will leave the zone open for another compaction attempt which
may reset the deferred order again. This could possibly explain what
P. Christeas reported - a system where many processes include the following
backtrace:

[<ffffffff813b1025>] preempt_schedule_irq+0x3c/0x59
[<ffffffff813b4810>] retint_kernel+0x20/0x30
[<ffffffff810d7481>] ? __zone_watermark_ok+0x77/0x85
[<ffffffff810d8256>] zone_watermark_ok+0x1a/0x1c
[<ffffffff810eee56>] compact_zone+0x215/0x4b2
[<ffffffff810ef13f>] compact_zone_order+0x4c/0x5f
[<ffffffff810ef2fe>] try_to_compact_pages+0xc4/0x1e8
[<ffffffff813ad7f8>] __alloc_pages_direct_compact+0x61/0x1bf
[<ffffffff810da299>] __alloc_pages_nodemask+0x409/0x799
[<ffffffff8110d3fd>] new_slab+0x5f/0x21c

The issue has been made visible by commit 53853e2d2bfb ("mm, compaction: defer
each zone individually instead of preferred zone"), since before the commit,
deferred compaction for fallback zones (where classzone_idx matters) was not
considered separately.

Although work is underway to fix the underlying zone watermark check mismatch,
this patch fixes the immediate problem by removing the optimistic defer reset
completely. Its usefulness is questionable anyway, since if the allocation
really succeeds, a full defer reset (including the period counters) follows.

Fixes: 53853e2d2bfb748a8b5aa2fd1de15699266865e0
Reported-by: P. Christeas <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ec74cf0..f0335f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
alloc_flags)) {
*candidate_zone = zone;
/*
- * We think the allocation will succeed in this zone,
- * but it is not certain, hence the false. The caller
- * will repeat this with true if allocation indeed
- * succeeds in this zone.
- */
- compaction_defer_reset(zone, order, false);
- /*
* It is possible that async compaction aborted due to
* need_resched() and the watermarks were ok thanks to
* somebody else freeing memory. The allocation can
--
2.1.2

2014-11-07 13:07:42

by Norbert Preining

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

Hi Vlastimil,

On Thu, 06 Nov 2014, Vlastimil Babka wrote:
> OK, one possibility is to do (it should apply cleanly)
> git revert e14c720efdd73c6d69cd8d07fa894bcd11fe1973

I am running with this reverted now, and I cannot reproduce the
khugepage going overboard anymore.

I tried hard since 2 days now, no chance to reproduce.
Before firefox or shotwell went into boom mode, but not now.
(For now!)

> > Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
> > this depending on some specific kerenl option?
>
> Ah I missed that. Should be CONFIG_STACKTRACE to enable that.

If you want, I can compile a "default" kernel without the above commit
reverted, and turn on STACKTRACE (I have turned it one for *all* future
kernels I compile ;-)

I also didn't try the patch you send me, just the revert from above.

Please let me know which other experiments you want to see.

Thanks

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------

2014-11-07 13:38:20

by Vlastimil Babka

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

On 11/07/2014 02:07 PM, Norbert Preining wrote:
> Hi Vlastimil,
>
> On Thu, 06 Nov 2014, Vlastimil Babka wrote:
>> OK, one possibility is to do (it should apply cleanly)
>> git revert e14c720efdd73c6d69cd8d07fa894bcd11fe1973
>
> I am running with this reverted now, and I cannot reproduce the
> khugepage going overboard anymore.

Great, that's good news if I understand correctly, but ...

> I tried hard since 2 days now, no chance to reproduce.
> Before firefox or shotwell went into boom mode, but not now.
> (For now!)

I suggested the commit to you for revert 1 day ago, and you say you
can't reproduce it for 2 days already? That's a bit suspicious. Did you
already tried to revert it before my suggestion, or were you unable to
reproduce it anymore even before the revert?

>>> Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
>>> this depending on some specific kerenl option?
>>
>> Ah I missed that. Should be CONFIG_STACKTRACE to enable that.
>
> If you want, I can compile a "default" kernel without the above commit

Hmm it would probably be more useful if I sent you some debug printing
patches first... the stacktrace won't probably tell us much we don't
already know.

> reverted, and turn on STACKTRACE (I have turned it one for *all* future
> kernels I compile ;-)

Yeah that's a good idea :)

> I also didn't try the patch you send me, just the revert from above.

Yeah if the revert helped, the other patch probably wouldn't change
anything. It also didn't help the other guy with similar problem.

> Please let me know which other experiments you want to see.

I'll prepare a debugging patch and send with instructions. Meanwhile you
could send the /proc/zoneinfo contents? :)

Thanks!
Vlastimil

> Thanks
>
> Norbert
>
> ------------------------------------------------------------------------
> PREINING, Norbert http://www.preining.info
> JAIST, Japan TeX Live & Debian Developer
> GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
> ------------------------------------------------------------------------
>

2014-11-07 13:57:31

by Norbert Preining

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

Hi Vlastimil,

On Fri, 07 Nov 2014, Vlastimil Babka wrote:
> Great, that's good news if I understand correctly, but ...

no "but ..."

> I suggested the commit to you for revert 1 day ago, and you say you
> can't reproduce it for 2 days already? That's a bit suspicious. Did

No, you suggested it yesterday during the day, and here in Japan the
next day is already over, so my feeling is two days ;-)

So all fine ;-)

> I'll prepare a debugging patch and send with instructions. Meanwhile
> you could send the /proc/zoneinfo contents? :)

When? Running which kernel? Anyway, with the current kernel (reverted
commit as before) I get the attached zoneinfo.

Thanks, and waiting for your patches ;-)

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------


Attachments:
(No filename) (1.05 kB)
zoneinfo.txt (4.53 kB)
Download all attachments

2014-11-07 15:44:30

by Vlastimil Babka

[permalink] [raw]
Subject: Re: khugepaged / firefox going wild in 3.18-rc

On 11/07/2014 02:57 PM, Norbert Preining wrote:
> Hi Vlastimil,
>
> On Fri, 07 Nov 2014, Vlastimil Babka wrote:
>> Great, that's good news if I understand correctly, but ...
>
> no "but ..."
>
>> I suggested the commit to you for revert 1 day ago, and you say you
>> can't reproduce it for 2 days already? That's a bit suspicious. Did
>
> No, you suggested it yesterday during the day, and here in Japan the
> next day is already over, so my feeling is two days ;-)
>
> So all fine ;-)

Ah, good, just wanted to be sure.

>> I'll prepare a debugging patch and send with instructions. Meanwhile
>> you could send the /proc/zoneinfo contents? :)
>
> When? Running which kernel? Anyway, with the current kernel (reverted
> commit as before) I get the attached zoneinfo.

It doesn't matter which kernel. Thanks, but I didn't find anything
suspicious there... expected some oddly aligned zones, but it all seemed
to be aligned to pageblock boundaries.

> Thanks, and waiting for your patches ;-)

Tracing patch attached. You should apply this to the broken kernel, i.e.
without the revert, and have tracing enabled, i.e. CONFIG_FTRACE. There
should be /sys/kernel/debug/tracing directory.
To avoid overhead and noise, I would just run this kernel as usual, and
only when khugepaged/firefox/whatever starts misbehaving, do the following:

cd /sys/kernel/debug/tracing
echo 1 > tracing_on
echo 1 > events/compaction/enable
cat trace_pipe | tee /tmp/trace #(or somewhere else)

You should see events scrolling on the screen, including the misbehaving
processes. After a minute or something, ctrl+c and send me the trace file.

Thanks a lot!
Vlastimil


> Norbert
>

------8<------
>From 59c93237ad2fb2317e61c8f00ea73d93ff8a2813 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Fri, 7 Nov 2014 16:12:14 +0100
Subject: [PATCH] compaction: detailed free scanner tracing

---
include/trace/events/compaction.h | 51 +++++++++++++++++++++++++++++++++------
mm/compaction.c | 9 +++++--
2 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index c6814b9..db83ea4 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -12,38 +12,73 @@
DECLARE_EVENT_CLASS(mm_compaction_isolate_template,

TP_PROTO(unsigned long nr_scanned,
- unsigned long nr_taken),
+ unsigned long nr_taken,
+ unsigned long last_pfn),

- TP_ARGS(nr_scanned, nr_taken),
+ TP_ARGS(nr_scanned, nr_taken, last_pfn),

TP_STRUCT__entry(
__field(unsigned long, nr_scanned)
__field(unsigned long, nr_taken)
+ __field(unsigned long, last_pfn)
),

TP_fast_assign(
__entry->nr_scanned = nr_scanned;
__entry->nr_taken = nr_taken;
+ __entry->last_pfn = last_pfn;
),

- TP_printk("nr_scanned=%lu nr_taken=%lu",
+ TP_printk("nr_scanned=%lu nr_taken=%lu last_pfn=%lu",
__entry->nr_scanned,
- __entry->nr_taken)
+ __entry->nr_taken,
+ __entry->last_pfn)
);

DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_migratepages,

TP_PROTO(unsigned long nr_scanned,
- unsigned long nr_taken),
+ unsigned long nr_taken,
+ unsigned long last_pfn),

- TP_ARGS(nr_scanned, nr_taken)
+ TP_ARGS(nr_scanned, nr_taken, last_pfn)
);

DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_freepages,
TP_PROTO(unsigned long nr_scanned,
- unsigned long nr_taken),
+ unsigned long nr_taken,
+ unsigned long last_pfn),

- TP_ARGS(nr_scanned, nr_taken)
+ TP_ARGS(nr_scanned, nr_taken, last_pfn)
+);
+
+TRACE_EVENT(mm_compaction_isolate_freepages_loop,
+ TP_PROTO(unsigned long low_pfn,
+ unsigned long block_start_pfn,
+ unsigned long isolate_start_pfn,
+ unsigned long block_end_pfn),
+
+ TP_ARGS(low_pfn, block_start_pfn, isolate_start_pfn, block_end_pfn),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, low_pfn)
+ __field(unsigned long, block_start_pfn)
+ __field(unsigned long, isolate_start_pfn)
+ __field(unsigned long, block_end_pfn)
+ ),
+
+ TP_fast_assign(
+ __entry->low_pfn = low_pfn;
+ __entry->block_start_pfn = block_start_pfn;
+ __entry->isolate_start_pfn = isolate_start_pfn;
+ __entry->block_end_pfn = block_end_pfn;
+ ),
+
+ TP_printk("low=%lu block_start=%lu isolate_start=%lu block_end=%lu",
+ __entry->low_pfn,
+ __entry->block_start_pfn,
+ __entry->isolate_start_pfn,
+ __entry->block_end_pfn)
);

TRACE_EVENT(mm_compaction_migratepages,
diff --git a/mm/compaction.c b/mm/compaction.c
index ec74cf0..4931b21 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -426,7 +426,8 @@ isolate_fail:
/* Record how far we have got within the block */
*start_pfn = blockpfn;

- trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
+ trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated,
+ blockpfn);

/*
* If strict isolation is requested by CMA then check that all the
@@ -734,7 +735,8 @@ isolate_success:
if (low_pfn == end_pfn)
update_pageblock_skip(cc, valid_page, nr_isolated, true);

- trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
+ trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated,
+ low_pfn);

count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
if (nr_isolated)
@@ -838,6 +840,9 @@ static void isolate_freepages(struct compact_control *cc)
isolate_start_pfn = block_start_pfn) {
unsigned long isolated;

+ trace_mm_compaction_isolate_freepages_loop(low_pfn,
+ block_start_pfn, isolate_start_pfn, block_end_pfn);
+
/*
* This can iterate a massively long zone without finding any
* suitable migration targets, so periodically check if we need
--
2.1.2