2011-02-01 15:50:50

by Andrea Arcangeli

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

On Mon, Jan 31, 2011 at 08:28:00PM +0100, Jindřich Makovička wrote:
> Hi,
>
> I am encountering problems when continuously writing larger amounts of
> data to a USB flash drive. My configuration is
>
> x86-64 kernel
> USB stick with 10MB/s write, 30MB/s read speed,
> HDD with ~60-80MB/s read/write
> 8 GiB RAM
>
> When copying 4GB or more in one go from HDD to Flash, during the
> copying, fork() and probably other syscalls involving VM start
> blocking (I first observed the problem in Chrome, which refused to
> display content in new tabs). When one lets the copying finish, the
> system returns to a usable state.
>
> During the limbo, khugepaged is in D state (uninterruptible sleep).

That means no hugepage could be allocated. Maybe memory compaction is
doing too much work because all the pagecache is dirty and can't be
migrated. If memory compaction is the cause, this should solve it:

echo never >/sys/kernel/mm/transparent_hugepage/defrag
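
To confirm which defrag mode is active before and after that write, a
small helper along these lines may be handy. It is only a sketch: it
assumes the file lists all modes on one line with the active one in
square brackets (for example "[always] madvise never"), and the name
thp_active_mode is made up for illustration.

```shell
#!/bin/sh
# thp_active_mode: print the bracketed (active) word from a sysfs-style
# mode line such as "[always] madvise never".
# Hypothetical helper; assumes exactly one bracketed word on the line.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage, with the path from the command above:
# thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/defrag)"
```

The function takes the line as an argument rather than reading the file
itself, so it can be tried on a sample string without touching /sys.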

kswapd state would be interesting too. Can you capture a sysrq+t
dump?

We should probably decrease the aggressiveness of memory compaction in
direct reclaim. I have another report that memory compaction for order <
3 allocations is increasing latency; it's not like your problem, but it
may be related. The congestion_wait in compaction.c also makes me
uncomfortable; I think it should bail out and fail. Maybe we should
add a bitflag to differentiate the callers that can gracefully handle
failure (like THP or most skb jumbo frame allocations) from those, like
the kernel stack allocation, that will return -ENOMEM if allocation
fails.


2011-02-01 21:24:06

by Jindrich Makovicka

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

2011/2/1 Andrea Arcangeli:
>> When copying 4GB or more in one go from HDD to Flash, during the
>> copying, fork() and probably other syscalls involving VM start
>> blocking (I first observed the problem in Chrome, which refused to
>> display content in new tabs). When one lets the copying finish, the
>> system returns to a usable state.
>>
>> During the limbo, khugepaged is in D state (uninterruptible sleep).
>
> That means no hugepage could be allocated. Maybe memory compaction is
> doing an overwork because all pagecache is dirty and can't be
> migrated. This should solve it if it's memory compaction:
>
> echo never >/sys/kernel/mm/transparent_hugepage/defrag
>
> kswapd state would be interesting too. Can you sysrq+t?

With -rc2, there is

$ ps aux | grep -E "kswap|khugep"
root 474 0.0 0.0 0 0 ? S 20:44 0:00 [kswapd0]
root 540 0.0 0.0 0 0 ? DN 20:44 0:00 [khugepaged]

Sysrq-t output is attached.
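
For catching every task in uninterruptible sleep rather than grepping
for two names, a filter like the following could be used. It is a
sketch that assumes input lines of the form "PID STAT COMMAND"; the
function name d_state_tasks is made up for illustration.

```shell
#!/bin/sh
# d_state_tasks: read "PID STAT COMMAND" lines on stdin and print the
# ones whose state field starts with D (uninterruptible sleep), so
# both "D" and "DN" match.
d_state_tasks() {
    awk '$2 ~ /^D/'
}

# Usage (the trailing "=" on each column suppresses the header line):
# ps -e -o pid= -o stat= -o comm= | d_state_tasks
```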

Good news is, I don't see these issues with -rc3.

Regards,
--
Jindrich Makovicka


Attachments:
sysrq-t.txt.gz (13.80 kB)

2011-02-02 00:26:34

by Andrea Arcangeli

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

On Tue, Feb 01, 2011 at 10:24:00PM +0100, Jindřich Makovička wrote:
> With -rc2, there is
>
> $ ps aux | grep -E "kswap|khugep"
> root 474 0.0 0.0 0 0 ? S 20:44 0:00 [kswapd0]
> root 540 0.0 0.0 0 0 ? DN 20:44 0:00 [khugepaged]
>
> Sysrq-t output is attached.

khugepaged is missing at the top because the dmesg buffer is too small
to fit the whole sysrq+t output.

Anyway, I see lots of tasks (you have some heavy java load allocating
plenty of hugepages) that allocate transparent hugepages, and they're
all stuck in migrate_pages->wait_on_page_writeback and
migrate_pages->writepage.

> Good news is, I don't see these issues with -rc3.

Ah, try again. I didn't check the diff between -rc2 and -rc3 to be able
to tell what helped, but it sounds too easy that it got magically fixed
by -rc3.

Anyway, it's not THP; it has to be something in compaction, and if it
happens again you can be sure that doing "echo never >defrag" will fix
it (if compaction really is the cause). Ironically, you can leave
khugepaged/defrag set to "always". It's ok if khugepaged stays in D
state (khugepaged in D state will not be noticeable at all with
CONFIG_NUMA=n, because it allocates all hugepages without holding the
mmap_sem; with CONFIG_NUMA=y it tries to allocate the hugepage from
the right node, so it needs to pass a vma down to the allocator to
track the right allocation node, and that requires holding the
mmap_sem in read mode during the allocation to keep the vma from going
away, but it's no big deal).

Maybe we need to change compaction to never block unless some
__GFP_COMPACTION_WAIT bitflag is set. It's perfectly ok to fail some
hugepage allocation if there's congestion like that without trying so
hard to allocate hugepages. The only thing that would need to pass
down a __GFP_COMPACTION_WAIT would then be fork() in the kernel stack
allocation... everything else should have a 4k fallback. Even
khugepaged doesn't need to try so hard to compact if the system is
under huge stress.

Usually to reproduce you need "cp /dev/zero /mnt/usbdrive", and that
tends to hang all systems no matter THP or not... it's hard to
quantify what is normal and what is not.

I have another latency issue, much easier to quantify, being reported
for some heavy write fs-network load; it is most certainly related to
the use of compaction even for jumbo frames and large network skbs.
It's still compaction related (not THP related: with THP on but
compaction used only by THP, it doesn't happen). I'll let you know
when that is fixed so you can try any patch, as it may benefit your
workload too. In the meantime, if you have more data, let me know.

Thanks,
Andrea

2011-02-03 13:24:35

by Mel Gorman

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

On Tue, Feb 01, 2011 at 04:49:47PM +0100, Andrea Arcangeli wrote:
> On Mon, Jan 31, 2011 at 08:28:00PM +0100, Jindřich Makovička wrote:
> > Hi,
> >
> > I am encountering problems when continuously writing larger amounts of
> > data to a USB flash drive. My configuration is
> >
> > x86-64 kernel
> > USB stick with 10MB/s write, 30MB/s read speed,
> > HDD with ~60-80MB/s read/write
> > 8 GiB RAM
> >
> > When copying 4GB or more in one go from HDD to Flash, during the
> > copying, fork() and probably other syscalls involving VM start
> > blocking (I first observed the problem in Chrome, which refused to
> > display content in new tabs). When one lets the copying finish, the
> > system returns to a usable state.
> >
> > During the limbo, khugepaged is in D state (uninterruptible sleep).
>
> That means no hugepage could be allocated. Maybe memory compaction is
> doing an overwork because all pagecache is dirty and can't be
> migrated. This should solve it if it's memory compaction:
>

This is very likely. Compaction calls into migration which will wait on
dirty pages after a time. With a large number of dirty pages backed by a
slow drive such as a USB stick, it could be getting stalled there for a
long period of time.
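
One way to see how much dirty data is queued behind the slow device
while the copy runs is to watch the Dirty and Writeback counters in
/proc/meminfo. The helper below is only a sketch; the function name
dirty_writeback_kb is made up for illustration.

```shell
#!/bin/sh
# dirty_writeback_kb: read /proc/meminfo-style text on stdin and print
# the Dirty and Writeback values (in kB) on a single line.
# The exact match on "Writeback:" skips the separate WritebackTmp: row.
dirty_writeback_kb() {
    awk '$1 == "Dirty:"     { d = $2 }
         $1 == "Writeback:" { w = $2 }
         END { printf "Dirty=%s kB Writeback=%s kB\n", d, w }'
}

# Usage while the copy is in progress:
# while sleep 1; do dirty_writeback_kb < /proc/meminfo; done
```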

Whether migration sleeps or not is controlled by the sync parameter
passed into try_to_compact_pages, which could always be forced to false
if __GFP_NO_KSWAPD is set?

> echo never >/sys/kernel/mm/transparent_hugepage/defrag
>
> kswapd state would be interesting too. Can you sysrq+t?
>
> Probably we should decrease the aggressiveness of memory compaction in
> direct reclaim. I've another report that memory compaction for order <
> 3 allocations is increasing latency, it's not like your problem but it
> may be related. The congestion_wait in compaction.c also makes me
> uncomfortable, it should bail out and fail I think. Maybe we should
> add a bitflag to differentiate the callers that can gracefully handle
> failure (like THP or most skb jumbo frame allocations) and those like
> the kernel stack that will return -ENOMEM if allocation fails.
>

--
Mel Gorman

2011-02-03 19:06:51

by Andrea Arcangeli

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

On Thu, Feb 03, 2011 at 01:24:08PM +0000, Mel Gorman wrote:
> This is very likely. Compaction calls into migration which will wait on
> dirty pages after a time. With a large number of dirty pages backed by a
> slow drive such as a USB stick, it could be getting stalled there for a
> long period of time.
>
> Whether migration sleeps or not can be controlled by the sync parameter
> passed into try_to_compact_memory which could be always forced to false
> if GFP_NO_KSWAPD?

I would expect that to hide any regression we could have because of
more dirty cache in the system, yes.

However, Jindřich reported not being able to reproduce anything anymore
in -rc3, so I'm unsure whether we should still make that change. I asked
him to try again because it sounded too easy that it got fixed magically
(I didn't check closely whether there are usb/storage changes in
rc2->rc3 that may explain this, so it's not impossible, but it sounds
very unlikely that it got fixed). More likely this isn't reliably
reproducible, or maybe it also happens without compaction and without
THP ("cp /dev/zero /mnt/usbdrive" isn't going to make the system behave
nicely regardless of compaction being synchronous, asynchronous, or
disabled), and maybe he tried the copy with more or less free memory.

I'm running another performance check for a different workload in the
pre-async compaction state, and I'll let you know when I get results.

2011-02-03 21:16:20

by Jindrich Makovicka

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

On Thu, Feb 3, 2011 at 20:06, Andrea Arcangeli wrote:
> On Thu, Feb 03, 2011 at 01:24:08PM +0000, Mel Gorman wrote:
>> This is very likely. Compaction calls into migration which will wait on
>> dirty pages after a time. With a large number of dirty pages backed by a
>> slow drive such as a USB stick, it could be getting stalled there for a
>> long period of time.
>>
>> Whether migration sleeps or not can be controlled by the sync parameter
>> passed into try_to_compact_memory which could be always forced to false
>> if GFP_NO_KSWAPD?
>
> I would expect that to hide any regression we could have because of
> more dirty cache in the system, yes.
>
> However Jindřich reported not being able to reproduce anything anymore
> in -rc3, so I'm unsure if we should make that change anymore. I asked
> to try again cause it should too easy that got fixed magically

I tried again and reproduced it with -rc3 too, sorry for the
misinformation.

I also tried echo never > /sys/kernel/mm/transparent_hugepage/defrag,
and it seems to prevent the system freeze, but the copying itself
still sometimes comes to an almost complete stop (GkrellM shows short
spikes of tens of kB/s on the USB /dev/sdX). In this case, khugepaged
is also in DN. I still haven't observed this problem when disabling THP
completely with echo never > enabled.

--
Jindrich Makovicka

2011-02-04 15:49:08

by Andrea Arcangeli

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

On Thu, Feb 03, 2011 at 10:16:17PM +0100, Jindřich Makovička wrote:
> On Thu, Feb 3, 2011 at 20:06, Andrea Arcangeli wrote:
> > On Thu, Feb 03, 2011 at 01:24:08PM +0000, Mel Gorman wrote:
> >> This is very likely. Compaction calls into migration which will wait on
> >> dirty pages after a time. With a large number of dirty pages backed by a
> >> slow drive such as a USB stick, it could be getting stalled there for a
> >> long period of time.
> >>
> >> Whether migration sleeps or not can be controlled by the sync parameter
> >> passed into try_to_compact_memory which could be always forced to false
> >> if GFP_NO_KSWAPD?
> >
> > I would expect that to hide any regression we could have because of
> > more dirty cache in the system, yes.
> >
> > However Jindřich reported not being able to reproduce anything anymore
> > in -rc3, so I'm unsure if we should make that change anymore. I asked
> > to try again cause it should too easy that got fixed magically
>
> I tried again and reproduced with -rc3 too, sorry for misinformation.
>
> I also tried echo never > /sys/kernel/mm/transparent_hugepage/defrag
> , and it seems preventing the system freeze, but still the copying
> itself sometimes comes to almost complete stop (GkrellM shows short
> spikes of tens of kB/s on the USB /dev/sdX). In this case, khugepaged
> is also in DN. I still didn't observe this problem when disabling THP
> completely by echo never > enabled .

Ok then you may need this too:

echo never > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

Compaction is likely too heavy so we need to look into that.

2011-02-13 10:47:08

by Jindrich Makovicka

Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2

2011/2/4 Andrea Arcangeli:
> On Thu, Feb 03, 2011 at 10:16:17PM +0100, Jindřich Makovička wrote:
>> I also tried echo never >  /sys/kernel/mm/transparent_hugepage/defrag
>> , and it seems preventing the system freeze, but still the copying
>> itself sometimes comes to almost complete stop (GkrellM shows short
>> spikes of tens of kB/s on the USB /dev/sdX). In this case, khugepaged
>> is also in DN. I still didn't observe this problem when disabling THP
>> completely by echo never > enabled .
>
> Ok then you may need this too:
>
>  echo never > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
>
> Compaction is likely too heavy so we need to look into that.

Sorry for the delay.

Yes, disabling both defrag settings also solves the problem:

echo never >/sys/kernel/mm/transparent_hugepage/defrag
echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
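
For repeated testing, the two writes can be wrapped in one function.
This is a sketch under stated assumptions: the function name
set_thp_defrag is made up, and both values are passed as parameters
because this thread uses both "never" and "no" for khugepaged/defrag
and the accepted spelling may depend on the kernel version.

```shell
#!/bin/sh
# set_thp_defrag ROOT DEFRAG_VALUE KHUGEPAGED_VALUE
# Write DEFRAG_VALUE to ROOT/defrag and KHUGEPAGED_VALUE to
# ROOT/khugepaged/defrag; return non-zero if either write fails.
# ROOT is a parameter so the function can be tried on a scratch
# directory before pointing it at the real sysfs tree.
set_thp_defrag() {
    printf '%s\n' "$2" > "$1/defrag" || return 1
    printf '%s\n' "$3" > "$1/khugepaged/defrag" || return 1
}

# Usage (as root), mirroring the two commands above:
# set_thp_defrag /sys/kernel/mm/transparent_hugepage never no
```

Writes to sysfs do not survive a reboot, so the call has to be repeated
after each boot while testing.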

--
Jindrich Makovicka