2016-12-14 14:34:58

by Andreas Schwab

Subject: jemalloc testsuite stalls in memset

When running the jemalloc-4.4.0 testsuite on aarch64 with glibc 2.24 the
test/unit/junk test hangs in memset:

(gdb) r
Starting program: /tmp/jemalloc/jemalloc-4.4.0/test/unit/junk
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
test_junk_small: pass
test_junk_large: pass
^C
Program received signal SIGINT, Interrupt.
memset () at ../sysdeps/aarch64/memset.S:91
91 str q0, [dstin]
(gdb) x/i $pc
=> 0xffffb7ddf54c <memset+140>: str q0, [x0]

x0 is pointing to the start of this mmap'd block:

0xffffb7400000 0xffffb7600000 0x200000 0x0

Any attempt to continue execution or step over the insn still causes the
process to hang here. Only after the memory has been accessed through the
debugger does the test continue to completion.

The kernel has been configured with transparent hugepages.

CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y

This issue has been bisected to commit
b8d3c4c3009d42869dc03a1da0efc2aa687d0ab4 ("mm/huge_memory.c: don't split
THP page when MADV_FREE syscall is called").
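
For anyone who wants to try this outside the jemalloc testsuite, here is a
minimal sketch of the access pattern the bisected commit points at: fault in a
2 MiB anonymous region, call madvise(MADV_FREE) on it, then write to it again.
That the junk test reduces to exactly this pattern is an assumption and has not
been confirmed here; the function name touch_after_madv_free() and the HUGE_SZ
constant are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* generic Linux value; assumed when libc headers lack it */
#endif

#define HUGE_SZ ((size_t)2 << 20) /* 2 MiB, the size of the block in the report */

/*
 * Hypothetical reproducer sketch.
 * Returns 0 if the write after MADV_FREE completed and the data is intact,
 * 1 if the kernel rejected MADV_FREE with EINVAL, -1 on any other failure.
 * On an affected kernel the second memset() would presumably never return.
 */
int touch_after_madv_free(void)
{
    /* Over-allocate so that a 2 MiB-aligned window fits inside the mapping. */
    size_t len = 2 * HUGE_SZ;
    unsigned char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;

    /* Round up to the next 2 MiB boundary so the range can be backed by a THP. */
    unsigned char *p = (unsigned char *)
        (((uintptr_t)raw + HUGE_SZ - 1) & ~(uintptr_t)(HUGE_SZ - 1));
    assert(p >= raw && p + HUGE_SZ <= raw + len);

    int rc = -1;

    memset(p, 0x5a, HUGE_SZ); /* fault the range in, possibly as one huge page */
    if (madvise(p, HUGE_SZ, MADV_FREE) != 0) {
        rc = (errno == EINVAL) ? 1 : -1;
        goto out;
    }
    memset(p, 0xa5, HUGE_SZ); /* the reported hang was in a store like this one */

    /* The second write must be fully visible once memset() has returned. */
    rc = 0;
    for (size_t i = 0; i < HUGE_SZ; i++) {
        if (p[i] != 0xa5) {
            rc = -1;
            break;
        }
    }
out:
    munmap(raw, len);
    return rc;
}
```

Calling touch_after_madv_free() from a small main() and checking for a return
value of 0 should be enough to tell a working kernel from a hanging one, if the
assumption above holds. Whether the region is actually backed by a huge page
depends on the THP settings of the running kernel.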

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


2016-12-14 23:50:36

by Minchan Kim

Subject: Re: jemalloc testsuite stalls in memset

Hello,

First of all, thanks for the report. Sorry, I have no time right now, so I
will probably have to investigate the problem next week.

On Wed, Dec 14, 2016 at 03:34:54PM +0100, Andreas Schwab wrote:
> When running the jemalloc-4.4.0 testsuite on aarch64 with glibc 2.24 the
> test/unit/junk test hangs in memset:
>
> (gdb) r
> Starting program: /tmp/jemalloc/jemalloc-4.4.0/test/unit/junk
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> test_junk_small: pass
> test_junk_large: pass
> ^C
> Program received signal SIGINT, Interrupt.
> memset () at ../sysdeps/aarch64/memset.S:91
> 91 str q0, [dstin]
> (gdb) x/i $pc
> => 0xffffb7ddf54c <memset+140>: str q0, [x0]
>
> x0 is pointing to the start of this mmap'd block:
>
> 0xffffb7400000 0xffffb7600000 0x200000 0x0
>
> Any attempt to continue execution or step over the insn still causes the
> process to hang here. Only after the memory has been accessed through the
> debugger does the test continue to completion.

You mean the program hangs when it accesses the address itself (i.e.,
0xffffb7400000), while accessing the same address from the debugger is OK?

Scratch head. :/

Can you reproduce it easily?
Did you test it on a real machine or in qemu on x86?
Could you show me how I can reproduce it?
I want to test it on an x86 machine first.
Unfortunately, I don't have any aarch64 platform right now, so I may have to
run it in qemu on x86 until I can set up an aarch64 platform, should it be
reproducible only on a real machine.

>
> The kernel has been configured with transparent hugepages.
>
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
> CONFIG_TRANSPARENT_HUGE_PAGECACHE=y

What's the exact kernel version?
I don't think it's a HUGE_PAGECACHE problem, but to narrow down the scope,
could you test it without CONFIG_TRANSPARENT_HUGE_PAGECACHE?

Thanks.

>
> This issue has been bisected to commit
> b8d3c4c3009d42869dc03a1da0efc2aa687d0ab4 ("mm/huge_memory.c: don't split
> THP page when MADV_FREE syscall is called").
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, [email protected]
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

2016-12-15 09:24:52

by Andreas Schwab

Subject: Re: jemalloc testsuite stalls in memset

On Dec 15 2016, Minchan Kim <[email protected]> wrote:

> You mean the program hangs when it accesses the address itself (i.e.,
> 0xffffb7400000), while accessing the same address from the debugger is OK?

Yes.

> Can you reproduce it easily?

100%

> Did you test it on a real machine or in qemu on x86?

Both real and kvm.

> Could you show me how I can reproduce it?

Just run make check.

> I want to test it on an x86 machine first.
> Unfortunately, I don't have any aarch64 platform right now, so I may have to
> run it in qemu on x86 until I can set up an aarch64 platform, should it be
> reproducible only on a real machine.
>
>>
>> The kernel has been configured with transparent hugepages.
>>
>> CONFIG_TRANSPARENT_HUGEPAGE=y
>> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
>> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
>> CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
>
> What's the exact kernel version?

Anything >= your commit.

> I don't think it's a HUGE_PAGECACHE problem, but to narrow down the scope,
> could you test it without CONFIG_TRANSPARENT_HUGE_PAGECACHE?

That cannot be deselected.

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

2016-12-16 06:39:51

by Minchan Kim

Subject: Re: jemalloc testsuite stalls in memset

Hello,

On Thu, Dec 15, 2016 at 10:24:47AM +0100, Andreas Schwab wrote:
> On Dec 15 2016, Minchan Kim <[email protected]> wrote:
>
> > You mean the program hangs when it accesses the address itself (i.e.,
> > 0xffffb7400000), while accessing the same address from the debugger is OK?
>
> Yes.
>
> > Can you reproduce it easily?
>
> 100%
>
> > Did you test it on a real machine or in qemu on x86?
>
> Both real and kvm.
>
> > Could you show me how I can reproduce it?
>
> Just run make check.
>
> > I want to test it on an x86 machine first.
> > Unfortunately, I don't have any aarch64 platform right now, so I may have to
> > run it in qemu on x86 until I can set up an aarch64 platform, should it be
> > reproducible only on a real machine.
> >
> >>
> >> The kernel has been configured with transparent hugepages.
> >>
> >> CONFIG_TRANSPARENT_HUGEPAGE=y
> >> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> >> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
> >> CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
> >
> > What's the exact kernel version?
>
> Anything >= your commit.

Thanks for the info. I cannot set up a testing environment, but from reading
the code, it seems we need pmd_wrprotect for architectures that do not manage
the dirty bit in hardware.

Does the patch below help?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e10a4fe..dc37c9a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1611,6 +1611,7 @@ int madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			tlb->fullmm);
 		orig_pmd = pmd_mkold(orig_pmd);
 		orig_pmd = pmd_mkclean(orig_pmd);
+		orig_pmd = pmd_wrprotect(orig_pmd);

 		set_pmd_at(mm, addr, pmd, orig_pmd);
 		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
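
To make the reasoning behind the one-line change easier to follow, here is a
toy userspace model of an architecture that tracks the dirty bit in software.
It is a sketch under stated assumptions, not kernel code: struct toy_pmd and
all toy_* functions are hypothetical, and the rule "a store only goes through
when the entry is both writable and dirty" is a simplification of what such
hardware might do. Under these assumptions, a clean but still writable entry
keeps faulting forever, while a write-protected one is repaired by the first
write fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a huge-page PMD entry. */
struct toy_pmd {
    bool write; /* software "may be written" bit */
    bool dirty; /* software dirty bit */
    bool young; /* accessed bit */
};

/* Assumption: the MMU lets a store through only if the entry is both
 * writable and already marked dirty; otherwise the store faults. */
static bool toy_hw_allows_store(const struct toy_pmd *p)
{
    return p->write && p->dirty;
}

/* Mirrors the three helper calls in the diff above; `wrprotect` selects
 * whether the added pmd_wrprotect() step is applied. */
void toy_madvise_free(struct toy_pmd *p, bool wrprotect)
{
    p->young = false;     /* pmd_mkold */
    p->dirty = false;     /* pmd_mkclean */
    if (wrprotect)
        p->write = false; /* pmd_wrprotect */
}

/* Assumed shape of the write-fault path: a write-protected entry takes the
 * write-protect route and ends up writable and dirty; any other entry is
 * only marked accessed. */
static void toy_write_fault(struct toy_pmd *p)
{
    if (!p->write) {
        p->write = true;
        p->dirty = true;
    }
    p->young = true;
}

/* Returns the number of faults taken before the store succeeds, or -1 if
 * it still has not succeeded after max_faults faults. */
int toy_store(struct toy_pmd *p, int max_faults)
{
    assert(max_faults >= 0);
    int faults = 0;
    while (!toy_hw_allows_store(p)) {
        if (faults == max_faults)
            return -1;
        toy_write_fault(p);
        faults++;
    }
    return faults;
}
```

In this model, toy_store() on an entry prepared with wrprotect == false never
succeeds, which would match a store that faults again and again, while the
same entry prepared with wrprotect == true succeeds after a single fault.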

2016-12-16 14:17:22

by Andreas Schwab

Subject: Re: jemalloc testsuite stalls in memset

On Dec 16 2016, Minchan Kim <[email protected]> wrote:

> Does the patch below help?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e10a4fe..dc37c9a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1611,6 +1611,7 @@ int madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> tlb->fullmm);
> orig_pmd = pmd_mkold(orig_pmd);
> orig_pmd = pmd_mkclean(orig_pmd);
> + orig_pmd = pmd_wrprotect(orig_pmd);
>
> set_pmd_at(mm, addr, pmd, orig_pmd);
> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);

Thanks, this fixes the issue (tested with 4.9).

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

2016-12-21 23:54:27

by Minchan Kim

Subject: Re: jemalloc testsuite stalls in memset

Hello, Andreas

Sorry for the long delay. I was on vacation.

On Fri, Dec 16, 2016 at 03:16:20PM +0100, Andreas Schwab wrote:
> On Dec 16 2016, Minchan Kim <[email protected]> wrote:
>
> > Does the patch below help?
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e10a4fe..dc37c9a 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1611,6 +1611,7 @@ int madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > tlb->fullmm);
> > orig_pmd = pmd_mkold(orig_pmd);
> > orig_pmd = pmd_mkclean(orig_pmd);
> > + orig_pmd = pmd_wrprotect(orig_pmd);
> >
> > set_pmd_at(mm, addr, pmd, orig_pmd);
> > tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>
> Thanks, this fixes the issue (tested with 4.9).

That was a quick hack to find out what exactly the problem is, and your
confirmation helped a lot in understanding it clearly.

A more correct approach is to support pmd dirty handling in the general page
fault handler rather than tweaking MADV_FREE. I just sent a new patch and
Cc'ed you.
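
As a rough illustration of what dirty handling in the fault path could mean,
here is a toy userspace sketch. It is not the patch that was sent and not
kernel code: the TOY_* flags, toy_handle_fault() and toy_store_ok() are
hypothetical, and the rule that a store needs both the write and the dirty
bit is an assumption about software dirty tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits of a toy page-table entry. */
enum { TOY_WRITE = 1u << 0, TOY_DIRTY = 1u << 1, TOY_YOUNG = 1u << 2 };

/* Toy fault handler. Every fault marks the entry accessed. If
 * `emulate_dirty` is set, a write fault on a writable entry also marks it
 * dirty, so no special case is needed on the MADV_FREE side. */
unsigned toy_handle_fault(unsigned entry, bool is_write, bool emulate_dirty)
{
    assert((entry & ~(unsigned)(TOY_WRITE | TOY_DIRTY | TOY_YOUNG)) == 0);
    entry |= TOY_YOUNG;
    if (is_write && (entry & TOY_WRITE) && emulate_dirty)
        entry |= TOY_DIRTY;
    return entry;
}

/* Assumption: a store succeeds only if the entry is writable and dirty. */
bool toy_store_ok(unsigned entry)
{
    return (entry & TOY_WRITE) && (entry & TOY_DIRTY);
}
```

Starting from a clean, old, writable entry (just TOY_WRITE), a write fault
handled with emulate_dirty == true leaves an entry for which toy_store_ok()
holds, whereas with emulate_dirty == false the entry stays clean and the store
would fault again. Read faults never set the dirty bit in either mode.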

Could you test it, please?
Thanks!