2019-06-13 21:36:28

by Qian Cai

Subject: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

The LTP hugemmap05 test case [1] could not exit properly and then degraded
system performance on arm64 with linux-next (next-20190613). The bisection so
far indicates:

BAD: 30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'

I don't see anything obvious between those two pull requests, so I guess
something in 'arm64/for-next/core' is wrong.

$ git log --oneline 361413ee1992..9b6047220590
9b6047220590 arm64: mm: avoid redundant READ_ONCE(*ptep)
4745224b4509 arm64/mm: Refactor __do_page_fault()
c49bd02f4c74 arm64/mm: Document write abort detection from ESR
8e01076afd97 arm64: Fix comment after #endif
f086f67485c5 arm64: ptrace: add support for syscall emulation
fd3866381be2 arm64: add PTRACE_SYSEMU{,SINGLESTEP} definations to uapi headers
15532fd6f57c ptrace: move clearing of TIF_SYSCALL_EMU flag to core
616810360043 arm64/mm: Drop task_struct argument from __do_page_fault()
a0509313d5de arm64/mm: Drop mmap_sem before calling __do_kernel_fault()
01de1776f62e arm64/mm: Identify user instruction aborts
87dedf7c61ab arm64/mm: Change BUG_ON() to VM_BUG_ON() in [pmd|pud]_set_huge()
2e6aee5af330 arm64: kernel: use aff3 instead of aff2 in comment
27e6e7d63fc2 arm64/cpufeature: Convert hook_lock to raw_spin_lock_t in cpu_enable_ssbs()
0c1f14ed1226 arm64: mm: make CONFIG_ZONE_DMA32 configurable
f7f0097af67c arm64/mm: Simplify protection flag creation for kernel huge mappings
7b8c87b297a7 arm64: cacheinfo: Update cache_line_size detected from DT or PPTT
9a83c84c3a49 drivers: base: cacheinfo: Add variable to record max cache line size
6dcdefcde413 arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state
54b8c7cbc57c arm64/fpsimd: Introduce fpsimd_save_and_flush_cpu_state() and use it
6fa9b41f6f15 arm64/fpsimd: Remove the prototype for sve_flush_cpu_state()
201d355c15c1 arm64/mm: Move PTE_VALID from SW defined to HW page table entry definitions
441a62780687 arm64/hugetlb: Use macros for contiguous huge page sizes

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap05.c

# /opt/ltp/testcases/bin/hugemmap05 -s -m
tst_test.c:1111: INFO: Timeout per run is 0h 05m 00s
hugemmap05.c:235: INFO: original nr_hugepages is 0
hugemmap05.c:248: INFO: original nr_overcommit_hugepages is 0
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Cannot kill test processes!
Congratulation, likely test hit a kernel bug.
Exitting uncleanly...

[ 7792.681691][ T5025] LTP: starting hugemmap05_3 (hugemmap05 -s -m)
[ 7911.149058][ T1309] INFO: task hugemmap05:51035 can't die for more than 122 seconds.
[ 7911.156833][ T1309] hugemmap05      R  running task    27648 51035      1 0x0000000d
[ 7911.164654][ T1309] Call trace:
[ 7911.167823][ T1309]  __switch_to+0x2e0/0x37c
[ 7911.172128][ T1309]  0x3e4ca
[ 7911.175033][ T1309] 
[ 7911.175033][ T1309] Showing all locks held in the system:
[ 7911.182888][ T1309] 1 lock held by khungtaskd/1309:
[ 7911.187778][ T1309]  #0: 0000000037a3e572 (rcu_read_lock){....}, at: rcu_lock_acquire+0x8/0x38
[ 7911.196655][ T1309] 4 locks held by hugemmap05/51035:
[ 7911.201731][ T1309] 4 locks held by hugemmap05/51038:
[ 7911.206814][ T1309] 
[ 7911.209025][ T1309] =============================================
[ 7911.209025][ T1309] 


2019-06-14 10:20:47

by Will Deacon

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

Hi Qian,

On Thu, Jun 13, 2019 at 05:34:01PM -0400, Qian Cai wrote:
> The LTP hugemmap05 test case [1] could not exit properly and then degraded
> system performance on arm64 with linux-next (next-20190613). The bisection so
> far indicates:
>
> BAD: 30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
> GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'

Did you finish the bisection in the end? Also, what config are you using
(you usually have something fairly esoteric ;)?

Thanks,

Will

2019-06-14 12:17:48

by Qian Cai

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

On Fri, 2019-06-14 at 11:20 +0100, Will Deacon wrote:
> Hi Qian,
>
> On Thu, Jun 13, 2019 at 05:34:01PM -0400, Qian Cai wrote:
> > The LTP hugemmap05 test case [1] could not exit properly and then degraded
> > system performance on arm64 with linux-next (next-20190613). The bisection
> > so far indicates:
> >
> > BAD:  30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
> > GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
>
> Did you finish the bisection in the end? Also, what config are you using
> (you usually have something fairly esoteric ;)?

No, it is still running.

https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config

2019-06-17 01:32:09

by Anshuman Khandual

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

Hello Qian,

On 06/14/2019 05:45 PM, Qian Cai wrote:
> On Fri, 2019-06-14 at 11:20 +0100, Will Deacon wrote:
>> Hi Qian,
>>
>> On Thu, Jun 13, 2019 at 05:34:01PM -0400, Qian Cai wrote:
>>> The LTP hugemmap05 test case [1] could not exit properly and then degraded
>>> system performance on arm64 with linux-next (next-20190613). The bisection
>>> so far indicates:
>>>
>>> BAD:  30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
>>> GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
>>
>> Did you finish the bisection in the end? Also, what config are you using
>> (you usually have something fairly esoteric ;)?
>
> No, it is still running.
>
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
>

Were you able to bisect the problem down to a particular commit?

- Anshuman

2019-06-17 01:43:16

by Qian Cai

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)



> On Jun 16, 2019, at 9:32 PM, Anshuman Khandual <[email protected]> wrote:
>
> Hello Qian,
>
> On 06/14/2019 05:45 PM, Qian Cai wrote:
>> On Fri, 2019-06-14 at 11:20 +0100, Will Deacon wrote:
>>> Hi Qian,
>>>
>>> On Thu, Jun 13, 2019 at 05:34:01PM -0400, Qian Cai wrote:
>>>> The LTP hugemmap05 test case [1] could not exit properly and then degraded
>>>> system performance on arm64 with linux-next (next-20190613). The bisection
>>>> so far indicates:
>>>>
>>>> BAD: 30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
>>>> GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
>>>
>>> Did you finish the bisection in the end? Also, what config are you using
>>> (you usually have something fairly esoteric ;)?
>>
>> No, it is still running.
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
>>
>
> Were you able to bisect the problem down to a particular commit?

Not yet. It turned out the test case needs to run a few times (usually within 5) to reproduce, so the previous bisection was totally wrong because it assumed the bad commit would fail every time. Once reproduced, the test case becomes unkillable, stuck in the D state.

I am still in the middle of running a new round of bisection. The current progress is:

35c99ffa20ed GOOD (survived 20 times)
def0fdae813d BAD
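A rough way to size the repeat count for a flaky bisection step, assuming (this model is not from the thread) that each run independently hits the hang with probability p. The value p = 0.2 used below is only a loose reading of "usually within 5"; survive_prob() and runs_needed() are hypothetical helpers:

```c
#include <assert.h>

/*
 * Illustrative model (an assumption, not from the thread): each run of the
 * test independently hits the hang with probability p, 0 < p < 1.
 */

/* Probability that a BAD commit passes n runs in a row and gets marked GOOD. */
double survive_prob(double p, unsigned n)
{
    assert(p > 0.0 && p < 1.0);
    double s = 1.0;
    for (unsigned i = 0; i < n; i++)
        s *= 1.0 - p;           /* every single run has to miss the bug */
    return s;
}

/* Fewest clean runs so that a BAD commit is mislabelled with prob <= alpha. */
unsigned runs_needed(double p, double alpha)
{
    assert(p > 0.0 && p < 1.0 && alpha > 0.0 && alpha < 1.0);
    unsigned n = 0;
    double s = 1.0;
    while (s > alpha) {         /* keep running until the miss chance is small */
        s *= 1.0 - p;
        n++;
    }
    return n;
}
```

Under that assumption, survive_prob(0.2, 20) is about 0.0115, so a BAD commit would still pass 20 runs roughly 1% of the time, and runs_needed(0.2, 0.01) returns 21. A lower per-run hit rate needs many more clean runs before a commit can safely be marked GOOD.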

2019-06-24 09:51:29

by Will Deacon

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

Hi Qian Cai,

On Sun, Jun 16, 2019 at 09:41:09PM -0400, Qian Cai wrote:
> > On Jun 16, 2019, at 9:32 PM, Anshuman Khandual <[email protected]> wrote:
> > On 06/14/2019 05:45 PM, Qian Cai wrote:
> >> On Fri, 2019-06-14 at 11:20 +0100, Will Deacon wrote:
> >>> On Thu, Jun 13, 2019 at 05:34:01PM -0400, Qian Cai wrote:
> >>>> The LTP hugemmap05 test case [1] could not exit properly and then degraded
> >>>> system performance on arm64 with linux-next (next-20190613). The bisection
> >>>> so far indicates:
> >>>>
> >>>> BAD: 30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
> >>>> GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
> >>>
> >>> Did you finish the bisection in the end? Also, what config are you using
> >>> (you usually have something fairly esoteric ;)?
> >>
> >> No, it is still running.
> >>
> >> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> >>
> >
> > Were you able to bisect the problem down to a particular commit?
>
> Not yet. It turned out the test case needs to run a few times (usually
> within 5) to reproduce, so the previous bisection was totally wrong because
> it assumed the bad commit would fail every time. Once reproduced, the test
> case becomes unkillable, stuck in the D state.
>
> I am still in the middle of running a new round of bisection. The current
> progress is:
>
> 35c99ffa20ed GOOD (survived 20 times)
> def0fdae813d BAD

Just wondering if you got anywhere with this? We've failed to reproduce the
problem locally.

Will

2019-06-24 14:11:53

by Qian Cai

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

On Mon, 2019-06-24 at 10:35 +0100, Will Deacon wrote:
> Hi Qian Cai,
>
> On Sun, Jun 16, 2019 at 09:41:09PM -0400, Qian Cai wrote:
> > > On Jun 16, 2019, at 9:32 PM, Anshuman Khandual <[email protected]> wrote:
> > > On 06/14/2019 05:45 PM, Qian Cai wrote:
> > > > On Fri, 2019-06-14 at 11:20 +0100, Will Deacon wrote:
> > > > > On Thu, Jun 13, 2019 at 05:34:01PM -0400, Qian Cai wrote:
> > > > > > The LTP hugemmap05 test case [1] could not exit properly and then
> > > > > > degraded system performance on arm64 with linux-next
> > > > > > (next-20190613). The bisection so far indicates:
> > > > > >
> > > > > > BAD:  30bafbc357f1 Merge remote-tracking branch 'arm64/for-next/core'
> > > > > > GOOD: 0c3d124a3043 Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
> > > > >
> > > > > Did you finish the bisection in the end? Also, what config are you
> > > > > using (you usually have something fairly esoteric ;)?
> > > >
> > > > No, it is still running.
> > > >
> > > > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> > > >
> > >
> > > Were you able to bisect the problem down to a particular commit?
> >
> > Not yet. It turned out the test case needs to run a few times (usually
> > within 5) to reproduce, so the previous bisection was totally wrong because
> > it assumed the bad commit would fail every time. Once reproduced, the test
> > case becomes unkillable, stuck in the D state.
> >
> > I am still in the middle of running a new round of bisection. The current
> > progress is:
> >
> > 35c99ffa20ed GOOD (survived 20 times)
> > def0fdae813d BAD
>
> Just wondering if you got anywhere with this? We've failed to reproduce the
> problem locally.

Unfortunately, I have not had a chance to dig into this yet. My progress so
far:

The issue has been there for a long time, going back to 4.20 and probably
earlier. It does not fail every time. The script below usually reproduces it
within 100 tries.

i=0; while :; do ./hugemmap05 -m -s; echo $((i++)); sleep 5; done

It reproduces on an error path, i.e., shmget() in the test case fails every
time before the soft lockups are triggered.

# ./hugemmap05 -s -m
tst_test.c:1112: INFO: Timeout per run is 0h 05m 00s
hugemmap05.c:235: INFO: original nr_hugepages is 0
hugemmap05.c:248: INFO: original nr_overcommit_hugepages is 0
tst_safe_sysv_ipc.c:111: BROK: hugemmap05.c:97: shmget(218366029, 103079215104, b80) failed: ENOMEM
hugemmap05.c:192: INFO: restore nr_hugepages to 0.
hugemmap05.c:201: INFO: restore nr_overcommit_hugepages to 0.

Summary:
passed   0
failed   0
skipped  0
warnings 0
0
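For reference, the third shmget() argument in the BROK line above decodes as follows, assuming LTP prints the flag word in hex (so b80 means 0xb80). decode_shmflg() and struct shmflg_decoded are hypothetical helpers; the constants are the standard SysV IPC ones from the Linux/glibc headers:

```c
#define _GNU_SOURCE             /* glibc exposes SHM_HUGETLB under feature macros */
#include <assert.h>
#include <stdbool.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>

struct shmflg_decoded {
    bool hugetlb;               /* SHM_HUGETLB: back the segment with huge pages */
    bool create;                /* IPC_CREAT: create if the key is unused */
    bool excl;                  /* IPC_EXCL: fail if the key already exists */
    unsigned perms;             /* low 9 bits: permission mode */
};

struct shmflg_decoded decode_shmflg(int flg)
{
    assert(flg >= 0);
    struct shmflg_decoded d = {
        .hugetlb = (flg & SHM_HUGETLB) != 0,
        .create  = (flg & IPC_CREAT) != 0,
        .excl    = (flg & IPC_EXCL) != 0,
        .perms   = (unsigned)flg & 0777,
    };
    return d;
}
```

On Linux, 0xb80 is SHM_HUGETLB | IPC_CREAT | 0600 without IPC_EXCL, i.e. a request to create a hugetlb-backed segment, which is consistent with the ENOMEM coming from the hugetlb_file_setup() path described below.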

My understanding is that the soft lockups are triggered on this path:

ipcget
ipcget_public
ops->getnew
newseg
hugetlb_file_setup <- return ENOMEM

[ 1521.471216][ T1309] INFO: task hugemmap05:4718 blocked for more than 860 seconds.
[ 1521.478731][ T1309]       Tainted: G        W         5.2.0-rc4+ #8
[ 1521.485023][ T1309] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1521.493568][ T1309] hugemmap05      D27168  4718      1 0x00000001
[ 1521.499815][ T1309] Call trace:
[ 1521.502985][ T1309]  __switch_to+0x2e0/0x37c
[ 1521.507278][ T1309]  __schedule+0xa0c/0xd9c
[ 1521.511484][ T1309]  schedule+0x60/0x168
[ 1521.515430][ T1309]  __rwsem_down_write_failed_common+0x484/0x7b8
[ 1521.521546][ T1309]  rwsem_down_write_failed+0x20/0x2c
[ 1521.526717][ T1309]  down_write+0xa0/0xa4
[ 1521.530747][ T1309]  ipcget+0x74/0x414
[ 1521.534518][ T1309]  ksys_shmget+0x90/0xc4
[ 1521.538638][ T1309]  __arm64_sys_shmget+0x54/0x88
[ 1521.543366][ T1309]  el0_svc_handler+0x198/0x260
[ 1521.548005][ T1309]  el0_svc+0x8/0xc
[ 1521.551605][ T1309] 
[ 1521.551605][ T1309] Showing all locks held in the system:
[ 1521.559349][ T1309] 1 lock held by khungtaskd/1309:
[ 1521.564251][ T1309]  #0: 00000000033dd0e2 (rcu_read_lock){....}, at: rcu_lock_acquire+0x8/0x38
[ 1521.573014][ T1309] 2 locks held by hugemmap05/4694:
[ 1521.578010][ T1309] 1 lock held by hugemmap05/4718:
[ 1521.582904][ T1309]  #0: 00000000c62a3d44 (&ids->rwsem){....}, at: ipcget+0x74/0x414
[ 1521.590707][ T1309] 1 lock held by hugemmap05/4755:
[ 1521.595595][ T1309]  #0: 00000000c62a3d44 (&ids->rwsem){....}, at: ipcget+0x74/0x414
[ 1521.603373][ T1309] 1 lock held by hugemmap05/4781:
[ 1521.608270][ T1309]  #0: 00000000c62a3d44 (&ids->rwsem){....}, at: ipcget+0x74/0x414

2019-06-24 22:06:53

by Qian Cai

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

So the problem is that ipcget_public() holds the semaphore "ids->rwsem" for
too long, seemingly unnecessarily, and sometimes goes to sleep in direct
reclaim while holding it (other times, in LTP hugemmap05 [1],
hugetlb_file_setup() returns -ENOMEM):

[  788.765739][ T1315] INFO: task hugemmap05:5001 can't die for more than 122 seconds.
[  788.773512][ T1315] hugemmap05      R  running task    25600  5001      1 0x0000000d
[  788.781348][ T1315] Call trace:
[  788.784536][ T1315]  __switch_to+0x2e0/0x37c
[  788.788848][ T1315]  try_to_free_pages+0x614/0x934
[  788.793679][ T1315]  __alloc_pages_nodemask+0xe88/0x1d60
[  788.799030][ T1315]  alloc_fresh_huge_page+0x16c/0x588
[  788.804206][ T1315]  alloc_surplus_huge_page+0x9c/0x278
[  788.809468][ T1315]  hugetlb_acct_memory+0x114/0x5c4
[  788.814469][ T1315]  hugetlb_reserve_pages+0x170/0x2b0
[  788.819662][ T1315]  hugetlb_file_setup+0x26c/0x3a8
[  788.824600][ T1315]  newseg+0x220/0x63c
[  788.828490][ T1315]  ipcget+0x570/0x674
[  788.832377][ T1315]  ksys_shmget+0x90/0xc4
[  788.836525][ T1315]  __arm64_sys_shmget+0x54/0x88
[  788.841282][ T1315]  el0_svc_handler+0x19c/0x26c
[  788.845952][ T1315]  el0_svc+0x8/0xc

and then all other processes wait on the semaphore, causing lock contention:

[  788.849583][ T1315] INFO: task hugemmap05:5027 blocked for more than 122 seconds.
[  788.857119][ T1315]       Tainted: G        W         5.2.0-rc6-next-20190624 #2
[  788.864566][ T1315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  788.873139][ T1315] hugemmap05      D26960  5027   5026 0x00000000
[  788.879395][ T1315] Call trace:
[  788.882576][ T1315]  __switch_to+0x2e0/0x37c
[  788.886901][ T1315]  __schedule+0xb74/0xf0c
[  788.891136][ T1315]  schedule+0x60/0x168
[  788.895097][ T1315]  rwsem_down_write_slowpath+0x5a0/0x8c8
[  788.900653][ T1315]  down_write+0xc0/0xc4
[  788.904715][ T1315]  ipcget+0x74/0x674
[  788.908516][ T1315]  ksys_shmget+0x90/0xc4
[  788.912664][ T1315]  __arm64_sys_shmget+0x54/0x88
[  788.917420][ T1315]  el0_svc_handler+0x19c/0x26c
[  788.922088][ T1315]  el0_svc+0x8/0xc

Ideally, it seems only ipc_findkey() and newseg() on this path need to hold the
semaphore to protect against concurrent access, so it could just be converted to
a spinlock instead.
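A single-threaded userspace model of the locking shape described above: the ids->rwsem write lock is taken before the key lookup and dropped only after ops->getnew() (newseg() for shm) returns, so a slow or sleeping getnew() stalls every other shmget() caller. This is a sketch, not kernel code; struct ids_model, ipcget_public_model(), failing_getnew(), MODEL_IPC_CREAT and the boolean standing in for the rwsem are all hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IPC_CREAT 01000   /* same value as IPC_CREAT on Linux */
#define MODEL_MAX_KEYS  16

struct ids_model {
    bool writer_held;           /* stands in for ids->rwsem held for write */
    int keys[MODEL_MAX_KEYS];
    size_t nkeys;
};

typedef int (*getnew_fn)(struct ids_model *ids, int key);

static int findkey_model(const struct ids_model *ids, int key)
{
    for (size_t i = 0; i < ids->nkeys; i++)
        if (ids->keys[i] == key)
            return (int)i;
    return -1;
}

/* Same shape as the path in the report: lock, look up, maybe create, unlock. */
int ipcget_public_model(struct ids_model *ids, int key, int flg,
                        getnew_fn getnew)
{
    int err;

    assert(!ids->writer_held);  /* single-threaded model: no nesting */
    ids->writer_held = true;    /* down_write(&ids->rwsem) */
    int id = findkey_model(ids, key);
    if (id < 0)                 /* key unused: create only with IPC_CREAT */
        err = (flg & MODEL_IPC_CREAT) ? getnew(ids, key) : -ENOENT;
    else
        err = id;               /* simplified: no perm or IPC_EXCL checks */
    ids->writer_held = false;   /* up_write(&ids->rwsem) */
    return err;
}

/* Hypothetical getnew(): records whether it ran under the lock, then fails
 * with -ENOMEM the way hugetlb_file_setup() did in the report. */
bool getnew_saw_lock;

int failing_getnew(struct ids_model *ids, int key)
{
    (void)key;
    getnew_saw_lock = ids->writer_held;
    return -ENOMEM;
}
```

Calling ipcget_public_model() with MODEL_IPC_CREAT and failing_getnew() returns -ENOMEM and leaves getnew_saw_lock set, showing that the whole allocation attempt, including any direct reclaim, would run with the write lock held.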

[1] ./hugemmap05 -s -m

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap05.c

2019-06-24 22:07:57

by Mike Kravetz

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

On 6/24/19 2:30 PM, Qian Cai wrote:
> So the problem is that ipcget_public() holds the semaphore "ids->rwsem" for
> too long, seemingly unnecessarily, and sometimes goes to sleep in direct
> reclaim while holding it (other times, in LTP hugemmap05 [1],
> hugetlb_file_setup() returns -ENOMEM):

Thanks for looking into this! I noticed that recent kernels could take a
VERY long time trying to do high order allocations. In my case it was trying
to do dynamic hugetlb page allocations as well [1]. But, IMO this is more
of a general direct reclaim/compaction issue than something hugetlb specific.

>
> [ 788.765739][ T1315] INFO: task hugemmap05:5001 can't die for more than 122 seconds.
> [ 788.773512][ T1315] hugemmap05 R running task 25600 5001 1 0x0000000d
> [ 788.781348][ T1315] Call trace:
> [ 788.784536][ T1315] __switch_to+0x2e0/0x37c
> [ 788.788848][ T1315] try_to_free_pages+0x614/0x934
> [ 788.793679][ T1315] __alloc_pages_nodemask+0xe88/0x1d60
> [ 788.799030][ T1315] alloc_fresh_huge_page+0x16c/0x588
> [ 788.804206][ T1315] alloc_surplus_huge_page+0x9c/0x278
> [ 788.809468][ T1315] hugetlb_acct_memory+0x114/0x5c4
> [ 788.814469][ T1315] hugetlb_reserve_pages+0x170/0x2b0
> [ 788.819662][ T1315] hugetlb_file_setup+0x26c/0x3a8
> [ 788.824600][ T1315] newseg+0x220/0x63c
> [ 788.828490][ T1315] ipcget+0x570/0x674
> [ 788.832377][ T1315] ksys_shmget+0x90/0xc4
> [ 788.836525][ T1315] __arm64_sys_shmget+0x54/0x88
> [ 788.841282][ T1315] el0_svc_handler+0x19c/0x26c
> [ 788.845952][ T1315] el0_svc+0x8/0xc
>
> and then all other processes wait on the semaphore, causing lock
> contention:

That call to hugetlb_file_setup() via ipcget certainly could take a long
time to execute. In the default case huge pages are reserved to back the
shared memory segment. If these pages were not preallocated, then the
code will try to dynamically allocate the required number of huge pages.
So, even if [1] were not an issue I think a change here makes sense.

> [ 788.849583][ T1315] INFO: task hugemmap05:5027 blocked for more than 122 seconds.
> [ 788.857119][ T1315] Tainted: G W 5.2.0-rc6-next-20190624 #2
> [ 788.864566][ T1315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 788.873139][ T1315] hugemmap05 D26960 5027 5026 0x00000000
> [ 788.873139][ T1315] hugemmap05 D26960 5027 5026 0x00000000
> [ 788.879395][ T1315] Call trace:
> [ 788.882576][ T1315] __switch_to+0x2e0/0x37c
> [ 788.886901][ T1315] __schedule+0xb74/0xf0c
> [ 788.891136][ T1315] schedule+0x60/0x168
> [ 788.895097][ T1315] rwsem_down_write_slowpath+0x5a0/0x8c8
> [ 788.900653][ T1315] down_write+0xc0/0xc4
> [ 788.904715][ T1315] ipcget+0x74/0x674
> [ 788.908516][ T1315] ksys_shmget+0x90/0xc4
> [ 788.912664][ T1315] __arm64_sys_shmget+0x54/0x88
> [ 788.917420][ T1315] el0_svc_handler+0x19c/0x26c
> [ 788.922088][ T1315] el0_svc+0x8/0xc
>
> Ideally, it seems only ipc_findkey() and newseg() on this path need to hold the
> semaphore to protect against concurrent access, so it could just be converted to a
> spinlock instead.

I do not have enough experience with this ipc code to comment on your proposed
change. But, I will look into it.

[1] https://lkml.org/lkml/2019/4/23/2
--
Mike Kravetz

2019-06-27 18:12:15

by Mike Kravetz

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

On 6/24/19 2:53 PM, Mike Kravetz wrote:
> On 6/24/19 2:30 PM, Qian Cai wrote:
>> So the problem is that ipcget_public() holds the semaphore "ids->rwsem" for
>> too long, seemingly unnecessarily, and sometimes goes to sleep in direct
>> reclaim while holding it (other times, in LTP hugemmap05 [1],
>> hugetlb_file_setup() returns -ENOMEM):
>
> Thanks for looking into this! I noticed that recent kernels could take a
> VERY long time trying to do high order allocations. In my case it was trying
> to do dynamic hugetlb page allocations as well [1]. But, IMO this is more
> of a general direct reclaim/compaction issue than something hugetlb specific.
>

<snip>

>> Ideally, it seems only ipc_findkey() and newseg() on this path need to hold the
>> semaphore to protect against concurrent access, so it could just be converted to a
>> spinlock instead.
>
> I do not have enough experience with this ipc code to comment on your proposed
> change. But, I will look into it.
>
> [1] https://lkml.org/lkml/2019/4/23/2

I only took a quick look at the ipc code, but there does not appear to be
a quick/easy change to make. The issue is that shared memory creation could
take a long time. With issue [1] above unresolved, creation of hugetlb backed
shared memory segments could take a VERY long time.

I do not believe the test failure is arm-specific. Most likely, it is just
because testing was done on a system whose memory size triggers this issue?

My plan is to focus on [1]. When that is resolved, this issue should go away.
--
Mike Kravetz

2019-06-27 18:55:07

by Qian Cai

Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

On Thu, 2019-06-27 at 11:09 -0700, Mike Kravetz wrote:
> On 6/24/19 2:53 PM, Mike Kravetz wrote:
> > On 6/24/19 2:30 PM, Qian Cai wrote:
> > > So the problem is that ipcget_public() holds the semaphore "ids->rwsem"
> > > for too long, seemingly unnecessarily, and sometimes goes to sleep in
> > > direct reclaim while holding it (other times, in LTP hugemmap05 [1],
> > > hugetlb_file_setup() returns -ENOMEM):
> >
> > Thanks for looking into this!  I noticed that recent kernels could take a
> > VERY long time trying to do high order allocations.  In my case it was
> > trying to do dynamic hugetlb page allocations as well [1].  But, IMO this
> > is more of a general direct reclaim/compaction issue than something
> > hugetlb specific.
> >
>
> <snip>
>
> > > Ideally, it seems only ipc_findkey() and newseg() on this path need to
> > > hold the semaphore to protect against concurrent access, so it could just
> > > be converted to a spinlock instead.
> >
> > I do not have enough experience with this ipc code to comment on your
> > proposed change.  But, I will look into it.
> >
> > [1] https://lkml.org/lkml/2019/4/23/2
>
> I only took a quick look at the ipc code, but there does not appear to be
> a quick/easy change to make.  The issue is that shared memory creation could
> take a long time.  With issue [1] above unresolved, creation of hugetlb backed
> shared memory segments could take a VERY long time.
>
> I do not believe the test failure is arm-specific.  Most likely, it is just
> because testing was done on a system whose memory size triggers this issue?

I think it is because the arm64 machine has a default huge page size of 512M
instead of 2M as on other arches, but the test case still blindly tries to
allocate around 200 huge pages, which the system can't handle gracefully, i.e.,
by returning -ENOMEM in a reasonable time.

>
> My plan is to focus on [1].  When that is resolved, this issue should go away.
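A quick check of the numbers: the segment size in the earlier BROK line, 103079215104 bytes (96 GiB), comes to exactly 192 huge pages of 512M, which matches the roughly 200 huge pages mentioned above. This assumes that size is what the test requested on this machine; hugepages_needed() is a hypothetical helper that rounds up to whole pages:

```c
#include <assert.h>
#include <stdint.h>

/* Huge pages needed to back a segment of `bytes`, rounding up to whole pages. */
uint64_t hugepages_needed(uint64_t bytes, uint64_t hpage_size)
{
    assert(hpage_size != 0);
    return (bytes + hpage_size - 1) / hpage_size;
}
```

For example, hugepages_needed(103079215104ULL, 512ULL << 20) returns 192, and every one of those pages has to be found or dynamically allocated while ids->rwsem is held.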