2016-10-27 14:29:12

by Murphy Zhou

[permalink] [raw]
Subject: multi-threads libvmmalloc fork test hang


# description

nvml test suite vmmalloc_fork test hang.

$ ps -eo stat,comm | grep vmma
S+ vmmalloc_fork
Sl+ vmmalloc_fork
Z+ vmmalloc_fork <defunct>
Sl+ vmmalloc_fork
Z+ vmmalloc_fork <defunct>
Z+ vmmalloc_fork <defunct>
Sl+ vmmalloc_fork
Z+ vmmalloc_fork <defunct>
Z+ vmmalloc_fork <defunct>
Z+ vmmalloc_fork <defunct>

dmesg:

[ 250.499097] INFO: task vmmalloc_fork:9805 blocked for more than 120 seconds.
[ 250.530667] Not tainted 4.9.09fe68ca+ #27
[ 250.550901] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 250.585752] vmmalloc_fork D[ 250.598362] ffffffff8171813c 0 9805 9765 0x00000080
[ 250.623445] ffff88075dc68f80[ 250.636052] 0000000000000000 ffff88076058db00 ffff88017c5b0000 ffff880763b19340[ 250.668510] ffffc9000fe1bbb0 ffffffff8171813c ffffc9000fe1bc20 ffffc9000fe1bbe0[ 250.704220] ffffffff82248898 ffff88076058db00 ffffffff82248898Call Trace:
[ 250.738382] [<ffffffff8171813c>] ? __schedule+0x21c/0x6a0
[ 250.763404] [<ffffffff817185f6>] schedule+0x36/0x80
[ 250.786177] [<ffffffff81284471>] get_unlocked_mapping_entry+0xc1/0x120
[ 250.815869] [<ffffffff81283810>] ? iomap_dax_rw+0x110/0x110
[ 250.841350] [<ffffffff81284c0a>] grab_mapping_entry+0x4a/0x220
[ 250.868442] [<ffffffff812851e9>] iomap_dax_fault+0xa9/0x3b0
[ 250.894437] [<ffffffffa02b15fe>] xfs_filemap_fault+0xce/0xf0 [xfs]
[ 250.922805] [<ffffffff811d3159>] __do_fault+0x79/0x100
[ 250.947035] [<ffffffff811d7a2b>] do_fault+0x49b/0x690
[ 250.970964] [<ffffffffa02b146c>] ? xfs_filemap_pmd_fault+0x9c/0x160 [xfs]
[ 251.001812] [<ffffffff811d94ba>] handle_mm_fault+0x61a/0xa50
[ 251.027736] [<ffffffff8106c3da>] __do_page_fault+0x22a/0x4a0
[ 251.053700] [<ffffffff8106c680>] do_page_fault+0x30/0x80
[ 251.077962] [<ffffffff81003b55>] ? do_syscall_64+0x175/0x180
[ 251.103835] [<ffffffff8171e208>] page_fault+0x28/0x30


# kernel versions:

v4.6 pass in seconds
v4.7 hang
v4.9-rc1 hang
Linus tree to commit 9fe68ca hang

bisect points to
first bad commit: [ac401cc782429cc8560ce4840b1405d603740917] dax: New fault locking

v4.7 with these 3 commits reverted pass:
4d9a2c8 - Jan Kara, 6 months ago : dax: Remove i_mmap_lock protection
bc2466e - Jan Kara, 6 months ago : dax: Use radix tree entry lock to protect cow faults
ac401cc - Jan Kara, 6 months ago : dax: New fault locking

# nvml version:
https://github.com/pmem/nvml.git
to commit:
feab4d6f65102139ce460890c898fcad09ce20ae

# How reproducible:
always

# Test steps:

<git clone and pmem0 setup>

$cd nvml
$make install -j64

$cat > src/test/testconfig.sh <<EOF
PMEM_FS_DIR=/daxmnt
NON_PMEM_FS_DIR=/tmp
EOF

$mkfs.xfs /dev/pmem0
$mkdir -p /daxmnt/
$mount -o dax /dev/pmem0 /daxmnt/

$make -C src/test/vmmalloc_fork/ TEST_TIME=60m clean
$make -C src/test/vmmalloc_fork/ TEST_TIME=60m check
$umount /daxmnt


2016-10-27 13:53:56

by Jan Kara

[permalink] [raw]
Subject: Re: multi-threads libvmmalloc fork test hang

Thanks for the report. I'll try to reproduce it...

Honza

On Thu 27-10-16 19:22:30, Xiong Zhou wrote:
> # description
>
> nvml test suite vmmalloc_fork test hang.
>
> $ ps -eo stat,comm | grep vmma
> S+ vmmalloc_fork
> Sl+ vmmalloc_fork
> Z+ vmmalloc_fork <defunct>
> Sl+ vmmalloc_fork
> Z+ vmmalloc_fork <defunct>
> Z+ vmmalloc_fork <defunct>
> Sl+ vmmalloc_fork
> Z+ vmmalloc_fork <defunct>
> Z+ vmmalloc_fork <defunct>
> Z+ vmmalloc_fork <defunct>
>
> dmesg:
>
> [ 250.499097] INFO: task vmmalloc_fork:9805 blocked for more than 120 seconds.
> [ 250.530667] Not tainted 4.9.09fe68ca+ #27
> [ 250.550901] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 250.585752] vmmalloc_fork D[ 250.598362] ffffffff8171813c 0 9805 9765 0x00000080
> [ 250.623445] ffff88075dc68f80[ 250.636052] 0000000000000000 ffff88076058db00 ffff88017c5b0000 ffff880763b19340[ 250.668510] ffffc9000fe1bbb0 ffffffff8171813c ffffc9000fe1bc20 ffffc9000fe1bbe0[ 250.704220] ffffffff82248898 ffff88076058db00 ffffffff82248898Call Trace:
> [ 250.738382] [<ffffffff8171813c>] ? __schedule+0x21c/0x6a0
> [ 250.763404] [<ffffffff817185f6>] schedule+0x36/0x80
> [ 250.786177] [<ffffffff81284471>] get_unlocked_mapping_entry+0xc1/0x120
> [ 250.815869] [<ffffffff81283810>] ? iomap_dax_rw+0x110/0x110
> [ 250.841350] [<ffffffff81284c0a>] grab_mapping_entry+0x4a/0x220
> [ 250.868442] [<ffffffff812851e9>] iomap_dax_fault+0xa9/0x3b0
> [ 250.894437] [<ffffffffa02b15fe>] xfs_filemap_fault+0xce/0xf0 [xfs]
> [ 250.922805] [<ffffffff811d3159>] __do_fault+0x79/0x100
> [ 250.947035] [<ffffffff811d7a2b>] do_fault+0x49b/0x690
> [ 250.970964] [<ffffffffa02b146c>] ? xfs_filemap_pmd_fault+0x9c/0x160 [xfs]
> [ 251.001812] [<ffffffff811d94ba>] handle_mm_fault+0x61a/0xa50
> [ 251.027736] [<ffffffff8106c3da>] __do_page_fault+0x22a/0x4a0
> [ 251.053700] [<ffffffff8106c680>] do_page_fault+0x30/0x80
> [ 251.077962] [<ffffffff81003b55>] ? do_syscall_64+0x175/0x180
> [ 251.103835] [<ffffffff8171e208>] page_fault+0x28/0x30
>
>
> # kernel versions:
>
> v4.6 pass in seconds
> v4.7 hang
> v4.9-rc1 hang
> Linus tree to commit 9fe68ca hang
>
> bisect points to
> first bad commit: [ac401cc782429cc8560ce4840b1405d603740917] dax: New fault locking
>
> v4.7 with these 3 commits reverted pass:
> 4d9a2c8 - Jan Kara, 6 months ago : dax: Remove i_mmap_lock protection
> bc2466e - Jan Kara, 6 months ago : dax: Use radix tree entry lock to protect cow faults
> ac401cc - Jan Kara, 6 months ago : dax: New fault locking
>
> # nvml version:
> https://github.com/pmem/nvml.git
> to commit:
> feab4d6f65102139ce460890c898fcad09ce20ae
>
> # How reproducible:
> always
>
> # Test steps:
>
> <git clone and pmem0 setup>
>
> $cd nvml
> $make install -j64
>
> $cat > src/test/testconfig.sh <<EOF
> PMEM_FS_DIR=/daxmnt
> NON_PMEM_FS_DIR=/tmp
> EOF
>
> $mkfs.xfs /dev/pmem0
> $mkdir -p /daxmnt/
> $mount -o dax /dev/pmem0 /daxmnt/
>
> $make -C src/test/vmmalloc_fork/ TEST_TIME=60m clean
> $make -C src/test/vmmalloc_fork/ TEST_TIME=60m check
> $umount /daxmnt
--
Jan Kara <[email protected]>
SUSE Labs, CR