2016-08-24 08:16:15

by Anatoly Pugachev

[permalink] [raw]
Subject: [sparc64] ext4 TPC and call trace (process blocked/stuck) on git kernel 4.8.0-rc3+

Hello!

Running fstests (xfstests) suite on sparc64 debian sid/unstable with
linux kernel 4.8.0-rc3+ , I'm getting the following call trace and TPC
on server console and system logs:

mator@nvg5120:~/xfstests-dev$ ./check ext4/022

[1968295.582734] run fstests ext4/022 at 2016-08-23 18:19:00
[1968298.917006] EXT4-fs (loop1): mounted filesystem with ordered data
mode. Opts: acl,user_xattr
[1968303.922338] EXT4-fs (loop1): mounted filesystem with ordered data
mode. Opts: acl,user_xattr
[1968304.032985] Kernel unaligned access at TPC[101c59d4]
ext4_mark_inode_dirty+0xf4/0x200 [ext4]
[1968493.042138] INFO: task 022:18119 blocked for more than 120 seconds.
[1968493.122028] Not tainted 4.8.0-rc3+ #76
[1968493.176979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1968493.276108] 022 D 00000000008e1edc 0 18119 2687
0x308000107000000
[1968493.373659] Call Trace:
[1968493.406193] [00000000008dfa28] schedule+0x68/0xa0
[1968493.467933] [00000000008e1edc] rwsem_down_write_failed+0x13c/0x1e0
[1968493.547804] [00000000008e16e4] down_write+0x24/0x40
[1968493.611787] [00000000102027f4]
ext4_expand_extra_isize_ea+0x54/0x760 [ext4]
[1968493.701731] [00000000101c5a04] ext4_mark_inode_dirty+0x124/0x200 [ext4]
[1968493.786655] [00000000101c945c] ext4_dirty_inode+0x3c/0x60 [ext4]
[1968493.864366] [00000000005b4dc4] __mark_inode_dirty+0x24/0x260
[1968493.937888] [00000000101f7518] ext4_mb_new_blocks+0x118/0xa00 [ext4]
[1968494.019896] [00000000101b8a74] ext4_new_meta_blocks+0xb4/0xe0 [ext4]
[1968494.101994] [000000001020147c] ext4_xattr_block_set+0x7dc/0xa80 [ext4]
[1968494.185871] [0000000010202d44]
ext4_expand_extra_isize_ea+0x5a4/0x760 [ext4]
[1968494.276431] [00000000101c5a04] ext4_mark_inode_dirty+0x124/0x200 [ext4]
[1968494.361345] [00000000101ec074] ext4_ext_truncate+0x14/0xe0 [ext4]
[1968494.440200] [00000000101c6d34] ext4_truncate+0x254/0x3a0 [ext4]
[1968494.516622] [00000000101c90c0] ext4_setattr+0x5c0/0x920 [ext4]
[1968494.591953] [00000000005a7358] notify_change+0x2f8/0x4a0
[1968494.660866] INFO: task jbd2/loop1-8:19633 blocked for more than
120 seconds.
[1968494.750050] Not tainted 4.8.0-rc3+ #76
[1968494.805325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1968494.904421] jbd2/loop1-8 D 0000000010199684 0 19633 2
0x06000000
[1968494.994475] Call Trace:
[1968495.026968] [00000000008dfa28] schedule+0x68/0xa0
[1968495.088380] [0000000010199684]
jbd2_journal_commit_transaction+0x224/0x1580 [jbd2]
[1968495.185728] [000000001019dec0] kjournald2+0xa0/0x240 [jbd2]
[1968495.258797] [0000000000480b70] kthread+0xb0/0xe0
[1968495.319106] [0000000000406044] ret_from_fork+0x1c/0x2c
[1968495.386182] [0000000000000000] (null)

I'll try to bisect it, but not earlier than 4.4 kernel (not sure i'll
able to boot 4.3).

xfstests setup is following:

mator@nvg5120:~/xfstests-dev$ git remote -v
origin git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git (fetch)
origin git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git (push)

mator@nvg5120:~/xfstests-dev$ cat local.config
export TEST_DEV=/dev/loop0
export TEST_DIR=/fst
export SCRATCH_DEV=/dev/loop1
export SCRATCH_MNT=/mnt/scratch

root@nvg5120:/home/mator/xfstests-dev# mkfs.ext4 /dev/loop0

/dev/loop0 and /dev/loop1 , each one is 6Gb file on tmpfs/ramdisk made
with fallocate.

mator@nvg5120:~/xfstests-dev$ findmnt /ramdisk/
TARGET SOURCE FSTYPE OPTIONS
/ramdisk tmpfs tmpfs rw,relatime,size=13631488k
mator@nvg5120:~/xfstests-dev$ sudo losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO
/dev/loop1 0 0 0 0 /ramdisk/scratchvol1 0
/dev/loop0 0 0 0 0 /ramdisk/testvol1 0

Thanks.


2016-08-24 08:27:36

by Eryu Guan

[permalink] [raw]
Subject: Re: [sparc64] ext4 TPC and call trace (process blocked/stuck) on git kernel 4.8.0-rc3+

On Wed, Aug 24, 2016 at 11:13:25AM +0300, Anatoly Pugachev wrote:
> Hello!
>
> Running fstests (xfstests) suite on sparc64 debian sid/unstable with
> linux kernel 4.8.0-rc3+ , I'm getting the following call trace and TPC
> on server console and system logs:
>
> mator@nvg5120:~/xfstests-dev$ ./check ext4/022

ext4/022 is known to hang, and AFAIK the fixes are pending on review and
not merged to mainline kernel yet. Please see this patchset

https://www.spinics.net/lists/linux-ext4/msg53200.html

And ext4/022 is in dangerous group, I think you can skip it by adding
"-x dangerous" to your check options (skip all tests in dangerous
group), e.g.

./check -g auto -x dangerous

Thanks,
Eryu

2017-04-21 16:38:50

by Anatoly Pugachev

[permalink] [raw]
Subject: Re: [sparc64] ext4 TPC and call trace (process blocked/stuck) on git kernel 4.8.0-rc3+

On Wed, Aug 24, 2016 at 11:13 AM, Anatoly Pugachev <[email protected]> wrote:
> Hello!
>
> Running fstests (xfstests) suite on sparc64 debian sid/unstable with
> linux kernel 4.8.0-rc3+ , I'm getting the following call trace and TPC
> on server console and system logs:


got another hung task (sys)log, but not with xfstests and with some
debug (kernel locks) output (kernel is 4.10.0-git-something). Can
someone knowing kernel have a look and maybe spot something. If not,
thanks anyway and machine still works. Thanks!

Apr 19 19:30:43 landau kernel: INFO: task ostree:107749 blocked for
more than 120 seconds.
Apr 19 19:30:43 landau kernel: Tainted: G W
4.10.0-10531-g86292b33d4b7 #77
Apr 19 19:30:43 landau kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 19:30:43 landau kernel: ostree D 0 107749 92509
0x208000103000000
Apr 19 19:30:43 landau kernel: Call Trace:
Apr 19 19:30:43 landau kernel: [0000000000a8e790] schedule+0x30/0xc0
Apr 19 19:30:43 landau kernel: [0000000000681798]
wb_wait_for_completion+0x58/0xa0
Apr 19 19:30:43 landau kernel: [0000000000682048]
__writeback_inodes_sb_nr+0x88/0xc0
Apr 19 19:30:43 landau kernel: [00000000006820e0] writeback_inodes_sb+0x20/0x40
Apr 19 19:30:43 landau kernel: [000000000068a5f4] sync_filesystem+0x34/0xc0
Apr 19 19:30:43 landau kernel: [000000000068a7f8] SyS_syncfs+0x38/0x80
Apr 19 19:30:43 landau kernel: [0000000000406234] linux_sparc_syscall+0x34/0x44
Apr 19 19:30:43 landau kernel:
Showing all locks held in the system:
Apr 19 19:30:43 landau kernel: 2 locks held by khungtaskd/784:
Apr 19 19:30:43 landau kernel: #0: (rcu_read_lock){......}, at:
[<000000000054eb80>] watchdog+0x240/0x7e0
Apr 19 19:30:43 landau kernel: #1: (tasklist_lock){.+.+..}, at:
[<00000000004d2d04>] debug_show_all_locks+0x64/0x1c0
Apr 19 19:30:43 landau kernel: 1 lock held by in:imklog/1484:
Apr 19 19:30:43 landau kernel: #0: (&f->f_pos_lock){+.+.+.}, at:
[<000000000066f8ac>] __fdget_pos+0x4c/0x60
Apr 19 19:30:43 landau kernel: 5 locks held by kworker/u512:0/91935:
Apr 19 19:30:43 landau kernel: #0: ("writeback"){.+.+.+}, at:
[<000000000048a5b4>] process_one_work+0x174/0x840
Apr 19 19:30:43 landau kernel: #1: ((&(&wb->dwork)->work)){+.+.+.},
at: [<000000000048a5b4>] process_one_work+0x174/0x840
Apr 19 19:30:43 landau kernel: #2:
(&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<00000000005c99ac>]
do_writepages+0x2c/0x60
Apr 19 19:30:43 landau kernel: #3: (jbd2_handle){++++..}, at:
[<00000000101647c8>] start_this_handle+0x108/0x3e0 [jbd2]
Apr 19 19:30:43 landau kernel: #4: (&ei->i_data_sem){++++..}, at:
[<000000001019fb38>] ext4_map_blocks+0x138/0x6a0 [ext4]
Apr 19 19:30:43 landau kernel: 1 lock held by ostree/107749:
Apr 19 19:30:43 landau kernel: #0: (&type->s_umount_key#29){++++++},
at: [<000000000068a7f0>] SyS_syncfs+0x30/0x80
Apr 19 19:30:43 landau kernel:
Apr 19 19:30:43 landau kernel: =============================================
Apr 19 20:27:06 landau kernel: INFO: task ostree:73769 blocked for
more than 120 seconds.
Apr 19 20:27:06 landau kernel: Tainted: G W
4.10.0-10531-g86292b33d4b7 #77
Apr 19 20:27:06 landau kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 20:27:06 landau kernel: ostree D 0 73769 58526
0x208000100000000
Apr 19 20:27:06 landau kernel: Call Trace:
Apr 19 20:27:06 landau kernel: [0000000000a8e790] schedule+0x30/0xc0
Apr 19 20:27:06 landau kernel: [0000000000681798]
wb_wait_for_completion+0x58/0xa0
Apr 19 20:27:06 landau kernel: [0000000000682048]
__writeback_inodes_sb_nr+0x88/0xc0
Apr 19 20:27:06 landau kernel: [00000000006820e0] writeback_inodes_sb+0x20/0x40
Apr 19 20:27:06 landau kernel: [000000000068a5f4] sync_filesystem+0x34/0xc0
Apr 19 20:27:06 landau kernel: [000000000068a7f8] SyS_syncfs+0x38/0x80
Apr 19 20:27:06 landau kernel: [0000000000406234] linux_sparc_syscall+0x34/0x44
Apr 19 20:27:06 landau kernel:
Showing all locks held in the system:
Apr 19 20:27:06 landau kernel: 2 locks held by khungtaskd/784:
Apr 19 20:27:06 landau kernel: #0: (rcu_read_lock){......}, at:
[<000000000054eb80>] watchdog+0x240/0x7e0
Apr 19 20:27:06 landau kernel: #1: (tasklist_lock){.+.+..}, at:
[<00000000004d2d04>] debug_show_all_locks+0x64/0x1c0
Apr 19 20:27:06 landau kernel: 1 lock held by in:imklog/1484:
Apr 19 20:27:06 landau kernel: #0: (&f->f_pos_lock){+.+.+.}, at:
[<000000000066f8ac>] __fdget_pos+0x4c/0x60
Apr 19 20:27:06 landau kernel: 5 locks held by kworker/u512:3/46184:
Apr 19 20:27:06 landau kernel: #0: ("writeback"){.+.+.+}, at:
[<000000000048a5b4>] process_one_work+0x174/0x840
Apr 19 20:27:06 landau kernel: #1: ((&(&wb->dwork)->work)){+.+.+.},
at: [<000000000048a5b4>] process_one_work+0x174/0x840
Apr 19 20:27:06 landau kernel: #2:
(&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<00000000005c99ac>]
do_writepages+0x2c/0x60
Apr 19 20:27:06 landau kernel: #3: (jbd2_handle){++++..}, at:
[<00000000101647c8>] start_this_handle+0x108/0x3e0 [jbd2]
Apr 19 20:27:06 landau kernel: #4: (&ei->i_data_sem){++++..}, at:
[<000000001019fb38>] ext4_map_blocks+0x138/0x6a0 [ext4]
Apr 19 20:27:06 landau kernel: 1 lock held by ostree/73769:
Apr 19 20:27:06 landau kernel: #0: (&type->s_umount_key#29){++++++},
at: [<000000000068a7f0>] SyS_syncfs+0x30/0x80
Apr 19 20:27:06 landau kernel:
Apr 19 20:27:06 landau kernel: =============================================
Apr 19 20:45:14 landau kernel: INFO: task ostree:146257 blocked for
more than 120 seconds.
Apr 19 20:45:14 landau kernel: Tainted: G W
4.10.0-10531-g86292b33d4b7 #77
Apr 19 20:45:14 landau kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 20:45:14 landau kernel: ostree D 0 146257 131005
0x208000100000000
Apr 19 20:45:14 landau kernel: Call Trace:
Apr 19 20:45:14 landau kernel: [0000000000a8e790] schedule+0x30/0xc0
Apr 19 20:45:14 landau kernel: [0000000000681798]
wb_wait_for_completion+0x58/0xa0
Apr 19 20:45:14 landau kernel: [0000000000682048]
__writeback_inodes_sb_nr+0x88/0xc0
Apr 19 20:45:14 landau kernel: [00000000006820e0] writeback_inodes_sb+0x20/0x40
Apr 19 20:45:14 landau kernel: [000000000068a5f4] sync_filesystem+0x34/0xc0
Apr 19 20:45:14 landau kernel: [000000000068a7f8] SyS_syncfs+0x38/0x80
Apr 19 20:45:14 landau kernel: [0000000000406234] linux_sparc_syscall+0x34/0x44
Apr 19 20:45:14 landau kernel:
Showing all locks held in the system:
Apr 19 20:45:14 landau kernel: 2 locks held by khungtaskd/784:
Apr 19 20:45:14 landau kernel: #0: (rcu_read_lock){......}, at:
[<000000000054eb80>] watchdog+0x240/0x7e0
Apr 19 20:45:14 landau kernel: #1: (tasklist_lock){.+.+..}, at:
[<00000000004d2d04>] debug_show_all_locks+0x64/0x1c0
Apr 19 20:45:14 landau kernel: 1 lock held by in:imklog/1484:
Apr 19 20:45:14 landau kernel: #0: (&f->f_pos_lock){+.+.+.}, at:
[<000000000066f8ac>] __fdget_pos+0x4c/0x60
Apr 19 20:45:14 landau kernel: 5 locks held by kworker/u512:0/130949:
Apr 19 20:45:14 landau kernel: #0: ("writeback"){.+.+.+}, at:
[<000000000048a5b4>] process_one_work+0x174/0x840
Apr 19 20:45:14 landau kernel: #1: ((&(&wb->dwork)->work)){+.+.+.},
at: [<000000000048a5b4>] process_one_work+0x174/0x840
Apr 19 20:45:14 landau kernel: #2:
(&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<00000000005c99ac>]
do_writepages+0x2c/0x60
Apr 19 20:45:14 landau kernel: #3: (jbd2_handle){++++..}, at:
[<00000000101647c8>] start_this_handle+0x108/0x3e0 [jbd2]
Apr 19 20:45:14 landau kernel: #4: (&ei->i_data_sem){++++..}, at:
[<000000001019fb38>] ext4_map_blocks+0x138/0x6a0 [ext4]
Apr 19 20:45:14 landau kernel: 1 lock held by ostree/146257:
Apr 19 20:45:14 landau kernel: #0: (&type->s_umount_key#29){++++++},
at: [<000000000068a7f0>] SyS_syncfs+0x30/0x80
Apr 19 20:45:14 landau kernel:
Apr 19 20:45:14 landau kernel: =============================================

2017-04-24 04:05:48

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [sparc64] ext4 TPC and call trace (process blocked/stuck) on git kernel 4.8.0-rc3+

On Fri, Apr 21, 2017 at 07:38:50PM +0300, Anatoly Pugachev wrote:
> On Wed, Aug 24, 2016 at 11:13 AM, Anatoly Pugachev <[email protected]> wrote:
> > Hello!
> >
> > Running fstests (xfstests) suite on sparc64 debian sid/unstable with
> > linux kernel 4.8.0-rc3+ , I'm getting the following call trace and TPC
> > on server console and system logs:
>
>
> got another hung task (sys)log, but not with xfstests and with some
> debug (kernel locks) output (kernel is 4.10.0-git-something). Can
> someone knowing kernel have a look and maybe spot something. If not,
> thanks anyway and machine still works. Thanks!

When you say that the system still works, I assume that means that
writes to root file system (or what ever file system syslog was trying
to log to, typically /var/log) are still succeeding?

These messages are soft lockup messages, and doesn't necessarily means
that the system was locked up forever:

> Apr 19 19:30:43 landau kernel: INFO: task ostree:107749 blocked for
> more than 120 seconds.

It could just mean that a "sync" command took longer than two minutes
to complete.

> Apr 19 19:30:43 landau kernel: Tainted: G W
> 4.10.0-10531-g86292b33d4b7 #77
> Apr 19 19:30:43 landau kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 19 19:30:43 landau kernel: ostree D 0 107749 92509
> 0x208000103000000
> Apr 19 19:30:43 landau kernel: Call Trace:
> Apr 19 19:30:43 landau kernel: [0000000000a8e790] schedule+0x30/0xc0
> Apr 19 19:30:43 landau kernel: [0000000000681798]
> wb_wait_for_completion+0x58/0xa0
> Apr 19 19:30:43 landau kernel: [0000000000682048]
> __writeback_inodes_sb_nr+0x88/0xc0
> Apr 19 19:30:43 landau kernel: [00000000006820e0] writeback_inodes_sb+0x20/0x40
> Apr 19 19:30:43 landau kernel: [000000000068a5f4] sync_filesystem+0x34/0xc0
> Apr 19 19:30:43 landau kernel: [000000000068a7f8] SyS_syncfs+0x38/0x80
> Apr 19 19:30:43 landau kernel: [0000000000406234] linux_sparc_syscall+0x34/0x44


Without any more details about the system, its workload, whether or
not the system really was hung forever, or it was functioning normally
when you discovered it, it's hard to really say much else given the
lack of information in your e-mail.

Also note that this stack trace has nothing to do with the e-mail
thread from last August which you replied to. At first I thought
there was a connection, which confused me as I tried to look at the
very small amount of information in your bug report.

Cheers,

- Ted