2019-01-23 03:15:37

by He Zhe

[permalink] [raw]
Subject: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1


LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).

qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'

tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
[   47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
[   47.085330] ------------[ cut here ]------------
[   47.085810] kernel BUG at block/blk-mq.c:767!
[   47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
[   47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[   47.088992] Workqueue: kblockd blk_mq_run_work_fn
[   47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
[   47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
[   47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
[   47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
[   47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
[   47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
[   47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
[   47.096071] FS:  0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
[   47.096898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
[   47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.099650] Call Trace:
[   47.099910]  <IRQ>
[   47.100128]  blk_mq_requeue_request+0x58/0x60
[   47.100576]  ide_requeue_and_plug+0x20/0x50
[   47.101014]  ide_intr+0x21a/0x230
[   47.101362]  ? idecd_open+0xc0/0xc0
[   47.101735]  __handle_irq_event_percpu+0x43/0x1e0
[   47.102214]  handle_irq_event_percpu+0x32/0x80
[   47.102668]  handle_irq_event+0x39/0x60
[   47.103074]  handle_edge_irq+0xe8/0x1c0
[   47.103470]  handle_irq+0x20/0x30
[   47.103819]  do_IRQ+0x46/0xe0
[   47.104128]  common_interrupt+0xf/0xf
[   47.104505]  </IRQ>
[   47.104731] RIP: 0010:ide_output_data+0xbc/0x100
[   47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
[   47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
[   47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
[   47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
[   47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
[   47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
[   47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
[   47.111446]  ide_transfer_pc+0x216/0x310
[   47.111848]  ? __const_udelay+0x3d/0x40
[   47.112236]  ? ide_execute_command+0x85/0xb0
[   47.112668]  ? ide_pc_intr+0x3f0/0x3f0
[   47.113051]  ? ide_check_atapi_device+0x110/0x110
[   47.113524]  ide_issue_pc+0x178/0x240
[   47.113901]  ide_cd_do_request+0x15c/0x350
[   47.114314]  ide_queue_rq+0x180/0x6b0
[   47.114686]  ? blk_mq_get_driver_tag+0xa1/0x110
[   47.115153]  blk_mq_dispatch_rq_list+0x90/0x550
[   47.115606]  ? __queue_delayed_work+0x63/0x90
[   47.116054]  ? deadline_fifo_request+0x41/0x90
[   47.116506]  blk_mq_do_dispatch_sched+0x80/0x100
[   47.116976]  blk_mq_sched_dispatch_requests+0xfc/0x170
[   47.117491]  __blk_mq_run_hw_queue+0x6f/0xd0
[   47.117941]  blk_mq_run_work_fn+0x1b/0x20
[   47.118342]  process_one_work+0x14c/0x450
[   47.118747]  worker_thread+0x4a/0x440
[   47.119125]  kthread+0x105/0x140
[   47.119456]  ? process_one_work+0x450/0x450
[   47.119880]  ? kthread_park+0x90/0x90
[   47.120251]  ret_from_fork+0x35/0x40
[   47.120619] Modules linked in:
[   47.120952] ---[ end trace 4562f716e88fdefe ]---
[   47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
[   47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
[   47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
[   47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
[   47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
[   47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
[   47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
[   47.127988] FS:  0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
[   47.128793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
[   47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.131547] Kernel panic - not syncing: Fatal exception in interrupt
[   47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
[   47.134432] ------------[ cut here ]------------
[   47.134898] sched: Unexpected reschedule of offline CPU#9!
[   47.135459] WARNING: CPU: 5 PID: 146 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x3f/0x50
[   47.136622] Modules linked in:
[   47.136932] CPU: 5 PID: 146 Comm: kworker/5:1H Tainted: G      D           5.0.0-rc3 #1
[   47.137901] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[   47.139031] Workqueue: kblockd blk_mq_run_work_fn
[   47.139508] RIP: 0010:native_smp_send_reschedule+0x3f/0x50
[   47.140065] Code: c0 84 c0 74 17 48 8b 05 3f ab 03 01 be fd 00 00 00 48 8b 40 30 e8 f1 eb bb 00 5d c3 89 fe 48 c7 c7 b0 d1 bc 89 e8 01 4b0
[   47.141926] RSP: 0018:ffff9e1ea4b43910 EFLAGS: 00010086
[   47.142459] RAX: 0000000000000000 RBX: ffff9e1e9d518c00 RCX: 0000000000000006
[   47.143169] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff9e1ea4b55410
[   47.143881] RBP: ffff9e1ea4b43910 R08: 0000000000020ef2 R09: 0000000000000005
[   47.144595] R10: 0000000000000000 R11: ffff9e1ea4b43775 R12: ffff9e1e9d519294
[   47.145312] R13: 0000000000000009 R14: 0000000000000000 R15: 0000000000000001
[   47.146021] FS:  0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
[   47.146830] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.147410] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
[   47.148119] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.148828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.149543] Call Trace:
[   47.149794]  <IRQ>
[   47.150004]  try_to_wake_up+0x257/0x470
[   47.150400]  default_wake_function+0x12/0x20
[   47.150831]  autoremove_wake_function+0x12/0x40
[   47.151296]  __wake_up_common+0x8c/0x130
[   47.151696]  __wake_up_common_lock+0x80/0xc0
[   47.152129]  __wake_up+0x13/0x20
[   47.152469]  wake_up_klogd_work_func+0x40/0x70
[   47.152924]  irq_work_run_list+0x4e/0x80
[   47.153324]  irq_work_tick+0x40/0x50
[   47.153687]  update_process_times+0x42/0x60
[   47.154108]  tick_sched_handle.isra.6+0x34/0x40
[   47.154564]  tick_nohz_handler+0x5b/0x90
[   47.154956]  smp_apic_timer_interrupt+0x70/0x180
[   47.155422]  apic_timer_interrupt+0xf/0x20
[   47.155836] RIP: 0010:panic+0x238/0x284
[   47.156235] Code: 48 83 c3 64 eb b0 83 3d 11 e0 47 01 00 74 05 e8 62 3b 02 00 48 c7 c6 c0 56 0e 8a 48 c7 c7 48 35 bd 89 e8 12 e0 04 00 fb8
[   47.158092] RSP: 0018:ffff9e1ea4b43c08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[   47.158856] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[   47.159568] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff9e1ea4b55410
[   47.160283] RBP: ffff9e1ea4b43c80 R08: 0000000000020a2f R09: 0000000000000005
[   47.160997] R10: 0000000000000000 R11: ffff9e1ea4b43a8d R12: 0000000000000000
[   47.161713] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000002
[   47.162431]  ? apic_timer_interrupt+0xa/0x20
[   47.162861]  ? panic+0x235/0x284
[   47.163192]  oops_end+0x8e/0xa0
[   47.163520]  die+0x42/0x50
[   47.163795]  do_trap+0xdd/0x110
[   47.164119]  do_error_trap+0x7b/0xa0
[   47.164493]  ? blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.164983]  do_invalid_op+0x3b/0x50
[   47.165345]  ? blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.165835]  invalid_op+0x14/0x20
[   47.166173] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.166727] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8
[   47.168581] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
[   47.169109] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
[   47.169823] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
[   47.170542] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
[   47.171249] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
[   47.171960] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
[   47.172677]  blk_mq_requeue_request+0x58/0x60
[   47.173119]  ide_requeue_and_plug+0x20/0x50
[   47.173548]  ide_intr+0x21a/0x230
[   47.173887]  ? idecd_open+0xc0/0xc0
[   47.174241]  __handle_irq_event_percpu+0x43/0x1e0
[   47.174720]  handle_irq_event_percpu+0x32/0x80
[   47.175163]  handle_irq_event+0x39/0x60
[   47.175554]  handle_edge_irq+0xe8/0x1c0
[   47.175943]  handle_irq+0x20/0x30
[   47.176280]  do_IRQ+0x46/0xe0
[   47.176586]  common_interrupt+0xf/0xf
[   47.176956]  </IRQ>
[   47.177173] RIP: 0010:ide_output_data+0xbc/0x100
[   47.177649] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
[   47.179512] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
[   47.180269] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
[   47.180982] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
[   47.181694] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
[   47.182403] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
[   47.183119] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
[   47.183838]  ide_transfer_pc+0x216/0x310
[   47.184231]  ? __const_udelay+0x3d/0x40
[   47.184625]  ? ide_execute_command+0x85/0xb0
[   47.185058]  ? ide_pc_intr+0x3f0/0x3f0
[   47.185440]  ? ide_check_atapi_device+0x110/0x110
[   47.185915]  ide_issue_pc+0x178/0x240
[   47.186284]  ide_cd_do_request+0x15c/0x350
[   47.186700]  ide_queue_rq+0x180/0x6b0
[   47.187070]  ? blk_mq_get_driver_tag+0xa1/0x110
[   47.187539]  blk_mq_dispatch_rq_list+0x90/0x550
[   47.187996]  ? __queue_delayed_work+0x63/0x90
[   47.188441]  ? deadline_fifo_request+0x41/0x90
[   47.188895]  blk_mq_do_dispatch_sched+0x80/0x100
[   47.189362]  blk_mq_sched_dispatch_requests+0xfc/0x170
[   47.189891]  __blk_mq_run_hw_queue+0x6f/0xd0
[   47.190320]  blk_mq_run_work_fn+0x1b/0x20
[   47.190727]  process_one_work+0x14c/0x450
[   47.191136]  worker_thread+0x4a/0x440
[   47.191509]  kthread+0x105/0x140
[   47.191845]  ? process_one_work+0x450/0x450
[   47.192273]  ? kthread_park+0x90/0x90
[   47.192649]  ret_from_fork+0x35/0x40
[   47.193009] ---[ end trace 4562f716e88fdeff ]---
[   47.193474] ------------[ cut here ]------------
[   47.193947] sched: Unexpected reschedule of offline CPU#6!
[   47.194494] WARNING: CPU: 5 PID: 146 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x3f/0x50
[   47.195653] Modules linked in:
[   47.195966] CPU: 5 PID: 146 Comm: kworker/5:1H Tainted: G      D W         5.0.0-rc3 #1
[   47.196939] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[   47.198056] Workqueue: kblockd blk_mq_run_work_fn
[   47.198526] RIP: 0010:native_smp_send_reschedule+0x3f/0x50
[   47.199075] Code: c0 84 c0 74 17 48 8b 05 3f ab 03 01 be fd 00 00 00 48 8b 40 30 e8 f1 eb bb 00 5d c3 89 fe 48 c7 c7 b0 d1 bc 89 e8 01 4b0
[   47.200930] RSP: 0018:ffff9e1ea4b43868 EFLAGS: 00010082
[   47.201456] RAX: 0000000000000000 RBX: ffff9e1ea08da400 RCX: 0000000000000006
[   47.202173] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9e1ea4b55410
[   47.202887] RBP: ffff9e1ea4b43868 R08: 000000000002f59b R09: 0000000000000005
[   47.203596] R10: 0000000000000000 R11: ffff9e1ea4b436cd R12: ffff9e1ea08daa94
[   47.204305] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000001
[   47.205036] FS:  0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
[   47.205836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.206409] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
[   47.207123] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.207832] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.208545] Call Trace:
[   47.208797]  <IRQ>
[   47.209005]  try_to_wake_up+0x257/0x470
[   47.209392]  default_wake_function+0x12/0x20
[   47.209826]  __wake_up_common+0x8c/0x130
[   47.210220]  __wake_up_locked+0x16/0x20
[   47.210608]  ep_poll_callback+0x1c1/0x310
[   47.211012]  __wake_up_common+0x8c/0x130
[   47.211408]  __wake_up_common_lock+0x80/0xc0
[   47.211840]  __wake_up+0x13/0x20
[   47.212168]  wake_up_klogd_work_func+0x40/0x70
[   47.212622]  irq_work_run_list+0x4e/0x80
[   47.213022]  irq_work_tick+0x40/0x50
[   47.213382]  update_process_times+0x42/0x60
[   47.213806]  tick_sched_handle.isra.6+0x34/0x40
[   47.214261]  tick_nohz_handler+0x5b/0x90
[   47.214659]  smp_apic_timer_interrupt+0x70/0x180
[   47.215131]  apic_timer_interrupt+0xf/0x20
[   47.215544] RIP: 0010:panic+0x238/0x284
[   47.215931] Code: 48 83 c3 64 eb b0 83 3d 11 e0 47 01 00 74 05 e8 62 3b 02 00 48 c7 c6 c0 56 0e 8a 48 c7 c7 48 35 bd 89 e8 12 e0
[   47.217782] RSP: 0018:ffff9e1ea4b43c08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[   47.218531] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[   47.219256] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff9e1ea4b55410
[   47.219971] RBP: ffff9e1ea4b43c80 R08: 0000000000020a2f R09: 0000000000000005
[   47.220677] R10: 0000000000000000 R11: ffff9e1ea4b43a8d R12: 0000000000000000
[   47.221389] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000002
[   47.222104]  ? apic_timer_interrupt+0xa/0x20
[   47.222536]  ? panic+0x235/0x284
[   47.222866]  oops_end+0x8e/0xa0
[   47.223185]  die+0x42/0x50
[   47.223462]  do_trap+0xdd/0x110
[   47.223786]  do_error_trap+0x7b/0xa0
[   47.224147]  ? blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.224641]  do_invalid_op+0x3b/0x50
[   47.225004]  ? blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.225494]  invalid_op+0x14/0x20
[   47.225837] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
[   47.226388] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
[   47.228260] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
[   47.228790] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
[   47.229520] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
[   47.230246] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
[   47.230977] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
[   47.231699] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
[   47.232423]  blk_mq_requeue_request+0x58/0x60
[   47.232877]  ide_requeue_and_plug+0x20/0x50
[   47.233306]  ide_intr+0x21a/0x230
[   47.233648]  ? idecd_open+0xc0/0xc0
[   47.234022]  __handle_irq_event_percpu+0x43/0x1e0
[   47.234500]  handle_irq_event_percpu+0x32/0x80
[   47.234955]  handle_irq_event+0x39/0x60
[   47.235351]  handle_edge_irq+0xe8/0x1c0
[   47.235744]  handle_irq+0x20/0x30
[   47.236093]  do_IRQ+0x46/0xe0
[   47.236399]  common_interrupt+0xf/0xf
[   47.236779]  </IRQ>
[   47.237006] RIP: 0010:ide_output_data+0xbc/0x100
[   47.237477] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
[   47.239359] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
[   47.240130] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
[   47.240855] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
[   47.241581] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
[   47.242312] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
[   47.243038] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
[   47.243758]  ide_transfer_pc+0x216/0x310
[   47.244166]  ? __const_udelay+0x3d/0x40
[   47.244570]  ? ide_execute_command+0x85/0xb0
[   47.245021]  ? ide_pc_intr+0x3f0/0x3f0
[   47.245409]  ? ide_check_atapi_device+0x110/0x110
[   47.245886]  ide_issue_pc+0x178/0x240
[   47.246267]  ide_cd_do_request+0x15c/0x350
[   47.246689]  ide_queue_rq+0x180/0x6b0
[   47.247072]  ? blk_mq_get_driver_tag+0xa1/0x110
[   47.247536]  blk_mq_dispatch_rq_list+0x90/0x550
[   47.247998]  ? __queue_delayed_work+0x63/0x90
[   47.248457]  ? deadline_fifo_request+0x41/0x90
[   47.248914]  blk_mq_do_dispatch_sched+0x80/0x100
[   47.249395]  blk_mq_sched_dispatch_requests+0xfc/0x170
[   47.249922]  __blk_mq_run_hw_queue+0x6f/0xd0
[   47.250365]  blk_mq_run_work_fn+0x1b/0x20
[   47.250782]  process_one_work+0x14c/0x450
[   47.251188]  worker_thread+0x4a/0x440
[   47.251566]  kthread+0x105/0x140
[   47.251900]  ? process_one_work+0x450/0x450
[   47.252335]  ? kthread_park+0x90/0x90
[   47.252715]  ret_from_fork+0x35/0x40
[   47.253090] ---[ end trace 4562f716e88fdf00 ]---


Zhe



2019-01-23 03:41:15

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>
>
> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>
> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>
> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
> [ 47.085330] ------------[ cut here ]------------
> [ 47.085810] kernel BUG at block/blk-mq.c:767!
> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 47.099650] Call Trace:
> [ 47.099910] <IRQ>
> [ 47.100128] blk_mq_requeue_request+0x58/0x60
> [ 47.100576] ide_requeue_and_plug+0x20/0x50
> [ 47.101014] ide_intr+0x21a/0x230
> [ 47.101362] ? idecd_open+0xc0/0xc0
> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
> [ 47.102214] handle_irq_event_percpu+0x32/0x80
> [ 47.102668] handle_irq_event+0x39/0x60
> [ 47.103074] handle_edge_irq+0xe8/0x1c0
> [ 47.103470] handle_irq+0x20/0x30
> [ 47.103819] do_IRQ+0x46/0xe0
> [ 47.104128] common_interrupt+0xf/0xf
> [ 47.104505] </IRQ>
> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
> [ 47.111446] ide_transfer_pc+0x216/0x310
> [ 47.111848] ? __const_udelay+0x3d/0x40
> [ 47.112236] ? ide_execute_command+0x85/0xb0
> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
> [ 47.113524] ide_issue_pc+0x178/0x240
> [ 47.113901] ide_cd_do_request+0x15c/0x350
> [ 47.114314] ide_queue_rq+0x180/0x6b0
> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
> [ 47.115606] ? __queue_delayed_work+0x63/0x90
> [ 47.116054] ? deadline_fifo_request+0x41/0x90
> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
> [ 47.118342] process_one_work+0x14c/0x450
> [ 47.118747] worker_thread+0x4a/0x440
> [ 47.119125] kthread+0x105/0x140
> [ 47.119456] ? process_one_work+0x450/0x450
> [ 47.119880] ? kthread_park+0x90/0x90
> [ 47.120251] ret_from_fork+0x35/0x40
> [ 47.120619] Modules linked in:
> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> [ 47.134432] ------------[ cut here ]-----------

I’ll take a look at this, thanks for the report.


Jens Axboe


2019-01-23 16:08:44

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On 1/22/19 8:39 PM, Jens Axboe wrote:
> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>
>>
>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>
>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>
>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>> [ 47.085330] ------------[ cut here ]------------
>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 47.099650] Call Trace:
>> [ 47.099910] <IRQ>
>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>> [ 47.101014] ide_intr+0x21a/0x230
>> [ 47.101362] ? idecd_open+0xc0/0xc0
>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>> [ 47.102668] handle_irq_event+0x39/0x60
>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>> [ 47.103470] handle_irq+0x20/0x30
>> [ 47.103819] do_IRQ+0x46/0xe0
>> [ 47.104128] common_interrupt+0xf/0xf
>> [ 47.104505] </IRQ>
>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>> [ 47.111446] ide_transfer_pc+0x216/0x310
>> [ 47.111848] ? __const_udelay+0x3d/0x40
>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>> [ 47.113524] ide_issue_pc+0x178/0x240
>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>> [ 47.118342] process_one_work+0x14c/0x450
>> [ 47.118747] worker_thread+0x4a/0x440
>> [ 47.119125] kthread+0x105/0x140
>> [ 47.119456] ? process_one_work+0x450/0x450
>> [ 47.119880] ? kthread_park+0x90/0x90
>> [ 47.120251] ret_from_fork+0x35/0x40
>> [ 47.120619] Modules linked in:
>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>> [ 47.134432] ------------[ cut here ]-----------
>
> I’ll take a look at this, thanks for the report.

I can't reproduce this, unfortunately. But I'm guessing it might be related
to a race with the requeue and request handling in IDE. Can you try with
the below patch?


diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 8445b484ae69..a10347a6505a 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -439,7 +439,7 @@ static inline void ide_unlock_host(struct ide_host *host)
}
}

-void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
+static void __ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
{
struct request_queue *q = drive->queue;

@@ -451,6 +451,16 @@ void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
blk_mq_delay_run_hw_queue(q->queue_hw_ctx[0], 3);
}

+void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
+{
+ ide_hwif_t *hwif = drive->hwif;
+ unsigned long flags;
+
+ spin_lock_irqsave(&hwif->lock, flags);
+ __ide_requeue_and_plug(drive, rq);
+ spin_unlock_irqrestore(&hwif->lock, flags);
+}
+
/*
* Issue a new request to a device.
*/
@@ -560,9 +570,9 @@ blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
}
} else {
plug_device:
+ __ide_requeue_and_plug(drive, rq);
spin_unlock_irq(&hwif->lock);
ide_unlock_host(host);
- ide_requeue_and_plug(drive, rq);
return BLK_STS_OK;
}

@@ -687,12 +697,14 @@ void ide_timer_expiry (struct timer_list *t)
plug_device = 1;
}
}
+
+ if (plug_device)
+ __ide_requeue_and_plug(drive, rq_in_flight);
+
spin_unlock_irqrestore(&hwif->lock, flags);

- if (plug_device) {
+ if (plug_device)
ide_unlock_host(hwif->host);
- ide_requeue_and_plug(drive, rq_in_flight);
- }
}

/**

--
Jens Axboe


2019-01-24 08:26:41

by He Zhe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1



On 1/24/19 12:05 AM, Jens Axboe wrote:
> On 1/22/19 8:39 PM, Jens Axboe wrote:
>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>
>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>
>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>> [ 47.085330] ------------[ cut here ]------------
>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ 47.099650] Call Trace:
>>> [ 47.099910] <IRQ>
>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>> [ 47.101014] ide_intr+0x21a/0x230
>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>> [ 47.102668] handle_irq_event+0x39/0x60
>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>> [ 47.103470] handle_irq+0x20/0x30
>>> [ 47.103819] do_IRQ+0x46/0xe0
>>> [ 47.104128] common_interrupt+0xf/0xf
>>> [ 47.104505] </IRQ>
>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>> [ 47.118342] process_one_work+0x14c/0x450
>>> [ 47.118747] worker_thread+0x4a/0x440
>>> [ 47.119125] kthread+0x105/0x140
>>> [ 47.119456] ? process_one_work+0x450/0x450
>>> [ 47.119880] ? kthread_park+0x90/0x90
>>> [ 47.120251] ret_from_fork+0x35/0x40
>>> [ 47.120619] Modules linked in:
>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>> [ 47.134432] ------------[ cut here ]-----------
>> I’ll take a look at this, thanks for the report.
> I can't reproduce this, unfortunately. But I'm guessing it might be related
> to a race with the requeue and request handling in IDE. Can you try with
> the below patch?

Thanks for attention, but this patch doesn't help.

From what I reached, to reproduce it, we need to use legacy CONFIG_IDE and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so on.
I can reproduce it with qemux86-64 with "-drive ... if=ide" or "-drive ... if=virtio".


Zhe








2019-01-24 15:42:06

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On 1/24/19 1:23 AM, He Zhe wrote:
>
>
> On 1/24/19 12:05 AM, Jens Axboe wrote:
>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>
>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>
>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>> [ 47.085330] ------------[ cut here ]------------
>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [ 47.099650] Call Trace:
>>>> [ 47.099910] <IRQ>
>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>> [ 47.103470] handle_irq+0x20/0x30
>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>> [ 47.104505] </IRQ>
>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>> [ 47.119125] kthread+0x105/0x140
>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>> [ 47.120619] Modules linked in:
>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>> [ 47.134432] ------------[ cut here ]-----------
>>> I’ll take a look at this, thanks for the report.
>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>> to a race with the requeue and request handling in IDE. Can you try with
>> the below patch?
>
> Thanks for attention, but this patch doesn't help.

OK, thanks for trying.

> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
> "-drive ... if=virtio".

Yes of course, I am trying to reproduce with the legacy stack. Just not
having any luck doing it, even thousands of read_all iterations on /proc
with IO to the IDE drives is still not doing it. From your trace, it
looks like ide-cd causing it, which would mean a cdrom of some sort.

I'll try again!

--
Jens Axboe


2019-01-25 01:43:31

by He Zhe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1



On 1/24/19 11:40 PM, Jens Axboe wrote:
> On 1/24/19 1:23 AM, He Zhe wrote:
>>
>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>
>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>
>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [ 47.099650] Call Trace:
>>>>> [ 47.099910] <IRQ>
>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>> [ 47.104505] </IRQ>
>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>> [ 47.119125] kthread+0x105/0x140
>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>> [ 47.120619] Modules linked in:
>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>> I’ll take a look at this, thanks for the report.
>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>> to a race with the requeue and request handling in IDE. Can you try with
>>> the below patch?
>> Thanks for attention, but this patch doesn't help.
> OK, thanks for trying.
>
>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>> "-drive ... if=virtio".
> Yes of course, I am trying to reproduce with the legacy stack. Just not
> having any luck doing it, even thousands of read_all iterations on /proc
> with IO to the IDE drives is still not doing it. From your trace, it
> looks like ide-cd causing it, which would mean a cdrom of some sort.
>
> I'll try again!

My .config is attached.

Zhe

>


Attachments:
config (127.34 kB)

2019-01-25 02:06:41

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On 1/24/19 6:40 PM, He Zhe wrote:
>
>
> On 1/24/19 11:40 PM, Jens Axboe wrote:
>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>
>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>
>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>
>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> [ 47.099650] Call Trace:
>>>>>> [ 47.099910] <IRQ>
>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>> [ 47.104505] </IRQ>
>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>> [ 47.120619] Modules linked in:
>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>> I’ll take a look at this, thanks for the report.
>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>> the below patch?
>>> Thanks for attention, but this patch doesn't help.
>> OK, thanks for trying.
>>
>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>> "-drive ... if=virtio".
>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>> having any luck doing it, even thousands of read_all iterations on /proc
>> with IO to the IDE drives is still not doing it. From your trace, it
>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>
>> I'll try again!
>
> My .config is attached.

Can you try with this patch?


diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 63627be0811a..3e13c7c243f4 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -778,7 +778,7 @@ static int ide_init_queue(ide_drive_t *drive)
set = &drive->tag_set;
set->ops = &ide_mq_ops;
set->nr_hw_queues = 1;
- set->queue_depth = 32;
+ set->queue_depth = 1;
set->reserved_tags = 1;
set->cmd_size = sizeof(struct ide_request);
set->numa_node = hwif_to_node(hwif);

--
Jens Axboe


2019-01-25 02:23:11

by He Zhe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1



On 1/25/19 10:05 AM, Jens Axboe wrote:
> On 1/24/19 6:40 PM, He Zhe wrote:
>>
>> On 1/24/19 11:40 PM, Jens Axboe wrote:
>>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>>
>>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>>
>>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> [ 47.099650] Call Trace:
>>>>>>> [ 47.099910] <IRQ>
>>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>>> [ 47.104505] </IRQ>
>>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>>> [ 47.120619] Modules linked in:
>>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>>> I’ll take a look at this, thanks for the report.
>>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>>> the below patch?
>>>> Thanks for attention, but this patch doesn't help.
>>> OK, thanks for trying.
>>>
>>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>>> "-drive ... if=virtio".
>>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>>> having any luck doing it, even thousands of read_all iterations on /proc
>>> with IO to the IDE drives is still not doing it. From your trace, it
>>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>>
>>> I'll try again!
>> My .config is attached.
> Can you try with this patch?

With this patch, it cannot boot up.

[    7.832554] Uniform Multi-Platform E-IDE driver
[    7.833067] piix 0000:00:01.1: IDE controller (0x8086:0x7010 rev 0x00)
[    7.833796] piix 0000:00:01.1: not 100% native mode: will probe irqs later
[    7.834493]     ide0: BM-DMA at 0xc060-0xc067
[    7.835361]     ide1: BM-DMA at 0xc068-0xc06f
[    8.104196] hda: QEMU HARDDISK, ATA DISK drive
[    8.728309] hda: MWDMA2 mode selected
[    9.413190] hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
[   10.037303] hdc: MWDMA2 mode selected
[   10.037863] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[   10.038333] ide: failed to init hda
[   10.038733] ide1 at 0x170-0x177,0x376 on irq 15
[   10.039195] ide: failed to init hdc
[   10.039646] piix 0000:00:01.1: IDE controller (0x8086:0x7010 rev 0x00)
[   10.040364] piix 0000:00:01.1: not 100% native mode: will probe irqs later
[   10.041054]     ide0: BM-DMA at 0xc060-0xc067
[   10.041508]     ide1: BM-DMA at 0xc068-0xc06f
[   10.310192] hda: QEMU HARDDISK, ATA DISK drive
[   10.934172] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.1/ide0'
[   10.935001] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc3 #5
[   10.935134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[   10.935134] Call Trace:
[   10.935134]  dump_stack+0x62/0x95
[   10.935134]  sysfs_warn_dup+0x79/0x90
[   10.935134]  sysfs_create_dir_ns+0x211/0x290
[   10.935134]  ? sysfs_create_mount_point+0x80/0x80
[   10.935134]  ? kernel_init_freeable+0x4c9/0x659
[   10.935134]  ? kasan_check_read+0x16/0x20
[   10.935134]  ? _raw_spin_lock+0x8c/0xf0
[   10.935134]  ? _raw_spin_lock_irq+0xf0/0xf0
[   10.935134]  ? msleep+0x55/0x80
[   10.935134]  kobject_add_internal+0x1f7/0x770
[   10.935134]  kobject_add+0x11f/0x1f0
[   10.935134]  ? kobject_add_internal+0x770/0x770
[   10.935134]  ? __kmalloc_track_caller+0x6d/0x250
[   10.935134]  ? kasan_kmalloc+0x11/0x20
[   10.935134]  device_add+0x2ca/0x1460
[   10.935134]  ? kobject_set_name_vargs+0xb4/0x130
[   10.935134]  ? get_device_parent.isra.13+0x5e0/0x5e0
[   10.935134]  ? pm_runtime_init+0x341/0x3f0
[   10.935134]  device_register+0x1a/0x20
[   10.935134]  ide_host_register+0x965/0x1a20
[   10.935134]  ide_pci_init_two+0xb63/0x18e0
[   10.935134]  ? kmem_cache_alloc+0x170/0x200
[   10.935134]  ? __kernfs_new_node+0xd8/0x610
[   10.935134]  ? ide_pci_setup_ports+0x920/0x920
[   10.935134]  ? radix_tree_iter_tag_clear+0x50/0x90
[   10.935134]  ? idr_alloc_u32+0x1a9/0x340
[   10.935134]  ? __save_stack_trace+0x82/0x100
[   10.935134]  ? __fprop_inc_percpu_max+0x1f0/0x1f0
[   10.935134]  ? devres_free+0x40/0x60
[   10.935134]  ? idr_alloc_cyclic+0xf2/0x1e0
[   10.935134]  ? __mutex_lock_slowpath+0x20/0x20
[   10.935134]  ? mutex_lock+0x88/0xf0
[   10.935134]  ? kasan_check_read+0x16/0x20
[   10.935134]  ? rpm_resume+0x1d5/0x1600
[   10.935134]  ? _raw_spin_lock+0x8c/0xf0
[   10.935134]  ? rpm_put_suppliers+0x140/0x140
[   10.935134]  ? kasan_check_read+0x16/0x20
[   10.935134]  ? _raw_spin_lock_irqsave+0x92/0x100
[   10.935134]  ? _raw_read_unlock_irq+0x40/0x40
[   10.935134]  ? mutex_lock+0x88/0xf0
[   10.935134]  ? __mutex_lock_slowpath+0x20/0x20
[   10.935134]  ide_pci_init_one+0x16/0x20
[   10.935134]  piix_init_one+0x4a/0x60
[   10.935134]  pci_device_probe+0x27c/0x490
[   10.935134]  really_probe+0x465/0x720
[   10.935134]  driver_probe_device+0xd1/0x210
[   10.935134]  __driver_attach+0x214/0x280
[   10.935134]  ? driver_probe_device+0x210/0x210
[   10.935134]  bus_for_each_dev+0x11d/0x1e0
[   10.935134]  ? subsys_dev_iter_exit+0x10/0x10
[   10.935134]  ? kasan_check_write+0x19/0x20
[   10.935134]  ? klist_node_init+0x66/0x120
[   10.935134]  driver_attach+0x3d/0x50
[   10.935134]  bus_add_driver+0x421/0x5d0
[   10.935134]  driver_register+0x18c/0x420
[   10.935134]  __pci_register_driver+0x1a8/0x270
[   10.935134]  ide_scan_pcibus+0x20b/0x26d
[   10.935134]  ? piix_ide_init+0x15d/0x15d
[   10.935134]  do_one_initcall+0xd1/0x3b0
[   10.935134]  ? initcall_blacklisted+0x160/0x160
[   10.935134]  ? __memset+0x27/0x30
[   10.935134]  ? kasan_unpoison_shadow+0x3b/0x50
[   10.935134]  kernel_init_freeable+0x4c9/0x659
[   10.935134]  ? rest_init+0x100/0x100
[   10.935134]  kernel_init+0x13/0x130
[   10.935134]  ? rest_init+0x100/0x100
[   10.935134]  ret_from_fork+0x35/0x40
[   10.964317] kobject_add_internal failed for ide0 with -EEXIST, don't try to register things with the same name in the same directory.
[   10.965504] IDE: ide_register_port: device_register error: -17
[   10.966082] ide0: disabling port
[   11.651191] hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
[   12.275172] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.1/ide1'
[   12.276019] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc3 #5
[   12.276134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[   12.276134] Call Trace:
[   12.276134]  dump_stack+0x62/0x95
[   12.276134]  sysfs_warn_dup+0x79/0x90
[   12.276134]  sysfs_create_dir_ns+0x211/0x290
[   12.276134]  ? sysfs_create_mount_point+0x80/0x80
[   12.276134]  ? kernel_init_freeable+0x4c9/0x659
[   12.276134]  ? kasan_check_read+0x16/0x20
[   12.276134]  ? _raw_spin_lock+0x8c/0xf0
[   12.276134]  ? _raw_spin_lock_irq+0xf0/0xf0
[   12.276134]  ? msleep+0x55/0x80
[   12.276134]  kobject_add_internal+0x1f7/0x770
[   12.276134]  kobject_add+0x11f/0x1f0
[   12.276134]  ? kobject_add_internal+0x770/0x770
[   12.276134]  ? __kmalloc_track_caller+0x6d/0x250
[   12.276134]  ? kasan_kmalloc+0x11/0x20
[   12.276134]  device_add+0x2ca/0x1460
[   12.276134]  ? kobject_set_name_vargs+0xb4/0x130
[   12.276134]  ? get_device_parent.isra.13+0x5e0/0x5e0
[   12.276134]  ? pm_runtime_init+0x341/0x3f0
[   12.276134]  device_register+0x1a/0x20
[   12.276134]  ide_host_register+0x965/0x1a20
[   12.276134]  ide_pci_init_two+0xb63/0x18e0
[   12.276134]  ? kmem_cache_alloc+0x170/0x200
[   12.276134]  ? __kernfs_new_node+0xd8/0x610
[   12.276134]  ? ide_pci_setup_ports+0x920/0x920
[   12.276134]  ? radix_tree_iter_tag_clear+0x50/0x90
[   12.276134]  ? idr_alloc_u32+0x1a9/0x340
[   12.276134]  ? __save_stack_trace+0x82/0x100
[   12.276134]  ? __fprop_inc_percpu_max+0x1f0/0x1f0
[   12.276134]  ? devres_free+0x40/0x60
[   12.276134]  ? idr_alloc_cyclic+0xf2/0x1e0
[   12.276134]  ? __mutex_lock_slowpath+0x20/0x20
[   12.276134]  ? mutex_lock+0x88/0xf0
[   12.276134]  ? kasan_check_read+0x16/0x20
[   12.276134]  ? rpm_resume+0x1d5/0x1600
[   12.276134]  ? _raw_spin_lock+0x8c/0xf0
[   12.276134]  ? rpm_put_suppliers+0x140/0x140
[   12.276134]  ? kasan_check_read+0x16/0x20
[   12.276134]  ? _raw_spin_lock_irqsave+0x92/0x100
[   12.276134]  ? _raw_read_unlock_irq+0x40/0x40
[   12.276134]  ? mutex_lock+0x88/0xf0
[   12.276134]  ? __mutex_lock_slowpath+0x20/0x20
[   12.276134]  ide_pci_init_one+0x16/0x20
[   12.276134]  piix_init_one+0x4a/0x60
[   12.276134]  pci_device_probe+0x27c/0x490
[   12.276134]  really_probe+0x465/0x720
[   12.276134]  driver_probe_device+0xd1/0x210
[   12.276134]  __driver_attach+0x214/0x280
[   12.276134]  ? driver_probe_device+0x210/0x210
[   12.276134]  bus_for_each_dev+0x11d/0x1e0
[   12.276134]  ? subsys_dev_iter_exit+0x10/0x10
[   12.276134]  ? kasan_check_write+0x19/0x20
[   12.276134]  ? klist_node_init+0x66/0x120
[   12.276134]  driver_attach+0x3d/0x50
[   12.276134]  bus_add_driver+0x421/0x5d0
[   12.276134]  driver_register+0x18c/0x420
[   12.276134]  __pci_register_driver+0x1a8/0x270
[   12.276134]  ide_scan_pcibus+0x20b/0x26d
[   12.276134]  ? piix_ide_init+0x15d/0x15d
[   12.276134]  do_one_initcall+0xd1/0x3b0
[   12.276134]  ? initcall_blacklisted+0x160/0x160
[   12.276134]  ? __memset+0x27/0x30
[   12.276134]  ? kasan_unpoison_shadow+0x3b/0x50
[   12.276134]  kernel_init_freeable+0x4c9/0x659
[   12.276134]  ? rest_init+0x100/0x100
[   12.276134]  kernel_init+0x13/0x130
[   12.276134]  ? rest_init+0x100/0x100
[   12.276134]  ret_from_fork+0x35/0x40
[   12.305202] kobject_add_internal failed for ide1 with -EEXIST, don't try to register things with the same name in the same directory.
[   12.306385] IDE: ide_register_port: device_register error: -17
[   12.306961] ide1: disabling port
[   12.307320] PIIX_IDE: probe of 0000:00:01.1 failed with error -1
[   12.307948] ide-gd driver 1.18
[   12.308280] ide-cd driver 5.00


Zhe


>
>
> diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
> index 63627be0811a..3e13c7c243f4 100644
> --- a/drivers/ide/ide-probe.c
> +++ b/drivers/ide/ide-probe.c
> @@ -778,7 +778,7 @@ static int ide_init_queue(ide_drive_t *drive)
> set = &drive->tag_set;
> set->ops = &ide_mq_ops;
> set->nr_hw_queues = 1;
> - set->queue_depth = 32;
> + set->queue_depth = 1;
> set->reserved_tags = 1;
> set->cmd_size = sizeof(struct ide_request);
> set->numa_node = hwif_to_node(hwif);
>


2019-01-25 02:28:25

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On 1/24/19 7:22 PM, He Zhe wrote:
>
>
> On 1/25/19 10:05 AM, Jens Axboe wrote:
>> On 1/24/19 6:40 PM, He Zhe wrote:
>>>
>>> On 1/24/19 11:40 PM, Jens Axboe wrote:
>>>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>>>
>>>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>>>
>>>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [ 47.099650] Call Trace:
>>>>>>>> [ 47.099910] <IRQ>
>>>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>>>> [ 47.104505] </IRQ>
>>>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>>>> [ 47.120619] Modules linked in:
>>>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>>>> I’ll take a look at this, thanks for the report.
>>>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>>>> the below patch?
>>>>> Thanks for attention, but this patch doesn't help.
>>>> OK, thanks for trying.
>>>>
>>>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>>>> "-drive ... if=virtio".
>>>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>>>> having any luck doing it, even thousands of read_all iterations on /proc
>>>> with IO to the IDE drives is still not doing it. From your trace, it
>>>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>>>
>>>> I'll try again!
>>> My .config is attached.
>> Can you try with this patch?
>
> With this patch, it cannot boot up.

Ugh, forgot it's inclusive. You want it at '2', not '1'.

--
Jens Axboe


2019-01-25 03:17:04

by He Zhe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1



On 1/25/19 10:26 AM, Jens Axboe wrote:
> On 1/24/19 7:22 PM, He Zhe wrote:
>>
>> On 1/25/19 10:05 AM, Jens Axboe wrote:
>>> On 1/24/19 6:40 PM, He Zhe wrote:
>>>> On 1/24/19 11:40 PM, Jens Axboe wrote:
>>>>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>>>>
>>>>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>>>>
>>>>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> [ 47.099650] Call Trace:
>>>>>>>>> [ 47.099910] <IRQ>
>>>>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>>>>> [ 47.104505] </IRQ>
>>>>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>>>>> [ 47.120619] Modules linked in:
>>>>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>>>>> I’ll take a look at this, thanks for the report.
>>>>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>>>>> the below patch?
>>>>>> Thanks for attention, but this patch doesn't help.
>>>>> OK, thanks for trying.
>>>>>
>>>>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>>>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>>>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>>>>> "-drive ... if=virtio".
>>>>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>>>>> having any luck doing it, even thousands of read_all iterations on /proc
>>>>> with IO to the IDE drives is still not doing it. From your trace, it
>>>>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>>>>
>>>>> I'll try again!
>>>> My .config is attached.
>>> Can you try with this patch?
>> With this patch, it cannot boot up.
> Ugh, forgot it's inclusive. You want it at '2', not '1'.

With "set->queue_depth = 2;", reading aganst /sys or normal disk dir is OK, but the following hang. So the related test cases are all timed out.

ps -elf
---- snip ----
5 D root       415     1  0  80   0 -   664 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
5 D root       416     1  0  80   0 -   669 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
5 D root       417     1  0  80   0 -   674 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
5 D root       421     1  0  80   0 -   629 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
5 D root       434     1  0  80   0 -   629 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
5 D root       448     1  0  80   0 -   629 io_sch 02:55 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
5 D root       474     1  0  80   0 -   629 flush_ 02:55 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /dev -r 1

In addition, without this change, it can be reproduced more often with more processes.

Zhe

>


2019-01-25 03:21:01

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On 1/24/19 8:14 PM, He Zhe wrote:
>
>
> On 1/25/19 10:26 AM, Jens Axboe wrote:
>> On 1/24/19 7:22 PM, He Zhe wrote:
>>>
>>> On 1/25/19 10:05 AM, Jens Axboe wrote:
>>>> On 1/24/19 6:40 PM, He Zhe wrote:
>>>>> On 1/24/19 11:40 PM, Jens Axboe wrote:
>>>>>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>>>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>>>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>>>>>
>>>>>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>>>>>
>>>>>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>> [ 47.099650] Call Trace:
>>>>>>>>>> [ 47.099910] <IRQ>
>>>>>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>>>>>> [ 47.104505] </IRQ>
>>>>>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>>>>>> [ 47.120619] Modules linked in:
>>>>>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>>>>>> I’ll take a look at this, thanks for the report.
>>>>>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>>>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>>>>>> the below patch?
>>>>>>> Thanks for attention, but this patch doesn't help.
>>>>>> OK, thanks for trying.
>>>>>>
>>>>>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>>>>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>>>>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>>>>>> "-drive ... if=virtio".
>>>>>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>>>>>> having any luck doing it, even thousands of read_all iterations on /proc
>>>>>> with IO to the IDE drives is still not doing it. From your trace, it
>>>>>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>>>>>
>>>>>> I'll try again!
>>>>> My .config is attached.
>>>> Can you try with this patch?
>>> With this patch, it cannot boot up.
>> Ugh, forgot it's inclusive. You want it at '2', not '1'.
>
> With "set->queue_depth = 2;", reading aganst /sys or normal disk dir is OK, but the following hang. So the related test cases are all timed out.
>
> ps -elf
> ---- snip ----
> 5 D root       415     1  0  80   0 -   664 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
> 5 D root       416     1  0  80   0 -   669 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
> 5 D root       417     1  0  80   0 -   674 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
> 5 D root       421     1  0  80   0 -   629 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
> 5 D root       434     1  0  80   0 -   629 io_sch 02:49 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
> 5 D root       448     1  0  80   0 -   629 io_sch 02:55 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
> 5 D root       474     1  0  80   0 -   629 flush_ 02:55 ttyS0    00:00:00 /opt/ltp/testcases/bin/read_all -d /dev -r 1
>
> In addition, without this change, it can be reproduced more often with more processes.

I've managed to reproduce it now, though I don't quite yet see what is going
on. It's almost like we're NOT honoring the serialization, which results
in a sense request being doubly queued. I'll see if I can figure it out
and produce a patch that works.

--
Jens Axboe


2019-01-25 19:34:23

by Jens Axboe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1

On 1/24/19 8:20 PM, Jens Axboe wrote:
> On 1/24/19 8:14 PM, He Zhe wrote:
>>
>>
>> On 1/25/19 10:26 AM, Jens Axboe wrote:
>>> On 1/24/19 7:22 PM, He Zhe wrote:
>>>>
>>>> On 1/25/19 10:05 AM, Jens Axboe wrote:
>>>>> On 1/24/19 6:40 PM, He Zhe wrote:
>>>>>> On 1/24/19 11:40 PM, Jens Axboe wrote:
>>>>>>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>>>>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>>>>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>>>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>>>>>>
>>>>>>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>>>>>>
>>>>>>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>> [ 47.099650] Call Trace:
>>>>>>>>>>> [ 47.099910] <IRQ>
>>>>>>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>>>>>>> [ 47.104505] </IRQ>
>>>>>>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>>>>>>> [ 47.120619] Modules linked in:
>>>>>>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>>>>>>> I’ll take a look at this, thanks for the report.
>>>>>>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>>>>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>>>>>>> the below patch?
>>>>>>>> Thanks for attention, but this patch doesn't help.
>>>>>>> OK, thanks for trying.
>>>>>>>
>>>>>>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>>>>>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>>>>>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>>>>>>> "-drive ... if=virtio".
>>>>>>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>>>>>>> having any luck doing it, even thousands of read_all iterations on /proc
>>>>>>> with IO to the IDE drives is still not doing it. From your trace, it
>>>>>>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>>>>>>
>>>>>>> I'll try again!
>>>>>> My .config is attached.
>>>>> Can you try with this patch?
>>>> With this patch, it cannot boot up.
>>> Ugh, forgot it's inclusive. You want it at '2', not '1'.
>>
>> With "set->queue_depth = 2;", reading aganst /sys or normal disk dir is OK, but the following hang. So the related test cases are all timed out.
>>
>> ps -elf
>> ---- snip ----
>> 5 D root 415 1 0 80 0 - 664 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
>> 5 D root 416 1 0 80 0 - 669 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
>> 5 D root 417 1 0 80 0 - 674 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
>> 5 D root 421 1 0 80 0 - 629 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
>> 5 D root 434 1 0 80 0 - 629 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
>> 5 D root 448 1 0 80 0 - 629 io_sch 02:55 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
>> 5 D root 474 1 0 80 0 - 629 flush_ 02:55 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /dev -r 1
>>
>> In addition, without this change, it can be reproduced more often with more processes.
>
> I've managed to reproduce it now, though I don't quite yet see what is going
> on. It's almost like we're NOT honoring the serialization, which results
> in a sense request being doubly queued. I'll see if I can figure it out
> and produce a patch that works.

Alright, here's something that has a chance of working. I don't think we
can solve the ordering issue with sense requests without doing something
like this. Can you give it a spin?


diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index da58020a144e..33a28cde126c 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -235,21 +235,28 @@ EXPORT_SYMBOL_GPL(ide_prep_sense);

int ide_queue_sense_rq(ide_drive_t *drive, void *special)
{
- struct request *sense_rq = drive->sense_rq;
+ ide_hwif_t *hwif = drive->hwif;
+ struct request *sense_rq;
+ unsigned long flags;
+
+ spin_lock_irqsave(&hwif->lock, flags);

/* deferred failure from ide_prep_sense() */
if (!drive->sense_rq_armed) {
printk(KERN_WARNING PFX "%s: error queuing a sense request\n",
drive->name);
+ spin_unlock_irqrestore(&hwif->lock, flags);
return -ENOMEM;
}

+ sense_rq = drive->sense_rq;
ide_req(sense_rq)->special = special;
drive->sense_rq_armed = false;

drive->hwif->rq = NULL;

ide_insert_request_head(drive, sense_rq);
+ spin_unlock_irqrestore(&hwif->lock, flags);
return 0;
}
EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 8445b484ae69..b137f27a34d5 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -68,8 +68,10 @@ int ide_end_rq(ide_drive_t *drive, struct request *rq, blk_status_t error,
}

if (!blk_update_request(rq, error, nr_bytes)) {
- if (rq == drive->sense_rq)
+ if (rq == drive->sense_rq) {
drive->sense_rq = NULL;
+ drive->sense_rq_active = false;
+ }

__blk_mq_end_request(rq, error);
return 0;
@@ -451,16 +453,11 @@ void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
blk_mq_delay_run_hw_queue(q->queue_hw_ctx[0], 3);
}

-/*
- * Issue a new request to a device.
- */
-blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
- const struct blk_mq_queue_data *bd)
+blk_status_t ide_issue_rq(ide_drive_t *drive, struct request *rq,
+ bool local_requeue)
{
- ide_drive_t *drive = hctx->queue->queuedata;
- ide_hwif_t *hwif = drive->hwif;
+ ide_hwif_t *hwif = drive->hwif;
struct ide_host *host = hwif->host;
- struct request *rq = bd->rq;
ide_startstop_t startstop;

if (!blk_rq_is_passthrough(rq) && !(rq->rq_flags & RQF_DONTPREP)) {
@@ -474,8 +471,6 @@ blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
if (ide_lock_host(host, hwif))
return BLK_STS_DEV_RESOURCE;

- blk_mq_start_request(rq);
-
spin_lock_irq(&hwif->lock);

if (!ide_lock_port(hwif)) {
@@ -510,18 +505,6 @@ blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
hwif->cur_dev = drive;
drive->dev_flags &= ~(IDE_DFLAG_SLEEPING | IDE_DFLAG_PARKED);

- /*
- * we know that the queue isn't empty, but this can happen
- * if ->prep_rq() decides to kill a request
- */
- if (!rq) {
- rq = bd->rq;
- if (!rq) {
- ide_unlock_port(hwif);
- goto out;
- }
- }
-
/*
* Sanity: don't accept a request that isn't a PM request
* if we are currently power managed. This is very important as
@@ -560,9 +543,12 @@ blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
}
} else {
plug_device:
+ if (local_requeue)
+ list_add(&rq->queuelist, &drive->rq_list);
spin_unlock_irq(&hwif->lock);
ide_unlock_host(host);
- ide_requeue_and_plug(drive, rq);
+ if (!local_requeue)
+ ide_requeue_and_plug(drive, rq);
return BLK_STS_OK;
}

@@ -573,6 +559,26 @@ blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_OK;
}

+/*
+ * Issue a new request to a device.
+ */
+blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
+ const struct blk_mq_queue_data *bd)
+{
+ ide_drive_t *drive = hctx->queue->queuedata;
+ ide_hwif_t *hwif = drive->hwif;
+
+ spin_lock_irq(&hwif->lock);
+ if (drive->sense_rq_active) {
+ spin_unlock_irq(&hwif->lock);
+ return BLK_STS_DEV_RESOURCE;
+ }
+ spin_unlock_irq(&hwif->lock);
+
+ blk_mq_start_request(bd->rq);
+ return ide_issue_rq(drive, bd->rq, false);
+}
+
static int drive_is_ready(ide_drive_t *drive)
{
ide_hwif_t *hwif = drive->hwif;
@@ -893,13 +899,8 @@ EXPORT_SYMBOL_GPL(ide_pad_transfer);

void ide_insert_request_head(ide_drive_t *drive, struct request *rq)
{
- ide_hwif_t *hwif = drive->hwif;
- unsigned long flags;
-
- spin_lock_irqsave(&hwif->lock, flags);
+ drive->sense_rq_active = true;
list_add_tail(&rq->queuelist, &drive->rq_list);
- spin_unlock_irqrestore(&hwif->lock, flags);
-
kblockd_schedule_work(&drive->rq_work);
}
EXPORT_SYMBOL_GPL(ide_insert_request_head);
diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
index 102aa3bc3e7f..8af7af6001eb 100644
--- a/drivers/ide/ide-park.c
+++ b/drivers/ide/ide-park.c
@@ -54,7 +54,9 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
scsi_req(rq)->cmd[0] = REQ_UNPARK_HEADS;
scsi_req(rq)->cmd_len = 1;
ide_req(rq)->type = ATA_PRIV_MISC;
+ spin_lock_irq(&hwif->lock);
ide_insert_request_head(drive, rq);
+ spin_unlock_irq(&hwif->lock);

out:
return;
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 63627be0811a..5aeaca24a28f 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -1159,18 +1159,27 @@ static void drive_rq_insert_work(struct work_struct *work)
ide_drive_t *drive = container_of(work, ide_drive_t, rq_work);
ide_hwif_t *hwif = drive->hwif;
struct request *rq;
+ blk_status_t ret;
LIST_HEAD(list);

- spin_lock_irq(&hwif->lock);
- if (!list_empty(&drive->rq_list))
- list_splice_init(&drive->rq_list, &list);
- spin_unlock_irq(&hwif->lock);
+ blk_mq_quiesce_queue(drive->queue);

- while (!list_empty(&list)) {
- rq = list_first_entry(&list, struct request, queuelist);
+ ret = BLK_STS_OK;
+ spin_lock_irq(&hwif->lock);
+ while (!list_empty(&drive->rq_list)) {
+ rq = list_first_entry(&drive->rq_list, struct request, queuelist);
list_del_init(&rq->queuelist);
- blk_execute_rq_nowait(drive->queue, rq->rq_disk, rq, true, NULL);
+
+ spin_unlock_irq(&hwif->lock);
+ ret = ide_issue_rq(drive, rq, true);
+ spin_lock_irq(&hwif->lock);
}
+ spin_unlock_irq(&hwif->lock);
+
+ blk_mq_unquiesce_queue(drive->queue);
+
+ if (ret != BLK_STS_OK)
+ kblockd_schedule_work(&drive->rq_work);
}

static const u8 ide_hwif_to_major[] =
diff --git a/include/linux/ide.h b/include/linux/ide.h
index e7d29ae633cd..971cf76a78a0 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -615,6 +615,7 @@ struct ide_drive_s {

/* current sense rq and buffer */
bool sense_rq_armed;
+ bool sense_rq_active;
struct request *sense_rq;
struct request_sense sense_data;

@@ -1219,6 +1220,7 @@ extern void ide_stall_queue(ide_drive_t *drive, unsigned long timeout);
extern void ide_timer_expiry(struct timer_list *t);
extern irqreturn_t ide_intr(int irq, void *dev_id);
extern blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *, const struct blk_mq_queue_data *);
+extern blk_status_t ide_issue_rq(ide_drive_t *, struct request *, bool);
extern void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq);

void ide_init_disk(struct gendisk *, ide_drive_t *);

--
Jens Axboe


2019-01-26 07:31:14

by He Zhe

[permalink] [raw]
Subject: Re: LTP case read_all_proc fails on qemux86-64 since 5.0-rc1



On 1/26/19 3:32 AM, Jens Axboe wrote:
> On 1/24/19 8:20 PM, Jens Axboe wrote:
>> On 1/24/19 8:14 PM, He Zhe wrote:
>>> On 1/25/19 10:26 AM, Jens Axboe wrote:
>>>> On 1/24/19 7:22 PM, He Zhe wrote:
>>>>> On 1/25/19 10:05 AM, Jens Axboe wrote:
>>>>>> On 1/24/19 6:40 PM, He Zhe wrote:
>>>>>>> On 1/24/19 11:40 PM, Jens Axboe wrote:
>>>>>>>> On 1/24/19 1:23 AM, He Zhe wrote:
>>>>>>>>> On 1/24/19 12:05 AM, Jens Axboe wrote:
>>>>>>>>>> On 1/22/19 8:39 PM, Jens Axboe wrote:
>>>>>>>>>>> On Jan 22, 2019, at 8:13 PM, He Zhe <[email protected]> wrote:
>>>>>>>>>>>> LTP case read_all_proc(read_all -d /proc -q -r 10) often, but not every time, fails with the following call traces, since 600335205b8d "ide: convert to blk-mq"(5.0-rc1) till now(5.0-rc3).
>>>>>>>>>>>>
>>>>>>>>>>>> qemu-system-x86_64 -drive file=rootfs.ext4,if=virtio,format=raw -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic -m 16192 -smp cpus=12 -cpu core2duo -enable-kvm -serial mon:stdio -serial null -kernel bzImage -append 'root=/dev/vda rw highres=off console=ttyS0 mem=16192M'
>>>>>>>>>>>>
>>>>>>>>>>>> tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
>>>>>>>>>>>> [ 47.080156] Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!
>>>>>>>>>>>> [ 47.085330] ------------[ cut here ]------------
>>>>>>>>>>>> [ 47.085810] kernel BUG at block/blk-mq.c:767!
>>>>>>>>>>>> [ 47.086498] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>>>>> [ 47.087022] CPU: 5 PID: 146 Comm: kworker/5:1H Not tainted 5.0.0-rc3 #1
>>>>>>>>>>>> [ 47.087858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
>>>>>>>>>>>> [ 47.088992] Workqueue: kblockd blk_mq_run_work_fn
>>>>>>>>>>>> [ 47.089469] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>>>>> [ 47.090035] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>>>>> [ 47.091930] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>>>>> [ 47.092458] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>>>>> [ 47.093181] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>>>>> [ 47.093906] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>>>>> [ 47.094626] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>>>>> [ 47.095347] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>>>>> [ 47.096071] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>>>>> [ 47.096898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>>>> [ 47.097477] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>>>>> [ 47.098203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>>> [ 47.098929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>>> [ 47.099650] Call Trace:
>>>>>>>>>>>> [ 47.099910] <IRQ>
>>>>>>>>>>>> [ 47.100128] blk_mq_requeue_request+0x58/0x60
>>>>>>>>>>>> [ 47.100576] ide_requeue_and_plug+0x20/0x50
>>>>>>>>>>>> [ 47.101014] ide_intr+0x21a/0x230
>>>>>>>>>>>> [ 47.101362] ? idecd_open+0xc0/0xc0
>>>>>>>>>>>> [ 47.101735] __handle_irq_event_percpu+0x43/0x1e0
>>>>>>>>>>>> [ 47.102214] handle_irq_event_percpu+0x32/0x80
>>>>>>>>>>>> [ 47.102668] handle_irq_event+0x39/0x60
>>>>>>>>>>>> [ 47.103074] handle_edge_irq+0xe8/0x1c0
>>>>>>>>>>>> [ 47.103470] handle_irq+0x20/0x30
>>>>>>>>>>>> [ 47.103819] do_IRQ+0x46/0xe0
>>>>>>>>>>>> [ 47.104128] common_interrupt+0xf/0xf
>>>>>>>>>>>> [ 47.104505] </IRQ>
>>>>>>>>>>>> [ 47.104731] RIP: 0010:ide_output_data+0xbc/0x100
>>>>>>>>>>>> [ 47.105201] Code: 74 22 8d 41 ff 85 c9 74 24 49 8d 54 40 02 41 0f b7 00 66 41 89 01 49 83 c0 02 49 39 d0 75 ef 5b 41 5c 5d c3 4c 89 c6 445
>>>>>>>>>>>> [ 47.107092] RSP: 0018:ffffbd508059bb18 EFLAGS: 00010246 ORIG_RAX: ffffffffffffffdd
>>>>>>>>>>>> [ 47.107862] RAX: ffff9e1ea13a8800 RBX: ffff9e1ea13a9000 RCX: 0000000000000000
>>>>>>>>>>>> [ 47.108581] RDX: 0000000000000170 RSI: ffff9e1ea13c012c RDI: 0000000000000000
>>>>>>>>>>>> [ 47.109293] RBP: ffffbd508059bb28 R08: ffff9e1ea13c0120 R09: 0000000000000170
>>>>>>>>>>>> [ 47.110016] R10: 000000000000000d R11: 000000000000000c R12: ffff9e1ea13a8800
>>>>>>>>>>>> [ 47.110731] R13: 000000000000000c R14: ffff9e1ea13c0000 R15: 0000000000007530
>>>>>>>>>>>> [ 47.111446] ide_transfer_pc+0x216/0x310
>>>>>>>>>>>> [ 47.111848] ? __const_udelay+0x3d/0x40
>>>>>>>>>>>> [ 47.112236] ? ide_execute_command+0x85/0xb0
>>>>>>>>>>>> [ 47.112668] ? ide_pc_intr+0x3f0/0x3f0
>>>>>>>>>>>> [ 47.113051] ? ide_check_atapi_device+0x110/0x110
>>>>>>>>>>>> [ 47.113524] ide_issue_pc+0x178/0x240
>>>>>>>>>>>> [ 47.113901] ide_cd_do_request+0x15c/0x350
>>>>>>>>>>>> [ 47.114314] ide_queue_rq+0x180/0x6b0
>>>>>>>>>>>> [ 47.114686] ? blk_mq_get_driver_tag+0xa1/0x110
>>>>>>>>>>>> [ 47.115153] blk_mq_dispatch_rq_list+0x90/0x550
>>>>>>>>>>>> [ 47.115606] ? __queue_delayed_work+0x63/0x90
>>>>>>>>>>>> [ 47.116054] ? deadline_fifo_request+0x41/0x90
>>>>>>>>>>>> [ 47.116506] blk_mq_do_dispatch_sched+0x80/0x100
>>>>>>>>>>>> [ 47.116976] blk_mq_sched_dispatch_requests+0xfc/0x170
>>>>>>>>>>>> [ 47.117491] __blk_mq_run_hw_queue+0x6f/0xd0
>>>>>>>>>>>> [ 47.117941] blk_mq_run_work_fn+0x1b/0x20
>>>>>>>>>>>> [ 47.118342] process_one_work+0x14c/0x450
>>>>>>>>>>>> [ 47.118747] worker_thread+0x4a/0x440
>>>>>>>>>>>> [ 47.119125] kthread+0x105/0x140
>>>>>>>>>>>> [ 47.119456] ? process_one_work+0x450/0x450
>>>>>>>>>>>> [ 47.119880] ? kthread_park+0x90/0x90
>>>>>>>>>>>> [ 47.120251] ret_from_fork+0x35/0x40
>>>>>>>>>>>> [ 47.120619] Modules linked in:
>>>>>>>>>>>> [ 47.120952] ---[ end trace 4562f716e88fdefe ]---
>>>>>>>>>>>> [ 47.121423] RIP: 0010:blk_mq_add_to_requeue_list+0xc1/0xd0
>>>>>>>>>>>> [ 47.121981] Code: 48 8d 53 48 49 8b 8c 24 b8 04 00 00 48 89 51 08 48 89 4b 48 49 8d 8c 24 b8 04 00 00 48 89 4b 50 49 89 94 24 b8 04 00 008
>>>>>>>>>>>> [ 47.123851] RSP: 0018:ffff9e1ea4b43e40 EFLAGS: 00010002
>>>>>>>>>>>> [ 47.124393] RAX: ffff9e1ea13c0048 RBX: ffff9e1ea13c0000 RCX: 0000000000000006
>>>>>>>>>>>> [ 47.125108] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e1ea13c0000
>>>>>>>>>>>> [ 47.125819] RBP: ffff9e1ea4b43e68 R08: ffffeb5bcf630680 R09: 0000000000000000
>>>>>>>>>>>> [ 47.126539] R10: 0000000000000001 R11: 0000000000000012 R12: ffff9e1ea1033a40
>>>>>>>>>>>> [ 47.127262] R13: ffff9e1ea13a8d00 R14: ffff9e1ea13a9000 R15: 0000000000000046
>>>>>>>>>>>> [ 47.127988] FS: 0000000000000000(0000) GS:ffff9e1ea4b40000(0000) knlGS:0000000000000000
>>>>>>>>>>>> [ 47.128793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>>>> [ 47.129385] CR2: 0000003fda41fda0 CR3: 00000003d8e6a000 CR4: 00000000000006e0
>>>>>>>>>>>> [ 47.130104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>>> [ 47.130823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>>> [ 47.131547] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>>>>>>> [ 47.132609] Kernel Offset: 0x7c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>>>>>>> [ 47.133679] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>>>>>>> [ 47.134432] ------------[ cut here ]-----------
>>>>>>>>>>> I’ll take a look at this, thanks for the report.
>>>>>>>>>> I can't reproduce this, unfortunately. But I'm guessing it might be related
>>>>>>>>>> to a race with the requeue and request handling in IDE. Can you try with
>>>>>>>>>> the below patch?
>>>>>>>>> Thanks for attention, but this patch doesn't help.
>>>>>>>> OK, thanks for trying.
>>>>>>>>
>>>>>>>>> From what I reached, to reproduce it, we need to use legacy CONFIG_IDE
>>>>>>>>> and friends where "ide: convert to blk-mq" sits, not CONFIG_ATA and so
>>>>>>>>> on. I can reproduce it with qemux86-64 with "-drive ... if=ide" or
>>>>>>>>> "-drive ... if=virtio".
>>>>>>>> Yes of course, I am trying to reproduce with the legacy stack. Just not
>>>>>>>> having any luck doing it, even thousands of read_all iterations on /proc
>>>>>>>> with IO to the IDE drives is still not doing it. From your trace, it
>>>>>>>> looks like ide-cd causing it, which would mean a cdrom of some sort.
>>>>>>>>
>>>>>>>> I'll try again!
>>>>>>> My .config is attached.
>>>>>> Can you try with this patch?
>>>>> With this patch, it cannot boot up.
>>>> Ugh, forgot it's inclusive. You want it at '2', not '1'.
>>> With "set->queue_depth = 2;", reading aganst /sys or normal disk dir is OK, but the following hang. So the related test cases are all timed out.
>>>
>>> ps -elf
>>> ---- snip ----
>>> 5 D root 415 1 0 80 0 - 664 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
>>> 5 D root 416 1 0 80 0 - 669 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
>>> 5 D root 417 1 0 80 0 - 674 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 10
>>> 5 D root 421 1 0 80 0 - 629 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
>>> 5 D root 434 1 0 80 0 - 629 io_sch 02:49 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
>>> 5 D root 448 1 0 80 0 - 629 io_sch 02:55 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /proc -r 1
>>> 5 D root 474 1 0 80 0 - 629 flush_ 02:55 ttyS0 00:00:00 /opt/ltp/testcases/bin/read_all -d /dev -r 1
>>>
>>> In addition, without this change, it can be reproduced more often with more processes.
>> I've managed to reproduce it now, though I don't quite yet see what is going
>> on. It's almost like we're NOT honoring the serialization, which results
>> in a sense request being doubly queued. I'll see if I can figure it out
>> and produce a patch that works.
> Alright, here's something that has a chance of working. I don't think we
> can solve the ordering issue with sense requests without doing something
> like this. Can you give it a spin?

This works well from my side. Thanks.

Zhe