LinuxLists.cc - [PATCHSET v2] blk-mq: reimplement timeout handling

2017-12-12 19:01:43

Subject: [PATCHSET v2] blk-mq: reimplement timeout handling

Changes from the last version[1]

- BLK_EH_RESET_TIMER handling fixed.

- s/request->gstate_seqc/request->gstate_seq/

- READ_ONCE() added to blk_mq_rq_update_state().

- Removed leftover blk_clear_rq_complete() invocation from
blk_mq_rq_timed_out().

Currently, the blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags (REQ_ATOM_*), memory barriers and subtle memory coherence
rules. Unfortunately, it contains quite a few holes.

It's pretty easy to make blk_mq_check_expired() terminate a later
instance of a request. If we induce a 5 sec delay before the
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly. Nothing actually timed out. It
just made the call on a recycled instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.

This patchset replaces the broken synchronization mechanism with an RCU
and generation number based one. Please read the patch description of
the second patch for more details.
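
As a rough illustration of why a generation number helps, here is a
single-threaded userspace sketch of the scenario described above. It is
not the code from the patches: struct slot, the helper names and the
integer "seconds" clock are all made up for this example, and the
locking/RCU side that makes sampling safe against concurrent updates
is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a request slot that is reused across IOs. */
struct slot {
	uint64_t gen;		/* bumped every time the slot is (re)issued */
	uint64_t deadline;	/* absolute expiry time of the current instance */
	bool in_flight;
	bool aborted;		/* set when the timeout path terminates it */
};

/* Issue a new instance in the slot: new generation, new deadline. */
static void slot_issue(struct slot *s, uint64_t now, uint64_t timeout)
{
	s->gen++;
	s->deadline = now + timeout;
	s->in_flight = true;
	s->aborted = false;
}

static void slot_complete(struct slot *s)
{
	s->in_flight = false;
}

/* What the timeout scan remembers about the instance it looked at. */
struct snapshot {
	uint64_t gen;
	uint64_t deadline;
};

static struct snapshot slot_sample(const struct slot *s)
{
	struct snapshot snap = { .gen = s->gen, .deadline = s->deadline };
	return snap;
}

/* Naive: acts on whatever occupies the slot when the check finally runs. */
static bool expire_naive(struct slot *s, struct snapshot snap, uint64_t now)
{
	if (s->in_flight && now >= snap.deadline) {
		s->aborted = true;
		return true;
	}
	return false;
}

/* Generation-checked: do nothing if the slot was reissued since sampling. */
static bool expire_checked(struct slot *s, struct snapshot snap, uint64_t now)
{
	if (s->gen != snap.gen)
		return false;
	return expire_naive(s, snap, now);
}

/*
 * Replay: 2s timeout, the scan stalls for 5s, and meanwhile the original
 * instance completes and the slot is reissued. Returns true if the later
 * instance got terminated even though it had not timed out.
 */
bool replay(bool use_generation)
{
	struct slot s = { 0 };
	struct snapshot snap;
	uint64_t now;

	slot_issue(&s, 0, 2);		/* instance A, deadline t=2 */
	snap = slot_sample(&s);		/* scan samples A at t=0 */
	slot_complete(&s);		/* A completes at t=1 */
	slot_issue(&s, 4, 2);		/* instance B at t=4, deadline t=6 */
	now = 5;			/* scan resumes after the stall */
	assert(now < s.deadline);	/* B itself has not timed out */

	if (use_generation)
		expire_checked(&s, snap, now);
	else
		expire_naive(&s, snap, now);
	return s.aborted;
}
```

With these assumptions, replay(false) reports that instance B was
terminated spuriously, while replay(true) leaves it alone because the
sampled generation no longer matches the slot.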

This patchset contains the following six patches.

0001-blk-mq-protect-completion-path-with-RCU.patch
0002-blk-mq-replace-timeout-synchronization-with-a-RCU-an.patch
0003-blk-mq-use-blk_mq_rq_state-instead-of-testing-REQ_AT.patch
0004-blk-mq-make-blk_abort_request-trigger-timeout-path.patch
0005-blk-mq-remove-REQ_ATOM_COMPLETE-usages-from-blk-mq.patch
0006-blk-mq-remove-REQ_ATOM_STARTED.patch

and is available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-mq-timeout

diffstat follows. Thanks.

 block/blk-core.c       |   2
 block/blk-mq-debugfs.c |   4
 block/blk-mq.c         | 255 ++++++++++++++++++++++++++++---------------------
 block/blk-mq.h         |  48 ++++++++-
 block/blk-timeout.c    |   9 -
 block/blk.h            |   7 -
 include/linux/blk-mq.h |   1
 include/linux/blkdev.h |  23 ++++
 8 files changed, 226 insertions(+), 123 deletions(-)

--
tejun

[1] http://lkml.kernel.org/r/[email protected]
