Hi Jens,
this series contains an already proposed patch by Luca, plus six new
patches. The goals of these patches are summarized in the subject of
this cover letter. I'm including Luca's patch here, because it enabled
the actual use of stable merge, and, as such, triggered an otherwise
silent bug. This series contains also the fix for that bug ("block,
bfq: avoid delayed merge of async queues"), tested by Holger [1].
Thanks,
Paolo
[1] https://lkml.org/lkml/2021/5/18/384
Luca Mariotti (1):
block, bfq: fix delayed stable merge check
Paolo Valente (5):
block, bfq: let also stably merged queues enjoy weight raising
block, bfq: consider also creation time in delayed stable merge
block, bfq: avoid delayed merge of async queues
block, bfq: check waker only for queues with no in-flight I/O
block, bfq: reset waker pointer with shared queues
Pietro Pedroni (1):
block, bfq: boost throughput by extending queue-merging times
block/bfq-iosched.c | 68 +++++++++++++++++++++++++++++++++++----------
1 file changed, 53 insertions(+), 15 deletions(-)
--
2.20.1
Merged bfq_queues are kept out of weight-raising (low-latency)
mechanisms. The reason is that these queues are usually created for
non-interactive and non-soft-real-time tasks. Yet this is not the case
for stably-merged queues. These queues are merged just because they
are created shortly after each other. So they may easily serve the I/O
of an interactive or soft-real time application, if the application
happens to spawn multiple processes.
To address this issue, this commits lets also stably-merged queued
enjoy weight raising.
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index acd1f881273e..da2363f12e53 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1729,10 +1729,23 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
bfqq->entity.new_weight == 40;
*interactive = !in_burst && idle_for_long_time &&
bfqq->entity.new_weight == 40;
+ /*
+ * Merged bfq_queues are kept out of weight-raising
+ * (low-latency) mechanisms. The reason is that these queues
+ * are usually created for non-interactive and
+ * non-soft-real-time tasks. Yet this is not the case for
+ * stably-merged queues. These queues are merged just because
+ * they are created shortly after each other. So they may
+ * easily serve the I/O of an interactive or soft-real time
+ * application, if the application happens to spawn multiple
+ * processes. So let also stably-merged queued enjoy weight
+ * raising.
+ */
wr_or_deserves_wr = bfqd->low_latency &&
(bfqq->wr_coeff > 1 ||
(bfq_bfqq_sync(bfqq) &&
- bfqq->bic && (*interactive || soft_rt)));
+ (bfqq->bic || RQ_BIC(rq)->stably_merged) &&
+ (*interactive || soft_rt)));
/*
* Using the last flag, update budget and check whether bfqq
--
2.20.1
From: Luca Mariotti <[email protected]>
When attempting to schedule a merge of a given bfq_queue with the currently
in-service bfq_queue or with a cooperating bfq_queue among the scheduled
bfq_queues, delayed stable merge is checked for rotational or non-queueing
devs. For this stable merge to be performed, some conditions must be met.
If the current bfq_queue underwent some split from some merged bfq_queue,
one of these conditions is that two hundred milliseconds must elapse from
split, otherwise this condition is always met.
Unfortunately, by mistake, time_is_after_jiffies() was written instead of
time_is_before_jiffies() for this check, verifying that less than two
hundred milliseconds have elapsed instead of verifying that at least two
hundred milliseconds have elapsed.
Fix this issue by replacing time_is_after_jiffies() with
time_is_before_jiffies().
Signed-off-by: Luca Mariotti <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Pietro Pedroni <[email protected]>
---
block/bfq-iosched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index da2363f12e53..c5c0e74977d4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2710,7 +2710,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (unlikely(!bfqd->nonrot_with_queueing)) {
if (bic->stable_merge_bfqq &&
!bfq_bfqq_just_created(bfqq) &&
- time_is_after_jiffies(bfqq->split_time +
+ time_is_before_jiffies(bfqq->split_time +
msecs_to_jiffies(200))) {
struct bfq_queue *stable_merge_bfqq =
bic->stable_merge_bfqq;
--
2.20.1
From: Pietro Pedroni <[email protected]>
One of the methods with which bfq boosts throughput is by merging queues.
One of the merging variants in bfq is the stable merge.
This mechanism is activated between two queues only if they are created
within a certain maximum time T1 from each other.
Merging can happen soon or be delayed. In the second case, before
merging, bfq needs to evaluate a throughput-boost parameter that
indicates whether the queue generates a high throughput is served alone.
Merging occurs when this throughput-boost is not high enough.
In particular, this parameter is evaluated and late merging may occur
only after at least a time T2 from the creation of the queue.
Currently T1 and T2 are set to 180ms and 200ms, respectively.
In this way the merging mechanism rarely occurs because time is not
enough. This results in a noticeable lowering of the overall throughput
with some workloads (see the example below).
This commit introduces two constants bfq_activation_stable_merging and
bfq_late_stable_merging in order to increase the duration of T1 and T2.
Both the stable merging activation time and the late merging
time are set to 600ms. This value has been experimentally evaluated
using sqlite benchmark in the Phoronix Test Suite on a HDD.
The duration of the benchmark before this fix was 111.02s, while now
it has reached 97.02s, a better result than that of all the other
schedulers.
Signed-off-by: Pietro Pedroni <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2a5c1a660f3b..98a42ddb1760 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -364,6 +364,16 @@ static int ref_wr_duration[2];
*/
static const unsigned long max_service_from_wr = 120000;
+/*
+ * Maximum time between the creation of two queues, for stable merge
+ * to be activated (in ms)
+ */
+static const unsigned long bfq_activation_stable_merging = 600;
+/*
+ * Minimum time to be waited before evaluating delayed stable merge (in ms)
+ */
+static const unsigned long bfq_late_stable_merging = 600;
+
#define RQ_BIC(rq) icq_to_bic((rq)->elv.priv[0])
#define RQ_BFQQ(rq) ((rq)->elv.priv[1])
@@ -2711,9 +2721,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (bic->stable_merge_bfqq &&
!bfq_bfqq_just_created(bfqq) &&
time_is_before_jiffies(bfqq->split_time +
- msecs_to_jiffies(200)) &&
+ msecs_to_jiffies(bfq_late_stable_merging)) &&
time_is_before_jiffies(bfqq->creation_time +
- msecs_to_jiffies(200))) {
+ msecs_to_jiffies(bfq_late_stable_merging))) {
struct bfq_queue *stable_merge_bfqq =
bic->stable_merge_bfqq;
int proc_ref = min(bfqq_process_refs(bfqq),
@@ -5494,7 +5504,7 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
*/
if (!last_bfqq_created ||
time_before(last_bfqq_created->creation_time +
- bfqd->bfq_burst_interval,
+ msecs_to_jiffies(bfq_activation_stable_merging),
bfqq->creation_time) ||
bfqq->entity.parent != last_bfqq_created->entity.parent ||
bfqq->ioprio != last_bfqq_created->ioprio ||
--
2.20.1
Commit 85686d0dc194 ("block, bfq: keep shared queues out of the waker
mechanism") leaves shared bfq_queues out of the waker-detection
mechanism. It attains this goal by not updating the pointer
last_completed_rq_bfqq, if the last request completed belongs to a
shared bfq_queue (so that the pointer will not point to the shared
bfq_queue).
Yet this has a side effect: the pointer last_completed_rq_bfqq keeps
pointing, deceptively, to a bfq_queue that actually is not the last
one to have had a request completed. As a consequence, such a
bfq_queue may deceptively be considered as a waker of some bfq_queue,
even of some shared bfq_queue.
To address this issue, reset last_completed_rq_bfqq if the last
request completed belongs to a shared queue.
Fixes: 85686d0dc194 ("block, bfq: keep shared queues out of the waker mechanism")
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a273b2bcea2a..fedb0a8fd388 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6165,11 +6165,13 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
* of other queues. But a false waker will unjustly steal
* bandwidth to its supposedly woken queue. So considering
* also shared queues in the waking mechanism may cause more
- * control troubles than throughput benefits. Then do not set
- * last_completed_rq_bfqq to bfqq if bfqq is a shared queue.
+ * control troubles than throughput benefits. Then reset
+ * last_completed_rq_bfqq if bfqq is a shared queue.
*/
if (!bfq_bfqq_coop(bfqq))
bfqd->last_completed_rq_bfqq = bfqq;
+ else
+ bfqd->last_completed_rq_bfqq = NULL;
/*
* If we are waiting to discover whether the request pattern
--
2.20.1
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
queues"), BFQ may schedule a merge between a newly created sync
bfq_queue and the last sync bfq_queue created. Such a merging is not
performed immediately, because BFQ needs first to find out whether the
newly created queue actually reaches a higher throughput if not merged
at all (and in that case BFQ will not perform any stable merging). To
check that, a little time must be waited after the creation of the new
queue, so that some I/O can flow in the queue, and statistics on such
I/O can be computed.
Yet, to evaluate the above waiting time, the last split time is
considered as start time, instead of the creation time of the
queue. This is a mistake, because considering the split time is
correct only in the following scenario.
The queue undergoes a non-stable merges on the arrival of its very
first I/O request, due to close I/O with some other queue. While the
queue is merged for close I/O, stable merging is not considered. Yet
the queue may then happen to be split, if the close I/O finishes (or
happens to be a false positive). From this time on, the queue can
again be considered for stable merging. But, again, a little time must
elapse, to let some new I/O flow in the queue and to get updated
statistics. To wait for this time, the split time is to be taken into
account.
Yet, if the queue does not undergo a non-stable merge on the arrival
of its very first request, then BFQ immediately checks whether the
stable merge is to be performed. It happens because the split time for
a queue is initialized to minus infinity when the queue is created.
This commit fixes this mistake by adding the missing condition. Now
the check for delayed stable-merge is performed after a little time is
elapsed not only from the last queue split time, but also from the
creation time of the queue.
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c5c0e74977d4..2a5c1a660f3b 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2711,7 +2711,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (bic->stable_merge_bfqq &&
!bfq_bfqq_just_created(bfqq) &&
time_is_before_jiffies(bfqq->split_time +
- msecs_to_jiffies(200))) {
+ msecs_to_jiffies(200)) &&
+ time_is_before_jiffies(bfqq->creation_time +
+ msecs_to_jiffies(200))) {
struct bfq_queue *stable_merge_bfqq =
bic->stable_merge_bfqq;
int proc_ref = min(bfqq_process_refs(bfqq),
--
2.20.1
Consider two bfq_queues, say Q1 and Q2, with Q2 empty. If a request of
Q1 gets completed shortly before a new request arrives for Q2, then
BFQ flags Q1 as a candidate waker for Q2. Yet, the arrival of this new
request may have a different cause, in the following case. If also Q2
has requests in flight while waiting for the arrival of a new request,
then the completion of its own requests may be the actual cause of the
awakening of the process that sends I/O to Q2. So Q1 may be flagged
wrongly as a candidate waker.
This commit avoids this deceptive flagging, by disabling
candidate-waker flagging for Q2, if Q2 has in-flight I/O.
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7bf073ef9443..a273b2bcea2a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1985,14 +1985,18 @@ static void bfq_update_io_intensity(struct bfq_queue *bfqq, u64 now_ns)
* Turning back to the detection of a waker queue, a queue Q is deemed
* as a waker queue for bfqq if, for three consecutive times, bfqq
* happens to become non empty right after a request of Q has been
- * completed. In particular, on the first time, Q is tentatively set
- * as a candidate waker queue, while on the third consecutive time
- * that Q is detected, the field waker_bfqq is set to Q, to confirm
- * that Q is a waker queue for bfqq. These detection steps are
- * performed only if bfqq has a long think time, so as to make it more
- * likely that bfqq's I/O is actually being blocked by a
- * synchronization. This last filter, plus the above three-times
- * requirement, make false positives less likely.
+ * completed. In this respect, even if bfqq is empty, we do not check
+ * for a waker if it still has some in-flight I/O. In fact, in this
+ * case bfqq is actually still being served by the drive, and may
+ * receive new I/O on the completion of some of the in-flight
+ * requests. In particular, on the first time, Q is tentatively set as
+ * a candidate waker queue, while on the third consecutive time that Q
+ * is detected, the field waker_bfqq is set to Q, to confirm that Q is
+ * a waker queue for bfqq. These detection steps are performed only if
+ * bfqq has a long think time, so as to make it more likely that
+ * bfqq's I/O is actually being blocked by a synchronization. This
+ * last filter, plus the above three-times requirement, make false
+ * positives less likely.
*
* NOTE
*
@@ -2018,6 +2022,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (!bfqd->last_completed_rq_bfqq ||
bfqd->last_completed_rq_bfqq == bfqq ||
bfq_bfqq_has_short_ttime(bfqq) ||
+ bfqq->dispatched > 0 ||
now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC ||
bfqd->last_completed_rq_bfqq == bfqq->waker_bfqq)
return;
--
2.20.1
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
queues"), BFQ may schedule a merge between a newly created sync
bfq_queue, say Q2, and the last sync bfq_queue created, say Q1. To this
goal, BFQ stores the address of Q1 in the field bic->stable_merge_bfqq
of the bic associated with Q2. So, when the time for the possible merge
arrives, BFQ knows which bfq_queue to merge Q2 with. In particular,
BFQ checks for possible merges on request arrivals.
Yet the same bic may also be associated with an async bfq_queue, say
Q3. So, if a request for Q3 arrives, then the above check may happen
to be executed while the bfq_queue at hand is Q3, instead of Q2. In
this case, Q1 happens to be merged with an async bfq_queue. This is
not only a conceptual mistake, because async queues are to be kept out
of queue merging, but also a bug that leads to inconsistent states.
This commits simply filters async queues out of delayed merges.
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Tested-by: Holger Hoffstätte <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 98a42ddb1760..7bf073ef9443 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2718,7 +2718,13 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
* costly and complicated.
*/
if (unlikely(!bfqd->nonrot_with_queueing)) {
- if (bic->stable_merge_bfqq &&
+ /*
+ * Make sure also that bfqq is sync, because
+ * bic->stable_merge_bfqq may point to some queue (for
+ * stable merging) also if bic is associated with a
+ * sync queue, but this bfqq is async
+ */
+ if (bfq_bfqq_sync(bfqq) && bic->stable_merge_bfqq &&
!bfq_bfqq_just_created(bfqq) &&
time_is_before_jiffies(bfqq->split_time +
msecs_to_jiffies(bfq_late_stable_merging)) &&
--
2.20.1
On 6/19/21 8:09 AM, Paolo Valente wrote:
> Hi Jens,
> this series contains an already proposed patch by Luca, plus six new
> patches. The goals of these patches are summarized in the subject of
> this cover letter. I'm including Luca's patch here, because it enabled
> the actual use of stable merge, and, as such, triggered an otherwise
> silent bug. This series contains also the fix for that bug ("block,
> bfq: avoid delayed merge of async queues"), tested by Holger [1].
Applied, thanks.
--
Jens Axboe
Hello.
On sobota 19. června 2021 16:09:41 CEST Paolo Valente wrote:
> Hi Jens,
> this series contains an already proposed patch by Luca, plus six new
> patches. The goals of these patches are summarized in the subject of
> this cover letter. I'm including Luca's patch here, because it enabled
> the actual use of stable merge, and, as such, triggered an otherwise
> silent bug. This series contains also the fix for that bug ("block,
> bfq: avoid delayed merge of async queues"), tested by Holger [1].
>
> Thanks,
> Paolo
>
> [1] https://lkml.org/lkml/2021/5/18/384
>
> Luca Mariotti (1):
> block, bfq: fix delayed stable merge check
>
> Paolo Valente (5):
> block, bfq: let also stably merged queues enjoy weight raising
> block, bfq: consider also creation time in delayed stable merge
> block, bfq: avoid delayed merge of async queues
> block, bfq: check waker only for queues with no in-flight I/O
> block, bfq: reset waker pointer with shared queues
>
> Pietro Pedroni (1):
> block, bfq: boost throughput by extending queue-merging times
>
> block/bfq-iosched.c | 68 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 53 insertions(+), 15 deletions(-)
>
> --
> 2.20.1
Not sure everything goes fine here. After applying this series on top of the
latest stable 5.12 kernel I got this:
```
[16730.963248] kernel BUG at block/elevator.c:236!
[16730.963254] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[16730.963257] CPU: 11 PID: 109170 Comm: kworker/u64:5 Tainted: G W
5.12.0-pf7 #1
[16730.963260] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
3601 05/26/2021
[16730.963263] Workqueue: dm-thin do_worker [dm_thin_pool]
[16730.963270] RIP: 0010:elv_rqhash_find+0xcc/0xd0
[16730.963274] Code: 41 89 f0 81 e2 00 40 06 00 41 81 e0 1a 00 04 00 44 09 c2
75 a9 be 09 00 00 00 c4 e2 4b f7 50 28 48 03 50 30 48 39 fa 75 c6 c3 <0f> 0b
66 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 48 8b 47 68 48
[16730.963276] RSP: 0018:ffffa558d13b7af8 EFLAGS: 00010046
[16730.963279] RAX: ffff8a0007782d00 RBX: ffff8a0014b93000 RCX: ffffa558d13b7b78
[16730.963281] RDX: ffff8a0014b93000 RSI: 0000000000063082 RDI: 000000001e0fdc00
[16730.963283] RBP: ffff8a000731c770 R08: ffff8a000731c770 R09: fffffff0ffffddfb
[16730.963284] R10: 0000000000000000 R11: 0000000000000400 R12: ffff8a0330365c00
[16730.963286] R13: ffffa558d13b7b30 R14: 0000000000000000 R15: ffff8a0212fc4000
[16730.963288] FS: 0000000000000000(0000) GS:ffff8a070ecc0000(0000) knlGS:
0000000000000000
[16730.963290] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16730.963292] CR2: 00007f1514d90f4c CR3: 0000000315952000 CR4:
0000000000350ee0
[16730.963294] Call Trace:
[16730.963297] elv_merge+0x96/0x120
[16730.963300] blk_mq_sched_try_merge+0x3e/0x370
[16730.963303] bfq_bio_merge+0xd3/0x130
[16730.963306] blk_mq_submit_bio+0x11e/0x6c0
[16730.963309] submit_bio_noacct+0x457/0x530
[16730.963312] raid10_unplug+0x13f/0x1a0 [raid10]
[16730.963316] blk_flush_plug_list+0xa9/0x110
[16730.963319] blk_finish_plug+0x21/0x30
[16730.963322] process_prepared_discard_passdown_pt1+0x204/0x2d0
[dm_thin_pool]
[16730.963327] do_worker+0x18e/0xce0 [dm_thin_pool]
[16730.963335] process_one_work+0x217/0x3e0
[16730.963338] worker_thread+0x4d/0x470
[16730.963343] kthread+0x182/0x1b0
[16730.963349] ret_from_fork+0x22/0x30
…
[16730.963419] ---[ end trace dd7e037f2028257b ]---
[16730.963524] RIP: 0010:elv_rqhash_find+0xcc/0xd0
…
[16730.963547] note: kworker/u64:5[109170] exited with preempt_count 1
[16747.948467] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
```
Which is:
```
229 struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
230 {
…
235 hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
236 BUG_ON(!ELV_ON_HASH(rq));
…
```
Yes, I carry some extra patches besides this series (the list is against v5.12
GA):
```
block, bfq: reset waker pointer with shared queues
block, bfq: check waker only for queues with no in-flight I/O
block, bfq: avoid delayed merge of async queues
block, bfq: boost throughput by extending queue-merging times
block, bfq: consider also creation time in delayed stable merge
block, bfq: fix delayed stable merge check
block, bfq: let also stably merged queues enjoy weight raising
block: Do not pull requests from the scheduler when we cannot dispatch them
blk: Fix lock inversion between ioc lock and bfqd lock
bfq: Remove merged request already in bfq_requests_merged()
block, bfq: avoid circular stable merges
bfq: remove unnecessary BFQ_DEFAULT_GRP_IOPRIO
bfq: reset entity->prio_changed in bfq_init_entity()
bfq: optimize the calculation of bfq_weight_to_ioprio()
bfq: remove unnecessary initialization logic
bfq: keep the minimun bandwidth for CLASS_BE
bfq: limit the IO depth of CLASS_IDLE to 1
bfq: convert the type of bfq_group.bfqd to bfq_data*
bfq: introduce bfq_entity_to_bfqg helper method
bfq/mq-deadline: remove redundant check for passthrough request
blk-mq: bypass IO scheduler's limit_depth for passthrough request
block,bfq: fix the timeout calculation in bfq_bfqq_charge_time
block, bfq: merge bursts of newly-created queues
block, bfq: keep shared queues out of the waker mechanism
block, bfq: fix weight-raising resume with !low_latency
block, bfq: make shared queues inherit wakers
block, bfq: put reqs of waker and woken in dispatch list
block, bfq: always inject I/O of queues blocked by wakers
```
but nothing from there triggered this for quite some time.
Paolo, what do you think?
--
Oleksandr Natalenko (post-factum)
I have tested this with myself and this error does not occur with me
and I have not noticed any regressions. I have applied almost exactly
the same patches as Oleksandr.
pon., 21 cze 2021 o 21:55 Oleksandr Natalenko
<[email protected]> napisał(a):
>
> Hello.
>
> On sobota 19. června 2021 16:09:41 CEST Paolo Valente wrote:
> > Hi Jens,
> > this series contains an already proposed patch by Luca, plus six new
> > patches. The goals of these patches are summarized in the subject of
> > this cover letter. I'm including Luca's patch here, because it enabled
> > the actual use of stable merge, and, as such, triggered an otherwise
> > silent bug. This series contains also the fix for that bug ("block,
> > bfq: avoid delayed merge of async queues"), tested by Holger [1].
> >
> > Thanks,
> > Paolo
> >
> > [1] https://lkml.org/lkml/2021/5/18/384
> >
> > Luca Mariotti (1):
> > block, bfq: fix delayed stable merge check
> >
> > Paolo Valente (5):
> > block, bfq: let also stably merged queues enjoy weight raising
> > block, bfq: consider also creation time in delayed stable merge
> > block, bfq: avoid delayed merge of async queues
> > block, bfq: check waker only for queues with no in-flight I/O
> > block, bfq: reset waker pointer with shared queues
> >
> > Pietro Pedroni (1):
> > block, bfq: boost throughput by extending queue-merging times
> >
> > block/bfq-iosched.c | 68 +++++++++++++++++++++++++++++++++++----------
> > 1 file changed, 53 insertions(+), 15 deletions(-)
> >
> > --
> > 2.20.1
>
> Not sure everything goes fine here. After applying this series on top of the
> latest stable 5.12 kernel I got this:
>
> ```
> [16730.963248] kernel BUG at block/elevator.c:236!
> [16730.963254] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [16730.963257] CPU: 11 PID: 109170 Comm: kworker/u64:5 Tainted: G W
> 5.12.0-pf7 #1
> [16730.963260] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
> 3601 05/26/2021
> [16730.963263] Workqueue: dm-thin do_worker [dm_thin_pool]
> [16730.963270] RIP: 0010:elv_rqhash_find+0xcc/0xd0
> [16730.963274] Code: 41 89 f0 81 e2 00 40 06 00 41 81 e0 1a 00 04 00 44 09 c2
> 75 a9 be 09 00 00 00 c4 e2 4b f7 50 28 48 03 50 30 48 39 fa 75 c6 c3 <0f> 0b
> 66 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 48 8b 47 68 48
> [16730.963276] RSP: 0018:ffffa558d13b7af8 EFLAGS: 00010046
> [16730.963279] RAX: ffff8a0007782d00 RBX: ffff8a0014b93000 RCX: ffffa558d13b7b78
> [16730.963281] RDX: ffff8a0014b93000 RSI: 0000000000063082 RDI: 000000001e0fdc00
> [16730.963283] RBP: ffff8a000731c770 R08: ffff8a000731c770 R09: fffffff0ffffddfb
> [16730.963284] R10: 0000000000000000 R11: 0000000000000400 R12: ffff8a0330365c00
> [16730.963286] R13: ffffa558d13b7b30 R14: 0000000000000000 R15: ffff8a0212fc4000
> [16730.963288] FS: 0000000000000000(0000) GS:ffff8a070ecc0000(0000) knlGS:
> 0000000000000000
> [16730.963290] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16730.963292] CR2: 00007f1514d90f4c CR3: 0000000315952000 CR4:
> 0000000000350ee0
> [16730.963294] Call Trace:
> [16730.963297] elv_merge+0x96/0x120
> [16730.963300] blk_mq_sched_try_merge+0x3e/0x370
> [16730.963303] bfq_bio_merge+0xd3/0x130
> [16730.963306] blk_mq_submit_bio+0x11e/0x6c0
> [16730.963309] submit_bio_noacct+0x457/0x530
> [16730.963312] raid10_unplug+0x13f/0x1a0 [raid10]
> [16730.963316] blk_flush_plug_list+0xa9/0x110
> [16730.963319] blk_finish_plug+0x21/0x30
> [16730.963322] process_prepared_discard_passdown_pt1+0x204/0x2d0
> [dm_thin_pool]
> [16730.963327] do_worker+0x18e/0xce0 [dm_thin_pool]
> [16730.963335] process_one_work+0x217/0x3e0
> [16730.963338] worker_thread+0x4d/0x470
> [16730.963343] kthread+0x182/0x1b0
> [16730.963349] ret_from_fork+0x22/0x30
> …
> [16730.963419] ---[ end trace dd7e037f2028257b ]---
> [16730.963524] RIP: 0010:elv_rqhash_find+0xcc/0xd0
> …
> [16730.963547] note: kworker/u64:5[109170] exited with preempt_count 1
> [16747.948467] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
> fences timed out!
> ```
>
> Which is:
>
> ```
> 229 struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
> 230 {
> …
> 235 hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
> 236 BUG_ON(!ELV_ON_HASH(rq));
> …
> ```
>
> Yes, I carry some extra patches besides this series (the list is against v5.12
> GA):
>
> ```
> block, bfq: reset waker pointer with shared queues
> block, bfq: check waker only for queues with no in-flight I/O
> block, bfq: avoid delayed merge of async queues
> block, bfq: boost throughput by extending queue-merging times
> block, bfq: consider also creation time in delayed stable merge
> block, bfq: fix delayed stable merge check
> block, bfq: let also stably merged queues enjoy weight raising
> block: Do not pull requests from the scheduler when we cannot dispatch them
> blk: Fix lock inversion between ioc lock and bfqd lock
> bfq: Remove merged request already in bfq_requests_merged()
> block, bfq: avoid circular stable merges
> bfq: remove unnecessary BFQ_DEFAULT_GRP_IOPRIO
> bfq: reset entity->prio_changed in bfq_init_entity()
> bfq: optimize the calculation of bfq_weight_to_ioprio()
> bfq: remove unnecessary initialization logic
> bfq: keep the minimun bandwidth for CLASS_BE
> bfq: limit the IO depth of CLASS_IDLE to 1
> bfq: convert the type of bfq_group.bfqd to bfq_data*
> bfq: introduce bfq_entity_to_bfqg helper method
> bfq/mq-deadline: remove redundant check for passthrough request
> blk-mq: bypass IO scheduler's limit_depth for passthrough request
> block,bfq: fix the timeout calculation in bfq_bfqq_charge_time
> block, bfq: merge bursts of newly-created queues
> block, bfq: keep shared queues out of the waker mechanism
> block, bfq: fix weight-raising resume with !low_latency
> block, bfq: make shared queues inherit wakers
> block, bfq: put reqs of waker and woken in dispatch list
> block, bfq: always inject I/O of queues blocked by wakers
> ```
>
> but nothing from there triggered this for quite some time.
>
> Paolo, what do you think?
>
> --
> Oleksandr Natalenko (post-factum)
>
>
Hi,
CCing also Jan and Khazhy, because in your commit log I see also the
commit on bfq_requests_merged().
Is this OOPS reproducible for you?
Thanks,
Paolo
> Il giorno 21 giu 2021, alle ore 21:55, Oleksandr Natalenko <[email protected]> ha scritto:
>
> Hello.
>
> On sobota 19. června 2021 16:09:41 CEST Paolo Valente wrote:
>> Hi Jens,
>> this series contains an already proposed patch by Luca, plus six new
>> patches. The goals of these patches are summarized in the subject of
>> this cover letter. I'm including Luca's patch here, because it enabled
>> the actual use of stable merge, and, as such, triggered an otherwise
>> silent bug. This series contains also the fix for that bug ("block,
>> bfq: avoid delayed merge of async queues"), tested by Holger [1].
>>
>> Thanks,
>> Paolo
>>
>> [1] https://lkml.org/lkml/2021/5/18/384
>>
>> Luca Mariotti (1):
>> block, bfq: fix delayed stable merge check
>>
>> Paolo Valente (5):
>> block, bfq: let also stably merged queues enjoy weight raising
>> block, bfq: consider also creation time in delayed stable merge
>> block, bfq: avoid delayed merge of async queues
>> block, bfq: check waker only for queues with no in-flight I/O
>> block, bfq: reset waker pointer with shared queues
>>
>> Pietro Pedroni (1):
>> block, bfq: boost throughput by extending queue-merging times
>>
>> block/bfq-iosched.c | 68 +++++++++++++++++++++++++++++++++++----------
>> 1 file changed, 53 insertions(+), 15 deletions(-)
>>
>> --
>> 2.20.1
>
> Not sure everything goes fine here. After applying this series on top of the
> latest stable 5.12 kernel I got this:
>
> ```
> [16730.963248] kernel BUG at block/elevator.c:236!
> [16730.963254] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [16730.963257] CPU: 11 PID: 109170 Comm: kworker/u64:5 Tainted: G W
> 5.12.0-pf7 #1
> [16730.963260] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
> 3601 05/26/2021
> [16730.963263] Workqueue: dm-thin do_worker [dm_thin_pool]
> [16730.963270] RIP: 0010:elv_rqhash_find+0xcc/0xd0
> [16730.963274] Code: 41 89 f0 81 e2 00 40 06 00 41 81 e0 1a 00 04 00 44 09 c2
> 75 a9 be 09 00 00 00 c4 e2 4b f7 50 28 48 03 50 30 48 39 fa 75 c6 c3 <0f> 0b
> 66 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 48 8b 47 68 48
> [16730.963276] RSP: 0018:ffffa558d13b7af8 EFLAGS: 00010046
> [16730.963279] RAX: ffff8a0007782d00 RBX: ffff8a0014b93000 RCX: ffffa558d13b7b78
> [16730.963281] RDX: ffff8a0014b93000 RSI: 0000000000063082 RDI: 000000001e0fdc00
> [16730.963283] RBP: ffff8a000731c770 R08: ffff8a000731c770 R09: fffffff0ffffddfb
> [16730.963284] R10: 0000000000000000 R11: 0000000000000400 R12: ffff8a0330365c00
> [16730.963286] R13: ffffa558d13b7b30 R14: 0000000000000000 R15: ffff8a0212fc4000
> [16730.963288] FS: 0000000000000000(0000) GS:ffff8a070ecc0000(0000) knlGS:
> 0000000000000000
> [16730.963290] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16730.963292] CR2: 00007f1514d90f4c CR3: 0000000315952000 CR4:
> 0000000000350ee0
> [16730.963294] Call Trace:
> [16730.963297] elv_merge+0x96/0x120
> [16730.963300] blk_mq_sched_try_merge+0x3e/0x370
> [16730.963303] bfq_bio_merge+0xd3/0x130
> [16730.963306] blk_mq_submit_bio+0x11e/0x6c0
> [16730.963309] submit_bio_noacct+0x457/0x530
> [16730.963312] raid10_unplug+0x13f/0x1a0 [raid10]
> [16730.963316] blk_flush_plug_list+0xa9/0x110
> [16730.963319] blk_finish_plug+0x21/0x30
> [16730.963322] process_prepared_discard_passdown_pt1+0x204/0x2d0
> [dm_thin_pool]
> [16730.963327] do_worker+0x18e/0xce0 [dm_thin_pool]
> [16730.963335] process_one_work+0x217/0x3e0
> [16730.963338] worker_thread+0x4d/0x470
> [16730.963343] kthread+0x182/0x1b0
> [16730.963349] ret_from_fork+0x22/0x30
> …
> [16730.963419] ---[ end trace dd7e037f2028257b ]---
> [16730.963524] RIP: 0010:elv_rqhash_find+0xcc/0xd0
> …
> [16730.963547] note: kworker/u64:5[109170] exited with preempt_count 1
> [16747.948467] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
> fences timed out!
> ```
>
> Which is:
>
> ```
> 229 struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
> 230 {
> …
> 235 hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
> 236 BUG_ON(!ELV_ON_HASH(rq));
> …
> ```
>
> Yes, I carry some extra patches besides this series (the list is against v5.12
> GA):
>
> ```
> block, bfq: reset waker pointer with shared queues
> block, bfq: check waker only for queues with no in-flight I/O
> block, bfq: avoid delayed merge of async queues
> block, bfq: boost throughput by extending queue-merging times
> block, bfq: consider also creation time in delayed stable merge
> block, bfq: fix delayed stable merge check
> block, bfq: let also stably merged queues enjoy weight raising
> block: Do not pull requests from the scheduler when we cannot dispatch them
> blk: Fix lock inversion between ioc lock and bfqd lock
> bfq: Remove merged request already in bfq_requests_merged()
> block, bfq: avoid circular stable merges
> bfq: remove unnecessary BFQ_DEFAULT_GRP_IOPRIO
> bfq: reset entity->prio_changed in bfq_init_entity()
> bfq: optimize the calculation of bfq_weight_to_ioprio()
> bfq: remove unnecessary initialization logic
> bfq: keep the minimun bandwidth for CLASS_BE
> bfq: limit the IO depth of CLASS_IDLE to 1
> bfq: convert the type of bfq_group.bfqd to bfq_data*
> bfq: introduce bfq_entity_to_bfqg helper method
> bfq/mq-deadline: remove redundant check for passthrough request
> blk-mq: bypass IO scheduler's limit_depth for passthrough request
> block,bfq: fix the timeout calculation in bfq_bfqq_charge_time
> block, bfq: merge bursts of newly-created queues
> block, bfq: keep shared queues out of the waker mechanism
> block, bfq: fix weight-raising resume with !low_latency
> block, bfq: make shared queues inherit wakers
> block, bfq: put reqs of waker and woken in dispatch list
> block, bfq: always inject I/O of queues blocked by wakers
> ```
>
> but nothing from there triggered this for quite some time.
>
> Paolo, what do you think?
>
> --
> Oleksandr Natalenko (post-factum)
>
>
Hello.
On úterý 22. června 2021 9:08:43 CEST Paolo Valente wrote:
> CCing also Jan and Khazhy, because in your commit log I see also the
> commit on bfq_requests_merged().
>
> Is this OOPS reproducible for you?
No, I haven't found a reproducer, at least yet. It took half a day of uptime
to hit this, so might not be that easy.
--
Oleksandr Natalenko (post-factum)
On Tue 22-06-21 09:35:05, Oleksandr Natalenko wrote:
> Hello.
>
> On úterý 22. června 2021 9:08:43 CEST Paolo Valente wrote:
> > CCing also Jan and Khazhy, because in your commit log I see also the
> > commit on bfq_requests_merged().
> >
> > Is this OOPS reproducible for you?
>
> No, I haven't found a reproducer, at least yet. It took half a day of uptime
> to hit this, so might not be that easy.
Hum, if you can acquire a crash dump it would be the easiest I guess. We'd
need to find out more about the request we crash on - whether it's
otherwise valid, in what state it is etc...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
Ahoj.
On úterý 22. června 2021 18:29:48 CEST Jan Kara wrote:
> On Tue 22-06-21 09:35:05, Oleksandr Natalenko wrote:
> > On úterý 22. června 2021 9:08:43 CEST Paolo Valente wrote:
> > > CCing also Jan and Khazhy, because in your commit log I see also the
> > > commit on bfq_requests_merged().
> > >
> > > Is this OOPS reproducible for you?
> >
> > No, I haven't found a reproducer, at least yet. It took half a day of
> > uptime to hit this, so might not be that easy.
>
> Hum, if you can acquire a crash dump it would be the easiest I guess. We'd
> need to find out more about the request we crash on - whether it's
> otherwise valid, in what state it is etc...
Yes, I understand that. Once (and if) I reproduce it reliably, of course.
Thanks.
--
Oleksandr Natalenko (post-factum)
Hello.
On úterý 22. června 2021 18:29:48 CEST Jan Kara wrote:
> On Tue 22-06-21 09:35:05, Oleksandr Natalenko wrote:
> > On úterý 22. června 2021 9:08:43 CEST Paolo Valente wrote:
> > > CCing also Jan and Khazhy, because in your commit log I see also the
> > > commit on bfq_requests_merged().
> > >
> > > Is this OOPS reproducible for you?
> >
> > No, I haven't found a reproducer, at least yet. It took half a day of
> > uptime to hit this, so might not be that easy.
>
> Hum, if you can acquire a crash dump it would be the easiest I guess. We'd
> need to find out more about the request we crash on - whether it's
> otherwise valid, in what state it is etc...
Still have no reliable reproducer and no vmcore, however I'm running v5.13
with the following patches applied on top of it:
```
blk: Fix lock inversion between ioc lock and bfqd lock
bfq: Remove merged request already in bfq_requests_merged()
block: Remove unnecessary elevator operation checks
block: Do not pull requests from the scheduler when we cannot dispatch them
block, bfq: reset waker pointer with shared queues
block, bfq: check waker only for queues with no in-flight I/O
block, bfq: avoid delayed merge of async queues
block, bfq: boost throughput by extending queue-merging times
block, bfq: consider also creation time in delayed stable merge
block, bfq: fix delayed stable merge check
block, bfq: let also stably merged queues enjoy weight raising
```
and just got the following crash:
```
[60313.522570] ------------[ cut here ]------------
[60313.522579] WARNING: CPU: 20 PID: 388 at arch/x86/include/asm/kfence.h:44
kfence_protect_page+0x39/0xc0
[60313.522586] Modules linked in: sctp ip6_udp_tunnel udp_tunnel uinput
netconsole blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
nfs lockd grace sunrpc fscache netfs rfcomm nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 cmac algif_hash algif_skcipher nf_tables af_alg bnep tun
nfnetlink nls_iso8859_1 intel_rapl_msr vfat intel_rapl_common iwlmvm fat
mac80211 edac_mce_amd libarc4 btusb eeepc_wmi btrtl asus_wmi iwlwifi btbcm
snd_usb_audio sparse_keymap kvm_amd video wmi_bmof mxm_wmi btintel uvcvideo
snd_hda_codec_realtek videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib kvm
snd_hda_codec_generic bluetooth videobuf2_v4l2 ledtrig_audio
snd_hda_codec_hdmi joydev snd_rawmidi ecdh_generic irqbypass ecc snd_hda_intel
mousedev pl2303 cfg80211 snd_seq_device videobuf2_common crc16 rapl k10temp
snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec igb r8169 sp5100_tco
snd_hda_core realtek i2c_piix4 ipmi_devintf dca snd_hwdep mdio_devres rfkill
snd_pcm libphy ipmi_msghandler wmi
[60313.522630] pinctrl_amd mac_hid acpi_cpufreq tcp_bbr2 vhost_vsock
vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback
videodev mc snd_hrtimer snd_timer snd soundcore nct6775 hwmon_vid crypto_user
fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison
dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted
asn1_encoder tee hid_logitech_hidpp hid_logitech_dj usbhid dm_mod
crct10dif_pclmul crc32_pclmul crc32c_intel raid10 ghash_clmulni_intel
aesni_intel md_mod crypto_simd cryptd amdgpu ccp xhci_pci xhci_pci_renesas
tpm_crb tpm_tis tpm_tis_core tpm rng_core drm_ttm_helper ttm gpu_sched
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec
drm agpgart
[60313.522665] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
5.13.0-pf2 #1
[60313.522668] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
3601 05/26/2021
[60313.522671] Workqueue: kblockd blk_mq_run_work_fn
[60313.522675] RIP: 0010:kfence_protect_page+0x39/0xc0
[60313.522679] Code: 04 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 c7 44
24 04 00 00 00 00 e8 83 20 d5 ff 48 85 c0 74 07 83 7c 24 04 01 74 06 <0f> 0b 31
c0 eb 4c 48 8b 38 48 89 c2 84 db 75 59 48 89 f8 0f 1f 40
[60313.522682] RSP: 0018:ffffb559c0affb28 EFLAGS: 00010046
[60313.522684] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb559c0affb2c
[60313.522687] RDX: ffffb559c0affb2c RSI: 0000000000000000 RDI: 0000000000000000
[60313.522690] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[60313.522692] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000002
[60313.522694] R13: ffffb559c0affc28 R14: 00000000c0affc01 R15: 0000000000000000
[60313.522696] FS: 0000000000000000(0000) GS:ffff8cf44ef00000(0000) knlGS:
0000000000000000
[60313.522698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60313.522700] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
0000000000350ee0
[60313.522702] Call Trace:
[60313.522707] kfence_handle_page_fault+0xa6/0x280
[60313.522710] page_fault_oops+0x9d/0x2d0
[60313.522714] exc_page_fault+0x78/0x180
[60313.522718] asm_exc_page_fault+0x1e/0x30
[60313.522721] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
[60313.522725] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00 4c 39
bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33 <ff> 80 20 01
00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
[60313.522727] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
[60313.522729] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX: 0000000000000000
[60313.522731] RDX: 0000000000000000 RSI: ffff8ced4ad90000 RDI: ffff8ced52fc9f40
[60313.522733] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[60313.522735] R10: 000000000000003f R11: 0000000000000000 R12: ffff8cf20e5a5400
[60313.522737] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420 R15: ffff8cf0e7e91c70
[60313.522741] ? mod_delayed_work_on+0x71/0xe0
[60313.522745] ? __sbitmap_get_word+0x30/0x80
[60313.522748] __blk_mq_do_dispatch_sched+0x218/0x320
[60313.522752] __blk_mq_sched_dispatch_requests+0x107/0x150
[60313.522755] blk_mq_sched_dispatch_requests+0x2f/0x60
[60313.522758] blk_mq_run_work_fn+0x43/0xc0
[60313.522761] process_one_work+0x24e/0x430
[60313.522765] worker_thread+0x54/0x4d0
[60313.522767] ? process_one_work+0x430/0x430
[60313.522770] kthread+0x182/0x1b0
[60313.522773] ? __kthread_init_worker+0x50/0x50
[60313.522776] ret_from_fork+0x22/0x30
[60313.522781] ---[ end trace 55ef262e614b59af ]---
[60313.522786] ------------[ cut here ]------------
[60313.522787] WARNING: CPU: 20 PID: 388 at mm/kfence/core.c:135
kfence_handle_page_fault+0xaa/0x280
[60313.522791] Modules linked in: sctp ip6_udp_tunnel udp_tunnel uinput
netconsole blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
nfs lockd grace sunrpc fscache netfs rfcomm nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 cmac algif_hash algif_skcipher nf_tables af_alg bnep tun
nfnetlink nls_iso8859_1 intel_rapl_msr vfat intel_rapl_common iwlmvm fat
mac80211 edac_mce_amd libarc4 btusb eeepc_wmi btrtl asus_wmi iwlwifi btbcm
snd_usb_audio sparse_keymap kvm_amd video wmi_bmof mxm_wmi btintel uvcvideo
snd_hda_codec_realtek videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib kvm
snd_hda_codec_generic bluetooth videobuf2_v4l2 ledtrig_audio
snd_hda_codec_hdmi joydev snd_rawmidi ecdh_generic irqbypass ecc snd_hda_intel
mousedev pl2303 cfg80211 snd_seq_device videobuf2_common crc16 rapl k10temp
snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec igb r8169 sp5100_tco
snd_hda_core realtek i2c_piix4 ipmi_devintf dca snd_hwdep mdio_devres rfkill
snd_pcm libphy ipmi_msghandler wmi
[60313.522817] pinctrl_amd mac_hid acpi_cpufreq tcp_bbr2 vhost_vsock
vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback
videodev mc snd_hrtimer snd_timer snd soundcore nct6775 hwmon_vid crypto_user
fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison
dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted
asn1_encoder tee hid_logitech_hidpp hid_logitech_dj usbhid dm_mod
crct10dif_pclmul crc32_pclmul crc32c_intel raid10 ghash_clmulni_intel
aesni_intel md_mod crypto_simd cryptd amdgpu ccp xhci_pci xhci_pci_renesas
tpm_crb tpm_tis tpm_tis_core tpm rng_core drm_ttm_helper ttm gpu_sched
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec
drm agpgart
[60313.522840] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
5.13.0-pf2 #1
[60313.522843] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
3601 05/26/2021
[60313.522845] Workqueue: kblockd blk_mq_run_work_fn
[60313.522848] RIP: 0010:kfence_handle_page_fault+0xaa/0x280
[60313.522851] Code: 0f 86 d4 00 00 00 0f b6 f3 41 b8 03 00 00 00 31 c9 4c 89
ea 48 89 ef e8 e4 05 00 00 31 f6 4c 89 ff e8 6a f5 ff ff 84 c0 75 8d <0f> 0b c6
05 7d fd 6b 01 00 45 31 f6 e9 7c ff ff ff 48 8b 0d 36 a0
[60313.522853] RSP: 0018:ffffb559c0affb50 EFLAGS: 00010046
[60313.522855] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb559c0affb2c
[60313.522857] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[60313.522859] RBP: 0000000000000120 R08: 0000000000000000 R09:
0000000000000000
[60313.522860] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000002
[60313.522862] R13: ffffb559c0affc28 R14: 00000000c0affc01 R15: 0000000000000000
[60313.522864] FS: 0000000000000000(0000) GS:ffff8cf44ef00000(0000) knlGS:
0000000000000000
[60313.522866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60313.522868] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
0000000000350ee0
[60313.522870] Call Trace:
[60313.522872] page_fault_oops+0x9d/0x2d0
[60313.522875] exc_page_fault+0x78/0x180
[60313.522878] asm_exc_page_fault+0x1e/0x30
[60313.522880] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
[60313.522883] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00 4c 39
bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33 <ff> 80 20 01
00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
[60313.522885] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
[60313.522887] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX: 0000000000000000
[60313.522889] RDX: 0000000000000000 RSI: ffff8ced4ad90000 RDI: ffff8ced52fc9f40
[60313.522890] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[60313.522892] R10: 000000000000003f R11: 0000000000000000 R12: ffff8cf20e5a5400
[60313.523148] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420 R15: ffff8cf0e7e91c70
[60313.523150] ? mod_delayed_work_on+0x71/0xe0
[60313.523153] ? __sbitmap_get_word+0x30/0x80
[60313.523157] __blk_mq_do_dispatch_sched+0x218/0x320
[60313.523161] __blk_mq_sched_dispatch_requests+0x107/0x150
[60313.523165] blk_mq_sched_dispatch_requests+0x2f/0x60
[60313.523167] blk_mq_run_work_fn+0x43/0xc0
[60313.523170] process_one_work+0x24e/0x430
[60313.523173] worker_thread+0x54/0x4d0
[60313.523176] ? process_one_work+0x430/0x430
[60313.523178] kthread+0x182/0x1b0
[60313.523181] ? __kthread_init_worker+0x50/0x50
[60313.523183] ret_from_fork+0x22/0x30
[60313.523187] ---[ end trace 55ef262e614b59b0 ]---
[60313.523189] BUG: kernel NULL pointer dereference, address: 0000000000000120
[60313.523191] #PF: supervisor write access in kernel mode
[60313.523193] #PF: error_code(0x0002) - not-present page
[60313.523195] PGD 0 P4D 0
[60313.523197] Oops: 0002 [#1] PREEMPT SMP NOPTI
[60313.523200] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
5.13.0-pf2 #1
[60313.523202] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
3601 05/26/2021
[60313.523204] Workqueue: kblockd blk_mq_run_work_fn
[60313.523207] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
[60313.523210] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00 4c 39
bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33 <ff> 80 20 01
00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
[60313.523213] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
[60313.523215] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX: 0000000000000000
[60313.523216] RDX: 0000000000000000 RSI: ffff8ced4ad90000 RDI: ffff8ced52fc9f40
[60313.523218] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[60313.523220] R10: 000000000000003f R11: 0000000000000000 R12: ffff8cf20e5a5400
[60313.523221] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420 R15: ffff8cf0e7e91c70
[60313.523223] FS: 0000000000000000(0000) GS:ffff8cf44ef00000(0000) knlGS:
0000000000000000
[60313.523225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60313.523227] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
0000000000350ee0
[60313.523229] Call Trace:
[60313.523231] ? mod_delayed_work_on+0x71/0xe0
[60313.523233] ? __sbitmap_get_word+0x30/0x80
[60313.523237] __blk_mq_do_dispatch_sched+0x218/0x320
[60313.523240] __blk_mq_sched_dispatch_requests+0x107/0x150
[60313.523243] blk_mq_sched_dispatch_requests+0x2f/0x60
[60313.523246] blk_mq_run_work_fn+0x43/0xc0
[60313.523249] process_one_work+0x24e/0x430
[60313.523251] worker_thread+0x54/0x4d0
[60313.523254] ? process_one_work+0x430/0x430
[60313.523257] kthread+0x182/0x1b0
[60313.523259] ? __kthread_init_worker+0x50/0x50
[60313.523261] ret_from_fork+0x22/0x30
[60313.523265] Modules linked in: sctp ip6_udp_tunnel udp_tunnel uinput
netconsole blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
nfs lockd grace sunrpc fscache netfs rfcomm nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 cmac algif_hash algif_skcipher nf_tables af_alg bnep tun
nfnetlink nls_iso8859_1 intel_rapl_msr vfat intel_rapl_common iwlmvm fat
mac80211 edac_mce_amd libarc4 btusb eeepc_wmi btrtl asus_wmi iwlwifi btbcm
snd_usb_audio sparse_keymap kvm_amd video wmi_bmof mxm_wmi btintel uvcvideo
snd_hda_codec_realtek videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib kvm
snd_hda_codec_generic bluetooth videobuf2_v4l2 ledtrig_audio
snd_hda_codec_hdmi joydev snd_rawmidi ecdh_generic irqbypass ecc snd_hda_intel
mousedev pl2303 cfg80211 snd_seq_device videobuf2_common crc16 rapl k10temp
snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec igb r8169 sp5100_tco
snd_hda_core realtek i2c_piix4 ipmi_devintf dca snd_hwdep mdio_devres rfkill
snd_pcm libphy ipmi_msghandler wmi
[60313.523290] pinctrl_amd mac_hid acpi_cpufreq tcp_bbr2 vhost_vsock
vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback
videodev mc snd_hrtimer snd_timer snd soundcore nct6775 hwmon_vid crypto_user
fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison
dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted
asn1_encoder tee hid_logitech_hidpp hid_logitech_dj usbhid dm_mod
crct10dif_pclmul crc32_pclmul crc32c_intel raid10 ghash_clmulni_intel
aesni_intel md_mod crypto_simd cryptd amdgpu ccp xhci_pci xhci_pci_renesas
tpm_crb tpm_tis tpm_tis_core tpm rng_core drm_ttm_helper ttm gpu_sched
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec
drm agpgart
[60313.523314] CR2: 0000000000000120
[60313.523316] ---[ end trace 55ef262e614b59b1 ]---
[60313.523318] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
[60313.523321] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00 4c 39
bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33 <ff> 80 20 01
00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
[60313.523323] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
[60313.523325] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX: 0000000000000000
[60313.523326] RDX: 0000000000000000 RSI: ffff8ced4ad90000 RDI: ffff8ced52fc9f40
[60313.523328] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[60313.523330] R10: 000000000000003f R11: 0000000000000000 R12: ffff8cf20e5a5400
[60313.523332] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420 R15: ffff8cf0e7e91c70
[60313.523334] FS: 0000000000000000(0000) GS:ffff8cf44ef00000(0000) knlGS:
0000000000000000
[60313.523336] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60313.523338] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
0000000000350ee0
[60313.523339] note: kworker/20:1H[388] exited with preempt_count 1
```
--
Oleksandr Natalenko (post-factum)
Hello.
On sobota 3. července 2021 0:07:53 CEST Oleksandr Natalenko wrote:
> On úterý 22. června 2021 18:29:48 CEST Jan Kara wrote:
> > On Tue 22-06-21 09:35:05, Oleksandr Natalenko wrote:
> > > On úterý 22. června 2021 9:08:43 CEST Paolo Valente wrote:
> > > > CCing also Jan and Khazhy, because in your commit log I see also the
> > > > commit on bfq_requests_merged().
> > > >
> > > > Is this OOPS reproducible for you?
> > >
> > > No, I haven't found a reproducer, at least yet. It took half a day of
> > > uptime to hit this, so might not be that easy.
> >
> > Hum, if you can acquire a crash dump it would be the easiest I guess. We'd
> > need to find out more about the request we crash on - whether it's
> > otherwise valid, in what state it is etc...
>
> Still have no reliable reproducer and no vmcore, however I'm running v5.13
> with the following patches applied on top of it:
>
> ```
> blk: Fix lock inversion between ioc lock and bfqd lock
> bfq: Remove merged request already in bfq_requests_merged()
> block: Remove unnecessary elevator operation checks
> block: Do not pull requests from the scheduler when we cannot dispatch them
> block, bfq: reset waker pointer with shared queues
> block, bfq: check waker only for queues with no in-flight I/O
> block, bfq: avoid delayed merge of async queues
> block, bfq: boost throughput by extending queue-merging times
> block, bfq: consider also creation time in delayed stable merge
> block, bfq: fix delayed stable merge check
> block, bfq: let also stably merged queues enjoy weight raising
> ```
>
> and just got the following crash:
>
> ```
> [60313.522570] ------------[ cut here ]------------
> [60313.522579] WARNING: CPU: 20 PID: 388 at arch/x86/include/asm/kfence.h:44
> kfence_protect_page+0x39/0xc0
> [60313.522586] Modules linked in: sctp ip6_udp_tunnel udp_tunnel uinput
> netconsole blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
> nfs lockd grace sunrpc fscache netfs rfcomm nft_ct nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 cmac algif_hash algif_skcipher nf_tables
> af_alg bnep tun nfnetlink nls_iso8859_1 intel_rapl_msr vfat
> intel_rapl_common iwlmvm fat mac80211 edac_mce_amd libarc4 btusb eeepc_wmi
> btrtl asus_wmi iwlwifi btbcm snd_usb_audio sparse_keymap kvm_amd video
> wmi_bmof mxm_wmi btintel uvcvideo snd_hda_codec_realtek videobuf2_vmalloc
> videobuf2_memops snd_usbmidi_lib kvm snd_hda_codec_generic bluetooth
> videobuf2_v4l2 ledtrig_audio
> snd_hda_codec_hdmi joydev snd_rawmidi ecdh_generic irqbypass ecc
> snd_hda_intel mousedev pl2303 cfg80211 snd_seq_device videobuf2_common
> crc16 rapl k10temp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec igb
> r8169 sp5100_tco snd_hda_core realtek i2c_piix4 ipmi_devintf dca snd_hwdep
> mdio_devres rfkill snd_pcm libphy ipmi_msghandler wmi
> [60313.522630] pinctrl_amd mac_hid acpi_cpufreq tcp_bbr2 vhost_vsock
> vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback
> videodev mc snd_hrtimer snd_timer snd soundcore nct6775 hwmon_vid
> crypto_user fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data
> dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys
> trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj usbhid dm_mod
> crct10dif_pclmul crc32_pclmul crc32c_intel raid10 ghash_clmulni_intel
> aesni_intel md_mod crypto_simd cryptd amdgpu ccp xhci_pci xhci_pci_renesas
> tpm_crb tpm_tis tpm_tis_core tpm rng_core drm_ttm_helper ttm gpu_sched
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> cec drm agpgart
> [60313.522665] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
> 5.13.0-pf2 #1
> [60313.522668] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
> 3601 05/26/2021
> [60313.522671] Workqueue: kblockd blk_mq_run_work_fn
> [60313.522675] RIP: 0010:kfence_protect_page+0x39/0xc0
> [60313.522679] Code: 04 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 c7
> 44 24 04 00 00 00 00 e8 83 20 d5 ff 48 85 c0 74 07 83 7c 24 04 01 74 06
> <0f> 0b 31 c0 eb 4c 48 8b 38 48 89 c2 84 db 75 59 48 89 f8 0f 1f 40
> [60313.522682] RSP: 0018:ffffb559c0affb28 EFLAGS: 00010046
> [60313.522684] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> ffffb559c0affb2c [60313.522687] RDX: ffffb559c0affb2c RSI: 0000000000000000
> RDI: 0000000000000000 [60313.522690] RBP: 0000000000000000 R08:
> 0000000000000000 R09:
> 0000000000000000
> [60313.522692] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000002
> [60313.522694] R13: ffffb559c0affc28 R14: 00000000c0affc01 R15:
> 0000000000000000 [60313.522696] FS: 0000000000000000(0000)
> GS:ffff8cf44ef00000(0000) knlGS: 0000000000000000
> [60313.522698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60313.522700] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
> 0000000000350ee0
> [60313.522702] Call Trace:
> [60313.522707] kfence_handle_page_fault+0xa6/0x280
> [60313.522710] page_fault_oops+0x9d/0x2d0
> [60313.522714] exc_page_fault+0x78/0x180
> [60313.522718] asm_exc_page_fault+0x1e/0x30
> [60313.522721] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
> [60313.522725] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00
> 4c 39 bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33
> <ff> 80 20 01 00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
> [60313.522727] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
> [60313.522729] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX:
> 0000000000000000 [60313.522731] RDX: 0000000000000000 RSI: ffff8ced4ad90000
> RDI: ffff8ced52fc9f40 [60313.522733] RBP: 0000000000000000 R08:
> 0000000000000001 R09:
> 0000000000000000
> [60313.522735] R10: 000000000000003f R11: 0000000000000000 R12:
> ffff8cf20e5a5400 [60313.522737] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420
> R15: ffff8cf0e7e91c70 [60313.522741] ? mod_delayed_work_on+0x71/0xe0
> [60313.522745] ? __sbitmap_get_word+0x30/0x80
> [60313.522748] __blk_mq_do_dispatch_sched+0x218/0x320
> [60313.522752] __blk_mq_sched_dispatch_requests+0x107/0x150
> [60313.522755] blk_mq_sched_dispatch_requests+0x2f/0x60
> [60313.522758] blk_mq_run_work_fn+0x43/0xc0
> [60313.522761] process_one_work+0x24e/0x430
> [60313.522765] worker_thread+0x54/0x4d0
> [60313.522767] ? process_one_work+0x430/0x430
> [60313.522770] kthread+0x182/0x1b0
> [60313.522773] ? __kthread_init_worker+0x50/0x50
> [60313.522776] ret_from_fork+0x22/0x30
> [60313.522781] ---[ end trace 55ef262e614b59af ]---
> [60313.522786] ------------[ cut here ]------------
> [60313.522787] WARNING: CPU: 20 PID: 388 at mm/kfence/core.c:135
> kfence_handle_page_fault+0xaa/0x280
> [60313.522791] Modules linked in: sctp ip6_udp_tunnel udp_tunnel uinput
> netconsole blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
> nfs lockd grace sunrpc fscache netfs rfcomm nft_ct nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 cmac algif_hash algif_skcipher nf_tables
> af_alg bnep tun nfnetlink nls_iso8859_1 intel_rapl_msr vfat
> intel_rapl_common iwlmvm fat mac80211 edac_mce_amd libarc4 btusb eeepc_wmi
> btrtl asus_wmi iwlwifi btbcm snd_usb_audio sparse_keymap kvm_amd video
> wmi_bmof mxm_wmi btintel uvcvideo snd_hda_codec_realtek videobuf2_vmalloc
> videobuf2_memops snd_usbmidi_lib kvm snd_hda_codec_generic bluetooth
> videobuf2_v4l2 ledtrig_audio
> snd_hda_codec_hdmi joydev snd_rawmidi ecdh_generic irqbypass ecc
> snd_hda_intel mousedev pl2303 cfg80211 snd_seq_device videobuf2_common
> crc16 rapl k10temp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec igb
> r8169 sp5100_tco snd_hda_core realtek i2c_piix4 ipmi_devintf dca snd_hwdep
> mdio_devres rfkill snd_pcm libphy ipmi_msghandler wmi
> [60313.522817] pinctrl_amd mac_hid acpi_cpufreq tcp_bbr2 vhost_vsock
> vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback
> videodev mc snd_hrtimer snd_timer snd soundcore nct6775 hwmon_vid
> crypto_user fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data
> dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys
> trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj usbhid dm_mod
> crct10dif_pclmul crc32_pclmul crc32c_intel raid10 ghash_clmulni_intel
> aesni_intel md_mod crypto_simd cryptd amdgpu ccp xhci_pci xhci_pci_renesas
> tpm_crb tpm_tis tpm_tis_core tpm rng_core drm_ttm_helper ttm gpu_sched
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> cec drm agpgart
> [60313.522840] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
> 5.13.0-pf2 #1
> [60313.522843] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
> 3601 05/26/2021
> [60313.522845] Workqueue: kblockd blk_mq_run_work_fn
> [60313.522848] RIP: 0010:kfence_handle_page_fault+0xaa/0x280
> [60313.522851] Code: 0f 86 d4 00 00 00 0f b6 f3 41 b8 03 00 00 00 31 c9 4c
> 89 ea 48 89 ef e8 e4 05 00 00 31 f6 4c 89 ff e8 6a f5 ff ff 84 c0 75 8d
> <0f> 0b c6 05 7d fd 6b 01 00 45 31 f6 e9 7c ff ff ff 48 8b 0d 36 a0
> [60313.522853] RSP: 0018:ffffb559c0affb50 EFLAGS: 00010046
> [60313.522855] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> ffffb559c0affb2c [60313.522857] RDX: 0000000000000000 RSI: 0000000000000000
> RDI:
> 0000000000000000
> [60313.522859] RBP: 0000000000000120 R08: 0000000000000000 R09:
> 0000000000000000
> [60313.522860] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000002
> [60313.522862] R13: ffffb559c0affc28 R14: 00000000c0affc01 R15:
> 0000000000000000 [60313.522864] FS: 0000000000000000(0000)
> GS:ffff8cf44ef00000(0000) knlGS: 0000000000000000
> [60313.522866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60313.522868] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
> 0000000000350ee0
> [60313.522870] Call Trace:
> [60313.522872] page_fault_oops+0x9d/0x2d0
> [60313.522875] exc_page_fault+0x78/0x180
> [60313.522878] asm_exc_page_fault+0x1e/0x30
> [60313.522880] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
> [60313.522883] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00
> 4c 39 bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33
> <ff> 80 20 01 00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
> [60313.522885] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
> [60313.522887] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX:
> 0000000000000000 [60313.522889] RDX: 0000000000000000 RSI: ffff8ced4ad90000
> RDI: ffff8ced52fc9f40 [60313.522890] RBP: 0000000000000000 R08:
> 0000000000000001 R09:
> 0000000000000000
> [60313.522892] R10: 000000000000003f R11: 0000000000000000 R12:
> ffff8cf20e5a5400 [60313.523148] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420
> R15: ffff8cf0e7e91c70 [60313.523150] ? mod_delayed_work_on+0x71/0xe0
> [60313.523153] ? __sbitmap_get_word+0x30/0x80
> [60313.523157] __blk_mq_do_dispatch_sched+0x218/0x320
> [60313.523161] __blk_mq_sched_dispatch_requests+0x107/0x150
> [60313.523165] blk_mq_sched_dispatch_requests+0x2f/0x60
> [60313.523167] blk_mq_run_work_fn+0x43/0xc0
> [60313.523170] process_one_work+0x24e/0x430
> [60313.523173] worker_thread+0x54/0x4d0
> [60313.523176] ? process_one_work+0x430/0x430
> [60313.523178] kthread+0x182/0x1b0
> [60313.523181] ? __kthread_init_worker+0x50/0x50
> [60313.523183] ret_from_fork+0x22/0x30
> [60313.523187] ---[ end trace 55ef262e614b59b0 ]---
> [60313.523189] BUG: kernel NULL pointer dereference, address:
> 0000000000000120 [60313.523191] #PF: supervisor write access in kernel mode
> [60313.523193] #PF: error_code(0x0002) - not-present page
> [60313.523195] PGD 0 P4D 0
> [60313.523197] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [60313.523200] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
> 5.13.0-pf2 #1
> [60313.523202] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
> 3601 05/26/2021
> [60313.523204] Workqueue: kblockd blk_mq_run_work_fn
> [60313.523207] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
> [60313.523210] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00
> 4c 39 bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33
> <ff> 80 20 01 00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
> [60313.523213] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
> [60313.523215] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX:
> 0000000000000000 [60313.523216] RDX: 0000000000000000 RSI: ffff8ced4ad90000
> RDI: ffff8ced52fc9f40 [60313.523218] RBP: 0000000000000000 R08:
> 0000000000000001 R09:
> 0000000000000000
> [60313.523220] R10: 000000000000003f R11: 0000000000000000 R12:
> ffff8cf20e5a5400 [60313.523221] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420
> R15: ffff8cf0e7e91c70 [60313.523223] FS: 0000000000000000(0000)
> GS:ffff8cf44ef00000(0000) knlGS: 0000000000000000
> [60313.523225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60313.523227] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
> 0000000000350ee0
> [60313.523229] Call Trace:
> [60313.523231] ? mod_delayed_work_on+0x71/0xe0
> [60313.523233] ? __sbitmap_get_word+0x30/0x80
> [60313.523237] __blk_mq_do_dispatch_sched+0x218/0x320
> [60313.523240] __blk_mq_sched_dispatch_requests+0x107/0x150
> [60313.523243] blk_mq_sched_dispatch_requests+0x2f/0x60
> [60313.523246] blk_mq_run_work_fn+0x43/0xc0
> [60313.523249] process_one_work+0x24e/0x430
> [60313.523251] worker_thread+0x54/0x4d0
> [60313.523254] ? process_one_work+0x430/0x430
> [60313.523257] kthread+0x182/0x1b0
> [60313.523259] ? __kthread_init_worker+0x50/0x50
> [60313.523261] ret_from_fork+0x22/0x30
> [60313.523265] Modules linked in: sctp ip6_udp_tunnel udp_tunnel uinput
> netconsole blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
> nfs lockd grace sunrpc fscache netfs rfcomm nft_ct nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 cmac algif_hash algif_skcipher nf_tables
> af_alg bnep tun nfnetlink nls_iso8859_1 intel_rapl_msr vfat
> intel_rapl_common iwlmvm fat mac80211 edac_mce_amd libarc4 btusb eeepc_wmi
> btrtl asus_wmi iwlwifi btbcm snd_usb_audio sparse_keymap kvm_amd video
> wmi_bmof mxm_wmi btintel uvcvideo snd_hda_codec_realtek videobuf2_vmalloc
> videobuf2_memops snd_usbmidi_lib kvm snd_hda_codec_generic bluetooth
> videobuf2_v4l2 ledtrig_audio
> snd_hda_codec_hdmi joydev snd_rawmidi ecdh_generic irqbypass ecc
> snd_hda_intel mousedev pl2303 cfg80211 snd_seq_device videobuf2_common
> crc16 rapl k10temp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec igb
> r8169 sp5100_tco snd_hda_core realtek i2c_piix4 ipmi_devintf dca snd_hwdep
> mdio_devres rfkill snd_pcm libphy ipmi_msghandler wmi
> [60313.523290] pinctrl_amd mac_hid acpi_cpufreq tcp_bbr2 vhost_vsock
> vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback
> videodev mc snd_hrtimer snd_timer snd soundcore nct6775 hwmon_vid
> crypto_user fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data
> dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys
> trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj usbhid dm_mod
> crct10dif_pclmul crc32_pclmul crc32c_intel raid10 ghash_clmulni_intel
> aesni_intel md_mod crypto_simd cryptd amdgpu ccp xhci_pci xhci_pci_renesas
> tpm_crb tpm_tis tpm_tis_core tpm rng_core drm_ttm_helper ttm gpu_sched
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> cec drm agpgart
> [60313.523314] CR2: 0000000000000120
> [60313.523316] ---[ end trace 55ef262e614b59b1 ]---
> [60313.523318] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
> [60313.523321] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00
> 4c 39 bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33
> <ff> 80 20 01 00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
> [60313.523323] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
> [60313.523325] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX:
> 0000000000000000 [60313.523326] RDX: 0000000000000000 RSI: ffff8ced4ad90000
> RDI: ffff8ced52fc9f40 [60313.523328] RBP: 0000000000000000 R08:
> 0000000000000001 R09:
> 0000000000000000
> [60313.523330] R10: 000000000000003f R11: 0000000000000000 R12:
> ffff8cf20e5a5400 [60313.523332] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420
> R15: ffff8cf0e7e91c70 [60313.523334] FS: 0000000000000000(0000)
> GS:ffff8cf44ef00000(0000) knlGS: 0000000000000000
> [60313.523336] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60313.523338] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
> 0000000000350ee0
> [60313.523339] note: kworker/20:1H[388] exited with preempt_count 1
> ```
This is just to let you know that I'm running v5.13.7 without any extra
patches under block/ applied, and the issue is not reproducible.
I'll probably defer investigating this till v5.14 unless it is fixed there
already.
--
Oleksandr Natalenko (post-factum)
On Mon 02-08-21 22:40:26, Oleksandr Natalenko wrote:
> Hello.
>
> On sobota 3. července 2021 0:07:53 CEST Oleksandr Natalenko wrote:
> > On úterý 22. června 2021 18:29:48 CEST Jan Kara wrote:
> > > On Tue 22-06-21 09:35:05, Oleksandr Natalenko wrote:
> > > > On úterý 22. června 2021 9:08:43 CEST Paolo Valente wrote:
> > > > > CCing also Jan and Khazhy, because in your commit log I see also the
> > > > > commit on bfq_requests_merged().
> > > > >
> > > > > Is this OOPS reproducible for you?
> > > >
> > > > No, I haven't found a reproducer, at least yet. It took half a day of
> > > > uptime to hit this, so might not be that easy.
> > >
> > > Hum, if you can acquire a crash dump it would be the easiest I guess. We'd
> > > need to find out more about the request we crash on - whether it's
> > > otherwise valid, in what state it is etc...
> >
> > Still have no reliable reproducer and no vmcore, however I'm running v5.13
> > with the following patches applied on top of it:
> >
> > ```
> > blk: Fix lock inversion between ioc lock and bfqd lock
> > bfq: Remove merged request already in bfq_requests_merged()
> > block: Remove unnecessary elevator operation checks
> > block: Do not pull requests from the scheduler when we cannot dispatch them
> > block, bfq: reset waker pointer with shared queues
> > block, bfq: check waker only for queues with no in-flight I/O
> > block, bfq: avoid delayed merge of async queues
> > block, bfq: boost throughput by extending queue-merging times
> > block, bfq: consider also creation time in delayed stable merge
> > block, bfq: fix delayed stable merge check
> > block, bfq: let also stably merged queues enjoy weight raising
> > ```
> >
> > and just got the following crash:
> >
> > ```
> > [60313.522570] ------------[ cut here ]------------
> > [60313.522579] WARNING: CPU: 20 PID: 388 at arch/x86/include/asm/kfence.h:44
> > kfence_protect_page+0x39/0xc0
<snip>
> > [60313.522665] CPU: 20 PID: 388 Comm: kworker/20:1H Tainted: G W
> > 5.13.0-pf2 #1
> > [60313.522668] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS
> > 3601 05/26/2021
> > [60313.522671] Workqueue: kblockd blk_mq_run_work_fn
> > [60313.522675] RIP: 0010:kfence_protect_page+0x39/0xc0
> > [60313.522679] Code: 04 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 c7
> > 44 24 04 00 00 00 00 e8 83 20 d5 ff 48 85 c0 74 07 83 7c 24 04 01 74 06
> > <0f> 0b 31 c0 eb 4c 48 8b 38 48 89 c2 84 db 75 59 48 89 f8 0f 1f 40
> > [60313.522682] RSP: 0018:ffffb559c0affb28 EFLAGS: 00010046
> > [60313.522684] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > ffffb559c0affb2c [60313.522687] RDX: ffffb559c0affb2c RSI: 0000000000000000
> > RDI: 0000000000000000 [60313.522690] RBP: 0000000000000000 R08:
> > 0000000000000000 R09:
> > 0000000000000000
> > [60313.522692] R10: 0000000000000000 R11: 0000000000000000 R12:
> > 0000000000000002
> > [60313.522694] R13: ffffb559c0affc28 R14: 00000000c0affc01 R15:
> > 0000000000000000 [60313.522696] FS: 0000000000000000(0000)
> > GS:ffff8cf44ef00000(0000) knlGS: 0000000000000000
> > [60313.522698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [60313.522700] CR2: 0000000000000120 CR3: 000000013ebce000 CR4:
> > 0000000000350ee0
> > [60313.522702] Call Trace:
> > [60313.522707] kfence_handle_page_fault+0xa6/0x280
> > [60313.522710] page_fault_oops+0x9d/0x2d0
> > [60313.522714] exc_page_fault+0x78/0x180
> > [60313.522718] asm_exc_page_fault+0x1e/0x30
> > [60313.522721] RIP: 0010:bfq_dispatch_request+0x4c3/0x1280
> > [60313.522725] Code: 4c 89 e7 e8 ef da ff ff 4c 89 ff 89 c6 e8 75 64 00 00
> > 4c 39 bb a0 00 00 00 0f 84 86 04 00 00 49 8b 84 24 90 00 00 00 48 8b 33
> > <ff> 80 20 01 00 00 48 89 34 24 48 8b 46 08 4c 8b 58 08 4c 89 5c 24
> > [60313.522727] RSP: 0018:ffffb559c0affcd0 EFLAGS: 00010046
> > [60313.522729] RAX: 0000000000000000 RBX: ffff8ced4d6a1000 RCX:
> > 0000000000000000 [60313.522731] RDX: 0000000000000000 RSI: ffff8ced4ad90000
> > RDI: ffff8ced52fc9f40 [60313.522733] RBP: 0000000000000000 R08:
> > 0000000000000001 R09:
> > 0000000000000000
> > [60313.522735] R10: 000000000000003f R11: 0000000000000000 R12:
> > ffff8cf20e5a5400 [60313.522737] R13: ffff8cf0e7e91c70 R14: ffff8ced4d6a1420
> > R15: ffff8cf0e7e91c70 [60313.522741] ? mod_delayed_work_on+0x71/0xe0
> > [60313.522745] ? __sbitmap_get_word+0x30/0x80
> > [60313.522748] __blk_mq_do_dispatch_sched+0x218/0x320
> > [60313.522752] __blk_mq_sched_dispatch_requests+0x107/0x150
> > [60313.522755] blk_mq_sched_dispatch_requests+0x2f/0x60
> > [60313.522758] blk_mq_run_work_fn+0x43/0xc0
> > [60313.522761] process_one_work+0x24e/0x430
> > [60313.522765] worker_thread+0x54/0x4d0
> > [60313.522767] ? process_one_work+0x430/0x430
> > [60313.522770] kthread+0x182/0x1b0
> > [60313.522773] ? __kthread_init_worker+0x50/0x50
> > [60313.522776] ret_from_fork+0x22/0x30
> > [60313.522781] ---[ end trace 55ef262e614b59af ]---
<snip>
> This is just to let you know that I'm running v5.13.7 without any extra
> patches under block/ applied, and the issue is not reproducible.
>
> I'll probably defer investigating this till v5.14 unless it is fixed there
> already.
Thanks for info! I've looked some more into this. It appears the trapping
instruction is:
static void bfq_dispatch_remove(struct request_queue *q, struct request *rq)
{
struct bfq_queue *bfqq = RQ_BFQQ(rq);
/*
* For consistency, the next instruction should have been
* executed after removing the request from the queue and
* dispatching it. We execute instead this instruction before
* bfq_remove_request() (and hence introduce a temporary
* inconsistency), for efficiency. In fact, should this
* dispatch occur for a non in-service bfqq, this anticipated
* increment prevents two counters related to bfqq->dispatched
* from risking to be, first, uselessly decremented, and then
* incremented again when the (new) value of bfqq->dispatched
* happens to be taken into account.
*/
>>> bfqq->dispatched++;
bfq_update_peak_rate(q->elevator->elevator_data, rq);
bfq_remove_request(q, rq);
}
The bfqq is in RAX and apparently is NULL. This function was called from
bfq_dispatch_rq_from_bfqq() and the above can only happen if bfqq->next_rq
was pointing to a request that was no longer attached to any bfq queue
(maybe rq was being freed?). But at this point I have no idea how this
could have possibly happened with current codebase. Can you maybe test with
5.14-rc4 to see whether current upstream has the issue?
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR