LinuxLists.cc - [PATCH AUTOSEL 5.11 01/22] interconnect: core: fix error return code of icc_link

2021-04-06 01:33:39

Subject: [PATCH AUTOSEL 5.11 01/22] interconnect: core: fix error return code of icc_link_destroy()

From: Jia-Ju Bai <[email protected]>

[ Upstream commit 715ea61532e731c62392221238906704e63d75b6 ]

When krealloc() fails and new is NULL, no error return code of
icc_link_destroy() is assigned.
To fix this bug, ret is assigned with -ENOMEM hen new is NULL.

Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Georgi Djakov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/interconnect/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 5ad519c9f239..8a1e70e00876 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -942,6 +942,8 @@ int icc_link_destroy(struct icc_node *src, struct icc_node *dst)
GFP_KERNEL);
if (new)
src->links = new;
+ else
+ ret = -ENOMEM;

out:
mutex_unlock(&icc_lock);
--
2.30.2

2021-04-06 01:34:33

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 08/22] ftrace: Check if pages were allocated before calling free_pages()

From: "Steven Rostedt (VMware)" <[email protected]>

[ Upstream commit 59300b36f85f254260c81d9dd09195fa49eb0f98 ]

It is possible that on error pg->size can be zero when getting its order,
which would return a -1 value. It is dangerous to pass in an order of -1
to free_pages(). Check if order is greater than or equal to zero before
calling free_pages().

Link: https://lore.kernel.org/lkml/[email protected]/

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/trace/ftrace.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b7e29db127fa..3ba52d4e1314 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3231,7 +3231,8 @@ ftrace_allocate_pages(unsigned long num_to_init)
pg = start_pg;
while (pg) {
order = get_count_order(pg->size / ENTRIES_PER_PAGE);
- free_pages((unsigned long)pg->records, order);
+ if (order >= 0)
+ free_pages((unsigned long)pg->records, order);
start_pg = pg->next;
kfree(pg);
pg = start_pg;
@@ -6451,7 +6452,8 @@ void ftrace_release_mod(struct module *mod)
clear_mod_from_hashes(pg);

order = get_count_order(pg->size / ENTRIES_PER_PAGE);
- free_pages((unsigned long)pg->records, order);
+ if (order >= 0)
+ free_pages((unsigned long)pg->records, order);
tmp_page = pg->next;
kfree(pg);
ftrace_number_of_pages -= 1 << order;
@@ -6811,7 +6813,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
if (!pg->index) {
*last_pg = pg->next;
order = get_count_order(pg->size / ENTRIES_PER_PAGE);
- free_pages((unsigned long)pg->records, order);
+ if (order >= 0)
+ free_pages((unsigned long)pg->records, order);
ftrace_number_of_pages -= 1 << order;
ftrace_number_of_groups--;
kfree(pg);
--
2.30.2

2021-04-06 01:34:34

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 09/22] tools/kvm_stat: Add restart delay

From: Stefan Raspl <[email protected]>

[ Upstream commit 75f94ecbd0dfd2ac4e671f165f5ae864b7301422 ]

If this service is enabled and the system rebooted, Systemd's initial
attempt to start this unit file may fail in case the kvm module is not
loaded. Since we did not specify a delay for the retries, Systemd
restarts with a minimum delay a number of times before giving up and
disabling the service. Which means a subsequent kvm module load will
have kvm running without monitoring.
Adding a delay to fix this.

Signed-off-by: Stefan Raspl <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/kvm/kvm_stat/kvm_stat.service | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/kvm/kvm_stat/kvm_stat.service b/tools/kvm/kvm_stat/kvm_stat.service
index 71aabaffe779..8f13b843d5b4 100644
--- a/tools/kvm/kvm_stat/kvm_stat.service
+++ b/tools/kvm/kvm_stat/kvm_stat.service
@@ -9,6 +9,7 @@ Type=simple
ExecStart=/usr/bin/kvm_stat -dtcz -s 10 -L /var/log/kvm_stat.csv
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
+RestartSec=60s
SyslogIdentifier=kvm_stat
SyslogLevel=debug

--
2.30.2

2021-04-06 01:37:19

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 07/22] scsi: iscsi: Fix race condition between login and sync thread

From: Gulam Mohamed <[email protected]>

[ Upstream commit 9e67600ed6b8565da4b85698ec659b5879a6c1c6 ]

A kernel panic was observed due to a timing issue between the sync thread
and the initiator processing a login response from the target. The session
reopen can be invoked both from the session sync thread when iscsid
restarts and from iscsid through the error handler. Before the initiator
receives the response to a login, another reopen request can be sent from
the error handler/sync session. When the initial login response is
subsequently processed, the connection has been closed and the socket has
been released.

To fix this a new connection state, ISCSI_CONN_BOUND, is added:

- Set the connection state value to ISCSI_CONN_DOWN upon
iscsi_if_ep_disconnect() and iscsi_if_stop_conn()

- Set the connection state to the newly created value ISCSI_CONN_BOUND
after bind connection (transport->bind_conn())

- In iscsi_set_param(), return -ENOTCONN if the connection state is not
either ISCSI_CONN_BOUND or ISCSI_CONN_UP

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Gulam Mohamed <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

index 91074fd97f64..f4bf62b007a0 100644

Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/scsi_transport_iscsi.c | 14 +++++++++++++-
include/scsi/scsi_transport_iscsi.h | 1 +
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index c53c3f9fa526..f648452d8d66 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2478,6 +2478,7 @@ static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
*/
mutex_lock(&conn_mutex);
conn->transport->stop_conn(conn, flag);
+ conn->state = ISCSI_CONN_DOWN;
mutex_unlock(&conn_mutex);

}
@@ -2904,6 +2905,13 @@ iscsi_set_param(struct iscsi_transport *transport, struct iscsi_uevent *ev)
default:
err = transport->set_param(conn, ev->u.set_param.param,
data, ev->u.set_param.len);
+ if ((conn->state == ISCSI_CONN_BOUND) ||
+ (conn->state == ISCSI_CONN_UP)) {
+ err = transport->set_param(conn, ev->u.set_param.param,
+ data, ev->u.set_param.len);
+ } else {
+ return -ENOTCONN;
+ }
}

return err;
@@ -2963,6 +2971,7 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
mutex_lock(&conn->ep_mutex);
conn->ep = NULL;
mutex_unlock(&conn->ep_mutex);
+ conn->state = ISCSI_CONN_DOWN;
}

transport->ep_disconnect(ep);
@@ -3730,6 +3739,8 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
ev->r.retcode = transport->bind_conn(session, conn,
ev->u.b_conn.transport_eph,
ev->u.b_conn.is_leading);
+ if (!ev->r.retcode)
+ conn->state = ISCSI_CONN_BOUND;
mutex_unlock(&conn_mutex);

if (ev->r.retcode || !transport->ep_connect)
@@ -3969,7 +3980,8 @@ iscsi_conn_attr(local_ipaddr, ISCSI_PARAM_LOCAL_IPADDR);
static const char *const connection_state_names[] = {
[ISCSI_CONN_UP] = "up",
[ISCSI_CONN_DOWN] = "down",
- [ISCSI_CONN_FAILED] = "failed"
+ [ISCSI_CONN_FAILED] = "failed",
+ [ISCSI_CONN_BOUND] = "bound"
};

static ssize_t show_conn_state(struct device *dev,
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 8a26a2ffa952..fc5a39839b4b 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -193,6 +193,7 @@ enum iscsi_connection_state {
ISCSI_CONN_UP = 0,
ISCSI_CONN_DOWN,
ISCSI_CONN_FAILED,
+ ISCSI_CONN_BOUND,
};

struct iscsi_cls_conn {
--
2.30.2

2021-04-06 01:37:34

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 21/22] riscv,entry: fix misaligned base for excp_vect_table

From: Zihao Yu <[email protected]>

[ Upstream commit ac8d0b901f0033b783156ab2dc1a0e73ec42409b ]

In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
base of the table is not 8-byte aligned, loading an entry in the table
will raise a misaligned exception. Although such exception will be
handled by opensbi/bbl, this still causes performance degradation.

Signed-off-by: Zihao Yu <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/riscv/kernel/entry.S | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 744f3209c48d..76274a4a1d8e 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -447,6 +447,7 @@ ENDPROC(__switch_to)
#endif

.section ".rodata"
+ .align LGREG
/* Exception vector table */
ENTRY(excp_vect_table)
RISCV_PTR do_trap_insn_misaligned
--
2.30.2

2021-04-06 01:37:35

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 02/22] gfs2: Flag a withdraw if init_threads() fails

From: Andrew Price <[email protected]>

[ Upstream commit 62dd0f98a0e5668424270b47a0c2e973795faba7 ]

Interrupting mount with ^C quickly enough can cause the kthread_run()
calls in gfs2's init_threads() to fail and the error path leads to a
deadlock on the s_umount rwsem. The abridged chain of events is:

[mount path]
get_tree_bdev()
sget_fc()
alloc_super()
down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); [acquired]
gfs2_fill_super()
gfs2_make_fs_rw()
init_threads()
kthread_run()
( Interrupted )
[Error path]
gfs2_gl_hash_clear()
flush_workqueue(glock_workqueue)
wait_for_completion()

[workqueue context]
glock_work_func()
run_queue()
do_xmote()
freeze_go_sync()
freeze_super()
down_write(&sb->s_umount) [deadlock]

In freeze_go_sync() there is a gfs2_withdrawn() check that we can use to
make sure freeze_super() is not called in the error path, so add a
gfs2_withdraw_delayed() call when init_threads() fails.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=212231

Reported-by: Alexander Aring <[email protected]>
Signed-off-by: Andrew Price <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/gfs2/super.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 754ea2a137b4..34ca312457a6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -169,8 +169,10 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
int error;

error = init_threads(sdp);
- if (error)
+ if (error) {
+ gfs2_withdraw_delayed(sdp);
return error;
+ }

j_gl->gl_ops->go_inval(j_gl, DIO_METADATA);
if (gfs2_withdrawn(sdp)) {
--
2.30.2

2021-04-06 01:37:36

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 12/22] XArray: Fix splitting to non-zero orders

From: "Matthew Wilcox (Oracle)" <[email protected]>

[ Upstream commit 3012110d71f41410932924e1d188f9eb57f1f824 ]

Splitting an order-4 entry into order-2 entries would leave the array
containing pointers to 000040008000c000 instead of 000044448888cccc.
This is a one-character fix, but enhance the test suite to check this
case.

Reported-by: Zi Yan <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
lib/test_xarray.c | 26 ++++++++++++++------------
lib/xarray.c | 4 ++--
2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 8294f43f4981..8b1c318189ce 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -1530,24 +1530,24 @@ static noinline void check_store_range(struct xarray *xa)

#ifdef CONFIG_XARRAY_MULTI
static void check_split_1(struct xarray *xa, unsigned long index,
- unsigned int order)
+ unsigned int order, unsigned int new_order)
{
- XA_STATE(xas, xa, index);
- void *entry;
- unsigned int i = 0;
+ XA_STATE_ORDER(xas, xa, index, new_order);
+ unsigned int i;

xa_store_order(xa, index, order, xa, GFP_KERNEL);

xas_split_alloc(&xas, xa, order, GFP_KERNEL);
xas_lock(&xas);
xas_split(&xas, xa, order);
+ for (i = 0; i < (1 << order); i += (1 << new_order))
+ __xa_store(xa, index + i, xa_mk_index(index + i), 0);
xas_unlock(&xas);

- xa_for_each(xa, index, entry) {
- XA_BUG_ON(xa, entry != xa);
- i++;
+ for (i = 0; i < (1 << order); i++) {
+ unsigned int val = index + (i & ~((1 << new_order) - 1));
+ XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val));
}
- XA_BUG_ON(xa, i != 1 << order);

xa_set_mark(xa, index, XA_MARK_0);
XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0));
@@ -1557,14 +1557,16 @@ static void check_split_1(struct xarray *xa, unsigned long index,

static noinline void check_split(struct xarray *xa)
{
- unsigned int order;
+ unsigned int order, new_order;

XA_BUG_ON(xa, !xa_empty(xa));

for (order = 1; order < 2 * XA_CHUNK_SHIFT; order++) {
- check_split_1(xa, 0, order);
- check_split_1(xa, 1UL << order, order);
- check_split_1(xa, 3UL << order, order);
+ for (new_order = 0; new_order < order; new_order++) {
+ check_split_1(xa, 0, order, new_order);
+ check_split_1(xa, 1UL << order, order, new_order);
+ check_split_1(xa, 3UL << order, order, new_order);
+ }
}
}
#else
diff --git a/lib/xarray.c b/lib/xarray.c
index 5fa51614802a..ed775dee1074 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1011,7 +1011,7 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,

do {
unsigned int i;
- void *sibling;
+ void *sibling = NULL;
struct xa_node *node;

node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
@@ -1021,7 +1021,7 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
for (i = 0; i < XA_CHUNK_SIZE; i++) {
if ((i & mask) == 0) {
RCU_INIT_POINTER(node->slots[i], entry);
- sibling = xa_mk_sibling(0);
+ sibling = xa_mk_sibling(i);
} else {
RCU_INIT_POINTER(node->slots[i], sibling);
}
--
2.30.2

2021-04-06 01:46:21

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 19/22] io_uring: don't mark S_ISBLK async work as unbounded

From: Jens Axboe <[email protected]>

[ Upstream commit 4b982bd0f383db9132e892c0c5144117359a6289 ]

S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5c4378694d54..d9a673976f2d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1546,7 +1546,7 @@ static void io_prep_async_work(struct io_kiocb *req)
if (req->flags & REQ_F_ISREG) {
if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL))
io_wq_hash_work(&req->work, file_inode(req->file));
- } else {
+ } else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) {
if (def->unbound_nonreg_file)
req->work.flags |= IO_WQ_WORK_UNBOUND;
}
--
2.30.2

2021-04-06 01:46:59

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 20/22] riscv: evaluate put_user() arg before enabling user access

From: Ben Dooks <[email protected]>

[ Upstream commit 285a76bb2cf51b0c74c634f2aaccdb93e1f2a359 ]

The <asm/uaccess.h> header has a problem with put_user(a, ptr) if
the 'a' is not a simple variable, such as a function. This can lead
to the compiler producing code as so:

1: enable_user_access()
2: evaluate 'a' into register 'r'
3: put 'r' to 'ptr'
4: disable_user_acess()

The issue is that 'a' is now being evaluated with the user memory
protections disabled. So we try and force the evaulation by assigning
'x' to __val at the start, and hoping the compiler barriers in
enable_user_access() do the job of ordering step 2 before step 1.

This has shown up in a bug where 'a' sleeps and thus schedules out
and loses the SR_SUM flag. This isn't sufficient to fully fix, but
should reduce the window of opportunity. The first instance of this
we found is in scheudle_tail() where the code does:

$ less -N kernel/sched/core.c

4263 if (current->set_child_tid)
4264 put_user(task_pid_vnr(current), current->set_child_tid);

Here, the task_pid_vnr(current) is called within the block that has
enabled the user memory access. This can be made worse with KASAN
which makes task_pid_vnr() a rather large call with plenty of
opportunity to sleep.

Signed-off-by: Ben Dooks <[email protected]>
Reported-by: [email protected]
Suggested-by: Arnd Bergman <[email protected]>

--
Changes since v1:
- fixed formatting and updated the patch description with more info

Changes since v2:
- fixed commenting on __put_user() ([email protected])

Change since v3:
- fixed RFC in patch title. Should be ready to merge.

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/riscv/include/asm/uaccess.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index 824b2c9da75b..f944062c9d99 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -306,7 +306,9 @@ do { \
* data types like structures or arrays.
*
* @ptr must have pointer-to-simple-variable type, and @x must be assignable
- * to the result of dereferencing @ptr.
+ * to the result of dereferencing @ptr. The value of @x is copied to avoid
+ * re-ordering where @x is evaluated inside the block that enables user-space
+ * access (thus bypassing user space protection if @x is a function).
*
* Caller must check the pointer with access_ok() before calling this
* function.
@@ -316,12 +318,13 @@ do { \
#define __put_user(x, ptr) \
({ \
__typeof__(*(ptr)) __user *__gu_ptr = (ptr); \
+ __typeof__(*__gu_ptr) __val = (x); \
long __pu_err = 0; \
\
__chk_user_ptr(__gu_ptr); \
\
__enable_user_access(); \
- __put_user_nocheck(x, __gu_ptr, __pu_err); \
+ __put_user_nocheck(__val, __gu_ptr, __pu_err); \
__disable_user_access(); \
\
__pu_err; \
--
2.30.2

2021-04-06 01:50:03

by Sasha Levin

[permalink] [raw]

Subject: [PATCH AUTOSEL 5.11 18/22] null_blk: fix command timeout completion handling

From: Damien Le Moal <[email protected]>

[ Upstream commit de3510e52b0a398261271455562458003b8eea62 ]

Memory backed or zoned null block devices may generate actual request
timeout errors due to the submission path being blocked on memory
allocation or zone locking. Unlike fake timeouts or injected timeouts,
the request submission path will call blk_mq_complete_request() or
blk_mq_end_request() for these real timeout errors, causing a double
completion and use after free situation as the block layer timeout
handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in
blk_mq_check_expired(). This problem often triggers a NULL pointer
dereference such as:

BUG: kernel NULL pointer dereference, address: 0000000000000050
RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20
...
Call Trace:
dd_finish_request+0x56/0x80
blk_mq_free_request+0x37/0x130
null_handle_cmd+0xbf/0x250 [null_blk]
? null_queue_rq+0x67/0xd0 [null_blk]
blk_mq_dispatch_rq_list+0x122/0x850
__blk_mq_do_dispatch_sched+0xbb/0x2c0
__blk_mq_sched_dispatch_requests+0x13d/0x190
blk_mq_sched_dispatch_requests+0x30/0x60
__blk_mq_run_hw_queue+0x49/0x90
process_one_work+0x26c/0x580
worker_thread+0x55/0x3c0
? process_one_work+0x580/0x580
kthread+0x134/0x150
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x1f/0x30

This problem very often triggers when running the full btrfs xfstests
on a memory-backed zoned null block device in a VM with limited amount
of memory.

Avoid this by executing blk_mq_complete_request() in null_timeout_rq()
only for commands that are marked for a fake timeout completion using
the fake_timeout boolean in struct null_cmd. For timeout errors injected
through debugfs, the timeout handler will execute
blk_mq_complete_request()i as before. This is safe as the submission
path does not execute complete requests in this case.

In null_timeout_rq(), also make sure to set the command error field to
BLK_STS_TIMEOUT and to propagate this error through to the request
completion.

Reported-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/block/null_blk/main.c | 26 +++++++++++++++++++++-----
drivers/block/null_blk/null_blk.h | 1 +
2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 5357c3a4a36f..4f6af7a5921e 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1369,10 +1369,13 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd, sector_t sector,
}

if (dev->zoned)
- cmd->error = null_process_zoned_cmd(cmd, op,
- sector, nr_sectors);
+ sts = null_process_zoned_cmd(cmd, op, sector, nr_sectors);
else
- cmd->error = null_process_cmd(cmd, op, sector, nr_sectors);
+ sts = null_process_cmd(cmd, op, sector, nr_sectors);
+
+ /* Do not overwrite errors (e.g. timeout errors) */
+ if (cmd->error == BLK_STS_OK)
+ cmd->error = sts;

out:
nullb_complete_cmd(cmd);
@@ -1451,8 +1454,20 @@ static bool should_requeue_request(struct request *rq)

static enum blk_eh_timer_return null_timeout_rq(struct request *rq, bool res)
{
+ struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
+
pr_info("rq %p timed out\n", rq);
- blk_mq_complete_request(rq);
+
+ /*
+ * If the device is marked as blocking (i.e. memory backed or zoned
+ * device), the submission path may be blocked waiting for resources
+ * and cause real timeouts. For these real timeouts, the submission
+ * path will complete the request using blk_mq_complete_request().
+ * Only fake timeouts need to execute blk_mq_complete_request() here.
+ */
+ cmd->error = BLK_STS_TIMEOUT;
+ if (cmd->fake_timeout)
+ blk_mq_complete_request(rq);
return BLK_EH_DONE;
}

@@ -1473,6 +1488,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
cmd->rq = bd->rq;
cmd->error = BLK_STS_OK;
cmd->nq = nq;
+ cmd->fake_timeout = should_timeout_request(bd->rq);

blk_mq_start_request(bd->rq);

@@ -1489,7 +1505,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_OK;
}
}
- if (should_timeout_request(bd->rq))
+ if (cmd->fake_timeout)
return BLK_STS_OK;

return null_handle_cmd(cmd, sector, nr_sectors, req_op(bd->rq));
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 83504f3cc9d6..4876d5adb12d 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -22,6 +22,7 @@ struct nullb_cmd {
blk_status_t error;
struct nullb_queue *nq;
struct hrtimer timer;
+ bool fake_timeout;
};

struct nullb_queue {
--
2.30.2