2021-03-25 11:29:11

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 01/44] virtiofs: Fail dax mount if device does not support it

From: Vivek Goyal <[email protected]>

[ Upstream commit 3f9b9efd82a84f27e95d0414f852caf1fa839e83 ]

Right now "mount -t virtiofs -o dax myfs /mnt/virtiofs" succeeds even
if the filesystem device does not have a cache window and hence DAX
can't be supported.

This gives users the false impression that they are using DAX with
virtiofs when, in fact, they are not.

Fix this by returning an error if the user has asked for DAX but it
can't be supported.

Signed-off-by: Vivek Goyal <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/fuse/virtio_fs.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8868ac31a3c0..4ee6f734ba83 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1324,8 +1324,15 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)

/* virtiofs allocates and installs its own fuse devices */
ctx->fudptr = NULL;
- if (ctx->dax)
+ if (ctx->dax) {
+ if (!fs->dax_dev) {
+ err = -EINVAL;
+ pr_err("virtio-fs: dax can't be enabled as filesystem"
+ " device does not support it.\n");
+ goto err_free_fuse_devs;
+ }
ctx->dax_dev = fs->dax_dev;
+ }
err = fuse_fill_super_common(sb, ctx);
if (err < 0)
goto err_free_fuse_devs;
--
2.30.1


2021-03-25 11:29:12

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 33/44] thermal/core: Add NULL pointer check before using cooling device stats

From: Manaf Meethalavalappu Pallikunhi <[email protected]>

[ Upstream commit 2046a24ae121cd107929655a6aaf3b8c5beea01f ]

The cooling device stats buffer allocation can fail when a cooling
device reports a very high max state value. A later sysfs update for
the same cooling device can then try to access the stats data, leading
to a NULL pointer dereference.

Add a NULL pointer check before accessing the thermal cooling device
stats data. This fixes the following bug:

[ 26.812833] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[ 27.122960] Call trace:
[ 27.122963] do_raw_spin_lock+0x18/0xe8
[ 27.122966] _raw_spin_lock+0x24/0x30
[ 27.128157] thermal_cooling_device_stats_update+0x24/0x98
[ 27.128162] cur_state_store+0x88/0xb8
[ 27.128166] dev_attr_store+0x40/0x58
[ 27.128169] sysfs_kf_write+0x50/0x68
[ 27.133358] kernfs_fop_write+0x12c/0x1c8
[ 27.133362] __vfs_write+0x54/0x160
[ 27.152297] vfs_write+0xcc/0x188
[ 27.157132] ksys_write+0x78/0x108
[ 27.162050] ksys_write+0xf8/0x108
[ 27.166968] __arm_smccc_hvc+0x158/0x4b0
[ 27.166973] __arm_smccc_hvc+0x9c/0x4b0
[ 27.186005] el0_svc+0x8/0xc
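
For context, a minimal sketch of how cdev->stats can end up NULL (a
simplified, hypothetical version of the sysfs stats setup path, not the
exact kernel code; stats_size_for() is an invented helper standing in
for the real size computation, which grows roughly as states^2 because
of the transition table):

	static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
	{
		unsigned long states;

		if (cdev->ops->get_max_state(cdev, &states))
			return;
		states++;	/* account for state 0 */

		/* A very high max state makes this allocation huge and
		 * likely to fail, leaving cdev->stats == NULL for any
		 * later sysfs update. */
		cdev->stats = kzalloc(stats_size_for(states), GFP_KERNEL);
	}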

Signed-off-by: Manaf Meethalavalappu Pallikunhi <[email protected]>
Signed-off-by: Daniel Lezcano <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/thermal/thermal_sysfs.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 0866e949339b..9b73532464e5 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -754,6 +754,9 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
{
struct cooling_dev_stats *stats = cdev->stats;

+ if (!stats)
+ return;
+
spin_lock(&stats->lock);

if (stats->state == new_state)
--
2.30.1

2021-03-25 11:29:15

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 18/44] ASoC: cs42l42: Always wait at least 3ms after reset

From: Lucas Tanure <[email protected]>

[ Upstream commit 19325cfea04446bc79b36bffd4978af15f46a00e ]

This delay is part of the power-up sequence defined in the datasheet.
A runtime_resume is a power-up, so it must also include the delay.

Signed-off-by: Lucas Tanure <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
sound/soc/codecs/cs42l42.c | 3 ++-
sound/soc/codecs/cs42l42.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c
index d5078ce79fad..4d82d24c7828 100644
--- a/sound/soc/codecs/cs42l42.c
+++ b/sound/soc/codecs/cs42l42.c
@@ -1794,7 +1794,7 @@ static int cs42l42_i2c_probe(struct i2c_client *i2c_client,
dev_dbg(&i2c_client->dev, "Found reset GPIO\n");
gpiod_set_value_cansleep(cs42l42->reset_gpio, 1);
}
- mdelay(3);
+ usleep_range(CS42L42_BOOT_TIME_US, CS42L42_BOOT_TIME_US * 2);

/* Request IRQ */
ret = devm_request_threaded_irq(&i2c_client->dev,
@@ -1919,6 +1919,7 @@ static int cs42l42_runtime_resume(struct device *dev)
}

gpiod_set_value_cansleep(cs42l42->reset_gpio, 1);
+ usleep_range(CS42L42_BOOT_TIME_US, CS42L42_BOOT_TIME_US * 2);

regcache_cache_only(cs42l42->regmap, false);
regcache_sync(cs42l42->regmap);
diff --git a/sound/soc/codecs/cs42l42.h b/sound/soc/codecs/cs42l42.h
index 9b017b76828a..866d7c873e3c 100644
--- a/sound/soc/codecs/cs42l42.h
+++ b/sound/soc/codecs/cs42l42.h
@@ -740,6 +740,7 @@
#define CS42L42_FRAC2_VAL(val) (((val) & 0xff0000) >> 16)

#define CS42L42_NUM_SUPPLIES 5
+#define CS42L42_BOOT_TIME_US 3000

static const char *const cs42l42_supply_names[CS42L42_NUM_SUPPLIES] = {
"VA",
--
2.30.1

2021-03-25 11:29:19

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 23/44] vhost: Fix vhost_vq_reset()

From: Laurent Vivier <[email protected]>

[ Upstream commit beb691e69f4dec7bfe8b81b509848acfd1f0dbf9 ]

vhost_reset_is_le() is just vhost_init_is_le(), and in the
cross-endian legacy case, vhost_init_is_le() depends on vq->user_be.

vq->user_be is set by vhost_disable_cross_endian().

But in vhost_vq_reset(), we have:

	vhost_reset_is_le(vq);
	vhost_disable_cross_endian(vq);

And so user_be is used before being set.

To fix that, reverse the order of the two lines, as there is no other
dependency between them.
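
For reference, a condensed sketch of the dependency (based on the
CONFIG_VHOST_CROSS_ENDIAN_LEGACY variants of these helpers in
drivers/vhost/vhost.c, lightly simplified):

	static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
	{
		vq->user_be = !virtio_legacy_is_little_endian();
	}

	static void vhost_init_is_le(struct vhost_virtqueue *vq)
	{
		/* Reads vq->user_be, so it must run only after
		 * vhost_disable_cross_endian() has set it. */
		vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1)
			    || !vq->user_be;
	}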

Signed-off-by: Laurent Vivier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/vhost/vhost.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a262e12c6dc2..5ccb0705beae 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -332,8 +332,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->error_ctx = NULL;
vq->kick = NULL;
vq->log_ctx = NULL;
- vhost_reset_is_le(vq);
vhost_disable_cross_endian(vq);
+ vhost_reset_is_le(vq);
vq->busyloop_timeout = 0;
vq->umem = NULL;
vq->iotlb = NULL;
--
2.30.1

2021-03-25 11:29:28

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 34/44] locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling

From: Waiman Long <[email protected]>

[ Upstream commit 5de2055d31ea88fd9ae9709ac95c372a505a60fa ]

The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the
function doesn't use it. The (use_ww_ctx && ww_ctx) combination is
also used repeatedly.

In fact, ww_ctx should not be used at all if !use_ww_ctx. Simplify the
ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() and
clearing ww_ctx if !use_ww_ctx. This way, (use_ww_ctx && ww_ctx) can be
replaced by just (ww_ctx).
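
Condensed from the diff below, the resulting pattern is:

	if (!use_ww_ctx)
		ww_ctx = NULL;	/* a non-NULL ww_ctx now implies use_ww_ctx */
	...
	if (ww_ctx)		/* previously: if (use_ww_ctx && ww_ctx) */
		ww_mutex_set_context_fastpath(ww, ww_ctx);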

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/locking/mutex.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5352ce50a97e..2c25b830203c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -636,7 +636,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
*/
static __always_inline bool
mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
- const bool use_ww_ctx, struct mutex_waiter *waiter)
+ struct mutex_waiter *waiter)
{
if (!waiter) {
/*
@@ -712,7 +712,7 @@ mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
#else
static __always_inline bool
mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
- const bool use_ww_ctx, struct mutex_waiter *waiter)
+ struct mutex_waiter *waiter)
{
return false;
}
@@ -932,6 +932,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct ww_mutex *ww;
int ret;

+ if (!use_ww_ctx)
+ ww_ctx = NULL;
+
might_sleep();

#ifdef CONFIG_DEBUG_MUTEXES
@@ -939,7 +942,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
#endif

ww = container_of(lock, struct ww_mutex, base);
- if (use_ww_ctx && ww_ctx) {
+ if (ww_ctx) {
if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
return -EALREADY;

@@ -956,10 +959,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);

if (__mutex_trylock(lock) ||
- mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, NULL)) {
+ mutex_optimistic_spin(lock, ww_ctx, NULL)) {
/* got the lock, yay! */
lock_acquired(&lock->dep_map, ip);
- if (use_ww_ctx && ww_ctx)
+ if (ww_ctx)
ww_mutex_set_context_fastpath(ww, ww_ctx);
preempt_enable();
return 0;
@@ -970,7 +973,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* After waiting to acquire the wait_lock, try again.
*/
if (__mutex_trylock(lock)) {
- if (use_ww_ctx && ww_ctx)
+ if (ww_ctx)
__ww_mutex_check_waiters(lock, ww_ctx);

goto skip_wait;
@@ -1023,7 +1026,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto err;
}

- if (use_ww_ctx && ww_ctx) {
+ if (ww_ctx) {
ret = __ww_mutex_check_kill(lock, &waiter, ww_ctx);
if (ret)
goto err;
@@ -1036,7 +1039,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* ww_mutex needs to always recheck its position since its waiter
* list is not FIFO ordered.
*/
- if ((use_ww_ctx && ww_ctx) || !first) {
+ if (ww_ctx || !first) {
first = __mutex_waiter_is_first(lock, &waiter);
if (first)
__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
@@ -1049,7 +1052,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* or we must see its unlock and acquire.
*/
if (__mutex_trylock(lock) ||
- (first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, &waiter)))
+ (first && mutex_optimistic_spin(lock, ww_ctx, &waiter)))
break;

spin_lock(&lock->wait_lock);
@@ -1058,7 +1061,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
acquired:
__set_current_state(TASK_RUNNING);

- if (use_ww_ctx && ww_ctx) {
+ if (ww_ctx) {
/*
* Wound-Wait; we stole the lock (!first_waiter), check the
* waiters as anyone might want to wound us.
@@ -1078,7 +1081,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);

- if (use_ww_ctx && ww_ctx)
+ if (ww_ctx)
ww_mutex_lock_acquired(ww, ww_ctx);

spin_unlock(&lock->wait_lock);
--
2.30.1

2021-03-25 11:29:30

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 36/44] nvmet-tcp: fix kmap leak when data digest in use

From: Elad Grupi <[email protected]>

[ Upstream commit bac04454ef9fada009f0572576837548b190bf94 ]

When data digest is enabled, we should unmap the PDU iovec before
handling the data digest PDU.

Signed-off-by: Elad Grupi <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/nvme/target/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 8b0485ada315..d658c6e8263a 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1098,11 +1098,11 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
cmd->rbytes_done += ret;
}

+ nvmet_tcp_unmap_pdu_iovec(cmd);
if (queue->data_digest) {
nvmet_tcp_prep_recv_ddgst(cmd);
return 0;
}
- nvmet_tcp_unmap_pdu_iovec(cmd);

if (!(cmd->flags & NVMET_TCP_F_INIT_FAILED) &&
cmd->rbytes_done == cmd->req.transfer_len) {
--
2.30.1

2021-03-25 11:29:32

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 37/44] io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls

From: Stefan Metzmacher <[email protected]>

[ Upstream commit 76cd979f4f38a27df22efb5773a0d567181a9392 ]

We never want to generate a SIGPIPE; returning -EPIPE alone is much better.
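
This mirrors what userspace gets by passing MSG_NOSIGNAL explicitly (a
plain socket illustration, not io_uring code; handle_broken_connection()
is a hypothetical helper):

	/* With MSG_NOSIGNAL, writing to a peer that has closed the
	 * connection returns -1 with errno == EPIPE instead of raising
	 * SIGPIPE and killing an unprepared process. */
	ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
	if (n < 0 && errno == EPIPE)
		handle_broken_connection();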

Signed-off-by: Stefan Metzmacher <[email protected]>
Link: https://lore.kernel.org/r/38961085c3ec49fd21550c7788f214d1ff02d2d4.1615908477.git.metze@samba.org
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8e45331d1fed..4ac24e75ae63 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4645,7 +4645,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
kmsg = &iomsg;
}

- flags = req->sr_msg.msg_flags;
+ flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
if (flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
else if (force_nonblock)
@@ -4689,7 +4689,7 @@ static int io_send(struct io_kiocb *req, bool force_nonblock,
msg.msg_controllen = 0;
msg.msg_namelen = 0;

- flags = req->sr_msg.msg_flags;
+ flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
if (flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
else if (force_nonblock)
@@ -4883,7 +4883,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
1, req->sr_msg.len);
}

- flags = req->sr_msg.msg_flags;
+ flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
if (flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
else if (force_nonblock)
@@ -4941,7 +4941,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock,
msg.msg_iocb = NULL;
msg.msg_flags = 0;

- flags = req->sr_msg.msg_flags;
+ flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
if (flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
else if (force_nonblock)
--
2.30.1

2021-03-25 11:29:33

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 39/44] nouveau: Skip unavailable ttm page entries

From: Tobias Klausmann <[email protected]>

[ Upstream commit e94c55b8e0a0bbe9a026250cf31e2fa45957d776 ]

Starting with commit f295c8cfec833c2707ff1512da10d65386dde7af
("drm/nouveau: fix dma syncing warning with debugging on."),
the following oops occurs:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 1013 Comm: Xorg.bin Tainted: G E 5.11.0-desktop-rc0+ #2
Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]
Call Trace:
nouveau_bo_validate+0x5d/0x80 [nouveau]
nouveau_gem_ioctl_pushbuf+0x662/0x1120 [nouveau]
? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
drm_ioctl_kernel+0xa6/0xf0 [drm]
drm_ioctl+0x1f4/0x3a0 [drm]
? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
nouveau_drm_ioctl+0x50/0xa0 [nouveau]
__x64_sys_ioctl+0x7e/0xb0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
---[ end trace ccfb1e7f4064374f ]---
RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]

The underlying problem is not introduced by that commit; it merely
uncovered an existing issue. The cited commit relies on valid pages,
which is not guaranteed due to some other bugs. For now, just warn and
work around the issue by ignoring the bad ttm objects.
Below is some debug info gathered while chasing this issue:

nouveau 0000:01:00.0: DRM: ttm_dma->num_pages: 2048
nouveau 0000:01:00.0: DRM: ttm_dma->pages is NULL
nouveau 0000:01:00.0: DRM: ttm_dma: 00000000e96058e7
nouveau 0000:01:00.0: DRM: ttm_dma->page_flags:
nouveau 0000:01:00.0: DRM: ttm_dma: Populated: 1
nouveau 0000:01:00.0: DRM: ttm_dma: No Retry: 0
nouveau 0000:01:00.0: DRM: ttm_dma: SG: 256
nouveau 0000:01:00.0: DRM: ttm_dma: Zero Alloc: 0
nouveau 0000:01:00.0: DRM: ttm_dma: Swapped: 0

Signed-off-by: Tobias Klausmann <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f1c9a22083be..e05565f284dc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -551,6 +551,10 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)

if (!ttm_dma)
return;
+ if (!ttm_dma->pages) {
+ NV_DEBUG(drm, "ttm_dma 0x%p: pages NULL\n", ttm_dma);
+ return;
+ }

/* Don't waste time looping if the object is coherent */
if (nvbo->force_coherent)
@@ -583,6 +587,10 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)

if (!ttm_dma)
return;
+ if (!ttm_dma->pages) {
+ NV_DEBUG(drm, "ttm_dma 0x%p: pages NULL\n", ttm_dma);
+ return;
+ }

/* Don't waste time looping if the object is coherent */
if (nvbo->force_coherent)
--
2.30.1

2021-03-25 11:30:04

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 42/44] signal: don't allow sending any signals to PF_IO_WORKER threads

From: Jens Axboe <[email protected]>

[ Upstream commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5 ]

They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatibility thing that we need not care about for the IO
threads.
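
From userspace, the visible effect is that signalling an IO worker
thread directly now fails as if the thread did not exist (an
illustrative sketch; io_worker_tid is assumed to be the tid of an
io_uring worker):

	/* tkill() on a PF_IO_WORKER thread now hits the new check in
	 * check_kill_permission() and fails with ESRCH. */
	if (syscall(SYS_tkill, io_worker_tid, SIGTERM) < 0 &&
	    errno == ESRCH)
		fprintf(stderr, "IO workers don't take signals\n");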

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/signal.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index 5ad8566534e7..55526b941011 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -833,6 +833,9 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,

if (!valid_signal(sig))
return -EINVAL;
+ /* PF_IO_WORKER threads don't take any signals */
+ if (t->flags & PF_IO_WORKER)
+ return -ESRCH;

if (!si_fromuser(info))
return 0;
--
2.30.1

2021-03-25 11:30:17

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

From: "Eric W. Biederman" <[email protected]>

[ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]

Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.

Longer term, we may want to look into allowing stop of these threads,
as it relates to eg process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/signal.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 55526b941011..00a3840f6037 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));

- if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
+ if (unlikely(fatal_signal_pending(task) ||
+ (task->flags & (PF_EXITING | PF_IO_WORKER))))
return false;

if (mask & JOBCTL_STOP_SIGMASK)
--
2.30.1

2021-03-25 11:30:26

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 21/44] kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing

From: Jens Axboe <[email protected]>

[ Upstream commit 15b2219facadec583c24523eed40fa45865f859f ]

Don't send fake signals to PF_IO_WORKER threads; they don't accept
signals. Just treat them like kthreads in this regard: all they need
is a wakeup, as no forced kernel/user transition is needed.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/freezer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index dc520f01f99d..1a2d57d1327c 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p)
return false;
}

- if (!(p->flags & PF_KTHREAD))
+ if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))
fake_signal_wake_up(p);
else
wake_up_state(p, TASK_INTERRUPTIBLE);
--
2.30.1

2021-03-25 11:30:31

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 26/44] btrfs: track qgroup released data in own variable in insert_prealloc_file_extent

From: Qu Wenruo <[email protected]>

[ Upstream commit fbf48bb0b197e6894a04c714728c952af7153bf3 ]

There is a piece of weird code in insert_prealloc_file_extent(), which
looks like:

	ret = btrfs_qgroup_release_data(inode, file_offset, len);
	if (ret < 0)
		return ERR_PTR(ret);
	if (trans) {
		ret = insert_reserved_file_extent(trans, inode,
				file_offset, &stack_fi,
				true, ret);
		...
	}
	extent_info.is_new_extent = true;
	extent_info.qgroup_reserved = ret;
	...

Note how the variable @ret is abused here: if anyone adds code just
after the btrfs_qgroup_release_data() call, it's all too easy to
overwrite @ret and cause tons of qgroup-related bugs.

Fix such abuse by introducing a new variable, @qgroup_released, so that
we won't reuse the existing variable @ret.

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/btrfs/inode.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9b4f75568261..8f36071769fa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9674,6 +9674,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
struct btrfs_path *path;
u64 start = ins->objectid;
u64 len = ins->offset;
+ int qgroup_released;
int ret;

memset(&stack_fi, 0, sizeof(stack_fi));
@@ -9686,14 +9687,14 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE);
/* Encryption and other encoding is reserved and all 0 */

- ret = btrfs_qgroup_release_data(inode, file_offset, len);
- if (ret < 0)
- return ERR_PTR(ret);
+ qgroup_released = btrfs_qgroup_release_data(inode, file_offset, len);
+ if (qgroup_released < 0)
+ return ERR_PTR(qgroup_released);

if (trans) {
ret = insert_reserved_file_extent(trans, inode,
file_offset, &stack_fi,
- true, ret);
+ true, qgroup_released);
if (ret)
return ERR_PTR(ret);
return trans;
@@ -9706,7 +9707,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
extent_info.file_offset = file_offset;
extent_info.extent_buf = (char *)&stack_fi;
extent_info.is_new_extent = true;
- extent_info.qgroup_reserved = ret;
+ extent_info.qgroup_reserved = qgroup_released;
extent_info.insertions = 0;

path = btrfs_alloc_path();
--
2.30.1

2021-03-25 11:30:32

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 25/44] io_uring: halt SQO submission on ctx exit

From: Pavel Begunkov <[email protected]>

[ Upstream commit f6d54255f4235448d4bbe442362d4caa62da97d5 ]

io_sq_thread_finish() is called in io_ring_ctx_free(), so the SQPOLL
task is potentially still running and submitting new requests. It's not
a disaster because a "try" variant of percpu_ref_get is used, but it's
far from nice.

Remove the ctx from the sqd ctx list earlier, before the cancellation
loop, so SQPOLL can't find it and therefore won't submit new requests.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/io_uring.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b82fe753a6d0..8e45331d1fed 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8797,6 +8797,14 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
__io_cqring_overflow_flush(ctx, true, NULL, NULL);
mutex_unlock(&ctx->uring_lock);

+ /* prevent SQPOLL from submitting new requests */
+ if (ctx->sq_data) {
+ io_sq_thread_park(ctx->sq_data);
+ list_del_init(&ctx->sqd_list);
+ io_sqd_update_thread_idle(ctx->sq_data);
+ io_sq_thread_unpark(ctx->sq_data);
+ }
+
io_kill_timeouts(ctx, NULL, NULL);
io_poll_remove_all(ctx, NULL, NULL);

--
2.30.1

2021-03-25 11:31:28

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 32/44] ASoC: rt711: add snd_soc_component remove callback

From: Bard Liao <[email protected]>

[ Upstream commit 899b12542b0897f92de9ba30944937c39ebb246d ]

We do some IO operations in the snd_soc_component_set_jack callback
function, and snd_soc_component_set_jack() will be called when the soc
component is removed. However, we should not access SoundWire registers
while the bus is suspended.
So set regcache_cache_only(regmap, true) to avoid register access
during the soc component removal process.

Signed-off-by: Bard Liao <[email protected]>
Reviewed-by: Kai Vehmanen <[email protected]>
Reviewed-by: Rander Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
sound/soc/codecs/rt711.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/sound/soc/codecs/rt711.c b/sound/soc/codecs/rt711.c
index 85f744184a60..047f4e677d78 100644
--- a/sound/soc/codecs/rt711.c
+++ b/sound/soc/codecs/rt711.c
@@ -895,6 +895,13 @@ static int rt711_probe(struct snd_soc_component *component)
return 0;
}

+static void rt711_remove(struct snd_soc_component *component)
+{
+ struct rt711_priv *rt711 = snd_soc_component_get_drvdata(component);
+
+ regcache_cache_only(rt711->regmap, true);
+}
+
static const struct snd_soc_component_driver soc_codec_dev_rt711 = {
.probe = rt711_probe,
.set_bias_level = rt711_set_bias_level,
@@ -905,6 +912,7 @@ static const struct snd_soc_component_driver soc_codec_dev_rt711 = {
.dapm_routes = rt711_audio_map,
.num_dapm_routes = ARRAY_SIZE(rt711_audio_map),
.set_jack = rt711_set_jack_detect,
+ .remove = rt711_remove,
};

static int rt711_set_sdw_stream(struct snd_soc_dai *dai, void *sdw_stream,
--
2.30.1

2021-03-25 11:31:31

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 35/44] locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()

From: Waiman Long <[email protected]>

[ Upstream commit bee645788e07eea63055d261d2884ea45c2ba857 ]

In ww_acquire_init(), mutex_acquire() is gated by CONFIG_DEBUG_LOCK_ALLOC.
The dep_map in the ww_acquire_ctx structure is also gated by the
same config. However, mutex_release() in ww_acquire_fini() is gated by
CONFIG_DEBUG_MUTEXES. It is possible to set CONFIG_DEBUG_MUTEXES
without setting CONFIG_DEBUG_LOCK_ALLOC, though it is an unlikely
configuration, and that may cause a compilation error because dep_map
isn't defined in that case. Fix this potential problem by enclosing
mutex_release() inside CONFIG_DEBUG_LOCK_ALLOC.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/ww_mutex.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 850424e5d030..6ecf2a0220db 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -173,9 +173,10 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
*/
static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
{
-#ifdef CONFIG_DEBUG_MUTEXES
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
mutex_release(&ctx->dep_map, _THIS_IP_);
-
+#endif
+#ifdef CONFIG_DEBUG_MUTEXES
DEBUG_LOCKS_WARN_ON(ctx->acquired);
if (!IS_ENABLED(CONFIG_PROVE_LOCKING))
/*
--
2.30.1

2021-03-25 11:31:38

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 40/44] static_call: Align static_call_is_init() patching condition

From: Peter Zijlstra <[email protected]>

[ Upstream commit 698bacefe993ad2922c9d3b1380591ad489355e9 ]

The intent is to avoid writing init code after init (because the text
might have been freed). The code is needlessly different between
jump_label and static_call and not obviously correct.

The existing code relies on the fact that the module loader clears the
init layout, so that within_module_init() always fails; jump_label
instead relies on the module state, which is more obvious and matches
the kernel logic.
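
Condensed from the diff below, the patching condition becomes:

	bool init = system_state < SYSTEM_RUNNING;	/* core kernel */
	...
	if (mod)
		init = mod->state == MODULE_STATE_COMING; /* module still initializing */
	...
	if (!init && static_call_is_init(site))
		continue;	/* site was in initmem, which may be freed */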

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Jarkko Sakkinen <[email protected]>
Tested-by: Sumit Garg <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/static_call.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/static_call.c b/kernel/static_call.c
index 84565c2a41b8..781ff0fd031d 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -144,6 +144,7 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func)
};

for (site_mod = &first; site_mod; site_mod = site_mod->next) {
+ bool init = system_state < SYSTEM_RUNNING;
struct module *mod = site_mod->mod;

if (!site_mod->sites) {
@@ -163,6 +164,7 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func)
if (mod) {
stop = mod->static_call_sites +
mod->num_static_call_sites;
+ init = mod->state == MODULE_STATE_COMING;
}
#endif

@@ -170,16 +172,8 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func)
site < stop && static_call_key(site) == key; site++) {
void *site_addr = static_call_addr(site);

- if (static_call_is_init(site)) {
- /*
- * Don't write to call sites which were in
- * initmem and have since been freed.
- */
- if (!mod && system_state >= SYSTEM_RUNNING)
- continue;
- if (mod && !within_module_init((unsigned long)site_addr, mod))
- continue;
- }
+ if (!init && static_call_is_init(site))
+ continue;

if (!kernel_text_address((unsigned long)site_addr)) {
WARN_ONCE(1, "can't patch static call site at %pS",
--
2.30.1

2021-03-25 11:31:55

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 38/44] Revert "PM: ACPI: reboot: Use S5 for reboot"

From: Josef Bacik <[email protected]>

[ Upstream commit 9d3fcb28f9b9750b474811a2964ce022df56336e ]

This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.

The reverted commit causes a panic when rebooting my Dell Poweredge
r440. I do not have the full panic log, as it's lost at that stage of
the reboot, and I do not have a serial console. Reverting the patch
makes my system able to reboot again.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/reboot.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index eb1b15850761..a6ad5eb2fa73 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -244,8 +244,6 @@ void migrate_to_reboot_cpu(void)
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
- if (pm_power_off_prepare)
- pm_power_off_prepare();
migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
--
2.30.1

2021-03-25 11:32:11

by Sasha Levin

Subject: [PATCH AUTOSEL 5.11 44/44] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL

From: Stefan Metzmacher <[email protected]>

[ Upstream commit 0031275d119efe16711cd93519b595e6f9b4b330 ]

Without this, it's not safe to use them in a linked combination with
other requests.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications around that depend on the behavior that
even short send[msg]()/recv[msg]() returns continue an IOSQE_IO_LINK
chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low-level sock_sendmsg() call just ignores
MSG_WAITALL, just as MSG_ZEROCOPY is also ignored unless SO_ZEROCOPY
has been explicitly set.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.
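
As a reminder of the semantics being mirrored here, this is how
MSG_WAITALL behaves on a plain socket (illustration only, not io_uring
code; treat_as_failed() is a hypothetical helper):

	/* MSG_WAITALL asks recv() to block until the full len bytes
	 * arrive; a shorter return then indicates EOF, an error or an
	 * interruption, which is why a short result should now fail
	 * the request and break its IOSQE_IO_LINK chain. */
	ssize_t n = recv(fd, buf, len, MSG_WAITALL);
	if (n < 0 || (size_t)n < len)
		treat_as_failed();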

cc: [email protected]
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/io_uring.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4ac24e75ae63..c06237c87527 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4625,6 +4625,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
struct io_async_msghdr iomsg, *kmsg;
struct socket *sock;
unsigned flags;
+ int min_ret = 0;
int ret;

sock = sock_from_file(req->file);
@@ -4651,6 +4652,9 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
else if (force_nonblock)
flags |= MSG_DONTWAIT;

+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
if (force_nonblock && ret == -EAGAIN)
return io_setup_async_msg(req, kmsg);
@@ -4660,7 +4664,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
if (kmsg->iov != kmsg->fast_iov)
kfree(kmsg->iov);
req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
+ if (ret < min_ret)
req_set_fail_links(req);
__io_req_complete(req, ret, 0, cs);
return 0;
@@ -4674,6 +4678,7 @@ static int io_send(struct io_kiocb *req, bool force_nonblock,
struct iovec iov;
struct socket *sock;
unsigned flags;
+ int min_ret = 0;
int ret;

sock = sock_from_file(req->file);
@@ -4695,6 +4700,9 @@ static int io_send(struct io_kiocb *req, bool force_nonblock,
else if (force_nonblock)
flags |= MSG_DONTWAIT;

+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&msg.msg_iter);
+
msg.msg_flags = flags;
ret = sock_sendmsg(sock, &msg);
if (force_nonblock && ret == -EAGAIN)
@@ -4702,7 +4710,7 @@ static int io_send(struct io_kiocb *req, bool force_nonblock,
if (ret == -ERESTARTSYS)
ret = -EINTR;

- if (ret < 0)
+ if (ret < min_ret)
req_set_fail_links(req);
__io_req_complete(req, ret, 0, cs);
return 0;
@@ -4854,6 +4862,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
struct socket *sock;
struct io_buffer *kbuf;
unsigned flags;
+ int min_ret = 0;
int ret, cflags = 0;

sock = sock_from_file(req->file);
@@ -4889,6 +4898,9 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
else if (force_nonblock)
flags |= MSG_DONTWAIT;

+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
ret = __sys_recvmsg_sock(sock, &kmsg->msg, req->sr_msg.umsg,
kmsg->uaddr, flags);
if (force_nonblock && ret == -EAGAIN)
@@ -4901,7 +4913,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
if (kmsg->iov != kmsg->fast_iov)
kfree(kmsg->iov);
req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
+ if (ret < min_ret || ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))))
req_set_fail_links(req);
__io_req_complete(req, ret, cflags, cs);
return 0;
@@ -4917,6 +4929,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock,
struct socket *sock;
struct iovec iov;
unsigned flags;
+ int min_ret = 0;
int ret, cflags = 0;

sock = sock_from_file(req->file);
@@ -4947,6 +4960,9 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock,
else if (force_nonblock)
flags |= MSG_DONTWAIT;

+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&msg.msg_iter);
+
ret = sock_recvmsg(sock, &msg, flags);
if (force_nonblock && ret == -EAGAIN)
return -EAGAIN;
@@ -4955,7 +4971,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock,
out_free:
if (req->flags & REQ_F_BUFFER_SELECTED)
cflags = io_put_recv_kbuf(req);
- if (ret < 0)
+ if (ret < min_ret || ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))))
req_set_fail_links(req);
__io_req_complete(req, ret, cflags, cs);
return 0;
--
2.30.1

2021-03-25 11:43:45

by Stefan Metzmacher

Subject: Re: [PATCH AUTOSEL 5.11 42/44] signal: don't allow sending any signals to PF_IO_WORKER threads

On 25.03.21 12:24, Sasha Levin wrote:
> From: Jens Axboe <[email protected]>
>
> [ Upstream commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5 ]
>
> They don't take signals individually, and even if they share signals with
> the parent task, don't allow them to be delivered through the worker
> thread. Linux does allow this kind of behavior for regular threads, but
> it's really a compatibility thing that we need not care about for the IO
> threads.
>
> Reported-by: Stefan Metzmacher <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> kernel/signal.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 5ad8566534e7..55526b941011 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -833,6 +833,9 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
>
> if (!valid_signal(sig))
> return -EINVAL;
> + /* PF_IO_WORKER threads don't take any signals */
> + if (t->flags & PF_IO_WORKER)
> + return -ESRCH;

Why is that proposed for 5.11 and 5.10 now?

Are the create_io_thread() patches already backported?

I think we should hold off on the backports until
everything is stable in v5.12 final.

I'm still testing on top of v5.12-rc4
and have a pending mail explaining why I think this particular change
is wrong even in 5.12.

Jens, did you send these to stable?

metze

2021-03-25 11:44:59

by Stefan Metzmacher

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

On 25.03.21 12:24, Sasha Levin wrote:
> From: "Eric W. Biederman" <[email protected]>
>
> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>
> Just like we don't allow normal signals to IO threads, don't deliver a
> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
> signals in general, and have no means of flushing out a stop either.
>
> Longer term, we may want to look into allowing stop of these threads,
> as it relates to eg process freezing. For now, this prevents a spin
> issue if a SIGSTOP is delivered to the parent task.
>
> Reported-by: Stefan Metzmacher <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> kernel/signal.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 55526b941011..00a3840f6037 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>
> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
> + if (unlikely(fatal_signal_pending(task) ||
> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
> return false;
>
> if (mask & JOBCTL_STOP_SIGMASK)
>

Again, why is this proposed for 5.11 and 5.10 already?

metze

2021-03-25 12:09:41

by Eric W. Biederman

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

Stefan Metzmacher <[email protected]> writes:

> On 25.03.21 12:24, Sasha Levin wrote:
>> From: "Eric W. Biederman" <[email protected]>
>>
>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>
>> Just like we don't allow normal signals to IO threads, don't deliver a
>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>> signals in general, and have no means of flushing out a stop either.
>>
>> Longer term, we may want to look into allowing stop of these threads,
>> as it relates to eg process freezing. For now, this prevents a spin
>> issue if a SIGSTOP is delivered to the parent task.
>>
>> Reported-by: Stefan Metzmacher <[email protected]>
>> Signed-off-by: Jens Axboe <[email protected]>
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>> Signed-off-by: Sasha Levin <[email protected]>
>> ---
>> kernel/signal.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 55526b941011..00a3840f6037 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>
>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>> + if (unlikely(fatal_signal_pending(task) ||
>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>> return false;
>>
>> if (mask & JOBCTL_STOP_SIGMASK)
>>
>
> Again, why is this proposed for 5.11 and 5.10 already?

Has the bit about the io worker kthreads been backported?
If so this isn't horrible. If not this is nonsense.

Eric


2021-03-25 12:11:39

by David Sterba

Subject: Re: [PATCH AUTOSEL 5.11 26/44] btrfs: track qgroup released data in own variable in insert_prealloc_file_extent

On Thu, Mar 25, 2021 at 07:24:41AM -0400, Sasha Levin wrote:
> From: Qu Wenruo <[email protected]>
>
> [ Upstream commit fbf48bb0b197e6894a04c714728c952af7153bf3 ]
>
> There is a piece of weird code in insert_prealloc_file_extent(), which
> looks like:
>
> 	ret = btrfs_qgroup_release_data(inode, file_offset, len);
> 	if (ret < 0)
> 		return ERR_PTR(ret);
> 	if (trans) {
> 		ret = insert_reserved_file_extent(trans, inode,
> 				file_offset, &stack_fi,
> 				true, ret);
> 		...
> 	}
> 	extent_info.is_new_extent = true;
> 	extent_info.qgroup_reserved = ret;
> 	...
>
> Note how the variable @ret is abused here: if anyone adds code just
> after the btrfs_qgroup_release_data() call, it's all too easy to
> overwrite @ret and cause tons of qgroup-related bugs.
>
> Fix such abuse by introducing a new variable, @qgroup_released, so that
> we won't reuse the existing variable @ret.
>
> Signed-off-by: Qu Wenruo <[email protected]>
> Reviewed-by: David Sterba <[email protected]>
> Signed-off-by: David Sterba <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>

This patch is preparatory work and does not make sense to backport
standalone. Either this one plus
https://lore.kernel.org/linux-btrfs/[email protected]/
or neither. And IIRC it does not apply directly and needs some
additional review before it can be backported to an older code base,
which is why it has no CC: stable tags.

2021-03-25 12:13:14

by Stefan Metzmacher

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads


On 25.03.21 13:04, Eric W. Biederman wrote:
> Stefan Metzmacher <[email protected]> writes:
>
>> On 25.03.21 12:24, Sasha Levin wrote:
>>> From: "Eric W. Biederman" <[email protected]>
>>>
>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>
>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>> signals in general, and have no means of flushing out a stop either.
>>>
>>> Longer term, we may want to look into allowing stop of these threads,
>>> as it relates to eg process freezing. For now, this prevents a spin
>>> issue if a SIGSTOP is delivered to the parent task.
>>>
>>> Reported-by: Stefan Metzmacher <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>> Signed-off-by: Sasha Levin <[email protected]>
>>> ---
>>> kernel/signal.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 55526b941011..00a3840f6037 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>
>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>> + if (unlikely(fatal_signal_pending(task) ||
>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>> return false;
>>>
>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>
>>
>> Again, why is this proposed for 5.11 and 5.10 already?
>
> Has the bit about the io worker kthreads been backported?
> If so this isn't horrible. If not this is nonsense.

I don't know, I hope not...

But I just tested v5.12-rc4, and attaching gdb to an application with
iothreads is still not possible; it still loops forever trying to
attach to the iothreads.

And I tested 'kill -9 $pidofiothread', and it froze the whole
machine...

So there's still work to do in order to get 5.12 stable.

I'm short on time currently, but I hope to send more details soon.

metze

2021-03-25 13:40:14

by Jens Axboe

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>
> On 25.03.21 13:04, Eric W. Biederman wrote:
>> Stefan Metzmacher <[email protected]> writes:
>>
>>> On 25.03.21 12:24, Sasha Levin wrote:
>>>> From: "Eric W. Biederman" <[email protected]>
>>>>
>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>
>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>> signals in general, and have no means of flushing out a stop either.
>>>>
>>>> Longer term, we may want to look into allowing stop of these threads,
>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>
>>>> Reported-by: Stefan Metzmacher <[email protected]>
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>> Signed-off-by: Sasha Levin <[email protected]>
>>>> ---
>>>> kernel/signal.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>> index 55526b941011..00a3840f6037 100644
>>>> --- a/kernel/signal.c
>>>> +++ b/kernel/signal.c
>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>
>>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>> + if (unlikely(fatal_signal_pending(task) ||
>>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>> return false;
>>>>
>>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>>
>>>
>>> Again, why is this proposed for 5.11 and 5.10 already?
>>
>> Has the bit about the io worker kthreads been backported?
>> If so this isn't horrible. If not this is nonsense.

No not yet - my plan is to do that, but not until we're 100% satisfied
with it.

> I don't know, I hope not...
>
> But I just tested v5.12-rc4, and attaching gdb to an application with
> iothreads is still not possible; it still loops forever trying to
> attach to the iothreads.

I do see the looping; gdb apparently doesn't give up when it gets
-EPERM trying to attach to the threads. That isn't really a kernel
thing, but:

> And I tested 'kill -9 $pidofiothread', and it froze the whole
> machine...

that sounds very strange, I haven't seen anything like that running
the exact same scenario.

> So there's still work to do in order to get 5.12 stable.
>
> I'm short on time currently, but I hope to send more details soon.

Thanks! I'll play with it this morning and see if I can provoke
something odd related to STOP/attach.

--
Jens Axboe

2021-03-25 13:58:34

by Stefan Metzmacher

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

On 25.03.21 14:38, Jens Axboe wrote:
> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>
>> On 25.03.21 13:04, Eric W. Biederman wrote:
>>> Stefan Metzmacher <[email protected]> writes:
>>>
>>>> On 25.03.21 12:24, Sasha Levin wrote:
>>>>> From: "Eric W. Biederman" <[email protected]>
>>>>>
>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>
>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>
>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>
>>>>> Reported-by: Stefan Metzmacher <[email protected]>
>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>>> Signed-off-by: Sasha Levin <[email protected]>
>>>>> ---
>>>>> kernel/signal.c | 3 ++-
>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>> index 55526b941011..00a3840f6037 100644
>>>>> --- a/kernel/signal.c
>>>>> +++ b/kernel/signal.c
>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>>
>>>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>>> + if (unlikely(fatal_signal_pending(task) ||
>>>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>> return false;
>>>>>
>>>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>>>
>>>>
>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>
>>> Has the bit about the io worker kthreads been backported?
>>> If so this isn't horrible. If not this is nonsense.
>
> No not yet - my plan is to do that, but not until we're 100% satisfied
> with it.

Do you understand why the patches were autoselected for 5.11 and 5.10?

>> I don't know, I hope not...
>>
>> But I just tested v5.12-rc4 and attaching to
>> an application with iothreads with gdb is still not possible,
>> it still loops forever trying to attach to the iothreads.
>
> I do see the looping; gdb apparently doesn't give up when it gets
> -EPERM trying to attach to the threads. That isn't really a kernel
> thing, but:

Maybe we need to remove the iothreads from /proc/pid/tasks/

>> And I tested 'kill -9 $pidofiothread', and it froze the whole
>> machine...
>
> that sounds very strange, I haven't seen anything like that running
> the exact same scenario.
>
>> So there's still work to do in order to get 5.12 stable.
>>
>> I'm short on time currently, but I hope to send more details soon.
>
> Thanks! I'll play with it this morning and see if I can provoke
> something odd related to STOP/attach.

Thanks!

Somehow I have the impression that your same_thread_group_account patch
may fix a lot of things...

metze

2021-03-25 14:04:40

by Jens Axboe

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

On 3/25/21 7:56 AM, Stefan Metzmacher wrote:
> On 25.03.21 14:38, Jens Axboe wrote:
>> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>>
>>> On 25.03.21 13:04, Eric W. Biederman wrote:
>>>> Stefan Metzmacher <[email protected]> writes:
>>>>
>>>>> On 25.03.21 12:24, Sasha Levin wrote:
>>>>>> From: "Eric W. Biederman" <[email protected]>
>>>>>>
>>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>>
>>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>>
>>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>>
>>>>>> Reported-by: Stefan Metzmacher <[email protected]>
>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>>>> Signed-off-by: Sasha Levin <[email protected]>
>>>>>> ---
>>>>>> kernel/signal.c | 3 ++-
>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>>> index 55526b941011..00a3840f6037 100644
>>>>>> --- a/kernel/signal.c
>>>>>> +++ b/kernel/signal.c
>>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>>>
>>>>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>>>> + if (unlikely(fatal_signal_pending(task) ||
>>>>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>>> return false;
>>>>>>
>>>>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>>>>
>>>>>
>>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>>
>>>> Has the bit about the io worker kthreads been backported?
>>>> If so this isn't horrible. If not this is nonsense.
>>
>> No not yet - my plan is to do that, but not until we're 100% satisfied
>> with it.
>
> Do you understand why the patches were autoselected for 5.11 and 5.10?

As far as I know, selections like these (AUTOSEL) are done by some bot
that uses whatever criteria to see if they should be applied for earlier
revisions. I'm sure Sasha can expand on that :-)

Hence it's reasonable to expect that sometimes it'll pick patches that
should not go into stable, at least not just yet. It's important to
understand that this message is just a notice that it's queued up for
stable -rc, not that it's _in_ stable just yet. There's time to object.

>>> I don't know, I hope not...
>>>
>>> But I just tested v5.12-rc4, and attaching gdb to an application with
>>> iothreads is still not possible; it still loops forever trying to
>>> attach to the iothreads.
>>
>> I do see the looping; gdb apparently doesn't give up when it gets
>> -EPERM trying to attach to the threads. That isn't really a kernel
>> thing, but:
>
> Maybe we need to remove the iothreads from /proc/pid/tasks/

Is that how it finds them? It's arguably a bug in gdb that it just
keeps retrying, but it would be nice if we could ensure that it just
ignores them. Because if gdb trips over something like that, others
probably will too...

>>> And I tested 'kill -9 $pidofiothread', and it froze the whole
>>> machine...
>>
>> that sounds very strange, I haven't seen anything like that running
>> the exact same scenario.
>>
>>> So there's still work to do in order to get 5.12 stable.
>>>
>>> I'm short on time currently, but I hope to send more details soon.
>>
>> Thanks! I'll play with it this morning and see if I can provoke
>> something odd related to STOP/attach.
>
> Thanks!
>
> Somehow I have the impression that your same_thread_group_account patch
> may fix a lot of things...

Maybe? I'll look closer.

--
Jens Axboe

2021-03-25 15:02:58

by Sasha Levin

Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

On Thu, Mar 25, 2021 at 08:02:11AM -0600, Jens Axboe wrote:
>On 3/25/21 7:56 AM, Stefan Metzmacher wrote:
>> On 25.03.21 14:38, Jens Axboe wrote:
>>> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>>>
>>>> On 25.03.21 13:04, Eric W. Biederman wrote:
>>>>> Stefan Metzmacher <[email protected]> writes:
>>>>>
>>>>>> On 25.03.21 12:24, Sasha Levin wrote:
>>>>>>> From: "Eric W. Biederman" <[email protected]>
>>>>>>>
>>>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>>>
>>>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>>>
>>>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>>>
>>>>>>> Reported-by: Stefan Metzmacher <[email protected]>
>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>>>>> Signed-off-by: Sasha Levin <[email protected]>
>>>>>>> ---
>>>>>>> kernel/signal.c | 3 ++-
>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>>>> index 55526b941011..00a3840f6037 100644
>>>>>>> --- a/kernel/signal.c
>>>>>>> +++ b/kernel/signal.c
>>>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>>>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>>>>
>>>>>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>>>>> + if (unlikely(fatal_signal_pending(task) ||
>>>>>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>>>> return false;
>>>>>>>
>>>>>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>>>>>
>>>>>>
>>>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>>>
>>>>> Has the bit about the io worker kthreads been backported?
>>>>> If so this isn't horrible. If not this is nonsense.
>>>
>>> No not yet - my plan is to do that, but not until we're 100% satisfied
>>> with it.
>>
>> Do you understand why the patches were autoselected for 5.11 and 5.10?
>
>As far as I know, selections like these (AUTOSEL) are done by some bot
>that uses whatever criteria to see if they should be applied for earlier
>revisions. I'm sure Sasha can expand on that :-)

Right, it's just heuristics that help catch commits that don't have a
stable tag but should have one.
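
Purely as a toy illustration of that idea -- the real bot's heuristics
are much richer, and none of the names, keywords, or version ranges
below match its actual implementation -- a crude scan for fix-looking
commits whose messages never mention a stable tag might look like:

/*
 * Toy stand-in for the AUTOSEL idea only; the real bot is far more
 * elaborate and nothing here matches its implementation.  List
 * commits in a (made-up) range whose messages never mention stable@,
 * then keep the ones whose subject looks like a fix.  A crude keyword
 * match stands in for the real classifier.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *log = popen("git log --no-merges --format=%s "
			  "--grep=stable@ --invert-grep v5.11..v5.12", "r");
	char subject[1024];

	if (!log)
		return 1;
	while (fgets(subject, sizeof(subject), log))
		if (strcasestr(subject, "fix") || strcasestr(subject, "null"))
			printf("AUTOSEL candidate: %s", subject);
	return pclose(log);
}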

>Hence it's reasonable to expect that sometimes it'll pick patches that
>should not go into stable, at least not just yet. It's important to
>understand that this message is just a notice that it's queued up for
>stable -rc, not that it's _in_ stable just yet. There's time to object.

Right, it's even more than that: this mail (tagged with "AUTOSEL") is a
notification sent at least a week before the patch goes into the
stable queue.

If you think any AUTOSEL patches don't need to be backported, it's
usually enough to just quickly nack them.

--
Thanks,
Sasha

2021-03-25 15:11:52

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

On 3/25/21 8:02 AM, Jens Axboe wrote:
> On 3/25/21 7:56 AM, Stefan Metzmacher wrote:
>> Am 25.03.21 um 14:38 schrieb Jens Axboe:
>>> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>>>
>>>> Am 25.03.21 um 13:04 schrieb Eric W. Biederman:
>>>>> Stefan Metzmacher <[email protected]> writes:
>>>>>
>>>>>> Am 25.03.21 um 12:24 schrieb Sasha Levin:
>>>>>>> From: "Eric W. Biederman" <[email protected]>
>>>>>>>
>>>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>>>
>>>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>>>
>>>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>>>
>>>>>>> Reported-by: Stefan Metzmacher <[email protected]>
>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>>>>> Signed-off-by: Sasha Levin <[email protected]>
>>>>>>> ---
>>>>>>> kernel/signal.c | 3 ++-
>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>>>> index 55526b941011..00a3840f6037 100644
>>>>>>> --- a/kernel/signal.c
>>>>>>> +++ b/kernel/signal.c
>>>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>>>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>>>>
>>>>>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>>>>> + if (unlikely(fatal_signal_pending(task) ||
>>>>>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>>>> return false;
>>>>>>>
>>>>>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>>>>>
>>>>>>
>>>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>>>
>>>>> Has the bit about the io worker kthreads been backported?
>>>>> If so this isn't horrible. If not this is nonsense.
>>>
>>> No not yet - my plan is to do that, but not until we're 100% satisfied
>>> with it.
>>
>> Do you understand why the patches were autoselected for 5.11 and 5.10?
>
> As far as I know, selections like these (AUTOSEL) are done by some bot
> that uses whatever criteria to see if they should be applied for earlier
> revisions. I'm sure Sasha can expand on that :-)
>
> Hence it's reasonable to expect that sometimes it'll pick patches that
> should not go into stable, at least not just yet. It's important to
> understand that this message is just a notice that it's queued up for
> stable -rc, not that it's _in_ stable just yet. There's time to object.
>
>>>> I don't know, I hope not...
>>>>
>>>> But I just tested v5.12-rc4, and attaching gdb to an application
>>>> with iothreads is still not possible; it still loops forever
>>>> trying to attach to the iothreads.
>>>
>>> I do see the looping; gdb apparently doesn't give up when it gets
>>> -EPERM trying to attach to the threads. That isn't really a kernel
>>> thing, but:
>>
>> Maybe we need to remove the iothreads from /proc/pid/task/
>
> Is that how it finds them? It's arguably a bug in gdb that it just
> keeps retrying, but it would be nice if we could ensure that it simply
> ignores them. Because if gdb trips over something like that, others
> probably will too...
>
>>>> And I tested 'kill -9 $pidofiothread', and it froze the whole
>>>> machine...
>>>
>>> That sounds very strange; I haven't seen anything like that running
>>> the exact same scenario.
>>>
>>>> So there's still work to do in order to get 5.12 stable.
>>>>
>>>> I'm short on time currently, but I hope to send more details soon.
>>>
>>> Thanks! I'll play with it this morning and see if I can provoke
>>> something odd related to STOP/attach.
>>
>> Thanks!
>>
>> Somehow I have the impression that your same_thread_group_account patch
>> may fix a lot of things...
>
> Maybe? I'll look closer.

It needs a bit more love than that. If you have threads already in your
app, then we just want to skip over the PF_IO_WORKER threads. We can't
just terminate the loop.

Something like the below works for me.


diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3851bfcdba56..abff2fe10bfa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3723,7 +3723,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
*/
pos = task = task->group_leader;
do {
- if (!nr--)
+ if (same_thread_group(task, pos) && !nr--)
goto found;
} while_each_thread(task, pos);
fail:
@@ -3744,16 +3744,22 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
*/
static struct task_struct *next_tid(struct task_struct *start)
{
- struct task_struct *pos = NULL;
+ struct task_struct *tmp, *pos = NULL;
+
rcu_read_lock();
- if (pid_alive(start)) {
- pos = next_thread(start);
- if (thread_group_leader(pos))
- pos = NULL;
- else
- get_task_struct(pos);
+ if (!pid_alive(start))
+ goto no_thread;
+ list_for_each_entry_rcu(tmp, &start->thread_group, thread_group) {
+ if (!thread_group_leader(tmp) && same_thread_group(start, tmp)) {
+ get_task_struct(tmp);
+ pos = tmp;
+ break;
+ }
}
+no_thread:
rcu_read_unlock();
+ if (!pos)
+ return NULL;
put_task_struct(start);
return pos;
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3f6a0fcaa10c..4f621e386abf 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -668,11 +668,18 @@ static inline bool thread_group_leader(struct task_struct *p)
}

static inline
-bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
{
return p1->signal == p2->signal;
}

+static inline
+bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+{
+ return same_thread_group_account(p1, p2) &&
+ !((p1->flags | p2->flags) & PF_IO_WORKER);
+}
+
static inline struct task_struct *next_thread(const struct task_struct *p)
{
return list_entry_rcu(p->thread_group.next,
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..625110cacc2a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
* those pending times and rely only on values updated on tick or
* other scheduler action.
*/
- if (same_thread_group(current, tsk))
+ if (same_thread_group_account(current, tsk))
(void) task_sched_runtime(current);

rcu_read_lock();

--
Jens Axboe

2021-03-30 21:17:31

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.11 26/44] btrfs: track qgroup released data in own variable in insert_prealloc_file_extent

On Thu, Mar 25, 2021 at 01:08:02PM +0100, David Sterba wrote:
>On Thu, Mar 25, 2021 at 07:24:41AM -0400, Sasha Levin wrote:
>> From: Qu Wenruo <[email protected]>
>>
>> [ Upstream commit fbf48bb0b197e6894a04c714728c952af7153bf3 ]
>>
>> There is a piece of weird code in insert_prealloc_file_extent(), which
>> looks like:
>>
>> ret = btrfs_qgroup_release_data(inode, file_offset, len);
>> if (ret < 0)
>> return ERR_PTR(ret);
>> if (trans) {
>> ret = insert_reserved_file_extent(trans, inode,
>> file_offset, &stack_fi,
>> true, ret);
>> ...
>> }
>> extent_info.is_new_extent = true;
>> extent_info.qgroup_reserved = ret;
>> ...
>>
>> Note how the variable @ret is abused here: if anyone adds code just
>> after the btrfs_qgroup_release_data() call, it's super easy to
>> overwrite @ret and cause tons of qgroup-related bugs.
>>
>> Fix such abuse by introducing new variable @qgroup_released, so that we
>> won't reuse the existing variable @ret.
>>
>> Signed-off-by: Qu Wenruo <[email protected]>
>> Reviewed-by: David Sterba <[email protected]>
>> Signed-off-by: David Sterba <[email protected]>
>> Signed-off-by: Sasha Levin <[email protected]>
>
>This patch is preparatory work and does not make sense as a standalone
>backport. Either this one plus
>https://lore.kernel.org/linux-btrfs/[email protected]/
>or neither. And IIRC it does not apply directly and needs some
>additional review before it can be backported to an older code base,
>so it has no CC: stable tags.

I'll drop it, thanks!

--
Thanks,
Sasha
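
As a footnote on the quoted btrfs commit: the shape of the fix it
describes is roughly the following (a simplified sketch mirroring the
snippet in the commit message, not the verbatim upstream hunk;
surrounding code is still elided):

	int qgroup_released;

	/* keep the released byte count in its own variable ... */
	qgroup_released = btrfs_qgroup_release_data(inode, file_offset, len);
	if (qgroup_released < 0)
		return ERR_PTR(qgroup_released);

	if (trans) {
		ret = insert_reserved_file_extent(trans, inode, file_offset,
						  &stack_fi, true,
						  qgroup_released);
		...
	}
	extent_info.is_new_extent = true;
	/* ... so later code reusing @ret can no longer clobber it */
	extent_info.qgroup_reserved = qgroup_released;
	...

With the released byte count in a dedicated variable, code added
between the two calls can no longer silently corrupt the qgroup
accounting.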