2021-12-13 12:38:14

by Wander Lairson Costa

Subject: [PATCH v2 0/2] Fix warnings in blktrace

These two patches fix wrong usage of lock primitives with
CONFIG_PREEMPT_RT in the blktrace code.

This version fixes a missing piece in the blktrace patch.

The patches apply to the PREEMPT_RT tree.

Wander Lairson Costa (2):
block: Avoid sleeping function called from invalid context bug
blktrace: switch trace spinlock to a raw spinlock

block/blk-cgroup.c | 4 ++--
kernel/trace/blktrace.c | 18 +++++++++---------
2 files changed, 11 insertions(+), 11 deletions(-)

--
2.27.0



2021-12-13 12:40:07

by Wander Lairson Costa

Subject: [PATCH v2 1/2] block: Avoid sleeping function called from invalid context bug

This was caught during QA test:

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:942
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 243401, name: sed
INFO: lockdep is turned off.
Preemption disabled at:
[<ffffffff89b26268>] blk_cgroup_bio_start+0x28/0xd0

CPU: 2 PID: 243401 Comm: sed Kdump: loaded Not tainted 4.18.0-353.rt7.138.el8.x86_64+debug #1
Hardware name: HP ProLiant DL380 Gen9, BIOS P89 05/06/2015
Call Trace:
dump_stack+0x5c/0x80
___might_sleep.cold.89+0xf5/0x109
rt_spin_lock+0x3e/0xd0
? __blk_add_trace+0x428/0x4b0
__blk_add_trace+0x428/0x4b0
blk_add_trace_bio+0x16e/0x1c0
generic_make_request_checks+0x7e8/0x8c0
generic_make_request+0x3c/0x420
? membarrier_private_expedited+0xd0/0x2b0
? lock_release+0x1ca/0x450
? submit_bio+0x3c/0x160
? _raw_spin_unlock_irqrestore+0x3c/0x80
submit_bio+0x3c/0x160
? rt_mutex_futex_unlock+0x66/0xa0
iomap_submit_ioend.isra.36+0x4a/0x70
xfs_vm_writepages+0x65/0x90 [xfs]
do_writepages+0x41/0xe0
? rt_mutex_futex_unlock+0x66/0xa0
__filemap_fdatawrite_range+0xce/0x110
xfs_release+0x11c/0x160 [xfs]
__fput+0xd5/0x270
task_work_run+0xa1/0xd0
exit_to_usermode_loop+0x14d/0x160
do_syscall_64+0x23b/0x240
entry_SYSCALL_64_after_hwframe+0x6a/0xdf

Replace the get_cpu()/put_cpu() calls with get_cpu_light()/put_cpu_light() to avoid this bug.

Signed-off-by: Wander Lairson Costa <[email protected]>
---
block/blk-cgroup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 663aabfeba18..0a532bb3003c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1911,7 +1911,7 @@ void blk_cgroup_bio_start(struct bio *bio)
struct blkg_iostat_set *bis;
unsigned long flags;

- cpu = get_cpu();
+ cpu = get_cpu_light();
bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
flags = u64_stats_update_begin_irqsave(&bis->sync);

@@ -1928,7 +1928,7 @@ void blk_cgroup_bio_start(struct bio *bio)
u64_stats_update_end_irqrestore(&bis->sync, flags);
if (cgroup_subsys_on_dfl(io_cgrp_subsys))
cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
- put_cpu();
+ put_cpu_light();
}

static int __init blkcg_init(void)
--
2.27.0


2021-12-13 12:40:21

by Wander Lairson Costa

Subject: [PATCH v2 2/2] blktrace: switch trace spinlock to a raw spinlock

TRACE_EVENT disables preemption before invoking the callback. Because of
that, blktrace triggers the following bug under PREEMPT_RT:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 119, name: kworker/u2:2
5 locks held by kworker/u2:2/119:
#0: ffff8c2e4a88f538 ((wq_completion)xfs-cil/dm-0){+.+.}-{0:0}, at: process_one_work+0x200/0x450
#1: ffffab3840ac7e68 ((work_completion)(&cil->xc_push_work)){+.+.}-{0:0}, at: process_one_work+0x200/0x450
#2: ffff8c2e4a887128 (&cil->xc_ctx_lock){++++}-{3:3}, at: xlog_cil_push_work+0xb7/0x670 [xfs]
#3: ffffffffa6a63780 (rcu_read_lock){....}-{1:2}, at: blk_add_trace_bio+0x0/0x1f0
#4: ffffffffa6610620 (running_trace_lock){+.+.}-{2:2}, at: __blk_add_trace+0x3ef/0x480
Preemption disabled at:
[<ffffffffa4d35c05>] migrate_enable+0x45/0x140
CPU: 0 PID: 119 Comm: kworker/u2:2 Kdump: loaded Not tainted 5.14.0-25.rt21.25.light.el9.x86_64+debug #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs]
Call Trace:
? migrate_enable+0x45/0x140
dump_stack_lvl+0x57/0x7d
___might_sleep.cold+0xe3/0xf7
rt_spin_lock+0x3a/0xd0
? __blk_add_trace+0x3ef/0x480
__blk_add_trace+0x3ef/0x480
blk_add_trace_bio+0x18d/0x1f0
trace_block_bio_queue+0xb5/0x150
submit_bio_checks+0x1f0/0x520
? sched_clock_cpu+0xb/0x100
submit_bio_noacct+0x30/0x1d0
? bio_associate_blkg+0x66/0x190
xlog_cil_push_work+0x1b6/0x670 [xfs]
? register_lock_class+0x43/0x4f0
? xfs_swap_extents+0x5f0/0x5f0 [xfs]
process_one_work+0x275/0x450
? process_one_work+0x200/0x450
worker_thread+0x55/0x3c0
? process_one_work+0x450/0x450
kthread+0x188/0x1a0
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30

To avoid this bug, we switch the trace lock to a raw spinlock.

Signed-off-by: Wander Lairson Costa <[email protected]>
---
kernel/trace/blktrace.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 1183c88634aa..a86e022f7155 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -34,7 +34,7 @@ static struct trace_array *blk_tr;
static bool blk_tracer_enabled __read_mostly;

static LIST_HEAD(running_trace_list);
-static __cacheline_aligned_in_smp DEFINE_SPINLOCK(running_trace_lock);
+static __cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(running_trace_lock);

/* Select an alternative, minimalistic output than the original one */
#define TRACE_BLK_OPT_CLASSIC 0x1
@@ -121,12 +121,12 @@ static void trace_note_tsk(struct task_struct *tsk)
struct blk_trace *bt;

tsk->btrace_seq = blktrace_seq;
- spin_lock_irqsave(&running_trace_lock, flags);
+ raw_spin_lock_irqsave(&running_trace_lock, flags);
list_for_each_entry(bt, &running_trace_list, running_list) {
trace_note(bt, tsk->pid, BLK_TN_PROCESS, tsk->comm,
sizeof(tsk->comm), 0);
}
- spin_unlock_irqrestore(&running_trace_lock, flags);
+ raw_spin_unlock_irqrestore(&running_trace_lock, flags);
}

static void trace_note_time(struct blk_trace *bt)
@@ -666,9 +666,9 @@ static int __blk_trace_startstop(struct request_queue *q, int start)
blktrace_seq++;
smp_mb();
bt->trace_state = Blktrace_running;
- spin_lock_irq(&running_trace_lock);
+ raw_spin_lock_irq(&running_trace_lock);
list_add(&bt->running_list, &running_trace_list);
- spin_unlock_irq(&running_trace_lock);
+ raw_spin_unlock_irq(&running_trace_lock);

trace_note_time(bt);
ret = 0;
@@ -676,9 +676,9 @@ static int __blk_trace_startstop(struct request_queue *q, int start)
} else {
if (bt->trace_state == Blktrace_running) {
bt->trace_state = Blktrace_stopped;
- spin_lock_irq(&running_trace_lock);
+ raw_spin_lock_irq(&running_trace_lock);
list_del_init(&bt->running_list);
- spin_unlock_irq(&running_trace_lock);
+ raw_spin_unlock_irq(&running_trace_lock);
relay_flush(bt->rchan);
ret = 0;
}
@@ -1608,9 +1608,9 @@ static int blk_trace_remove_queue(struct request_queue *q)

if (bt->trace_state == Blktrace_running) {
bt->trace_state = Blktrace_stopped;
- spin_lock_irq(&running_trace_lock);
+ raw_spin_lock_irq(&running_trace_lock);
list_del_init(&bt->running_list);
- spin_unlock_irq(&running_trace_lock);
+ raw_spin_unlock_irq(&running_trace_lock);
relay_flush(bt->rchan);
}

--
2.27.0


2021-12-14 07:13:55

by kernel test robot

Subject: Re: [PATCH v2 1/2] block: Avoid sleeping function called from invalid context bug

Hi Wander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linux/master linus/master v5.16-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Wander-Lairson-Costa/Fix-warnings-in-blktrace/20211213-204207
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20211214/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/e53f7f8c1ce0b19fef6164247fea08d17d5f771d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Wander-Lairson-Costa/Fix-warnings-in-blktrace/20211213-204207
git checkout e53f7f8c1ce0b19fef6164247fea08d17d5f771d
# save the config file to linux build tree
mkdir build_dir
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

block/blk-cgroup.c: In function 'blk_cgroup_bio_start':
>> block/blk-cgroup.c:1915:8: error: implicit declaration of function 'get_cpu_light'; did you mean 'em_cpu_get'? [-Werror=implicit-function-declaration]
1915 | cpu = get_cpu_light();
| ^~~~~~~~~~~~~
| em_cpu_get
>> block/blk-cgroup.c:1932:2: error: implicit declaration of function 'put_cpu_light'; did you mean 'fput_light'? [-Werror=implicit-function-declaration]
1932 | put_cpu_light();
| ^~~~~~~~~~~~~
| fput_light
cc1: some warnings being treated as errors


vim +1915 block/blk-cgroup.c

1908
1909 void blk_cgroup_bio_start(struct bio *bio)
1910 {
1911 int rwd = blk_cgroup_io_type(bio), cpu;
1912 struct blkg_iostat_set *bis;
1913 unsigned long flags;
1914
> 1915 cpu = get_cpu_light();
1916 bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
1917 flags = u64_stats_update_begin_irqsave(&bis->sync);
1918
1919 /*
1920 * If the bio is flagged with BIO_CGROUP_ACCT it means this is a split
1921 * bio and we would have already accounted for the size of the bio.
1922 */
1923 if (!bio_flagged(bio, BIO_CGROUP_ACCT)) {
1924 bio_set_flag(bio, BIO_CGROUP_ACCT);
1925 bis->cur.bytes[rwd] += bio->bi_iter.bi_size;
1926 }
1927 bis->cur.ios[rwd]++;
1928
1929 u64_stats_update_end_irqrestore(&bis->sync, flags);
1930 if (cgroup_subsys_on_dfl(io_cgrp_subsys))
1931 cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
> 1932 put_cpu_light();
1933 }
1934

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

Subject: Re: [PATCH v2 0/2] Fix warnings in blktrace

On 2021-12-13 09:37:35 [-0300], Wander Lairson Costa wrote:
> These two patches fix wrong usage of lock primitives with
> CONFIG_PREEMPT_RT in the blktrace code.
>
> This version fixes a missing piece in the blktrace patch.
>
> The patches apply to the PREEMPT_RT tree.

I've seen it, need to think about it.

Sebastian

Subject: Re: [PATCH v2 1/2] block: Avoid sleeping function called from invalid context bug

On 2021-12-13 09:37:36 [-0300], Wander Lairson Costa wrote:
> ---
> block/blk-cgroup.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 663aabfeba18..0a532bb3003c 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1911,7 +1911,7 @@ void blk_cgroup_bio_start(struct bio *bio)
> struct blkg_iostat_set *bis;
> unsigned long flags;
>
> - cpu = get_cpu();
> + cpu = get_cpu_light();
> bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
> flags = u64_stats_update_begin_irqsave(&bis->sync);
>
> @@ -1928,7 +1928,7 @@ void blk_cgroup_bio_start(struct bio *bio)
> u64_stats_update_end_irqrestore(&bis->sync, flags);
> if (cgroup_subsys_on_dfl(io_cgrp_subsys))
> cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
> - put_cpu();
> + put_cpu_light();
> }

Are you sure patch and backtrace match? There is also
u64_stats_update_begin_irqsave() which disables preemption on RT. So by
doing what you are suggesting, you only avoid disabling preemption in
cgroup_rstat_updated() which acquires a raw_spinlock_t.

Sebastian

Subject: Re: [PATCH v2 2/2] blktrace: switch trace spinlock to a raw spinlock

On 2021-12-13 09:37:37 [-0300], Wander Lairson Costa wrote:

> To avoid this bug, we switch the trace lock to a raw spinlock.

Steven, do you think this is a good idea?

Sebastian

2021-12-15 17:31:47

by Steven Rostedt

Subject: Re: [PATCH v2 2/2] blktrace: switch trace spinlock to a raw spinlock

On Wed, 15 Dec 2021 18:08:34 +0100
Sebastian Andrzej Siewior <[email protected]> wrote:

> On 2021-12-13 09:37:37 [-0300], Wander Lairson Costa wrote:
> …
> > To avoid this bug, we switch the trace lock to a raw spinlock.
>
> Steven, do you think this is a good idea?
>

blktrace is actually maintained by Jens.

-- Steve

2021-12-15 19:34:41

by Jens Axboe

Subject: Re: [PATCH v2 2/2] blktrace: switch trace spinlock to a raw spinlock

On 12/15/21 10:31 AM, Steven Rostedt wrote:
> On Wed, 15 Dec 2021 18:08:34 +0100
> Sebastian Andrzej Siewior <[email protected]> wrote:
>
>> On 2021-12-13 09:37:37 [-0300], Wander Lairson Costa wrote:
>> …
>>> To avoid this bug, we switch the trace lock to a raw spinlock.
>>
>> Steven, do you think this is a good idea?
>>
>
> blktrace is actually maintained by Jens.

Looks like there are two patches, and none were CC'ed linux-block.
Please resend.

--
Jens Axboe


2021-12-16 20:21:25

by Wander Costa

Subject: Re: [PATCH v2 1/2] block: Avoid sleeping function called from invalid context bug

On Wed, Dec 15, 2021 at 1:58 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2021-12-13 09:37:36 [-0300], Wander Lairson Costa wrote:
> > ---
> > block/blk-cgroup.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 663aabfeba18..0a532bb3003c 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -1911,7 +1911,7 @@ void blk_cgroup_bio_start(struct bio *bio)
> > struct blkg_iostat_set *bis;
> > unsigned long flags;
> >
> > - cpu = get_cpu();
> > + cpu = get_cpu_light();
> > bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
> > flags = u64_stats_update_begin_irqsave(&bis->sync);
> >
> > @@ -1928,7 +1928,7 @@ void blk_cgroup_bio_start(struct bio *bio)
> > u64_stats_update_end_irqrestore(&bis->sync, flags);
> > if (cgroup_subsys_on_dfl(io_cgrp_subsys))
> > cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
> > - put_cpu();
> > + put_cpu_light();
> > }
>
> Are you sure patch and backtrace match? There is also
> u64_stats_update_begin_irqsave() which disables preemption on RT. So by
> doing what you are suggesting, you only avoid disabling preemption in
> cgroup_rstat_updated() which acquires a raw_spinlock_t.
>

You're right. I double-checked and noticed my tree was missing the
call to u64_stats_update_begin_irqsave(). Indeed, only patch 2 is
necessary.

> Sebastian
>


2021-12-16 20:22:03

by Wander Costa

Subject: Re: [PATCH v2 2/2] blktrace: switch trace spinlock to a raw spinlock

On Wed, Dec 15, 2021 at 4:34 PM Jens Axboe <[email protected]> wrote:
>
> On 12/15/21 10:31 AM, Steven Rostedt wrote:
> > On Wed, 15 Dec 2021 18:08:34 +0100
> > Sebastian Andrzej Siewior <[email protected]> wrote:
> >
> >> On 2021-12-13 09:37:37 [-0300], Wander Lairson Costa wrote:
> >> …
> >>> To avoid this bug, we switch the trace lock to a raw spinlock.
> >>
> >> Steven, do you think this is a good idea?
> >>
> >
> > blktrace is actually maintained by Jens.
>
> Looks like there are two patches, and none were CC'ed linux-block.
> Please resend.
>
OK, I will resend, thanks.

> --
> Jens Axboe
>