When using irq_mode=NULL_IRQ_TIMER, blk_start_queue() is called from the
hrtimer interrupt. null_request_fn() calls spin_unlock_irq() and this
causes the following splat from lockdep since interrupts are being
enabled while we're in an interrupt handler.

When we're in null_request_fn() we can't know the status of the flags
saved before blk_start_queue() was called, but we can use in_irq() to
conditionally enable interrupts only if we're not in a hard interrupt in
order to handle this case.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 398 at kernel/locking/lockdep.c:2608 trace_hardirqs_on_caller+0x11a/0x1b0()
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
CPU: 0 PID: 398 Comm: mkfs.ext4 Not tainted 4.4.0-rc6+ #77
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
Call Trace:
<IRQ> [<ffffffff8134467c>] dump_stack+0x4e/0x82
[<ffffffff810501e2>] warn_slowpath_common+0x82/0xc0
[<ffffffff81560aac>] ? _raw_spin_unlock_irq+0x2c/0x60
[<ffffffff813da2b0>] ? null_softirq_done_fn+0x30/0x30
[<ffffffff8105026c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff8109cdaa>] trace_hardirqs_on_caller+0x11a/0x1b0
[<ffffffff8109ce4d>] trace_hardirqs_on+0xd/0x10
[<ffffffff81560aac>] _raw_spin_unlock_irq+0x2c/0x60
[<ffffffff813da31e>] null_request_fn+0x4e/0xb0
[<ffffffff8131dbc3>] __blk_run_queue+0x33/0x40
[<ffffffff8131de1f>] blk_start_queue+0x3f/0x80
[<ffffffff813da277>] end_cmd+0x117/0x120
[<ffffffff813da2c2>] null_cmd_timer_expired+0x12/0x20
[<ffffffff810bbaab>] __hrtimer_run_queues+0x12b/0x4b0
[<ffffffff810bc69f>] hrtimer_interrupt+0xaf/0x1b0
[<ffffffff810361f6>] local_apic_timer_interrupt+0x36/0x60
[<ffffffff81563ced>] smp_apic_timer_interrupt+0x3d/0x50
[<ffffffff8156217c>] apic_timer_interrupt+0x8c/0xa0
<EOI> [<ffffffff81344c9b>] ? fprop_fraction_percpu+0xeb/0x110
[<ffffffff8112bc8f>] ? __wb_calc_thresh+0x2f/0xc0
[<ffffffff8112bc8f>] __wb_calc_thresh+0x2f/0xc0
[<ffffffff8112bb8c>] ? domain_dirty_limits+0x1bc/0x1f0
[<ffffffff8112db85>] balance_dirty_pages_ratelimited+0x6d5/0xfb0
[<ffffffff810b0967>] ? rcu_read_lock_sched_held+0x77/0x90
[<ffffffff811a2c7a>] ? __block_commit_write.isra.1+0x7a/0xb0
[<ffffffff8112224c>] generic_perform_write+0x14c/0x1c0
[<ffffffff81123310>] __generic_file_write_iter+0x190/0x1f0
[<ffffffff811a763b>] blkdev_write_iter+0x7b/0x100
[<ffffffff8116b46a>] __vfs_write+0xaa/0xe0
[<ffffffff8116b875>] vfs_write+0x95/0x100
[<ffffffff81188a6f>] ? __fget_light+0x6f/0x90
[<ffffffff8116c037>] SyS_pwrite64+0x77/0x90
[<ffffffff815613b2>] entry_SYSCALL_64_fastpath+0x12/0x76
---[ end trace 39b7df36fb237be1 ]---
Signed-off-by: Rabin Vincent <[email protected]>
---
drivers/block/null_blk.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index a428e4e..d16c5dc 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -342,13 +342,20 @@ static int null_rq_prep_fn(struct request_queue *q, struct request *req)
 static void null_request_fn(struct request_queue *q)
 {
 	struct request *rq;
+	bool irq = in_irq();
 
 	while ((rq = blk_fetch_request(q)) != NULL) {
 		struct nullb_cmd *cmd = rq->special;
 
-		spin_unlock_irq(q->queue_lock);
+		if (irq)
+			spin_unlock(q->queue_lock);
+		else
+			spin_unlock_irq(q->queue_lock);
 		null_handle_cmd(cmd);
-		spin_lock_irq(q->queue_lock);
+		if (irq)
+			spin_lock(q->queue_lock);
+		else
+			spin_lock_irq(q->queue_lock);
 	}
 }
 
--
2.6.2
On 12/25/2015 07:26 AM, Rabin Vincent wrote:
> When using irq_mode=NULL_IRQ_TIMER, blk_start_queue() is called from the
> hrtimer interrupt. null_request_fn() calls spin_unlock_irq() and this
> causes the following splat from lockdep since interrupts are being
> enabled while we're in an interrupt handler.
>
> When we're in null_request_fn() we can't know the status of the flags
> saved before blk_start_queue() was called, but we can use in_irq() to
> conditionally enable interrupts only if we're not in a hard interrupt in
> order to handle this case.
Using in_irq() to check for this is... nasty. You have two choices here.
Either you don't enable interrupts ever. That's safe from the
perspective of the driver, since we don't block in handling the command.
That means just unconditionally using spin_unlock() in the request_fn.
Or you punt queue running to process context, by manually clearing the
queue stopped flag and using blk_run_queue_async() instead.
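[Editor's note: the second alternative described above could look roughly like the following. This is a hedged sketch reconstructed from the description, not code from this thread; the helper name queue_restart_async() is hypothetical.]

```c
/*
 * Sketch of the "punt to process context" option: instead of restarting
 * the queue directly from hrtimer (hard-IRQ) context, clear the stopped
 * flag under the queue lock and let kblockd run the queue from process
 * context, where spin_unlock_irq() in the request_fn is safe.
 */
static void queue_restart_async(struct request_queue *q)
{
	unsigned long flags;

	/* irqsave is legal in any context, including hard IRQ */
	spin_lock_irqsave(q->queue_lock, flags);
	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
	blk_run_queue_async(q);	/* defer the actual run to kblockd */
	spin_unlock_irqrestore(q->queue_lock, flags);
}
```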
--
Jens Axboe
On 12/25/2015 09:03 PM, Jens Axboe wrote:
> On 12/25/2015 07:26 AM, Rabin Vincent wrote:
>> When using irq_mode=NULL_IRQ_TIMER, blk_start_queue() is called from the
>> hrtimer interrupt. null_request_fn() calls spin_unlock_irq() and this
>> causes the following splat from lockdep since interrupts are being
>> enabled while we're in an interrupt handler.
>>
>> When we're in null_request_fn() we can't know the status of the flags
>> saved before blk_start_queue() was called, but we can use in_irq() to
>> conditionally enable interrupts only if we're not in a hard interrupt in
>> order to handle this case.
>
> Using in_irq() to check for this is... nasty. You have two choices here.
> Either you don't enable interrupts ever. That's safe from the
> perspective of the driver, since we don't block in handling the command.
> That means just unconditionally using spin_unlock() in the request_fn.
> Or you punt queue running to process context, by manually clearing the
> queue stopped flag and using blk_run_queue_async() instead.
Something like this should work, can you test it?
--
Jens Axboe
On Mon, Dec 28, 2015 at 11:52:24AM -0700, Jens Axboe wrote:
> Something like this should work, can you test it?
Spits out a bunch of the following. It's this assertion:
static inline void queue_lockdep_assert_held(struct request_queue *q)
{
	if (q->queue_lock)
		lockdep_assert_held(q->queue_lock);
}
[ 11.797139] ------------[ cut here ]------------
[ 11.797670] WARNING: CPU: 0 PID: 6 at /home/rabin/dev/linux/include/linux/blkdev.h:505 blk_start_queue_async+0x54/0x60()
[ 11.798810] Modules linked in:
[ 11.799139] CPU: 0 PID: 6 Comm: kworker/u4:0 Not tainted 4.4.0-rc6+ #82
[ 11.799811] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 11.800012] Workqueue: writeback wb_workfn (flush-253:0)
[ 11.800012] ffffffff817fdd30 ffff88000d603e40 ffffffff813446dc 0000000000000000
[ 11.800012] ffff88000d603e78 ffffffff810501e2 ffff88000005bc00 ffff88000005bc00
[ 11.800012] ffff88000d60db40 ffffffff813da2b0 ffff88000d60dc28 ffff88000d603e88
[ 11.800012] Call Trace:
[ 11.800012] <IRQ> [<ffffffff813446dc>] dump_stack+0x4e/0x82
[ 11.800012] [<ffffffff810501e2>] warn_slowpath_common+0x82/0xc0
[ 11.800012] [<ffffffff813da2b0>] ? null_softirq_done_fn+0x30/0x30
[ 11.800012] [<ffffffff810502da>] warn_slowpath_null+0x1a/0x20
[ 11.800012] [<ffffffff8131e094>] blk_start_queue_async+0x54/0x60
[ 11.800012] [<ffffffff813da244>] end_cmd+0x84/0xc0
[ 11.800012] [<ffffffff813da2c2>] null_cmd_timer_expired+0x12/0x20
[ 11.800012] [<ffffffff810bbaab>] __hrtimer_run_queues+0x12b/0x4b0
[ 11.800012] [<ffffffff810bc69f>] hrtimer_interrupt+0xaf/0x1b0
[ 11.800012] [<ffffffff8112446d>] ? mempool_alloc_slab+0x1d/0x30
[ 11.800012] [<ffffffff810361f6>] local_apic_timer_interrupt+0x36/0x60
[ 11.800012] [<ffffffff81563ced>] smp_apic_timer_interrupt+0x3d/0x50
[ 11.800012] [<ffffffff8156217c>] apic_timer_interrupt+0x8c/0xa0
[ 11.800012] <EOI> [<ffffffff8112446d>] ? mempool_alloc_slab+0x1d/0x30
[ 11.800012] [<ffffffff8116431b>] ? __slab_alloc.isra.17.constprop.19+0x6b/0xa0
[ 11.800012] [<ffffffff81164558>] kmem_cache_alloc+0x208/0x240
[ 11.800012] [<ffffffff8112446d>] ? mempool_alloc_slab+0x1d/0x30
[ 11.800012] [<ffffffff8112446d>] mempool_alloc_slab+0x1d/0x30
[ 11.800012] [<ffffffff81124348>] mempool_alloc+0x68/0x170
[ 11.800012] [<ffffffff8109cc5f>] ? mark_held_locks+0x6f/0xa0
[ 11.800012] [<ffffffff81560a5f>] ? _raw_spin_unlock_irqrestore+0x5f/0x80
[ 11.800012] [<ffffffff81318128>] bio_alloc_bioset+0xb8/0x230
[ 11.800012] [<ffffffff811a32b3>] submit_bh_wbc.isra.2+0x73/0x130
[ 11.800012] [<ffffffff811a5230>] __block_write_full_page.constprop.3+0x110/0x360
[ 11.800012] [<ffffffff811a5870>] ? I_BDEV+0x20/0x20
[ 11.800012] [<ffffffff811a5870>] ? I_BDEV+0x20/0x20
[ 11.800012] [<ffffffff811a559c>] block_write_full_page+0x11c/0x1b0
[ 11.800012] [<ffffffff811a6418>] blkdev_writepage+0x18/0x20
[ 11.800012] [<ffffffff8112b9a6>] __writepage+0x16/0x40
[ 11.800012] [<ffffffff8112c746>] write_cache_pages+0x226/0x5f0
[ 11.800012] [<ffffffff8112b990>] ? bdi_set_max_ratio+0x70/0x70
[ 11.800012] [<ffffffff8112cb54>] generic_writepages+0x44/0x60
[ 11.800012] [<ffffffff81099e60>] ? __lock_is_held+0x50/0x70
[ 11.800012] [<ffffffff8112edf1>] do_writepages+0x21/0x30
[ 11.800012] [<ffffffff81198a3f>] __writeback_single_inode+0x7f/0x6f0
[ 11.800012] [<ffffffff8119937d>] writeback_sb_inodes+0x2cd/0x5b0
[ 11.800012] [<ffffffff811996ec>] __writeback_inodes_wb+0x8c/0xc0
[ 11.800012] [<ffffffff8119997b>] wb_writeback+0x25b/0x520
[ 11.800012] [<ffffffff8119aa38>] wb_workfn+0x2a8/0x5d0
[ 11.800012] [<ffffffff810eee79>] ? ring_buffer_event_data+0x9/0x10
[ 11.800012] [<ffffffff81068af6>] process_one_work+0x206/0x650
[ 11.800012] [<ffffffff81068a52>] ? process_one_work+0x162/0x650
[ 11.800012] [<ffffffff81068f8e>] worker_thread+0x4e/0x450
[ 11.800012] [<ffffffff81068f40>] ? process_one_work+0x650/0x650
[ 11.800012] [<ffffffff8106faef>] kthread+0xef/0x110
[ 11.800012] [<ffffffff8106fa00>] ? kthread_stop+0x280/0x280
[ 11.800012] [<ffffffff8156171f>] ret_from_fork+0x3f/0x70
[ 11.800012] [<ffffffff8106fa00>] ? kthread_stop+0x280/0x280
[ 11.800012] ---[ end trace b21adbd5c353edec ]---
On 12/28/2015 12:07 PM, Rabin Vincent wrote:
> On Mon, Dec 28, 2015 at 11:52:24AM -0700, Jens Axboe wrote:
>> Something like this should work, can you test it?
>
> Spits out a bunch of the following. It's this assertion:
>
> static inline void queue_lockdep_assert_held(struct request_queue *q)
> {
> if (q->queue_lock)
> lockdep_assert_held(q->queue_lock);
> }
Ah right, we clear the stopped flag. This one should be better.
--
Jens Axboe
On Mon, Dec 28, 2015 at 12:11:17PM -0700, Jens Axboe wrote:
> On 12/28/2015 12:07 PM, Rabin Vincent wrote:
> >On Mon, Dec 28, 2015 at 11:52:24AM -0700, Jens Axboe wrote:
> >>Something like this should work, can you test it?
> >
> >Spits out a bunch of the following. It's this assertion:
> >
> >static inline void queue_lockdep_assert_held(struct request_queue *q)
> >{
> > if (q->queue_lock)
> > lockdep_assert_held(q->queue_lock);
> >}
>
> Ah right, we clear the stopped flag. This one should be better.
Yes, this one appears to work fine.
Tested-by: Rabin Vincent <[email protected]>
Thanks.
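[Editor's note: the actual working patch is not shown in this thread, but from the assertion failure above — blk_start_queue_async() asserts that the queue lock is held — the restart path in end_cmd() plausibly takes the following shape. A hedged reconstruction, not the literal patch.]

```c
/*
 * Sketch of the restart path in end_cmd() under the working approach:
 * retake the queue lock with irqsave (legal in hard-IRQ context), then
 * restart the queue asynchronously only if it is still stopped, so the
 * lockdep assertion in blk_start_queue_async() is satisfied.
 */
if (q && blk_queue_stopped(q)) {
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	if (blk_queue_stopped(q))
		blk_start_queue_async(q);
	spin_unlock_irqrestore(q->queue_lock, flags);
}
```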
On 12/28/2015 12:51 PM, Rabin Vincent wrote:
> On Mon, Dec 28, 2015 at 12:11:17PM -0700, Jens Axboe wrote:
>> On 12/28/2015 12:07 PM, Rabin Vincent wrote:
>>> On Mon, Dec 28, 2015 at 11:52:24AM -0700, Jens Axboe wrote:
>>>> Something like this should work, can you test it?
>>>
>>> Spits out a bunch of the following. It's this assertion:
>>>
>>> static inline void queue_lockdep_assert_held(struct request_queue *q)
>>> {
>>> if (q->queue_lock)
>>> lockdep_assert_held(q->queue_lock);
>>> }
>>
>> Ah right, we clear the stopped flag. This one should be better.
>
> Yes, this one appears to work fine.
>
> Tested-by: Rabin Vincent <[email protected]>
Thanks for testing, I've added it.
--
Jens Axboe