Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752637AbbGRRTo (ORCPT ); Sat, 18 Jul 2015 13:19:44 -0400 Received: from mail-la0-f52.google.com ([209.85.215.52]:36749 "EHLO mail-la0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751704AbbGRRTn (ORCPT ); Sat, 18 Jul 2015 13:19:43 -0400 Date: Sat, 18 Jul 2015 20:19:38 +0300 From: Mike Krinkin To: Akinobu Mita Cc: LKML , Jens Axboe Subject: Re: Use after free bug in null_blk driver Message-ID: <20150718171938.GA3025@kmu-tp-x230> References: <20150718145130.GA29256@kmu-tp-x230> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8515 Lines: 157 On Sun, Jul 19, 2015 at 01:18:44AM +0900, Akinobu Mita wrote: > 2015-07-18 23:51 GMT+09:00 Mike Krinkin : > > Hi, > > > > i noticed that loading null_blk with queue_mode=1 and irqmode=2 parameters > > and slab poisoning enabled causes general protection fault: > > > > [ 20.671974] general protection fault: 0000 [#1] SMP > > [ 20.678050] Modules linked in: null_blk(+) usbhid hid psmouse floppy > > [ 20.688351] CPU: 0 PID: 147 Comm: modprobe Not tainted 4.2.0-rc2+ #76 > > [ 20.695961] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > [ 20.705143] task: ffff88007f2ed100 ti: ffff88007f0a4000 task.ti: ffff88007f0a4000 > > [ 20.715511] RIP: 0010:[] [] null_cmd_timer_expired+0x62/0xc0 [null_blk] > > [ 20.730541] RSP: 0018:ffff88007d203e78 EFLAGS: 00010082 > > [ 20.737301] RAX: ffff88007ec16b80 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000 > > [ 20.747034] RDX: ffff88007f260888 RSI: 0000000000000000 RDI: ffff88007f260850 > > [ 20.762601] RBP: ffff88007d203e98 R08: 0000000000000001 R09: 0000000000000000 > > [ 20.771472] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 > > [ 20.782522] R13: ffff88007d218780 R14: 0000000000000000 R15: ffff88007d20fba8 > > [ 20.792347] FS: 00007f02e1500740(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000 > > [ 20.805319] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 20.813120] CR2: 00007ffd52478dd8 CR3: 000000007f3dd000 CR4: 00000000000007f0 > > [ 20.825357] Stack: > > [ 20.827764] ffff88007d218788 ffff88007d20fa00 ffff88007d20fac0 ffffffffa00280f0 > > [ 20.838269] ffff88007d203f08 ffffffff810f822b 000000045dd35ed6 ffff88007d20fa18 > > [ 20.853377] 00000001810f89e7 ffff88007d20fa00 ffff88007d20fb28 000000045dd35ed6 > > [ 20.863415] Call Trace: > > [ 20.866398] > > [ 20.869088] [] ? null_softirq_done_fn+0x30/0x30 [null_blk] > > [ 20.880219] [] __hrtimer_run_queues+0x11b/0x6e0 > > [ 20.891048] [] hrtimer_interrupt+0xab/0x1b0 > > [ 20.899542] [] local_apic_timer_interrupt+0x3c/0x70 > > [ 20.911730] [] smp_apic_timer_interrupt+0x41/0x60 > > [ 20.921854] [] apic_timer_interrupt+0x70/0x80 > > [ 20.931084] > > [ 20.933406] [] ? _raw_spin_unlock_irqrestore+0x3b/0x60 > > [ 20.945250] [] hrtimer_start_range_ns+0x1a7/0x540 > > [ 20.955952] [] ? trace_hardirqs_on_caller+0x151/0x1e0 > > [ 20.963974] [] null_request_fn+0xd3/0x100 [null_blk] > > [ 20.977159] [] __blk_run_queue+0x37/0x50 > > [ 20.983782] [] __elv_add_request+0x15d/0x440 > > [ 20.991540] [] blk_queue_bio+0x415/0x520 > > [ 20.998876] [] generic_make_request+0xcc/0x110 > > [ 21.007095] [] submit_bio+0x67/0x150 > > [ 21.013603] [] submit_bh_wbc+0x14c/0x180 > > [ 21.019730] [] block_read_full_page+0x2b0/0x3a0 > > [ 21.028023] [] ? I_BDEV+0x20/0x20 > > [ 21.034575] [] ? blkdev_readpages+0x20/0x20 > > [ 21.040947] [] blkdev_readpage+0x18/0x20 > > [ 21.048237] [] do_read_cache_page+0x7d/0x1a0 > > [ 21.056016] [] read_cache_page+0x1c/0x20 > > [ 21.063300] [] read_dev_sector+0x30/0x90 > > [ 21.069804] [] ? check_partition+0x1f0/0x1f0 > > [ 21.076932] [] adfspart_check_ICS+0x42/0x250 > > [ 21.084317] [] ? snprintf+0x34/0x40 > > [ 21.090670] [] ? check_partition+0x1f0/0x1f0 > > [ 21.097864] [] check_partition+0x100/0x1f0 > > [ 21.104535] [] rescan_partitions+0xa8/0x2b0 > > [ 21.112087] [] __blkdev_get+0x313/0x460 > > [ 21.120037] [] blkdev_get+0x41/0x3a0 > > [ 21.126266] [] ? unlock_new_inode+0x5d/0x90 > > [ 21.134043] [] ? bdget+0x130/0x150 > > [ 21.140369] [] ? disk_get_part+0xd/0x1d0 > > [ 21.147357] [] add_disk+0x436/0x4d0 > > [ 21.153846] [] ? sprintf+0x40/0x50 > > [ 21.159867] [] null_init+0x38e/0x1000 [null_blk] > > [ 21.167691] [] ? 0xffffffffa004e000 > > [ 21.174266] [] do_one_initcall+0xb3/0x1e0 > > [ 21.182276] [] ? kmem_cache_alloc_trace+0x36f/0x3b0 > > [ 21.190291] [] do_init_module+0x61/0x1ec > > [ 21.197112] [] load_module+0x24fe/0x2810 > > [ 21.204237] [] ? m_show+0x1a0/0x1a0 > > [ 21.213498] [] SyS_finit_module+0x80/0xb0 > > [ 21.221360] [] entry_SYSCALL_64_fastpath+0x16/0x7a > > [ 21.230181] Code: e8 24 40 3b e1 48 89 c3 eb 08 4d 85 e4 4c 89 e3 74 e2 48 8d 7b f0 4c 8b 23 e8 db fe ff ff 48 8b 43 28 48 85 c0 74 e3 48 8b 58 30 <48> 83 bb 38 01 00 00 00 75 d5 48 8b 83 98 08 00 00 a8 04 74 ca > > [ 21.281572] RIP [] null_cmd_timer_expired+0x62/0xc0 [null_blk] > > [ 21.289770] RSP > > [ 21.294601] ---[ end trace cd40ca6c9001dd7e ]--- > > [ 21.299719] Kernel panic - not syncing: Fatal exception in interrupt > > [ 21.309176] Kernel Offset: disabled > > [ 21.312773] ---[ end Kernel panic - not syncing: Fatal exception in interrupt > > > > I suppose, it was introduced in commit 8b70f45e2eb2 ("null_blk: restart request > > processing on completion handler") by Akinobu Mita . The > > Right. That commit introduced use-after-free bug. > > > following fix solves problem for me: > > > > diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c > > index 69de41a..910f354 100644 > > --- a/drivers/block/null_blk.c > > +++ b/drivers/block/null_blk.c > > @@ -242,7 +242,6 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer) > > do { > > cmd = container_of(entry, struct nullb_cmd, ll_list); > > entry = entry->next; > > - end_cmd(cmd); > > > > if (cmd->rq) { > > struct request_queue *q = cmd->rq->q; > > @@ -254,6 +253,7 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer) > > spin_unlock(q->queue_lock); > > } > > } > > + end_cmd(cmd); > > } while (entry); > > } > > But this change re-introduces the problem which I was trying to fix > in the commit 8b70f45e2eb2. We need to call end_cmd() before > blk_start_queue(). So the corrct fix should save cmd->rq->q to > local variable before end_cmd() and call blk_start_queue() with > the saved value. Yes, you are right, correct fix looks something like this: diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 69de41a..3177b24 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -240,19 +240,19 @@ static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer) while ((entry = llist_del_all(&cq->list)) != NULL) { entry = llist_reverse_order(entry); do { + struct request_queue *q = NULL; + cmd = container_of(entry, struct nullb_cmd, ll_list); entry = entry->next; + if (cmd->rq) + q = cmd->rq->q; end_cmd(cmd); - if (cmd->rq) { - struct request_queue *q = cmd->rq->q; - - if (!q->mq_ops && blk_queue_stopped(q)) { - spin_lock(q->queue_lock); - if (blk_queue_stopped(q)) - blk_start_queue(q); - spin_unlock(q->queue_lock); - } + if (q && !q->mq_ops && blk_queue_stopped(q)) { + spin_lock(q->queue_lock); + if (blk_queue_stopped(q)) + blk_start_queue(q); + spin_unlock(q->queue_lock); } } while (entry); } But anyway, will you send a patch or should i submit a patch with this fix? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/