Date: Tue, 21 Apr 2015 00:48:30 +0800
From: Ming Lei
To: Dongsu Park
Cc: Linux Kernel Mailing List, Jens Axboe, Christoph Hellwig
Subject: Re: panic with CPU hotplug + blk-mq + scsi-mq
Message-ID: <20150421004830.78ac8e14@tom-ThinkPad-T410>
In-Reply-To: <20150420155240.GB31401@posteo.de>
References: <20150417094152.GA2838@posteo.de> <20150420080759.GA31401@posteo.de> <20150420155240.GB31401@posteo.de>

On Mon, 20 Apr 2015 17:52:40 +0200 Dongsu Park wrote:

> On 20.04.2015 21:12, Ming Lei wrote:
> > On Mon, Apr 20, 2015 at 4:07 PM, Dongsu Park wrote:
> > > Hi Ming,
> > >
> > > On 18.04.2015 00:23, Ming Lei wrote:
> > >> > Does anyone have an idea?
> > >>
> > >> As far as I can see, at least two problems exist:
> > >> - a race between timeout handling and CPU hotplug
> > >> - in the case of shared tags, the setting and checking of hctx->tags
> > >>   during CPU online handling
> > >>
> > >> So could you please test the attached two patches to see if they fix
> > >> your issue? I ran them in my VM, and the oops seems to disappear.
> > >
> > > Thanks for the patches.
> > > But it still panics even with your patches, both v1 and v2.
> > > I tested it multiple times, and hit the bug every time.
> >
> > Could you share with us the exact test you are running?
> > Such as the CPU count, the virtio-scsi hw queue count, whether you use
> > multiple LUNs, and your workload if it is something specific.
>
> It is probably easiest to just share my Qemu command line:
>
> /usr/bin/qemu-system-x86_64 -M pc -cpu host -enable-kvm -m 2048 \
> -smp 4,cores=1,maxcpus=4,threads=1 \
> -object memory-backend-ram,size=1024M,id=ram-node0 \
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
> -object memory-backend-ram,size=1024M,id=ram-node1 \
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
> -serial stdio -name vm-0fa2eb90-51f3-4b65-aa72-97cea3ead7bf \
> -uuid 0fa2eb90-51f3-4b65-aa72-97cea3ead7bf \
> -monitor telnet:0.0.0.0:9400,server,nowait \
> -rtc base=utc -boot menu=off,order=c -L /usr/share/qemu \
> -device virtio-scsi-pci,id=scsi0,num_queues=8,bus=pci.0,addr=0x7 \
> -drive file=./mydebian2.qcow2,if=none,id=drive-virtio-disk0,aio=native,cache=writeback \
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
> -drive file=./tfile00.img,if=none,id=drive-scsi0-0-0-0,aio=native \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> -drive file=./tfile01.img,if=none,id=drive-scsi0-0-0-1,aio=native \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \
> -k en-us -vga cirrus -netdev user,id=vnet0,net=192.168.122.0/24 \
> -net nic,vlan=0,model=virtio,macaddr=52:54:00:5b:d7:00 \
> -net tap,vlan=0,ifname=dntap0,vhost=on,script=no,downscript=no \
> -vnc 0.0.0.0:1 -virtfs local,path=/Dev,mount_tag=homedev,security_model=none
>
> (where each of tfile0[01].img is a 16-GiB image)
>
> And there's nothing special about the workload. Inside the guest, I go to
> a 9pfs-mounted directory where the kernel source is available.
> When I just do 'make install', the guest immediately crashes.
> That's the simplest way to make it crash.

Thanks for providing that.
The trick is just in the CPU count and the virtio-scsi hw queue count, which
is why I asked, :-)

Now the problem is quite clear: before CPU1 goes offline, suppose CPU3 is
mapped to hw queue 6; after CPU1 goes offline, CPU3 is remapped to hw queue 5.
Unfortunately, the current code cannot allocate tags for hw queue 5 even
though it becomes mapped.

The following updated patch (which includes the original patch 2) will fix
the problem; patch 1 is still required too. So the patch below should fix
your hotplug issue.

-------
>From 8c0edcbbdfbab67dc8ae2fd46cca6a86e0cadcba Mon Sep 17 00:00:00 2001
From: Ming Lei
Date: Sun, 19 Apr 2015 23:32:46 +0800
Subject: [PATCH v1 2/2] blk-mq: fix CPU hotplug handling

Firstly, hctx->tags has to be set to NULL when the hctx is to be disabled,
no matter whether set->tags[i] is NULL or not in blk_mq_map_swqueue(),
because shared tags may already have been freed from another request queue.
The same situation has to be considered in blk_mq_hctx_cpu_online() too.

Finally, an unmapped hw queue can become remapped after the CPU topology
changes, so we also need to allocate tags for such a hw queue in
blk_mq_map_swqueue(). Tag allocation for the hw queue can then be removed
from the hctx CPU online notifier, and it is reasonable to do the
allocation after remapping is done.
Cc:
Reported-by: Dongsu Park
Signed-off-by: Ming Lei
---
 block/blk-mq.c | 34 +++++++++++++---------------------
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1277f70..a0ae38a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1574,22 +1574,6 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
 	return NOTIFY_OK;
 }
 
-static int blk_mq_hctx_cpu_online(struct blk_mq_hw_ctx *hctx, int cpu)
-{
-	struct request_queue *q = hctx->queue;
-	struct blk_mq_tag_set *set = q->tag_set;
-
-	if (set->tags[hctx->queue_num])
-		return NOTIFY_OK;
-
-	set->tags[hctx->queue_num] = blk_mq_init_rq_map(set, hctx->queue_num);
-	if (!set->tags[hctx->queue_num])
-		return NOTIFY_STOP;
-
-	hctx->tags = set->tags[hctx->queue_num];
-	return NOTIFY_OK;
-}
-
 static int blk_mq_hctx_notify(void *data, unsigned long action,
 			      unsigned int cpu)
 {
@@ -1597,8 +1581,11 @@ static int blk_mq_hctx_notify(void *data, unsigned long action,
 	if (action == CPU_DEAD || action == CPU_DEAD_FROZEN)
 		return blk_mq_hctx_cpu_offline(hctx, cpu);
-	else if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN)
-		return blk_mq_hctx_cpu_online(hctx, cpu);
+
+	/*
+	 * In case of CPU online, tags will be reallocated
+	 * after new mapping is done in blk_mq_map_swqueue().
+	 */
 
 	return NOTIFY_OK;
 }
@@ -1778,6 +1765,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	unsigned int i;
 	struct blk_mq_hw_ctx *hctx;
 	struct blk_mq_ctx *ctx;
+	struct blk_mq_tag_set *set = q->tag_set;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		cpumask_clear(hctx->cpumask);
@@ -1806,16 +1794,20 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 		 * disable it and free the request entries.
 		 */
 		if (!hctx->nr_ctx) {
-			struct blk_mq_tag_set *set = q->tag_set;
-
 			if (set->tags[i]) {
 				blk_mq_free_rq_map(set, set->tags[i], i);
 				set->tags[i] = NULL;
-				hctx->tags = NULL;
 			}
+			hctx->tags = NULL;
 			continue;
 		}
 
+		/* unmapped hw queue can be remapped after CPU topo changed */
+		if (!set->tags[i])
+			set->tags[i] = blk_mq_init_rq_map(set, hctx->queue_num);
+		hctx->tags = set->tags[i];
+		WARN_ON(!hctx->tags);
+
 		/*
 		 * Set the map size to the number of mapped software queues.
 		 * This is more accurate and more efficient than looping
--
1.7.9.5

> Dongsu
>
> > I can not reproduce it in my VM.
> > One interesting point is that the oops always happened
> > on CPU3 in your tests, looks like the mapping is broken
> > for CPU3's ctx in case of CPU 1 offline?
> >
> > > Cheers,
> > > Dongsu
> > >
> > > ---- [beginning of call traces] ----
> > > [ 22.942214] smpboot: CPU 1 is now offline
> > > [ 30.686284] random: nonblocking pool is initialized
> > > [ 39.857305] fuse init (API version 7.23)
> > > [ 40.563853] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> > > [ 40.564005] IP: [] __bt_get.isra.5+0x7d/0x1e0
> > > [ 40.564005] PGD 7a363067 PUD 7cadc067 PMD 0
> > > [ 40.564005] Oops: 0000 [#1] SMP
> > > [ 40.564005] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin dm_multipath loop rtc_cmos 9pnet_virtio 9pnet serio_raw acpi_cpufreq i2c_piix4 virtio_net
> > > [ 40.564005] CPU: 3 PID: 6349 Comm: grub-mount Not tainted 4.0.0+ #320
> > > [ 40.564005] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> > > [ 40.564005] task: ffff880079011560 ti: ffff88007a1c8000 task.ti: ffff88007a1c8000
> > > [ 40.564005] RIP: 0010:[] [] __bt_get.isra.5+0x7d/0x1e0
> > > [ 40.564005] RSP: 0018:ffff88007a1cb838 EFLAGS: 00010246
> > > [ 40.564005] RAX: 0000000000000075 RBX: ffff88007913c400 RCX: 0000000000000078
> > > [ 40.564005] RDX: ffff88007fddbb80 RSI: 0000000000000010 RDI: ffff88007913c400
> > > [
40.564005] RBP: ffff88007a1cb888 R08: ffff88007fddbb80 R09: 0000000000000001 > > > [ 40.564005] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000010 > > > [ 40.564005] R13: 0000000000000010 R14: ffff88007a1cb988 R15: ffff88007fddbb80 > > > [ 40.564005] FS: 00002b7c8b6807c0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > > > [ 40.564005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 40.564005] CR2: 0000000000000018 CR3: 0000000079b0b000 CR4: 00000000001407e0 > > > [ 40.564005] Stack: > > > [ 40.564005] ffff88007a1cb918 ffff88007fdd58c0 0000000000000078 ffffffff813b5d28 > > > [ 40.564005] ffff88007a1cb878 ffff88007913c400 0000000000000010 0000000000000010 > > > [ 40.564005] ffff88007a1cb988 ffff88007fddbb80 ffff88007a1cb908 ffffffff813b9225 > > > [ 40.564005] Call Trace: > > > [ 40.564005] [] ? blk_mq_queue_enter+0x98/0x2b0 > > > [ 40.564005] [] bt_get+0x65/0x1d0 > > > [ 40.564005] [] ? blk_mq_queue_enter+0x98/0x2b0 > > > [ 40.564005] [] ? wait_woken+0x90/0x90 > > > [ 40.564005] [] blk_mq_get_tag+0xa7/0xd0 > > > [ 40.564005] [] ? sched_clock_cpu+0x88/0xb0 > > > [ 40.564005] [] __blk_mq_alloc_request+0x1b/0x1f0 > > > [ 40.564005] [] blk_mq_map_request+0xb1/0x200 > > > [ 40.564005] [] blk_mq_make_request+0x6e/0x2c0 > > > [ 40.564005] [] ? generic_make_request_checks+0x1ff/0x3d0 > > > [ 40.564005] [] ? bio_add_page+0x5e/0x70 > > > [ 40.564005] [] generic_make_request+0xc0/0x110 > > > [ 40.564005] [] submit_bio+0x68/0x150 > > > [ 40.564005] [] ? lru_cache_add+0x1c/0x50 > > > [ 40.564005] [] mpage_bio_submit+0x2a/0x40 > > > [ 40.564005] [] mpage_readpages+0x10c/0x130 > > > [ 40.564005] [] ? I_BDEV+0x10/0x10 > > > [ 40.564005] [] ? I_BDEV+0x10/0x10 > > > [ 40.564005] [] ? __page_cache_alloc+0x137/0x160 > > > [ 40.564005] [] blkdev_readpages+0x1d/0x20 > > > [ 40.564005] [] __do_page_cache_readahead+0x28f/0x310 > > > [ 40.564005] [] ? 
__do_page_cache_readahead+0x15e/0x310 > > > [ 40.564005] [] ondemand_readahead+0xe2/0x460 > > > [ 40.564005] [] ? pagecache_get_page+0x2d/0x1b0 > > > [ 40.564005] [] page_cache_sync_readahead+0x31/0x50 > > > [ 40.564005] [] generic_file_read_iter+0x4ec/0x600 > > > [ 40.564005] [] blkdev_read_iter+0x37/0x40 > > > [ 40.564005] [] new_sync_read+0x7e/0xb0 > > > [ 40.564005] [] __vfs_read+0x18/0x50 > > > [ 40.564005] [] vfs_read+0x8d/0x150 > > > [ 40.564005] [] SyS_read+0x49/0xb0 > > > [ 40.564005] [] system_call_fastpath+0x12/0x17 > > > [ 40.564005] Code: 97 18 03 00 00 bf 04 00 00 00 41 f7 f1 83 f8 04 0f 43 f8 b8 ff ff ff ff 44 39 d7 0f 86 c1 00 00 00 41 8b 00 48 89 4d c0 49 89 f5 <8b> 4e 08 8b 56 0c 4c 89 45 b0 c7 45 c8 00 00 00 00 41 89 c4 89 > > > [ 40.564005] RIP [] __bt_get.isra.5+0x7d/0x1e0 > > > [ 40.564005] RSP > > > [ 40.564005] CR2: 0000000000000018 > > > [ 40.686846] ---[ end trace 32b76e93ea582fae ]--- > > > [ 40.688354] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 > > > [ 40.689123] IP: [] __bt_get.isra.5+0x7d/0x1e0 > > > [ 40.689123] PGD 0 > > > [ 40.689123] Oops: 0000 [#2] SMP > > > [ 40.689123] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin dm_multipath loop rtc_cmos 9pnet_virtio 9pnet serio_raw acpi_cpufreq i2c_piix4 virtio_net > > > [ 40.689123] CPU: 3 PID: 559 Comm: kworker/3:2 Tainted: G D 4.0.0+ #320 > > > [ 40.689123] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 > > > [ 40.689123] Workqueue: events_freezable_power_ disk_events_workfn > > > [ 40.689123] task: ffff88007a17d580 ti: ffff88007caa4000 task.ti: ffff88007caa4000 > > > [ 40.689123] RIP: 0010:[] [] __bt_get.isra.5+0x7d/0x1e0 > > > [ 40.689123] RSP: 0018:ffff88007caa7958 EFLAGS: 00010246 > > > [ 40.689123] RAX: 0000000000000075 RBX: ffff88007913c400 RCX: 0000000000000078 > > > [ 40.689123] RDX: ffff88007fddbb80 RSI: 0000000000000010 RDI: ffff88007913c400 > > > [ 40.689123] RBP: 
ffff88007caa79a8 R08: ffff88007fddbb80 R09: 0000000000000000 > > > [ 40.689123] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > > > [ 40.689123] R13: 0000000000000010 R14: ffff88007caa7ab8 R15: ffff88007fddbb80 > > > [ 40.689123] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > > > [ 40.689123] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 40.689123] CR2: 0000000000000018 CR3: 0000000001c0b000 CR4: 00000000001407e0 > > > [ 40.689123] Stack: > > > [ 40.689123] ffffffff810c5ea5 0000000000000292 0000000000000078 0000000000000002 > > > [ 40.689123] 0000000000000000 ffff88007913c400 0000000000000000 0000000000000010 > > > [ 40.689123] ffff88007caa7ab8 ffff88007fddbb80 ffff88007caa7a28 ffffffff813b9225 > > > [ 40.689123] Call Trace: > > > [ 40.689123] [] ? cpuacct_charge+0x5/0x1b0 > > > [ 40.689123] [] bt_get+0x65/0x1d0 > > > [ 40.689123] [] ? wait_woken+0x90/0x90 > > > [ 40.689123] [] blk_mq_get_tag+0xa7/0xd0 > > > [ 40.689123] [] __blk_mq_alloc_request+0x1b/0x1f0 > > > [ 40.689123] [] blk_mq_alloc_request+0x9a/0x230 > > > [ 40.689123] [] blk_get_request+0x2c/0xf0 > > > [ 40.689123] [] scsi_execute+0x3d/0x1f0 > > > [ 40.689123] [] scsi_execute_req_flags+0x8e/0x100 > > > [ 40.689123] [] ? cpuacct_charge+0x5/0x1b0 > > > [ 40.689123] [] scsi_test_unit_ready+0x83/0x130 > > > [ 40.689123] [] sd_check_events+0x14e/0x1b0 > > > [ 40.689123] [] disk_check_events+0x51/0x170 > > > [ 40.689123] [] disk_events_workfn+0x1c/0x20 > > > [ 40.689123] [] process_one_work+0x1c9/0x500 > > > [ 40.689123] [] ? process_one_work+0x15d/0x500 > > > [ 40.689123] [] ? worker_thread+0xc7/0x460 > > > [ 40.689123] [] worker_thread+0x4b/0x460 > > > [ 40.689123] [] ? rescuer_thread+0x2e0/0x2e0 > > > [ 40.689123] [] ? rescuer_thread+0x2e0/0x2e0 > > > [ 40.689123] [] kthread+0xe7/0x100 > > > [ 40.689123] [] ? trace_hardirqs_on+0xd/0x10 > > > [ 40.689123] [] ? 
kthread_create_on_node+0x230/0x230 > > > [ 40.689123] [] ret_from_fork+0x58/0x90 > > > [ 40.689123] [] ? kthread_create_on_node+0x230/0x230 > > > [ 40.689123] Code: 97 18 03 00 00 bf 04 00 00 00 41 f7 f1 83 f8 04 0f 43 f8 b8 ff ff ff ff 44 39 d7 0f 86 c1 00 00 00 41 8b 00 48 89 4d c0 49 89 f5 <8b> 4e 08 8b 56 0c 4c 89 45 b0 c7 45 c8 00 00 00 00 41 89 c4 89 > > > [ 40.689123] RIP [] __bt_get.isra.5+0x7d/0x1e0 > > > [ 40.689123] RSP > > > [ 40.689123] CR2: 0000000000000018 > > > [ 40.689123] ---[ end trace 32b76e93ea582faf ]--- > > > [ 40.844044] BUG: unable to handle kernel paging request at ffffffffffffff98 > > > [ 40.845007] IP: [] kthread_data+0x10/0x20 > > > [ 40.845007] PGD 1c0c067 PUD 1c0e067 PMD 0 > > > [ 40.845007] Oops: 0000 [#3] SMP > > > [ 40.845007] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin dm_multipath loop rtc_cmos 9pnet_virtio 9pnet serio_raw acpi_cpufreq i2c_piix4 virtio_net > > > [ 40.845007] CPU: 3 PID: 559 Comm: kworker/3:2 Tainted: G D 4.0.0+ #320 > > > [ 40.845007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 > > > [ 40.845007] task: ffff88007a17d580 ti: ffff88007caa4000 task.ti: ffff88007caa4000 > > > [ 40.845007] RIP: 0010:[] [] kthread_data+0x10/0x20 > > > [ 40.845007] RSP: 0018:ffff88007caa75e8 EFLAGS: 00010092 > > > [ 40.845007] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000000f > > > [ 40.845007] RDX: 000000000000000f RSI: 0000000000000003 RDI: ffff88007a17d580 > > > [ 40.845007] RBP: ffff88007caa75e8 R08: ffff88007a17d610 R09: 0000000000000000 > > > [ 40.845007] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007fdd4dc0 > > > [ 40.845007] R13: ffff88007a17d580 R14: 0000000000000003 R15: 0000000000000000 > > > [ 40.845007] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > > > [ 40.845007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 40.845007] CR2: 0000000000000028 CR3: 0000000001c0b000 CR4: 
00000000001407e0 > > > [ 40.845007] Stack: > > > [ 40.845007] ffff88007caa7608 ffffffff81094495 ffff88007caa7608 00000000001d4dc0 > > > [ 40.845007] ffff88007caa7678 ffffffff8169ef42 ffffffff81079dde 0000000000000292 > > > [ 40.845007] ffffffff81079e12 ffff88007a17d580 ffff88007caa7678 0000000000000296 > > > [ 40.845007] Call Trace: > > > [ 40.845007] [] wq_worker_sleeping+0x15/0xa0 > > > [ 40.845007] [] __schedule+0x932/0xc20 > > > [ 40.845007] [] ? do_exit+0x6ee/0xb10 > > > [ 40.845007] [] ? do_exit+0x722/0xb10 > > > [ 40.845007] [] schedule+0x37/0x90 > > > [ 40.845007] [] do_exit+0x7f6/0xb10 > > > [ 40.845007] [] ? kmsg_dump+0xee/0x1f0 > > > [ 40.845007] [] oops_end+0x8d/0xd0 > > > [ 40.845007] [] no_context+0x119/0x370 > > > [ 40.845007] [] ? sched_clock_local+0x25/0x90 > > > [ 40.845007] [] __bad_area_nosemaphore+0x85/0x210 > > > [ 40.845007] [] bad_area_nosemaphore+0x13/0x20 > > > [ 40.845007] [] __do_page_fault+0xae/0x460 > > > [ 40.845007] [] do_page_fault+0xc/0x10 > > > [ 40.845007] [] page_fault+0x22/0x30 > > > [ 40.845007] [] ? __bt_get.isra.5+0x7d/0x1e0 > > > [ 40.845007] [] ? __lock_is_held+0x5e/0x90 > > > [ 40.845007] [] ? cpuacct_charge+0x5/0x1b0 > > > [ 40.845007] [] bt_get+0x65/0x1d0 > > > [ 40.845007] [] ? wait_woken+0x90/0x90 > > > [ 40.845007] [] blk_mq_get_tag+0xa7/0xd0 > > > [ 40.845007] [] __blk_mq_alloc_request+0x1b/0x1f0 > > > [ 40.845007] [] blk_mq_alloc_request+0x9a/0x230 > > > [ 40.845007] [] blk_get_request+0x2c/0xf0 > > > [ 40.845007] [] scsi_execute+0x3d/0x1f0 > > > [ 40.845007] [] scsi_execute_req_flags+0x8e/0x100 > > > [ 40.845007] [] ? cpuacct_charge+0x5/0x1b0 > > > [ 40.845007] [] scsi_test_unit_ready+0x83/0x130 > > > [ 40.845007] [] sd_check_events+0x14e/0x1b0 > > > [ 40.845007] [] disk_check_events+0x51/0x170 > > > [ 40.845007] [] disk_events_workfn+0x1c/0x20 > > > [ 40.845007] [] process_one_work+0x1c9/0x500 > > > [ 40.845007] [] ? process_one_work+0x15d/0x500 > > > [ 40.845007] [] ? 
worker_thread+0xc7/0x460 > > > [ 40.845007] [] worker_thread+0x4b/0x460 > > > [ 40.845007] [] ? rescuer_thread+0x2e0/0x2e0 > > > [ 40.845007] [] ? rescuer_thread+0x2e0/0x2e0 > > > [ 40.845007] [] kthread+0xe7/0x100 > > > [ 40.845007] [] ? trace_hardirqs_on+0xd/0x10 > > > [ 40.845007] [] ? kthread_create_on_node+0x230/0x230 > > > [ 40.845007] [] ret_from_fork+0x58/0x90 > > > [ 40.845007] [] ? kthread_create_on_node+0x230/0x230 > > > [ 40.845007] Code: 00 48 89 e5 5d 48 8b 40 88 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 20 04 00 00 55 48 89 e5 <48> 8b 40 98 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 > > > [ 40.845007] RIP [] kthread_data+0x10/0x20 > > > [ 40.845007] RSP > > > [ 40.845007] CR2: ffffffffffffff98 > > > [ 40.845007] ---[ end trace 32b76e93ea582fb0 ]--- > > > [ 40.845007] Fixing recursive fault but reboot is needed! > > > ---- [end of call traces] ---- > > > > > >> Thanks, > > >> Ming Lei > > >> > > > >> > Regards, > > >> > Dongsu > > >> > > > >> > ---- [beginning of call traces] ---- > > >> > [ 47.274292] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 > > >> > [ 47.275013] IP: [] __bt_get.isra.5+0x7d/0x1e0 > > >> > [ 47.275013] PGD 79c55067 PUD 7ba17067 PMD 0 > > >> > [ 47.275013] Oops: 0000 [#1] SMP > > >> > [ 47.275013] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin loop dm_multipath 9pnet_virtio rtc_cmos 9pnet acpi_cpufreq serio_raw i2c_piix4 virtio_net > > >> > [ 47.275013] CPU: 3 PID: 6232 Comm: blkid Not tainted 4.0.0 #303 > > >> > [ 47.275013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 > > >> > [ 47.275013] task: ffff88003dfbc020 ti: ffff880079bac000 task.ti: ffff880079bac000 > > >> > [ 47.275013] RIP: 0010:[] [] __bt_get.isra.5+0x7d/0x1e0 > > >> > [ 47.275013] RSP: 0018:ffff880079baf898 EFLAGS: 00010246 > > >> > [ 47.275013] RAX: 000000000000003c RBX: ffff880079198400 RCX: 0000000000000078 > > >> > 
[ 47.275013] RDX: ffff88007fddbb80 RSI: 0000000000000010 RDI: ffff880079198400 > > >> > [ 47.275013] RBP: ffff880079baf8e8 R08: ffff88007fddbb80 R09: 0000000000000000 > > >> > [ 47.275013] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000010 > > >> > [ 47.275013] R13: 0000000000000010 R14: ffff880079baf9e8 R15: ffff88007fddbb80 > > >> > [ 47.275013] FS: 00002b270c049800(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > > >> > [ 47.275013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >> > [ 47.275013] CR2: 0000000000000018 CR3: 000000007ca8d000 CR4: 00000000001407e0 > > >> > [ 47.275013] Stack: > > >> > [ 47.275013] ffff880079baf978 ffff88007fdd58c0 0000000000000078 ffffffff814071ff > > >> > [ 47.275013] ffff880079baf8d8 ffff880079198400 0000000000000010 0000000000000010 > > >> > [ 47.275013] ffff880079baf9e8 ffff88007fddbb80 ffff880079baf968 ffffffff8140b4e5 > > >> > [ 47.275013] Call Trace: > > >> > [ 47.275013] [] ? blk_mq_queue_enter+0x9f/0x2d0 > > >> > [ 47.275013] [] bt_get+0x65/0x1e0 > > >> > [ 47.275013] [] ? blk_mq_queue_enter+0x9f/0x2d0 > > >> > [ 47.275013] [] ? wait_woken+0xa0/0xa0 > > >> > [ 47.275013] [] blk_mq_get_tag+0xa7/0xd0 > > >> > [ 47.275013] [] __blk_mq_alloc_request+0x1b/0x200 > > >> > [ 47.275013] [] blk_mq_map_request+0xd6/0x4e0 > > >> > [ 47.275013] [] blk_mq_make_request+0x6e/0x2d0 > > >> > [ 47.275013] [] ? generic_make_request_checks+0x674/0x6a0 > > >> > [ 47.275013] [] ? bio_add_page+0x5e/0x70 > > >> > [ 47.275013] [] generic_make_request+0xc0/0x110 > > >> > [ 47.275013] [] submit_bio+0x68/0x150 > > >> > [ 47.275013] [] ? lru_cache_add+0x1c/0x50 > > >> > [ 47.275013] [] mpage_bio_submit+0x2a/0x40 > > >> > [ 47.275013] [] mpage_readpages+0x10c/0x130 > > >> > [ 47.275013] [] ? I_BDEV+0x10/0x10 > > >> > [ 47.275013] [] ? I_BDEV+0x10/0x10 > > >> > [ 47.275013] [] ? 
__page_cache_alloc+0x137/0x160 > > >> > [ 47.275013] [] blkdev_readpages+0x1d/0x20 > > >> > [ 47.275013] [] __do_page_cache_readahead+0x29f/0x320 > > >> > [ 47.275013] [] ? __do_page_cache_readahead+0x165/0x320 > > >> > [ 47.275013] [] force_page_cache_readahead+0x34/0x60 > > >> > [ 47.275013] [] page_cache_sync_readahead+0x46/0x50 > > >> > [ 47.275013] [] generic_file_read_iter+0x52c/0x640 > > >> > [ 47.275013] [] blkdev_read_iter+0x37/0x40 > > >> > [ 47.275013] [] new_sync_read+0x7e/0xb0 > > >> > [ 47.275013] [] __vfs_read+0x18/0x50 > > >> > [ 47.275013] [] vfs_read+0x8d/0x150 > > >> > [ 47.275013] [] SyS_read+0x49/0xb0 > > >> > [ 47.275013] [] system_call_fastpath+0x12/0x17 > > >> > [ 47.275013] Code: 97 18 03 00 00 bf 04 00 00 00 41 f7 f1 83 f8 04 0f 43 f8 b8 ff ff ff ff 44 39 d7 0f 86 c1 00 00 00 41 8b 00 48 89 4d c0 49 89 f5 <8b> 4e 08 8b 56 0c 4c 89 45 b0 c7 45 c8 00 00 00 00 41 89 c4 89 > > >> > [ 47.275013] RIP [] __bt_get.isra.5+0x7d/0x1e0 > > >> > [ 47.275013] RSP > > >> > [ 47.275013] CR2: 0000000000000018 > > >> > [ 47.275013] ---[ end trace 9a650b674f0fae74 ]--- > > >> > [ 47.701261] note: kworker/3:2[225] exited with preempt_count 1 > > >> > [ 47.815398] BUG: unable to handle kernel paging request at ffffffffffffff98 > > >> > [ 47.816324] IP: [] kthread_data+0x10/0x20 > > >> > [ 47.816324] PGD 1c0c067 PUD 1c0e067 PMD 0 > > >> > [ 47.816324] Oops: 0000 [#3] SMP > > >> > [ 47.816324] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin loop dm_multipath 9pnet_virtio rtc_cmos 9pnet acpi_cpufreq serio_raw i2c_piix4 virtio_net > > >> > [ 47.816324] CPU: 3 PID: 225 Comm: kworker/3:2 Tainted: G D W 4.0.0 #303 > > >> > [ 47.816324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 > > >> > [ 47.816324] task: ffff88007ac90000 ti: ffff88007906c000 task.ti: ffff88007906c000 > > >> > [ 47.816324] RIP: 0010:[] [] kthread_data+0x10/0x20 > > >> > [ 47.816324] RSP: 0018:ffff88007906f5e8 EFLAGS: 
00010092 > > >> > [ 47.816324] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000000f > > >> > [ 47.816324] RDX: 000000000000000f RSI: 0000000000000003 RDI: ffff88007ac90000 > > >> > [ 47.816324] RBP: ffff88007906f5e8 R08: ffff88007ac90090 R09: 0000000000000000 > > >> > [ 47.816324] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88007fdd4dc0 > > >> > [ 47.816324] R13: ffff88007ac90000 R14: 0000000000000003 R15: 0000000000000000 > > >> > [ 47.816324] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > > >> > [ 47.816324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >> > [ 47.816324] CR2: 0000000000000028 CR3: 0000000001c0b000 CR4: 00000000001407e0 > > >> > [ 47.816324] Stack: > > >> > [ 47.816324] ffff88007906f608 ffffffff81099f35 ffff88007906f608 00000000001d4dc0 > > >> > [ 47.816324] ffff88007906f678 ffffffff816ff757 ffffffff8107cfc6 0000000000000292 > > >> > [ 47.816324] ffffffff8107cffa ffff88007ac90000 ffff88007906f678 0000000000000296 > > >> > [ 47.816324] Call Trace: > > >> > [ 47.816324] [] wq_worker_sleeping+0x15/0xa0 > > >> > [ 47.816324] [] __schedule+0xa77/0x1080 > > >> > [ 47.816324] [] ? do_exit+0x756/0xbf0 > > >> > [ 47.816324] [] ? do_exit+0x78a/0xbf0 > > >> > [ 47.816324] [] schedule+0x37/0x90 > > >> > [ 47.816324] [] do_exit+0x866/0xbf0 > > >> > [ 47.816324] [] ? kmsg_dump+0xfe/0x200 > > >> > [ 47.816324] [] oops_end+0x8d/0xd0 > > >> > [ 47.816324] [] no_context+0x119/0x370 > > >> > [ 47.816324] [] ? cpuacct_charge+0x5/0x1c0 > > >> > [ 47.816324] [] ? sched_clock_local+0x25/0x90 > > >> > [ 47.816324] [] __bad_area_nosemaphore+0x85/0x210 > > >> > [ 47.816324] [] bad_area_nosemaphore+0x13/0x20 > > >> > [ 47.816324] [] __do_page_fault+0xb6/0x490 > > >> > [ 47.816324] [] do_page_fault+0xc/0x10 > > >> > [ 47.816324] [] page_fault+0x22/0x30 > > >> > [ 47.816324] [] ? __bt_get.isra.5+0x7d/0x1e0 > > >> > [ 47.816324] [] bt_get+0x65/0x1e0 > > >> > [ 47.816324] [] ? 
wait_woken+0xa0/0xa0 > > >> > [ 47.816324] [] blk_mq_get_tag+0xa7/0xd0 > > >> > [ 47.816324] [] __blk_mq_alloc_request+0x1b/0x200 > > >> > [ 47.816324] [] blk_mq_alloc_request+0xa1/0x250 > > >> > [ 47.816324] [] blk_get_request+0x2c/0xf0 > > >> > [ 47.816324] [] ? __might_sleep+0x4d/0x90 > > >> > [ 47.816324] [] scsi_execute+0x3d/0x1f0 > > >> > [ 47.816324] [] scsi_execute_req_flags+0x8e/0x100 > > >> > [ 47.816324] [] scsi_test_unit_ready+0x83/0x130 > > >> > [ 47.816324] [] sd_check_events+0x14e/0x1b0 > > >> > [ 47.816324] [] disk_check_events+0x51/0x170 > > >> > [ 47.816324] [] disk_events_workfn+0x1c/0x20 > > >> > [ 47.816324] [] process_one_work+0x1e8/0x800 > > >> > [ 47.816324] [] ? process_one_work+0x15d/0x800 > > >> > [ 47.816324] [] ? worker_thread+0xda/0x470 > > >> > [ 47.816324] [] worker_thread+0x53/0x470 > > >> > [ 47.816324] [] ? process_one_work+0x800/0x800 > > >> > [ 47.816324] [] ? process_one_work+0x800/0x800 > > >> > [ 47.816324] [] kthread+0xf2/0x110 > > >> > [ 47.816324] [] ? trace_hardirqs_on+0xd/0x10 > > >> > [ 47.816324] [] ? kthread_create_on_node+0x230/0x230 > > >> > [ 47.816324] [] ret_from_fork+0x58/0x90 > > >> > [ 47.816324] [] ? kthread_create_on_node+0x230/0x230 > > >> > [ 47.816324] Code: 00 48 89 e5 5d 48 8b 40 88 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 20 04 00 00 55 48 89 e5 <48> 8b 40 98 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 > > >> > [ 47.816324] RIP [] kthread_data+0x10/0x20 > > >> > [ 47.816324] RSP > > >> > [ 47.816324] CR2: ffffffffffffff98 > > >> > [ 47.816324] ---[ end trace 9a650b674f0fae76 ]--- > > >> > [ 47.816324] Fixing recursive fault but reboot is needed! 
> > >> > ---- [end of call traces] ----
> > >> > --
> > >> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > >> > the body of a message to majordomo@vger.kernel.org
> > >> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >> > Please read the FAQ at http://www.tux.org/lkml/
> > >
> > >> From 9aed1bd79531d91513cd16ed90872e4349425acc Mon Sep 17 00:00:00 2001
> > >> From: Ming Lei
> > >> Date: Fri, 17 Apr 2015 23:50:48 -0400
> > >> Subject: [PATCH 1/2] block: blk-mq: fix race between timeout and CPU hotplug
> > >>
> > >> Firstly, during CPU hotplug, even though the queue is frozen, the timeout
> > >> handler may still run and access hctx->tags, which can cause a
> > >> use-after-free, so this patch deactivates the timeout handler
> > >> inside the CPU hotplug notifier.
> > >>
> > >> Secondly, tags can be shared by more than one queue, so we
> > >> have to check whether the hctx has been disabled, otherwise a
> > >> use-after-free on tags can still be triggered.
> > >>
> > >> Cc:
> > >> Reported-by: Dongsu Park
> > >> Signed-off-by: Ming Lei
> > >> ---
> > >>  block/blk-mq.c | 13 ++++++++++---
> > >>  1 file changed, 10 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> > >> index 67f01a0..58a3b4c 100644
> > >> --- a/block/blk-mq.c
> > >> +++ b/block/blk-mq.c
> > >> @@ -677,8 +677,11 @@ static void blk_mq_rq_timer(unsigned long priv)
> > >>  		data.next = blk_rq_timeout(round_jiffies_up(data.next));
> > >>  		mod_timer(&q->timeout, data.next);
> > >>  	} else {
> > >> -		queue_for_each_hw_ctx(q, hctx, i)
> > >> -			blk_mq_tag_idle(hctx);
> > >> +		queue_for_each_hw_ctx(q, hctx, i) {
> > >> +			/* the hctx may be disabled, so we have to check here */
> > >> +			if (hctx->tags)
> > >> +				blk_mq_tag_idle(hctx);
> > >> +		}
> > >>  	}
> > >>  }
> > >>
> > >> @@ -2085,9 +2088,13 @@ static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
> > >>  	 */
> > >>  	list_for_each_entry(q, &all_q_list, all_q_node)
> > >>  		blk_mq_freeze_queue_start(q);
> > >> -	list_for_each_entry(q, &all_q_list, all_q_node)
> > >> +	list_for_each_entry(q, &all_q_list, all_q_node) {
> > >>  		blk_mq_freeze_queue_wait(q);
> > >>
> > >> +		/* deactivate timeout handler */
> > >> +		del_timer_sync(&q->timeout);
> > >> +	}
> > >> +
> > >>  	list_for_each_entry(q, &all_q_list, all_q_node)
> > >>  		blk_mq_queue_reinit(q);
> > >>
> > >> --
> > >> 1.9.1
> > >>
> > >
> > >> From 8b70c8612543859173230fbd16a63bacf84ba23a Mon Sep 17 00:00:00 2001
> > >> From: Ming Lei
> > >> Date: Sat, 18 Apr 2015 00:01:31 -0400
> > >> Subject: [PATCH 2/2] blk-mq: fix CPU hotplug handling
> > >>
> > >> Firstly the hctx->tags have to be set as NULL if it is to be disabled
> > >> no matter if set->tags[i] is NULL or not in blk_mq_map_swqueue() because
> > >> shared tags can be freed already from another request_queue.
> > >>
> > >> The same situation has to be considered in blk_mq_hctx_cpu_online()
> > >> too.
> > >>
> > >> Cc:
> > >> Reported-by: Dongsu Park
> > >> Signed-off-by: Ming Lei
> > >> ---
> > >>  block/blk-mq.c | 17 +++++++++++------
> > >>  1 file changed, 11 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> > >> index 58a3b4c..612d5c6 100644
> > >> --- a/block/blk-mq.c
> > >> +++ b/block/blk-mq.c
> > >> @@ -1580,15 +1580,20 @@ static int blk_mq_hctx_cpu_online(struct blk_mq_hw_ctx *hctx, int cpu)
> > >>  {
> > >>  	struct request_queue *q = hctx->queue;
> > >>  	struct blk_mq_tag_set *set = q->tag_set;
> > >> +	struct blk_mq_tags *tags = set->tags[hctx->queue_num];
> > >>
> > >> -	if (set->tags[hctx->queue_num])
> > >> +	/* tags can be shared by more than one queues */
> > >> +	if (hctx->tags)
> > >>  		return NOTIFY_OK;
> > >>
> > >> -	set->tags[hctx->queue_num] = blk_mq_init_rq_map(set, hctx->queue_num);
> > >> -	if (!set->tags[hctx->queue_num])
> > >> -		return NOTIFY_STOP;
> > >> +	if (!tags) {
> > >> +		tags = blk_mq_init_rq_map(set, hctx->queue_num);
> > >> +		if (!tags)
> > >> +			return
NOTIFY_STOP;
> > >> +		set->tags[hctx->queue_num] = tags;
> > >> +	}
> > >>
> > >> -	hctx->tags = set->tags[hctx->queue_num];
> > >> +	hctx->tags = tags;
> > >>  	return NOTIFY_OK;
> > >>  }
> > >>
> > >> @@ -1813,8 +1818,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> > >>  		if (set->tags[i]) {
> > >>  			blk_mq_free_rq_map(set, set->tags[i], i);
> > >>  			set->tags[i] = NULL;
> > >> -			hctx->tags = NULL;
> > >>  		}
> > >> +		hctx->tags = NULL;
> > >>  		continue;
> > >>  	}
> > >>
> > >> --
> > >> 1.9.1
> > >>
> > >

--
Ming Lei