Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
From: Jens Axboe
To: Christian Borntraeger, Bart Van Assche, virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com, linux-kernel@vger.kernel.org, Christoph Hellwig
Date: Tue, 21 Nov 2017 12:30:31 -0700

On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>> Bisect points to
>>>>>>>
>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>> Author: Christoph Hellwig
>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>
>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>
>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>
>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>     of churn due to frequent soft offline / online operations. Instead
>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>     the code.
>>>>>>>
>>>>>>>     Signed-off-by: Christoph Hellwig
>>>>>>>     Reviewed-by: Jens Axboe
>>>>>>>     Cc: Keith Busch
>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>     Signed-off-by: Thomas Gleixner
>>>>>>>     Cc: Oleksandr Natalenko
>>>>>>>     Cc: Mike Galbraith
>>>>>>>     Signed-off-by: Greg Kroah-Hartman
>>>>>>
>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>> take a look.
>>>>>
>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>> up as present (just offline) and get mapped accordingly.
>>>>>
>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>> would trigger. What environment are you running this in? We might have
>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>> for a dead cpu and handle that.
>>>>
>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>> not available CPU.
>>>>
>>>> in libvirt/virsh speak:
>>>> 4
>>>
>>> So that's why we run into problems. It's not present when we load the device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
>
>
> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>
>
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpu_online(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q)
+static void blk_mq_map_swqueue(struct request_queue *q,
+			       const struct cpumask *online_mask)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	}
 
 	/*
-	 * Map software to hardware queues.
-	 *
-	 * If the cpu isn't present, the cpu is mapped to first hctx.
+	 * Map software to hardware queues
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpumask_test_cpu(i, online_mask))
+			continue;
+
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+
+	get_online_cpus();
+	mutex_lock(&all_q_mutex);
+
+	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, cpu_online_mask);
+
+	mutex_unlock(&all_q_mutex);
+	put_online_cpus();
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 
+	mutex_lock(&all_q_mutex);
+	list_del_init(&q->all_q_node);
+	mutex_unlock(&all_q_mutex);
+
 	blk_mq_del_queue_tag_set(q);
+
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q)
+static void blk_mq_queue_reinit(struct request_queue *q,
+				const struct cpumask *online_mask)
 {
 	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
 	 * we should change hctx numa_node according to the new topology (this
 	 * involves freeing and re-allocating memory, worth doing?)
 	 */
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, online_mask);
 
 	blk_mq_sysfs_register(q);
 	blk_mq_debugfs_register_hctxs(q);
 }
 
+/*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumask as global as cpu-hotplug operation is invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask cpuhp_online_new;
+
+static void blk_mq_queue_reinit_work(void)
+{
+	struct request_queue *q;
+
+	mutex_lock(&all_q_mutex);
+	/*
+	 * We need to freeze and reinit all existing queues. Freezing
+	 * involves synchronous wait for an RCU grace period and doing it
+	 * one by one may take a long time. Start freezing all queues in
+	 * one swoop and then wait for the completions so that freezing can
+	 * take place in parallel.
+	 */
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_freeze_queue_start(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_wait(q);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_queue_reinit(q, &cpuhp_online_new);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_unfreeze_queue(q);
+
+	mutex_unlock(&all_q_mutex);
+}
+
+static int blk_mq_queue_reinit_dead(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
+/*
+ * Before hotadded cpu starts handling requests, new mappings must be
+ * established. Otherwise, these requests in hw queue might never be
+ * dispatched.
+ *
+ * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
+ * for CPU0, and ctx1 for CPU1).
+ *
+ * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
+ * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
+ *
+ * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
+ * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
+ * But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
+ * ignored.
+ */
+static int blk_mq_queue_reinit_prepare(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	cpumask_set_cpu(cpu, &cpuhp_online_new);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-		blk_mq_queue_reinit(q);
+		blk_mq_queue_reinit(q, cpu_online_mask);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	return __blk_mq_poll(hctx, rq);
 }
 
+void blk_mq_disable_hotplug(void)
+{
+	mutex_lock(&all_q_mutex);
+}
+
+void blk_mq_enable_hotplug(void)
+{
+	mutex_unlock(&all_q_mutex);
+}
+
 static int __init blk_mq_init(void)
 {
 	/*
@@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
 
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+
+	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				  blk_mq_queue_reinit_prepare,
+				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..83b13ef1915e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
+/*
+ * CPU hotplug helpers
+ */
+void blk_mq_enable_hotplug(void);
+void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab7267986..c31d4e3bf6d0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -76,6 +76,7 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,

-- 
Jens Axboe
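
For anyone following along, the scenario in the comment above blk_mq_queue_reinit_prepare() can be modelled in user space. This is only a rough sketch, not kernel code: struct sw_ctx, struct hw_ctx, map_swqueue(), insert_request(), flush_busy_ctxs() and dispatched_after_hotadd() are hypothetical stand-ins invented for the illustration, and the online mask is a plain unsigned long rather than a struct cpumask. It assumes one hardware queue and two CPUs, with CPU1 coming online only after the queue was set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CTX 4

/* Hypothetical, simplified stand-ins for blk_mq_ctx and blk_mq_hw_ctx. */
struct sw_ctx {
	int cpu;		/* CPU that owns this software queue */
	unsigned int index_hw;	/* slot in hw_ctx.ctxs[] and bit in hw_ctx.pending */
	int queued;		/* requests waiting on this software queue */
};

struct hw_ctx {
	struct sw_ctx *ctxs[MAX_CTX];
	unsigned int nr_ctx;
	unsigned long pending;	/* one bit per mapped software queue */
};

/* Rebuild the ctx -> hctx mapping for every CPU set in online_mask. */
static void map_swqueue(struct hw_ctx *hw, struct sw_ctx *ctx, size_t nr_cpus,
			unsigned long online_mask)
{
	hw->nr_ctx = 0;
	hw->pending = 0;
	for (size_t i = 0; i < nr_cpus; i++) {
		ctx[i].index_hw = 0;
		if (!(online_mask & (1UL << i)))
			continue;	/* not online: left unmapped, index_hw stays 0 */
		assert(hw->nr_ctx < MAX_CTX);
		ctx[i].index_hw = hw->nr_ctx;
		hw->ctxs[hw->nr_ctx++] = &ctx[i];
	}
}

/* Queue one request on a software queue and mark its bit pending. */
static void insert_request(struct hw_ctx *hw, struct sw_ctx *c)
{
	c->queued++;
	hw->pending |= 1UL << c->index_hw;
}

/* Drain every mapped ctx whose pending bit is set; return requests found. */
static int flush_busy_ctxs(struct hw_ctx *hw)
{
	int found = 0;

	for (unsigned int bit = 0; bit < hw->nr_ctx; bit++) {
		if (!(hw->pending & (1UL << bit)))
			continue;
		found += hw->ctxs[bit]->queued;
		hw->ctxs[bit]->queued = 0;
	}
	hw->pending = 0;
	return found;
}

/*
 * CPU0 is online at queue setup; CPU1 is added later and submits one request.
 * remap_first selects whether the mapping is rebuilt (with CPU1 included in
 * the mask) before CPU1 submits. Returns how many requests dispatch sees.
 */
static int dispatched_after_hotadd(bool remap_first)
{
	struct sw_ctx ctx[2] = { { .cpu = 0 }, { .cpu = 1 } };
	struct hw_ctx hw = { 0 };

	map_swqueue(&hw, ctx, 2, 0x1);		/* only CPU0 online at init */
	if (remap_first)
		map_swqueue(&hw, ctx, 2, 0x3);	/* online mask plus incoming CPU */
	insert_request(&hw, &ctx[1]);		/* CPU1 submits I/O */
	return flush_busy_ctxs(&hw);
}
```

Calling dispatched_after_hotadd(false) models the stale mapping: ctx1 sets bit 0, the flush looks at ctxs[0] (ctx0), and the request is never seen. With dispatched_after_hotadd(true) the mapping is rebuilt first, which is roughly what the prepare callback in the patch is there for, and the request is found. The model skips queue freezing entirely; the real code freezes all queues before remapping.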