From: wenxiong@linux.vnet.ibm.com
To: ming.lei@redhat.com
Cc: linux-kernel@vger.kernel.org, james.smart@broadcom.com, dwagner@suse.de,
    wenxiong@us.ibm.com, Wen Xiong
Subject: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
Date: Sun, 27 Jun 2021 22:14:32 -0500
Message-Id: <1624850072-17776-1-git-send-email-wenxiong@linux.vnet.ibm.com>

From: Wen Xiong

Error injection steps:

1. Run: hash ppc64_cpu 2>/dev/null && ppc64_cpu --smt=4
2. Disable one SVC port (at the switch) for 10 minutes
3. Re-enable the port
4. Linux crashes

The system has two cores with 16 CPUs, cpu0-cpu15. All CPUs are online
when the system boots up:

core0: cpu0-cpu7 online
core1: cpu8-cpu15 online

Issue CPU hotplug commands on ppc so that afterwards:

cpu0-cpu3 are online
cpu4-cpu7 are offline
cpu8-cpu11 are online
cpu12-cpu15 are offline

After these CPU hotplug operations, the state of the hctxs changes:

- cpu0-cpu3 (online): no change.
- cpu4-cpu7 (offline): masked off. The state of each hctx is set to
  INACTIVE, and the hctxs for these CPUs are reallocated.
- cpu8-cpu11 (online): the CPUs are still active, but their hctxs are
  disabled after the hctx reallocation.
- cpu12-cpu15 (offline): masked off. The state of each hctx is set to
  INACTIVE, and the hctxs are disabled.

From the nvme/fc driver:

nvme_fc_create_association()
  -> nvme_fc_recreate_io_queues()        (if ctrl->ioq_live == true)
     -> blk_mq_update_nr_hw_queues()
     -> nvme_fc_connect_io_queues()
        -> nvmf_connect_io_queue()

nvme_fc_connect_io_queues(struct nvme_fc_ctrl *ctrl, u16 qsize)
{
	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);
		set_bit(NVME_FC_Q_LIVE, &ctrl->queues[i].flags);
	}
}

After the CPU hotplug, i loops from 1 to 8; let's see what happens for
each i:

i = 1, calls blk_mq_alloc_request_hctx() with id = 0: ok
i = 2, calls blk_mq_alloc_request_hctx() with id = 1: ok
i = 3, calls blk_mq_alloc_request_hctx() with id = 2: ok
i = 4, calls blk_mq_alloc_request_hctx() with id = 3: ok
i = 5, calls blk_mq_alloc_request_hctx() with id = 4: crash (cpu = 2048)
i = 6, calls blk_mq_alloc_request_hctx() with id = 5: crash (cpu = 2048)
i = 7, calls blk_mq_alloc_request_hctx() with id = 6: crash (cpu = 2048)
i = 8, calls blk_mq_alloc_request_hctx() with id = 7: crash (cpu = 2048)

The crash comes from this line in blk_mq_alloc_request_hctx():

	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);

For id = 4-7 every CPU in data.hctx->cpumask is offline, so the AND with
cpu_online_mask is empty and cpumask_first_and() returns nr_cpu_ids
(2048 on this config); that out-of-range value is then used as a per-cpu
index in __blk_mq_get_ctx(), which crashes the system.
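To make "cpu = 2048" concrete, here is a small self-contained userspace
sketch (plain C with a toy 16-bit mask; first_and() and the NR_CPU_IDS
macro are illustrative stand-ins, not kernel code) of what
cpumask_first_and(data.hctx->cpumask, cpu_online_mask) computes for an
hctx whose mapped CPUs are all offline:

#include <stdio.h>

#define NR_CPU_IDS 2048	/* nr_cpu_ids on the ppc64 config above */

/*
 * Toy stand-in for cpumask_first_and(): returns the first bit set in
 * both masks, or NR_CPU_IDS when the intersection is empty (the kernel
 * helper returns nr_cpu_ids in that case).
 */
static unsigned int first_and(unsigned long a, unsigned long b,
			      unsigned int nbits)
{
	for (unsigned int i = 0; i < nbits; i++)
		if ((a & b) & (1UL << i))
			return i;
	return NR_CPU_IDS;
}

int main(void)
{
	unsigned long hctx_cpumask = 0x00f0; /* hctx mapped to cpu4-cpu7, all offline */
	unsigned long online_mask  = 0x0f0f; /* cpu0-cpu3 and cpu8-cpu11 online */

	/* no online CPU intersects the hctx mask -> prints "cpu = 2048" */
	printf("cpu = %u\n", first_and(hctx_cpumask, online_mask, 16));
	return 0;
}

Passing 2048 on to __blk_mq_get_ctx() as a per-cpu index is the bad
access; the change below sidesteps the lookup by deriving the ctx
directly from hctx_idx.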
This patch fixes the crash seen when bouncing a port on the storage
side while CPUs are hotplugged.

---
 block/blk-mq-tag.c | 3 ++-
 block/blk-mq.c     | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..b927233bb6bb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -171,7 +171,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	 * Give up this allocation if the hctx is inactive. The caller will
 	 * retry on an active hctx.
 	 */
-	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
+	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))
+	    && data->hctx->queue_num > num_online_cpus()) {
 		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
 		return BLK_MQ_NO_TAG;
 	}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c86c01bfecdb..5e31bd9b06c2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -436,7 +436,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		.cmd_flags	= op,
 	};
 	u64 alloc_time_ns = 0;
-	unsigned int cpu;
 	unsigned int tag;
 	int ret;
 
@@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	data.hctx = q->queue_hw_ctx[hctx_idx];
 	if (!blk_mq_hw_queue_mapped(data.hctx))
 		goto out_queue_exit;
-	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
-	data.ctx = __blk_mq_get_ctx(q, cpu);
+	data.ctx = __blk_mq_get_ctx(q, hctx_idx);
 
 	if (!q->elevator)
 		blk_mq_tag_busy(data.hctx);
-- 
2.27.0