Date: Tue, 29 Jun 2021 09:20:27 +0800
From: Ming Lei
To: wenxiong@us.ibm.com
Cc: Daniel Wagner, linux-kernel@vger.kernel.org, james.smart@broadcom.com,
    wenxiong@us.ibm.com, sagi@grimberg.me
Subject: Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
References: <1624850072-17776-1-git-send-email-wenxiong@linux.vnet.ibm.com>
    <20210628090703.apaowrsazl53lza4@beryllium.lan>
    <71d1ce491ed5056bfa921f0e14fa646d@imap.linux.ibm.com>
In-Reply-To: <71d1ce491ed5056bfa921f0e14fa646d@imap.linux.ibm.com>

Hi Wenxiong,

On Mon, Jun 28, 2021 at 01:17:34PM -0500, wenxiong wrote:
> > The root cause is that blk-mq doesn't work well on tag allocation from
> > a specified hctx (blk_mq_alloc_request_hctx()), and blk-mq assumes that
> > any request allocation can't cross hctx inactive/offline; see
> > blk_mq_hctx_notify_offline().
>
> Hi Ming,
>
> I tried to pass an online cpu id (cpu=8 in my case) to
> blk_mq_alloc_request_hctx(), which does:
>
>     data.hctx = q->queue_hw_ctx[hctx_idx];
>
> but it looks like data.hctx comes back NULL, so the system crashes when
> data.hctx is accessed later.
>
> blk-mq request allocation can't cross hctx inactive/offline, but blk-mq
> still reallocates the hctxs for the offline cpus (cpu=4,5,6,7 in my case)
> in blk_mq_realloc_hw_ctxs(), while the hctx is NULL for the online cpus
> (cpu=8 in my case).
>
> Below is my understanding of the hctxs, please correct me if I am wrong.
>
> Assume a system has two cores with 16 cpus.
>
> Before the cpu hotplug events:
> cpu0-cpu7  (core 0): hctx->state is ACTIVE and q->hctx is not NULL.
> cpu8-cpu15 (core 1): hctx->state is ACTIVE and q->hctx is not NULL.
>
> After the cpu hotplug events (the second half of each core is offline):
> cpu0-cpu3:   online,  hctx->state is ACTIVE   and q->hctx is not NULL.
> cpu4-cpu7:   offline, hctx->state is INACTIVE and q->hctx is not NULL.
> cpu8-cpu11:  online,  hctx->state is ACTIVE   but q->hctx is NULL.
> cpu12-cpu15: offline, hctx->state is INACTIVE and q->hctx is NULL.
>
> So num_online_cpus() is 8 after the cpu hotplug events. Neither way works
> for me, whether I pass 8 online cpus or 4 online/4 offline cpus.
>
> Is this correct? If nvmf passes online cpu ids to blk-mq, why does it
> still crash/fail?

NVMe users have to pass a correct hctx_idx to blk_mq_alloc_request_hctx(),
but from the info you provided they don't pass a valid hctx_idx to blk-mq,
so q->queue_hw_ctx[hctx_idx] is NULL and the kernel panics.

I believe Daniel's following patch may fix this specific issue if your
controller is FC:

[1] https://lore.kernel.org/linux-nvme/YNXTaUMAFCA84jfZ@T590/T/#t

Thanks,
Ming
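
P.S. To make the failure mode concrete, below is a minimal sketch of the
lookup, assuming v5.13-era blk-mq internals (queue_hw_ctx, hctx->cpumask);
it is simplified and is not the verbatim kernel source:

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/cpumask.h>

/*
 * Simplified sketch of what blk_mq_alloc_request_hctx() does with the
 * caller-supplied hctx_idx; not verbatim kernel source.
 */
static unsigned int pick_cpu_for_hctx(struct request_queue *q,
				      unsigned int hctx_idx)
{
	/* NULL for cpu=8's hw queue after the hotplug remap above */
	struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[hctx_idx];

	/* hctx is dereferenced without a NULL check -> kernel panic */
	return cpumask_first_and(hctx->cpumask, cpu_online_mask);
}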
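
A hypothetical caller-side guard would look like the sketch below;
safe_alloc_request_hctx() is illustrative only, not an existing kernel
API, and the real fix belongs in the transport driver as in [1]:

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/cpumask.h>
#include <linux/err.h>

/*
 * Hypothetical helper (not an existing kernel API): verify that
 * hctx_idx maps to an allocated hw queue that still has an online CPU
 * mapped before calling blk_mq_alloc_request_hctx().
 */
static struct request *safe_alloc_request_hctx(struct request_queue *q,
					       unsigned int op,
					       blk_mq_req_flags_t flags,
					       unsigned int hctx_idx)
{
	struct blk_mq_hw_ctx *hctx;

	if (hctx_idx >= q->nr_hw_queues)
		return ERR_PTR(-EINVAL);

	hctx = q->queue_hw_ctx[hctx_idx];
	if (!hctx)		/* hole left by blk_mq_realloc_hw_ctxs() */
		return ERR_PTR(-EINVAL);

	/* the hw queue must still have at least one online CPU mapped */
	if (cpumask_first_and(hctx->cpumask, cpu_online_mask) >= nr_cpu_ids)
		return ERR_PTR(-EINVAL);

	return blk_mq_alloc_request_hctx(q, op, flags, hctx_idx);
}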