Date: Tue, 29 Jun 2021 18:06:21 +0800
From: Ming Lei
To: Daniel Wagner
Cc: Wen Xiong, james.smart@broadcom.com, linux-kernel@vger.kernel.org,
	sagi@grimberg.me, wenxiong@linux.vnet.ibm.com
Subject: Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
References: <71d1ce491ed5056bfa921f0e14fa646d@imap.linux.ibm.com>
	<20210629082542.vm3yh6k36d2zh3k5@beryllium.lan>
	<20210629083549.unco3f7atybqypw3@beryllium.lan>
	<20210629092719.n33t2pnjiwwe6qun@beryllium.lan>
	<20210629094938.r3h5cb7wwu2v3r3m@beryllium.lan>
In-Reply-To: <20210629094938.r3h5cb7wwu2v3r3m@beryllium.lan>

On Tue, Jun 29, 2021 at 11:49:38AM +0200, Daniel Wagner wrote:
> On Tue, Jun 29, 2021 at 05:35:51PM +0800, Ming Lei wrote:
> > With the two patches I posted, __nvme_submit_sync_cmd() shouldn't return
> > error, can you observe the error?
> 
> There are still ways the allocation can fail:
> 
> 	ret = blk_queue_enter(q, flags);
> 	if (ret)
> 		return ERR_PTR(ret);
> 
> 	ret = -EXDEV;
> 	data.hctx = q->queue_hw_ctx[hctx_idx];
> 	if (!blk_mq_hw_queue_mapped(data.hctx))
> 		goto out_queue_exit;

The above failures are supposed to be handled as errors: either the queue
is frozen or the hctx is unmapped.

> 
> No, I don't see any errors. I am still trying to reproduce it on real
> hardware. The setup with blktests running in Qemu did work with all
> patches applied (the ones from me and your patches).
> 
> About the error argument: Later in the code path, e.g. in
> __nvme_submit_sync_cmd(), transport errors (incl. canceled requests) are
> handled as well, hence the upper layer will see errors during connection
> attempts. My point is, there is nothing special about the connection
> attempt failing. We have error handling code in place and the above
> state machine has to deal with it.

My two patches not only avoid the kernel panic, but also allow the
request to be allocated successfully, so the connect io queue request can
be submitted to the driver even though all CPUs in hctx->cpumask are
offline, and nvmef can be set up fine.

That is the difference from yours: failing the request allocation means
the connect io queues step can't be done, so the whole host can't be set
up successfully and becomes a brick.

The point is that offlining CPUs shouldn't cause nvme fc/rdma/tcp/loop
setup to fail.

> 
> Anyway, avoiding the if in the hotpath is a good thing. I just don't
> think your argument that no error can happen is correct.

Again, it isn't related to avoiding the if, and it isn't in the hotpath
at all.

Thanks,
Ming
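
For context, a rough sketch of the fallback being described: in
blk_mq_alloc_request_hctx(), when every CPU in hctx->cpumask is offline,
pick any online CPU instead of failing, so the connect io queue request
can still be allocated. This is only an illustration of the idea under
discussion, not necessarily the literal patch that was posted:

	ret = -EXDEV;
	data.hctx = q->queue_hw_ctx[hctx_idx];
	if (!blk_mq_hw_queue_mapped(data.hctx))
		goto out_queue_exit;

	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		/*
		 * All CPUs mapped to this hctx are offline: fall back to
		 * any online CPU rather than failing the allocation.
		 */
		cpu = cpumask_first(cpu_online_mask);
	data.ctx = __blk_mq_get_ctx(q, cpu);

With such a fallback in place, the allocation only fails when the queue is
frozen or the hctx is unmapped, which is the error handling referred to
above.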