Received: by 2002:a25:ef43:0:0:0:0:0 with SMTP id w3csp433586ybm; Tue, 26 May 2020 22:09:25 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzRg+/43fh2Jhwsxl/u/Xicbz34IZ1tOyuH8+4xjpeFT/k0nrHxLnh6yj0fVMbXwiAzffoz X-Received: by 2002:a17:906:9617:: with SMTP id s23mr4125840ejx.505.1590556164992; Tue, 26 May 2020 22:09:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590556164; cv=none; d=google.com; s=arc-20160816; b=xaBEFAfBe225PI7YlVHUIHNzfkZn8DyFt8oBeuOcLluoYA6/3v5WV+NMItv9JigzNk WvREVZWo/SnLtbooridfHz5LYcCZKiRwNT5UtjEWR6uKWAiLpZi2JyeYC6qvBS9l7292 xXQlrhcBax60WW7VPrUMcsSED4E/MaOG2+COGixokbff/69nu4cmV+b2oaocwnvhFoZ6 EqXm+wbWfqZiJDSq/PJUTgKeA8NUdI6QvuTcczxLNruITpyvCksnTMp3Z0+6dliFNC7Q Z+hV1A/BRftnbkUnBmN2jSDcuVXEchGk1OfO82alzkZ7MKWyfCfA5uJcXJq/z7/5Eudr 8jBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=fo26F0NoW7UXAZI20P+HiT0dZgkQVqLV8VIU39gdYLc=; b=hXhrZugEJ/QgSp5R2LiEwwICPoJGC2HbvmJF2OxTuFTMtk8a3nLjw7wl9SyEqImoBg rFxQNghibaUk4Ri5ng0tdWHmzncmFR4FI54Imhc1cyJv12It3S7fEy6M5pHkGRsYqtCT bt/K0pY3UbPuvxPMDE+tP+41EGp93O/G6mKSOuouiDdjOFXVbYPFbWDKYEifgc82sMP7 h928KzR/AYZ/mCKP7JWOoApKiw/vZE9plTHDvMndTj8oOgzpAWDvd584rsYAjdSDehPJ In7+pj26VrdhXQxcIwDEXxuV9iXnJ9OI7LovCAbQy/CS9sl4GWjbEbCc88EN4Ntoa9sp dOFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=RIu5+DdQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ju5si1143575ejb.676.2020.05.26.22.09.02; Tue, 26 May 2020 22:09:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=RIu5+DdQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727969AbgE0ChS (ORCPT + 99 others); Tue, 26 May 2020 22:37:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725267AbgE0ChR (ORCPT ); Tue, 26 May 2020 22:37:17 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7921EC061A0F for ; Tue, 26 May 2020 19:37:17 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id k26so1096960wmi.4 for ; Tue, 26 May 2020 19:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=fo26F0NoW7UXAZI20P+HiT0dZgkQVqLV8VIU39gdYLc=; b=RIu5+DdQ1VaplU0z7eJUcwrnEvkUPhv62+7muM9zNQ0E015+xMKbUgeS5ko57IQucf FITrEU2ZXrPskZADXu/RIjwEUK6gSMkm+SSt/pCSVymTBZgzxlQuUBcEm8EkXU788H5C MvSelDzU3ynksHlGQr25M+Px2uhaBkBfy93bR5mPlZteeXCrbZJhKjBVPOUQD2QPBUpk ZmvPUDQdD9kjHT3FPrCkd6Gl/XIN+ucasRiVdW0181Zs2kD1or00i92mghg5vGLu2TBr KPYM7bbLfbfx98KAPLVoxGLGJmNPJqhqAcuhi2/s66stfFP1bcvALS3r3ztmP/KwZspb xNfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=fo26F0NoW7UXAZI20P+HiT0dZgkQVqLV8VIU39gdYLc=; b=FByKMlpZrEAUTeEiVZBtpHBDuAVWwgJRmOyWGl5HtlGwOMQ313FFDA/mRRqZaNCjBD 3n5FvQVOprnYS0B/no2EVWmzLelPvCrLYIltj9EdJZhnh8eqRLj3zZAX5b6w6hiBCTiL 7Go1LiPyiL9A/oeen8Le3f/uAz0IKh1rIc0TtomJ4KZADouZT1EvLbya3J1xbZCg8Qzb VqOcyMBVwN1FUrIcvnreEyszVApxlkma694gjKCwsAchnf1HipMdg721Z5Emh6XQTbrN vsMfiDhoHogA/4Hux/vPC/9Vem8FqGs880Jzcvhq7qRnl2APq6z7AAt2jQE3uqNQr5J9 V57w== X-Gm-Message-State: AOAM530nmpShmykkD/YHh31veT+tgMrbkJIXWhlxVnrrdfO2ara8kjD/ huH4wj3L531qBZ4Qkzi/xOKgci60jNog7AlqpP0= X-Received: by 2002:a05:600c:22c9:: with SMTP id 9mr1902422wmg.162.1590547036270; Tue, 26 May 2020 19:37:16 -0700 (PDT) MIME-Version: 1.0 References: <20200527004955.19463-1-dongli.zhang@oracle.com> In-Reply-To: <20200527004955.19463-1-dongli.zhang@oracle.com> From: Ming Lei Date: Wed, 27 May 2020 10:37:05 +0800 Message-ID: Subject: Re: [PATCH 1/1] nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll() To: Dongli Zhang Cc: linux-nvme , Keith Busch , Jens Axboe , Sagi Grimberg , Linux Kernel Mailing List , Christoph Hellwig Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 27, 2020 at 8:57 AM Dongli Zhang wrote: > > There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g., > when doing live reset while polling the nvme device. > > CPU X CPU Y > nvme_poll() > nvme_dev_disable() > -> nvme_stop_queues() > -> nvme_suspend_io_queues() > -> nvme_suspend_queue() > -> spin_lock(&nvmeq->cq_poll_lock); > -> nvme_reap_pending_cqes() > -> nvme_process_cq() -> nvme_process_cq() > > In the above scenario, the nvme_process_cq() for the same queue may be > running on both CPU X and CPU Y concurrently. > > It is much more easier to reproduce the issue when CONFIG_PREEMPT is > enabled in kernel. When CONFIG_PREEMPT is disabled, it would take longer > time for nvme_stop_queues()-->blk_mq_quiesce_queue() to wait for grace > period. > > This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in > nvme_reap_pending_cqes(). > > Signed-off-by: Dongli Zhang > --- > drivers/nvme/host/pci.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index 3726dc780d15..cc46e250fcac 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -1382,16 +1382,19 @@ static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown) > > /* > * Called only on a device that has been disabled and after all other threads > - * that can check this device's completion queues have synced. This is the > - * last chance for the driver to see a natural completion before > - * nvme_cancel_request() terminates all incomplete requests. > + * that can check this device's completion queues have synced, except > + * nvme_poll(). This is the last chance for the driver to see a natural > + * completion before nvme_cancel_request() terminates all incomplete requests. > */ > static void nvme_reap_pending_cqes(struct nvme_dev *dev) > { > int i; > > - for (i = dev->ctrl.queue_count - 1; i > 0; i--) > + for (i = dev->ctrl.queue_count - 1; i > 0; i--) { > + spin_lock(&dev->queues[i].cq_poll_lock); > nvme_process_cq(&dev->queues[i]); > + spin_unlock(&dev->queues[i].cq_poll_lock); > + } > } Looks a real race, and the fix is fine: Reviewed-by: Ming Lei thanks, Ming Lei