From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Michal Wnukowski, Keith Busch, Sagi Grimberg, Christoph Hellwig
Subject: [PATCH 4.18 046/123] nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event
Date: Mon, 3 Sep 2018 18:56:30 +0200
Message-Id: <20180903165721.410878907@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180903165719.499675257@linuxfoundation.org>
References: <20180903165719.499675257@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

4.18-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Wnukowski

commit f1ed3df20d2d223e0852cc4ac1f19bba869a7e3c upstream.

On many architectures, loads may be reordered with older stores to
different locations. In the nvme driver the following two operations
could be reordered:

 - Write shadow doorbell (dbbuf_db) into memory.
 - Read EventIdx (dbbuf_ei) from memory.

This can result in a race condition between the driver and the VM host
processing requests (if the given virtual NVMe controller supports
shadow doorbells). If that occurs, the NVMe controller may decide to
wait for an MMIO doorbell from the guest operating system, while the
guest driver may decide not to issue an MMIO doorbell on any subsequent
command.

This issue is purely timing-dependent, so there is no easy way to
reproduce it. Currently the easiest known approach is to run "Oracle IO
Numbers" (orion), which is shipped with Oracle DB:

orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
	concat -write 40 -duration 120 -matrix row -testname nvme_test

where nvme_test is a .lun file that contains a list of NVMe block
devices to run the test against. Limiting the number of vCPUs assigned
to a given VM instance seems to increase the chance of this bug
occurring. In a test environment with a VM that had 4 NVMe drives and
1 vCPU assigned, the virtual NVMe controller hang could be observed
within 10-20 minutes. That corresponds to about 400-500k IO operations
processed (or about 100GB of IO reads/writes).

The orion tool was used for validation and set to run in a loop for
36 hours (the equivalent of pushing 550M IO operations). No issues were
observed. That suggests that the patch fixes the issue.

Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
Signed-off-by: Michal Wnukowski
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
[hch: updated changelog and comment a bit]
Signed-off-by: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman
---
 drivers/nvme/host/pci.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -316,6 +316,14 @@ static bool nvme_dbbuf_update_and_check_
 		old_value = *dbbuf_db;
 		*dbbuf_db = value;
 
+		/*
+		 * Ensure that the doorbell is updated before reading the event
+		 * index from memory. The controller needs to provide similar
+		 * ordering to ensure the event index is updated before reading
+		 * the doorbell.
+		 */
+		mb();
+
 		if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
 			return false;
 	}
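
For readers who want to experiment with the ordering requirement outside
the kernel, the store-then-load pattern described in the changelog can be
sketched in portable C11. This is only an illustration under stated
assumptions, not the driver code: update_and_check_event(), need_event()
and doorbell_example() are hypothetical names, the wraparound test is
assumed to follow the usual event-index scheme, and
atomic_thread_fence(memory_order_seq_cst) stands in as a rough user-space
analogue of the kernel's mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if event_idx lies in [old, new_idx),
 * computed modulo 2^16 so that index wraparound is handled.
 */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * Publish the new doorbell value, then decide whether the other side
 * still needs an explicit notification. Returns true if it does.
 */
bool update_and_check_event(uint16_t value, _Atomic uint16_t *db,
			    _Atomic uint16_t *ei)
{
	uint16_t old = atomic_load_explicit(db, memory_order_relaxed);

	atomic_store_explicit(db, value, memory_order_relaxed);

	/*
	 * Full fence: the store to *db must not be reordered after the
	 * load of *ei below. Without it, a store-load reordering could
	 * let both sides miss each other's update.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	uint16_t event = atomic_load_explicit(ei, memory_order_relaxed);

	return need_event(event, value, old);
}

/* Single-threaded usage example; it only exercises the index logic. */
void doorbell_example(void)
{
	_Atomic uint16_t db = 5, ei = 5;

	/* Event index 5 is inside [5, 8): a notification is needed. */
	assert(update_and_check_event(8, &db, &ei));

	/* Event index far ahead of [8, 9): no notification is needed. */
	atomic_store(&ei, 20);
	assert(!update_and_check_event(9, &db, &ei));
}
```

The example is single-threaded, so it cannot reproduce the race itself.
It only shows where the fence sits relative to the doorbell store and the
event index load.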