Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp1343190imm; Fri, 12 Oct 2018 16:45:13 -0700 (PDT) X-Google-Smtp-Source: ACcGV62wlQqXnvZqDjDT2TZBvhX1hyy79omVW7KpVlQfapqe5Sq/+yrix+e6I83HJ2t4dzMtLDGd X-Received: by 2002:a63:4904:: with SMTP id w4-v6mr7322699pga.303.1539387913518; Fri, 12 Oct 2018 16:45:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539387913; cv=none; d=google.com; s=arc-20160816; b=DTAMLUUzka4BZBE5PUfcXtY63F5/wEXfn8zZ86yOCaQGVGXDWpc6z8RXvhwlSwKf7i 3TRX0d+SSzL6PrbJhCJPknr0ooldNHaGL0C0VxWDCaUGfpVHgZYjFMfd/TZ/xq2uC9T/ ubGq45rd8Xqtrx56ba6QhFEUywFjYRJBl2JyvhRvyLk5J0kaJELVHMu+TkVnmJK0c16j WUtkitsxRZeQzXukbS45LLFMET77dTzX35yrUV7daRTWdTLhPZJTcf2KHQTqIk/GItz+ zzpu3Bt23+AA8T/ktVNBSUljMVISTh3G0WdrMoi5O3tNWSkk9b0zimn+eONow8rliO4n 0l0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=VC0Vng2iv3gqMDKoLyBXjqT9Z3a9MxsZyOQobHEjcjs=; b=jBVvNvkLG4xFCUjQNjcoWqG0TGfVccl49hCmzaRX+xZw8YyqER6x3uiB/ekYOEOXOT 3NjZ1yJ8iADjZGKt782Nfd5rapIuTc+0uagS5VTkReAG7kK0Opvq5wPctXijCzPtWr3k 2rOrDGNcIPShjEUjr/h6AJuR0UlzdKcC1pZ5jgF3dHLkAutFMsUEhMtyt8cVy03tk6yc j24Vuhwqr9rVHZ7jAuBrzo890m8jkmqClIrhD8miZOy7fUg7uXreZks6PW/XD8YEywff 3xryD2kfLFGZdWDEUQAO2EkfNw0jdHGut9WMvt+w2ImWaWnAM0xh43GHIWQBzCJ9qGfs SpSQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Z2kQhr9u; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m6-v6si2860955pfg.282.2018.10.12.16.44.25; Fri, 12 Oct 2018 16:45:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Z2kQhr9u; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726099AbeJMHRy (ORCPT + 99 others); Sat, 13 Oct 2018 03:17:54 -0400 Received: from mail.kernel.org ([198.145.29.99]:36944 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725954AbeJMHRy (ORCPT ); Sat, 13 Oct 2018 03:17:54 -0400 Received: from localhost (unknown [69.71.4.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A559221470; Fri, 12 Oct 2018 23:43:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1539387787; bh=cWvNVFtpqgxwHEOE+rzrb5rU6OU3+lS9DtFcSf9Qt5E=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Z2kQhr9uhhqkfAaR8RhgEVYZH/dF5I3K5gYP+QLq6hk0KwLM1dPpl92Glw8qaJCQj z0ktnmF9HSSZWGrxFfQJa5dswSwmvE0TbOJJ6+nU+gqbwXhZVyEow5rkmO1PxDFV2d hqMydNgdVALvFf1s3q5COfHnb4O4JKYPvv07J500= Date: Fri, 12 Oct 2018 18:43:06 -0500 From: Bjorn Helgaas To: Suganath Prabu Subramani Cc: lukas@wunner.de, linux-scsi@vger.kernel.org, linux-pci@vger.kernel.org, Andy Shevchenko , Sathya Prakash , Sreekanth Reddy , linux-kernel@vger.kernel.org, benh@kernel.crashing.org, ruscur@russell.cc, sbobroff@linux.ibm.com, Oliver Subject: Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available Message-ID: <20181012234306.GZ5906@bhelgaas-glaptop.roam.corp.google.com> References: <1537935759-14754-1-git-send-email-suganath-prabu.subramani@broadcom.com> <1537935759-14754-2-git-send-email-suganath-prabu.subramani@broadcom.com> <20180926213241.GI28024@bhelgaas-glaptop.roam.corp.google.com> <20180927070327.wvl5glb4c7gtjoa3@wunner.de> <20180927184746.GM28024@bhelgaas-glaptop.roam.corp.google.com> <20180927190959.dio4qwti3i4accjm@wunner.de> <20181001204051.GD142872@bhelgaas-glaptop.roam.corp.google.com> <20181002140443.GA120535@bhelgaas-glaptop.roam.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 08, 2018 at 12:14:40PM +0530, Suganath Prabu Subramani wrote: > On Tue, Oct 2, 2018 at 7:34 PM Bjorn Helgaas wrote: > > On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote: > > > I think the names "pci_device_is_present()" and > > > "mpt3sas_base_pci_device_is_available()" contribute to the > > > problem because they make promises that can't be kept -- all we > > > can say is that the device *was* present, but we don't know > > > whether it is *still* present. > > In the patch we are using '!' (i.e. not operation) of > pci_device_is_present(), which is logically same as pci_device_is > absent, and it is same for mpt3sas_base_pci_device_is_available(). > > My understanding is that, you want us to rename these functions for > better readability. Is that correct ? I think "pci_device_is_present()" is a poor name, but I'm not suggesting you fix it in this patch. Changing that would be a PCI core change that would also touch the tg3, igb, nvme, and now mpt3sas drivers that use it. My personal opinion is that you should do something like the patch below. The main point is that you should for the device being unreachable *at the places where you might learn that*. If you WRITE to a device that has been removed, the write will mostly likely get dropped and you won't learn anything. But when you READ from a device that has been removed, you'll most likely get ~0 data back, and you can check for that. Of course, it's also possible that pci_device_is_present() can tell you something. So you *could* sprinkle those all over, as in your patch. But you end up with code like this: if (!pci_device_is_present()) return; writel(); readl(); Here, the device might be removed right after pci_device_is_present() returns true. You do the writel() and the readl() anyway, so you really haven't gained anything. And if the readl() returned ~0, you assume it's valid data when it's not. It's better to do this: writel(); val = readl(); if (val == ~0) { /* device is unreachable, clean things up */ ... } (Obviously it's possible that ~0 is a valid value for whatever you read from the device. That depends on the device and you have to use your knowledge of the hardware. If you read ~0 from a register where that might be a valid value, you can read from a different register for with ~0 is *not* a valid value.) I don't claim the patch below is complete because I don't know anything about your hardware and what sort of registers or ring buffers it uses. You still have to arrange to flush or complete whatever is outstanding when you learn the device is gone. Bjorn diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 59d7844ee022..37b09362b31a 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -6145,6 +6145,11 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc) drsprintk(ioc, pr_info(MPT3SAS_FMT "wrote magic sequence: count(%d), host_diagnostic(0x%08x)\n", ioc->name, count, host_diagnostic)); + if (host_diagnostic == ~0) { + ioc->remove_host = 1; + pr_err(MPT3SAS_FMT "HBA unreachable\n", ioc->name); + goto out; + } } while ((host_diagnostic & MPI2_DIAG_DIAG_WRITE_ENABLE) == 0); @@ -6237,6 +6242,11 @@ _base_make_ioc_ready(struct MPT3SAS_ADAPTER *ioc, enum reset_type type) ioc_state = mpt3sas_base_get_iocstate(ioc, 0); dhsprintk(ioc, pr_info(MPT3SAS_FMT "%s: ioc_state(0x%08x)\n", ioc->name, __func__, ioc_state)); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable (ioc_state=0x%x)\n", + ioc->name, __func__, ioc_state); + return -EFAULT; + } /* if in RESET state, it should move to READY state shortly */ count = 0; @@ -6251,6 +6261,11 @@ _base_make_ioc_ready(struct MPT3SAS_ADAPTER *ioc, enum reset_type type) } ssleep(1); ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable (ioc_state=0x%x)\n", + ioc->name, __func__, ioc_state); + return -EFAULT; + } } } @@ -6854,6 +6869,11 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc) ioc->pending_io_count = 0; ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable\n", + ioc->name, __func__); + return; + } if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL) return; @@ -6909,6 +6929,14 @@ mpt3sas_base_hard_reset_handler(struct MPT3SAS_ADAPTER *ioc, MPT3_DIAG_BUFFER_IS_RELEASED))) { is_trigger = 1; ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable\n", + ioc->name, __func__); + ioc->schedule_dead_ioc_flush_running_cmds(ioc); + r = 0; + mutex_unlock(&ioc->reset_in_progress_mutex); + goto out_unlocked; + } if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) is_fault = 1; } diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 53133cfd420f..d0d4c8d94a10 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -2653,6 +2653,11 @@ mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, u64 lun, } ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_info(MPT3SAS_FMT "%s: HBA unreachable\n", + __func__, ioc->name); + return FAILED; + } if (ioc_state & MPI2_DOORBELL_USED) { dhsprintk(ioc, pr_info(MPT3SAS_FMT "unexpected doorbell active!\n", ioc->name));