From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Steffen Maier, Jens Remus, "Martin K.
 Petersen"
Subject: [PATCH 4.4 63/88] scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
Date: Fri, 11 Jan 2019 15:08:32 +0100
Message-Id: <20190111131056.636926513@linuxfoundation.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190111131045.137499039@linuxfoundation.org>
References: <20190111131045.137499039@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
X-Patchwork-Hint: ignore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Maier

commit 60a161b7e5b2a252ff0d4c622266a7d8da1120ce upstream.

Suppose adapter (open) recovery is past opening the QDIO queues but
before (the end of) the initial posting of status read buffers (SRBs).
This time window can be seconds long because
FSF_PROT_HOST_CONNECTION_INITIALIZING causes, by design, looping with
exponentially increasing sleeps in the function that performs exchange
config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()].
The recovery was triggered by a local link up.

Suppose an event occurs for which the FCP channel would send an
unsolicited notification to zfcp by means of a previously posted SRB.
We saw this with a local cable pull (link down) in multi-initiator
zoning with multiple NPIV-enabled subchannels of the same shared FCP
channel.

As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the
initial status read buffers from within the adapter's ERP thread, the
channel does send an unsolicited notification.

Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an
unsolicted status can lead to I/O stall"),
zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill
the just consumed SRB from a work item.

Now the ERP thread and the work item post SRBs in parallel.
Both contexts call the helper function zfcp_status_read_refill().
The tracking of missing (to be posted / re-filled) SRBs is not
thread-safe: atomic_read() and atomic_dec() are separate calls so that
the decrement can depend on posting success.
Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1.
One of the two contexts then posts one SRB too many.
Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag
"qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().

An obvious and seemingly clean fix would be to schedule stat_work from
the ERP thread and wait for it to finish. This would serialize all SRB
re-fills. However, we already have another work item waiting on the ERP
thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all
the open port recoveries during zfcp auto port scan, but in fact it
waits for any pending recovery including an adapter recovery. This
approach leads to a deadlock.
[see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency");
v2.6.37 commit d3e1088d6873
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb587
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a76
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963bd
("[SCSI] zfcp: Automatically attach remote ports")]

Instead make the accounting of missing SRBs atomic for parallel
execution in both the ERP thread and adapter->stat_work.

Signed-off-by: Steffen Maier
Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: #2.6.27+
Reviewed-by: Jens Remus
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman

---
 drivers/s390/scsi/zfcp_aux.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -275,16 +275,16 @@ static void zfcp_free_low_mem_buffers(st
  */
 int zfcp_status_read_refill(struct zfcp_adapter *adapter)
 {
-	while (atomic_read(&adapter->stat_miss) > 0)
+	while (atomic_add_unless(&adapter->stat_miss, -1, 0))
 		if (zfcp_fsf_status_read(adapter->qdio)) {
+			atomic_inc(&adapter->stat_miss); /* undo add -1 */
 			if (atomic_read(&adapter->stat_miss) >=
 			    adapter->stat_read_buf_num) {
 				zfcp_erp_adapter_reopen(adapter, 0, "axsref1");
 				return 1;
 			}
 			break;
-		} else
-			atomic_dec(&adapter->stat_miss);
+		}
 	return 0;
 }
 
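
For readers outside the kernel tree, the "claim first, undo on failure"
accounting used by the patch can be sketched in userspace C11. This is
only an illustration under stated assumptions, not the driver code:
add_unless() is a hand-written stand-in for the kernel's
atomic_add_unless(), and refill(), post_fn, post_ok() and post_fail()
are hypothetical names introduced here. Because the check against zero
and the decrement happen in one compare-and-swap, two contexts can
never both claim the same missing buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in (assumption) for the kernel's
 * atomic_add_unless(v, a, u): atomically add 'a' to '*v' unless
 * '*v' equals 'u'. Returns true if the addition was performed.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Hypothetical posting hook: returns 0 on success, non-zero on failure. */
typedef int (*post_fn)(void *ctx);

/*
 * Claim one missing buffer at a time, then post it. If posting fails,
 * give the claim back and stop. Returns the number of buffers posted.
 */
static int refill(atomic_int *miss, post_fn post, void *ctx)
{
	int posted = 0;

	while (add_unless(miss, -1, 0)) {
		if (post(ctx)) {
			atomic_fetch_add(miss, 1); /* undo the claim */
			break;
		}
		posted++;
	}
	return posted;
}

/* Trivial hooks for experimenting with the sketch. */
static int post_ok(void *ctx)   { (void)ctx; return 0; }
static int post_fail(void *ctx) { (void)ctx; return 1; }
```

Calling refill() with post_ok drains the counter to zero and a second
call posts nothing; with post_fail the counter is left unchanged. A
single-threaded run like that only exercises the accounting; the
protection against parallel callers comes from the compare-and-swap
loop in add_unless().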