From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Brian King, "Martin K. Petersen", Sasha Levin
Subject: [PATCH 5.4 23/32] scsi: ibmvfc: Set default timeout to avoid crash during migration
Date: Fri, 5 Feb 2021 15:07:38 +0100
Message-Id: <20210205140653.334716730@linuxfoundation.org>
In-Reply-To: <20210205140652.348864025@linuxfoundation.org>
References: <20210205140652.348864025@linuxfoundation.org>

From: Brian King

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.
In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 8a76284b59b08..523809a8a2323 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2881,8 +2881,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0
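
The timing argument in the commit message can be illustrated with a small
user-space model. This is only a sketch, not kernel code: MODEL_HZ,
seconds_to_ticks() and timer_expired_on_resume() are hypothetical names
introduced here, the tick rate of 250 is an arbitrary assumption, and the
model assumes the clock that drives the request timer keeps advancing while
the partition is suspended. The 30-second figure used for comparison is the
suspension length quoted in the message; treating it as the previous
per-request timeout is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in the kernel. */
#define MODEL_HZ 250UL	/* assumed tick rate; the real HZ is a build option */

/* Convert seconds to ticks, mirroring the "120 * HZ" idiom in the patch. */
static unsigned long seconds_to_ticks(unsigned long seconds)
{
	return seconds * MODEL_HZ;
}

/*
 * Return true if a request timer armed just before the partition was
 * suspended is already past its deadline when the partition resumes.
 * Assumes the timer's clock advances for the whole suspension.
 */
static bool timer_expired_on_resume(unsigned long timeout_ticks,
				    unsigned long suspended_ticks)
{
	assert(timeout_ticks > 0);
	return suspended_ticks >= timeout_ticks;
}
```

For a 35-second suspension, timer_expired_on_resume(seconds_to_ticks(30),
seconds_to_ticks(35)) is true, while the same call with seconds_to_ticks(120)
is false. In this model the timeout path therefore no longer races with the
requeue after a suspension of that length. A suspension longer than 120
seconds would still expire the timer, which is why the patch avoids the race
rather than closing it.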