From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Brian King, "Martin K. Petersen", Sasha Levin
Subject: [PATCH 4.9 16/43] scsi: ibmvfc: Set default timeout to avoid crash during migration
Date: Mon, 8 Feb 2021 16:00:42 +0100
Message-Id: <20210208145806.966964025@linuxfoundation.org>
In-Reply-To: <20210208145806.281758651@linuxfoundation.org>

From: Brian King

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional
crashes of the Linux partition. What we've seen is that during the live
migration, for specific configurations with large amounts of memory,
slow network links, and workloads that are changing memory a lot, the
partition can end up being suspended for 30 seconds or longer. This
resulted in the following scenario:

CPU 0                                  CPU 1
-------------------------------        ----------------------------------
scsi_queue_rq                          migration_store
 -> blk_mq_start_request                -> rtas_ibm_suspend_me
  -> blk_add_timer                       -> on_each_cpu(rtas_percpu_suspend_me
  __________________________________________V
  |
  V
 -> IPI from CPU 1
  -> rtas_percpu_suspend_me
                                          -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                          -> for_each_online_cpu(cpu)
                                               plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                       -> scsi_times_out
                                        -> scsi_abort_command
                                         -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
       - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                       -> scmd_eh_abort_handler
                                        -> scsi_try_to_abort_cmd
                                           - returns SUCCESS
                                        -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.
In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout for disk
devices to 120 seconds to avoid this race condition. This is also the
timeout value that nearly all IBM SAN storage recommends setting as the
default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 04b3ac17531db..7865feb8e5e83 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2891,8 +2891,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0