Date: Mon, 1 Mar 2021 23:30:08 +0100
To: Adrian Hunter
CC: Ulf Hansson, Mårten Lindahl, kernel, "linux-mmc@vger.kernel.org", Linux Kernel Mailing List
Subject: Re: [PATCH] mmc: Try power cycling card if command request times out
Message-ID: <20210301223008.glrdupzdgfnb2fwg@axis.com>
References: <20210216224252.22187-1-marten.lindahl@axis.com> <8a6bf147-d449-d32e-1969-ef9463859b9b@intel.com>
In-Reply-To: <8a6bf147-d449-d32e-1969-ef9463859b9b@intel.com>
From: Marten Lindahl
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Adrian!

Thank you for your comments!

On Mon, Mar 01, 2021 at 11:40:03AM +0100, Adrian Hunter wrote:
> On 1/03/21 10:50 am, Ulf Hansson wrote:
> > + Adrian
> >
> > On Tue, 16 Feb 2021 at 23:43, Mårten Lindahl wrote:
> >>
> >> Sometimes SD cards that has been run for a long time enters a state
> >> where it cannot by itself be recovered, but needs a power cycle to be
> >> operational again. Card status analysis has indicated that the card can
> >> end up in a state where all external commands are ignored by the card
> >> since it is halted by data timeouts.
> >>
> >> If the card has been heavily used for a long time it can be weared out,
> >> and should typically be replaced. But on some tests, it shows that the
> >> card can still be functional after a power cycle, but as it requires an
> >> operator to do it, the card can remain in a non-operational state for a
> >> long time until the problem has been observed by the operator.
> >>
> >> This patch adds function to power cycle the card in case it does not
> >> respond to a command, and then resend the command if the power cycle
> >> was successful. This procedure will be tested 1 time before giving up,
> >> and resuming host operation as normal.
> >
> > I assume the context above is all about the ioctl interface?
> >
> > So, when the card enters this non functional state, have you tried
> > just reading a block through the regular I/O interface. Does it
> > trigger a power cycle of the card - and then makes it functional
> > again?
> >
> >>
> >> Signed-off-by: Mårten Lindahl
> >> ---
> >> Please note: This might not be the way we want to handle these cases,
> >> but at least it lets us start the discussion. In which cases should the
> >> mmc framework deal with error messages like ETIMEDOUT, and in which
> >> cases should it be handled by userspace?
> >> The mmc framework tries to recover a failed block request
> >> (mmc_blk_mq_rw_recovery) which may end up in a HW reset of the card.
> >> Would it be an idea to act in a similar way when an ioctl times out?
> >
> > Maybe, it's a good idea to allow the similar reset for ioctls as we do
> > for regular I/O requests. My concern with this though, is that we
> > might allow user space to trigger a HW resets a bit too easily - and
> > that could damage the card.
> >
> > Did you consider this?
> >
> >>
> >>  drivers/mmc/core/block.c | 20 ++++++++++++++++++--
> >>  1 file changed, 18 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> >> index 42e27a298218..d007b2af64d6 100644
> >> --- a/drivers/mmc/core/block.c
> >> +++ b/drivers/mmc/core/block.c
> >> @@ -976,6 +976,7 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type)
> >>   */
> >>  static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
> >>  {
> >> +	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
> >>  	struct mmc_queue_req *mq_rq;
> >>  	struct mmc_card *card = mq->card;
> >>  	struct mmc_blk_data *md = mq->blkdata;
> >> @@ -983,7 +984,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
> >>  	bool rpmb_ioctl;
> >>  	u8 **ext_csd;
> >>  	u32 status;
> >> -	int ret;
> >> +	int ret, retry = 1;
> >>  	int i;
> >>
> >>  	mq_rq = req_to_mmc_queue_req(req);
> >> @@ -994,9 +995,24 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
> >>  	case MMC_DRV_OP_IOCTL_RPMB:
>
> SD cards do not have RPMB. Did you mean eMMC?
>

No, you are right.
This action should be excluded from 'case MMC_DRV_OP_IOCTL_RPMB'.

>
> >>  		idata = mq_rq->drv_op_data;
> >>  		for (i = 0, ret = 0; i < mq_rq->ioc_count; i++) {
> >> +cmd_do:
> >>  			ret = __mmc_blk_ioctl_cmd(card, md, idata[i]);
> >> -			if (ret)
> >> +			if (ret == -ETIMEDOUT) {
> >> +				dev_warn(mmc_dev(card->host),
> >> +					 "error %d sending command\n", ret);
> >> +cmd_reset:
> >> +				mmc_blk_reset_success(md, type);
>
> mmc_blk_reset_success() is called upon success, not failure. The reset will
> not be attempted twice in a row, for a given type, without a "success" in
> between.
>

Ok, yes I see. This line and the cmd_reset label should be removed, and if
mmc_blk_reset fails we should break, not retry.

Kind regards
Mårten

> >> +				if (retry--) {
> >> +					dev_warn(mmc_dev(card->host),
> >> +						 "power cycling card\n");
> >> +					if (mmc_blk_reset
> >> +					    (md, card->host, type))
> >> +						goto cmd_reset;
> >> +					mmc_blk_reset_success(md, type);
> >> +					goto cmd_do;
> >> +				}
> >>  				break;
> >> +			}
> >>  		}
> >>  		/* Always switch back to main area after RPMB access */
> >>  		if (rpmb_ioctl)
> >> --
> >> 2.11.0
> >>
> >
> > Kind regards
> > Uffe
> >
>
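The control flow discussed in this thread (no reset in the RPMB case, no
"success" recorded before the reset, break if the reset fails, one retry
only) could be modelled outside the kernel roughly as below. This is only a
sketch under assumptions, not the patch itself: struct model_card and every
model_* function are hypothetical stand-ins for the card,
__mmc_blk_ioctl_cmd(), mmc_blk_reset() and mmc_blk_reset_success(), and the
error codes returned by model_reset() are picked for illustration. Recording
the "success" only after the resent command has completed is also an
assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MMC_BLK_READ / MMC_BLK_WRITE. */
enum { MODEL_READ = 1 << 0, MODEL_WRITE = 1 << 1 };

/* Hypothetical model of a card plus the per-device reset bookkeeping. */
struct model_card {
	bool stuck;              /* card ignores commands until power cycled */
	bool reset_works;        /* whether a power cycle revives the card */
	unsigned int reset_done; /* types reset without a success since */
	int resets;              /* number of power cycles performed */
};

/* Models sending one ioctl command to the card. */
static int model_send_cmd(struct model_card *c)
{
	return c->stuck ? -ETIMEDOUT : 0;
}

/*
 * Models the gating described in the review: a reset is refused for a
 * type that was already reset without a success in between.
 */
static int model_reset(struct model_card *c, int type)
{
	if (c->reset_done & type)
		return -EEXIST;
	c->reset_done |= type;
	c->resets++;
	if (!c->reset_works)
		return -EIO;
	c->stuck = false;
	return 0;
}

/* Models recording a success, which re-arms the reset for this type. */
static void model_reset_success(struct model_card *c, int type)
{
	c->reset_done &= ~type;
}

/* Send a command; on timeout, power cycle at most once and resend. */
static int model_ioctl_cmd(struct model_card *c, int type, bool rpmb)
{
	int retry = 1;
	int ret;

	for (;;) {
		ret = model_send_cmd(c);
		/* Only a timeout on a non-RPMB command earns the one retry. */
		if (ret != -ETIMEDOUT || rpmb || !retry--)
			break;
		/* A failed (or refused) reset ends the attempt: break, not retry. */
		if (model_reset(c, type))
			break;
	}
	if (!ret)
		model_reset_success(c, type);
	return ret;
}
```

With a stuck card whose reset works, model_ioctl_cmd() performs one power
cycle and returns 0. If the reset fails, or the command is an RPMB one, the
original -ETIMEDOUT is returned and no further reset is attempted.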