Date: Mon, 4 Oct 2021 10:41:47 +0200
From: Boris Brezillon
To: Sean Nyekjaer
Cc: Miquel Raynal, Richard Weinberger, Vignesh Raghavendra, Boris Brezillon, linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend
Message-ID: <20211004104147.579f3b01@collabora.com>
In-Reply-To: <20211004065608.3190348-1-sean@geanix.com>
References: <20211004065608.3190348-1-sean@geanix.com>
Organization: Collabora

On Mon, 4 Oct 2021 08:56:09 +0200
Sean Nyekjaer wrote:

> This will prevent nand_get_device() from returning -EBUSY.
> It will force mtd_write()/mtd_read() to wait for the nand_resume() to unlock
> access to the mtd device.
>
> Then we avoid -EBUSY being returned to ubifs via mtd_write()/mtd_read(),
> which will in turn hard error on every error returned.
> We have seen ubifs try to call mtd_write() before the mtd device
> is resumed.

I think the problem is here. Why would UBIFS/UBI try to write something
to a device that's not resumed yet (or has been suspended already, if
you hit this in the suspend path)?

>
> Exec_op[0] speeds things up, so we see this race when the device is
> resuming. But it's actually "mtd: rawnand: Simplify the locking" that
> allows it to return -EBUSY; before that commit it would have waited for
> the mtd device to resume.

Uh, wait. If nand_resume() was called before any writes/reads, this
wouldn't happen. IMHO, the problem is not that we return -EBUSY without
blocking; the problem is that someone issues a write/read before calling
mtd_resume().

>
> Tested on an iMX6ULL.
>
> [0]:
> ef347c0cfd61 ("mtd: rawnand: gpmi: Implement exec_op")
>
> Fixes: 013e6292aaf5 ("mtd: rawnand: Simplify the locking")
> Signed-off-by: Sean Nyekjaer
> ---
>
> I did this as an RFC as we will probably need to remove the suspended
> variable, as it's kinda made obsolete by this change.
> Should we introduce a new mutex? Or maybe a spin_lock?
>
>  drivers/mtd/nand/raw/nand_base.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> index 3d6c6e880520..0ea343404cac 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -4567,7 +4567,6 @@ static int nand_suspend(struct mtd_info *mtd)
>  		ret = chip->ops.suspend(chip);
>  	if (!ret)
>  		chip->suspended = 1;
> -	mutex_unlock(&chip->lock);

Hm, I'm not sure keeping the lock when you're in a suspended state is a
good idea. It just papers over another bug IMO (see above).

>
>  	return ret;
>  }
> @@ -4580,7 +4579,6 @@ static void nand_resume(struct mtd_info *mtd)
>  {
>  	struct nand_chip *chip = mtd_to_nand(mtd);
>
> -	mutex_lock(&chip->lock);
>  	if (chip->suspended) {
>  		if (chip->ops.resume)
>  			chip->ops.resume(chip);
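
For illustration only, the behaviour under discussion can be modelled in
a small userspace sketch. This is not kernel code: it uses pthreads, and
struct model_chip and every model_*() function are hypothetical stand-ins
for struct nand_chip, nand_get_device(), nand_suspend() and nand_resume().
It assumes the access path takes the chip lock, checks the suspended
flag, and bails out with -EBUSY rather than waiting.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct nand_chip (not the kernel type). */
struct model_chip {
	pthread_mutex_t lock;
	bool suspended;
};

static void model_init(struct model_chip *chip)
{
	pthread_mutex_init(&chip->lock, NULL);
	chip->suspended = false;
}

/*
 * Assumed behaviour: take the lock, and refuse access with -EBUSY if the
 * chip is suspended. On success the caller holds chip->lock.
 */
static int model_get_device(struct model_chip *chip)
{
	pthread_mutex_lock(&chip->lock);
	if (chip->suspended) {
		pthread_mutex_unlock(&chip->lock);
		return -EBUSY;
	}
	return 0;
}

static void model_release_device(struct model_chip *chip)
{
	pthread_mutex_unlock(&chip->lock);
}

/* Mark the chip suspended; the lock is dropped again before returning. */
static void model_suspend(struct model_chip *chip)
{
	pthread_mutex_lock(&chip->lock);
	chip->suspended = true;
	pthread_mutex_unlock(&chip->lock);
}

/* Clear the suspended flag so that later accesses succeed. */
static void model_resume(struct model_chip *chip)
{
	pthread_mutex_lock(&chip->lock);
	chip->suspended = false;
	pthread_mutex_unlock(&chip->lock);
}

/* Hypothetical I/O path: 0 on success, -EBUSY if issued while suspended. */
static int model_write(struct model_chip *chip)
{
	int ret = model_get_device(chip);

	if (ret)
		return ret;
	/* ... the actual device access would happen here ... */
	model_release_device(chip);
	return 0;
}
```

In this model, a model_write() issued between model_suspend() and
model_resume() returns -EBUSY, while the same call issued after
model_resume() succeeds. The error only appears when the caller gets the
ordering wrong, which is the point made above.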