Date: Fri, 8 Oct 2021 14:15:24 +0200
From: Boris Brezillon
To: Sean Nyekjaer
Cc: Miquel Raynal, Richard Weinberger, Vignesh Raghavendra, Boris Brezillon,
 linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend
Message-ID: <20211008141524.20ca8219@collabora.com>
In-Reply-To: <20211008115413.cbkdxv3mpmmkyvjx@skn-laptop>
References: <20211005070930.epgxb5qzumk4awxq@skn-laptop> <20211005102300.5da6d480@collabora.com>
 <20211005084938.jcbw24umhehoiirs@skn-laptop> <20211005105836.6c300f25@collabora.com>
 <20211007114351.3nafhtpefezxhanc@skn-laptop> <20211007141858.314533f2@collabora.com>
 <20211007123916.w4oaooxfbawe6yw3@skn-laptop> <20211007151426.54db0764@collabora.com>
 <20211008100425.uudzlda2n5ojqjzc@skn-laptop> <20211008132038.77231e2a@collabora.com>
 <20211008115413.cbkdxv3mpmmkyvjx@skn-laptop>
Organization: Collabora
X-Mailer: Claws Mail 3.18.0 (GTK+ 2.24.33; x86_64-redhat-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 8 Oct 2021 13:54:13 +0200
Sean Nyekjaer wrote:

> On Fri, Oct 08, 2021 at 01:20:38PM +0200, Boris Brezillon wrote:
> > Hi Sean,
> >
> > On Fri, 8 Oct 2021 12:04:25 +0200
> > Sean Nyekjaer wrote:
> >
> > > On Thu, Oct 07, 2021 at 03:14:26PM +0200, Boris Brezillon wrote:
> > > > On Thu, 7 Oct 2021 14:39:16 +0200
> > > > Sean Nyekjaer wrote:
> > > >
> > > > > >
> > > > > > wait_queue doesn't really describe what this waitqueue is used for
> > > > > > (maybe resume_wq), and the suspended state should be here as well
> > > > > > (actually, there's one already).
> > > > >
> > > > > I'll rename to something meaningful.
> > > > > >
> > > > > > Actually, what we need is a way to prevent the device from being
> > > > > > suspended while accesses are still in progress, and new accesses from
> > > > > > being queued if a suspend is pending. So, I think you need a readwrite
> > > > > > lock here:
> > > > > >
> > > > > > * take the lock in read mode for all IO accesses, check the
> > > > > >   mtd->suspended value
> > > > > >   - if true, release the lock, and wait (retry on wakeup)
> > > > > >   - if false, just do the IO
> > > > > >
> > > > > > * take the lock in write mode when you want to suspend/resume the
> > > > > >   device and update the suspended field. Call wake_up_all() in the
> > > > > >   resume path
> > > > >
> > > > > Could we use the chip->lock mutex for this? It does kinda what you
> > > > > described above?
> > > >
> > > > No you can't. Remember I suggested to move all of that logic to
> > > > mtdcore.c, which doesn't know about the nand_chip struct.
> > > >
> > > > > If we introduce a new lock, do we really need to have the suspended as
> > > > > an atomic?
> > > >
> > > > Nope, I thought we could do without a lock, but we actually need to
> > > > track active IO requests, not just the suspended state.
> > >
> > > I have only added wait_queue to read and write operations.
> >
> > It's still racy (see below).
> >
> > > I'll have a look into where we should add further checks.
> > >
> > > >
> > > > >
> > > > > I will test with some wait and retry added to nand_get_device().
> > > >
> > > > Again, I think there's a misunderstanding here: if you move it to the
> > > > mtd layer, it can't be done in nand_get_device(). But once you've
> > > > implemented it in mtdcore.c, you should be able to get rid of the
> > > > nand_chip->suspended field.
> > >
> > > I have moved the suspended atomic and wake_queue to mtdcore.c.
> >
> > That doesn't work (see below).
> >
> > > And kept
> > > the suspended variable in nand_base as is fine for chip level suspend
> > > status.
> >
> > Why? If you handle that at the MTD level you shouldn't need it at the
> > NAND level? BTW, would you please care to detail your reasoning when
> > you say you did or didn't do something. It's a bit hard to guess what
> > led you to this conclusion...
> >
> > >
> > > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> > > index c8fd7f758938..6492071eb4da 100644
> > > --- a/drivers/mtd/mtdcore.c
> > > +++ b/drivers/mtd/mtdcore.c
> > > @@ -42,15 +42,24 @@ static int mtd_cls_suspend(struct device *dev)
> > >  {
> > >  	struct mtd_info *mtd = dev_get_drvdata(dev);
> > >
> > > -	return mtd ? mtd_suspend(mtd) : 0;
> > > +	if (mtd) {
> > > +		atomic_inc(&mtd->suspended);
> > > +		return mtd_suspend(mtd);
> > > +	}
> > > +
> > > +	return 0;
> > >  }
> > >
> > >  static int mtd_cls_resume(struct device *dev)
> > >  {
> > >  	struct mtd_info *mtd = dev_get_drvdata(dev);
> > >
> > > -	if (mtd)
> > > +	if (mtd) {
> > >  		mtd_resume(mtd);
> > > +		atomic_dec(&mtd->suspended);
> > > +		wake_up_all(&mtd->resume_wq);
> > > +	}
> > > +
> > >  	return 0;
> > >  }
> > > @@ -678,6 +687,10 @@ int add_mtd_device(struct mtd_info *mtd)
> > >  	if (error)
> > >  		goto fail_nvmem_add;
> > >
> > > +	init_waitqueue_head(&mtd->resume_wq);
> > > +
> > > +	atomic_set(&mtd->suspended, 0);
> > > +
> > >  	mtd_debugfs_populate(mtd);
> > >
> > >  	device_create(&mtd_class, mtd->dev.parent, MTD_DEVT(i) + 1, NULL,
> > > @@ -1558,6 +1571,8 @@ int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
> > >  	struct mtd_ecc_stats old_stats = master->ecc_stats;
> > >  	int ret_code;
> > >
> > > +	wait_event(mtd->resume_wq, atomic_read(&mtd->suspended) == 0);
> >
> > That's racy:
> >
> > thread A                        thread B
> >                                 |
> > enters mtd_read()               |
> > passes the !suspended test      |
> >                                 |       enters mtd_suspend()
> >                                 |       sets suspended to 1
> >                                 |
> > starts the IO                   |
> >                                 |       suspends the device
> > tries to finish the IO          |
> > on a suspended device           |
> >
> > BOOM!
> >
> >
> > Using an atomic doesn't solve any of that, you really need to make sure
> > nothing tries to communicate with the device while you're suspending
> > it, hence the suggestion to use a rw_semaphore to protect against that.
> >
> > > +
> > >  	ops->retlen = ops->oobretlen = 0;
> > >
> > >  	ret_code = mtd_check_oob_ops(mtd, from, ops);
> > > @@ -1597,6 +1612,8 @@ int mtd_write_oob(struct mtd_info *mtd, loff_t to,
> > >  	struct mtd_info *master = mtd_get_master(mtd);
> > >  	int ret;
> > >
> > > +	wait_event(mtd->resume_wq, atomic_read(&mtd->suspended) == 0);
> > > +
> >
> > Please don't open-code this in every IO path, add helpers hiding all
> > the complexity.
> >
> > To sum up, that's more or less what I had in mind:
> >
> > static void mtd_start_access(struct mtd_info *mtd)
> > {
> > 	/*
> > 	 * Don't take the suspend_lock on devices that don't
> > 	 * implement the suspend hook. Otherwise, lockdep will
> > 	 * complain about nested locks when trying to suspend MTD
> > 	 * partitions or MTD devices created by gluebi which are
> > 	 * backed by real devices.
> > 	 */
> > 	if (!mtd->_suspend)
> > 		return;
> >
> > 	/*
> > 	 * Wait until the device is resumed. Should we have a
> > 	 * non-blocking mode here?
> > 	 */
> > 	while (1) {
> > 		down_read(&mtd->suspend_lock);
> > 		if (!mtd->suspended)
> > 			return;
> >
> > 		up_read(&mtd->suspend_lock);
> > 		wait_event(mtd->resume_wq, mtd->suspended == false);
> > 	}
> > }
> >
> > static void mtd_end_access(struct mtd_info *mtd)
> > {
> > 	if (!mtd->_suspend)
> > 		return;
> >
> > 	up_read(&mtd->suspend_lock);
> > }
> >
> > static int mtd_suspend(struct mtd_info *mtd)
> > {
> > 	int ret = 0;
> >
> > 	if (!mtd->_suspend)
> > 		return 0;
> >
> > 	down_write(&mtd->suspend_lock);
> > 	if (mtd->suspended == false) {
> > 		ret = mtd->_suspend(mtd);
> > 		if (!ret)
> > 			mtd->suspended = true;
> > 	}
> > 	up_write(&mtd->suspend_lock);
> >
> > 	return ret;
> > }
> >
> > static void mtd_resume(struct mtd_info *mtd)
> > {
> > 	if (!mtd->_suspend)
> > 		return;
> >
> > 	down_write(&mtd->suspend_lock);
> > 	if (mtd->suspended) {
> > 		if (mtd->_resume)
> > 			mtd->_resume(mtd);
> >
> > 		mtd->suspended = false;
> >
> > 		/* The MTD dev has been resumed, wake up all waiters. */
> > 		wake_up_all(&mtd->resume_wq);
> > 	}
> > 	up_write(&mtd->suspend_lock);
> > }
> >
> > You then need to call mtd_{start,end}_access() in all MTD IO paths
> > (read/write/erase and maybe others too).
>
> Looks cool.
>
> But you are introducing a new lock that basically does the
> same as chip->lock in nand_base.c one level above ;)

It doesn't serve the same purpose, no. This one is making sure suspend
can't happen while IOs are in-flight, and IOs can't happen while the
device is being suspended.
The nand_chip->lock serializes all IO going through a chip (the new
mtd->suspend_lock doesn't guarantee that). This being said, once you
have this, you should be able to get rid of the nand_chip->suspended
field.

> You wrote that we didn't want to introduce a new lock :)

Again, that's not what I said. I said using a lock to wait on devices
going out of suspend was a bad idea, because then the lock is held when
you enter suspend, and only released when the device gets resumed.
That's quite a big/unbounded critical section, and we try to avoid that
in general (ideally locks should be taken/released in the same
function). What I do here is quite different, see how the
mtd->suspend_lock is released before calling wait_event().