Date: Fri, 23 Sep 2022 19:31:30 +0000
From: Minchan Kim
To: Brian Geffon
Cc: Andrew Morton, Nitin Gupta, Sergey Senozhatsky, linux-kernel@vger.kernel.org,
	Suleiman Souhlal, Rom Lemarchand, linux-mm@kvack.org
Subject: Re: [RESEND RFC] zram: Allow rw_page when page isn't written back.
In-Reply-To: <20220908125037.1119114-1-bgeffon@google.com>

On Thu, Sep 08, 2022 at 08:50:37AM -0400, Brian Geffon wrote:
> Today when a zram device has a backing device we change the ops to
> a new set which does not expose a rw_page method. This prevents the
> upper layers from trying to issue a synchronous rw. This has the
> downside that we penalize every rw even when it could possibly

Do you mean the additional bio alloc/free? Please describe the penalty
in a bit more detail.

> still be performed as a synchronous rw. By the very nature of

Even though zram goes through the block layer in that case, it is still
a synchronous operation against the in-memory compressed data. Only the
IO for data that lives in the backing device is asynchronous.

> zram all writes are synchronous so it's unfortunate to have to
> accept this limitation.
>
> This change will always expose a rw_page function and if the page
> has been written back it will return -EOPNOTSUPP which will force the
> upper layers to try again with bio.

Sounds like a good idea; when ->rw_page fails, the upper layer simply
falls back to the bio path (a simplified illustration of that fallback
is at the bottom of this mail).

>
> To safely allow a synchronous read to proceed for pages which have not
> yet written back we introduce a new flag ZRAM_NO_WB. On the first
> synchronous read if the page is not written back we will set the
> ZRAM_NO_WB flag. This flag, which is never cleared, prevents writeback
> from ever happening to that page.

Why do we need an additional flag? Why couldn't we simply:

1. expose rw_page all the time, and
2. if the page was already written back, just return an error from
   rw_page to make the upper layer retry it with a bio?

(A rough, untested sketch of what I mean follows below the quoted
description.)

>
> This approach works because in the case of zram as a swap backing device
> the page is going to be removed from zram shortly thereafter so
> preventing writeback is fine. However, if zram is being used as a
> generic block device then this might prevent writeback of the page.
>
> This proposal is still very much RFC, feedback would be appreciated.
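Completely untested sketch of what I have in mind, without the new
flag (the helper name is only for illustration, it doesn't exist in
the driver today):

	/*
	 * Refuse the synchronous ->rw_page path whenever the slot has
	 * been (or is being) written back, and let the caller fall
	 * back to a normal bio instead.
	 */
	static bool zram_slot_requires_bio(struct zram *zram, u32 index)
	{
		bool wb;

		zram_slot_lock(zram, index);
		wb = zram_test_flag(zram, index, ZRAM_WB) ||
		     zram_test_flag(zram, index, ZRAM_UNDER_WB);
		zram_slot_unlock(zram, index);

		return wb;
	}

and then in zram_rw_page(), before building the bio_vec:

	if (zram_slot_requires_bio(zram, index))
		return -EOPNOTSUPP;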
>
> Signed-off-by: Brian Geffon
> ---
>  drivers/block/zram/zram_drv.c | 68 +++++++++++++++++++++--------------
>  drivers/block/zram/zram_drv.h |  1 +
>  2 files changed, 43 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 92cb929a45b7..22b69e8b6042 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -52,9 +52,6 @@ static unsigned int num_devices = 1;
>  static size_t huge_class_size;
>  
>  static const struct block_device_operations zram_devops;
> -#ifdef CONFIG_ZRAM_WRITEBACK
> -static const struct block_device_operations zram_wb_devops;
> -#endif
>  
>  static void zram_free_page(struct zram *zram, size_t index);
>  static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
> @@ -309,7 +306,8 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
>  		 */
>  		zram_slot_lock(zram, index);
>  		if (zram_allocated(zram, index) &&
> -		    !zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
> +		    !zram_test_flag(zram, index, ZRAM_UNDER_WB) &&
> +		    !zram_test_flag(zram, index, ZRAM_NO_WB)) {
>  #ifdef CONFIG_ZRAM_MEMORY_TRACKING
>  			is_idle = !cutoff || ktime_after(cutoff, zram->table[index].ac_time);
>  #endif
> @@ -439,7 +437,6 @@ static void reset_bdev(struct zram *zram)
>  	filp_close(zram->backing_dev, NULL);
>  	zram->backing_dev = NULL;
>  	zram->bdev = NULL;
> -	zram->disk->fops = &zram_devops;
>  	kvfree(zram->bitmap);
>  	zram->bitmap = NULL;
>  }
> @@ -543,17 +540,6 @@ static ssize_t backing_dev_store(struct device *dev,
>  	zram->backing_dev = backing_dev;
>  	zram->bitmap = bitmap;
>  	zram->nr_pages = nr_pages;
> -	/*
> -	 * With writeback feature, zram does asynchronous IO so it's no longer
> -	 * synchronous device so let's remove synchronous io flag. Othewise,
> -	 * upper layer(e.g., swap) could wait IO completion rather than
> -	 * (submit and return), which will cause system sluggish.
> -	 * Furthermore, when the IO function returns(e.g., swap_readpage),
> -	 * upper layer expects IO was done so it could deallocate the page
> -	 * freely but in fact, IO is going on so finally could cause
> -	 * use-after-free when the IO is really done.
> -	 */
> -	zram->disk->fops = &zram_wb_devops;
>  	up_write(&zram->init_lock);
>  
>  	pr_info("setup backing device %s\n", file_name);
> @@ -722,7 +708,8 @@ static ssize_t writeback_store(struct device *dev,
>  
>  		if (zram_test_flag(zram, index, ZRAM_WB) ||
>  		    zram_test_flag(zram, index, ZRAM_SAME) ||
> -		    zram_test_flag(zram, index, ZRAM_UNDER_WB))
> +		    zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
> +		    zram_test_flag(zram, index, ZRAM_NO_WB))
>  			goto next;
>  
>  		if (mode & IDLE_WRITEBACK &&
> @@ -1226,6 +1213,10 @@ static void zram_free_page(struct zram *zram, size_t index)
>  		goto out;
>  	}
>  
> +	if (zram_test_flag(zram, index, ZRAM_NO_WB)) {
> +		zram_clear_flag(zram, index, ZRAM_NO_WB);
> +	}
> +
>  	/*
>  	 * No memory is allocated for same element filled pages.
>  	 * Simply clear same page flag.
> @@ -1654,6 +1645,40 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  	index = sector >> SECTORS_PER_PAGE_SHIFT;
>  	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
>  
> +#ifdef CONFIG_ZRAM_WRITEBACK
> +	/*
> +	 * With writeback feature, zram does asynchronous IO so it's no longer
> +	 * synchronous device so let's remove synchronous io flag. Othewise,
> +	 * upper layer(e.g., swap) could wait IO completion rather than
> +	 * (submit and return), which will cause system sluggish.
> +	 * Furthermore, when the IO function returns(e.g., swap_readpage),
> +	 * upper layer expects IO was done so it could deallocate the page
> +	 * freely but in fact, IO is going on so finally could cause
> +	 * use-after-free when the IO is really done.
> +	 *
> +	 * If the page is not currently written back then we may proceed to
> +	 * read the page synchronously, otherwise, we must fail with
> +	 * -EOPNOTSUPP to force the upper layers to use a normal bio.
> +	 */
> +	zram_slot_lock(zram, index);
> +	if (zram_test_flag(zram, index, ZRAM_WB) ||
> +	    zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
> +		zram_slot_unlock(zram, index);
> +		/* We cannot proceed with synchronous read */
> +		return -EOPNOTSUPP;
> +	}
> +
> +	/*
> +	 * Don't allow the page to be written back while we read it,
> +	 * this flag is never cleared. It shouldn't be a problem that
> +	 * we don't clear this flag because in the case of swap this
> +	 * page will be removed shortly after this read anyway.
> +	 */
> +	if (op == REQ_OP_READ)
> +		zram_set_flag(zram, index, ZRAM_NO_WB);
> +	zram_slot_unlock(zram, index);
> +#endif
> +
>  	bv.bv_page = page;
>  	bv.bv_len = PAGE_SIZE;
>  	bv.bv_offset = 0;
> @@ -1827,15 +1852,6 @@ static const struct block_device_operations zram_devops = {
>  	.owner = THIS_MODULE
>  };
>  
> -#ifdef CONFIG_ZRAM_WRITEBACK
> -static const struct block_device_operations zram_wb_devops = {
> -	.open = zram_open,
> -	.submit_bio = zram_submit_bio,
> -	.swap_slot_free_notify = zram_slot_free_notify,
> -	.owner = THIS_MODULE
> -};
> -#endif
> -
>  static DEVICE_ATTR_WO(compact);
>  static DEVICE_ATTR_RW(disksize);
>  static DEVICE_ATTR_RO(initstate);
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index 158c91e54850..20e4c6a579e0 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -50,6 +50,7 @@ enum zram_pageflags {
>  	ZRAM_UNDER_WB,	/* page is under writeback */
>  	ZRAM_HUGE,	/* Incompressible page */
>  	ZRAM_IDLE,	/* not accessed page since last idle marking */
> +	ZRAM_NO_WB,	/* Do not allow page to be written back */
>  
>  	__NR_ZRAM_PAGEFLAGS,
>  };
> -- 
> 2.37.2.789.g6183377224-goog
>
>
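For context on the -EOPNOTSUPP fallback mentioned above: the
synchronous ->rw_page shortcut is only taken when it succeeds, and on
any error the caller goes back to the normal bio path. Roughly like
this on the swap read side (a simplified illustration only, not the
exact upstream code; the bio helper name below is a stand-in):

	/*
	 * Simplified view of the swap read path: try the synchronous
	 * ->rw_page shortcut only for SWP_SYNCHRONOUS_IO devices, and
	 * fall back to a regular bio when it is absent or errors out
	 * (e.g. with -EOPNOTSUPP).
	 */
	static void swap_read_example(struct swap_info_struct *sis,
				      struct page *page, sector_t sector)
	{
		if ((sis->flags & SWP_SYNCHRONOUS_IO) &&
		    bdev_read_page(sis->bdev, sector, page) == 0)
			return;	/* synchronous read completed */

		/* stand-in for the asynchronous bio submission path */
		swap_read_submit_bio(sis, page);
	}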