Date: Tue, 22 Mar 2016 17:20:08 +0900
From: Joonsoo Kim
To: Minchan Kim
Cc: Joonsoo Kim, Andrew Morton, LKML, Sergey Senozhatsky, karam.lee@lge.com, sangseok.lee@lge.com, chan.jeong@lge.com
Subject: Re: [PATCH] zram: revive swap_slot_free_notify
In-Reply-To: <20160322080005.GC30070@bbox>
References: <1458287911-24806-1-git-send-email-minchan@kernel.org> <20160322050859.GB31955@js1304-P5Q-DELUXE> <20160322080005.GC30070@bbox>

2016-03-22 17:00 GMT+09:00 Minchan Kim:
> On Tue, Mar 22, 2016 at 02:08:59PM +0900, Joonsoo Kim wrote:
>> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
>> > The commit "remove compressed copy from zram in-memory"
>> > added a swap_slot_free_notify call in *end_swap_bio_read* to
>> > avoid keeping a duplicated copy of the data in both zram and
>> > memory.
>> >
>> > However, after <8c7f01025f7b> "zram: implement rw_page operation
>> > of zram" introduced rw_page to zram, the notification stopped
>> > working because the rw_page path does not use a bio.
>> >
>> > This patch restores the notification for the rw_page path.
>> >
>> > Signed-off-by: Minchan Kim
>> > ---
>> >  mm/page_io.c | 93 ++++++++++++++++++++++++++++++++----------------------------
>> >  1 file changed, 50 insertions(+), 43 deletions(-)
>> >
>> > diff --git a/mm/page_io.c b/mm/page_io.c
>> > index ff74e512f029..18aac7819cc9 100644
>> > --- a/mm/page_io.c
>> > +++ b/mm/page_io.c
>> > @@ -66,6 +66,54 @@ void end_swap_bio_write(struct bio *bio)
>> >  	bio_put(bio);
>> >  }
>> >
>> > +static void swap_slot_free_notify(struct page *page)
>> > +{
>> > +	struct swap_info_struct *sis;
>> > +	struct gendisk *disk;
>> > +
>> > +	/*
>> > +	 * There is no guarantee that the page is in swap cache - the software
>> > +	 * suspend code (at least) uses end_swap_bio_read() against a non-
>> > +	 * swapcache page.  So we must check PG_swapcache before proceeding with
>> > +	 * this optimization.
>> > +	 */
>> > +	if (unlikely(!PageSwapCache(page)))
>> > +		return;
>> > +
>> > +	sis = page_swap_info(page);
>> > +	if (!(sis->flags & SWP_BLKDEV))
>> > +		return;
>> > +
>> > +	/*
>> > +	 * The swap subsystem performs lazy swap slot freeing,
>> > +	 * expecting that the page will be swapped out again.
>> > +	 * So we can avoid an unnecessary write if the page
>> > +	 * isn't redirtied.
>> > +	 * This is good for real swap storage because we can
>> > +	 * reduce unnecessary I/O and enhance wear-leveling
>> > +	 * if an SSD is used as the swap device.
>> > +	 * But if an in-memory swap device (e.g. zram) is used,
>> > +	 * this causes a duplicated copy between uncompressed
>> > +	 * data in VM-owned memory and compressed data in
>> > +	 * zram-owned memory.  So let's free the zram-owned memory
>> > +	 * and make the VM-owned decompressed page *dirty*,
>> > +	 * so the page should be swapped out somewhere again if
>> > +	 * we again wish to reclaim it.
>> > +	 */
>> > +	disk = sis->bdev->bd_disk;
>> > +	if (disk->fops->swap_slot_free_notify) {
>> > +		swp_entry_t entry;
>> > +		unsigned long offset;
>> > +
>> > +		entry.val = page_private(page);
>> > +		offset = swp_offset(entry);
>> > +
>> > +		SetPageDirty(page);
>> > +		disk->fops->swap_slot_free_notify(sis->bdev,
>> > +							offset);
>> > +	}
>> > +}
>> > +
>> >  static void end_swap_bio_read(struct bio *bio)
>> >  {
>> >  	struct page *page = bio->bi_io_vec[0].bv_page;
>> > @@ -81,49 +129,7 @@ static void end_swap_bio_read(struct bio *bio)
>> >  	}
>> >
>> >  	SetPageUptodate(page);
>> > -
>> > -	/*
>> > -	 * There is no guarantee that the page is in swap cache - the software
>> > -	 * suspend code (at least) uses end_swap_bio_read() against a non-
>> > -	 * swapcache page.  So we must check PG_swapcache before proceeding with
>> > -	 * this optimization.
>> > -	 */
>> > -	if (likely(PageSwapCache(page))) {
>> > -		struct swap_info_struct *sis;
>> > -
>> > -		sis = page_swap_info(page);
>> > -		if (sis->flags & SWP_BLKDEV) {
>> > -			/*
>> > -			 * The swap subsystem performs lazy swap slot freeing,
>> > -			 * expecting that the page will be swapped out again.
>> > -			 * So we can avoid an unnecessary write if the page
>> > -			 * isn't redirtied.
>> > -			 * This is good for real swap storage because we can
>> > -			 * reduce unnecessary I/O and enhance wear-leveling
>> > -			 * if an SSD is used as the swap device.
>> > -			 * But if an in-memory swap device (e.g. zram) is used,
>> > -			 * this causes a duplicated copy between uncompressed
>> > -			 * data in VM-owned memory and compressed data in
>> > -			 * zram-owned memory.  So let's free the zram-owned memory
>> > -			 * and make the VM-owned decompressed page *dirty*,
>> > -			 * so the page should be swapped out somewhere again if
>> > -			 * we again wish to reclaim it.
>> > -			 */
>> > -			struct gendisk *disk = sis->bdev->bd_disk;
>> > -			if (disk->fops->swap_slot_free_notify) {
>> > -				swp_entry_t entry;
>> > -				unsigned long offset;
>> > -
>> > -				entry.val = page_private(page);
>> > -				offset = swp_offset(entry);
>> > -
>> > -				SetPageDirty(page);
>> > -				disk->fops->swap_slot_free_notify(sis->bdev,
>> > -							offset);
>> > -			}
>> > -		}
>> > -	}
>> > -
>> > +	swap_slot_free_notify(page);
>> >  out:
>> >  	unlock_page(page);
>> >  	bio_put(bio);
>> > @@ -347,6 +353,7 @@ int swap_readpage(struct page *page)
>> >
>> >  	ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
>> >  	if (!ret) {
>> > +		swap_slot_free_notify(page);
>> >  		count_vm_event(PSWPIN);
>> >  		return 0;
>> >  	}
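
(Aside for readers without the zram tree handy: the driver-side hook
that this patch re-enables on the rw_page path is zram's
swap_slot_free_notify implementation. Below is a simplified sketch of
roughly what it looked like around this time, based on
drivers/block/zram/zram_drv.c; locking details and field names vary
between kernel versions, so treat it as illustrative rather than exact.)

	/* Drop zram's compressed copy of a swapped-in slot. */
	static void zram_slot_free_notify(struct block_device *bdev,
					  unsigned long index)
	{
		struct zram *zram = bdev->bd_disk->private_data;
		struct zram_meta *meta = zram->meta;

		bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
		zram_free_page(zram, index);	/* free the compressed object */
		bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
		atomic64_inc(&zram->stats.notify_free);
	}

	/* Wired up through the disk's block_device_operations: */
	static const struct block_device_operations zram_devops = {
		.swap_slot_free_notify	= zram_slot_free_notify,
		.rw_page		= zram_rw_page,
		.owner			= THIS_MODULE,
	};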
>>
>> Hello,
>
> Hey Joonsoo,
>
>> You need to check PageUptodate() or something, because
>> bdev_read_page() can be asynchronous.
>
> I considered it but decided not to add the check :(.
> I couldn't justify what benefit the check would give us.
> swap_slot_free_notify has been tightly coupled with zram for several
> years, and zram has always worked synchronously, so if bdev_read_page
> returns 0, it means we have already read the page successfully.
> And when I looked at the other rw_page users, none of them is
> asynchronous at the moment.

Yes, I also looked at the other rw_page users and found that none of
them is asynchronous right now.

> If someone wants to use an *async* rw_page together with
> *swap_slot_free_notify* in the future, we could add the check easily.
> But I hope nobody ever uses swap_slot_free_notify again; it is a
> mess. :(

But I still think we should add the check now. If someone needs it
later, how would he/she know about it? And if someone makes zram read
and write asynchronously, we could easily miss this spot. That is an
error-prone practice.
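
Concretely, the kind of guard I mean would look something like the
following in swap_readpage() (just an untested sketch on top of this
patch; the point is that the notification must only happen once the
read has really completed and the completion path has marked the page
uptodate):

	ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
	if (!ret) {
		/*
		 * rw_page may complete asynchronously; only tell the
		 * backend to drop its compressed copy once the page
		 * has actually been filled with the read data.
		 */
		if (PageUptodate(page))
			swap_slot_free_notify(page);
		count_vm_event(PSWPIN);
		return 0;
	}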
>> BTW, something like swap_slot_free_notify(), which invalidates the
>> copy held by the backend storage, should also be possible for
>> frontswap when frontswap_load() succeeds, shouldn't it?
>
> frontswap_tmem_exclusive_gets_enabled?

Wow... yes, that's what I was trying to find. Do you know why zswap
doesn't enable it?

Thanks.
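
P.S. For reference, what frontswap_tmem_exclusive_gets_enabled controls
on the load side is roughly the following (a sketch based on
mm/frontswap.c of this period, trimmed of debugging and error handling;
not the exact code):

	int __frontswap_load(struct page *page)
	{
		int ret = -1;
		swp_entry_t entry = { .val = page_private(page), };
		int type = swp_type(entry);
		struct swap_info_struct *sis = swap_info[type];
		pgoff_t offset = swp_offset(entry);

		if (test_bit(offset, sis->frontswap_map))
			ret = frontswap_ops->load(type, offset, page);
		if (ret == 0) {
			inc_frontswap_loads();
			if (frontswap_tmem_exclusive_gets_enabled) {
				/*
				 * Exclusive gets: the backend forgets its
				 * copy on load, so redirty the page to
				 * guarantee it can be written out again --
				 * the same idea as swap_slot_free_notify
				 * for zram.
				 */
				SetPageDirty(page);
				__frontswap_clear(sis, offset);
			}
		}
		return ret;
	}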