<b430e9d1c6d4> "remove compressed copy from zram in-memory"
added a swap_slot_free_notify call in *end_swap_bio_read* to
remove the memory duplicated between zram and main memory.

However, with the introduction of rw_page in zram by <8c7f01025f7b>
"zram: implement rw_page operation of zram", that call stopped being
reached, because the rw_page path doesn't issue a bio and so never
goes through end_swap_bio_read.

This patch restores the functionality for the rw_page path.
Signed-off-by: Minchan Kim <[email protected]>
---
mm/page_io.c | 93 ++++++++++++++++++++++++++++++++----------------------------
1 file changed, 50 insertions(+), 43 deletions(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index ff74e512f029..18aac7819cc9 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -66,6 +66,54 @@ void end_swap_bio_write(struct bio *bio)
bio_put(bio);
}
+static void swap_slot_free_notify(struct page *page)
+{
+ struct swap_info_struct *sis;
+ struct gendisk *disk;
+
+ /*
+ * There is no guarantee that the page is in swap cache - the software
+ * suspend code (at least) uses end_swap_bio_read() against a non-
+ * swapcache page. So we must check PG_swapcache before proceeding with
+ * this optimization.
+ */
+ if (unlikely(!PageSwapCache(page)))
+ return;
+
+ sis = page_swap_info(page);
+ if (!(sis->flags & SWP_BLKDEV))
+ return;
+
+ /*
+ * The swap subsystem performs lazy swap slot freeing,
+ * expecting that the page will be swapped out again.
+ * So we can avoid an unnecessary write if the page
+ * isn't redirtied.
+ * This is good for real swap storage because we can
+ * reduce unnecessary I/O and enhance wear-leveling
+ * if an SSD is used as the swap device.
+ * But if in-memory swap device (eg zram) is used,
+ * this causes a duplicated copy between uncompressed
+ * data in VM-owned memory and compressed data in
+ * zram-owned memory. So let's free zram-owned memory
+ * and make the VM-owned decompressed page *dirty*,
+ * so the page should be swapped out somewhere again if
+ * we again wish to reclaim it.
+ */
+ disk = sis->bdev->bd_disk;
+ if (disk->fops->swap_slot_free_notify) {
+ swp_entry_t entry;
+ unsigned long offset;
+
+ entry.val = page_private(page);
+ offset = swp_offset(entry);
+
+ SetPageDirty(page);
+ disk->fops->swap_slot_free_notify(sis->bdev,
+ offset);
+ }
+}
+
static void end_swap_bio_read(struct bio *bio)
{
struct page *page = bio->bi_io_vec[0].bv_page;
@@ -81,49 +129,7 @@ static void end_swap_bio_read(struct bio *bio)
}
SetPageUptodate(page);
-
- /*
- * There is no guarantee that the page is in swap cache - the software
- * suspend code (at least) uses end_swap_bio_read() against a non-
- * swapcache page. So we must check PG_swapcache before proceeding with
- * this optimization.
- */
- if (likely(PageSwapCache(page))) {
- struct swap_info_struct *sis;
-
- sis = page_swap_info(page);
- if (sis->flags & SWP_BLKDEV) {
- /*
- * The swap subsystem performs lazy swap slot freeing,
- * expecting that the page will be swapped out again.
- * So we can avoid an unnecessary write if the page
- * isn't redirtied.
- * This is good for real swap storage because we can
- * reduce unnecessary I/O and enhance wear-leveling
- * if an SSD is used as the as swap device.
- * But if in-memory swap device (eg zram) is used,
- * this causes a duplicated copy between uncompressed
- * data in VM-owned memory and compressed data in
- * zram-owned memory. So let's free zram-owned memory
- * and make the VM-owned decompressed page *dirty*,
- * so the page should be swapped out somewhere again if
- * we again wish to reclaim it.
- */
- struct gendisk *disk = sis->bdev->bd_disk;
- if (disk->fops->swap_slot_free_notify) {
- swp_entry_t entry;
- unsigned long offset;
-
- entry.val = page_private(page);
- offset = swp_offset(entry);
-
- SetPageDirty(page);
- disk->fops->swap_slot_free_notify(sis->bdev,
- offset);
- }
- }
- }
-
+ swap_slot_free_notify(page);
out:
unlock_page(page);
bio_put(bio);
@@ -347,6 +353,7 @@ int swap_readpage(struct page *page)
ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
if (!ret) {
+ swap_slot_free_notify(page);
count_vm_event(PSWPIN);
return 0;
}
--
1.9.1
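[For context: the driver-side hook that swap_slot_free_notify() reaches
through disk->fops looked roughly like the sketch below in the zram of
that era. This is reconstructed from drivers/block/zram/zram_drv.c from
memory, so locking and field names may differ slightly between versions:

	/* Free the compressed copy backing the given swap slot. */
	static void zram_slot_free_notify(struct block_device *bdev,
					unsigned long index)
	{
		struct zram *zram = bdev->bd_disk->private_data;
		struct zram_meta *meta = zram->meta;

		bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
		zram_free_page(zram, index);
		bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
		atomic64_inc(&zram->stats.notify_free);
	}

zram registers this through the .swap_slot_free_notify member of its
block_device_operations, which is why the VM side above checks
disk->fops->swap_slot_free_notify before calling it.]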
On Fri, 18 Mar 2016 16:58:31 +0900 Minchan Kim <[email protected]> wrote:
> <b430e9d1c6d4> "remove compressed copy from zram in-memory"
> added a swap_slot_free_notify call in *end_swap_bio_read* to
> remove the memory duplicated between zram and main memory.
>
> However, with the introduction of rw_page in zram by <8c7f01025f7b>
> "zram: implement rw_page operation of zram", that call stopped being
> reached, because the rw_page path doesn't issue a bio and so never
> goes through end_swap_bio_read.
>
> This patch restores the functionality for the rw_page path.
This is a bit mysterious. What is the actual runtime effect of the
patch? I assume that 8c7f01025f7b caused duplication of memory and
that the only problem is additional resource consumption? If so, what
are the observable effects? etcetera, please.
Hello Andrew,
On Mon, Mar 21, 2016 at 01:30:17PM -0700, Andrew Morton wrote:
> On Fri, 18 Mar 2016 16:58:31 +0900 Minchan Kim <[email protected]> wrote:
>
> > <b430e9d1c6d4> "remove compressed copy from zram in-memory"
> > added a swap_slot_free_notify call in *end_swap_bio_read* to
> > remove the memory duplicated between zram and main memory.
> >
> > However, with the introduction of rw_page in zram by <8c7f01025f7b>
> > "zram: implement rw_page operation of zram", that call stopped being
> > reached, because the rw_page path doesn't issue a bio and so never
> > goes through end_swap_bio_read.
> >
> > This patch restores the functionality for the rw_page path.
>
> This is a bit mysterious. What is the actual runtime effect of the
> patch? I assume that 8c7f01025f7b caused duplication of memory and
> that the only problem is additional resource consumption? If so, what
> are the observable effects? etcetera, please.
>
As you already pointed out, the only problem is memory duplication.
The description of b430e9d1c6d4 records how much memory it saves
(e.g., about 19M in a kernel build workload).
Memory footprint really matters these days on embedded platforms with
small memory (for example, 512M), because an LMK or similar
memory-management module starts killing processes once the footprint
exceeds some threshold.
On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
> <snip>
> @@ -347,6 +353,7 @@ int swap_readpage(struct page *page)
>
> ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
> if (!ret) {
> + swap_slot_free_notify(page);
> count_vm_event(PSWPIN);
> return 0;
> }
Hello,

You need to check PageUptodate() or something, because bdev_read_page()
can be asynchronous.

BTW, something like swap_slot_free_notify(), which invalidates the
backend storage, could also be done for frontswap when
frontswap_load() succeeds. Couldn't it?

Thanks.
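[A minimal sketch of the kind of guard being suggested here, at the
swap_readpage() call site. This is hypothetical, not what was merged;
the thread below settles on a WARN_ON_ONCE() instead, since no
asynchronous rw_page driver existed at the time:

	ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
	if (!ret) {
		/*
		 * A purely asynchronous rw_page driver could return
		 * before the read completes, so only notify once the
		 * completion path has marked the page uptodate.
		 */
		if (PageUptodate(page))
			swap_slot_free_notify(page);
		count_vm_event(PSWPIN);
		return 0;
	}
]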
On Tue, Mar 22, 2016 at 02:08:59PM +0900, Joonsoo Kim wrote:
> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
> > <snip>
>
> Hello,
Hey Joonsoo,
>
> You need to check PageUptodate() or something, because bdev_read_page()
> can be asynchronous.

I considered it but decided not to add the check :(.
I couldn't justify what benefit we would get from the check.
swap_slot_free_notify has been tightly coupled with zram for several
years, and zram has always worked synchronously. So if bdev_read_page
returns 0, it means we have already read the page successfully.
Also, when I looked up the other rw_page users, it seems there are no
async rw_page users at the moment.
If someone wants to use *async* rw_page && *swap_slot_free_notify*
in future, we can add the check easily then. But I hope no one ever
uses swap_slot_free_notify again; it is a mess. :(

> BTW, something like swap_slot_free_notify(), which invalidates the
> backend storage, could also be done for frontswap when
> frontswap_load() succeeds. Couldn't it?
frontswap_tmem_exclusive_gets_enabled?
>
> Thanks.
2016-03-22 17:00 GMT+09:00 Minchan Kim <[email protected]>:
> On Tue, Mar 22, 2016 at 02:08:59PM +0900, Joonsoo Kim wrote:
>> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
>> > <snip>
>>
>> Hello,
>
> Hey Joonsoo,
>
>>
>> You need to check PageUptodate() or something, because bdev_read_page()
>> can be asynchronous.
>
> I considered it but decided not to add the check :(.
> I couldn't justify what benefit we would get from the check.
> swap_slot_free_notify has been tightly coupled with zram for several
> years, and zram has always worked synchronously. So if bdev_read_page
> returns 0, it means we have already read the page successfully.
> Also, when I looked up the other rw_page users, it seems there are no
> async rw_page users at the moment.

Yes, I also looked up the other rw_page users and found that
there are no async rw_page users now.

> If someone wants to use *async* rw_page && *swap_slot_free_notify*
> in future, we can add the check easily then. But I hope no one ever
> uses swap_slot_free_notify again; it is a mess. :(

But I think that we should add the check. If someone wants it, how would
he/she know about this requirement? And if someone makes zram read/write
asynchronously, we could easily miss it. This is an error-prone practice.

>>
>> BTW, something like swap_slot_free_notify(), which invalidates the
>> backend storage, could also be done for frontswap when
>> frontswap_load() succeeds. Couldn't it?
>
> frontswap_tmem_exclusive_gets_enabled?

Wow... yes, that's what I was trying to find.
Do you know the reason why zswap doesn't enable it?
Thanks.
On Tue, Mar 22, 2016 at 05:20:08PM +0900, Joonsoo Kim wrote:
> 2016-03-22 17:00 GMT+09:00 Minchan Kim <[email protected]>:
> > On Tue, Mar 22, 2016 at 02:08:59PM +0900, Joonsoo Kim wrote:
> >> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
> >> > <snip>
> >>
> >> Hello,
> >
> > Hey Joonsoo,
> >
> >>
> >> You need to check PageUptodate() or something, because bdev_read_page()
> >> can be asynchronous.
> >
> > I considered it but decided not to add the check :(.
> > I couldn't justify what benefit we would get from the check.
> > swap_slot_free_notify has been tightly coupled with zram for several
> > years, and zram has always worked synchronously. So if bdev_read_page
> > returns 0, it means we have already read the page successfully.
> > Also, when I looked up the other rw_page users, it seems there are no
> > async rw_page users at the moment.
>
> Yes, I also looked up the other rw_page users and found that
> there are no async rw_page users now.
>
> > If someone wants to use *async* rw_page && *swap_slot_free_notify*
> > in future, we can add the check easily then. But I hope no one ever
> > uses swap_slot_free_notify again; it is a mess. :(
>
> But I think that we should add the check. If someone wants it, how would
> he/she know about this requirement? And if someone makes zram read/write
> asynchronously, we could easily miss it. This is an error-prone practice.

Okay, I don't have a strong objection to it.
If we really want to catch such a case, let's add a WARN_ON_ONCE.
diff --git a/mm/page_io.c b/mm/page_io.c
index 18aac7819cc9..6592893d16ca 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -353,6 +353,7 @@ int swap_readpage(struct page *page)
ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
if (!ret) {
+ WARN_ON_ONCE(!PageUptodate(page));
swap_slot_free_notify(page);
count_vm_event(PSWPIN);
return 0;
>
> >>
>> BTW, something like swap_slot_free_notify(), which invalidates the
>> backend storage, could also be done for frontswap when
>> frontswap_load() succeeds. Couldn't it?
>
> frontswap_tmem_exclusive_gets_enabled?
>
> Wow... yes, that's what I was trying to find.
> Do you know the reason why zswap doesn't enable it?

Hmm, I can't remember. Maybe it's not a zswap thing but a frontswap
thing, so I guess a zswap user can enable it via the frontswap
interface if he wants.
>
> Thanks.
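[For reference, the frontswap analogue being discussed: when exclusive
gets are enabled, a successful load invalidates the frontswap copy and
redirties the page, the same idea as swap_slot_free_notify for zram.
Abridged from mm/frontswap.c of that era, quoted from memory, so the
exact shape of the ops dispatch may differ:

	int __frontswap_load(struct page *page)
	{
		int ret = -1;
		swp_entry_t entry = { .val = page_private(page), };
		int type = swp_type(entry);
		struct swap_info_struct *sis = swap_info[type];
		pgoff_t offset = swp_offset(entry);

		if (__frontswap_test(sis, offset))
			ret = frontswap_ops->load(type, offset, page);
		if (ret == 0) {
			inc_frontswap_loads();
			if (frontswap_tmem_exclusive_gets_enabled) {
				/* drop the backend copy, redirty the page */
				SetPageDirty(page);
				__frontswap_clear(sis, offset);
			}
		}
		return ret;
	}
]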
On Tue, Mar 22, 2016 at 11:06:29PM +0900, Minchan Kim wrote:
> On Tue, Mar 22, 2016 at 05:20:08PM +0900, Joonsoo Kim wrote:
> > 2016-03-22 17:00 GMT+09:00 Minchan Kim <[email protected]>:
> > > On Tue, Mar 22, 2016 at 02:08:59PM +0900, Joonsoo Kim wrote:
> > >> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
> > >> > <snip>
> > >>
> > >> Hello,
> > >
> > > Hey Joonsoo,
> > >
> > >>
> > >> You need to check PageUptodate() or something, because bdev_read_page()
> > >> can be asynchronous.
> > >
> > > I considered it but decided not to add the check :(.
> > > I couldn't justify what benefit we would get from the check.
> > > swap_slot_free_notify has been tightly coupled with zram for several
> > > years, and zram has always worked synchronously. So if bdev_read_page
> > > returns 0, it means we have already read the page successfully.
> > > Also, when I looked up the other rw_page users, it seems there are no
> > > async rw_page users at the moment.
> >
> > Yes, I also looked up the other rw_page users and found that
> > there are no async rw_page users now.
> >
> > > If someone wants to use *async* rw_page && *swap_slot_free_notify*
> > > in future, we can add the check easily then. But I hope no one ever
> > > uses swap_slot_free_notify again; it is a mess. :(
> >
> > But I think that we should add the check. If someone wants it, how would
> > he/she know about this requirement? And if someone makes zram read/write
> > asynchronously, we could easily miss it. This is an error-prone practice.
>
> Okay, I don't have a strong objection to it.
> If we really want to catch such a case, let's add a WARN_ON_ONCE.

I'm okay with it. But please add a code comment on why the
WARN_ON_ONCE() is added here.
Then,
Acked-by: Joonsoo Kim <[email protected]>
Thanks.
On Wed, Mar 23, 2016 at 01:45:34PM +0900, Joonsoo Kim wrote:
> On Tue, Mar 22, 2016 at 11:06:29PM +0900, Minchan Kim wrote:
> > On Tue, Mar 22, 2016 at 05:20:08PM +0900, Joonsoo Kim wrote:
> > > 2016-03-22 17:00 GMT+09:00 Minchan Kim <[email protected]>:
> > > > On Tue, Mar 22, 2016 at 02:08:59PM +0900, Joonsoo Kim wrote:
> > > >> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
> > > >> > <snip>
> > > >>
> > > >> Hello,
> > > >
> > > > Hey Joonsoo,
> > > >
> > > >>
> > > >> You need to check PageUptodate() or something, because bdev_read_page()
> > > >> can be asynchronous.
> > > >
> > > > I considered it but decided not to add the check :(.
> > > > I couldn't justify what benefit we would get from the check.
> > > > swap_slot_free_notify has been tightly coupled with zram for several
> > > > years, and zram has always worked synchronously. So if bdev_read_page
> > > > returns 0, it means we have already read the page successfully.
> > > > Also, when I looked up the other rw_page users, it seems there are no
> > > > async rw_page users at the moment.
> > >
> > > Yes, I also looked up the other rw_page users and found that
> > > there are no async rw_page users now.
> > >
> > > > If someone wants to use *async* rw_page && *swap_slot_free_notify*
> > > > in future, we can add the check easily then. But I hope no one ever
> > > > uses swap_slot_free_notify again; it is a mess. :(
> > >
> > > But I think that we should add the check. If someone wants it, how would
> > > he/she know about this requirement? And if someone makes zram read/write
> > > asynchronously, we could easily miss it. This is an error-prone practice.
> >
> > Okay, I don't have a strong objection to it.
> > If we really want to catch such a case, let's add a WARN_ON_ONCE.
>
> I'm okay with it. But please add a code comment on why the
> WARN_ON_ONCE() is added here.
>
> Then,
> Acked-by: Joonsoo Kim <[email protected]>
>
> Thanks.
From 45a05828a83eb6111230801a60ff973aa7ed3ebf Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Mon, 28 Mar 2016 13:34:50 +0900
Subject: [PATCH] mm: catch swap_slot_free_notify's misuse
swap_slot_free_notify works with a synchronous rw_page driver (i.e.,
zram), but if someone someday implements an asynchronous rw_page
driver and wants to hook swap_slot_free_notify to remove the
duplicated memory, it will break the current swap_slot_free_notify.
This patch catches that situation.
Signed-off-by: Minchan Kim <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
---
mm/page_io.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mm/page_io.c b/mm/page_io.c
index 18aac7819cc9..32c82cdbed8e 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -105,6 +105,13 @@ static void swap_slot_free_notify(struct page *page)
swp_entry_t entry;
unsigned long offset;
+ /*
+ * If a driver supports async rw_page and
+ * swap_slot_free_notify, swap_slot_free_notify
+ * will not work correctly.
+ */
+ WARN_ON_ONCE(!PageUptodate(page));
+
entry.val = page_private(page);
offset = swp_offset(entry);
--
1.9.1
On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
> <snip>
>
Kyeongdon, Could you try this patch?
From 09497ba48f5621cd2737f72ac4d15e6b3e5e3d14 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Mon, 28 Mar 2016 23:08:14 +0900
Subject: [PATCH] mm: call swap_slot_free_notify with holding page lock
Kyeongdon reported the error below, which is BUG_ON(!PageSwapCache(page))
in page_swap_info.
The reason is that page_endio in the rw_page path unlocks the page once
the read I/O is completed, so we need to take the page lock again before
checking PageSwapCache. Otherwise, the page can be removed from the swap
cache and trigger the BUG_ON below.
[27833.995833] ------------[ cut here ]------------
[27833.995853] Kernel BUG at c00f9040 [verbose debug info unavailable]
[27833.995865] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[27833.995876] Modules linked in:
[27833.995892] CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W 3.10.84-g9f14aec-dirty #73
[27833.995903] task: c3b73200 ti: dd192000 task.ti: dd192000
[27833.995922] PC is at page_swap_info+0x10/0x2c
[27833.995934] LR is at swap_slot_free_notify+0x18/0x6c
[27833.995946] pc : [<c00f9040>]  lr : [<c00f5560>]  psr: 400f0113
[27833.995946] sp : dd193d78  ip : c2deb1e4  fp : da015180
[27833.995959] r10: 00000000  r9 : 000200da  r8 : c120fe08
[27833.995968] r7 : 00000000  r6 : 00000000  r5 : c249a6c0  r4 : c249a6c0
[27833.995979] r3 : 00000000  r2 : 40080009  r1 : 200f0113  r0 : c249a6c0
..<snip>
[27833.996273] [<c00f9040>] (page_swap_info+0x10/0x2c) from [<c00f5560>] (swap_slot_free_notify+0x18/0x6c)
[27833.996288] [<c00f5560>] (swap_slot_free_notify+0x18/0x6c) from [<c00f5c5c>] (swap_readpage+0x90/0x11c)
[27833.996302] [<c00f5c5c>] (swap_readpage+0x90/0x11c) from [<c00f62dc>] (read_swap_cache_async+0x134/0x1ac)
[27833.996317] [<c00f62dc>] (read_swap_cache_async+0x134/0x1ac) from [<c00f63c4>] (swapin_readahead+0x70/0xb0)
[27833.996334] [<c00f63c4>] (swapin_readahead+0x70/0xb0) from [<c00e87e0>] (handle_pte_fault+0x320/0x6fc)
[27833.996348] [<c00e87e0>] (handle_pte_fault+0x320/0x6fc) from [<c00e8c7c>] (handle_mm_fault+0xc0/0xf0)
[27833.996363] [<c00e8c7c>] (handle_mm_fault+0xc0/0xf0) from [<c001ac18>] (do_page_fault+0x11c/0x36c)
[27833.996378] [<c001ac18>] (do_page_fault+0x11c/0x36c) from [<c000838c>] (do_DataAbort+0x34/0x118)
[27833.996392] [<c000838c>] (do_DataAbort+0x34/0x118) from [<c000d8b4>] (__dabt_usr+0x34/0x40)
Reported-by: Kyeongdon Kim <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/page_io.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index 18aac7819cc9..57e279b71695 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -353,7 +353,11 @@ int swap_readpage(struct page *page)
ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
if (!ret) {
- swap_slot_free_notify(page);
+ if (trylock_page(page)) {
+ swap_slot_free_notify(page);
+ unlock_page(page);
+ }
+
count_vm_event(PSWPIN);
return 0;
}
--
1.9.1
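[For reference, the unlock that opens the race fixed above happens in
page_endio(), which the rw_page completion path calls. Abridged from
mm/filemap.c of that era, quoted from memory:

	void page_endio(struct page *page, int rw, int err)
	{
		if (rw == READ) {
			if (!err) {
				SetPageUptodate(page);
			} else {
				ClearPageUptodate(page);
				SetPageError(page);
			}
			/* after this, the page can leave the swap cache */
			unlock_page(page);
		} else { /* rw == WRITE */
			if (err) {
				SetPageError(page);
				if (page->mapping)
					mapping_set_error(page->mapping, err);
			}
			end_page_writeback(page);
		}
	}

Once unlock_page() runs, the page can be reclaimed from the swap cache,
which is why swap_readpage() above retakes the lock with trylock_page()
before calling swap_slot_free_notify().]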
On Mon, Mar 28, 2016 at 11:30:56PM +0900, Minchan Kim wrote:
> On Fri, Mar 18, 2016 at 04:58:31PM +0900, Minchan Kim wrote:
>> <snip>
>
> Kyeongdon, Could you try this patch?
>
> From 09497ba48f5621cd2737f72ac4d15e6b3e5e3d14 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <[email protected]>
> Date: Mon, 28 Mar 2016 23:08:14 +0900
> Subject: [PATCH] mm: call swap_slot_free_notify with holding page lock
>
> <snip>
I tested this patch for 48 hours and couldn't reproduce the problem.
Tested-by: Kyeongdon Kim <[email protected]>
Thanks.