2013-06-07 12:59:22

by Artem Savkov

[permalink] [raw]
Subject: [BUG] non-swapcache page in end_swap_bio_read()

Hello all,

I'm hitting the following BUG_ON during boot when
CONFIG_PM_STD_PARTITION or "resume" kernel boot option are set. Looks
like this issue was introduced in (or brought up to light by)
"mm: remove compressed copy from zram in-memory"
(84e5bb4f06e6d6f0c4dfc033b4700702ed8aaccc in linux-next.git)
What happens is that during swsusp_check() bio is created with
bio_end_io set to end_swap_bio_read(), but the page is not in swap
cache.
Not sure how to handle this the right way, but proceeding with the
optimization in end_swap_bio_read() only after checking PageSwapCache
flag does help.

[ 2.065206] kernel BUG at mm/swapfile.c:2361!
[ 2.065469] invalid opcode: 0000 [#1] SMP
[ 2.065469] Modules linked in:
[ 2.065469] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc4-next-20130607+ #61
[ 2.065469] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2.065469] task: ffff88001e5ccfc0 ti: ffff88001e5ea000 task.ti: ffff88001e5ea000
[ 2.065469] RIP: 0010:[<ffffffff811462eb>] [<ffffffff811462eb>] page_swap_info+0xab/0xb0
[ 2.065469] RSP: 0000:ffff88001ec03c78 EFLAGS: 00010246
[ 2.065469] RAX: 0100000000000009 RBX: ffffea0000794780 RCX: 0000000000000c0b
[ 2.065469] RDX: 0000000000000046 RSI: 0000000000000000 RDI: 0000000000000000
[ 2.065469] RBP: ffff88001ec03c88 R08: 0000000000000000 R09: 0000000000000000
[ 2.065469] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2.065469] R13: 0000000000000001 R14: ffff88001e7f6200 R15: 0000000000001000
[ 2.065469] FS: 0000000000000000(0000) GS:ffff88001ec00000(0000) knlGS:0000000000000000
[ 2.065469] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2.065469] CR2: 0000000000000000 CR3: 000000000240b000 CR4: 00000000000006e0
[ 2.065469] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.065469] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2.065469] Stack:
[ 2.065469] ffffea0000794780 ffff88001e7f6200 ffff88001ec03cb8 ffffffff81145486
[ 2.065469] ffff88001e5cd5c0 ffff88001c02cd20 0000000000001000 0000000000000000
[ 2.065469] ffff88001ec03cc8 ffffffff81199518 ffff88001ec03d28 ffffffff81518ec3
[ 2.065469] Call Trace:
[ 2.065469] <IRQ>
[ 2.065469] [<ffffffff81145486>] end_swap_bio_read+0x96/0x130
[ 2.065469] [<ffffffff81199518>] bio_endio+0x18/0x40
[ 2.065469] [<ffffffff81518ec3>] blk_update_request+0x213/0x540
[ 2.065469] [<ffffffff81518fa0>] ? blk_update_request+0x2f0/0x540
[ 2.065469] [<ffffffff817986a6>] ? ata_hsm_qc_complete+0x46/0x130
[ 2.065469] [<ffffffff81519212>] blk_update_bidi_request+0x22/0x90
[ 2.065469] [<ffffffff8151b9ea>] blk_end_bidi_request+0x2a/0x80
[ 2.065469] [<ffffffff8151ba8b>] blk_end_request+0xb/0x10
[ 2.065469] [<ffffffff817693aa>] scsi_io_completion+0xaa/0x6b0
[ 2.065469] [<ffffffff817608d8>] scsi_finish_command+0xc8/0x130
[ 2.065469] [<ffffffff81769aff>] scsi_softirq_done+0x13f/0x160
[ 2.065469] [<ffffffff81521ebc>] blk_done_softirq+0x7c/0x90
[ 2.065469] [<ffffffff81049030>] __do_softirq+0x130/0x3f0
[ 2.065469] [<ffffffff810d454e>] ? handle_irq_event+0x4e/0x70
[ 2.065469] [<ffffffff81049405>] irq_exit+0xa5/0xb0
[ 2.065469] [<ffffffff81003cb1>] do_IRQ+0x61/0xe0
[ 2.065469] [<ffffffff81c2832f>] common_interrupt+0x6f/0x6f
[ 2.065469] <EOI>
[ 2.065469] [<ffffffff8107ebff>] ? local_clock+0x4f/0x60
[ 2.065469] [<ffffffff81c27f85>] ? _raw_spin_unlock_irq+0x35/0x50
[ 2.065469] [<ffffffff81c27f7b>] ? _raw_spin_unlock_irq+0x2b/0x50
[ 2.065469] [<ffffffff81078bd0>] finish_task_switch+0x80/0x110
[ 2.065469] [<ffffffff81078b93>] ? finish_task_switch+0x43/0x110
[ 2.065469] [<ffffffff81c2525c>] __schedule+0x32c/0x8c0
[ 2.065469] [<ffffffff81c2c010>] ? notifier_call_chain+0x150/0x150
[ 2.065469] [<ffffffff81c259d4>] schedule+0x24/0x70
[ 2.065469] [<ffffffff81c25d42>] schedule_preempt_disabled+0x22/0x30
[ 2.065469] [<ffffffff81093645>] cpu_startup_entry+0x335/0x380
[ 2.065469] [<ffffffff81c1ed7e>] start_secondary+0x217/0x219
[ 2.065469] Code: 69 bc 16 82 48 c7 c7 77 bc 16 82 31 c0 49 c1 ec 39 49 c1 e9 10 41 83 e1 01 e8 6c d2 ad 00 5b 4a 8b 04 e5 e0 bf 14 83 41 5c c9 c3 <0f> 0b eb fe 90 48 8b 07 55 48 89 e5 a9 00 00 01 00 74 12 e8 3d
[ 2.065469] RIP [<ffffffff811462eb>] page_swap_info+0xab/0xb0
[ 2.065469] RSP <ffff88001ec03c78>

--
Regards,
Artem


2013-06-07 15:27:02

by Minchan Kim

[permalink] [raw]
Subject: Re: [BUG] non-swapcache page in end_swap_bio_read()

Hello Artem,

On Fri, Jun 07, 2013 at 04:59:09PM +0400, Artem Savkov wrote:
> Hello all,
>
> I'm hitting the following BUG_ON during boot when
> CONFIG_PM_STD_PARTITION or "resume" kernel boot option are set. Looks
> like this issue was introduced in (or brought up to light by)
> "mm: remove compressed copy from zram in-memory"
> (84e5bb4f06e6d6f0c4dfc033b4700702ed8aaccc in linux-next.git)
> What happens is that during swsusp_check() bio is created with
> bio_end_io set to end_swap_bio_read(), but the page is not in swap
> cache.

True. I totally missed it.

> Not sure how to handle this the right way, but proceeding with the
> optimization in end_swap_bio_read() only after checking PageSwapCache
> flag does help.

I'd like to go with your way.
We already have SetPageUptodate so PageSwapCache's cost would be really
cheap from hitting from cacheline. Even, we can optimize it with unlikely
hint.

The credit should be yours so could you send a patch with your SOB?
Please write a comment in code about that why we need such check.

If you have any problem about sending a patch, I will send it for you
on Monday if others don't suggest another solution.

Thanks for the report!

>
> [ 2.065206] kernel BUG at mm/swapfile.c:2361!
> [ 2.065469] invalid opcode: 0000 [#1] SMP
> [ 2.065469] Modules linked in:
> [ 2.065469] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc4-next-20130607+ #61
> [ 2.065469] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [ 2.065469] task: ffff88001e5ccfc0 ti: ffff88001e5ea000 task.ti: ffff88001e5ea000
> [ 2.065469] RIP: 0010:[<ffffffff811462eb>] [<ffffffff811462eb>] page_swap_info+0xab/0xb0
> [ 2.065469] RSP: 0000:ffff88001ec03c78 EFLAGS: 00010246
> [ 2.065469] RAX: 0100000000000009 RBX: ffffea0000794780 RCX: 0000000000000c0b
> [ 2.065469] RDX: 0000000000000046 RSI: 0000000000000000 RDI: 0000000000000000
> [ 2.065469] RBP: ffff88001ec03c88 R08: 0000000000000000 R09: 0000000000000000
> [ 2.065469] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 2.065469] R13: 0000000000000001 R14: ffff88001e7f6200 R15: 0000000000001000
> [ 2.065469] FS: 0000000000000000(0000) GS:ffff88001ec00000(0000) knlGS:0000000000000000
> [ 2.065469] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2.065469] CR2: 0000000000000000 CR3: 000000000240b000 CR4: 00000000000006e0
> [ 2.065469] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2.065469] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2.065469] Stack:
> [ 2.065469] ffffea0000794780 ffff88001e7f6200 ffff88001ec03cb8 ffffffff81145486
> [ 2.065469] ffff88001e5cd5c0 ffff88001c02cd20 0000000000001000 0000000000000000
> [ 2.065469] ffff88001ec03cc8 ffffffff81199518 ffff88001ec03d28 ffffffff81518ec3
> [ 2.065469] Call Trace:
> [ 2.065469] <IRQ>
> [ 2.065469] [<ffffffff81145486>] end_swap_bio_read+0x96/0x130
> [ 2.065469] [<ffffffff81199518>] bio_endio+0x18/0x40
> [ 2.065469] [<ffffffff81518ec3>] blk_update_request+0x213/0x540
> [ 2.065469] [<ffffffff81518fa0>] ? blk_update_request+0x2f0/0x540
> [ 2.065469] [<ffffffff817986a6>] ? ata_hsm_qc_complete+0x46/0x130
> [ 2.065469] [<ffffffff81519212>] blk_update_bidi_request+0x22/0x90
> [ 2.065469] [<ffffffff8151b9ea>] blk_end_bidi_request+0x2a/0x80
> [ 2.065469] [<ffffffff8151ba8b>] blk_end_request+0xb/0x10
> [ 2.065469] [<ffffffff817693aa>] scsi_io_completion+0xaa/0x6b0
> [ 2.065469] [<ffffffff817608d8>] scsi_finish_command+0xc8/0x130
> [ 2.065469] [<ffffffff81769aff>] scsi_softirq_done+0x13f/0x160
> [ 2.065469] [<ffffffff81521ebc>] blk_done_softirq+0x7c/0x90
> [ 2.065469] [<ffffffff81049030>] __do_softirq+0x130/0x3f0
> [ 2.065469] [<ffffffff810d454e>] ? handle_irq_event+0x4e/0x70
> [ 2.065469] [<ffffffff81049405>] irq_exit+0xa5/0xb0
> [ 2.065469] [<ffffffff81003cb1>] do_IRQ+0x61/0xe0
> [ 2.065469] [<ffffffff81c2832f>] common_interrupt+0x6f/0x6f
> [ 2.065469] <EOI>
> [ 2.065469] [<ffffffff8107ebff>] ? local_clock+0x4f/0x60
> [ 2.065469] [<ffffffff81c27f85>] ? _raw_spin_unlock_irq+0x35/0x50
> [ 2.065469] [<ffffffff81c27f7b>] ? _raw_spin_unlock_irq+0x2b/0x50
> [ 2.065469] [<ffffffff81078bd0>] finish_task_switch+0x80/0x110
> [ 2.065469] [<ffffffff81078b93>] ? finish_task_switch+0x43/0x110
> [ 2.065469] [<ffffffff81c2525c>] __schedule+0x32c/0x8c0
> [ 2.065469] [<ffffffff81c2c010>] ? notifier_call_chain+0x150/0x150
> [ 2.065469] [<ffffffff81c259d4>] schedule+0x24/0x70
> [ 2.065469] [<ffffffff81c25d42>] schedule_preempt_disabled+0x22/0x30
> [ 2.065469] [<ffffffff81093645>] cpu_startup_entry+0x335/0x380
> [ 2.065469] [<ffffffff81c1ed7e>] start_secondary+0x217/0x219
> [ 2.065469] Code: 69 bc 16 82 48 c7 c7 77 bc 16 82 31 c0 49 c1 ec 39 49 c1 e9 10 41 83 e1 01 e8 6c d2 ad 00 5b 4a 8b 04 e5 e0 bf 14 83 41 5c c9 c3 <0f> 0b eb fe 90 48 8b 07 55 48 89 e5 a9 00 00 01 00 74 12 e8 3d
> [ 2.065469] RIP [<ffffffff811462eb>] page_swap_info+0xab/0xb0
> [ 2.065469] RSP <ffff88001ec03c78>
>
> --
> Regards,
> Artem
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Kind regards,
Minchan Kim

2013-06-07 20:23:25

by Artem Savkov

[permalink] [raw]
Subject: [PATCH] non-swapcache pages in end_swap_bio_read()

There is no guarantee that page in end_swap_bio_read is in swapcache so we need
to check it before calling page_swap_info(). Otherwise kernel hits a bug on
like the one below.
Introduced in "mm: remove compressed copy from zram in-memory"

kernel BUG at mm/swapfile.c:2361!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc4-next-20130607+ #61
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff88001e5ccfc0 ti: ffff88001e5ea000 task.ti: ffff88001e5ea000
RIP: 0010:[<ffffffff811462eb>] [<ffffffff811462eb>] page_swap_info+0xab/0xb0
RSP: 0000:ffff88001ec03c78 EFLAGS: 00010246
RAX: 0100000000000009 RBX: ffffea0000794780 RCX: 0000000000000c0b
RDX: 0000000000000046 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001ec03c88 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000001 R14: ffff88001e7f6200 R15: 0000000000001000
FS: 0000000000000000(0000) GS:ffff88001ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000000240b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffea0000794780 ffff88001e7f6200 ffff88001ec03cb8 ffffffff81145486
ffff88001e5cd5c0 ffff88001c02cd20 0000000000001000 0000000000000000
ffff88001ec03cc8 ffffffff81199518 ffff88001ec03d28 ffffffff81518ec3
Call Trace:
<IRQ>
[<ffffffff81145486>] end_swap_bio_read+0x96/0x130
[<ffffffff81199518>] bio_endio+0x18/0x40
[<ffffffff81518ec3>] blk_update_request+0x213/0x540
[<ffffffff81518fa0>] ? blk_update_request+0x2f0/0x540
[<ffffffff817986a6>] ? ata_hsm_qc_complete+0x46/0x130
[<ffffffff81519212>] blk_update_bidi_request+0x22/0x90
[<ffffffff8151b9ea>] blk_end_bidi_request+0x2a/0x80
[<ffffffff8151ba8b>] blk_end_request+0xb/0x10
[<ffffffff817693aa>] scsi_io_completion+0xaa/0x6b0
[<ffffffff817608d8>] scsi_finish_command+0xc8/0x130
[<ffffffff81769aff>] scsi_softirq_done+0x13f/0x160
[<ffffffff81521ebc>] blk_done_softirq+0x7c/0x90
[<ffffffff81049030>] __do_softirq+0x130/0x3f0
[<ffffffff810d454e>] ? handle_irq_event+0x4e/0x70
[<ffffffff81049405>] irq_exit+0xa5/0xb0
[<ffffffff81003cb1>] do_IRQ+0x61/0xe0
[<ffffffff81c2832f>] common_interrupt+0x6f/0x6f
<EOI>
[<ffffffff8107ebff>] ? local_clock+0x4f/0x60
[<ffffffff81c27f85>] ? _raw_spin_unlock_irq+0x35/0x50
[<ffffffff81c27f7b>] ? _raw_spin_unlock_irq+0x2b/0x50
[<ffffffff81078bd0>] finish_task_switch+0x80/0x110
[<ffffffff81078b93>] ? finish_task_switch+0x43/0x110
[<ffffffff81c2525c>] __schedule+0x32c/0x8c0
[<ffffffff81c2c010>] ? notifier_call_chain+0x150/0x150
[<ffffffff81c259d4>] schedule+0x24/0x70
[<ffffffff81c25d42>] schedule_preempt_disabled+0x22/0x30
[<ffffffff81093645>] cpu_startup_entry+0x335/0x380
[<ffffffff81c1ed7e>] start_secondary+0x217/0x219
Code: 69 bc 16 82 48 c7 c7 77 bc 16 82 31 c0 49 c1 ec 39 49 c1 e9 10 41 83 e1 01 e8 6c d2 ad 00 5b 4a 8b 04 e5 e0 bf 14 83 41 5c c9 c3 <0f> 0b eb fe 90 48 8b 07 55 48 89 e5 a9 00 00 01 00 74 12 e8 3d
RIP [<ffffffff811462eb>] page_swap_info+0xab/0xb0
RSP <ffff88001ec03c78>

Signed-off-by: Artem Savkov <[email protected]>
---
mm/page_io.c | 68 +++++++++++++++++++++++++++++++++---------------------------
1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index 1897abb..2b76ac7 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -82,38 +82,46 @@ void end_swap_bio_read(struct bio *bio, int err)
iminor(bio->bi_bdev->bd_inode),
(unsigned long long)bio->bi_sector);
} else {
- struct swap_info_struct *sis;
-
SetPageUptodate(page);
- sis = page_swap_info(page);
- if (sis->flags & SWP_BLKDEV) {
- /*
- * The swap subsystem performs lazy swap slot freeing,
- * expecting that the page will be swapped out again.
- * So we can avoid an unnecessary write if the page
- * isn't redirtied.
- * This is good for real swap storage because we can
- * reduce unnecessary I/O and enhance wear-leveling
- * if an SSD is used as the as swap device.
- * But if in-memory swap device (eg zram) is used,
- * this causes a duplicated copy between uncompressed
- * data in VM-owned memory and compressed data in
- * zram-owned memory. So let's free zram-owned memory
- * and make the VM-owned decompressed page *dirty*,
- * so the page should be swapped out somewhere again if
- * we again wish to reclaim it.
- */
- struct gendisk *disk = sis->bdev->bd_disk;
- if (disk->fops->swap_slot_free_notify) {
- swp_entry_t entry;
- unsigned long offset;

- entry.val = page_private(page);
- offset = swp_offset(entry);
-
- SetPageDirty(page);
- disk->fops->swap_slot_free_notify(sis->bdev,
- offset);
+ /*
+ * There is no guarantee that the page is in swap cache, so
+ * we need to check PG_swapcache before proceeding with this
+ * optimization.
+ */
+ if (unlikely(PageSwapCache(page))) {
+ struct swap_info_struct *sis;
+
+ sis = page_swap_info(page);
+ if (sis->flags & SWP_BLKDEV) {
+ /*
+ * The swap subsystem performs lazy swap slot freeing,
+ * expecting that the page will be swapped out again.
+ * So we can avoid an unnecessary write if the page
+ * isn't redirtied.
+ * This is good for real swap storage because we can
+ * reduce unnecessary I/O and enhance wear-leveling
+ * if an SSD is used as the as swap device.
+ * But if in-memory swap device (eg zram) is used,
+ * this causes a duplicated copy between uncompressed
+ * data in VM-owned memory and compressed data in
+ * zram-owned memory. So let's free zram-owned memory
+ * and make the VM-owned decompressed page *dirty*,
+ * so the page should be swapped out somewhere again if
+ * we again wish to reclaim it.
+ */
+ struct gendisk *disk = sis->bdev->bd_disk;
+ if (disk->fops->swap_slot_free_notify) {
+ swp_entry_t entry;
+ unsigned long offset;
+
+ entry.val = page_private(page);
+ offset = swp_offset(entry);
+
+ SetPageDirty(page);
+ disk->fops->swap_slot_free_notify(sis->bdev,
+ offset);
+ }
}
}
}
--
1.8.3

2013-06-07 20:43:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] non-swapcache pages in end_swap_bio_read()

On Sat, 8 Jun 2013 00:23:18 +0400 Artem Savkov <[email protected]> wrote:

> There is no guarantee that page in end_swap_bio_read is in swapcache so we need
> to check it before calling page_swap_info(). Otherwise kernel hits a bug on
> like the one below.
> Introduced in "mm: remove compressed copy from zram in-memory"
>
> kernel BUG at mm/swapfile.c:2361!

Fair enough.

> --- a/mm/page_io.c
> +++ b/mm/page_io.c
>
> ...
>
> + /*
> + * There is no guarantee that the page is in swap cache, so
> + * we need to check PG_swapcache before proceeding with this
> + * optimization.
> + */

My initial thought was "how the heck can this not be a swapcache page".
So let's add the important details to this comment.

> + if (unlikely(PageSwapCache(page))) {

Surely this is "likely".

> + struct swap_info_struct *sis;
> +
> + sis = page_swap_info(page);
> + if (sis->flags & SWP_BLKDEV) {
> + /*
> + * The swap subsystem performs lazy swap slot freeing,
> + * expecting that the page will be swapped out again.
> + * So we can avoid an unnecessary write if the page
> + * isn't redirtied.
> + * This is good for real swap storage because we can
> + * reduce unnecessary I/O and enhance wear-leveling
> + * if an SSD is used as the as swap device.
> + * But if in-memory swap device (eg zram) is used,
> + * this causes a duplicated copy between uncompressed
> + * data in VM-owned memory and compressed data in
> + * zram-owned memory. So let's free zram-owned memory
> + * and make the VM-owned decompressed page *dirty*,
> + * so the page should be swapped out somewhere again if
> + * we again wish to reclaim it.
> + */

This comment now makes a horrid mess in an 80-col display. I think we
may as well remove a tabstop here.

> + struct gendisk *disk = sis->bdev->bd_disk;
> + if (disk->fops->swap_slot_free_notify) {
> + swp_entry_t entry;
> + unsigned long offset;
> +
> + entry.val = page_private(page);
> + offset = swp_offset(entry);
> +
> + SetPageDirty(page);
> + disk->fops->swap_slot_free_notify(sis->bdev,
> + offset);
> + }
> }
> }
> }

Result:

--- a/mm/page_io.c~mm-remove-compressed-copy-from-zram-in-memory-fix-2-fix
+++ a/mm/page_io.c
@@ -81,51 +81,54 @@ void end_swap_bio_read(struct bio *bio,
imajor(bio->bi_bdev->bd_inode),
iminor(bio->bi_bdev->bd_inode),
(unsigned long long)bio->bi_sector);
- } else {
- SetPageUptodate(page);
+ goto out;
+ }
+
+ SetPageUptodate(page);
+
+ /*
+ * There is no guarantee that the page is in swap cache - the software
+ * suspend code (at least) uses end_swap_bio_read() against a non-
+ * swapcache page. So we must check PG_swapcache before proceeding with
+ * this optimization.
+ */
+ if (likely(PageSwapCache(page))) {
+ struct swap_info_struct *sis;
+
+ sis = page_swap_info(page);
+ if (sis->flags & SWP_BLKDEV) {
+ /*
+ * The swap subsystem performs lazy swap slot freeing,
+ * expecting that the page will be swapped out again.
+ * So we can avoid an unnecessary write if the page
+ * isn't redirtied.
+ * This is good for real swap storage because we can
+ * reduce unnecessary I/O and enhance wear-leveling
+ * if an SSD is used as the as swap device.
+ * But if in-memory swap device (eg zram) is used,
+ * this causes a duplicated copy between uncompressed
+ * data in VM-owned memory and compressed data in
+ * zram-owned memory. So let's free zram-owned memory
+ * and make the VM-owned decompressed page *dirty*,
+ * so the page should be swapped out somewhere again if
+ * we again wish to reclaim it.
+ */
+ struct gendisk *disk = sis->bdev->bd_disk;
+ if (disk->fops->swap_slot_free_notify) {
+ swp_entry_t entry;
+ unsigned long offset;
+
+ entry.val = page_private(page);
+ offset = swp_offset(entry);

- /*
- * There is no guarantee that the page is in swap cache, so
- * we need to check PG_swapcache before proceeding with this
- * optimization.
- */
- if (unlikely(PageSwapCache(page))) {
- struct swap_info_struct *sis;
-
- sis = page_swap_info(page);
- if (sis->flags & SWP_BLKDEV) {
- /*
- * The swap subsystem performs lazy swap slot freeing,
- * expecting that the page will be swapped out again.
- * So we can avoid an unnecessary write if the page
- * isn't redirtied.
- * This is good for real swap storage because we can
- * reduce unnecessary I/O and enhance wear-leveling
- * if an SSD is used as the as swap device.
- * But if in-memory swap device (eg zram) is used,
- * this causes a duplicated copy between uncompressed
- * data in VM-owned memory and compressed data in
- * zram-owned memory. So let's free zram-owned memory
- * and make the VM-owned decompressed page *dirty*,
- * so the page should be swapped out somewhere again if
- * we again wish to reclaim it.
- */
- struct gendisk *disk = sis->bdev->bd_disk;
- if (disk->fops->swap_slot_free_notify) {
- swp_entry_t entry;
- unsigned long offset;
-
- entry.val = page_private(page);
- offset = swp_offset(entry);
-
- SetPageDirty(page);
- disk->fops->swap_slot_free_notify(sis->bdev,
- offset);
- }
+ SetPageDirty(page);
+ disk->fops->swap_slot_free_notify(sis->bdev,
+ offset);
}
}
}

+out:
unlock_page(page);
bio_put(bio);
}
_

2013-06-07 20:51:37

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] non-swapcache pages in end_swap_bio_read()

On Fri, 2013-06-07 at 13:43 -0700, Andrew Morton wrote:
> On Sat, 8 Jun 2013 00:23:18 +0400 Artem Savkov <[email protected]> wrote:
[]
> +++ a/mm/page_io.c
[]
> + /*
> + * There is no guarantee that the page is in swap cache - the software
> + * suspend code (at least) uses end_swap_bio_read() against a non-
> + * swapcache page. So we must check PG_swapcache before proceeding with
> + * this optimization.
> + */
> + if (likely(PageSwapCache(page))) {

or

if (unlikely(!PageSwapCache(page)))
goto out;

to save an indent level

> +out:
> unlock_page(page);
> bio_put(bio);
> }